2019-04-15

#accessibility

Gustav Wengel

179

Apr 15, 2019 ⋅ 6 min read

Programming by voice in 2019

Gustav Wengel Tech Lead at Bambuu. Enthusiastic about coding and dancing, though combining my interests poses a bit of a logistical challenge.

Some software developers may not have the full use of their hands, whether due to a condition like carpal tunnel syndrome or another reason. For these developers, using a keyboard and mouse may be difficult or even impossible. I’ve written about where the state of coding by voice was at in 2017, and unfortunately, most of the things in the article haven’t changed. Voice coding is not a fast-moving field.

If this is your first article looking at voice coding, I’d advise you to read that article, as it covers many of the fundamental technologies and aspects of the field.

Even though things move slowly, they’re not at a standstill. Let’s take a look at some of the highlights from the last two years, and what the future might bring.

The technology is improving

Speech recognition accuracy

There are some fundamental requirements to control your computer via your voice.

One of these requirements is that your computer can understand you. Previously it’s been difficult to get speech accuracy to a level where it didn’t hinder the experience of voice coding. If that hasn’t changed already, it’s about to.

As digital assistants you can speak to (like Siri, Cortana, Alexa and Google Assistant) gain popularity, we’re going to keep seeing improvements in the accuracy of which computers can recognize our voices, even in noisy environments.

To get an accuracy that was good enough, voice coding has traditionally needed pretty expensive microphones. My recommendation has been a $300 Audix OM7 stage microphone — but it’s large, unwieldy, and requires an amplifier.

Digital assistants use a different strategy to reduce noise. They use multiple microphones, and some hefty machine learning, to distinguish noise from voice.

Currently, I’m not aware of any headsets you can buy with multiple microphones, but it won’t be long before we see some inexpensive headsets that let us dictate with a high accuracy in any environment. If you’re looking for something now, apart from the Audix, I’ve been recommended the d:fine headset and the Steno SR microphone.

This technology is evolving fast, and I predict that within five years, we’ll end up with what is essentially close-to perfect speech recognition.

Fundamental software improvements

Today, most of the voice coding software available relies on the Dragon voice-recognition software from Nuance, and the open-source Dragonfly framework.

Dragonfly was previously unmaintained (the last commit was three years ago) — but currently a fork is picking up steam, and has been actively maintained for quite a while.

Dragonfly used to only work with older versions of Dragon, but due to some other recent developments, it now works with the newest versions.
That means that we get all of the previously mentioned improvements in speech recognition accuracy.

These updates might not sound particularly interesting, but the reality is that almost all voice-coding technologies I’ve been able to find are built upon Dragon and Dragonfly. Any improvements to those libraries can potentially ripple through the entire ecosystem, providing everyone with a better experience.

A challenger appears — an introduction to Talon

Talon is currently the most promising project for hands-free coding.
It allows you to control your computer in a variety of ways. Talon’s out-of-the-box voice coding is very reminiscent of how Tavis Rudd does it.

Lots of short words map to letters and syntax, which provides good efficiency, but can be hard to learn. Talon also comes with the ability to define your own rules (when I say “X” do “Y”).

If that’s all it was, it would be much like using many of the Dragonfly-based solutions that are already out there. Talon goes beyond that. For one, it allows you to control your mouse via eye-tracking. While normally eye-tracking has issues replacing the mouse, as eyes have a hard time focusing on small objects, such as links or tiny buttons.

Over 200k developers use LogRocket to create better digital experiences

Learn more →

Talon fixes this by combining the eye-tracking mouse with a voice-controlled zoom, or head tracking for precision movements. You can see the zoom here.

The later releases also come with a built-in speech-recognition engine, so you don’t have to shell out the big bucks for a Dragon license. Ryan Hileman, who is the creator, is working on Talon full-time.

Like anything, Talon has downsides. It’s currently MacOS only, and it’s not as well-documented as I’d like. It does, however, come with a very active Slack community where you can ask questions.

Right now, Talon is definitely the project to watch going forward. You can see it in action here.

What about the rest of my tasks?

Inputting text into an editor is only a small part of coding. We also need to search the web, send mail, open programs and control terminals. These things aren’t as niche as programming, so there are a few different solutions out there to manage this.

Let’s have a look at some of the different tools available.

Of course the real question is, which one will let you search fastest on stackoverflow.

KnowBrainer is a layer on top of Dragon that allows us to extend it with commands to e.g. interact with GUI programs. It allows for very impressive mouse and keyboard simulation. Watch it in action here.

Utter command allows you to efficiently navigate in the Windows file system and applications. It’s a reasonably old product, but it’s completely flown under my radar. I’m not sure whether it still works on modern versions of windows, but their website is a goldmine for knowledge related to speech input.

Intag Speech Interface for VoiceComputer is one of the most promising things I’ve seen yet. It superimposes numbers on top of regular buttons in Windows programs, which means that you’re able to interact with almost anything.
Mouse control is generally one of the weaker points of hands-free computing, but this could be a game-changer for that. Check out the demo here.

Dragon ships with browser plugins that allow you to remote control your browser.

If you’re not using Dragon or you’re looking for something that builds on top of their browser plugins, there’s this wonderful browser plugin by James Stout, of handsfreecoding.org. It’s called modeless-keyboard-navigation, and it lets you control much of your browser with shortcuts.
You can then hook your voice commands up to these shortcuts to control your browser via voice.

Editor & language support

You generally see a lot of voice coding being demonstrated with Python.
Python has a lot of very strong points as a language. It’s concise and the syntax is reasonably human readable.

In my opinion, any language well suited for voice coding needs two different qualities, and as we’ll see, python only has one of them.

Conventions for naming things with human-readable names

The first thing that’s needed is strong conventions. We want a language where there’s a convention for the variable casing, but also where there’s a convention for using “real” words for variable names. Let’s take a look at why.

Let’s say we want to declare a variable:

var clientContextId = // whatever

An example of a voice command that would declare this could be the following:
“variable client context id”

This can “just work” because of these two aspects. A convention for capitalization means we know the variable name will be in camelCase. The convention for using real words for variable names means there’s no friction where we have to spell things out. We just say what we want, and we can let the computer figure out the rest.

Static type systems

Python is a very dynamic language — we don’t always know (or care) what type our variables have. However knowing the types of our variables while writing the code, allows for smarter coding by voice.

Let’s take another hypothetical example, of adding a number to a list, in two different languages. C#, which has a static type system, and Python, which does not.

// C#
// the 'list' variable is defined somewhere else
list.add(5)

One way of accomplishing this could be to say the following: “call list add five”

Which the computer will understand as “call the add method on the list variable, with the number five (5) as a parameter”

This is pretty pleasant, but it requires that our editor understands where we are in our code right now. It needs to understand what variables are in scope, what methods exist on them, and what parameters they take.

Let’s try the same with a Python example:

#Python
#list is defined somewhere else
list.append(5)

If we say “call list append five” as above, we’re unable to figure out whether or not there’s a method called append that takes five as a parameter, or whether there’s a method called append_fiveor even whether “five” is a number or a string.

We’ll end up needing to add some extra words, to explain where the function name stops, and the arguments begin. An example of this slightly longer command could be, “call list append parameters five”.

These examples are all hypothetical. Unfortunately, there’s no editor that works together with voice-coding software right now. But in the future, I think we’ll be able to get a much better experience out of languages that are more analyzable.

We’re also much more resistant to errors in the speech recognition. Even if the computer heard “call list bat” instead of “call list add” in the C# example above, it would be able to call the right method as “bat” is reasonably close to “add”.

Conclusion

Voice-coding is still progressing at a reasonable pace. We are about to reach the point where voice recognition is as good as it gets, and then we’ll only be limited by our editors.

At some point, voice-coding is going to transcend simply pretending our mouth is a keyboard. But one of the prerequisites for this is that our editor, and by extension our languages, are smart enough to let that happen.

Get set up with LogRocket's modern error tracking in minutes:

Visit https://logrocket.com/signup/ to get an app ID

Install LogRocket via npm or script tag. LogRocket.init() must be called client-side, not server-side

npm
Script tag

$ npm i --save logrocket 

// Code:

import LogRocket from 'logrocket'; 
LogRocket.init('app/id');

// Add to your HTML:

<script src="https://cdn.lr-ingest.com/LogRocket.min.js"></script>
<script>window.LogRocket && window.LogRocket.init('app/id');</script>

(Optional) Install plugins for deeper integrations with your stack:
- Redux middleware
- NgRx middleware
- Vuex plugin

Get started now

#accessibility

A guide to wrapper vs. container classes in CSS

A breakdown of the wrapper and container CSS classes, how they’re used in real-world code, and when it makes sense to use one over the other.

Temitope Oyedele

Jul 7, 2025 ⋅ 10 min read

Stagehand and Gemini logos on a gradient background symbolizing AI web automation

How to build a web-based AI agent with Stagehand and Gemini

This guide walks you through creating a web UI for an AI agent that browses, clicks, and extracts info from websites powered by Stagehand and Gemini.

Elijah Asaolu

Jul 4, 2025 ⋅ 8 min read

Getting started with Claude 4 API: A developer’s walkthrough

This guide explores how to use Anthropic’s Claude 4 models, including Opus 4 and Sonnet 4, to build AI-powered applications.

Andrew Baisden

Jul 3, 2025 ⋅ 16 min read

AI dev tool power rankings & comparison [July 2025 edition]

Which AI frontend dev tool reigns supreme in July 2025? Check out our power rankings and use our interactive comparison tool to find out.

Chizaram Ken

Jul 2, 2025 ⋅ 3 min read

View all posts

One Reply to "Programming by voice in 2019"

rajshri says:

December 31, 2019 at 2:18 am

i would like to work on this topic …as i m looking for research topic n it seems interesting to work on such topic…i would like to use nlp/nlg techniques for research

Reply

Advisory boards aren’t only for executives. Join the LogRocket Content Advisory Board today →