Voice-enabling a Web-based Application: Lessons Learned

Copyright © Alan Cantor 2008. All rights reserved.


Speech recognition technology is becoming surprisingly accurate, robust, and versatile. The most mature product on the market, Dragon NaturallySpeaking, excels at recognizing not only words and numbers, but also commands for formatting and revising text, web-browsing, and controlling windows, dialog boxes, and menus.

Out of the box, NaturallySpeaking works well with popular word processors, e-mail programs, spreadsheets, and browsers, but less well (or not at all) with non-standard and proprietary applications. Voice-enabling these programs usually requires scripting. NaturallySpeaking Professional, Legal, and Medical editions include a scripting environment for the development of custom commands ("macros"). This paper highlights the lessons learned while scripting macros for a complex web-based application. We undertook the scripting project while implementing a return-to-work plan for an employee with a disability.

About the client

The client was an office worker with computer-induced repetitive strain injuries. In particular, overusing a pointing device (e.g., mouse, trackball, or touchpad) precipitated painful flare-ups. Much of her job involved entering data into a legacy case management system. Its user interface was 100% keyboard-driven, and did not cause the client significant problems.

Recently the organization switched over to a proprietary case management system with a Microsoft Internet Explorer front end. Keyboard access was possible, but not practical. Most actions required dozens of key presses.

Soon after the introduction of the web-based program, the client, who only knew how to operate it with a mouse, began to experience increasing pain and weakness in her fingers, hands, wrists, arms, elbows, neck, shoulders, and back. Eventually, her physician recommended that she take a six-month medical leave.


When the client was ready, we prepared the return-to-work plan. Some of the recommended accommodations included: coaching on touch typing and keyboard-only techniques; a timer to schedule regular breaks; a split keyboard; work station adjustments; and work area reorganizations.

We also conducted an accessibility/usability study of the case management system, and recommended that the developers improve its keyboard interface. They responded by introducing several hotkeys, including Alt+1 for moving keyboard input focus ("focus") to the first form control on each page.

Despite these enhancements, the client began to experience discomfort from typing. As she had previous experience with ViaVoice speech recognition software, we trained her to use NaturallySpeaking, and suggested that she try driving the case management system by voice.

Difficulties operating the application by voice

Driving the application by voice proved difficult. We showed the client general techniques for interacting with HTML pages. For example, to activate a link, say all or part of the hypertext; to move focus to a form control, say the caption that appears next to it. These techniques did not work reliably, probably because of problems in the application's HTML and JavaScript.

We also taught her commands for navigating to specific form controls. For example, to navigate to any field on a screen, say "Text," "Type Text," "Edit Box," or "Text Field." These commands place a number next to every field; to choose a field, say the corresponding number. (There are similar commands for drop-down lists, radio buttons, checkboxes, pushbuttons, and images.) Unfortunately, these commands failed more often than they worked, even after she trained them repeatedly.

Because of these problems, the employer contracted us to voice-enable about 25 pages of the application. We scripted macros for approximately 250 edit fields, drop-down lists, hypertext links, radio buttons, checkboxes, and pushbuttons. Our goal was to create commands that required little or no memorization by choosing command names that mirrored the words on the screens. For example, "City" moved focus to the "City" edit field; "Send Report" activated the "Send Report" hypertext link.

Each screen had a row of hypertext links for navigating to related pages. Some of these links were identical to form control captions. To differentiate navigation links from form controls, we adopted a convention that navigation commands were spoken as "hypertext + Please." For example, to activate the "Client Information" link, say "Client Information Please."
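
The naming convention can be illustrated with a short Python sketch. It is not the project's code; the function name and the sample captions passed to it are assumptions chosen for illustration. It shows why the "Please" suffix prevents a navigation link from colliding with a form control that has the same caption.

```python
def build_command_names(form_captions, nav_links):
    """Map each spoken command name to the screen element it targets."""
    commands = {}
    for caption in form_captions:
        # A form control is addressed by the caption shown next to it.
        commands[caption] = ("control", caption)
    for link in nav_links:
        # Navigation links get the "Please" suffix, so they cannot collide
        # with a form control that has the same caption.
        commands[f"{link} Please"] = ("link", link)
    return commands


# Hypothetical page: one caption is shared by a control and a navigation link.
commands = build_command_names(
    form_captions=["City", "Client Information"],
    nav_links=["Client Information", "Mail Merge"],
)
print(commands["Client Information"])         # ('control', 'Client Information')
print(commands["Client Information Please"])  # ('link', 'Client Information')
```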

Two limitations of NaturallySpeaking macros

After scripting several screens, we discovered a NaturallySpeaking bug. In the Internet Explorer application, form controls on different pages had identical captions. For example, two pages had a "Surname" field, a "Country" drop-down list, and a "Confirmation Received" checkbox; four pages had a "Notes" field; and six pages had a "Comments" field. The normal way to script commands with identical names is to make them window-specific, so that the scope (availability) of each command is restricted to the window for which it was written. Unfortunately, the NaturallySpeaking 9.5 MyCommands Editor prohibited duplicate command names for Internet Explorer windows. Thus, there was no simple way to create two or more different commands with the same name.

Another shortcoming was that macros that send keystrokes executed very slowly. Although not a problem for simple scripts, many of our scripts sent twenty, thirty, or more keystrokes. We experimented with every macro creation option in the MyCommands Editor, but none yielded noticeably faster scripts.

Our workaround to both problems was to script NaturallySpeaking commands that triggered Macro Express macros. Macro Express is a versatile macro creation utility for Windows. Its "Text Type" command outputs keystrokes significantly faster than the NaturallySpeaking "SendKeys" command, and it recognizes Internet Explorer window titles. Furthermore, the scope for Macro Express commands is easily set.

We eventually learned about a Win32 API (Windows Application Programming Interface) function that allows NaturallySpeaking commands to inspect the active window title. Nevertheless, we continued to use NaturallySpeaking to trigger Macro Express because Macro Express scripts run faster, handle scope better, are quicker to develop, and are easier to test. Appendix A compares the code for determining active window titles in Macro Express and NaturallySpeaking.

We experimented with two ways for NaturallySpeaking to trigger Macro Express: (1) NaturallySpeaking sends keystrokes to launch a Macro Express hotkey macro; (2) NaturallySpeaking activates a shell command to launch a Macro Express macro. In theory, the first method should execute more quickly, while the second method should be more reliable. In practice, both methods worked equally well. A code sample for each approach appears in Appendix B.

Strategies for voice-enabling web-based applications

We encountered many technical hurdles while scripting macros for the web-based application. In meeting the challenges, we learned many ways to simplify development and increase the robustness of voice commands.

Consider the browser and its settings

When scripting a web-based application, tailor macros to one browser. It may even be necessary to script for a particular version (or versions) of a browser.

Browser settings affect script performance. Features such as user preferences, browser tabs, toolbars, add-ins, and extensions may change the appearance, layout, and behaviour of pages. Some settings help while others hinder. Keep browser settings in mind when developing voice macros for web-based applications, and document the necessary browser settings.

Avoid mouse actions unless absolutely necessary

Voice commands that click on screen objects execute quickly, but tend to be undependable. The position of a target may vary with changes to window size, window position, vertical and horizontal scroll offset, screen resolution, and appearance settings. In addition, scripts that perform mouse actions are less portable: a macro that clicks a target on one computer may miss it when run on another.

Know how to interact with the browser by keyboard

Macros that send keyboard commands are generally more reliable than macros that point-and-click. Therefore, knowledge of browser hotkeys and keyboard techniques is essential when scripting web-based applications.

Note that macros that emulate mouse clicks are the only way to perform certain tasks. One of our commands selected a word and tested its value. The only method to select a word on a page rendered by Internet Explorer is to double-click it.

Communicate with the developers

No macro is 100% reliable, especially when it handles information coming through a network. In some situations, no combination of keystrokes and mouse manipulations yields satisfactory results. When this occurs, talk to the application developers. The most effective way to deal with certain accessibility problems is for the developers to make minor coding changes.

During this project, the macro creation process became much more manageable after the developers added new keyboard commands. The most valuable was Alt+1, which moved focus to the first form control on any page. With this hotkey, any control could be given focus by sending Alt+1 followed by a fixed number of Tabs or Shift+Tabs. About 80% of scripts included the Alt+1 hotkey.
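
The "Alt+1, then a fixed number of Tabs" pattern can be sketched in Python as follows. This is an illustration rather than the project's code: the function name is hypothetical, and the output uses SendKeys-style notation (% for Alt, + for Shift), the same notation as the SendKeys call in Appendix B.

```python
def focus_keys(offset):
    """Return SendKeys-style keystrokes that move focus to one control.

    offset is the number of presses from the first form control:
    positive means Tab forward, negative means Shift+Tab backward.
    """
    keys = "%1"  # Alt+1: the application hotkey for the first form control
    if offset >= 0:
        keys += "{TAB}" * offset
    else:
        keys += "+{TAB}" * (-offset)
    return keys


print(focus_keys(2))   # %1{TAB}{TAB}
print(focus_keys(-1))  # %1+{TAB}
```

Because every sequence starts from the same known position, the only page-specific datum each command needs is its offset. For instance, `focus_keys(22) + "{ENTER}"` corresponds to the 22 Tabs and Enter used in Appendix B.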

Discuss accessibility and usability concerns with the developers. Although the developers of this application could not make every change we requested, the modifications they did make greatly simplified the macro creation project.

Send long keystroke sequences via non-NaturallySpeaking macro tools

Consider using macro creation tools in addition to NaturallySpeaking to send long keystroke sequences. The Macro Express "Text Type" command sends keystrokes much faster than the NaturallySpeaking "SendKeys" command.

The only macro utility that we tested was Macro Express, but it is not the only one that can be linked to or driven by NaturallySpeaking. Python is known to work, and other programs listed in the "Resources" section below may also be compatible with NaturallySpeaking.

Learn workarounds for the NaturallySpeaking scope bug

Until the scope bug is fixed, workarounds may be needed when developing NaturallySpeaking commands for Internet Explorer-based applications. (See Appendix A.) Note that in some web-based applications, the purpose of a window can be inferred by analyzing the address line.
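
As a rough Python sketch of inferring a page's purpose from its address, the function below looks up the path portion of a URL in a table. The paths, host name, and function name are hypothetical, because this paper does not give the application's real URLs, and the sketch does not cover how a macro would read the address line in the first place.

```python
from urllib.parse import urlparse

# Hypothetical path-to-page table; the real application's URLs are not
# given in this paper.
PAGES = {
    "/client/info": "Client Information",
    "/mailmerge": "Mail Merge",
}


def page_from_address(address):
    """Return the page name for a URL, or None if the path is unrecognized."""
    # Ignore the query string, a trailing slash, and letter case.
    path = urlparse(address).path.rstrip("/").lower()
    return PAGES.get(path)


print(page_from_address("http://cases.example/client/info?id=42"))  # Client Information
print(page_from_address("http://cases.example/other"))              # None
```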

Learn to use Advanced Scripting

Advanced Scripting is a Visual Basic environment built into the MyCommands Editor for developing NaturallySpeaking commands. Although macros can also be created using three non-programming methods, the advantages of Advanced Scripting cannot be overstated. In Advanced Scripting, for example, a single "list command" can replace a large number of related commands. One does not need to be a professional programmer to access the basic functionality of Advanced Scripting. Larry Allen's guide is a superb introduction to the topic.

Create substitutes for unreliable NaturallySpeaking commands

Do not hesitate to script substitutes for unreliable commands. Because "button," "list," "link" and similar commands frequently failed, we created easy-to-remember substitutes such as "Show Buttons," "Show Lists," and "Show Links." All were created using Advanced Scripting, and include "HeardWord." HeardWord causes NaturallySpeaking to act as though words were spoken as a command. Here is the "Show Links" script:

Sub Main()
	HeardWord "link"
End Sub
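
The idea that a single list command can stand in for a family of such substitutes can be modelled in Python with a lookup table. This is only a sketch of the logic, not Advanced Scripting code: the function name is hypothetical, and the "Buttons" and "Lists" entries are assumed to follow the same pattern as the "Show Links" script above.

```python
# Spoken word after "Show" -> word to pass to HeardWord.
# "Links" -> "link" comes from the script above; the other two entries are
# assumed to follow the same pattern.
SHOW_TARGETS = {
    "Buttons": "button",
    "Lists": "list",
    "Links": "link",
}


def show_command(spoken):
    """Translate 'Show <something>' into the built-in command word."""
    prefix = "Show "
    if not spoken.startswith(prefix):
        raise ValueError(f"not a Show command: {spoken!r}")
    target = spoken[len(prefix):]
    if target not in SHOW_TARGETS:
        raise ValueError(f"unknown Show target: {target!r}")
    return SHOW_TARGETS[target]


print(show_command("Show Links"))  # link
```

Adding another substitute then means adding one table entry rather than writing one more script.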

Exploit the strengths of different macro tools

Most of our NaturallySpeaking commands consisted of a single line of code that activated a much longer Macro Express script. Some of our voice commands, however, were two-line "hybrids" that interacted synergistically with Macro Express. The hybrid commands did more than either macro creation tool could do by itself.

An example of a two-line hybrid command appears in Appendix B. The first line of the NaturallySpeaking command executes a Macro Express script. Macro Express sends 23 keystrokes to activate a link, which opens a new page. Depending on network traffic, the new page takes anywhere from a fraction of a second to a minute to appear. A delay is needed. But rather than guessing the length of the delay, we used the "Wait for Window Title" command to monitor screen activity until the specified window appeared. The second line uses HeardWord to number the links on the new page. Numbering the links saves a step by anticipating the next action.
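
Waiting for a window title instead of guessing a delay amounts to polling with a timeout. The Python sketch below illustrates that general technique; it does not describe how Macro Express implements "Wait for Window Title." The function name, the `get_title` callable (any function that returns the current window title), and the default timings are assumptions.

```python
import time


def wait_for_title(get_title, expected, timeout=60.0, poll_interval=0.25):
    """Poll get_title() until it returns expected.

    Returns True as soon as the title matches, or False once the timeout
    (in seconds) has elapsed without a match.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_title() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

The macro proceeds as soon as the page appears, whether that takes a fraction of a second or most of a minute, and the timeout keeps it from waiting indefinitely if the page never loads.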


Using NaturallySpeaking to trigger Macro Express scripts meant that we wrote two macros for almost every voice command. Yet this did not double the work; in fact, it saved time. We spent approximately 50% of project time discovering and refining the scripting techniques. Once we had worked out the techniques, the actual scripting of 250 commands was straightforward.

The use of two complementary macro creation tools allowed us to capitalize on the strengths of both systems. For example:

  1. When sending keystrokes, Macro Express scripts execute faster than NaturallySpeaking commands.
  2. Macro Express scripts recognize Microsoft Internet Explorer window titles without the need for special programming code.
  3. Separating voice commands from the scripts that drive them streamlines testing and debugging.
  4. The scripting of Macro Express commands can be automated using Macro Express. Macro Express can also be used to automate the scripting of NaturallySpeaking commands in the MyCommands Editor.
  5. For non-programmers, the Macro Express Scripting Editor is easier to use than the NaturallySpeaking MyCommands Editor.



NaturallySpeaking scripting resources

Keyboard access to Windows

Macro resources

Macro software


Special thanks to Larry Allen and Jane Berliss-Vincent for their incisive comments on this paper.

Appendix A: Code Samples for Determining Window Titles

Macro Express

// Macro: "Go to Surname Field"
// Get window title text, and place the value in variable T1...
Variable Set String %T1% From Window Title
// Decide what to do with T1...
Switch (T1)
	Case: Client Information - Microsoft Internet Explorer
		// Move to surname field on Client Information page
	End Case
	Case: Mail Merge - Microsoft Internet Explorer
		// Move to surname field on Mail Merge page
	End Case
End Switch

NaturallySpeaking

' Declare three Windows functions
Declare Function GetForegroundWindow& Lib "user32" ()
Declare Function GetWindowTextLengthA& Lib "user32" (ByVal hwnd&)
Declare Sub GetWindowTextA Lib "user32" (ByVal hwnd&, ByVal lpsz$, ByVal cbMax&)
' Use functions to obtain a string with the title of the current window. 
' The string is returned as the value of the ActiveWindowTitle function.
Function ActiveWindowTitle$()
	ActiveWindow = GetForegroundWindow()
	TitleLen = GetWindowTextLengthA(ActiveWindow)
	Title$ = Space$(TitleLen)
	GetWindowTextA ActiveWindow,Title$,TitleLen+1
	ActiveWindowTitle$ = Title$
End Function
Sub Main ()
	Select Case ActiveWindowTitle$ 
		Case "Client Information - Microsoft Internet Explorer"
			' Move to surname field on Client Information page
		Case "Mail Merge - Microsoft Internet Explorer"
			' Move to surname field on Mail Merge page
	End Select
End Sub

Appendix B: Code Samples for Activating a Hypertext Link

Macro Express

// Macro: "Do Purge"
// Activation: Ctrl + Alt + F1
// In this application, Alt + 1 moves focus to the first form control on any page...
Text Type "Alt + 1"
// Press Tab 22 times to reach "Purge Record" link...
Repeat Start (22 times)
	Text Type "TAB" 
Repeat End
// Activate the link...
Text Type "Enter"
// Wait for "Confirm Purge" page to appear
Wait for Window Title: Confirm Purge - Microsoft Internet Explorer

NaturallySpeaking: Hotkey method

Sub Main
' Activate by saying "Purge Document"
	' Press Ctrl + Alt + F1 to activate Macro Express "Do Purge"
	SendKeys "^%{F1}"
	' Show all links on "Confirm Purge" page...
	HeardWord "link"
End Sub

NaturallySpeaking: Shell command method

Sub Main
' Activate by saying "Purge Document"
	' Use shell command to activate Macro Express "Do Purge"
	' Note: Use /A switch before macro name
	ShellExecute "c:\Macro Express\meproc.exe /ADo Purge"
	' Show all links on "Confirm Purge" page...
	HeardWord "link"
End Sub