Impressions of NaturallySpeaking Version 11.x
Copyright © Alan Cantor 2011, 2012. All rights reserved.
About this Review
Nuance released a substantially revised NaturallySpeaking in August 2010. Since then, many of my clients have asked my opinion of it. The question foremost on their minds: "Should I upgrade from Version 10.1 to Version 11?"
To help answer the question, I prepared the following summary. The first draft, which I emailed to a group of my clients in January 2011, was based on testing the product for three months; helping clients with NaturallySpeaking 11; and monitoring online NaturallySpeaking discussions. I expanded the review after receiving excellent suggestions from Jane Berliss-Vincent, Ray Grott, and contributors to the Knowbrainer Speech Recognition Forums.
This review is not comprehensive. I do not describe every new feature, command, enhancement, and bug. For a full product description, the Nuance website is a good place to start.
I will be updating this document throughout Version 11's life-cycle. Feel free to send me comments and suggestions.
Summary of Changes in Version 11.x
Edition changes
Nuance offers five editions of Version 11 — "Home," "Premium," "Professional," "Legal" and "Medical" — each with a different set of features. The "Preferred" Edition is no more; it has been re-branded as the "Premium" Edition.
In January 2011, Nuance announced the release of the UK version of the "Medical" Edition. About six months later, Nuance released Dragon Medical Practice Edition internationally.
In June 2011, Nuance announced the release of Version 11.5. Changes between Version 11 and 11.5 include:
- Support for Internet Explorer 9
- The "Spelling Window" is resizable
- The layout of the "Dragon Sidebar" has been changed, and there are sample commands for more applications
- A free app lets an iPad, iPhone, or iPod Touch work as a wireless (Wi-Fi) microphone for NaturallySpeaking ("Premium" Edition and higher only)
More software supported
Version 11.x supports more programs than Version 10.1, including Microsoft Office 2010 applications, and OpenOffice Writer. Compatibility with OpenOffice Writer is not 100%, but includes, according to the Nuance website, "dictation, correction, selection, and playback." Note that "formatting" is not listed.
The "Home" and "Premium" Editions now fully support Microsoft Outlook. In the past, Outlook support was available only with the "Professional," "Legal," and "Medical" Editions.
Nuance claims support for Mozilla Firefox, but I notice no improvements over Version 10.1. Browsing by voice is much easier with Internet Explorer.
Accuracy improvements
Version 11.x is noticeably more accurate than Version 10.1. Expect to correct misrecognition errors about half as often as before.
Initial training
When creating a new user profile, there are three documented "Training" options, plus a fourth that is undocumented. The documented options are:
- Show text with prompting
- Show text without prompting
- Skip training
"Show text with prompting" is for users with standard voices. "Show text without prompting" is for those with non-standard accents, or who have reading difficulties.
"Skip training" corresponds to the "None" (no training) option in Version 10.1, which worked surprisingly well. I regularly created accurate profiles in about five minutes. In Version 11, I find this option works less well; I achieve better accuracy after a short training session. Now, I spend about ten minutes creating a profile instead of five. But given the accuracy improvements in Version 11, this is not a deal-breaker.
I have read reports of people who get good accuracy when they skip training. My suggestion: Try it. But if accuracy is off, do five minutes of "General Training" later.
There is a new, undocumented training option in Version 11. Choose "Show text without prompting," select a reading from the list — it does not matter which — and then ignore it! Start training, but read any text until the counter reaches four minutes. I have found this method yields excellent accuracy.
For people with standard voices, Version 11 appears to need four or five minutes of data to build an accurate profile, regardless of audio source. For example, Nuance claims it now takes four minutes, instead of 15 minutes, to create a profile for a digital recorder.
User Interface changes
The user interface has been redesigned. A new contextual help system, the "Dragon Sidebar," automatically displays commands and tips as you switch between windows. Experienced users will likely choose to hide the Sidebar, but novices may find it helpful.
The new "Results Display" is a streamlined version of the old "Results Box." Instead of showing the results of NaturallySpeaking's ongoing analysis, the "Results Display" provides more subtle feedback: for example, a rotating shape indicates NaturallySpeaking is processing speech. The simpler display is meant to be less distracting, and encourage users to dictate in longer phrases, which improves accuracy. Some people prefer the "Results Display," others the traditional "Results Box." It does not matter. You can choose the one you want, or hide both.
The new User Interface may create barriers to people with low vision and/or learning disabilities. For example, the "Spelling Window" (which replaces the "Spell" dialog box) is not resizable in Version 11.0 (fixed in 11.5); its fonts are hard-coded (so they cannot be changed or enlarged); and the poor contrast between the green typeface and the white background makes text harder to read. For everybody else, the "Spelling Window" takes getting used to, but presents no particular problems.
New commands
Noteworthy commands were introduced in Version 11. To display a list of all open windows, say list all windows. To display a list of application-specific windows, say (for example) list windows for Microsoft Word or list windows for Firefox. In response, NaturallySpeaking displays a numbered list of windows. You can switch to any window on the list without knowing its exact name.
Commands introduced in Version 11.5 include:
- Social networking commands such as Post that to Facebook and Tweet <text>. Here is a description of how to disable these commands.
- "Wrap" quotation marks around selected text (or the last utterance) by saying quote that; and insert a pair of quotation marks, with the cursor between them, by saying empty quotes. Similar commands include bracket that and parentheses that; and empty brackets and empty parentheses
- Undo all reverses the effect of the "choose all" command
New behaviour for editing commands
Editing by voice is more precise. When you say commands like delete <text>, capitalize <text>, and bold <text>, NaturallySpeaking overlays a small number next to each instance of the word or phrase. Pick the one you want by saying its number; or choose them all by saying choose all.
In other words, when editing by voice, you may need to say two commands instead of one. But you are more likely to get the result you want. Some users like this new behaviour, others do not. There is no option to turn it off.
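To make the two-step behaviour concrete, here is a minimal Python sketch of the idea: find every instance of the spoken word or phrase and give each one a number that the user can then pick. It is an illustration only, not how NaturallySpeaking is implemented; the function name `number_matches` and the whole-word, case-insensitive matching rule are my assumptions.

```python
import re

def number_matches(text, phrase):
    """Return (number, start, end) for each whole-word, case-insensitive match.

    Hypothetical helper: it only models the numbered overlay described above.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [(n, m.start(), m.end())
            for n, m in enumerate(pattern.finditer(text), start=1)]

# Two instances of "the" are found, so the user would be asked to say
# "choose 1", "choose 2", or "choose all".
matches = number_matches("The cat sat. The dog ran.", "the")
```

With only one match there is nothing to disambiguate, which is the case where a single command would have been enough.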
Microphone performance
Certain microphones will perform better. Version 11.x samples a different frequency range than before. This change will not improve accuracy for every microphone, but will for some.
Notwithstanding this change, I continue to recommend that "serious" users get top-of-the-line microphones. A quality USB microphone makes NaturallySpeaking more responsive, more accurate, and less error-prone. Improved productivity will ensure quick recovery of any additional cost — in some cases, cost recovery will happen in one or two days.
New hardware and software requirements
Be skeptical of Nuance's published hardware requirements for NaturallySpeaking. To take full advantage of Version 11, you may need an up-to-date computer. For a 64-bit Windows 7 PC, an i7 CPU with 8 GB RAM might not be excessive.
NaturallySpeaking 11.x runs on older PCs. In fact, some people report excellent performance on Core 2 Duo CPUs. As with Version 10, Version 11 adjusts itself to match the system, but it may select overly optimistic program settings. To get acceptable performance, you may need to manually change the default program settings. For example, when creating a new user profile on a Core 2 Duo 2.0 GHz PC, you might need to select the "BestMatch III" option instead of the more resource-intensive "BestMatch IV" option.
Vocabulary editor changes
When adding words or phrases via the "Vocabulary Editor," the "Spoken form" can no longer contain punctuation marks or symbols. For example, if you add power/knowledge to the vocabulary, and want to pronounce the slash, the Version 10 "Spoken form" could be "power / knowledge". In Version 11.x, you must change the symbol to a word: "power slash knowledge".
When importing a file containing custom words and phrases, check the list before importing it into Version 11.x. Edit any "Spoken forms" that contain symbols or punctuation marks:
Written form | Spoken form: Version 10.1 | Spoken form: Version 11.x |
---|---|---|
Ti & Lion Inc. | Tie & Lion Inc. | Tie and Lion Ink |
Midge + Bros | Midge + Brothers | Midge plus Brothers |
Cathy | Cathy with a C. | Cathy with a C (or "with a See") |
colour | color with a U. | color with a U (or "with a You") |
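If you have a long word list to check, a short script can do the first pass. The Python sketch below assumes each line of the list has the form written\spoken (a backslash between the two forms); the symbol-to-word mapping and the function names are hypothetical. It spells out symbols and strips the remaining punctuation, including hyphens, from the spoken form only. It cannot make pronunciation choices such as changing "Inc" to "Ink", so review the output before importing.

```python
import re

# Hypothetical symbol-to-word mapping; extend it to suit your own vocabulary.
SYMBOL_WORDS = {"&": "and", "+": "plus", "/": "slash", "@": "at", "%": "percent"}

def clean_spoken_form(spoken):
    """Spell out known symbols, then remove any remaining punctuation."""
    for symbol, word in SYMBOL_WORDS.items():
        spoken = spoken.replace(symbol, f" {word} ")
    # Replace anything that is not a letter, digit, underscore,
    # apostrophe, or whitespace with a space.
    spoken = re.sub(r"[^\w\s']", " ", spoken)
    # Collapse the extra spaces created by the substitutions above.
    return " ".join(spoken.split())

def convert_entry(line, sep="\\"):
    """Clean the spoken form of one 'written<sep>spoken' entry.

    Lines without a separator have no spoken form and pass through unchanged.
    """
    written, found, spoken = line.partition(sep)
    if not found:
        return line
    return f"{written}{sep}{clean_spoken_form(spoken)}"

# The written form is left alone; only the spoken form changes.
print(convert_entry("Midge + Bros\\Midge + Brothers"))  # Midge + Bros\Midge plus Brothers
```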
Custom commands
Commands scripted for Versions 9 and 10 should work in Version 11.x, with a few exceptions:
- Under Windows 7, Advanced Scripting SendKeys commands that insert large blocks of text may need to be divided into a series of shorter SendKeys commands, or rewritten as SendDragonKeys or SendSystemKeys commands.
- The spellings of in-line vocabulary commands (e.g., new line, cap, and no caps) are now lowercase. Custom commands that use HeardWord to activate in-line commands will need to be revised. (In-line vocabulary commands do not require a pause before or after.)
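Dividing a large block of text into shorter SendKeys commands is tedious by hand. The Python sketch below splits the text into pieces, preferring to break after a space, and formats each piece as a SendKeys statement. Several things here are my assumptions: the 200-character limit is arbitrary (I know of no documented threshold), the statement form `SendKeys "..."` with doubled quotation marks follows ordinary Basic conventions, and the text is assumed to be a single line. Characters that have special meaning to SendKeys are not escaped.

```python
def split_for_sendkeys(text, max_len=200):
    """Split text into pieces of at most max_len characters.

    Breaks just after the last space within range, so that joining the
    pieces reproduces the original text exactly.
    """
    chunks = []
    while len(text) > max_len:
        cut = text.rfind(" ", 0, max_len) + 1  # position just after the space
        if cut == 0:                           # no space in range: hard split
            cut = max_len
        chunks.append(text[:cut])
        text = text[cut:]
    if text:
        chunks.append(text)
    return chunks

def to_sendkeys_lines(chunks):
    """Format each piece as a SendKeys statement, doubling embedded quotes."""
    return ['SendKeys "' + c.replace('"', '""') + '"' for c in chunks]

# Hypothetical boilerplate paragraph to be typed by a custom command.
boilerplate = "Thank you for your enquiry. " * 20
for line in to_sendkeys_lines(split_for_sendkeys(boilerplate)):
    print(line)
```

Paste the printed statements into the command in order. If the text still arrives garbled, try a smaller limit, or switch to SendDragonKeys or SendSystemKeys as noted above.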
Known or suspected bugs
Several problems were reported in Version 11.x:
- Slowdowns. Some users report slowdowns that last 20 seconds, a minute, or longer. Even users who run Version 11 on up-to-date PCs are not immune. The reasons for the slowdowns are unclear. The remedy may be as simple as running a utility to remove "dead links" from the Start Menu, emptying certain folders, and rebooting the computer once a day. Not everybody experiences major slowdowns. (I used to, but don't anymore.)
- Moving the cursor 20 units. Cursor movement commands such as move up twenty and move right twenty move the cursor two units instead of twenty. These commands still work as expected for 1 to 19.
- Cursor jumps. The cursor may jump unexpectedly when using NaturallySpeaking plus the keyboard and/or the mouse.
- Extra or replacement characters. A number of users have reported that NaturallySpeaking occasionally inserts an extra character, or replaces a character somewhere in the viewport.
- Spontaneous wake-ups. When NaturallySpeaking is asleep, environmental sounds may cause Version 11 to "wake up" more readily than in the past.
- Move messages in Outlook. A newly-introduced command, move this to the [folder-name] folder, may, under certain circumstances, permanently delete the message. I experienced the problem myself, and reported it to Nuance. Nuance responded:
"[A]fter numerous attempts, we were never able to reproduce the scenario... That's the nature of software running on different environments. We did find some other glitches — which we will address in future releases of Dragon."
Verdict: Should You Upgrade?
If you are successfully using Version 10.1, the answer is a definite maybe. For me, the improved accuracy has made the upgrade worthwhile, despite the growing pains. I much prefer the new "Results Display" over the "Results Box" (which I have always found obtrusive). Initially, I was skeptical about the changes in the behaviour of editing commands, but I have come to appreciate them. Overall, I am happy with the upgrade. But I have an up-to-date computer running Windows 7.
On an older PC, you may experience slow or halting performance. I have seen Version 11.0 struggle on a Windows XP Pro machine with a Core 2 Duo CPU and 3 GB RAM, even when I created a profile with conservative settings: BestMatch III instead of BestMatch IV; Medium vocabulary instead of Large vocabulary; and the Speed vs. Accuracy slider set to 50%. But I have also seen Version 11.5 work well enough on older computers set to BestMatch IV and Large vocabulary.
If you have low vision, you may find the "Spelling Window" hard to read without screen magnification software. But if you already use screen magnification software, this will not be an issue.
Resources
My company's NaturallySpeaking training and scripting services.
Other articles on speech recognition.
What's new in Version 11? (Nuance website).
Acknowledgements
Ray Grott and Jane Berliss-Vincent's thoughtful comments on this review helped me make it better. Many thanks also to all who contribute to the lively discussions on the Knowbrainer Speech Recognition Forums.