Voice Commands in Windows Phone 8 - Updating Phrase Lists at Run Time

by Ishai Hachlili 29. October 2012 12:45

In the previous post, I created a simple voice commands file and registered it with the WP OS.
My command was “Find {huntTypes} Hunts”.
huntTypes is a phrase list that was hardcoded in the VCD file with the following values: “New”, “Nearby” and “My”.

This means the user can say “Play The Hunt, Find New Hunts”.
I don’t really want to maintain this phrase list in the VCD file. Instead, I want to call a web service, get the list of types that my backend search can handle, and update the phrase list with it.

It’s very simple to update the phrase list: get the command set that includes the phrase list you want to update and call UpdatePhraseListAsync.

private async void AddToPhraseList()
{
    // "USEnglish" is the Name of the CommandSet in the VCD file.
    var vcs = VoiceCommandService.InstalledCommandSets["USEnglish"];
    await vcs.UpdatePhraseListAsync("huntTypes", new[] {"New", "Nearby", "My", "San Francisco", "New York"});
}

Now when the user says “Play The Hunt, Find San Francisco Hunts”, the query string will include the added phrase list item:

[0]: {[voiceCommandName, findHunts]}
[1]: {[reco, Play The Hunt Find San Francisco Hunts]}
[2]: {[huntTypes, San Francisco]}

Of course, in the real app I would get the list of phrases from some other source, instead of hardcoding it like I did in this sample.
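As a rough sketch of that real-app flow, the helper below takes a delegate that returns the phrases (from a web service or anywhere else) and pushes them into the huntTypes list. The class name HuntTypePhrases, the method RefreshAsync and the fetchHuntTypes delegate are made up for illustration; only VoiceCommandService, InstalledCommandSets and UpdatePhraseListAsync come from the SDK. As far as I know the update replaces the whole list rather than appending to it, which is why the hardcoded sample passes the original three values again.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Windows.Phone.Speech.VoiceCommands;

public static class HuntTypePhrases
{
    // fetchHuntTypes is a hypothetical delegate standing in for the web service call.
    public static async Task RefreshAsync(Func<Task<IEnumerable<string>>> fetchHuntTypes)
    {
        // Nothing to update if the command set was never registered.
        VoiceCommandSet commandSet;
        if (!VoiceCommandService.InstalledCommandSets.TryGetValue("USEnglish", out commandSet))
        {
            return;
        }

        // Drop empty entries and case-insensitive duplicates before updating.
        var phrases = (await fetchHuntTypes())
            .Where(p => !string.IsNullOrWhiteSpace(p))
            .Select(p => p.Trim())
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();

        // Keep the existing list if the source returned nothing usable.
        if (phrases.Count == 0)
        {
            return;
        }

        await commandSet.UpdatePhraseListAsync("huntTypes", phrases);
    }
}
```

Passing the fetch as a delegate keeps the sketch independent of whichever HTTP client the app uses.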


Speech Recognition | Windows Phone 8

Integrating your app with the Windows Phone 8 Voice Commands

by Ishai Hachlili 29. October 2012 12:19

One of the new APIs in Windows Phone 8 is the Voice Commands API. This API allows you to integrate your own app with the main voice command functionality, so when the phone’s Start button is held and the voice prompt comes up, your app can be launched from it.

How it works

To add support for voice commands in your app you need to add a Voice Command Definition file to your project and register that file with the OS the first time your app launches. Once you do that, your app can be launched with the commands you defined.

When a command is launched, the target page associated with that command will be opened directly (just like the search extensions work in WP7).

To use this feature you need to add the following capabilities to your app (in the WMAppManifest.xml file): ID_CAP_SPEECH_RECOGNITION, ID_CAP_MICROPHONE and ID_CAP_NETWORKING.
(As with most speech recognition solutions, the actual processing is done on the server side: the phone just streams the audio to the server and gets the text result back. That’s why networking must be enabled for the app.)

The Voice Command Definition File

The Voice Command Definition file, or VCD file, defines the commands your app will support.
You can add a new VCD file from the Add New Item menu.

Here’s the VCD file I’m using in this sample:

<?xml version="1.0" encoding="utf-8"?>
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">
  <CommandSet xml:lang="en-us" Name="USEnglish">
    <CommandPrefix> Play The Hunt </CommandPrefix>
    <Example> Find Nearby Hunts </Example>

    <Command Name="findHunts">
      <Example> Find New Hunts </Example>
      <ListenFor> Find {huntTypes} [Hunts]</ListenFor>
      <Feedback> Searching For Hunts </Feedback>
      <Navigate Target="/findhunts.xaml"/>
    </Command>

    <PhraseList Label="huntTypes">
      <Item> New </Item>
      <Item> Nearby </Item>
      <Item> My </Item>
    </PhraseList>
  </CommandSet>
</VoiceCommands>

In this VCD I’m defining a single command set for English. You can define additional command sets for other languages as well.
Let’s dig into this simple definition.

Each command set can have a prefix. When a user holds the Start button and wants to launch a command in your app, they will start the command with your prefix.
The prefix for the command set is optional. If it’s omitted your app name will be used as the prefix.
You might want to use a prefix if your app name is too long or hard to pronounce. You can also use it if you have multiple command sets for different languages.

Tip: If your app name is hard to pronounce, try using a prefix that phonetically corresponds to the app’s name.

The example text will show up in that initial dialog (not always; from my tests it seems to show up only after the user has already used your voice commands).

[Screenshot: the voice command prompt]


The next part of the XML file is a Command. You can have multiple commands; in this sample I only have one, called findHunts.

The example text here will be shown when the command wasn’t found (the user has to tap the Tap Here link to see it).

[Screenshots: voice command not found; supported commands for the app]

Now we get to the most important parts of the command, the ListenFor and Navigate fields.

The ListenFor is the text you’re expecting for the command.

As you can see I’m using brackets to signify special meaning. The square brackets mean these words are optional (so in my example the user can say ‘Find New Hunts’ or just ‘Find New’).

The curly brackets mark a phrase list. In this case the list is included in the XML file. You can also modify these lists at run time, so if, for example, you want to use a list of products, you can get them from a web service and add them to the phrase list.

The Feedback is the text that will be displayed and spoken (using TTS) when this command is recognized.

[Screenshot: the voice command feedback]

The Navigate field defines the page you want opened with the recognized parameters.

Register the VCD file

You’re supposed to register the VCD file once, on your app’s first launch.
Here’s a piece of code to do that:

private async void RegisterVoiceCommands()
{
    // Install the command sets defined in the VCD file that ships with the app.
    await VoiceCommandService.InstallCommandSetsFromFileAsync(
        new Uri("ms-appx:///VoiceCommands.xml", UriKind.RelativeOrAbsolute));
}

You can also check the InstalledCommandSets to see if your commands were already registered.
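A minimal sketch of that check, assuming the command set is named USEnglish as in the VCD file above. The class name VoiceCommandSetup and the method EnsureRegisteredAsync are illustrative, not part of the SDK.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Phone.Speech.VoiceCommands;

public static class VoiceCommandSetup
{
    // Registers the VCD file only when the "USEnglish" command set is not installed yet.
    public static async Task EnsureRegisteredAsync()
    {
        if (VoiceCommandService.InstalledCommandSets.ContainsKey("USEnglish"))
        {
            return; // already registered on an earlier launch
        }

        await VoiceCommandService.InstallCommandSetsFromFileAsync(
            new Uri("ms-appx:///VoiceCommands.xml", UriKind.RelativeOrAbsolute));
    }
}
```

One trade-off to keep in mind: if a later version of the app ships a changed VCD file, skipping registration like this would leave the old commands in place, so you may prefer to re-register after an update.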

Handle the command in the target page

When a voice command for your app is recognized, the target page will be launched and the URI for that page will include the recognized parameters.

The sample from Microsoft uses OnNavigatedTo to extract the query parameters from the NavigationContext.
If the page was opened from a voice command, a parameter named voiceCommandName will be in the query string. Testing for its existence is a good way to check if you got voice recognition results.

Here are the query string parameters you get when the user says “Play The Hunt, Find New Hunts”:

[0]: {[voiceCommandName, findHunts]}
[1]: {[reco, open Play The Hunt Find New Hunts]}
[2]: {[huntTypes, New]}


If you have more than one command with the same target page, you can check voiceCommandName to figure out which command the user launched.
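Here is a sketch of what that handling could look like in the target page’s code-behind. The page class name FindHunts and the SearchHunts helper are placeholders I made up; the query string keys are the ones shown above.

```csharp
using System.Windows.Navigation;
using Microsoft.Phone.Controls;

// Code-behind for the page named in the Navigate element (class name is assumed).
public partial class FindHunts : PhoneApplicationPage
{
    protected override void OnNavigatedTo(NavigationEventArgs e)
    {
        base.OnNavigatedTo(e);

        // Only react to a fresh launch, not to the user navigating back to this page.
        if (e.NavigationMode != NavigationMode.New)
        {
            return;
        }

        // voiceCommandName is present only when the page was opened by a voice command.
        string commandName;
        if (!NavigationContext.QueryString.TryGetValue("voiceCommandName", out commandName))
        {
            return;
        }

        if (commandName == "findHunts")
        {
            // The key matches the PhraseList Label in the VCD file.
            string huntType;
            NavigationContext.QueryString.TryGetValue("huntTypes", out huntType);
            SearchHunts(huntType);
        }
    }

    // Hypothetical helper: run the search for the recognized hunt type.
    private void SearchHunts(string huntType)
    {
    }
}
```

With more commands pointing at the same page, the single if on commandName would grow into a switch.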

The reco parameter contains the whole recognized text, but it can only contain text that matched an expected command. You can’t just get the free-form text spoken by the user.
This is a big limitation. It means you have to launch the app with a command first and then use TTS and Speech Recognition to carry on the conversation with your user. (I would prefer being able to say “AppName, Remind me to call Someone tomorrow at 10 AM”. Instead it will have to be “AppName, Add Call Reminder” or “Remind Me To Call”, and the app will then have to ask for the name and time.)
This is the same way hands-free texting works in WP7, which works great but feels too verbose sometimes.

The huntTypes parameter contains the recognized word from my phrase list. I haven’t tried multiple phrase lists in one command; if that works, you’ll probably get more than one parameter here. This means you don’t have to check the reco string at all. You can just use the phrase list to figure out what you want to do.

At this point, you can execute whatever code you need based on this command.

I’m using Caliburn Micro for my projects, so I won’t be overriding OnNavigatedTo in real projects. Instead, I’ll just add these three query string parameters as properties to my view model and Caliburn Micro will automagically parse the query string.
If you’re not using Caliburn Micro, you’re writing more code than you have to…
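For illustration, a view model along those lines might look like the sketch below. The class name FindHuntsViewModel and the SearchHunts helper are made up, and I’m assuming Caliburn Micro matches query string keys to property names without regard to case and fills them in before the screen is activated, so treat it as a starting point rather than a verified recipe.

```csharp
using Caliburn.Micro;

public class FindHuntsViewModel : Screen
{
    // Expected to be filled from the voiceCommandName, reco and huntTypes
    // query string parameters (assumption: matching is case-insensitive).
    public string VoiceCommandName { get; set; }
    public string Reco { get; set; }
    public string HuntTypes { get; set; }

    protected override void OnActivate()
    {
        base.OnActivate();

        // VoiceCommandName stays null when the page was not opened by a voice command.
        if (VoiceCommandName == "findHunts")
        {
            SearchHunts(HuntTypes);
        }
    }

    // Hypothetical helper: run the search for the recognized hunt type.
    private void SearchHunts(string huntType)
    {
    }
}
```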


Speech Recognition | Windows Phone 8

Is Microsoft learning from Apple?

by Ishai Hachlili 6. October 2012 17:11

In the last few years, Apple has passed Microsoft like a Formula 1 car passing a minivan on their way to becoming the largest tech company in the world.
There are several factors that helped them get there.

They realized the future is in smart mobile devices in consumers’ hands while everyone else was still trying to sell them only to business users (that was obvious in the UI, hardware and cost of the smartphones and tablets that existed before the iPhone came out).

They didn’t play by the mobile industry rules. They didn’t bend to appease carriers’ demands. They make more money on each iPhone sold than Android/Windows Phone manufacturers make on their phones. A lot more.

They did a great job at marketing their products. Every small feature was revolutionary, invented and patented by them, even when that was far from the truth. They said it, the local news repeated it and even tech-savvy people actually believe it. It’s hard to be surprised when a jury that heard that on TV or read it in the papers finds that Samsung copied from Apple when most of those patents had prior art.

A big part of their marketing plan is announcing the new revolutionary phone (usually with some surprises in the announcement) and having it available soon after. They also limit the availability, because it’s proven that a hot product that’s flying off the shelves will create more demand (see the Nintendo Wii, Kinect and any iDevice when they were released). Faking your way to long lines and empty shelves might feel like lying, but so does saying you have the greatest maps app when you only started working on it less than a year before that statement. It’s just what Apple does.

They also pick a couple of features and hammer them in with ads. With the iPhone 4S it was Siri (even though it wasn’t really new; it was available as an app long before Apple bought it). With the iPhone 5 it’s the size and the panorama feature (both are things that Android and WP7 did before).


So is Microsoft learning from Apple?

Well, they are trying.

They scheduled the Windows Phone 8 launch event for October 29th. The phone will be in stores the weekend after. That’s the kind of announcement-to-release window Apple does.

They have great new features in the WP8 OS, features Apple will call revolutionary whenever they get around to implementing them.

They have OEM partners, which makes the cost factor irrelevant for them; it’s not how they’re making money right now.

They’re trying to keep things under wraps with WP8, but they already announced almost everything there was to announce. They want to keep things secret so badly, they’re telling developers they don’t care about them or their apps (even though the main point against WP is actually some key apps missing, not total numbers, but key apps). Not that it helped (the top secret SDK was leaked very quickly, and didn’t really have anything new).

Maybe the fact that no local news channel picked up on any of the WP announcements (because it wasn’t Apple) means it doesn’t matter that all the new features and device details are already known because their target audience doesn’t really know about it, it’s only us tech geeks. Maybe they’ll be able to get that local news coverage with the 10/29 event. The Windows 8 launch the Friday before (on the 26th) will help too, reporters and analysts are already confused by Windows 8 and WP8. Potential buyers might come in to stores to look at the new Windows 8 tablets and that confusion could at least get them to look at Windows phones.

So Microsoft is trying, but they’re kind of messing it up. Maybe next time they’ll learn. Maybe they will also have the new OS and devices ready for a September launch to compete with the next iPhone. Because what really hurts WP8 is the lack of devices in the carrier stores when the big draw of the iPhone brings all these people in. The Lumia 920 and iPhone 5 side by side? I gotta think it will convert at least some users to WP.

So either the confusion gets people looking at WP8, or it’s a slow haul for Microsoft. One thing is for sure: with the Android licensing fees they’re getting, they can keep working on WP for a while, and if Nokia can’t stay with it and Samsung/HTC decide to stop supporting it, Microsoft can always go with a Surface Windows Phone. You know you want one…


About Me

Ishai Hachlili is a web and mobile application developer.

Currently working on Play The Hunt and The Next Line
