CallXML 3.0 Development Guide


Tutorial: Hello World with Choices

This lesson builds on what you accomplished in Lesson 1. If you haven't yet done that lesson, you'll need to go through it first so that you aren't left in the dust by the high-falutin' concepts that we discuss here.

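In this tutorial, we will meet the new event handler element and input attribute, start out with simple DTMF recognition, handle the resulting events, add voice recognition, and finish with semi-complex grammars.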


Step 1: New event handler elements and input attributes

In our previous tutorials, we ended up with a CallXML file that was pretty one-sided: your caller dials the application, the application plays some audio, then the application hangs up. However, we will eventually want to take caller input, and have our CallXML application respond appropriately, so we will detail the many uses of the new CallXML 3.0 'choices' attribute.
Those familiar with CallXML 2.0 are used to seeing the 'termdigits' attribute that 'choices' replaces. So why the change? CallXML 3.0 uses a new format of recognition that allows the developer to accept both DTMF AND voice input. As such, sticking with the nomenclature of 'termdigits' would be somewhat misleading. We will also look at the new <on> element, which eclipses the older <on(eventName)> handler elements, and offers a much more flexible and powerful option for event and error handling.

Step 2: Starting out with simple dtmf recognition

As you remember from our last tutorial, any CallXML 3.0 application starts with our initial XML declarations, and their corresponding closing tags where appropriate. Remember that, as we are using the 3.0 version of the markup, the only suitable 'version' setting will be "3.0" (but you already have this committed to memory, right?).


<?xml version="1.0" encoding="UTF-8"?>
<callxml version="3.0">

</callxml>


Building on concepts previously illustrated, we will now embed a <do> container element to hold our content, and we will introduce our 'choices' attribute with some numeric values defined:


<?xml version="1.0" encoding="UTF-8"?>
<callxml version="3.0">
  <do choices="1,2,3">

    <say>
      welcome to the virtual shopping mall.
      you may press the floor you wish to visit by pressing the number on your telephone keypad.
    </say>

    <say>
      you can press one, two or three.
    </say>
         
    <wait value="5s"/>

  </do>
</callxml>


As promised, we have a container-level 'choices' attribute defined with a comma-delimited list of numeric values. This list defines the DTMF (telephone keypad) inputs that are considered valid for this container. We can enter any single DTMF tone here, including both "*" and "#", but note that multi-digit combinations such as <do choices="11, 12, 13"> are not doable in this manner. Thus, a caller can press "1", "2", or "3", and the system will recognize it. While we now have an inkling as to how to set up a list of available inputs for our callers, we still need to trap the event itself, and respond appropriately, right? Let's take a look at how the new <on> event handling element fits the bill:

Step 3: Event Handling in CallXML 3.0

As mentioned, we will use the <on> element to filter our caller's input actions, and handle them as we see fit. For those of us used to seeing the <ontermdigit> handler from CallXML 2.0, the new <on> element will seem pretty familiar. Those of us who are super-familiar with CallXML 2.0 will doubtless see why this new element offers a much better method and syntax for handling our application events than the now-deprecated <ontermdigit>, <onerror>, <onexternalevent>, and similar elements. For those of you who aren't familiar with Jack, let's take an in-depth look at handling our recognition events:

<on event="choice:(choice name)">

The <on> element has a required 'event' attribute which defines what occurrence we want to trap and handle. The 'choice:' prefix in the value is followed by the specific voice input or DTMF choice that this handler will execute on. From this explanation, it seems that including some handlers for the events of '1,2,3' would be pretty intuitive for us:


<?xml version="1.0" encoding="UTF-8"?>
<callxml version="3.0">
  <do choices="1,2,3">

    <say>
      welcome to the virtual shopping mall.
      you may press the floor you wish to visit by pressing the number on your telephone keypad.
    </say>

    <say>
      you can press one, two or three.
    </say>
         
    <wait value="5s"/>

    <on event="choice:1">
      <say> going up. first floor.</say>
    </on>

    <on event="choice:2">
      <say> going up. second floor.</say>
    </on>

    <on event="choice:3">
      <say> going up. third floor.</say>
    </on>

  </do>
</callxml>
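
As a variation on the example above, and since any single DTMF tone including "*" and "#" may be listed, a menu that uses those keys might be sketched as follows. This is only a sketch: the event names "choice:*" and "choice:#" are an assumption that simply follows the 'choice:' pattern used for the digits, and the prompts are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<callxml version="3.0">
  <!-- assumption: "*" and "#" are listed exactly like the digits -->
  <do choices="1,*,#">

    <say>
      press one for the first floor, star to hear your options, or pound to leave.
    </say>

    <wait value="5s"/>

    <on event="choice:1">
      <say> going up. first floor.</say>
    </on>

    <!-- assumed event names, following the choice:(choice name) pattern -->
    <on event="choice:*">
      <say> you can press one, star or pound.</say>
    </on>

    <on event="choice:#">
      <say> thank you for visiting. goodbye.</say>
    </on>

  </do>
</callxml>
```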



Step 4: Adding Voice Recognition

CallXML 3.0 is the first incarnation of the markup that allows for both DTMF recognition and voice recognition. Coupled with the easy-to-learn syntax and the low cost, this makes CallXML one of our premier platform offerings. In this section of our humble tutorial, we will surgically remove the DTMF grammars and replace them with some simple voice commands for navigation. Of course, this also implies that we will need to change our <on> handlers so that they are looking for a voice command, and not a keypress:


<?xml version="1.0" encoding="UTF-8"?>
<callxml version="3.0">
  <do choices="housewares, bed and bath, sporting goods">

    <say>
      welcome to the virtual shopping mall.
      you may say the department store you wish to visit in order to start the elevator.
    </say>
         

    <say>
      you may choose housewares, bed and bath, or sporting goods.
    </say>

    <wait value="5s"/>

    <on event="choice:housewares">
      <say> now arriving at first floor, housewares.</say>
    </on>

    <on event="choice:bed and bath">
      <say> now arriving at second floor, bed and bath. </say>
    </on>

    <on event="choice:sporting goods">
      <say> now arriving at third floor, sporting goods. </say>
    </on>

  </do>
</callxml>


There shouldn't be anything too confusing in the above CallXML document; all we did was replace "1,2,3" with "housewares, bed and bath, sporting goods". Note especially our second voice option, in that we have defined a multi-word utterance. Our caller can say 'bed and bath', and this will be recognized. If a caller says just 'bed', then you can guess at what the results will be. If we wanted to define multiple possible utterances for the same choice, then we would need to create a grammar that looks a little more fancy (but still comparatively simple next to the SRGS grammar formats that VoiceXML uses). Since we will indeed be waxing 'fancypants' in our next Step of the tutorial, let's go whole-hog, and create an app that takes DTMF and multiple possibilities for voice utterances:


Step 5: Semi-Complex Grammars in CallXML 3.0

Once again, we will be taking our 'beginner' file from Step 3, and making some surgical edits to the 'choices' attribute of the container element, and the 'event' attribute of the <on> element. Hey, not so fast there, pal. Performing surgery without studying the course material is like... performing liver surgery. With a Chilton's guide for a '78 Buick as our only reference. Brrrr...


<do choices="ReturnValue1 (user utterance one A, user utterance one B, 1),
              ReturnValue2 (user utterance two A, user utterance two B, 2),
              ReturnValue3 (user utterance three A, user utterance three B, 3)">


As one could readily intuit, the 'ReturnValueX' value defines what will be returned to the application (and what we will insert within our <on event="choice:"> element) upon a recognized user utterance. We define our valid voice commands ("user utterance one A", and the like) immediately after the 'return value' declaration, in a comma-delimited list contained within parentheses. We can define any number of valid user utterances or return values within our 'choices' attribute, as long as we get our syntax right (watch those commas, Ladies and Gents!). Now that we have wrapped our collective heads around these concepts, let's engage in making the Word into Flesh:


<?xml version="1.0" encoding="UTF-8"?>
<callxml version="3.0">

  <do choices="1st Floor (first, housewares, 1),
                2nd Floor (second, bed and bath, 2),
                3rd Floor (third, sporting goods, 3)">

    <say>
      welcome to the virtual shopping mall.
      you may say the department store you wish to visit to start the elevator,
      or you may simply press the floor number on your telephone keypad.
    </say>
         

    <say>
      you may choose housewares on the first floor,
      bed and bath on the second floor,
      or sporting goods on the third floor.
    </say>

    <wait value="5s"/>

    <on event="choice:1st Floor">
      <say> now arriving at $session.lastchoice; </say>
    </on>

    <on event="choice:2nd Floor">
      <say> now arriving at $session.lastchoice; </say>
    </on>

    <on event="choice:3rd Floor">
      <say> now arriving at $session.lastchoice; </say>
    </on>

  </do>
</callxml>


Just like we said we would, we have defined a series of voice commands ("first", "housewares") and a DTMF grammar ("1") that share the same return value ("1st Floor"). We modeled our event handlers to key off of that return value, and even added in a session variable that holds the last recognized choice that the caller input ("$session.lastchoice;"). Now, all we need to do is upload our test script, map a CallXML 3.0 Beta number to it, and test it out!
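
Since any number of return values can be defined, the same pattern stretches to more floors. The sketch below is illustrative only: the 'Lobby' return value, its utterances, and the "0" key are hypothetical additions that are not part of the tutorial files, and the handlers assume that $session.lastchoice; holds the return value exactly as in the example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<callxml version="3.0">

  <!-- hypothetical extra entry: "Lobby" is not in the original tutorial -->
  <do choices="Lobby (lobby, ground floor, 0),
               1st Floor (first, housewares, 1)">

    <say>
      you may choose the lobby, or housewares on the first floor.
    </say>

    <wait value="5s"/>

    <!-- each handler keys off the return value, not the utterance or the key -->
    <on event="choice:Lobby">
      <say> now arriving at $session.lastchoice; </say>
    </on>

    <on event="choice:1st Floor">
      <say> now arriving at $session.lastchoice; </say>
    </on>

  </do>
</callxml>
```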

And once again, we folks at Voxeo have anticipated any MTV generation attention span deficit, and have provided not one, but *all three* versions of this tutorial that you can download if you are feeling lazy.

  CallXML 3.0 source code.





© 2008 Voxeo Corporation