
VoiceXML Tutorial: Using the VoiceXML 2.1 "marktime" Shadow Variable to Simplify Dialog Choices

This month's Voxeo tutorial covers the marktime shadow variable, a new addition in the VoiceXML 2.1 specification. So what is this shadow variable all about, anyway? Per the specification:

"When these properties are set on the application.lastresult$ object, if an input item (as defined in section 2.3 of [VXML2]) has also been filled and has its shadow variables assigned, the interpreter must also assign markname and marktime shadow variables, the values of which equal the corresponding properties of the application.lastresult$ object."

In layman's terms, marktime lets us detect exactly where a caller barged in on a prompt that lives in a voice recognition field. When the field receives valid input, its marktime shadow variable (read as fieldname$.marktime, mirroring application.lastresult$.marktime) is populated with a time value in milliseconds that shows where the caller interrupted our <prompt>. Leveraging this shadow variable, we can assign a universal grammar ("that one," or even a DTMF key) for choosing items from a list, and then use some simple arithmetic to determine which choice the caller wants.


Note: This tutorial assumes that the user is using the Prophecy VoiceXML platform. Other VXML platforms and partitions on the Voxeo network do not support this new addition to the specification.

Note: At the time of this writing, lastresult$.marktime is the only one of these new shadow variables that the platform supports. To be clear, lastresult$.markname is not presently supported but will be added in a future software release of the platform.


Step 1: Write the Main VoiceXML File

Our VoiceXML dialog is going to be pretty simple, comparatively speaking: All we really need to get started is a simple form/block/field/filled structure that most of you have worked with already:


<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xmlns:voxeo="http://community.voxeo.com/xmlns/vxml" >
<meta name="maintainer" content="your_email_address@somewhere.com"/>

  <form>

  <block>
  <voxeo:recordcall value="100" info="marktest"/>
  </block>

  <field name="LenandBarrys">

  <prompt bargein="false">
    Welcome to Len and Barry's Ice Cream Ordering System.
    You can select from the following menu your favorite ice cream of the day.
    At any time you can say "that one."
  </prompt>

  <prompt>
    <break time="1000ms"/>
    Cocoa Banana Cabana
    <break time="2500ms"/>

    Banana Caramel Crunch
    <break time="2500ms"/>

    Chocolate Devotion
    <break time="2500ms"/>

    German Chocolate
    <break time="1000ms"/>
  </prompt>

  <grammar mode="voice" root="LenBarry">
    <rule id="LenBarry" scope="public">
      <one-of>
      <item>that one</item>
      </one-of>
    </rule>
    </grammar>

      <filled>
      </filled>

    </field>

  </form>

</vxml>


Nothing too fancy in the code above. We have a very basic field containing a very basic SRGS grammar with a single choice defined. We learned all about this stuff in VoiceXML kindergarten, right? Savvy readers will notice a few things about our starting snippet. First, we have defined a <voxeo:recordcall> element in our <block>. Second, we have declared the Voxeo XML namespace on the <vxml> element. Both matter for the next step in the tutorial: to set up demarcations for our "universal choice" grammar, we need to determine where the different ice cream choices are output and how long each one takes to render to the caller. More importantly, note the pauses added after each flavor listed in our TTS. Each pause gives the caller a "window" in which to declare a choice, and lets the system determine which choice was really meant.


Step 2: Fun with Editing Utilities

With this basic code in place, we will now map our application and give it a test call. I know, it's incomplete at this moment, but we need to make an assessment as to where we can "bookmark" our choices on a visual timeline that is representative of the audio. We now place a test call to the app as it currently stands and stay totally quiet; putting the phone or mic on "mute" helps us out quite a bit. Once the default "noinput" prompt is played, we can hang up and grab the audio file that is generated from the <voxeo:recordcall> element.


If you are running Prophecy on the hosted network, then you can find your audio file uploaded to your Voxeo File Manager, within the 'root/recordings' directory. Your filename will look something like this:

123-123-123abc123abc123abc123abc12-0-marktest.wav


If you are running Prophecy locally, then the audio from the <voxeo:recordcall> element will be dumped into the "C:\Program Files\Voxeo\www\MRCP\Recordings" directory, and the filename will look something like this:

1-1-71ee1fce0f3d516ed2c3beb139ad4609-0-marktest.wav

Now the fun begins, as promised. You will next want to grab a free copy of Goldwave or Audacity (OS X), both of which are highly functional audio editing utilities. Once you have one of them downloaded and installed, grab the audio recording generated by the <voxeo:recordcall> element and drag it into the editor window. This gives us a tidy representation of where the audio is output and how much time passes between the rendering of the different ice cream choices. An annotated screenshot of what this will look like is found below for reference:



From the annotations, we can see several distinctive areas of text-to-speech output that occurred when we queried the caller, each followed by some user-defined pauses. Checking the 'timeline' at the bottom of our window, we can then denote some timing "bookmarks":




Step 3: Timing is Everything

Our diagram and grocery list of timing values above give us several "windows" that are open for input from our callers. Because we are filtering caller input based on the point in time at which it was uttered, a few simple conditional statements are enough to determine which flavor the caller chose. One thing to keep in mind is that the timer behind marktime does not start until the initial prompt has been read. Therefore, we can forget about the initial prompt, add up the pause times and the time it takes to render the first flavor, and come up with 6700ms. Following this logic, we then start putting in place our conditional logic:


  <if cond="LenandBarrys$.marktime &lt;= 6700">
    <!-- USER MUST HAVE CHOSEN "Cocoa Banana Cabana" -->
  </if>
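The running sum behind these thresholds can be sketched in a few lines of ECMAScript. This is only an illustration: the <break> lengths and the 6700, 12000, and 16000 thresholds come from the tutorial, but the renderMs values are back-derived from those thresholds (for example, 6700 - 1000 - 2500 = 3200) rather than measured, and the windowEnds helper is a hypothetical name. Substitute the render times you read off your own recording.

```javascript
// Hypothetical helper: returns the time (in ms) at which each choice's
// "window" closes, counting from the start of the barge-in-enabled prompt.
function windowEnds(leadingPauseMs, segments) {
  var ends = [];
  var elapsed = leadingPauseMs; // the <break> that precedes the first flavor
  for (var i = 0; i < segments.length; i++) {
    // A window covers the spoken flavor name plus the pause that follows it.
    elapsed += segments[i].renderMs + segments[i].pauseMs;
    ends.push(elapsed);
  }
  return ends;
}

// renderMs values are ASSUMED (back-derived from the tutorial's thresholds).
var ends = windowEnds(1000, [
  { name: 'Cocoa Banana Cabana',   renderMs: 3200, pauseMs: 2500 },
  { name: 'Banana Caramel Crunch', renderMs: 2800, pauseMs: 2500 },
  { name: 'Chocolate Devotion',    renderMs: 1500, pauseMs: 2500 }
]);
```

German Chocolate is left out on purpose, since the last choice is handled by the catch-all branch and needs no upper bound.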



For the rest of our choices, we base our logic on the starting point of the choice in question and the endpoint of the <break> tag immediately following it. To find the times, we just keep adding the pause and render times to the time above. Remember that comparison and logical operators need to be encoded inside the cond attribute: a raw "<" or "&" would cause a parse error. Write "<" as "&lt;" and each "&" as "&amp;" (so "&&" becomes "&amp;&amp;"); ">" is conventionally written as "&gt;" as well:



<filled>
  <!-- Log the raw value so the thresholds can be checked in the debugger -->
  <log expr="'****** LenandBarrys$.marktime: ' + LenandBarrys$.marktime + ' ********'"/>

  <if cond="LenandBarrys$.marktime &lt;= 6700">
    <prompt>You chose Cocoa Banana Cabana.</prompt>

  <elseif cond="LenandBarrys$.marktime &gt;= 6700 &amp;&amp; LenandBarrys$.marktime &lt;= 12000"/>
    <prompt>You chose Banana Caramel Crunch.</prompt>

  <elseif cond="LenandBarrys$.marktime &gt;= 12000 &amp;&amp; LenandBarrys$.marktime &lt;= 16000"/>
    <prompt>You chose Chocolate Devotion.</prompt>

  <else/>
    <prompt>You chose German Chocolate.</prompt>

  </if>
</filled>
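Hand-escaping cond expressions is easy to get wrong, so it can help to write the expression in plain ECMAScript first and escape it mechanically. The sketch below is one way to do that; escapeForXmlAttribute is a hypothetical helper, not part of VoiceXML or the Prophecy platform.

```javascript
// Hypothetical helper: makes an ECMAScript expression safe to paste into a
// double-quoted XML attribute such as cond="...".
function escapeForXmlAttribute(expr) {
  return expr
    .replace(/&/g, '&amp;')   // must run first, or it would re-escape the entities below
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

var raw = 'LenandBarrys$.marktime >= 6700 && LenandBarrys$.marktime <= 12000';
var escaped = escapeForXmlAttribute(raw);
// escaped is now:
// LenandBarrys$.marktime &gt;= 6700 &amp;&amp; LenandBarrys$.marktime &lt;= 12000
```

The ampersand replacement has to come first; otherwise the "&" in "&lt;" and "&gt;" would itself be escaped a second time.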



You'll note that we also added the inevitable <log> statement, as it is going to give us some external validation on our math skills and show the actual marktime value for us in the Voxeo debugger. And we always keep the debugger open when making test calls, right? Right?
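When checking logged marktime values against the thresholds, it can be handy to replay the same decision outside the call. The following is a minimal sketch of the if/elseif chain as a table-driven lookup in plain ECMAScript; the WINDOWS table, FALLBACK, and flavorFor are hypothetical names, and the thresholds are simply the tutorial's own 6700, 12000, and 16000.

```javascript
// Upper bound (in ms) of each choice's window, in the order the flavors are read.
var WINDOWS = [
  { upToMs: 6700,  flavor: 'Cocoa Banana Cabana' },
  { upToMs: 12000, flavor: 'Banana Caramel Crunch' },
  { upToMs: 16000, flavor: 'Chocolate Devotion' }
];
var FALLBACK = 'German Chocolate'; // mirrors the <else/> branch

// Returns the flavor whose window contains the given marktime.
// The first matching window wins, just as in an if/elseif chain.
function flavorFor(marktime) {
  for (var i = 0; i < WINDOWS.length; i++) {
    if (marktime <= WINDOWS[i].upToMs) {
      return WINDOWS[i].flavor;
    }
  }
  return FALLBACK;
}
```

Because each window is only tested after the earlier ones have failed, a single upper bound per row is enough, and retuning the timings means editing one table rather than several conditions.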


Step 4: Code Complete!

Ahh yes, "code complete." This is a phrase that is, to any developer, a joyous thing to hear. All that remains now is to add some basic event handling (for our OOGNI events, that is, nomatch and noinput) and we are in a state of development Nirvana. Picture Siddhartha himself with a day off work and you'll get the idea. Let's bask in the enlightenment of our fully developed, code-complete application:


<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xmlns:voxeo="http://community.voxeo.com/xmlns/vxml">

<meta name="maintainer" content="your_email_address@somewhere.com"/>

  <form>

  <block>
  <voxeo:recordcall value="100" info="marktest"/>
  </block>

  <field name="LenandBarrys">

  <prompt bargein="false">
    Welcome to Len and Barry's Ice Cream ordering system.
    You can select from the following menu your favorite ice cream of the day.
    At any time you can say "that one."
  </prompt>


  <prompt>
    <break time="1000ms"/>

    Cocoa Banana Cabana
    <break time="2500ms"/>

    Banana Caramel Crunch
    <break time="2500ms"/>

    Chocolate Devotion
    <break time="2500ms"/>

    German Chocolate
    <break time="1000ms"/>
  </prompt>


  <grammar mode="voice" root="LenBarry">
    <rule id="LenBarry" scope="public">
      <one-of>
      <item>that one</item>
      </one-of>
    </rule>
    </grammar>


    <catch event="noinput nomatch">
      <log expr="'***** OOGNI event caught: ' + _event"/>
      <prompt>
        I'm sorry, but I didn't quite catch that.
        Let's try again.
      </prompt>
      <reprompt/>
    </catch>

    <filled>
      <log expr="'***** LenandBarrys$.marktime: ' + LenandBarrys$.marktime + ' ********'"/>

      <if cond="LenandBarrys$.marktime &lt;=6700" >
        <prompt>
          You chose Cocoa Banana Cabana.
        </prompt>

      <elseif cond="LenandBarrys$.marktime &gt;=6700 &amp;&amp; LenandBarrys$.marktime &lt;=12000"/>
        <prompt>
        You chose Banana Caramel Crunch.
        </prompt>

      <elseif cond="LenandBarrys$.marktime &gt;=12000 &amp;&amp; LenandBarrys$.marktime &lt;=16000"/>
        <prompt>
        You chose Chocolate Devotion.
        </prompt>

        <else/>
          <prompt>
            You chose German Chocolate.
          </prompt>

        </if>
      </filled>

    </field>

  </form>

</vxml>



Step 5: Dial In and Test Out Your Code

All that remains now is to test your code. If you are using the Voxeo hosted platform, simply use the File Manager to upload your code onto our free hosting environment, or you can host it on your own webserver if you like. If you are using a local instance of the Prophecy platform, then you can place the files in your 'www' subdirectory and change your call routing to point to this file.

If you have any difficulties with your hand-typed code from this tutorial, you can always grab the zip file below to use as a comparison.


Download the Code!

  Prophecy source code


  ANNOTATIONS: EXISTING POSTS
ebaker
2/14/2012 12:19 PM (EST)
It seems the milliseconds that the marktimes are compared to are off in the downloadable source code (and less so in the above code).  Examining the recording using Audacity gives me time demarcations for the beginning of each next choice of about 6000ms, 10500ms and 14500ms.  Even if you were to allow "that one" to select the last choice halfway into the prompting of the next one, it would still only be about 6700ms, 11000ms and 15000ms.

Also, there is a simplification for the <if> block in that the "greater than" portions of the <elseif> "cond=" are unnecessary as the preceding conditions have accounted for them.  So, the new <if> block would be (including the updated timings):

        <if cond="LenandBarrys$.marktime &lt;=6000" >
          <prompt>
            You chose Cocoa Banana Cabana.
          </prompt>
        <elseif cond="LenandBarrys$.marktime &lt;=10500"/>
          <prompt>
            You chose Banana Caramel Crunch.
          </prompt>
        <elseif cond="LenandBarrys$.marktime &lt;=14500"/>
          <prompt>
          You chose Chocolate Devotion.
          </prompt>
        <else/>
          <prompt>
            You chose German Chocolate.
          </prompt>
        </if>
MattHenry
2/14/2012 6:29 PM (EST)


Hiya Eric,

Thanks for the heads-up on this. This tutorial was written a few years ago, as you may have noticed, and it really could stand to be updated to be more in line with the new Prophecy software versions, which have slightly different endpointer settings. I'll make sure that our doc writing team puts this on deck in the interest of providing clear, accurate sample code for our developers.

Cheers,

~Matthew Henry


© 2012 Voxeo Corporation  |  Voxeo IVR  |  VoiceXML & CCXML IVR Developer Site