VoiceXML 2.1 Development Guide Home  |  Frameset Home


Tutorial: Using Audio Files

This lesson builds on what you accomplished in tutorials 1, 2, 3, and 4. If you have not completed those tutorials, you'll need to go through them first.

Step 1: Record Some Audio Files

Prerecorded audio files take more time to set up than text-to-speech, but they usually sound more professional and polished than even the best TTS engine. For that reason, recorded audio is often preferred for commercially deployed applications.

So, in the interests of quality audio output, use your favorite sound recording utility to record two files: a greeting and a menu prompt.

You can use the Windows Sound Recorder to do this. After you have recorded the files, save them in u-law format and call them "helloworld.wav" and "menu.wav". If you're using the Windows Sound Recorder, select "File", then "Save As...", press the "Change..." button, and change the attributes to "8,000 Hz, 8-Bit, Mono".






Step 2: Creating an Initial VoiceXML Structure

From our previous tutorials, we now recognize the following structure as a normal starting point:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">


</vxml>



Step 3: Creating Grammar Files

We already know how to make grammar files, so this should be a piece of cake:

<![CDATA[
[
  [cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  }
  [dog pooch puppy doggie (dog person)]    { <CatOrDog "Dog">  }
]]]>


  <link next="">
    <grammar type="text/gsl">
            [(main ?menu)]
      </grammar>
  </link>



Wait a minute, did we mean to put that question mark ("?") in the grammar? In fact, we did. In GSL, the question mark tells the recognizer that the word that follows is optional when determining matches. Remember, the parentheses tie multiple words together into a sequence, so the caller can say "main" or "main menu" and either utterance will be recognized as valid.
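To make the optional operator concrete, here is a small sketch of an inline GSL grammar. The "start over" phrase is a hypothetical addition for illustration and is not part of the tutorial application.

```xml
<!-- (main ?menu) accepts "main" or "main menu".              -->
<!-- (?please start over) is a hypothetical phrase; it would  -->
<!-- accept "start over" or "please start over".              -->
<grammar type="text/gsl">
  <![CDATA[
    [
      (main ?menu)
      (?please start over)
    ]
  ]]>
</grammar>
```

The square brackets list alternatives, the parentheses group words into a sequence, and the question mark applies only to the single word or group that immediately follows it.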


Step 4: Creating The Menu

Next we need to make our main menu. We are essentially replacing the text-to-speech portions with references to the audio files we recorded back in Step 1. We will now insert our inline grammars and add our newly recorded audio files:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >

  <link next="#MainMenu">
    <grammar type="text/gsl">
            [(main ?menu)]
      </grammar>
  </link>

  <form id="MainMenu">
    <block>
      <audio src="helloworld.wav"/>
    </block>

    <field name="CatOrDog">
      <audio src="menu.wav"/>
      <grammar type="text/gsl">
        <![CDATA[[
            [cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  }
            [dog pooch puppy doggie (dog person)]    { <CatOrDog "Dog">  }
        ]]]>
      </grammar>
    </field>
  </form>
</vxml>


Look how simple that is: one little <audio> element with a "src" attribute pointing to the audio file. Now your recorded file will be played over the telephone instead of TTS. Notice that our 'Main Menu' grammar is scoped to the document level, because its <link> element is a direct child of <vxml>, so a caller can say "main" at any time in this document and be sent back to the main menu. The grammar for the 'CatOrDog' field is scoped to that field, so it is only active while that field is collecting input. Keep in mind that explicitly setting the scope of a grammar contained in either a <link> or a <field> element is not permitted.
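The effect of where a <link> is placed can be sketched as follows. This is a minimal illustration, not the tutorial's final code: the "Pets" form id and the "start over" phrase are hypothetical, and the pet grammar is shortened.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">

  <!-- Child of <vxml>: this grammar is active in every dialog of the document. -->
  <link next="#MainMenu">
    <grammar type="text/gsl">
      [(main ?menu)]
    </grammar>
  </link>

  <form id="MainMenu">
    <block>
      <audio src="helloworld.wav"/>
      <goto next="#Pets"/>
    </block>
  </form>

  <form id="Pets">
    <!-- Child of <form>: this grammar is active only while the caller is in "Pets". -->
    <link next="#Pets">
      <grammar type="text/gsl">
        [(start over)]
      </grammar>
    </link>

    <field name="CatOrDog">
      <audio src="menu.wav"/>
      <!-- Field grammar: active only while this field is collecting input. -->
      <grammar type="text/gsl">
        <![CDATA[[
            [cat kitty]   { <CatOrDog "Cat"> }
            [dog puppy]   { <CatOrDog "Dog"> }
        ]]]>
      </grammar>
    </field>
  </form>
</vxml>
```

None of the three grammars carries a scope attribute. In each case the scope follows from the element that contains the grammar.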

Now let's fill out the rest of our menu by adding some event handlers and the conditional logic that sends the caller to the appropriate destination based on the pet preference they specify:


<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">

<property name="universals" value="help"/>

  <link next="#MainMenu">
    <grammar type="text/gsl">
            [(main ?menu)]
      </grammar>
  </link>

  <form id="MainMenu">
    <block>
      <audio src="helloworld.wav"/>
    </block>

    <field name="CatOrDog">
      <audio src="menu.wav"/>

      <grammar type="text/gsl">
      <![CDATA[[
            [cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  }
            [dog pooch puppy doggie (dog person)]    { <CatOrDog "Dog">  }
        ]]]>
      </grammar>
      <noinput>
      <prompt>
        I did not hear anything.  Please try again.
      </prompt>
        <reprompt/>
      </noinput>

      <nomatch>
      <prompt>
        I did not recognize that pet choice.  Please try again.
      </prompt>
      <reprompt/>
      </nomatch>

      <help>
      <prompt>
        Just say "Cat" or "Dog".
      </prompt>
        <reprompt/>
      </help>


    </field>
    <filled namelist="CatOrDog">
      <if cond="CatOrDog == 'Cat'">
        <goto next="#Cat"/>
      <elseif cond="CatOrDog == 'Dog'"/>
        <goto next="#Dog"/>
      </if>
    </filled>
  </form>
</vxml>


Whoa there. What is this <help> tag all about? VoiceXML defines "help" as a built-in universal command, but you need a <help> handler to do something useful with it in your script. Like <nomatch> and <noinput>, <help> is an element that belongs in most well-coded voice recognition menus. When the caller says "help", the VoiceXML interpreter executes whatever is inside the <help> handler. In this case, we tell the caller to say "cat" or "dog" and then replay the menu prompt (via the <reprompt> element). You'll also note that we explicitly enable the 'help' command via the <property> setting, because universal commands are not enabled by default in VoiceXML 2.1 applications.
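As a hedged variation on the listing above, the "universals" property can also be set inside a single field rather than at the document level, and <help> handlers can escalate with the "count" attribute. The sketch below assumes the same "menu.wav" file; the wording of the second help prompt is hypothetical and the pet grammar is shortened.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">
  <form id="MainMenu">
    <field name="CatOrDog">
      <!-- Enables only the "help" universal, and only inside this field. -->
      <!-- value="all" would also enable "cancel" and "exit". -->
      <property name="universals" value="help"/>

      <audio src="menu.wav"/>
      <grammar type="text/gsl">
        <![CDATA[[
            [cat kitty]   { <CatOrDog "Cat"> }
            [dog puppy]   { <CatOrDog "Dog"> }
        ]]]>
      </grammar>

      <!-- First request for help: a short hint, then replay the menu. -->
      <help count="1">
        <prompt>Just say "Cat" or "Dog".</prompt>
        <reprompt/>
      </help>

      <!-- Second and later requests: a longer, hypothetical explanation. -->
      <help count="2">
        <prompt>
          This menu asks which pet you prefer.
          Please say either "Cat" or "Dog".
        </prompt>
        <reprompt/>
      </help>
    </field>
  </form>
</vxml>
```

Setting the property on the field keeps "help" from being recognized elsewhere in the document, which may or may not be what you want. The document-level placement used in the tutorial enables it everywhere.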

Now we can finish the rest of our application by inserting the sub-menus:


<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">

<property name="universals" value="help"/>

  <link next="#MainMenu">
    <grammar type="text/gsl">
            [(main ?menu)]
      </grammar>
  </link>

  <form id="MainMenu">
    <block>
      <audio src="helloworld.wav"/>
    </block>

    <field name="CatOrDog">
      <audio src="menu.wav"/>
      <grammar type="text/gsl">
        <![CDATA[[
            [cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  }
            [dog pooch puppy doggie (dog person)]    { <CatOrDog "Dog">  }
        ]]]>
      </grammar>

      <noinput>
      <prompt>
        I did not hear anything.  Please try again.
      </prompt>
        <reprompt/>
      </noinput>

      <nomatch>
      <prompt>
        I did not recognize that pet choice.  Please try again.
      </prompt>
      <reprompt/>
      </nomatch>

      <help>
      <prompt>
        Just say "Cat" or "Dog".
      </prompt>
    <reprompt/>
      </help>
    </field>
    <filled namelist="CatOrDog">
      <if cond="CatOrDog == 'Cat'">
        <goto next="#Cat"/>
      <elseif cond="CatOrDog == 'Dog'"/>
        <goto next="#Dog"/>
      </if>
    </filled>
  </form>

  <form id="Cat">
    <field name="BackToMain">
      Cats rule.  They are the superior lifeform on earth.
      If you wish to try again, please say "Main".
    </field>
    <filled namelist="BackToMain">
    </filled>
  </form>

  <form id="Dog">
    <field name="BackToMain">
      Dogs.  One wonders how they became so popular...
      If you wish to try again, please say "Main".
    </field>
    <filled namelist="BackToMain">
    </filled>
  </form>
</vxml>



Step 5: Upload, and Try It Out

All that remains now is to upload our new hello world VoiceXML application. In keeping with our naming scheme, we might save this file as http://www.myserver.com/helloworld/helloworld5.xml.

Now you can provision a number for your simple menu application, with its built-in help command, and call that number to hear the results. This time, you will hear prerecorded audio files instead of text-to-speech.

Download the Code!

  Source code


What we covered:




  ANNOTATIONS: EXISTING POSTS
dgeiregat
10/13/2004 6:19 AM (EDT)
Hello,

I have 2 remarks for the coding example.

1) a property element needs to be added in order for the 'help' utterance to be recognized: <property name="universals" value="all"/>. I added it to the field CatOrDog, not at the global level. And it works!

2) Same remark as my 2nd remark on the CallFlow Tutorial: remove the condition on each of the forms <form id="Cat"/"Dog">. They are not needed.

Regards,

Dirk
Michael.Book
10/13/2004 11:10 AM (EDT)
Howdy Dirk,

Nice catch!  Thank you for your valuable feedback...

I have corrected the tutorial code.  Please allow a couple of days for the changes to be pushed out to the live doc-set...


Thanks Again,

~ Michael
neelima.ch
1/31/2006 2:16 AM (EST)
Hello,
My doubt is, here in the following code:


<grammar type="text/gsl">
      <![CDATA[[
            [cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  } 
            [dog pooch puppy doggie (dog person)]    { <CatOrDog "Dog">  } 
        ]]]>
      </grammar>

What I want to know is: when we give voice input, how is that input stored? Which variable holds the value of the voice input here: is it CatOrDog or something else? Also, do "Cat" and "Dog" refer to the respective forms that hold the response for the given voice input?

Regards,
Neelima.
raja_emmadi
1/31/2006 9:36 AM (EST)
Hello Neelima,

According to my understanding

[cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  }

the voice input can be : "cat", "kitty", "kitten", "meow", or "cat person"

If the voice input is any of these, the value 'Cat' is assigned to the field "CatOrDog".

But I am not sure.

-- Rajesh
rajesh_thota_kumar@yahoo.com
Michael.Book
2/1/2006 2:06 PM (EST)
Howdy All,

Regarding the value assigned to a field name from a given recognition, there are some shadow variables that prove very useful when trying to visualize what's going on "under the covers" and when troubleshooting recognition related issues - 'confidence', 'inputmode', 'interpretation', and 'utterance'.  These are further explained at 'http://docs.voxeo.com/voicexml/2.0/mot_sessionvars.htm#start', and I strongly recommend including the following log lines in all <filled> blocks during development.  These will give instant insight as to what exactly was recognized and what value will be assigned to the field name.
__________________________

<filled>
  <log expr="'*** [field name] VALUE = ' + [field name] + ' ***'"/>
  <log expr="'*** INPUTMODE = ' + [field name]$.inputmode + ' ***'"/>
  <log expr="'*** CONFIDENCE = ' + [field name]$.confidence + ' ***'"/>
  <log expr="'*** UTTERANCE = ' + [field name]$.utterance + ' ***'"/>
  <log expr="'*** INTERPRETATION = ' + [field name]$.interpretation.[slot name] + ' ***'"/>
</filled>
__________________________

Now, the shadow variables most relevant to this specific thread are 'utterance' and 'interpretation'.  The utterance is the word or phrase, listed in an active grammar, that the end-user actually said (that was "matched").  The interpretation is the grammar slot value for said utterance (if there is one).  For instance:
__________________________

<grammar scope="document" type="text/gsl">
  <![CDATA[ 
    .MYRULE
      [
        yes {<mySlot "affirmative">}
      ]
  ]]>
</grammar>
__________________________
 
Given the above grammar, "yes" would be the utterance, and "affirmative" would be the interpretation of the slot named 'mySlot'.  So which gets assigned to the field name?

- If there is not a slot/interpretation value available, the utterance value will be assigned to the field name.  Example:
__________________________

<field name="champs">
  <grammar scope="document" type="text/gsl">
    [ (?seattle ?(sea hawks)) (?pittsburgh ?steelers) ]
  </grammar>

  <prompt>
    Who will win super bowl forty.
  </prompt>

  <filled>
    <log expr="'*** champs VALUE = ' + champs + ' ***'"/>
    <log expr="'*** UTTERANCE = ' + champs$.utterance + ' ***'"/>
    <log expr="'*** INTERPRETATION = ' + champs$.interpretation + ' ***'"/>
  </filled>
</field>
__________________________

- If a slot/interpretation value is available, but we do not explicitly specify the available slot's name in our <field> tag, the interpreter will assign the utterance value to the field name, *unless* a slot is present with the same name as the field itself.  Example:
__________________________

<field name="champs">
  <grammar scope="document" type="text/gsl">
    <![CDATA[ 
      .MYRULE
        [
          seattle {<champs "seahawks">}
          pittsburgh {<champs "steelers">}
        ]
    ]]>
  </grammar>

  <prompt>
    Who will win super bowl forty.
    Seattle or Pittsburgh.
  </prompt>

  <filled>
    <log expr="'*** champs VALUE = ' + champs + ' ***'"/>
    <log expr="'*** UTTERANCE = ' + champs$.utterance + ' ***'"/>
    <log expr="'*** INTERPRETATION = ' + champs$.interpretation.champs + ' ***'"/>
  </filled>
</field>
__________________________

- If a slot/interpretation value is available, and we have indeed explicitly specified that specific slot name in our <field> tag, that slot value will be assigned to the field name.  Example:
__________________________

<field name="champs" slot="mySlot">
  <grammar scope="document" type="text/gsl">
    <![CDATA[ 
      .MYRULE
        [
          seattle {<mySlot "seahawks">}
          pittsburgh {<mySlot "steelers">}
        ]
    ]]>
  </grammar>

  <prompt>
    Who will win super bowl forty.
    Seattle or Pittsburgh.
  </prompt>

  <filled>
    <log expr="'*** champs VALUE = ' + champs + ' ***'"/>
    <log expr="'*** UTTERANCE = ' + champs$.utterance + ' ***'"/>
    <log expr="'*** INTERPRETATION = ' + champs$.interpretation.mySlot + ' ***'"/>
  </filled>
</field>
__________________________


I hope these examples help to illustrate how field values are "filled."  Play around with them a bit; you'll see what I am talking about...


Have Fun,

~ Michael
pacific_is_me
2/6/2006 2:53 AM (EST)
How can I make VoiceXML play my .wav file instead of reading my text with the TTS engine? For example, I have the text "Hello, World" and a "Hello World" wave file. If I build a control panel site and upload my wave file, I want the VoiceXML file to play this wave file instead of reading the text. Thank you, and sorry if my English is not good.
MattHenry
2/6/2006 11:35 AM (EST)

Hi there,

If you are looking to simply output an audio file, and keep TTS as a backup in the event that your wav file returns a '404' error, then you will want to use the following syntax:

<form>
<block>
  <prompt>
  <audio src="helloworld.wav">
  Hello world
  </audio>
  </prompt>
</block>
</form>

You may also find it useful to review the following links for additional clarification:

http://docs.voxeo.com/voicexml/2.0/frame.jsp?page=audioformats.htm
http://docs.voxeo.com/voicexml/2.0/audio.htm
http://docs.voxeo.com/voicexml/2.0/prompt.htm

~Matt
movomobile_prod
3/8/2006 10:54 AM (EST)
If I wanted to use an audio file (nomatch.wav) instead of TTS for:

<nomatch>
  <prompt>
    I did not recognize that pet choice.  Please try again.
  </prompt>
  <reprompt/>
</nomatch>

How could I go about doing this if the <audio> tag is not allowed inside a field element? I thought about redirecting to another form, but I didn't know if this was the correct method.
mikethompson
3/8/2006 1:11 PM (EST)
Hello there,

The audio tag actually *is* allowed with <field> as the direct parent element.  If you check out our documentation for the <audio> element here:

http://docs.voxeo.com/voicexml/2.0/frame.jsp?page=audio.htm

You will notice that there is a list of the specific child and parent elements for <audio>.  Notice <field> and <prompt> are both legitimate parents of <audio>.  In short, you could have your code snippet look as follows:

<nomatch>
  <prompt>
  <audio src="nomatch.wav">
    I did not recognize that pet choice.  Please try again.
  </audio>
  </prompt>
  <reprompt/>
</nomatch>

Hope this helps,
Mike Thompson
Voxeo Extreme Support
yousafriaz
4/24/2007 11:34 PM (EDT)
If I use ColdFusion embedded with VoiceXML and save the file at, let's say, http://someip/application.cfm, and want to use audio files, how could I possibly do that? The audio files are just for a greeting / announcement, not for holding values or anything.
jbassett
4/25/2007 4:29 AM (EDT)
Hello,

There would be no difference in the way you call the audio file. As long as you have valid XML code embedded in your document, you would call an audio file the same way you would in an .XML file.

Let me know if I did not understand you correctly.

Jesse Bassett
Voxeo Support
yousafriaz
4/25/2007 7:47 PM (EDT)
Jesse, thanks for the reply.

I am testing an application that is hosted at a domain other than the Voxeo domain, and I entered this in ACCOUNT > APPLICATION > SITE URL: http://mydomain.com/voice/test.cfm

Now here is the code for the test.cfm (ColdFusion) file:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">

<var name="CallerID" expr="session.callerid"/>

<form id="form_Main">
<field name="digit1" type="digits">

  <block>
      <prompt>
  <audio src="GTracking.wav"/>
      </prompt>

</block>

<filled>
  <log expr="'*** FILLED ***'"/>
  <log expr="'*** digit1 =' + digit1 + '***'"/>
  <submit next="AddDigits.cfm" method="get" namelist="digit1 CallerID"/>
</filled>
</field>
</form>
</vxml>

Now I am confused about how to call this .wav file. I have uploaded it to my login space with Voxeo and also to the directory where my .cfm file is located, but when I run the application it still cannot find it.

VoxeoTony
4/26/2007 12:01 AM (EDT)
Hello,

In looking at your question, we have to ask if you are looking for help with the ColdFusion portions of the code, or with locating your Wav file.  If you are asking how to get the audio file saved to your server, then you use the submit tag to send namelist values to your ColdFusion page and then use cfset to assign the values to a CF variable.  You may also consider FTPing the file to your server as long as you have permissions to do so.

If you would like more assistance with finding your wav files, we suggest setting up an account ticket, as we would need logs to assist you, and we prefer not to send that information on the public pages.

Tony~
Denell
12/6/2007 5:33 PM (EST)
In this code is the ** namelist="CatOrDog" ** needed? Can't the if statement be executed without that part?



<filled namelist="CatOrDog">
      <if cond="CatOrDog == 'Cat'">
        <goto next="#Cat"/>
      <elseif cond="CatOrDog == 'Dog'"/>
        <goto next="#Dog"/>
      </if>
</filled>
voxeojeremy
12/6/2007 7:56 PM (EST)
Hey there,


Yes, it could be executed without the namelist attribute, because there is only one field.  It is definitely a best practice to use namelist, though.  Please let us know if you have any other questions.  Happy testing!


Regards,

Jeremy McCall
Voxeo Extreme Support
group_7
1/2/2009 5:52 AM (EST)
I need to introduce a passkey in my applications.
Example: a caller would have to say his username and password
before the application responds.
Please, how do I achieve this?


I know I would need a DB of allowed usernames and passwords.
If so, how does my application check this list?
Thanks
voxeoAlexBring
1/2/2009 6:36 AM (EST)
Hello,
    If you would like to implement a name and password system, you could use a [url=http://docs.voxeo.com/voicexml/2.0/grammar.htm]grammar[/url], along with a [code]filled[/code] and a [url=http://docs.voxeo.com/voicexml/2.0/if.htm]conditional if[/url] statement.

You could have a grammar for the name and a grammar for the password. Once the caller has spoken the credentials, you can place an [code]if[/code] inside the filled and have it check whether the password is correct.

An example of what you may want to achieve:
[code]
<form>
    <field name = "name">
          <prompt> What is the name? </prompt>
          <grammar type="application/grammar+xml" src= "name.grxml"/>
    </field>
    <field name = "pin" type="digits">
          <prompt> What is your pin? </prompt>
    </field>
    <filled>
          <if cond = "pin != '1234'">
                <prompt> That is the wrong pin </prompt>
                <reprompt/>
                <clear namelist = "pin"/>
            </if>
      </filled>
</form>
[/code]

Please let us know if you have any further questions or concerns as we will be ready to assist you.

Regards,

Alex Bring
Voxeo Support
group_7
1/6/2009 9:06 AM (EST)
....looked at the syntax of the code. I haven't tried it yet, but what about the database that will enable the if condition? I guess I would have to upload it as well, but which database format is compatible with Voxeo applications?
I am just going through the grxml tutorial; I would prefer it if a working demo could be illustrated in full.

thanks
voxeojeremyr
1/6/2009 9:53 AM (EST)
Hi,

Because VXML is built more as a presentation layer, it does not have any native access to databases.  What you would have to use is some server side scripting like JSP or PHP to access the Database and then return the values that you would need.

Because of the many different flavors of server side language we do not have any exact examples of how this is done, but you should be able to google your code of choice to find out about doing database retrievals.

You can find more information about server side scripting here:
http://docs.voxeo.com/voicexml/2.0/intro_serverside.htm

Thanks,
Jeremy Richmond
Voxeo Support


© 2012 Voxeo Corporation  |  Voxeo IVR  |  VoiceXML & CCXML IVR Developer Site