VoiceXML 2.1 Development Guide Home  |  Frameset Home


Audio Tools

If you are not certain how to get your files into the proper format, Windows users can use Sound Recorder to check a file and change it: open the file, select "File", then "Save As...", press the "Change..." button, and set the attributes to "8,000 Hz, 8 Bit, Mono", and you should be ready to launch. For those who prefer more powerful sound editing software, a number of freeware programs fit the bill. A listing (by no means complete) of some good offerings follows:

Audio Links

If you are looking for professional voice talent for your application, and you don't see files that meet your specific needs in our download library, you would do well to contact one of these fine companies:

Converting user input for audio playback

Taking numeric input from a caller is common in IVR applications: a great majority of them require a user access code and a PIN before letting callers into the "guts" of the application. What can be less than elegant is the scenario where the application uses professionally recorded audio prompts, and then plays back the user's input string with the text-to-speech (TTS) engine. Don't get me wrong, there are valid uses for TTS: in some scenarios it isn't economically feasible to record every possibility. Rendering firstname/lastname data back to the caller, for example, would take hundreds of thousands of recordings (and a voice studio that makes a killing recording them all for your application). In that case, it's a no-brainer to use TTS. But where we are dealing with a reasonably static category, such as digits, pre-recorded audio will generally sound better, especially when the confirmation prompt that precedes it shares the same "persona" as the character-by-character playback. The audible change from smooth, pre-recorded audio to understandable but still inhuman-sounding TTS can be jarring in context.

To illustrate how developers can easily get around this, we will present the common scenario mentioned above: the IVR asks for a PIN code, the user enters digits, and we then confirm the recognition result back to the caller. But in this case, we will be using pre-recorded audio in every step, and not a bit of TTS. Note that all the audio used in this code example is 100% free for developer use, and can be downloaded from our pre-recorded audio library.

The VoiceXML TTS2Wav.xml file


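<?xml version="1.0" encoding="UTF-8"?>

<!-- <foreach> is a VoiceXML 2.1 element, so the document declares 2.1. -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
<form>

<field name="myField" type="digits?maxlength=3">
  <prompt>
    <audio src="enterPin.wav">
      please enter your pin code followed by the pound key.
    </audio>
  </prompt>

  <filled>
    <log expr="'*** myField = ' + myField"/>
    <prompt>
      <audio src="message.wav">
        you have a message from
      </audio>
      <break time="1s"/>
    </prompt>

    <!-- Assumed reconstruction: the listing never defined arraymyField,
         so split the digit string into an array of single characters. -->
    <var name="arraymyField" expr="myField.split('')"/>

    <!-- Play one recording per digit (for example 4.wav); the nested
         <value> is the TTS fallback if a file cannot be fetched. -->
    <foreach item="convert" array="arraymyField">
      <log expr="'*** ' + convert + '.wav'"/>
      <prompt bargein="false">
        <audio expr="convert + '.wav'">
          <value expr="convert"/>
        </audio>
        <break/>
      </prompt>
    </foreach>

  </filled>
</field>
</form>

</vxml>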


Yep, it's just that simple. The ECMAScript above can easily be modified to play back alphabetical characters, date or time values, or any number of outputs that one can think of. Test your mettle, and see what you can cook up when using this as a starting point!
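
As one illustration of that kind of modification, the sketch below maps each letter or digit of an input string to a file name that the <foreach> loop could play. The helper name toAudioFiles and the convention of one lowercase file per character (a.wav, 7.wav) are assumptions made for this example, not part of the original tutorial or the audio library.

```javascript
// Hypothetical helper: build a list of audio file names, one per
// letter or digit in the input. Other characters are skipped.
// Assumes recordings are named after the lowercase character, e.g. a.wav.
function toAudioFiles(input) {
  var files = [];
  for (var i = 0; i < input.length; i++) {
    var ch = input.charAt(i).toLowerCase();
    if (/[0-9a-z]/.test(ch)) {
      files.push(ch + '.wav');
    }
  }
  return files;
}
```

The returned array could then be handed to <foreach>, with the <audio> element using the item directly as its source expression instead of appending '.wav'.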

Download the Code!

  Download the source code



  ANNOTATIONS: EXISTING POSTS
awirtz
12/3/2004 1:08 PM (EST)
SoX is a very useful command-line tool capable of converting between numerous audio file formats and applying a number of effects.  I use it to downsample 48kHz/16bit/PCM Wav files into 8kHz/8bit/u-Law Wav files.
Versions are available for most modern operating systems from the following site:
http://sox.sf.net

(NB: SoX currently misinterprets volume adjustment values specified in dB.  For some reason it is convinced that +6 dB == 200%.  Easy enough to work around once you know it is there, but an odd quirk when unexpected. Just feed it twice the dB value you truly mean.)

The following example command line takes an input audio file (the format can be autodetected for wav files, so there is no need to specify the input bitrate, etc.), resamples the audio into 8 kHz/8-bit/mono/u-Law using a polyphase algorithm (much better than most other conversion techniques out there), adjusts the gain level up by 5.5 dB, and outputs it to another wav file:

sox original.wav -r8000 -b -c1 -U new.wav polyphase vol +11 dB
awirtz
8/30/2005 2:38 PM (EDT)
Audacity is a recording, mixing, and editing tool, with many of the same features as professional packages like Cakewalk or Digital Performer, but completely free and open-source.
Versions are available for Windows, Linux, and Mac OS X:
http://audacity.sf.net
PremiseScienceMuseum
8/30/2006 5:19 PM (EDT)
Audacity is great for recording, editing, and mixing your files. Alas, it does not export u-Law encoded .wav files. You can only use the linear PCM encoder. This causes lots of pops and clicks in the background of your file. You will need some sort of conversion utility to change your high quality wavs to the u-Law encoded 8-bit wavs.
chrishansen
8/20/2008 8:47 PM (EDT)
Audacity can export to 8-bit u-law by selecting "WAV, AIFF, and other uncompressed types" in the Export File dialog and then pressing the Options button to select the Format of "AU (Sun/Next 8-bit u-law)".

This is working fine for me with Audacity 1.3.5, and the exported files sound fine.
jdyer
8/20/2008 8:51 PM (EDT)
Thank you very much for the confirmation, we hope your development is progressing smoothly!  Please do not hesitate to let us know if we can assist!

Regards,

John
Customer Engineer
declanh
3/3/2009 12:24 PM (EST)
Hi,
I'm using a <foreach> to play back a credit card number using .wav files, but I'm having a problem with dates that get returned in the format Month Year, e.g. '09 2009' (September 2009).
Do you have an example of how to read that out as "O nine, two thousand and nine" using .wav files?
Thanks,
Declan
MattHenry
3/3/2009 12:54 PM (EST)


Hello Declan,

I don't happen to have any pre-built ECMAScript for this particular scenario, but I think that I can point you in the right direction. I would suggest that you start out by doing the following:

1 - Use the "credit card expiration" downloadable grammar as a starting point:

http://docs.voxeo.com/opensource/credit_card_expiration.grammar

2 - Within the VXML make sure that you are accessing both slots: month and year

3 - Write a "pre-processor" ECMAScript function that checks the string length of a given array value and then uses a substring operation to check its first character: If the value starts with "0", then break up the value prior to rendering it in the <foreach> loop.

Example: "06" becomes "0" and "6"

4 - Reinsert these two values into the array, overwriting the original.

Example: ["06", "2010"] becomes ["0", "6", "2010"]

5 - From there, you can edit the existing ECMAScript to break up the data in a slightly different manner, based on conditional statements that key off the string length.

Example:

A) If the string is 4 characters in length, AND the first two characters are "20", then render "2000.wav"

B) If the last two characters start with a "1", then render "1[next character].wav", so that a value of "16" becomes "16.wav"

C) If the first character is "0", then render "0.wav" followed by "[last character].wav", so that "02" becomes "0.wav, 2.wav"

D) If the last two characters don't start with a "0" or "1", then append a "0" to the first character, and play the last character as-is, so that "31" becomes "30.wav, 1.wav"
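
Rules A through D above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names twoDigitsToWavs and expiryToWavs are hypothetical, the recordings 10.wav through 19.wav, 20.wav, 30.wav and so on, and 2000.wav are assumed to exist, and the choice to drop the leading zero in the year remainder (so "2009" plays as "2000, 9") is an assumption rather than part of the rules.

```javascript
// Hypothetical helper: file names for a two-digit value such as a month.
function twoDigitsToWavs(pair) {
  var tens = pair.charAt(0);
  var ones = pair.charAt(1);
  if (tens === '0') {
    return ['0.wav', ones + '.wav'];      // rule C: "02" -> 0.wav, 2.wav
  }
  if (tens === '1') {
    return [pair + '.wav'];               // rule B: "16" -> 16.wav
  }
  var files = [tens + '0.wav'];           // rule D: "31" -> 30.wav, 1.wav
  if (ones !== '0') {
    files.push(ones + '.wav');
  }
  return files;
}

// Hypothetical helper: file names for a month and a four-digit year.
function expiryToWavs(month, year) {
  var files = twoDigitsToWavs(month);
  if (year.length === 4 && year.slice(0, 2) === '20') {
    files.push('2000.wav');               // rule A
    var rest = year.slice(2);
    if (rest.charAt(0) === '0') {
      // Assumption: play "2000, 9" rather than "2000, 0, 9".
      if (rest !== '00') {
        files.push(rest.charAt(1) + '.wav');
      }
    } else {
      files = files.concat(twoDigitsToWavs(rest));
    }
  } else {
    // Fallback for any other year: play it digit by digit.
    for (var i = 0; i < year.length; i++) {
      files.push(year.charAt(i) + '.wav');
    }
  }
  return files;
}
```

With these assumptions, expiryToWavs('06', '2010') yields 0.wav, 6.wav, 2000.wav, 10.wav, and expiryToWavs('11', '2031') yields 11.wav, 2000.wav, 30.wav, 1.wav.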

I hope that this helps get you started!

~Matthew Henry
declanh
3/5/2009 6:27 AM (EST)
Hi Matthew,
Thanks for that, I was able to get that working.
In the interests of sharing, here is the Javascript.
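It presumes an input of "month comma year", e.g. 09,2009 or 11,2011, in a variable "str".
<script>
      <![CDATA[
// Split "MM,YYYY" into its month and year parts.
DigitsArray = str.split(',');
var month;
var year;
month = DigitsArray[0];
year = DigitsArray[1];
if (month.substr(0,1) == '0')
{
  // Leading zero: play the month as two digits, e.g. "09" -> "0", "9".
  DigitsArray.splice(0,1,(month.substr(0,1)));
  DigitsArray.splice(1,0,(month.substr(1,1)));

  // Play the year as century plus remainder, e.g. "2009" -> "2000", "09".
  DigitsArray.splice(2,0,(year.substr(0,2)) + '00');
  DigitsArray.splice(3,1,(year.substr(2,2)));
}
else
{
  // Month plays as a single file (e.g. "11"); only the year is split.
  DigitsArray.splice(1,0,(year.substr(0,2)) + '00');
  DigitsArray.splice(2,1,(year.substr(2,2)));
}
// If the remainder starts with "0", keep only its last digit ("09" -> "9").
if (year.substr(2,1) == '0')
{
  DigitsArray.pop();
  DigitsArray.push(year.substr(3,1));
}
      ]]>
</script>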

Regards,
Declan
voxeoJeffK
3/5/2009 6:35 AM (EST)
Hello Declan,

Thank you very much for posting that back for the benefit of all. Please feel free to ask if you have any other questions in the course of your development. We'll be glad to help.

Regards,
Jeff Kustermann
Voxeo Support
sambhav
8/13/2009 1:08 PM (EDT)
Hello,

This may be the wrong place to ask this question; pardon me if so.

How can I take a whole sentence as input from the user in one go?
VoxeoDustin
8/13/2009 1:17 PM (EDT)
Hello,

I'm not sure I completely understand your question. Would you mind clarifying a little on what you're trying to accomplish?

Regards,
Dustin Hayre
Customer Support Engineer II
Voxeo Support
sambhav
8/31/2009 12:42 PM (EDT)
Hi

I want to take a whole sentence as input in one go.

For example, the user says: "my name is sam and i live in india".

So I should be able to collect this whole sentence in a variable or something similar, so that I can process, parse, or perform some action on the complete sentence.

Thanks
VoxeoDustin
8/31/2009 12:51 PM (EDT)
Hello,

There are a number of ways of doing this depending on how you'd like to collect the information. If you're simply looking for that exact sentence, you can place that as an utterance in your grammar:

<grammar type="application/grammar+xml" xml:lang="en-us" root="MAIN">
  <rule id="MAIN">
    <one-of>
      <item> my name is sam and  i live in india </item>
      <item> my name is roger and  i live in cleveland </item>
    </one-of>
  </rule>
</grammar>

Then you could retrieve the response from the variable application.lastresult$.interpretation.
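
If the goal is to pull the name and place out of the recognized sentence, a small ECMAScript helper could do it. This is a minimal sketch: the function name parseIntro and the sentence pattern are assumptions for illustration, and it presumes the recognized text is available as a plain string (for example from application.lastresult$.utterance).

```javascript
// Hypothetical helper: extract the name and place from a sentence shaped
// like "my name is <name> and i live in <place>".
// Returns null when the sentence does not match that shape.
function parseIntro(utterance) {
  var pattern = /^\s*my\s+name\s+is\s+(\w+)\s+and\s+i\s+live\s+in\s+(\w+)\s*$/i;
  var match = pattern.exec(utterance);
  if (!match) {
    return null;
  }
  return { name: match[1], place: match[2] };
}
```

For the first grammar item above, parseIntro would return an object with name "sam" and place "india".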

However, if you're looking for input that is not necessarily a static sentence like the above, you may need to implement a <record> step and transcribe the input later.

http://www.vxml.org/record.htm

Another option, if you're looking for name and address recognition, is the use of Targus grammars, which use a Targus subdialog to verify name and address input. If this is something you're interested in, our sales team can get you more information on the costs associated with this. You can reach them at sales@voxeo.com.

Let me know if we can be of further assistance.

Regards,
Dustin Hayre
Customer Support Engineer II
Voxeo Support


© 2012 Voxeo Corporation  |  Voxeo IVR  |  VoiceXML & CCXML IVR Developer Site