CallXML 2.0 Development Guide

This documentation is for CallXML 2, which has been superseded by CallXML 3. The CallXML 2 language is no longer being updated, while CallXML 3 has many new features and is actively being enhanced. If you are writing a new CallXML application, you should use CallXML 3; see the CallXML 3.0 documentation.

Voxeo's Enhanced Text-To-Speech Guide

If you want to refine your TTS output further, this guide explains how to fine-tune your applications for more natural-sounding voice prompts. Voxeo currently uses the Speechify 2.0 engine in its Staging and Production environments, and this section shows you how to use TTS markup to add the final polish to your applications.


TTS Voices

In addition to US English voices, we offer French and Spanish (female) voices, should you want to deploy a multilingual application. The currently available options are:

To change the default voice in CallXML, use the voice attribute of the <text> element (<text voice="Name">) as follows:

<text voice="English-Male2"> This is my English male voice </text>
<wait value="200ms"/>
<text voice="Spanish-Female1"> Escuche mi voz atractiva  </text>




Language-Specific Character Encoding

When using non-English TTS, be aware that some language-specific characters, such as the following, can cause a parse error:

'¿','é', 'ó' 

The simplest way around this in multilingual applications is to declare an XML document encoding that allows these characters to parse:


<?xml version="1.0" encoding="iso-8859-1"?>
<callxml version="2.0">
<block>
  <text voice="Spanish-Female1">
        ¿Qué sucedió a mis pantalones?
  </text>
</block>
</callxml>




Embedded Tags

Embedded tags are special codes that can be inserted into input text to customize the TTS behavior in a variety of ways. For example, you can use embedded tags to insert pauses or to change how characters are pronounced:

Markup | Explanation
\!p300 | Synthesize a pause of 300 ms.
\!tsc | Pronounce all characters individually by name.



Separate a tag from any preceding input by at least one unit of white space. A tag cannot be followed immediately by an alphanumeric character, though most tags may be followed immediately by punctuation, parentheses, and similar symbols.

Important: There must be a space before and after the tag for it to work properly. If the tag is immediately followed by a carriage return or line feed, the tag will not work and will cause the entire sentence to fail. Put at least one space after the tag, before the line feed.

Any sequence of non-white-space characters beginning with the prefix \! that is not a recognizable TTS tag is ignored in the speech output.
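When prompt text is assembled programmatically, the spacing rules above are easy to break. A minimal Python sketch, assuming your application builds the string that goes inside a CallXML <text> element, could normalize the spacing before output. The helper name pad_tags and its tag pattern (backslash-bang, lowercase letters, optional digits) are hypothetical and cover only simple tags such as \!p300, \!tsc and \!ny0. It deliberately leaves punctuation that directly follows a tag alone, because a pause tag placed immediately before punctuation behaves differently from a free-standing one.

```python
import re

# Hypothetical pattern for simple tags such as \!p300, \!tsc, \!ny0.
# SPR tags of the form \![...] are deliberately not handled.
TAG = r"\\![a-z]+\d*"


def pad_tags(text: str) -> str:
    """Ensure every simple TTS tag is separated from the surrounding input."""
    # Add a space before any tag not already preceded by white space;
    # this also pads a tag that opens the string.
    text = re.sub(rf"(?<!\s)({TAG})", r" \1", text)
    # Add a space after any tag followed by a line break or the end of the
    # string, so a carriage return or line feed never touches the tag.
    text = re.sub(rf"({TAG})(?=[\r\n]|$)", r"\1 ", text)
    return text
```

For example, "Please hold\!p300" followed by a line feed becomes "Please hold \!p300 " followed by the line feed.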


Creating Pauses

Use a pause tag to create a pause of a particular duration at a specified point in the speech output.

Pause Tag

Markup | Explanation
\!pN | Create a pause N milliseconds long.


The integer argument of a pause tag must be between 1 and 32767. To create a longer pause, use a series of consecutive pause tags.
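Splitting a long pause into consecutive tags is simple arithmetic. The sketch below is a hypothetical Python helper (pause_tags is not part of CallXML or Speechify) that emits as many maximum-length \!p32767 tags as needed, followed by one tag for the remainder, separated by single spaces.

```python
MAX_PAUSE_MS = 32767  # largest argument a single \!pN tag accepts
MIN_PAUSE_MS = 1      # smallest argument a single \!pN tag accepts


def pause_tags(total_ms: int) -> str:
    """Build a run of consecutive \\!pN tags that add up to total_ms."""
    if total_ms < MIN_PAUSE_MS:
        raise ValueError("a pause must be at least 1 ms long")
    full, remainder = divmod(total_ms, MAX_PAUSE_MS)
    chunks = [MAX_PAUSE_MS] * full + ([remainder] if remainder else [])
    # Tags are separated by single spaces, as the spacing rules require.
    return " ".join(f"\\!p{ms}" for ms in chunks)
```

A 40-second pause, for instance, becomes "\!p32767 \!p7233"; remember to surround the result with spaces when embedding it in a <text> element.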

The behavior produced by the pause tag varies depending on its location in the text.  When a pause tag is placed immediately before a punctuation mark, the standard pause duration triggered by the punctuation is replaced by the pause duration specified in the tag. In other locations, the tag creates an additional pause.

For example, sentence (a) has a default 150 ms pause at the comma. Sentences (b) and (c) replace the default pause with a longer and shorter pause, respectively, while sentence (d) inserts an additional pause of 300 ms, resulting in a total pause duration of 450 ms. In sentence (e) a 25 ms pause is inserted in a location where no pause would otherwise occur.


Input | Pronunciation
(a) Tom is a good swimmer, because he took lessons as a child. | Tom is a good swimmer (150 ms pause), because he took lessons as a child.
(b) Tom is a good swimmer \!p300, because he took lessons as a child. | Tom is a good swimmer (300 ms pause), because he took lessons as a child.
(c) Tom is a good swimmer \!p100, because he took lessons as a child. | Tom is a good swimmer (100 ms pause), because he took lessons as a child.
(d) Tom is a good swimmer, \!p300 because he took lessons as a child. | Tom is a good swimmer (150 ms pause), (300 ms pause) because he took lessons as a child.
(e) Tom is a good swimmer, because he took lessons \!p25 as a child. | Tom is a good swimmer (150 ms pause), because he took lessons (25 ms pause) as a child.
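The replace-versus-add behavior in examples (a) through (e) can be captured in a toy Python model. This is a sketch built only from those examples, not the engine's algorithm: it assumes a 150 ms default pause at a comma, models no other punctuation, and treats a pause tag glued to a comma as replacing that comma's pause. The names pause_durations and DEFAULT_COMMA_PAUSE_MS are hypothetical.

```python
import re

# Assumption: 150 ms is the default comma pause, as in example (a).
# Other punctuation marks are not modeled.
DEFAULT_COMMA_PAUSE_MS = 150
# A pause tag, optionally glued to a following comma.
PAUSE_TAG = re.compile(r"\\!p(\d+),?")


def pause_durations(text: str) -> list[int]:
    """Toy model: total pause (ms) at each break in text, in order."""
    pauses = []
    pending = 0
    for token in text.split():
        match = PAUSE_TAG.fullmatch(token)
        if match:
            # A tag glued to a comma replaces that comma's default pause
            # (the comma is consumed with the tag); a free-standing tag
            # adds to any pause already pending at this point.
            pending += int(match.group(1))
            continue
        if pending:
            # An ordinary word ends the current break.
            pauses.append(pending)
            pending = 0
        if token.endswith(","):
            pending += DEFAULT_COMMA_PAUSE_MS
    if pending:
        pauses.append(pending)
    return pauses
```

Run on sentence (d), the model reports a single 450 ms break, matching the description of that example.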

 

Customizing pronunciations

In certain cases, you may want to specify a pronunciation that differs from the one generated by the internal text analysis rules. The tags described in this section modify the default text processing behavior in a variety of ways. Additional tags of this kind are described in the Language Supplement.


Character Spellout Modes

The following tags are used to trigger character-by-character spellout of subsequent text.

Markup | Explanation
\!ts0 | Default mode.
\!tsc | All-character spellout mode: pronounce all characters individually by name.
\!tsa | Alphanumeric spellout mode: pronounce only alphanumeric characters by name.
\!tsr | Radio spellout mode: like alphanumeric mode, but alphabetic characters are spelled out according to the International Radio Alphabet. Supported in English only.

 

For example:

Input | Pronunciation
My account number is \!tsa 487-B12. | My account number is four eight seven bee one two.
My account number is \!tsc 487-B12. | My account number is four eight seven dash bee one two.
The last name is spelled \!tsr Dvorak. | The last name is spelled delta victor oscar romeo alpha kilo.



Spellout modes remain in effect until they are turned off by the tag \!ts0, which restores the engine to its default processing mode. For example:

Input | Pronunciation
The composer’s name is spelled \!tsa Franz Liszt \!ts0 and pronounced "Franz Liszt." | The composer’s name is spelled eff ar ay en zee ell aye ess zee tee and pronounced Franz Liszt.



Many words are already spelled out by the default text processing behavior. For such words, a spellout mode tag may have no additional effect. For example:


Input | Pronunciation
He works for either the CIA or the FBI. | He works for either the cee aye ay or the eff bee aye.
He works for either the \!tsa CIA \!ts0 or the \!tsa FBI \!ts0. | He works for either the cee aye ay or the eff bee aye.



In alphanumeric and radio spellout modes, punctuation is interpreted exactly as it is in the default (non-spellout) mode; i.e., in most contexts it triggers a phrase break. In all-character spellout mode, punctuation is spelled out like any other character, rather than being interpreted as punctuation. Speech output continues without pause until the mode is turned off. For example:


Input | Pronunciation
\!tsa 3, 2, 1 \!ts0 blastoff. | three, two, one, blastoff
\!tsc 3, 2, 1 \!ts0, blastoff. | three comma two comma one, blastoff
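The three spellout modes differ mainly in what they do with letters and punctuation. The Python sketch below imitates them on a single string whose mode is already known (it does not parse \!ts tags itself). The letter-name spellings, the small punctuation-name table, and the function name spell_out are hypothetical; engine details such as how a sentence-final period is handled in \!tsc mode are not reproduced.

```python
import string

DIGIT_NAMES = dict(zip(string.digits, "zero one two three four five six seven eight nine".split()))
# Hypothetical letter-name spellings, chosen to match the examples above.
LETTER_NAMES = dict(zip(string.ascii_lowercase, (
    "ay bee cee dee ee eff gee aitch aye jay kay ell em en oh pee cue ar ess tee "
    "you vee double-you ex why zee").split()))
RADIO_NAMES = dict(zip(string.ascii_lowercase, (
    "alpha bravo charlie delta echo foxtrot golf hotel india juliet kilo lima mike "
    "november oscar papa quebec romeo sierra tango uniform victor whiskey x-ray "
    "yankee zulu").split()))
# Names used for punctuation in all-character (\!tsc) mode; a small subset.
SYMBOL_NAMES = {"-": "dash", ",": "comma", ".": "period", "/": "slash"}
PHRASE_PUNCTUATION = ",;:.!?"


def spell_out(text: str, mode: str) -> str:
    """Sketch of the spellout modes: "a" (\\!tsa), "c" (\\!tsc), "r" (\\!tsr)."""
    words = []
    for ch in text:
        if ch.isspace():
            continue
        low = ch.lower()
        if ch in DIGIT_NAMES:
            words.append(DIGIT_NAMES[ch])
        elif low in LETTER_NAMES:
            words.append(RADIO_NAMES[low] if mode == "r" else LETTER_NAMES[low])
        elif mode == "c":
            # All-character mode: punctuation is spelled like any other character.
            words.append(SYMBOL_NAMES.get(ch, ch))
        elif ch in PHRASE_PUNCTUATION and words:
            # Alphanumeric and radio modes: punctuation keeps its normal role
            # (a phrase break), so it stays attached to the previous word.
            words[-1] += ch
        # Any other symbol (such as "-") is skipped in "a" and "r" modes.
    return " ".join(words)
```

With this sketch, spell_out("487-B12", "a") drops the dash while spell_out("487-B12", "c") names it, mirroring the account-number examples.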



Pronouncing Numbers and Years

In ordinary English text, a four-digit numeric sequence with no internal commas or trailing decimal digits, like 1984, can be interpreted either as a year (nineteen eighty four) or as a quantity (one thousand nine hundred eighty four). The engine applies the year interpretation by default, as in:


Input | Pronunciation
He was born in May 1945. | He was born in May nineteen forty five.


To override the default year interpretation, or to restore it, use the following tags:

Year Mode Tag | Interpretation
\!ny0 | Quantity interpretation.
\!ny1 | Year interpretation (default).



For example:

Input | Pronunciation
In May \!ny0 1945 people emigrated. | In May one thousand nine hundred forty five people emigrated.



Each tag remains in effect until the interpretation is toggled by the use of the other tag. For example:


Input | Pronunciation
\!ny0 1945 \!ny1 people emigrated in 1945. | One thousand nine hundred forty five people emigrated in nineteen forty five.
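Because each tag stays in effect until the other one appears, the year mode behaves like a switch that is read left to right. The Python sketch below models that switch for plain four-digit numbers. The function names and the exact word forms (for example "oh" for a zero tens digit) are assumptions for illustration; they are not the engine's own rules.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()
# A plain four-digit number with optional trailing punctuation.
FOUR_DIGITS = re.compile(r"([1-9]\d{3})(\W*)")


def below_100(n: int) -> str:
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def as_quantity(n: int) -> str:
    """The \\!ny0 reading, e.g. one thousand nine hundred forty five."""
    thousands, rest = divmod(n, 1000)
    hundreds, tail = divmod(rest, 100)
    words = [ONES[thousands], "thousand"]
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if tail:
        words.append(below_100(tail))
    return " ".join(words)


def as_year(n: int) -> str:
    """The default \\!ny1 reading, e.g. nineteen forty five."""
    head, tail = divmod(n, 100)
    if n % 1000 == 0:
        return as_quantity(n)
    if tail == 0:
        return f"{below_100(head)} hundred"
    if tail < 10:
        return f"{below_100(head)} oh {ONES[tail]}"
    return f"{below_100(head)} {below_100(tail)}"


def speak_numbers(text: str) -> str:
    """Apply \\!ny0 / \\!ny1 toggles to every four-digit number in text."""
    year_mode = True  # \!ny1 (year interpretation) is the default
    out = []
    for token in text.split():
        if token == r"\!ny0":
            year_mode = False
        elif token == r"\!ny1":
            year_mode = True
        elif match := FOUR_DIGITS.fullmatch(token):
            n = int(match.group(1))
            out.append((as_year(n) if year_mode else as_quantity(n)) + match.group(2))
        else:
            out.append(token)
    return " ".join(out)
```

Applied to the last example, the sketch reads the first 1945 as a quantity and the second as a year.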



Note: These tags have no effect in French or Spanish, where there is no difference in pronunciation between the year and quantity interpretations.


Syllable Boundaries

Use periods to delimit syllables in an SPR (symbolic phonetic representation) to enhance readability. Periods are optional and have no effect on how the word is syllabified in the speech output; the TTS engine's internal syllabification rules divide the word into syllables as usual.


Syllable Stress

Syllables can be marked for stress with a digit. Use 1 to indicate primary stress and 0 to indicate no stress. Some languages support secondary stress, indicated by the digit 2. See the appropriate Language Supplement for details.

If a word has more than one syllable, at least one of these syllables must be marked for primary stress, or the SPR is considered invalid. Other syllables may be marked for secondary or no stress, or left unmarked. A syllable that is not marked for stress is assumed to have no stress, unless it is the only syllable of a word, in which case it is assigned a primary stress.
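The stress rules above can be checked before a prompt is deployed. The following Python sketch assumes a single-word SPR in which every syllable is delimited by a period and any stress digit is the first character of its syllable (as in \![.1rY.0FR]); SPRs written without periods are outside its scope, since the engine syllabifies those itself. The name stress_pattern is hypothetical.

```python
def stress_pattern(spr: str):
    """Return per-syllable stress digits for a period-delimited SPR, or None if invalid."""
    body = spr[3:-1] if spr.startswith("\\![") and spr.endswith("]") else spr
    syllables = [s for s in body.split(".") if s]
    if not syllables:
        return None
    # A leading 0/1/2 marks the syllable's stress; None means "unmarked".
    marks = [int(s[0]) if s[0] in "012" else None for s in syllables]
    if len(syllables) == 1:
        # An unmarked single syllable is assigned primary stress.
        return [1] if marks[0] is None else marks
    if 1 not in marks:
        return None  # a multi-syllable word needs at least one primary stress
    # Unmarked syllables in a longer word are treated as unstressed.
    return [0 if m is None else m for m in marks]
```

For writer, \![.1rY.0FR] yields a primary-then-unstressed pattern, while an SPR with no primary stress is rejected.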


Speech Sound Symbols

Each language uses its own inventory of SPR symbols for representing its speech sounds. See the appropriate Language Supplement for a table of SPR symbols and examples of words in which each sound occurs. These tables show valid symbols for vowels, consonants, syllable stresses, and syllable boundaries.

Letters are case-sensitive, so \![e] and \![E] represent two distinct sounds. Multi-character symbols must be contained in single quotes; for example, French peu is represented \![p'eu']. SPRs containing sound symbols that do not belong to the inventory of the current language are considered invalid, and ignored.
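The quoting and inventory rules lend themselves to a small tokenizer. In this Python sketch, single characters are symbols, quoted runs are multi-character symbols, and periods and the stress digits 0, 1 and 2 are skipped. The inventory passed in is a hypothetical placeholder; the real symbol tables are in the Language Supplements.

```python
def tokenize_spr(body: str) -> list[str]:
    """Split the inside of \\![...] into sound symbols."""
    tokens = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'":
            # Multi-character symbol: everything up to the closing quote.
            end = body.index("'", i + 1)  # ValueError if the quote is unterminated
            tokens.append(body[i + 1:end])
            i = end + 1
        elif ch in ".012":
            i += 1  # syllable boundaries and stress marks are not sound symbols
        else:
            tokens.append(ch)  # case is preserved: "e" and "E" differ
            i += 1
    return tokens


def is_valid_spr(body: str, inventory: set[str]) -> bool:
    """True if every symbol belongs to the (hypothetical) inventory."""
    try:
        return all(token in inventory for token in tokenize_spr(body))
    except ValueError:
        return False
```

Given an inventory containing "p" and "eu", the French example p'eu' tokenizes to those two symbols and passes.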

Some speech sounds have limited distributional patterns in specific languages. For example, in English, the sound [G] of sing \![.1sIG] does not occur at the beginning of a word. Other US English sounds that have a particularly narrow distribution are the flap [F], and the syllabic nasal [N]. Entering a sound symbol in a context where it does not normally occur may result in unnatural-sounding speech.

The TTS engine applies a sophisticated set of linguistic rules to its input to reflect the processes by which sounds change in specific contexts in natural language. For example, in US English, the sound [t] of write \![.1rYt] is pronounced as a flap [F] in writer \![.1rY.0FR]. SPR input undergoes these modifications just as ordinary text input does. In this example, whether you enter \![.1rY.0tR] or \![.1rY.0FR], the speech output is the same.


Text Pre-Processing

Text pre-processing is required in order to modify digits, abbreviations, and special sequences of characters so that they more closely resemble the words that should be read out. An example of this is the character string etc., which is spoken as “et cetera”, so the text pre-processor expands the abbreviation into the string et cetera.

A more complex example is the string $100, where the dollar sign must be expanded to the word “dollars”, which must then be moved after the number. On top of that, the digits 100 must be expanded into the words “one hundred”. The text pre-processor expands this string to one hundred dollars.
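The two pre-processing examples can be reproduced in a few lines. This Python sketch is an illustration, not the engine's pre-processor: it handles only whole-dollar amounts from $0 to $999 and a one-entry abbreviation table, and the names preprocess and below_1000 are hypothetical.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()
ABBREVIATIONS = {"etc.": "et cetera"}  # tiny illustrative subset
DOLLARS = re.compile(r"\$(\d{1,3})")   # this sketch handles $0 to $999 only


def below_1000(n: int) -> str:
    hundreds, rest = divmod(n, 100)
    words = [ONES[hundreds], "hundred"] if hundreds else []
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}")
    elif rest or not words:
        words.append(ONES[rest])
    return " ".join(words)


def preprocess(text: str) -> str:
    out = []
    for token in text.split():
        match = DOLLARS.fullmatch(token)
        if match:
            amount = int(match.group(1))
            # Expand the number, then move the currency word after it.
            out.append(f"{below_1000(amount)} {'dollar' if amount == 1 else 'dollars'}")
        else:
            out.append(ABBREVIATIONS.get(token, token))
    return " ".join(out)
```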

Text pre-processing falls into nine categories, several of which are described below.

URLs

If a token begins with “http” or “www”, the following symbols are expanded up to the next white space. The last / in a URL is omitted.


Symbol | Expansion
/ | slash
. | dot
- | dash
_ | underscore
: | , (pause)
~ | tilde



Example:

Input | Pronunciation
http://voxeo.com/ is where the demo can be found | “aitch tee tee pee slash slash voxeo dot com is where the demo can be found”
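The URL rule is a per-token symbol substitution. The Python sketch below applies the symbol table to a single token, assuming that "the last /" means a trailing slash. It keeps the table's comma for the colon as a pause marker, whereas the example above renders the colon as silence, and it spells "http" letter by letter as a hypothetical special case. The name expand_url is illustrative only.

```python
# Symbol expansions from the URL table; ":" becomes a pause (",").
URL_SYMBOLS = {"/": "slash", ".": "dot", "-": "dash", "_": "underscore", ":": ",", "~": "tilde"}


def expand_url(token: str) -> str:
    """Expand the symbols in a token that starts with "http" or "www"."""
    if not token.startswith(("http", "www")):
        return token  # not treated as a URL
    if token.endswith("/"):
        token = token[:-1]  # the last / is omitted (assumed: only a trailing one)
    words, current = [], ""
    for ch in token:
        if ch in URL_SYMBOLS:
            if current:
                words.append(current)
                current = ""
            words.append(URL_SYMBOLS[ch])
        else:
            current += ch
    if current:
        words.append(current)
    # Hypothetical: spell the scheme letter by letter, as in the example.
    words = ["aitch tee tee pee" if w == "http" else w for w in words]
    # Attach the pause comma to the preceding word.
    return " ".join(words).replace(" ,", ",")
```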



Pathnames

If a token begins with “file” or with an uppercase letter followed by a colon (for example, C: or D:), the following symbols are expanded up to the next white space. The last / in a pathname is omitted.

Symbol | Expansion
/ | slash
. | dot
- | dash
: | , (pause)
~ | tilde



Example:

Input | Pronunciation
file:/C:/tts/dev is where to look for the executable. | “file colon slash see colon slash tee tee ehs slash dev, is where to look for the executable.”
C:\tts\dev\preproc.exe is the executable | “C colon slash tee tee ehs slash dev slash preproc dot exe, is the executable.”



Email Addresses

Symbols in an e-mail address are expanded as follows:

Symbol | Expansion
@ | , at
. | dot
- | dash
_ | underscore



Example:

Input | Pronunciation
email:john.smith@voxeo.com | “email, john dot smith, at voxeo dot com”
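E-mail expansion follows the same substitution pattern with a different table. In this Python sketch the "@" maps to ", at" so that a pause falls before "at", as in the example; expand_email is a hypothetical name, and the "email:" prefix from the example is treated as a separate word that the function does not handle.

```python
EMAIL_SYMBOLS = {"@": ", at", ".": "dot", "-": "dash", "_": "underscore"}


def expand_email(address: str) -> str:
    """Spell out the symbols of an e-mail address per the table above."""
    words, current = [], ""
    for ch in address:
        if ch in EMAIL_SYMBOLS:
            if current:
                words.append(current)
                current = ""
            words.append(EMAIL_SYMBOLS[ch])
        else:
            current += ch
    if current:
        words.append(current)
    # Attach the pause comma before "at" to the preceding word.
    return " ".join(words).replace(" ,", ",")
```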



Telephone Numbers

US-format telephone numbers are read out as a sequence of cardinal numbers separated by pauses. The only exception is when 800 appears in the area code portion of the number, in which case it is expanded as “eight hundred”.

Example:
Input | Pronunciation
1-800-428-5555 | “one, eight hundred, four two eight, five five five five”
428-4444 | “four two eight, four four four four”
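The telephone rule reads each hyphen-separated group digit by digit, with a pause between groups, except for an 800 area code. The Python sketch below assumes hyphen-separated input only and locates the area code by group count (second group when a leading 1 is present, first group for a ten-digit number, none for a local number). The function name expand_us_phone is hypothetical.

```python
DIGIT_WORDS = "zero one two three four five six seven eight nine".split()


def expand_us_phone(number: str):
    """Read a hyphenated US phone number; returns None if it does not look like one."""
    groups = number.split("-")
    if len(groups) not in (2, 3, 4) or not all(g.isdigit() for g in groups):
        return None
    # Index of the area-code group, or None for a local 7-digit number.
    area_index = {4: 1, 3: 0}.get(len(groups))
    words = []
    for i, group in enumerate(groups):
        if i == area_index and group == "800":
            words.append("eight hundred")  # the one documented exception
        else:
            words.append(" ".join(DIGIT_WORDS[int(d)] for d in group))
    return ", ".join(words)  # commas stand for the pauses between groups
```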



Extension numbers are also expanded. Each of the following produces “Extension number, one two three four”:

ext. 1234    ext.1234    ext 1234    xt. 1234    xt.1234    xt 1234    x. 1234    x.1234    x 1234
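All nine spellings above fit a single regular expression: one of the prefixes ext, xt or x, an optional period, optional white space, then the digits. A minimal Python sketch follows; the function name expand_extension is hypothetical, and the match is case-sensitive because only lowercase forms are listed.

```python
import re

DIGIT_WORDS = "zero one two three four five six seven eight nine".split()
# ext / xt / x, optional period, optional white space, then the extension digits.
EXTENSION = re.compile(r"(?:ext|xt|x)\.?\s*(\d+)")


def expand_extension(text: str):
    match = EXTENSION.fullmatch(text.strip())
    if match is None:
        return None
    return "Extension number, " + " ".join(DIGIT_WORDS[int(d)] for d in match.group(1))
```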


Years

Four-digit numbers greater than 1000 and less than 3000 are, depending on context, converted into year format.


Example:
Input | Pronunciation
1200 | “twelve hundred”
1801 | “eighteen owe one”
1999 | “nineteen ninety nine”
3487 | “three four eight seven”
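The range rule can be sketched in Python as follows. The sketch ignores the engine's contextual checks, writes "oh" where the table above writes the phonetic "owe", and reads numbers outside the range digit by digit as in the 3487 example; read_four_digits and read_year are hypothetical names.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()


def below_100(n: int) -> str:
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def read_year(n: int) -> str:
    head, tail = divmod(n, 100)
    if n % 1000 == 0:
        return f"{ONES[n // 1000]} thousand"          # 2000 -> two thousand
    if tail == 0:
        return f"{below_100(head)} hundred"           # 1200 -> twelve hundred
    if tail < 10:
        return f"{below_100(head)} oh {ONES[tail]}"   # 1801 -> eighteen oh one
    return f"{below_100(head)} {below_100(tail)}"     # 1999 -> nineteen ninety nine


def read_four_digits(token: str) -> str:
    """Range rule only; the engine's contextual checks are not modeled."""
    if not re.fullmatch(r"\d{4}", token):
        return token
    n = int(token)
    if 1000 < n < 3000:
        return read_year(n)
    # Outside the year range, read digit by digit, as in the 3487 example.
    return " ".join(ONES[int(d)] for d in token)
```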



Dates

Dates in the following US formats are converted. A date written in European format (for example, dd/mm/yy) is either not expanded or expanded as if it were in US format; for example, 02/01/99 is always read out as “February first nineteen ninety nine”, never as “January the second nineteen ninety nine”.

Example:
Format | Example | Output
mm/dd/yyyy | 03/31/2000 | March thirty first, two thousand
mm/d/yyyy | 12/7/1908 | December seventh, nineteen oh eight
m/dd/yyyy | 6/22/1981 | June twenty second, nineteen eighty one
mm/dd/yy | 05/27/99 | May twenty seventh, nineteen ninety nine
m/d/yyyy | 9/1/1969 | September first, nineteen sixty nine
mm/d/yy | 06/1/55 | June first, nineteen fifty five
m/dd/yy | 5/12/97 | May twelfth, nineteen ninety seven
m/d/yy | 9/9/99 | September ninth, nineteen ninety nine
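All eight formats share one shape, month/day/year, so a single pattern covers them. The Python sketch below always treats the first field as the month, which is exactly why a European date is misread. It assumes two-digit years mean 19xx (true of every example, but not documented as the engine's rule), and expand_us_date is a hypothetical name.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()
MONTHS = ("January February March April May June July August September "
          "October November December").split()
ORDINALS = ("first second third fourth fifth sixth seventh eighth ninth tenth eleventh "
            "twelfth thirteenth fourteenth fifteenth sixteenth seventeenth eighteenth "
            "nineteenth twentieth").split()
US_DATE = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})")


def below_100(n: int) -> str:
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def ordinal(day: int) -> str:
    if day <= 20:
        return ORDINALS[day - 1]
    if day == 30:
        return "thirtieth"
    return f"{'twenty' if day < 30 else 'thirty'} {ORDINALS[day % 10 - 1]}"


def read_year(n: int) -> str:
    head, tail = divmod(n, 100)
    if n % 1000 == 0:
        return f"{ONES[n // 1000]} thousand"
    if tail == 0:
        return f"{below_100(head)} hundred"
    if tail < 10:
        return f"{below_100(head)} oh {ONES[tail]}"
    return f"{below_100(head)} {below_100(tail)}"


def expand_us_date(text: str):
    match = US_DATE.fullmatch(text)
    if match is None:
        return None
    month, day, year = (int(g) for g in match.groups())
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    if len(match.group(3)) == 2:
        year += 1900  # assumption: two-digit years are read as 19xx
    return f"{MONTHS[month - 1]} {ordinal(day)}, {read_year(year)}"
```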



Times

Multiple time formats are converted:

Example:
Input | Pronunciation
11.07 pm | Eleven owe seven pee ehm
7.30.33 AM | Seven thirty ey ehm
13:45 pm | Thirteen forty five pee ehm
5:40 PM | Five forty pee ehm
10:20:59 | Ten twenty
9:05:03 | Nine owe five
13:36 | Thirteen thirty six
8:10 | Eight ten
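The time examples follow a few rules: seconds are dropped, minutes below ten are read with a leading "oh" (written "owe" above), and period-delimited times need an am/pm suffix. The Python sketch below encodes those rules. Reading a whole hour without a minutes part is an assumption (no example covers it), and expand_time is a hypothetical name.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()
# H:MM or H.MM, optional seconds with the same separator, optional am/pm.
TIME = re.compile(r"(\d{1,2})([:.])(\d{2})(?:\2\d{2})?(?:\s*([AaPp])[Mm])?")


def below_100(n: int) -> str:
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def expand_time(text: str):
    match = TIME.fullmatch(text.strip())
    if match is None:
        return None
    hour, sep, minute, meridiem = match.groups()
    if sep == "." and meridiem is None:
        return None  # period-delimited times need am/pm to be expanded
    words = [below_100(int(hour))]
    mins = int(minute)
    if mins:  # assumption: a whole hour is read without a minutes part
        words.append(f"oh {ONES[mins]}" if mins < 10 else below_100(mins))
    if meridiem:
        words.append(f"{meridiem.upper()} M")
    return " ".join(words)  # seconds are never spoken
```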



Rates

Rates in the form QUANTITY1/QUANTITY2 are expanded.

Quantity1 is currently any of:
Quantity2 is currently any of:
Any combination of these is expanded, with the / read as “per”.

Example:

Input | Pronunciation
24 ins/year | Twenty four inches per year
m/s is a better measurement than km/h | Meters per second is a better measurement than kilometers per hour
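A rate expansion is a lookup on each side of the slash, with the slash itself read as "per". The unit tables in this Python sketch are hypothetical placeholders chosen to cover the examples; they are not the engine's actual lists of supported quantities.

```python
# Hypothetical unit tables, chosen only to cover the examples above.
NUMERATOR_UNITS = {"ins": "inches", "m": "meters", "km": "kilometers", "mi": "miles"}
DENOMINATOR_UNITS = {"s": "second", "h": "hour", "year": "year"}


def expand_rate(token: str) -> str:
    """Expand QUANTITY1/QUANTITY2 when both units are known; otherwise leave it."""
    numerator, slash, denominator = token.partition("/")
    if slash and numerator in NUMERATOR_UNITS and denominator in DENOMINATOR_UNITS:
        return f"{NUMERATOR_UNITS[numerator]} per {DENOMINATOR_UNITS[denominator]}"
    return token
```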


Note: In times, seconds are not expanded at all, and times delimited by periods are expanded only if the sequence is followed by am/AM or pm/PM.


User Tags

Escape sequences are embedded in the input text, and must be surrounded by white space. For tags that require numeric input, percentages should be represented as values between 0 and 1 (e.g., 66% = 0.66). Escape sequences are divided into five categories:
Note: All tags must always be preceded and followed by white space or they may not be detected correctly. This is true even if the string begins with a tag. In that case, you must begin the string with a space.


Special Case Tags

Certain types of text require special treatment: they need to be read out in a non-default manner.

Special Case | Code | Description
Addresses | a | Postal address format
Dates | d | mm/dd/yy, for example
Fractions | f | ½, ¾, for example
Measurements | l | Imperial and metric units
Money | m | Currency units
Proper Names | n | People, company names, etc.
Ordinal Numbers | o | First, second, third, etc.
Telephone Numbers | p | Telephone number format
Times | t | Times of day



Special cases can be difficult to detect. The TTS engine may think it has found a special case when it has not (a false positive), or it may miss instances. Some of the cases above are particularly difficult to detect correctly. For each special case, you can define how the TTS engine decides whether a text string is special. By default, all special-case risk modes are set to conservative. The risk options are:

Risk Setting | Code | Description
Off | o | No matches
Conservative | c | Might miss a few
Risky | r | Possible false positives



Once the TTS engine has identified one of the special cases (if the risk mode is not off), it handles it intelligently. The engine treats special cases no differently in risky mode than in conservative mode; the risk setting only dictates how many instances of a special case it is likely to detect.

The tag is written \!nCASERISK where CASE is one of the special case codes above, and RISK is one of the risk codes above. The case is treated in the specified risk mode after the tag. Below are some example tags with a broad description of the effect that each tag has on the processing. Note the special tag at the bottom of the table.

Markup | Explanation
\!nao | Don’t try to spot postal address formatting
\!ndc | Identify strings that look a lot like dates
\!nfr | Identify anything that looks like a fraction
\!nio | Don’t try to spot measurements
\!nmc | Identify things that look a lot like currency units
\!nnr | Identify anything that looks like a proper noun/name
\!npc | Identify things that look a lot like telephone numbers
\!ntr | Identify anything that looks like a time
\!de | Set ALL modes to conservative/default
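Because a special-case tag is just \!n followed by a case code and a risk code, it can be built and checked programmatically. This Python sketch uses only the codes from the two tables above; the helper names risk_tag and describe are hypothetical.

```python
SPECIAL_CASES = {"a": "addresses", "d": "dates", "f": "fractions", "l": "measurements",
                 "m": "money", "n": "proper names", "o": "ordinal numbers",
                 "p": "telephone numbers", "t": "times"}
RISK_LEVELS = {"o": "off", "c": "conservative", "r": "risky"}
RESET_TAG = "\\!de"  # sets every risk mode back to conservative


def risk_tag(case: str, risk: str) -> str:
    """Build a \\!nCASERISK tag, rejecting codes not listed in the tables."""
    if case not in SPECIAL_CASES:
        raise ValueError(f"unknown special-case code: {case!r}")
    if risk not in RISK_LEVELS:
        raise ValueError(f"unknown risk code: {risk!r}")
    return f"\\!n{case}{risk}"


def describe(tag: str) -> str:
    """Explain what a special-case tag does, using the table codes."""
    if tag == RESET_TAG:
        return "all special cases: conservative"
    if len(tag) != 5 or not tag.startswith("\\!n"):
        raise ValueError(f"not a special-case tag: {tag!r}")
    return f"{SPECIAL_CASES[tag[3]]}: {RISK_LEVELS[tag[4]]}"
```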



Expansion Setting Tags

In TTS, expansion means noticing some sort of abbreviation (e.g. sec) and expanding it into a full word that represents how the abbreviation would be spoken (e.g. seconds). Expansion setting tags can deactivate expansion modes, which are turned on by default. This section describes the four modes that may be set: acronym expansion, abbreviation expansion, numeric hyphen expansion, and alphabetic hyphen expansion.


Acronym Expansion

Certain acronyms are spoken as if they were whole words (like OPEC), rather than being spelled out one letter at a time (like USA). By default, the TTS engine spells acronyms. Using the tags described below, it is possible to turn the acronym spell mode off for a given acronym, so that it is pronounced as a whole word. The tags are:

Markup | Explanation
\!ab | Start acronym spell mode
\!ae | Stop acronym spell mode



Example:
Input | Pronunciation
The USA has donated a million dollars to \!ae UNICEF \!ab in the UK | The yew es ey has donated a million dollars to unicef in the yew key
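When a prompt contains known word-like acronyms, an application can wrap them automatically. This Python sketch surrounds each exact token match with \!ae and \!ab so that spell mode is switched off only for that word. The helper name and the caller-supplied acronym set are hypothetical, and tokens with attached punctuation are not matched.

```python
def speak_acronyms_as_words(sentence: str, word_acronyms: set[str]) -> str:
    """Wrap each listed acronym in \\!ae ... \\!ab so it is read as a word."""
    tokens = []
    for token in sentence.split():
        if token in word_acronyms:
            tokens += ["\\!ae", token, "\\!ab"]  # stop spelling, word, resume spelling
        else:
            tokens.append(token)
    return " ".join(tokens)
```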



Abbreviation Expansion

By default, the TTS engine attempts to expand abbreviations that it knows about.  If you want to turn this feature off, use the tag described below:

Markup | Explanation
\!eb | Start abbreviation expansion mode
\!ee | Stop abbreviation expansion mode



Example:
Input | Pronunciation
Give me another 30 \!ee secs \!eb and I’ll be ready for Tue. | Give me another thirty secs, and I’ll be ready for Tuesday.



Alphabetic Hyphen Expansion

An alphabetic hyphen is any hyphen other than those defined in the “Numeric hyphen expansion” section.

Example:

Police saved a 7-month old baby

By default, the alphabetic hyphen mode is terse, which means that the hyphen is not pronounced. If the alphabetic hyphen mode is set to verbose, the hyphen is pronounced as “dash”. The tags are:

Markup | Explanation
\!hav | Make alphabetic hyphen mode verbose
\!hat | Make alphabetic hyphen mode terse



Example:
Input | Pronunciation
John was \!hav twenty-one for the \!hat thirty-third time yesterday. | John was twenty dash one for the thirty third time yesterday.
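The hyphen tags act as a mode switch that stays set until the other tag appears, much like the year-mode tags. This Python sketch interprets \!hav and \!hat from left to right; it ignores the numeric-hyphen rules and treats every hyphen the same way, and apply_hyphen_mode is a hypothetical name.

```python
def apply_hyphen_mode(text: str) -> str:
    """Toy interpreter for \\!hav (verbose) and \\!hat (terse) hyphen tags."""
    verbose = False  # terse is the default: hyphens are not pronounced
    words = []
    for token in text.split():
        if token == "\\!hav":
            verbose = True
        elif token == "\\!hat":
            verbose = False
        else:
            words.append(token.replace("-", " dash " if verbose else " "))
    return " ".join(words)
```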



Other Tags

There are two useful miscellaneous tags:

Proper nouns
Force sentence breaks

Proper Nouns
Markup | Explanation
\!pn | Treat the next word like a proper noun



Example:
Input | Pronunciation
\!pn Begin was re-elected last Sunday. | BE-gin was re-elected last Sunday.



'Begin' will be treated as a proper noun in the TTS modules. Otherwise, it is possible (albeit unlikely) that it may be treated as a verb form. In this example, it is clear that knowing Begin as a proper noun results in a completely different pronunciation of the word: BE-gin versus be-GIN

Knowing that a word is a proper noun as opposed to another part of speech is also useful for other TTS processing modules. Using this tag ensures correct handling on a one-off basis.


Force Sentence Breaks

If you want to override punctuation and force a sentence break at a particular point in the text string, use this tag:

Markup | Explanation
\!br | Force a sentence break here



Example:
Input | Pronunciation
It has to stop \!br now. | it has to stop. now



Abbreviations

The following tables enumerate the abbreviations recognized by the TTS engine.


Abbreviation | Expansion
AK | Alaska
Apr | April
AR | Arkansas
Aug | August
AZ | Arizona
CA | California
cm | centimeter(s)
Ctr | center
c/o, C/O | care of
db | decibel(s)
DC, Dc, dc | DC
Dec | December
deg | degree(s)
degs | degrees
e | east
EG | E G
etc | et cetera
Feb | February
Fri | Friday
GA, Ga | Georgia
ge | G E
gm | gram(s)
hrs | hours
Hz | hertz
Ia | Iowa
IBM | I B M
IL, il | Illinois
Jan | January
Jr | junior
Jun | June
kg | kilogram(s)
km | kilometer(s)
lb | pound(s)
lbs | pounds
Mar | March
Md | Maryland
mgmt | management
misc | miscellaneous
MN | Minnesota
Mr., MR. | mister
Mrs., MRS. | missus
msec | millisecond(s)
NC | North Carolina
ND | North Dakota
NH | New Hampshire
NJ | New Jersey
NM | New Mexico
Nov | November
NV | Nevada
NY | New York
Oct | October
PHD, Phd, phd | P H D
Pres. | president
RI | Rhode Island
Rm. | room
SC | South Carolina
SD | South Dakota
Sep | September
Sept | September
Sgt. | sergeant
Tbsp., tbsp. | tablespoon(s)
Tue. | Tuesday
Tues. | Tuesday
USA | U S A
UT | Utah
Va | Virginia
Vt | Vermont
WI | Wisconsin
WV | West Virginia
WY | Wyoming
yds. | yards
yr | year(s)
yrs | years


 

Multiple Expansions

Abbreviation | Expansion 1 | Expansion 2
apt | apartment | apt
Ct | Connecticut | court
Dr | doctor | drive
FL | Florida | floor
ft | foot | feet
KY, Ky | Kentucky | key
MT | Montana | mount
Ste | saint | suite
US | US | us
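Abbreviations with two expansions need some context to choose between them. The Python sketch below shows one way an application-side pre-pass might do this for Dr, using a hypothetical heuristic (before a capitalized word it is a title, otherwise a street type) together with a small subset of the single-expansion table. It is not the engine's own disambiguation logic, which this guide does not describe.

```python
# Small subset of the single-expansion table above.
SINGLE = {"Feb": "February", "Fri": "Friday", "hrs": "hours",
          "mgmt": "management", "Sgt.": "sergeant"}


def expand_abbreviations(text: str) -> str:
    tokens = text.split()
    out = []
    for i, token in enumerate(tokens):
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        if token in ("Dr", "Dr."):
            # Hypothetical heuristic: a title precedes a capitalized name;
            # otherwise assume the street-type reading.
            out.append("doctor" if following[:1].isupper() else "drive")
        else:
            out.append(SINGLE.get(token, token))
    return " ".join(out)
```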




Links to the Language Supplements

Language Supplement for US English (en-US)
Language Supplement for Spanish (Mexican) (es-MX)
Language Supplement for French (Continental) (fr-FR)






© 2008 Voxeo Corporation  |  Voxeo IVR  |  VoiceXML & CCXML IVR Developer Site