Text-to-Speech Tutorial


Technology

Speech synthesis programs convert written input to spoken output by generating synthetic speech. This process is often referred to as text-to-speech (TTS) conversion.

There are several ways to perform speech synthesis:

1. Record the voice of a person saying the required phrases.

2. Use algorithms that split speech into smaller pieces, often the 35-50 phonemes (the smallest linguistic units) of a language. Quality suffers, though, because it is hard to recombine the pieces into a fluent speech pattern.

3. Use diphones, the most developed method. Speech is split not at the transitions between phonemes but at the center of each phoneme, which leaves the transitions intact. This results in 400 separate usable elements and a better quality product.
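The difference between cutting at phoneme boundaries and cutting at phoneme centers can be sketched in a few lines of Python. This is only an illustration: the phoneme labels, the "_" marker for silence and the function name are assumptions made for the example, not part of any product described here.

```python
def to_diphones(phonemes):
    """Turn a phoneme sequence into diphone units.

    Each unit runs from the middle of one phoneme to the middle of the
    next, so it is named by the pair it spans. "_" marks silence at the
    edges of the utterance (a labeling convention assumed here).
    """
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Hypothetical phoneme labels for the word "cat".
units = to_diphones(["k", "ae", "t"])
print(units)  # ['_-k', 'k-ae', 'ae-t', 't-_']
```

Every unit contains one complete transition between two sounds, which is the part that is hardest to reproduce when separate phonemes are glued together.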

The methods above are called concatenative processes. Concatenative TTS joins human-quality wave files into a spoken string. These systems can be large and require a lot of drive space to run, but they offer more natural-sounding output.
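As a rough sketch of what joining recorded units involves, the Python below concatenates short lists of samples and blends a few samples at each seam. The linear crossfade is an assumption chosen for illustration; the products discussed in this tutorial may smooth their joins differently, and real units would come from wave files rather than hand-written lists.

```python
def crossfade_concat(segments, overlap):
    """Join sample lists, blending up to `overlap` samples at each seam."""
    out = list(segments[0])
    for seg in segments[1:]:
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming segment
            out[start + i] = out[start + i] * (1 - w) + seg[i] * w
        out.extend(seg[n:])
    return out


# Two stand-in "recordings": a loud unit followed by a silent one.
joined = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
print(len(joined))  # 6
```

Because the two blended samples are shared by both units, the result is shorter than the two inputs laid end to end, and the level steps down gradually instead of jumping at the seam.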

Another method, synthesized TTS, creates speech by generating the sounds in a digitized speech format instead of joining recordings. The output sounds more like a computer than a human, but the system can run in just a few megabytes of space.
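To see why a synthesized system needs so little storage, consider generating a vowel-like tone from a handful of numbers instead of a recording. The sketch below is one possible approach, assumed here for illustration: it sums harmonics of a pitch and makes those near two resonance frequencies louder. The function name, the resonance values and the 100 Hz bandwidth are made up for the example and do not describe any particular product.

```python
import math


def synth_vowel(pitch_hz, formants_hz, duration_s, rate=8000):
    """Build a vowel-like tone from a pitch and a few resonance frequencies."""
    n = round(duration_s * rate)
    max_k = int((rate / 2) / pitch_hz)  # keep harmonics below half the rate
    partials = []
    for k in range(1, max_k + 1):
        freq = pitch_hz * k
        # A harmonic is louder the closer it sits to one of the resonances.
        gain = sum(1.0 / (1.0 + ((freq - f) / 100.0) ** 2) for f in formants_hz)
        partials.append((freq, gain))
    samples = [
        sum(g * math.sin(2 * math.pi * f * t / rate) for f, g in partials)
        for t in range(n)
    ]
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]  # scaled into the range -1.0 to 1.0


# Illustrative values only: 120 Hz pitch, resonances near 700 and 1200 Hz.
samples = synth_vowel(120, [700, 1200], 0.05)
print(len(samples))  # 400
```

The whole sound is described by three numbers and a duration, which is the trade the paragraph above describes: very little data, at the cost of a mechanical-sounding result.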

Products, whether concatenative or synthesized, are usually measured by their intelligibility, naturalness and text preprocessing capabilities (the ability to convert acronyms into normal speech).
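Text preprocessing of the kind mentioned above can be pictured as a pass over the input that rewrites acronyms before synthesis begins. The following Python is a minimal sketch under stated assumptions: the expansion table and the rule of spelling out unknown acronyms letter by letter are hypothetical choices for the example, not the behavior of any product named in this tutorial.

```python
import re

# Hypothetical expansion table; a real product would ship a far larger one.
EXPANSIONS = {
    "TTS": "text to speech",
    "FCC": "Federal Communications Commission",
}


def preprocess(text):
    """Rewrite all-capital words so they can be read aloud."""
    def speak(match):
        word = match.group(0)
        if word in EXPANSIONS:
            return EXPANSIONS[word]
        return " ".join(word)  # unknown acronym: spell it letter by letter

    return re.sub(r"\b[A-Z]{2,}\b", speak, text)


print(preprocess("The FCC tested a TTS demo on a GPU."))
# The Federal Communications Commission tested a text to speech demo on a G P U.
```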

Applications

There are many software and hardware applications using TTS. Some are outlined below.

Lernout & Hauspie - RealSpeak
The RealSpeak engine reads computer text and converts it into a natural-sounding voice. The technology is based on concatenation algorithms, in which human voice segments are stored and used for the text-to-audio conversion. Speech segments used include diphones, syllables and larger phoneme sequences.

Fonix - iSpeak
iSpeak Personal Text Reader is a standalone Windows application that reads computer text aloud. Fonix recently announced an agreement with Mitsubishi Electric Corporation to integrate Fonix speech technology into Mitsubishi's products for the automotive telematics market.

SpeechWorks - TTS products
SpeechWorks' TTS technology has been adopted by telematics vendor OnStar to help deliver e-mail and other information to drivers of GM automobiles as well as select Acura, Lexus and Audi models.

Additional sources of information*

Report edited by Ronald A. Cole et al. with a section on TTS

Museum of Speech Analysis and Synthesis

Bell Laboratories Projects

Company pages

AT&T Labs TTS Demo

*The WAVE Report is not responsible for content on additional sites 8/2/01

Page updated 1/24/07
Copyright 4th Wave Inc, 2007