"Phoneme Segmentation for Unit Selection Synthesis" John Kominek As part of improved support for building unit selection voices, Festival now includes two algorithms for automatic labeling of wavefile data. The first method employs dynamic time warping to align a given wavefile against a known reference. This usually requires having a synthesizer already built for the target language -- a restriction to be averted if possible. The second, more recent addition makes use of the HMM-based acoustic modeling component from Sphinx-2. We have found that one technique is not clearly superior to the other but that the error characteristics are distinctly different. DTW is the more accurate method in 60-70% of cases, but is also more prone to gross labeling errors. Gross label errors are disastrous in a synthetic voice and need to be corrected before an acceptable voice can be constructed. This talk will illustrate these findings and indicate how a hybrid approach can eliminate such outliers without compromising overall accuracy.