We have built a prototype automatic dictation system with multimodal error recovery capabilities. The underlying recognizers are JANUS, a continuous speech recognizer applied to the Wall Street Journal task, and NPen++ and NSpell, an isolated-word handwriting recognizer and a connected-letter (spelling) recognizer, respectively, each with a 20K-word vocabulary.
In a typical interaction, the user will dictate a sentence, using a push-to-talk button to activate recording. Upon completion of decoding, he corrects errors one at a time by highlighting them, and then repeating the misrecognized word using a modality of his choice (continuous speech, handwriting or spelling). The following series of screen snapshots demonstrates deletion repair by gesture, and substitution repair by handwriting.

Screen after the user pressed the SPEAK button and said "Republicans send a balanced budget plan to the senate floor" - three errors occurred in continuous speech decoding (insertion of "the", substitution of "send" by "and", and deletion of "a"):

The user deletes "the" by gesture (crossing out using a pen on the touch sensitive display):

After completion of the deletion repair:

The user highlights the error "and", automatically bringing up N-best choices:

Since the N-best list does not contain the correct word, the user repairs by handwriting:

After completion of the handwriting repair:
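
The two repairs shown in the snapshots can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's actual implementation: the function names, the position of the inserted "the" at the start of the hypothesis, the contents of the N-best list, and the stubbed handwriting recognizer are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def delete_word(words: List[str], index: int) -> List[str]:
    """Deletion repair: drop the word the user crossed out."""
    return words[:index] + words[index + 1:]


def substitute_word(
    words: List[str],
    index: int,
    nbest: Sequence[str],
    intended: str,
    handwriting: Callable[[], str],
) -> Tuple[List[str], str]:
    """Substitution repair for the highlighted word at `index`.

    `intended` stands in for the user's judgement: if the intended word
    is among the N-best alternatives it is picked directly, otherwise
    the word is re-entered through a second modality (here handwriting).
    Returns the repaired word list and the source of the replacement.
    """
    if intended in nbest:
        replacement, source = intended, "n-best"
    else:
        replacement, source = handwriting(), "handwriting"
    return words[:index] + [replacement] + words[index + 1:], source


# Assumed decoder output; the text gives the three errors but not
# where exactly "the" was inserted.
hypothesis = "the Republicans and balanced budget plan to the senate floor".split()

# Step 1: the user crosses out the inserted "the".
after_delete = delete_word(hypothesis, 0)

# Step 2: the user highlights "and"; the (invented) N-best list lacks
# "send", so the word is rewritten by hand. The lambda is a stub for
# the handwriting recognizer's output.
nbest_for_and = ["an", "end", "in"]
after_substitute, source = substitute_word(
    after_delete, 1, nbest_for_and, "send", lambda: "send"
)

print(" ".join(after_substitute), "| repaired via", source)
```

Note that the deleted "a" is still missing after these two steps, since the snapshots cover only the deletion and substitution repairs.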


(Patent Pending)

Site maintained by: Céline Morel