International Workshop on Spoken Language Translation

   
 

Evaluation campaign
>> Data Tracks & Restrictions

Translation conditions

  • Manual transcription (plain sentences in BTEC)

  • ASR output (1-best, N-best, lattices in HTK format) of spoken BTEC sentences

Restrictions for Available Resources for Each Evaluation Track

We will only accept submissions whose system descriptions comply with the data conditions of the respective evaluation track. Please make sure that your system follows these conditions.

The following table gives an overview of which linguistic resources are permitted or not permitted in each track.

 

Resources               Supplied   Supplied+Tools   Unrestricted   C-STAR
Corpus                     +             +               +            +
Tagger/Chunker/Parser      -             +               +            +
Public data                -             -               +            +
Proprietary data           -             -               -            +

+: permitted, -: not permitted
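
The permission table above can also be read as a simple lookup. The following is a minimal sketch in Python, not an official checking tool: the resource keys (corpus, tools, public_data, proprietary_data) and the function names are hypothetical labels chosen for this illustration, and only the +/- entries come from the table.

```python
# Hypothetical encoding of the resource table; keys and names are illustrative only.
# Each track maps to the set of resource types marked "+" in the table.
PERMITTED = {
    "Supplied": {"corpus"},
    "Supplied+Tools": {"corpus", "tools"},
    "Unrestricted": {"corpus", "tools", "public_data"},
    "C-STAR": {"corpus", "tools", "public_data", "proprietary_data"},
}


def violations(track, resources_used):
    """Return the resources used that the given track does not permit."""
    return sorted(set(resources_used) - PERMITTED[track])


def allowed_tracks(resources_used):
    """Return every track whose permitted set covers the resources used."""
    used = set(resources_used)
    return [track for track, ok in PERMITTED.items() if used <= ok]


if __name__ == "__main__":
    # A system trained on the supplied corpus plus a tagger does not fit "Supplied".
    print(violations("Supplied", ["corpus", "tools"]))
    print(allowed_tracks(["corpus", "public_data"]))
```

Under this encoding, a system description that lists a parser or any public data falls outside the Supplied track, which matches the "-" entries in that column.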

 

Supplied Data Track:

The training data for the MT systems is limited to the supplied corpus.


Supplied Data + Tools Track:

The training data for the MT systems is limited to the supplied corpus, but you may use your own or publicly available tagger, chunker, and parser tools.


Unrestricted Data Track:

All public data is allowed. This means data available through organizations such as LDC or ELRA, as well as data crawled from the WWW.


C-STAR Track:

There are no limitations on the linguistic resources used to train the MT systems.

The full BTEC corpus and proprietary data can be used.

 

 

 
        Copyright(c) 2005 interACT All rights reserved.