Evaluation campaign >> Data Tracks & Restrictions
Translation conditions
- Manual transcription (plain sentences in BTEC)
- ASR output (1-best, N-best, lattices in HTK format) of spoken BTEC sentences
Restrictions for Available Resources for Each Evaluation Track
We will only accept submissions whose system descriptions do not
conflict with the data conditions of each evaluation track given by the
following table. Please make sure that your system follows the
conditions.
The table below summarizes which linguistic resources are permitted
in each data track.
Resources | Supplied | Supplied+Tools | Unrestricted | C-STAR
Corpus | + | + | + | +
Tagger/Chunker/Parser | - | + | + | +
Public data | - | - | + | +
Proprietary data | - | - | - | +
+: permitted, -: not permitted
Supplied Data Track:
The training data of the MT systems is limited to the supplied corpus.
Supplied Data + Tools Track:
The training data of the MT systems is limited to the supplied corpus, but you may use your own or publicly available tagger, chunker, and parser tools.
Unrestricted Data Track:
All public data is allowed. This means data available through organizations like LDC or ELRA as well as data crawled from the WWW.
C-STAR Track:
There are no limitations on the linguistic resources used to train the MT systems.
The full BTEC corpus and proprietary data may be used.
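For participants who want to check a system description against the track conditions before submitting, the permission table can be encoded as a small lookup. The following is only an illustrative sketch: the resource keys (`corpus`, `tools`, `public_data`, `proprietary_data`) and the function names are assumptions made for this example and are not part of any official campaign tooling.

```python
# Hypothetical encoding of the permission table; all names are illustrative.
RESOURCES = ("corpus", "tools", "public_data", "proprietary_data")

# Resources marked "+" in the table, per data track.
PERMITTED = {
    "Supplied": {"corpus"},
    "Supplied+Tools": {"corpus", "tools"},
    "Unrestricted": {"corpus", "tools", "public_data"},
    "C-STAR": {"corpus", "tools", "public_data", "proprietary_data"},
}


def violations(track, used):
    """Return the sorted resources that are used but not permitted in `track`."""
    if track not in PERMITTED:
        raise ValueError(f"unknown track: {track}")
    unknown = set(used) - set(RESOURCES)
    if unknown:
        raise ValueError(f"unknown resources: {sorted(unknown)}")
    return sorted(set(used) - PERMITTED[track])


def eligible_tracks(used):
    """Return the tracks (in table order) whose conditions `used` satisfies."""
    return [track for track in PERMITTED if not violations(track, used)]


if __name__ == "__main__":
    # A system trained on the supplied corpus that also uses a parser.
    print(violations("Supplied", {"corpus", "tools"}))
    print(eligible_tracks({"corpus", "tools"}))
```

A system that uses a tagger or parser on top of the supplied corpus would be flagged for the Supplied track but accepted for the other three, matching the second row of the table.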