NEWS 30 AUGUST, 2021

Machine Learning Datasets and Neural Engines available from NTEU Consortium

The NTEU Consortium, which Pangeanic has been leading since 2019, has completed the massive data upload to ELRC, and the neural engines are now available to European Public Administrations via the European Language Grid.

The goal of the NTEU project was to gather and re-use many of the language resources of several European CEF projects to create near-human quality machine translation engines for use by Public Administrations in EU Member States. This massive engine-building endeavor covered all possible combinations among the EU official languages, ranging from English into Spanish, German or French to low-resource combinations such as Latvian, Finnish or Bulgarian into Greek, Croatian or Maltese.

Every engine has been tested using MTET (Machine Translation Evaluation Tool), an evaluation tool developed specifically for the project. MTET ranked the performance of direct combination engines (i.e., not “pivoting” through English) against a set of free online engines. Two graders had to rank every single engine (language combination) in order to normalize human judgement and assess how close the engines’ output was to a human reference.

A view of Machine Translation Evaluation tool MTET

Human graders could leave some unclear evaluations unfinished (if they needed to stop and come back later), although evaluating segments consecutively, one sentence after another, was preferable.

As we can see below, some language combinations (Irish Gaelic into Greek) were a challenge!

Fig. 2 Typical Evaluation Screen


To guarantee final quality, human graders did not know which output came from the NTEU engines and which came from a second translation by a generalist online MT provider that was used as a benchmark. They ranked each output by moving a slider left or right on a scale from 0 to 100. The aim was for them to assess whether the machine-generated sentence adequately expressed the meaning contained in the source, that is, how close it was to how a human would have written it.
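The blind setup described above can be illustrated with a minimal sketch. This is not MTET code: the function names `blind_pair` and `unblind`, the system labels and the sample scores are all hypothetical, chosen only to show how two outputs can be shown in random order and matched back to their source after scoring.

```python
import random

def blind_pair(nteu_output, benchmark_output, rng):
    """Return the two translations in random order, plus a hidden key.

    Hypothetical helper: the grader only ever sees `shown`.
    """
    items = [("nteu", nteu_output), ("benchmark", benchmark_output)]
    rng.shuffle(items)
    shown = [text for _, text in items]    # what the grader sees
    key = [system for system, _ in items]  # kept hidden until scoring is done
    return shown, key

def unblind(scores, key):
    """Map slider scores (0 to 100) back to the systems that produced them."""
    for score in scores:
        if not 0 <= score <= 100:
            raise ValueError("slider scores must be between 0 and 100")
    return dict(zip(key, scores))

# Example with made-up texts and made-up scores.
rng = random.Random(0)
shown, key = blind_pair("NTEU translation", "benchmark translation", rng)
result = unblind([80, 65], key)
print(result)
```

Keeping the key separate from what is displayed means the scores can only be attributed to a system once the grader has finished.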

Evaluation Criteria

Another challenge was to standardize human criteria. Different people may have different linguistic preferences which can affect sentence evaluation. Thus, it was important from the beginning to follow the same scoring guidelines.

To standardize criteria, Pangeanic, together with the Barcelona SuperComputing Centre, laid out a set of instructions based on proven academic methods, so that all evaluators followed the same scoring methods across languages.

Unlike SMT methods (based on BLEU scores), NMT needed to be ranked on accuracy, fluency and terminology. These three key items were defined as follows:

Accuracy: defined as a sentence containing the meaning of the original, even though synonyms may have been used.

Fluency: the grammatical correctness of the sentence (gender agreements, plural / singular, case declension, etc.)

Adequacy [Terminology]: the proper use of in-domain terms agreed by the client and the developer and that are for use in production but may not be standard or general terms (the specific jargon).

When ranking a sentence, the following weights were typically applied:

  • Accuracy : 33%
  • Fluency : 33%
  • Adequacy [terminology] : 33%

In general, human graders deducted from 5 to 10 points for every serious error. The final evaluation was the result of applying these deductions.

For instance, one grader might have found two accuracy errors in a sentence (some information was missing and unrelated additional information had been added). The grader then subtracted 5% for the small error and 20% for the serious error from the Accuracy total. If the grader (evaluator) also found a small fluency error, he/she could decide to deduct a further 5%.
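The arithmetic of this example can be sketched as follows. The article does not state how the three criteria are combined into a single slider value, so this sketch assumes that each criterion starts at 100 points, that deductions are taken in points from that criterion, and that the final score is the equal-weight average of the three. The function name `sentence_score` is hypothetical.

```python
CRITERIA = ("accuracy", "fluency", "terminology")

def sentence_score(deductions):
    """Apply per-criterion deductions and average the three criteria.

    `deductions` maps a criterion name to a list of point deductions.
    Assumption: each criterion starts at 100 and cannot drop below 0.
    """
    per_criterion = {}
    for name in CRITERIA:
        remaining = 100 - sum(deductions.get(name, []))
        per_criterion[name] = max(0, remaining)
    # Equal weights (one third each), as in the list of weights above.
    overall = sum(per_criterion.values()) / len(CRITERIA)
    return per_criterion, overall

# The example above: a small (5) and a serious (20) accuracy error,
# plus one small (5) fluency error.
per_criterion, overall = sentence_score({"accuracy": [5, 20], "fluency": [5]})
print(per_criterion)  # {'accuracy': 75, 'fluency': 95, 'terminology': 100}
print(overall)        # 90.0
```

Under these assumptions the sentence would sit at 90 on the 0 to 100 scale. A grader applying the deductions directly to the overall score would arrive at a different figure.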

“We are very happy this massive effort has crystallized into tangible results for the potential users, the European Public Administrations, which can now run MT privately as an internal infrastructure. These engines can also serve as a benchmarking tool for the wider academic MT community,” said Manuel Herranz, Pangeanic’s CEO.
