NEWS 9 DECEMBER, 2020

Pangeanic introduces anonymization and neural machine translation at META-Forum 2020

Pangeanic was an active partner at the META-Forum 2020 conference, where it introduced two projects it is currently leading, both with deliverables to the European Commission CEF program. The first presentation, on Wednesday 2 December, covered anonymization software (the MAPA CEF project). The second covered the largest-ever neural engine farm for direct language combinations (the NTEU CEF project). Due to current travel restrictions, the conference was held online and hosted from Berlin, Germany, from 1 to 3 December 2020.

The conference was organized by the European Language Grid (ELG) and included presentations of the ELG Pilot projects, current EU-funded projects in the Language Technology area, state-of-the-art and future work, and reports from language technology companies and industry. The talks allowed different AI communities in Europe to share their current knowledge and efforts.

Pangeanic was in charge of two presentations:

MAPA presentation

MAPA stands for Multilingual Anonymization for Public Administrations. The goal of this CEF project is to develop an open-source de-identification toolkit for all official European Union languages. Pangeanic designed the concept and presented the proposal to the EC. It now leads the consortium, which includes recognized European data organizations, the Spanish government Data Agency and the French National Research Center, among others. The MAPA anonymization toolkit will rely on Named Entity Recognition and Classification (NERC) using the latest neural network and deep learning techniques. MAPA will manage a large multilingual data collection and annotation activity and provide the training and testing data needed to develop the toolkit, which will be packaged as a Docker container. Data is currently being identified and collected for the 24 official European languages. As part of the project, a connection will be established to eTranslation, the online machine translation service provided by the European Commission, to foster the provision of machine translation services to and by public administrations with the option to anonymize the content. The toolkit, in its basic form, will be publicly available to European public administrations and to the EU institutions themselves. It will also foster the growth of language technologies as a key component of new digital and AI societies, helping to ensure personal data anonymization. MAPA is particularly targeted at public administrations in the health and legal domains, as a result of the specific use cases addressed during the development of the project. Partners will be able to customize the package further for particular national or commercial use.

NTEU presentation

The objective of the NTEU project is to build a neural engine farm for eTranslation that covers all combinations of the 24 official EU languages, without the need to pivot through a high-resourced language. In total, the project is creating 506 near-human-quality neural translation engines across the EU official language combinations. NTEU stands for Neural Translation for the European Union. Pangeanic leads the project together with KantanMT and Tilde, two leading language technology companies in Europe, and the Spanish government agency SEDIA.

NTEU will provide a capacity service to eTranslation by building a near-human, professional-quality neural engine farm that can be deployed as infrastructure for machine translation. Lower-resourced languages are a known challenge, and more effort is required to obtain well-performing engines for them. To supplement the original data, the consortium applies techniques such as synthetic data generation and transfer learning. The machine translation output from the engines is manually evaluated, following industry and WMT practices, on an open-source platform created by the consortium.

In addition, the NTEU consortium will gather and clean a large data set for all language combinations so that the engines can be retrained with other technologies in the future.

Both Pangeanic presentations attracted considerable interest from the language community, and the MAPA anonymization digital booth was the most popular, as anonymization services are in high demand in both the public and private sectors. The project deliverables will be shared in the ELG and ELRC-SHARE repositories.

