NEWS 20 MARCH, 2020

Pangeanic wins contract to lead European-wide anonymization project

INEA has awarded Pangeanic’s consortium almost €1M to develop a multilingual anonymization toolkit based on AI processing of health, life science, and legal texts for Public Administrations.

The MAPA Project (Multilingual Anonymisation toolkit for Public Administrations) will make use of state-of-the-art Natural Language Processing tools to develop the open source toolkit with a focus on the medical and legal domains, deploying it at several Public Administrations in Europe.

“The aim of MAPA is to provide data anonymization so language data can be shared across and between organizations, while protecting private or sensitive data. Implementation cases will focus on de-identifying, obfuscating or pseudo-anonymizing personally identifiable information to prove that, no matter which language a Public Administration or user deals with, the solution can cope. MAPA will enable PAs to comply with GDPR to a high degree of accuracy and protect an individual’s private details while maintaining the usefulness of the source data.” – Manuel Herranz, CEO

Some of Pangeanic’s development team at PangeaMT, Innsomnia Accelerator facilities at Valencia’s Port.

The toolkit developed by the MAPA partners (Pangeanic, Tilde, the French National Centre for Scientific Research (LIMSI at CNRS), the language resource center ELDA, the University of Malta, R&D transfer center Vicomtech, and Spanish Language Plan Government Office SEAD via the Barcelona Supercomputing Center) will address all EU official languages. The challenge of working with under-resourced languages such as Latvian, Lithuanian, Estonian, Slovenian or Croatian will be tackled with a multilingual NERC approach, which should also benefit ultra-under-resourced languages such as Maltese and Irish.

Why Anonymize Data?

GDPR obliges organizations to protect citizens’ data so it is not released to third parties (see this video on Pangeanic’s anonymization technologies). The MAPA data anonymization toolkit will provide the means to share language data while protecting personal or sensitive data. Being able to release large amounts of anonymized data can give the community more training data. On a more practical level, justice departments, health authorities and healthcare companies will be able to provide access to data and manage an anonymization strategy. Most importantly, MAPA is intended to help satisfy GDPR requirements at scale. Although no software can guarantee 100% accuracy in anonymization, just as perfect machine translation does not exist (yet), it will make document sharing much easier.

Technical Approach to Anonymization

At its core, the MAPA anonymisation toolkit will use Named-Entity Recognition and Classification (NERC) techniques based on deep-learning neural networks.
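To make the idea concrete, here is a minimal sketch of the step that follows entity detection: replacing each detected span with a placeholder for its class. It is not MAPA code. The function names, the labels and the hand-built spans are hypothetical, and the spans simply stand in for whatever a trained NERC model would return.

```python
from typing import List, Tuple

# (start, end, label) character spans, standing in for NERC model output.
Span = Tuple[int, int, str]


def span_of(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: locate a phrase and tag it, imitating a detector."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def mask_entities(text: str, spans: List[Span]) -> str:
    """Replace each detected span with a placeholder such as [PERSON]."""
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one already masked.
            continue
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Maria Lopez was admitted to Hospital La Fe in Valencia."
spans = [
    span_of(text, "Maria Lopez", "PERSON"),
    span_of(text, "Hospital La Fe", "ORGANIZATION"),
    span_of(text, "Valencia", "LOCATION"),
]
masked = mask_entities(text, spans)
print(masked)
```

Keeping the class label in the placeholder is one way to preserve some usefulness of the source data, since a reader or a downstream model can still see what kind of entity was removed.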

In addition, thanks to the transfer learning capabilities shown by new types of Deep-Learning models, new systems can be trained using relatively small datasets of manually labelled data. The knowledge acquired for a given domain or language can be transferred and re-used across languages or domains. MAPA will be trained to detect named entities that involve sensitive information.

MAPA will be feature-rich, and the NERC approach will be complemented with other configurable mechanisms such as pattern detection based on regular expressions (passport or ID numbers, telephone numbers, street addresses, blood groups, age, sex, marital status, email addresses, bank accounts, etc.).
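As an illustration of regex-based pattern detection, the sketch below masks three of the categories listed above. The patterns and names are assumptions made for this example, not the rules MAPA ships; they are deliberately simple and would need tuning per country and document type.

```python
import re

# Hypothetical, simplified patterns; real deployments would configure their own.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
    "BLOOD_GROUP": re.compile(r"\b(?:AB|A|B|O)[+-](?!\w)"),
}


def mask_patterns(text, patterns=PATTERNS):
    """Replace every match of each configured pattern with its label."""
    for label, pattern in patterns.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact: jane.doe@example.org, tel. +34 600 123 456, blood group AB+."
print(mask_patterns(sample))
```

Because the patterns live in a plain dictionary, an administrator could add, remove or replace entries without touching the masking function, which is one simple way to keep the mechanism configurable.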

User-definable dictionaries for particular applications will also cater for specific usages of entity names known in advance.
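A user-defined dictionary could be applied along the following lines. This is a sketch under stated assumptions: the function names, the default label and the example terms are invented here, and matching is plain case-insensitive whole-phrase lookup rather than anything MAPA specifies.

```python
import re


def build_dictionary_matcher(terms):
    """Compile one case-insensitive pattern from a list of known entity names."""
    # Longest terms first, so "Hospital La Fe" wins over "La Fe".
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def mask_terms(text, terms, label="ENTITY"):
    """Replace every dictionary term found in the text with [label]."""
    if not terms:
        return text
    return build_dictionary_matcher(terms).sub(f"[{label}]", text)


known_orgs = ["La Fe", "Hospital La Fe"]
note = "The claimant visited hospital la fe and later La Fe again."
print(mask_terms(note, known_orgs, label="ORG"))
```

A dictionary like this is useful when an institution already knows which names must never appear in published text, for example its own staff or facilities, regardless of whether the statistical model detects them.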

Use cases

MAPA includes specific deployments (use cases) for public institutions in several EU countries, covering two domains: health and legal. Both domains were selected because of their strong anonymization requirements prior to any publication or sharing of data. In each deployment case, the system will be tailored to the specific needs of the relevant institution.

MAPA is funded by the Connecting Europe Facility (CEF) programme, under grant No A2019/1927065, and will run from January 2020 until December 2021.
