EU-funded AI and language technology projects

European projects that turned language research into deployable AI infrastructure

Pangeanic’s European R&D record connects the NLP fields we have specialized in since 2009: machine translation, multilingual data, speech resources, AI for cultural heritage, anonymization, data spaces, and public-sector language infrastructure. Our EU-funded work has created systems that can be deployed, evaluated, and governed, and we have often developed those initial systems into commercial products.

Last updated: June 2026. This page is Pangeanic’s canonical evidence hub for European-funded language technology, AI data, and multilingual infrastructure projects.

We believe research credibility only becomes useful when it can be verified. For that reason, we link Pangeanic’s European project history to current production capabilities in AI Data Services, Datasets for AI, Machine Translation, Deep Adaptive AI Translation, MTQE, Data Masking and Anonymization, Evaluation and AI QA, and Model Alignment.

Across CEF, Digital Europe, European Language Equality, and data space initiatives, Pangeanic has contributed to the practical layers needed by multilingual AI systems: corpora, translation models, metadata translation, multilingual chatbots, anonymization, speech data, cultural heritage AI, and secure public administration workflows.

View research and publications See BSC RLHF, instruct testing, and model alignment use case Discuss an AI data project

Evidence structure

Our project records map top-level research to production capability

These are some of our representative projects, organized as an evidence graph. We identify the project, the public reference, Pangeanic’s contribution, and the commercial capability it reinforces today. You, the client, are the final beneficiary of our 20 years of experience in the field.

Project	Program or reference	Pangeanic contribution	Current capability
MOSAIC Media	DIGITAL Europe. Project 479833.	AI company partner in a multilingual and multimodal media platform for discovery, translation, and content access.	AI Data Services, Machine Translation, speech and media datasets.
AI4Culture	DIGITAL Europe. Project 101100683.	Partner in AI tools, datasets, training material, multilingual OCR, subtitle generation, image enrichment, online CAT tool, and metadata translation for cultural heritage. The project continued to be supported for 2 years after completion, serving more than 100 European museums.	DAAIT, Datasets, Evaluation and AI QA, PECAT Annotation tool with CAT features.
Europeana Translate	CEF. Action 2020-EU-IA-0084.	Machine translation and domain adaptation for cultural heritage metadata and multilingual access to Europeana records.	Machine Translation, DAAIT.
Europeana Culture Chatbot	CEF. Action 2017-EU-IA-0183.	Multilingual chatbot platform, data training, language resources, conversation data, and translation workflows for cultural heritage institutions.	AI Data Services, multilingual assistants, chatbot training data.
J-Ark	CEF. Action 2020-EU-IA-0230.	Infrastructure connecting eArchiving, Europeana, and eTranslation services for Jewish heritage archives.	Multilingual data infrastructure, document processing, metadata workflows.
Jewish History Tours	CEF. Action 2020-EU-IA-0092.	Translation, enrichment, and AI-assisted recommendation workflows for location-based cultural heritage tours.	Multilingual metadata, cultural AI, content enrichment.
MAPA	CEF. Action 2019-EU-IA-0013.	Pangeanic led a consortium to develop multilingual anonymization for public administrations, with a focus on the medical and legal domains.	Data Masking and Anonymization, AI QA, sovereign data workflows.
NTEU	CEF. Action 2018-EU-IA-0051.	Pangeanic led a consortium to build a large neural machine translation engine farm for European public administrations.	NTEU, Machine Translation, multilingual corpora.
NEC TM	CEF. Action 2017-EU-IA-0149.	Translation memory data platform for sharing bilingual language assets between public administrations, companies, and translation professionals.	NEC TM, parallel corpora, translation memory infrastructure.
Europeana XX Century of Change	CEF. Action 2019-EU-IA-0022.	Machine translation, metadata language detection, semantic enrichment, and recommendation tools for Europeana cultural heritage content.	Metadata translation, cultural AI, recommendation and enrichment systems.
iADAATPA / MT-Hub	CEF. Action 2016-EU-IA-0132.	Secure and scalable automatic translation platform for EU public administrations with routing, language detection, and CAT tool connectors.	Machine Translation, secure public sector MT infrastructure.
European Language Equality and ELE2	ELE / ELE2. Grant agreement LC-01884166 / 101075356.	Research and language resources work, including a large speech corpus generation report for the languages of Spain, using data augmentation.	AI Data Services, speech datasets, low-resource language data.
DS4M Mediterráneo	EU Next Generation funding. Project TSI-100120-2024-9.	Participation in a mobility data space for secure, sovereign, and efficient data sharing in the Mediterranean Corridor.	Data spaces, governance, AI data operations, sovereign workflows.

From European infrastructure to enterprise AI

What these projects prove

Pangeanic’s European project history is not a decorative R&D list. It documents repeated, successful work on the foundation layers that now determine whether multilingual AI systems can be deployed in production.

01 · Data

Language data at European scale

NTEU, NEC TM, ELE2, and Europeana Translate demonstrate our long-running experience in gathering, cleaning, and labeling multilingual corpora, and in translation memories, metadata translation, speech data, and low-resource language workflows.

Explore datasets for AI

02 · Deployment

Public sector language infrastructure

iADAATPA/MT-Hub, NTEU, and NEC TM connect Pangeanic’s machine translation work to secure public administration infrastructure, routing, language detection, connectors, and reusable translation assets.

Explore machine translation

03 · Governance

Privacy-aware multilingual processing

The MAPA project is one of our best-known projects, connecting multilingual AI, named entity recognition (NER), anonymization, and GDPR-oriented public data sharing. It is our direct precursor to regulated enterprise data masking workflows.

Explore data masking and anonymization

04 · Cultural AI

AI for cultural heritage and public knowledge

AI4Culture, Europeana Translate, Culture Chatbot, Europeana XX, J-Ark, and Jewish History Tours show practical work on metadata, multilingual discovery, enrichment, translation, and public access to cultural collections.

Explore Deep Adaptive AI Translation

05 · Multimodal

Media, speech, and multimodal workflows

MOSAIC and AI4Culture extend our evidence trail into multimedia, automatic subtitling, audio, visual enrichment, and multilingual access, aligning with the current demand from AI developers for multimodal AI data operations.

Explore AI Data Services

06 · Sovereign systems

Data spaces and controlled exchange

Our recent participation in DS4M Mediterráneo adds a new area to this record, data spaces: secure data sharing, governance, interoperability, and sovereignty treated as operational concerns for mobility data.

Explore sovereign AI systems

Project evidence

European project record

Each project below is linked to an external public reference, where available, and to Pangeanic’s current internal capability pages.

MOSAIC · Project 479833

MOSAIC Media

Program: Digital Europe.

Pangeanic role: AI company partner in a multilingual media platform for broadcasters, creators, distributors, and citizens.

Capability: multilingual media discovery, automatic subtitling, translated subtitles, multimodal repositories, AI transcription and translation.

Official project AI Data Services Machine Translation

AI4Culture · Project 101100683

AI4Culture

Program: Digital Europe.

Pangeanic role: partner in an AI capacity-building platform for cultural heritage institutions.

Capability: datasets, reusable AI tools, multilingual text recognition, subtitle generation, image enrichment, and metadata translation.

Official project AI4Culture platform DAAIT

Europeana Translate · 2020-EU-IA-0084

Europeana Translate

Program: CEF.

Pangeanic role: machine translation and domain adaptation for cultural heritage metadata and multilingual access.

Capability: metadata translation, domain-adapted MT, multilingual access to European digital cultural heritage.

Project result Europeana event Machine Translation

Culture Chatbot · 2017-EU-IA-0183

Europeana Culture Chatbot

Program: CEF.

Pangeanic role: multilingual chatbot technology, data training, translation, and post-editing workflows for cultural heritage use cases.

Capability: multilingual assistants, conversation data, chatbot training, and domain-specific language workflows.

Europeana tool Pangeanic article AI Data Services

J-Ark · 2020-EU-IA-0230

J-Ark European Jewish Community Archive

Program: CEF eArchiving.

Pangeanic role: creation of infrastructure connecting eArchiving, Europeana, and eTranslation services for long-term preservation and access.

Capability: multilingual heritage data, archive workflows, document ingestion, preservation, and distribution.

European Commission Pangeanic article Datasets

Jewish History Tours · 2020-EU-IA-0092

Jewish History Tours

Program: CEF.

Pangeanic role: multilingual enrichment and translation technology within a cultural heritage tour ecosystem based on Europeana objects.

Capability: AI-assisted recommendations, geographic enrichment, translation, and multilingual cultural data access.

Project page Tour partner Machine Translation

MAPA · 2019-EU-IA-0013

Multilingual Anonymization for Public Administrations

Program: CEF.

Pangeanic role: consortium lead for a multilingual anonymization toolkit for public administrations.

Capability: named entity recognition, de-identification, pseudonymization, legal and medical domain workflows, GDPR-aligned data sharing.

MAPA use case ELDA project page Anonymization

NTEU · 2018-EU-IA-0051

Neural Translation for the EU

Program: CEF.

Pangeanic role: consortium lead, partnering with KantanMT and Tilde on the largest neural machine translation engine farm built to date, covering every pairing of official EU languages.

Capability: neural MT engines, multilingual corpus collection, secure public administration MT, and direct EU language pair coverage.

NTEU page ELG catalogue Machine Translation

NEC TM · 2017-EU-IA-0149

National European Central Translation Memory Data Platform (NEC TM)

Program: CEF.

Pangeanic role: development of a translation memory data platform for sharing bilingual assets between administrations, companies, and translation professionals.

Capability: translation memory server infrastructure, parallel corpora, reusable bilingual assets, API-based retrieval, and CAT tool independence. Available on GitHub.

NEC TM page MT Summit paper Datasets

Europeana XX · 2019-EU-IA-0022

Europeana XX Century of Change

Program: CEF.

Pangeanic role: metadata language detection, machine translation, and semantic enrichment tools for Europeana cultural heritage records.

Capability: multilingual metadata, recommendation systems, cultural AI, enrichment, and translation workflows.

Recommendation system Europeana result Evaluation and AI QA

iADAATPA / MT-Hub · 2016-EU-IA-0132

Machine Translation Hub for EU Public Administrations

Program: CEF.

Pangeanic role: consortium lead with Everis (now NTT DATA) for a secure automatic translation platform for European public administrations.

Capability: secure MT platform, routing, language detection, automatic domain detection, connectors, and scalable public sector translation infrastructure. Available on GitHub.

MT Hub article MT evaluation paper Machine Translation

ELE / ELE2 · LC-01884166 / 101075356

European Language Equality and ELE2

Program: European Language Equality.

Pangeanic role: contribution to European language equality work and speech corpus generation for the languages of Spain using data augmentation.

Capability: low-resource language data, speech corpora, linguistic inclusion, language equality, and AI data operations.

ELE project ELE2 report AI Data Services

DS4M Mediterráneo · TSI-100120-2024-9

Data Space for Mobility in the Mediterranean Corridor

Program: EU Next Generation and Spanish Recovery, Transformation and Resilience Plan framework.

Pangeanic role: participation in a secure, sovereign, and efficient data space initiative for mobility data sharing.

Capability: data spaces, governance, interoperability, secure data exchange, and sovereign AI data operations.

Official project Pangeanic article AI Data Operations

Research to product

The commercial value of European R&D continuity

Pangeanic’s European projects show the same architecture that enterprise AI buyers now request: controlled data, multilingual processing, domain adaptation, human review, quality gates, privacy, interoperability, and deployment under organizational control.

Our R&D roots and project continuity are why Pangeanic can connect research with production workflows across AI data, machine translation, evaluation, model alignment, and sovereign deployment.

Summary: Pangeanic has participated in European AI and language technology projects covering neural machine translation, multilingual anonymization, cultural heritage AI, speech corpus generation, translation memory infrastructure, metadata translation, multilingual chatbots, media translation, and data spaces.

Operationally: These projects support Pangeanic’s current work in AI Data Operations, Datasets for AI, Deep Adaptive AI Translation, MTQE, Data Masking and Anonymization, Evaluation and AI QA, Model Alignment, and secure multilingual deployment.

Enterprise trust signals

Why this European project record matters to AI buyers

AI buyers need more than a vendor that can describe multilingual AI. They need evidence that the vendor has already worked through the practical problems that appear in production: scarce data, noisy corpora, domain adaptation, public-sector privacy, metadata complexity, multilingual evaluation and controlled deployment.

For AI labs and model builders

Data experience beyond generic web scraping

Projects such as ELE2, NTEU, NEC TM and Europeana Translate show repeated work with multilingual corpora, speech resources, translation memories and metadata. This matters when a model needs data that is licensed, structured, multilingual and useful for training, adaptation or evaluation.

Explore Datasets for AI

For enterprises with proprietary content

Bespoke AI data workflows built from real project constraints

European projects rarely provide clean, uniform data. They involve legacy formats, mixed languages, metadata gaps, domain-specific terminology and strict delivery requirements. That operational experience is directly relevant to enterprises that need collection, annotation, enrichment, review and evaluation of proprietary data.

Explore AI Data Services

For localization and language teams

Machine translation infrastructure tested in public-sector environments

NTEU, iADAATPA/MT-Hub and Europeana Translate connect Pangeanic’s machine translation work to public administration workflows, routing, language detection, domain adaptation and reusable multilingual assets. This is the background behind our current work in Machine Translation, Deep Adaptive AI Translation and MTQE.

Explore Deep Adaptive AI Translation

For regulated organizations

Privacy, anonymization and data governance are not afterthoughts

MAPA is one of Pangeanic’s strongest institutional proof points. It connected multilingual AI, named entity recognition, anonymization and GDPR-oriented public data sharing in legal and medical domains. That experience is directly relevant to organizations that need to process sensitive information without losing operational value.

Explore Data Masking and Anonymization

For teams evaluating AI quality

Evaluation starts before a model reaches production

Projects involving speech data, metadata translation, semantic enrichment, recommendation systems and machine translation teach a practical lesson: AI quality depends on the data, the workflow and the evaluation layer. This is why Pangeanic connects data preparation, evaluation datasets, AI QA and human review instead of treating quality as a final checkpoint.

Explore Evaluation and AI QA

For sovereign AI buyers

Controlled multilingual AI requires infrastructure, not slogans

Data spaces, public-sector translation platforms, anonymization projects and cultural heritage infrastructures all point to the same requirement: organizations need control over data flows, access, evaluation, privacy and deployment. That is the operational foundation behind Pangeanic’s approach to sovereign AI systems.

Explore Building Sovereign AI Systems

What this means for buyers: Pangeanic’s European project history is evidence of execution in the same layers that enterprise AI now depends on: multilingual data, machine translation, anonymization, evaluation, metadata enrichment, speech resources, data spaces and public-sector deployment.

The practical consequence: when a buyer asks for AI training data, domain-adapted translation, multilingual QA, model alignment, anonymization or sovereign deployment, Pangeanic is not starting from a blank page. The company has already delivered many of the underlying components in European production and research environments.

European language technology proof

Build multilingual AI from a verified research and deployment trail

Pangeanic helps organizations turn multilingual data, language technology research and public sector deployment experience into operational AI systems with governance, evaluation, privacy and human review.

Discuss a project View research and publications Explore AI Data Services