Research and Publications

The research trail behind Pangeanic’s multilingual data and AI systems

Academic research is useful only when it becomes operational infrastructure. Pangeanic’s work in machine translation, multilingual data, speech corpora, evaluation, machine translation quality estimation (MTQE) and model alignment feeds directly into the systems enterprises use today.

Explore Pangeanic’s academic publications, European R&D projects and scientific presentations, and follow how they connect to product development: AI Data Services, Datasets, Machine Translation, Deep Adaptive AI Translation (DAAIT), MTQE, Evaluation and AI QA, and Model Alignment.

Our journey is simple: research produces methods, methods become product features, and product features are tested in operational use cases. 

Applied research

The architecture of applied research

Academic papers solve technical problems. Production AI requires solving workflow failures. Pangeanic’s research agenda has been shaped by practical constraints: multilingual data scarcity, translation quality estimation, terminology control, speech data generation, model adaptation, and the evaluation of outputs before they reach real users.

The value of research is not the publication itself. The value appears when research becomes a dataset, an engine, a control layer, a human review workflow, or a deployable system. We document that path publicly here.

Research profiles

The people behind Pangeanic’s language technology stack

These profiles connect research work, publications and technical presentations to Pangeanic’s current products in multilingual AI data, machine translation, MTQE, model alignment and sovereign AI systems.

Founder and research direction

Manuel Herranz

CEO and founder of Pangeanic. Manuel’s research record connects Pangeanic’s early machine translation work with today’s product lines in adaptive MT, MTQE, multilingual AI data, enterprise document translation and sovereign deployment.

Google Scholar →

Machine learning and data spaces

José Miguel Herrera Maldonado

Head of Machine Learning. His work includes speech corpus generation, information retrieval and multimodal data spaces, connecting Pangeanic’s speech datasets, QA retrieval, data processing and AI platform work.

ELE2 speech corpus report →

Speech and multilingual ASR

Moisés Barrios Torres

His work on speech corpora, code-switching transcription and multilingual text classification connects directly with Pangeanic’s speech datasets, ASR evaluation, language identification and multilingual NLP pipelines.

DAAIT and MTQE

Marina Albert Girona

Machine translation researcher presenting Pangeanic work on multi-agent MT, post-editing, MTQE and Deep Adaptive AI Translation in AMTA, LREC and AI4Culture contexts.

AMTA 2025 →

LLMs and NLP research

Juan Luis García Mendoza

Head of LLMs and AI Research. His publications cover sentiment analysis, persuasion detection, distant supervision, relation extraction, evaluation methodology and machine translation adaptation.

Google Scholar →

European MT infrastructure

Amando Estela

Research and engineering contributions connected to European machine translation infrastructure, NTEU data collection work and the creation of more than 500 neural machine translation engines for EU languages.

Google Scholar →

From research to production

How scientific work becomes operational AI infrastructure

Pangeanic’s research record is most relevant when it explains what the company can build. The same work that produced speech corpora, neural MT engines, evaluation methods and adaptive translation research now supports commercial AI data workflows, MTQE, DAAIT and sovereign deployment.

01 · Speech and low-resource data

From ELE2 speech corpora to AI Data Services

Research: Large speech corpus generation for the languages of Spain using data augmentation.

Product capability: Speech datasets, data augmentation, ASR evaluation and low-resource language collection.

Commercial path: Bespoke speech and audio datasets for public administrations, AI labs and media organizations.

02 · Neural MT infrastructure

From NTEU data collection to 500+ neural MT engines

Research: Large-scale multilingual corpus design and neural machine translation engine creation.

Product capability: Machine Translation, parallel corpora, domain adaptation and multilingual engine deployment.

Commercial path: Secure MT and multilingual AI data infrastructure for enterprise and public-sector workflows.

03 · Adaptive translation

From MT adaptation research to DAAIT

Research: Incremental NMT adaptation, post-editing workflows and domain-specific machine translation.

Product capability: Deep Adaptive AI Translation using client terminology, translation memories and domain resources.

Commercial path: Adaptive translation workflows for organizations that need tone, terminology and domain control.

04 · MTQE and review routing

From quality estimation to translation control layers

Research: Multi-agent translation, post-editing automation and quality estimation presented in AMTA and LREC contexts.

Product capability: MTQE, automatic corrective loops and review routing by quality threshold.

Commercial path: Lower review waste and stronger quality control in high-volume translation operations.
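
As an illustration of review routing by quality threshold, the sketch below sends each translated segment to one of three queues according to a quality estimation score. It is a minimal example under stated assumptions, not Pangeanic's implementation: the `Segment` class, the `route` function, the queue names and the 0.9 and 0.6 thresholds are all hypothetical, and the score is assumed to lie between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str
    translation: str
    qe_score: float  # hypothetical quality estimate, assumed to lie in [0, 1]


def route(segment: Segment, publish_at: float = 0.9, light_review_at: float = 0.6) -> str:
    """Return the name of the queue a segment should go to.

    The thresholds are illustrative defaults, not values from Pangeanic.
    """
    if segment.qe_score >= publish_at:
        return "publish"          # high confidence: no human review
    if segment.qe_score >= light_review_at:
        return "light_review"     # medium confidence: quick human check
    return "full_post_edit"       # low confidence: full post-editing


segments = [
    Segment("Hola", "Hello", 0.97),
    Segment("Buenos días", "Good days", 0.72),
    Segment("Hasta luego", "Until then later", 0.31),
]
queues = [route(s) for s in segments]
```

In a setup like this, only segments below the upper threshold reach a human reviewer, which is one way review effort could be concentrated where the estimated quality is lowest.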

05 · Evaluation and NLP

From relation extraction to Evaluation and AI QA

Research: Distant supervision, noise reduction, sentiment analysis, classification and evaluation methodology.

Product capability: Evaluation datasets, model diagnostics, data quality workflows and AI QA.

Commercial path: Higher-quality data and better evaluation for task-specific multilingual models.

06 · Human feedback and alignment

From research workflows to PECAT and model alignment

Research: Annotation, ranking, evaluation, preference data and human review across multilingual projects.

Product capability: PECAT, RLHF, model alignment and human-governed AI operations.

Commercial path: Human-supervised data and alignment workflows for enterprise AI systems.

Selected publications and presentations

Evidence organized by product relevance

The entries below show how research outputs connect to Pangeanic’s current commercial and technical capabilities.

Deep Adaptive AI Translation (DAAIT)

Manuel Herranz, Marina Albert · LREC 2026 Industry Day

Presentation of Pangeanic’s Deep Adaptive AI Translation work, connecting LLM-based translation, terminology adaptation, domain control and multi-agent language workflows.

LREC 2026 programme →

Enhancing Machine Translation through Multi-Agent Communication: A Post-Editing Approach

Marina Albert Girona · AMTA 2025

Presentation connected to Pangeanic’s development of DAAIT, MTQE, automatic post-editing and quality routing through specialized translation agents.

AMTA 2025 programme →

Machine Translation Workshop AI4Culture

Marina Albert Girona · AI4Culture 2025, Vilnius

Workshop on machine translation and CAT tooling developed in the context of the Europeana and AI4Culture ecosystem, connecting cultural-sector AI with practical translation workflows.

AI4Culture resource →

Generation of a Large Speech Corpus for the Languages of Spain using Data Augmentation

José Miguel Herrera, Moisés Barrios · European Language Equality 2

Technical report connecting Pangeanic’s data generation work to speech datasets, data augmentation and low-resource language support.

Read report →

NTEU Neural Translation for the EU

Pangeanic, KantanMT, Tilde · Connecting Europe Facility

Large-scale European machine translation project involving multilingual data collection and the creation of more than 500 neural translation engines for EU languages.

NTEU project →

Eco.pangeamt: Industrializing Neural MT

Mercedes García Martínez, Manuel Herranz, Amando Estela, Laurent Bié, A. Franco · IWLT

Early evidence of Pangeanic’s platform approach to industrial neural machine translation, engine management and client-facing MT workflows.

Read paper →

Incremental Adaptation of NMT for Professional Post-editors

Miguel Domingo, Mercedes García Martínez, Álvaro Peris, Alexandre Helle, Amando Estela, Laurent Bié, Francisco Casacuberta, Manuel Herranz

Research into adaptive NMT and professional post-editing, directly connected to Pangeanic’s long-standing work on human feedback, model adaptation and translation productivity.

Read paper →

Learning to Leverage Microblog Information for QA Retrieval

José Miguel Herrera, Barbara Poblete, Denis Parra · ECIR 2018

Research on ranking and information retrieval, relevant to semantic search, QA retrieval and retrieval-based AI workflows.

Springer →

Improving Machine Translation in the E-commerce Luxury Space: A Case Study

J. M. de la Torre Vilariño, Juan Luis García Mendoza, A. Petrucci · EAMT 2023

Case study on domain adaptation in luxury e-commerce translation, relevant to Pangeanic’s Deep Adaptive AI Translation and client-specific MT work.

Towards Automatic Principles of Persuasion Detection Using Machine Learning Approach

L. Bustio-Martínez, V. Herrera-Semenets, Juan Luis García Mendoza et al. · IWAIPR 2023

Research on automatic detection of persuasion principles in text using machine learning, connected to classification, evaluation and content intelligence workflows.

DOI →

Evaluation of a New Representation for Noise Reduction in Distant Supervision

Juan Luis García Mendoza, L. Villaseñor Pineda, D. Buscaldi, L. Bustio-Martínez, F. Orihuela Espina · MICAI 2022

Work on noise reduction for relation extraction, relevant to data quality, annotation consistency and NLP evaluation.

DOI →

Risks of Misinterpretation in the Evaluation of Distant Supervision for Relation Extraction

Juan Luis García Mendoza, L. Villaseñor Pineda, F. Orihuela Espina · Procesamiento del Lenguaje Natural

Evaluation methodology work relevant to Pangeanic’s AI QA, benchmarking, data quality and model evaluation services.

DOI →

From paper to deployment

The commercial value of research continuity

Research becomes commercially useful when it lowers operational uncertainty. Pangeanic’s research record connects directly to production systems that handle multilingual data, machine translation, evaluation, human feedback and controlled deployment.

This is why we present academic work, European projects, scientific events, products and use cases as one connected sequence.

Research pathways into Pangeanic products

Data collection and corpora: Datasets, speech data, low-resource languages, parallel corpora and AI Data Services.
Machine translation research: Machine Translation, Deep Adaptive AI Translation, MTQE and automatic post-editing.
NLP and evaluation: Evaluation and AI QA, relation extraction, classification, ranking and model diagnostics.
Human feedback and alignment: RLHF, human review, preference data and model alignment workflows through PECAT.
Controlled deployment: Sovereign AI systems, private infrastructure, on-premises workflows and regulated environments.

Research-driven AI systems

Build with the people who have worked on the data, models and evaluation layers

Pangeanic helps organizations move from multilingual data and research prototypes to production AI workflows with governance, human review and secure deployment.