Head of Machine Learning

José Miguel Herrera Maldonado

José Miguel Herrera Maldonado is Head of Machine Learning at Pangeanic. His work connects machine learning, natural language processing, speech corpus generation, information retrieval, multimodal data spaces and applied R&D for multilingual AI systems.

He has been involved in European language technology initiatives including European Language Equality and ELE2, and has led strategic R&D work connected to the Universidad Politécnica de Madrid’s Spanish data space initiative (InesData, within PERTE de la Lengua) and Pangeanic’s latest CDTI projects. His role bridges research, data operations and production-grade multilingual AI.

Profile

Machine learning for multilingual AI systems

José Miguel Herrera works at the layer where machine learning research becomes operational infrastructure: speech resources, retrieval systems, data spaces, evaluation sets and multilingual AI workflows that can be deployed beyond laboratory conditions.

Speech corpus generation

His work includes speech corpus generation and data augmentation for the languages of Spain, supporting Pangeanic’s speech datasets, low-resource language data and multilingual ASR evaluation capabilities.

Information retrieval

His research background in information retrieval and QA retrieval connects directly with semantic search, vector retrieval, grounding, question answering and enterprise knowledge systems.

Multimodal data spaces

José Miguel’s work also connects with multimodal data spaces, where text, speech, metadata, documents and structured signals need to be prepared for AI systems with governance and reuse in mind.

Research and European projects

From European language equality to applied R&D

José Miguel’s work reinforces Pangeanic’s role in European multilingual AI infrastructure. His projects connect digital language equality, speech resources, machine learning, public research programs, R&D and the operational data layer behind deployable AI.

Project or area | Role and contribution | Related Pangeanic capability
European Language Equality and ELE2 | Work connected to digital language equality, speech corpus generation, and data augmentation for the languages of Spain. | EU-funded AI and language technology projects
UPM PERTE de la Lengua | Leadership of Pangeanic work connected to the Universidad Politécnica de Madrid and Spain’s strategic language technology initiative. | Research and publications
CDTI R&D projects | Leadership of Pangeanic’s latest CDTI work, connecting applied research, machine learning development and production-oriented language technology. | R&D and product provenance
Information retrieval research | Research into QA retrieval and ranking, relevant to semantic search, grounding, enterprise knowledge retrieval, and AI question answering. | Publication record
Speech and audio data | Work connected to data augmentation, multilingual speech corpora, and low-resource language resources for AI training and evaluation. | Speech datasets

AI systems that can be evaluated

Research translated into production controls

José Miguel’s work is particularly relevant because Pangeanic’s mission depends on more than model access. Pangeanic focuses on the full journey: data generation, evaluation, retrieval, metadata, speech resources and the measurable quality gates that make multilingual AI systems reliable in real environments. His work is fully aligned with Pangeanic’s vision for Sovereign AI systems.

Low-resource language data

European AI cannot rely only on well-resourced languages. Speech corpus generation and data augmentation help extend AI capabilities to languages and language varieties that are often digitally underrepresented.

Retrieval and grounding

Information retrieval research is highly relevant to enterprise AI because reliable systems need to retrieve the right evidence, rank it properly and connect outputs to controlled knowledge sources.

Evaluation and data quality

Production AI requires evaluation sets, error analysis, regression testing and continuous quality control. Research becomes valuable when it becomes part of the operating system of AI delivery.

Technical leadership

Building the layers that make AI governable

Machine learning leadership at Pangeanic is not limited to model experimentation. It involves the data structures, training material, retrieval logic, evaluation workflows and linguistic resources that allow enterprise AI systems to operate with control.

Data generation

Creating and extending data resources for multilingual AI training, evaluation and adaptation, especially where available public datasets are insufficient.

Model evaluation

Connecting machine learning with measurable quality controls so multilingual systems can be compared, improved and deployed with greater confidence.

Research continuity

Maintaining continuity between Pangeanic’s research work, EU-funded projects, commercial data operations and production AI platforms.