José Miguel Herrera Maldonado
José Miguel Herrera Maldonado is Head of Machine Learning at Pangeanic. His work connects machine learning, natural language processing, speech corpus generation, information retrieval, multimodal data spaces and applied R&D for multilingual AI systems.
He has been involved in European language technology initiatives including European Language Equality and ELE2, and has led strategic R&D work connected to the Universidad Politécnica de Madrid’s Spanish data space (InesData, within PERTE de la Lengua) and Pangeanic’s latest CDTI projects. His role bridges research, data operations and production-grade multilingual AI.
Machine learning for multilingual AI systems
José Miguel Herrera works at the layer where machine learning research becomes operational infrastructure: speech resources, retrieval systems, data spaces, evaluation sets and multilingual AI workflows that can be deployed beyond laboratory conditions.
Speech corpus generation
His work includes speech corpus generation and data augmentation for the languages of Spain, supporting Pangeanic’s speech datasets, low-resource language data and multilingual ASR evaluation capabilities.
Information retrieval
His research background in information retrieval and QA retrieval connects directly with semantic search, vector retrieval, grounding, question answering and enterprise knowledge systems.
Multimodal data spaces
José Miguel’s work also connects with multimodal data spaces, where text, speech, metadata, documents and structured signals need to be prepared for AI systems with governance and reuse in mind.
From European language equality to applied R&D
All of José Miguel’s work reinforces Pangeanic’s role in European multilingual AI infrastructure. His projects connect digital language equality, speech resources, machine learning, public research programs, R&D, and the operational data layer behind deployable AI.
| Project or area | Role and contribution | Related Pangeanic capability |
|---|---|---|
| European Language Equality and ELE2 | Work connected to digital language equality, speech corpus generation, and data augmentation for the languages of Spain. | EU-funded AI and language technology projects |
| UPM PERTE de la Lengua | Leadership of Pangeanic work connected to the Universidad Politécnica de Madrid and Spain’s strategic language technology initiative. | Research and publications |
| CDTI R&D projects | Leadership of Pangeanic’s latest CDTI work, connecting applied research, machine learning development and production-oriented language technology. | R&D and product provenance |
| Information retrieval research | Research into QA retrieval and ranking, relevant to semantic search, grounding, enterprise knowledge retrieval, and AI question answering. | Publication record |
| Speech and audio data | Work connected to data augmentation, multilingual speech corpora, and low-resource language resources for AI training and evaluation. | Speech datasets |
Research translated into production controls
José Miguel’s work is particularly relevant because Pangeanic’s mission depends on more than model access. Pangeanic focuses on the full journey from data generation through evaluation, retrieval, metadata and speech resources to the measurable quality gates that make multilingual AI systems reliable in real environments. His work is closely aligned with Pangeanic’s vision for Sovereign AI systems.
Low-resource language data
European AI cannot rely only on well-resourced languages. Speech corpus generation and data augmentation help extend AI capabilities to languages and language varieties that are often digitally underrepresented.
Retrieval and grounding
Information retrieval research is highly relevant to enterprise AI because reliable systems need to retrieve the right evidence, rank it properly and connect outputs to controlled knowledge sources.
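As a rough illustration of the retrieve-and-rank step described above, the following minimal sketch scores candidate passages against a query by cosine similarity and keeps the best matches. The vectors, passage ids and the `rank_passages` helper are hypothetical and chosen only for illustration; this is not a description of Pangeanic’s retrieval stack.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_passages(query_vec, passages, top_k=2):
    """Score each (passage_id, vector) pair against the query and return the top_k by score."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in passages]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real system would obtain these from an encoder model.
passages = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.0, 1.0, 0.0]),
    ("doc-c", [0.7, 0.7, 0.0]),
]
ranked = rank_passages([1.0, 0.1, 0.0], passages)
for pid, score in ranked:
    print(pid, round(score, 3))
```

Returning the passage ids alongside the scores is what lets a downstream answer be tied back to a controlled knowledge source.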
Evaluation and data quality
Production AI requires evaluation sets, error analysis, regression testing and continuous quality control. Research becomes valuable when it becomes part of the operating system of AI delivery.
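One way to picture a regression test on an evaluation set is a word error rate (WER) check against a stored baseline, as is common in ASR evaluation. The sketch below is a simplified illustration under stated assumptions: the sample sentences, the baseline values and the `passes_gate` helper are hypothetical, and a production gate would typically normalise text and track more than one metric.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def corpus_wer(eval_set):
    """Corpus-level WER: total word edits divided by total reference words."""
    edits = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in eval_set)
    ref_words = sum(len(ref.split()) for ref, _ in eval_set)
    return edits / ref_words if ref_words else 0.0


def passes_gate(eval_set, baseline_wer, tolerance=0.0):
    """Pass only if the new WER does not exceed the baseline by more than the tolerance."""
    return corpus_wer(eval_set) <= baseline_wer + tolerance


# Hypothetical (reference, system output) pairs for illustration only.
eval_set = [
    ("el gato duerme", "el gato duerme"),
    ("hola mundo", "hola mundos"),
]
print(corpus_wer(eval_set))
print(passes_gate(eval_set, baseline_wer=0.25))
```

Running such a check on every model or data update turns an evaluation set into a continuous quality control rather than a one-off measurement.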
Building the layers that make AI governable
Machine learning leadership at Pangeanic is not limited to model experimentation. It involves the data structures, training material, retrieval logic, evaluation workflows and linguistic resources that allow enterprise AI systems to operate with control.
Data generation
Creating and extending data resources for multilingual AI training, evaluation and adaptation, especially where available public datasets are insufficient.
Model evaluation
Connecting machine learning with measurable quality controls so multilingual systems can be compared, improved and deployed with greater confidence.
Research continuity
Preserving continuity across Pangeanic’s research work, EU-funded projects, commercial data operations and production AI platforms.
Selected research and project links
These pages provide technical and institutional context for José Miguel Herrera’s work at Pangeanic.
Pangeanic Research and Publications
Scientific publications and technical reports connecting Pangeanic’s research record with multilingual AI, data generation, MT, retrieval and evaluation.
European projects: EU-funded AI and language technology projects
Pangeanic’s work across European language technology, multilingual infrastructure, speech resources, data spaces and AI research.
ELE: European Language Equality contribution
Pangeanic’s participation in the European Language Equality project and its work on digital language equality in Europe.
Speech data: Audio data augmentation techniques and methods
Background on Pangeanic’s work in audio data augmentation and speech data for underrepresented languages.
Speech datasets: Speech and audio datasets for AI
Speech recordings, transcription, metadata, ASR evaluation resources and multilingual audio data for AI training.
AI Data Operations: The operating layer between data, models and deployment
Data preparation, annotation, evaluation, governance and human feedback workflows for production AI systems.

