The Connecting Europe Facility (CEF) has selected the Europeana Translate project, under which Pangeanic’s ECO platform (machine translation, anonymization and other language services) will translate the content and metadata of more than 25 million records available in the European digital library.
In 2000, an initiative to digitally preserve the heritage of the European Community was launched, and it led to the creation of Europeana. By digitizing millions of documents provided by renowned cultural institutions in the 27 member states of the European Union, Europeana aims to make Europe’s cultural and scientific heritage easier to access.
The platform is currently available in 30 languages used by Europeana’s cultural community: English, Spanish, German, French, Portuguese, Bosnian, Bulgarian, Catalan, Czech, Danish, Slovak, Slovenian, Estonian, Finnish, Greek, Dutch, Hungarian, Irish, Icelandic, Italian, Latvian, Lithuanian, Maltese, Norwegian, Polish, Romanian, Russian, Swedish, Ukrainian and Basque. It also holds more than 25 million documents in 45 different languages.
This is the context in which Europeana Translate was born. To open up all of this cultural heritage and remove source-language barriers, the project aims to connect the platform’s digital service infrastructure (DSI) with machine translation infrastructure.
In practice, the project will translate the metadata of millions of Europeana records into English and send the translations back to the Europeana core service platform as enrichments.
As a technology partner in the project, Pangeanic will contribute the experience it gained customizing machine translation engines for European public administrations in the earlier NTEU project. Those engines were developed by Pangeanic’s technology division, PangeaMT. The project aims to translate the cultural content and metadata at a quality KPI close to human parity (90%), which will be verified by the national content managers and annotators themselves. This will make it possible to translate hundreds of millions of words at scale in the future and will help ensure that Europe’s cultural content is available across languages.
As a team of language technology professionals, Pangeanic works every day on implementing language processing tools that structure data so that humans or machines can extract actionable insights from it. These tools help companies and institutions in all industries (law, finance, culture and tourism, among many others) improve their services by gaining deeper insight and saving time and resources when managing their projects.
Pangeanic’s Deep Adaptive machine translation technology makes it possible to clone engines that learn from content the user has previously produced, or from similar content, and that imitate its vocabulary and style. At each learning level, deep learning algorithms weight the data, turning the engines into a fundamental tool for users who work with specialized terminology or who need to generate or process language content at large scale.
This technology will help Europeana, which is currently in its fourth DSI operational cycle, overcome language barriers and offer European citizens and institutions access to its content in line with their multilingual needs.
This project is part of Europe’s commitment to becoming one of the most competitive and dynamic knowledge-based economies. With just one click, Europeana offers the world a cultural heritage that includes books, paintings, films, audio material, maps and newspapers, as well as many other highly valuable records that we will soon be able to enjoy and understand without restriction.