The COVID-19 pandemic has affected countless aspects of society, from our day-to-day lives and the way the vast majority of companies work to the way we relate to each other. It also posed a great challenge to the scientific community, which was plunged into a situation that evolved differently from day to day in multiple locations around the globe and entailed an overwhelming need for data extraction and collection.
One of these needs was to aggregate and summarize sources of information in order to resolve inconsistencies and avoid harmful misinformation, and so make further progress in the fight against the virus. Initiatives such as Covid-19 MLIA Eval arise at critical times like this, when professionals from all over the world join forces for the benefit of all.
Mercedes Garcia, Pangeanic’s Research Department Leader, joins us today in this interview to tell us more about Covid-19 MLIA Eval.
What is Covid-19 MLIA Eval and what is its mission?
Covid-19 MLIA Eval aims to organize a community assessment effort to accelerate the development of resources and tools to improve multilingual information access (MLIA) in the current crisis situation. This initiative has three natural language processing tasks: 1) information extraction, 2) multilingual semantic search and 3) machine translation. This action was supported by the European Commission, ELRC, ELRA, CLARIN and CLEF (European language resource institutions).
How did this initiative come about? How did you come to be part of it and manage the machine translation area?
During the first COVID-19 quarantine (March 2020), when we could not leave our homes, natural language processing researchers also thought about how we could help in the crisis while working remotely. Different research groups from all over Europe (ELDA, CLARIN and LIMSI in France, the University of Padua in Italy, ILSP in Greece, DFKI in Germany and PRHLT and Pangeanic in Spain) quickly got in touch to form this initiative.
Pangeanic, as a language technology company specializing in translations, and the PRHLT research group of the Polytechnic University of Valencia, as machine translation researchers, are the organizers of the machine translation task of the Covid-19 MLIA Eval initiative. As head of the research department, I am in charge of leading the initiative at Pangeanic.
We see that the team is made up of professionals from a multitude of countries and disciplines. What has the experience of being part of this initiative been like, both personally and professionally?
Yes, the team is made up of researchers from all over Europe, as mentioned previously. The experience has been very enriching. We’re writing scientific articles on the initiative and holding workshops to present the work that has been done.
What have been the biggest challenges and obstacles in carrying out this project?
The most difficult thing has been preparing the data and organizing the teams in record time for such a new topic. No data had been collected beforehand because, although the topic is very important, it only emerged in 2020.
Can you tell us about the achievements and objectives that have been met successfully in the project?
In the first round we organized a workshop where participants presented their machine translation systems. The workshop for the second round was held on February 17 from 15:00 to 18:00 CET.
In addition, we are writing a joint scientific paper on the initiative and another one specific to the machine translation task. In this project we have experimented with specific data on COVID-19, and we have concluded that the most successful models have been the multilingual ones, which used data from multiple language pairs to obtain more information.
Perhaps our readers were not aware of this initiative until now. What do you think, as an industry expert, is the greatest benefit of this project for the general public and companies? What about companies in the language processing sector?
The systems that have been developed have been optimized for the task and have performed well. The articles describing these systems are publicly available. The general public and companies can benefit from the findings of this initiative.