Today we have the pleasure of chatting with Mercedes García, our Chief Research Scientist, about her experience in the field of neural machine translation and her role within Pangeanic.
With a long career in the development and study of factored machine translation models, Mercedes also shares her view of the research world as a female scientist and an expert in the field.
What are neural machine translation and factored machine translation models, and what benefits do they offer?
Neural machine translation is automatic translation using models based on neural networks. These models have deep architectures and take more context into account than the statistical machine translation methods that preceded them. As a result, the translations generated by neural machine translation models are more fluent and of higher quality.
Factored machine translation models use the grammatical and morphological decomposition of words instead of the inflected words themselves. This is very useful for morphologically rich languages, which inflect verbs heavily for person, gender and number. The model can learn a language's conjugation from the base form of a word (e.g. the infinitive of a verb such as “ir”, “go”, in Spanish) together with its grammatical factors (e.g. first-person singular present indicative), instead of from the conjugated form “voy” (“I go”). In this way, we can generate words that do not appear in our data corpus or that are infrequent, achieving higher-quality translation.
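To make the idea of lemma plus factors concrete, here is a minimal Python sketch. It assumes a hypothetical, hand-written morphology table; the names `FactoredWord`, `MORPH_TABLE` and `generate` are illustrative only and are not Pangeanic's implementation. In factored NMT as generally described, the network predicts the lemma and the factors as separate outputs, and a separate morphological generation step turns them back into an inflected word. The sketch only mimics that last step with a dictionary lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactoredWord:
    """A word decomposed into its lemma and grammatical factors (hypothetical example)."""
    lemma: str
    person: str  # "1", "2" or "3"
    number: str  # "sg" or "pl"
    tense: str   # e.g. "pres"
    mood: str    # e.g. "ind"

    def as_token(self) -> str:
        # Pipe-separated factored token: lemma|person|number|tense|mood
        return "|".join([self.lemma, self.person, self.number, self.tense, self.mood])


# Hypothetical toy morphological dictionary, for illustration only.
# Keys are (lemma, person, number, tense, mood); values are inflected surface forms.
MORPH_TABLE = {
    ("ir", "1", "sg", "pres", "ind"): "voy",
    ("ir", "2", "sg", "pres", "ind"): "vas",
    ("ir", "3", "sg", "pres", "ind"): "va",
    ("ir", "1", "pl", "pres", "ind"): "vamos",
    ("ir", "2", "pl", "pres", "ind"): "vais",
    ("ir", "3", "pl", "pres", "ind"): "van",
}


def generate(word: FactoredWord) -> str:
    """Map lemma + factors to an inflected form; fall back to the lemma if unknown."""
    key = (word.lemma, word.person, word.number, word.tense, word.mood)
    return MORPH_TABLE.get(key, word.lemma)


if __name__ == "__main__":
    w = FactoredWord("ir", "1", "sg", "pres", "ind")
    print(w.as_token())  # ir|1|sg|pres|ind
    print(generate(w))   # voy
```

The design point is that the translation model only has to choose from a vocabulary of lemmas plus a few small factor vocabularies. Because the morphological resource is separate from the parallel corpus, an inflected form like “voy” can be produced even if it never appeared in the training data. A real system would use a full morphological generator rather than a six-entry table.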
As an expert in the field, how would you define the impact of neural machine translation on our daily lives?
Neural machine translation has improved our lives by making it possible to translate languages we don't understand without human help, at a speed that not even all the world's human translators combined could match. Neural machine translation is reaching near-human quality in many areas; sometimes it is impossible to tell whether a translation was produced by a person or a machine.
This means you can visit or live in a country whose language you don't speak and still understand it without anyone's help. It also allows corporations to translate large quantities of documents in almost no time at all. I also remember how, during the crisis in Haiti caused by the earthquake, relief workers could not find translators, and machine translation enabled them to understand the local population and rescue people.
Let’s talk about Pangeanic. What is your role as leader of the research department?
In the research department we experiment with new techniques to improve our products and offer new functionalities. We have several lines of research, such as neural machine translation with additional external information, summarisation (generating text summaries automatically) and incorporating information into huge language models trained on large amounts of data.
We carry out projects on multilingual text processing with the Polytechnic University of Valencia, and we have several projects with European universities, public administrations and other international companies in the language industry. We also attend international conferences, where we present our work, and we write scientific papers. As department leader, I coordinate and supervise all these ambitious activities and projects.
PangeaMT, as the scientific division of Pangeanic, is constantly developing new technologies. What would you say is your team's most ambitious project?
We would like to be able to train huge models that deliver superior quality and generate text so fluent that it is very difficult to tell whether it was written by a person or a machine.
As a leader in a research field such as neural machine translation, what has your experience been like in terms of studying and making progress in such a specialised field that is little known to the general public?
I have more than 10 years of experience in machine translation. I started with a master's degree in artificial intelligence at the Polytechnic University of Valencia while working at the ITI, Valencia's technology institute for computer science. Pangeanic was one of our clients there, with a very ambitious project: bringing statistical translation models, the most innovative technology at the time, to the language industry. Later, I continued my research with the Natural Language Engineering and Pattern Recognition group in the computer science department, and then went to the Copenhagen Business School in Denmark, where I continued my training with courses in translation technology and took part in a European project on cognitive research into a computer-assisted translation tool. I then did my doctorate at the University of Le Mans in France on neural machine translation using factored models, took specialised courses in deep learning in Canada and participated in conferences such as the International Workshop on Machine Translation.
Finally, I returned to Valencia to work at Pangeanic, where I continue to write articles and keep learning new methods for neural machine translation. It is a field in constant development, with contributions from big companies such as Google, Microsoft, Apple, Amazon and Facebook.
What is your opinion on research in this sector? Are there enough resources and forums that allow you to grow as a professional at the national, European and international levels?
Yes, it is a very interesting and ambitious sector, with many resources at the global and European levels (because Europe has many official languages) and also at the national level. Nowadays, there are many open online forums and courses that allow you to receive training and share your research. In addition, the same architecture used for neural machine translation models is being applied to other natural language processing tasks, such as automatic summarisation, opening up new lines of research and business opportunities.
Tell us about your experience as a female scientist, from your doctoral training at the computer science laboratory of the University of Le Mans in France to publishing scientific papers and specialising in a field such as neural machine translation.
This world is exciting because it is advancing by leaps and bounds; at times, a new architecture that outperforms the previous one appears every month. In the computer science lab at the University of Le Mans, my professor had connections with Facebook, and we had very powerful machines that allowed us to build and experiment with many kinds of neural models. Publishing articles in international journals gives you a lot of visibility, and presenting at conferences allows you to meet researchers in this field from all over the world.
Is the scientific field still largely a male domain? Do you think that situation is changing? If so, why do you think that is?
I studied computer engineering, which is a very male-dominated world. The vast majority of students are men, and the same is true of computer scientists in technology companies and public administrations. This makes things more difficult for a woman, and you have to work harder. There are more women in university education, but there is still a lot to change. There is a stigma that makes the field unattractive to women. Other engineering fields, and newer degrees such as biotechnology or data engineering, have more women; simply removing the words “computer science” already attracts more women. It is important to realise that women are essential to the development of technology, because otherwise it will end up designed only for men, and we are talking about the future that awaits us all. In general, there are many female scientists in Spain in other fields, but in other countries they are conspicuous by their absence, and in some countries women do not even have the option of studying.
What is your message to future generations of female scientists and researchers?
We must continue to be present, to be more and more visible and to participate in the development of new technologies that allow us to achieve a better world.
If we look into the future of neural machine translation, how do you think technology will evolve? How far do you think innovation in translation and artificial intelligence will go?
Neural models will continue to improve and learn from more and more data, with more and more parameters, as hardware improves and makes the required mathematical computations feasible. Machine translation will keep getting better, but human supervision will still be needed for the foreseeable future to obtain good translations of more complex texts such as literature or poetry. We can also expect more interaction between human translators and machine translation systems, each feeding off the other.
In the end, machines will surely be able to learn by themselves by observing data and experience, changing the world and creating other kinds of jobs for people.