This is a summary of how a German machine translation engine worked wonders for one of our clients translating documentation for the electronics field.
Imagine you or one of your clients has been translating in a given subject area for years. Now you see that machine translation can help speed up your publication processes and perhaps even reduce translation costs. You have tried general engines and you are convinced that, with some customization work, you could run or own a machine translation engine which would enable you to post-edit.
However, despite the time, cost and investment put into translation over the last few years, you look at your data and you are told that you do not have enough of it, or that the data is dirty. It is going to take a few weeks to clean it and, even if it were clean and consistent, you do not have enough in-domain content. You cannot play the game: you are stuck with translation memories and general services too eager to “suck your data”. Your publication and translation workflows are 20th century.
This is the untold story of how a small machine translation engine, despite a relatively low BLEU score, has worked wonders for translation productivity in our German department. This is the “how to create machine translation engines without massive amounts of data”.
First of all, let us remember a few points:
– Bigger is better: BIG NO. More data in itself does not guarantee better results. Domain-specific data ensures future translations will be handled more efficiently as the engine has more examples to look at. Adding Moby Dick or War and Peace will not in itself improve your engine’s performance unless you plan to cover everything.
Lesson: stick to what you want the engine to do best.
– Unclean data can seriously harm your health and those around you: BIG YES. Small amounts of unclean data can lead the algorithm to think that the wrong sequence or chain of words has the highest probability of matching your new translation query. It can also dilute the statistics for a given pattern and thus make translation algorithms unsure as to what is best. I will quote an example from years ago, when English text suddenly started appearing in translations because it had been left untranslated in the target side of the bilingual training set. With no cleaning at all, the engine was learning that some financial terms had to be left in English in the target.
Lesson: spending some time on cleaning, even at a basic level, will produce far better results in future machine translation work.
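As a rough idea of what "basic level" cleaning might look like, here is a minimal Python sketch. It is not the cleaning pipeline used for the engine described in this post: the function name `clean_parallel_corpus`, the rules and the length-ratio threshold of 3 are all assumptions chosen for illustration.

```python
import unicodedata


def clean_parallel_corpus(pairs, max_ratio=3.0):
    """Drop obviously bad (source, target) segment pairs.

    Hypothetical basic filter: the rules and the max_ratio threshold
    are illustrative assumptions, not a production cleaning pipeline.
    """
    seen = set()
    kept = []
    for source, target in pairs:
        # Normalize Unicode and collapse runs of whitespace.
        source = " ".join(unicodedata.normalize("NFC", source).split())
        target = " ".join(unicodedata.normalize("NFC", target).split())
        if not source or not target:
            continue  # one side is empty
        if source.casefold() == target.casefold():
            continue  # target was left untranslated
        ratio = len(source) / len(target)
        if ratio > max_ratio or ratio < 1 / max_ratio:
            continue  # lengths differ too much, possibly misaligned
        if (source, target) in seen:
            continue  # exact repetition of an earlier pair
        seen.add((source, target))
        kept.append((source, target))
    return kept


if __name__ == "__main__":
    sample = [
        ("Press the power button.", "Drücken Sie die Ein/Aus-Taste."),
        ("Net asset value", "Net asset value"),  # untranslated target
        ("Remove the cover.", ""),               # empty target
    ]
    for pair in clean_parallel_corpus(sample):
        print(pair)
```

The "identical source and target" rule is the one that would have caught the untranslated financial terms mentioned above, but it also throws away segments that are legitimately identical in both languages (product names, part numbers), so in practice such rules need reviewing against the actual data.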
– Higher BLEU scores mean engines are translating better: BIG NO. Anybody who has had some experience in machine translation will tell you that comparing BLEU scores between different languages, or even between different domains, does not make any sense. The metric provides an indication based on a test set (typically 2,000 segments) that was extracted from the data and kept out of the training corpus. If the test has been conducted properly, never expect scores higher than 60. Some people “cheat” the metric by not deleting the 2,000 segments from the training corpus, or simply do not check whether some of those sentences also entered the training corpus as repetitions. In fact, no training corpus should contain repetitions, at least for testing purposes, in a first-off trial. Not following these rules will provide unrealistic BLEU scores (some companies claim over 80%!!) that bear no relation to usability and just play with users’ expectations.
Lesson: Make sure you ask the machine translation company to provide proof that the initial tests were conducted according to standard practice, not adding repetitions and making sure the training corpus was “pure and clean”.
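For readers who want to see what a "properly conducted" split could look like, here is a minimal Python sketch. The function name `split_train_test`, the fixed random seed and the choice to hold out by source sentence are assumptions made for this example; it does not compute BLEU itself, it only prepares a test set that does not leak into training.

```python
import random


def split_train_test(pairs, test_size=2000, seed=0):
    """Return (train, test) so that no test source sentence is in train.

    pairs is a list of (source, target) tuples. Hypothetical helper,
    shown only to illustrate the held-out principle.
    """
    # Remove exact repetitions, keeping the first occurrence of each pair.
    unique = list(dict.fromkeys(pairs))
    if len(unique) <= test_size:
        raise ValueError("corpus too small for the requested test set")

    rng = random.Random(seed)
    test = rng.sample(unique, test_size)

    # Hold out by source sentence, so a test sentence cannot re-enter
    # the training corpus paired with a different translation.
    held_out_sources = {source for source, _ in test}
    train = [pair for pair in unique if pair[0] not in held_out_sources]
    return train, test
```

A provider who follows this kind of procedure should be able to show that the intersection between test and training sentences is empty, which is the sort of proof the lesson above asks for.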
Visit our Pangeanic Blog entry if you need a few more tips.
So, what happened to this humble German engine with a very average BLEU score? Well, at first sight this does not look like a small engine: it contains around 40 million words in German and over 41 million words in English. It is a bilingual engine showing humble BLEU scores of 36.56 into German and 45.67 into English, the latter being a little more acceptable. To make matters worse, the request was ad hoc and we had no time to prepare, as the translation deadline was too short: the engine was quickly assembled by adding client material not exceeding 250,000 words to a much larger corpus within the electronics and engineering fields. This is not the typical “MT approach”, but all professionals in the translation industry know that time and delivery are often pressing issues. The engine underwent typical customization in late 2013 and was later refined with post-edited material and translator feedback. Internal German translation staff at Pangeanic were involved in the process.
Productivity gains applying machine translation and post-editing
The client request was to deliver 25,741 new words in one week, translated by a single translator, and to run quality checks with QA Distiller or similar terminology-checking software. Our internal German translator ran the job through the engine to obtain a pre-translation and post-edited it using a popular translation memory interface, recalling MT as penalized entries from the memory. Files were delivered on a Friday afternoon (the files came from Japan to Europe) and were requested for the following Friday morning, Japan time (therefore they had to be sent on Thursday night, European time).
Typically, a human translator can produce around 2,500-3,000 words a day before QA, which would have meant around 10 working days for us to complete the job, something the client was not prepared to accept. The use of controlled language, despite the little time allowed for initial customization, was enough to turn around more than twice that figure, so that the preliminary translation was finished at the end of day 3, clocking an average of almost 8,000 words/day and leaving day 4 for a full QA and human proof. Job statistics were as follows:
| Match type | Words | Percent | Equivalent words |
|---|---|---|---|
| 95% – 99% | 6,558 | 20 | 1,311.6 |
| 85% – 94% | 15 | 30 | 4.5 |
| 75% – 84% | 8 | 50 | 4 |
| 50% – 74% | 103 | 50 | 51.5 |

Total words = 33,425; CATCount = 27,112.60
The actual translation effort would have qualified as just over 27,000 new words, although the 95%-99% matches were retrieved from the translation memory, not from the German machine translation engine.
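The weighted figures in the job statistics can be checked with a few lines of Python. This is only a sketch: it assumes that "Equivalent words" is simply words multiplied by the percentage weight, and that the 25,741 new words of the request were counted at 100%, a row that does not appear in the statistics above. The names `ROWS`, `NEW_WORDS` and `equivalent_words` are made up for the example.

```python
from decimal import Decimal

# (match band, words, weight in percent), as listed in the job statistics.
ROWS = [
    ("95% - 99%", 6558, 20),
    ("85% - 94%", 15, 30),
    ("75% - 84%", 8, 50),
    ("50% - 74%", 103, 50),
]

# Assumption: the new words from the client request count at full weight.
NEW_WORDS = 25741


def equivalent_words(words, weight_percent):
    """Weighted word count for one match band."""
    return Decimal(words) * Decimal(weight_percent) / Decimal(100)


# Sum of the four fuzzy-match bands listed in the statistics.
fuzzy_total = sum(equivalent_words(w, p) for _, w, p in ROWS)

# Weighted total once the new words are added at 100%.
weighted_total = fuzzy_total + equivalent_words(NEW_WORDS, 100)

if __name__ == "__main__":
    print("Fuzzy-match equivalent words:", fuzzy_total)
    print("Weighted total:", weighted_total)
```

Under those assumptions the four listed bands add up to 1,371.6 equivalent words, and adding the new words gives 27,112.6, which agrees with the CATCount reported in the statistics.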
PangeaMT’s settings provide for optimum machine translation engine training and they are set differently for each language pair. Furthermore, incoming material can be given more weight than existing material in order to prioritize it. These and other features are part of the knowledge Pangea adds to the systems it supplies, whether SaaS or hosted solutions. Ask our team for a demo if you would like to find out more about how to improve your translation workflows.