by Elia Yuste & Manuel Herranz
At Pangeanic, most of our internal globalization production workflows are now driven by open-standard software, be it our own PangeaMT engines or a translation environment such as SwordFish, which we use, for instance, for MT post-editing. Both our in-house and outsourced linguists are familiar with this. Still, we keep using translation-memory-based software, and in this respect traditional CAT tools like Trados or SDL 205, 2007 are still perfectly adequate. We have not felt pressed by our clients to upgrade to the new Studio 2009. Nevertheless, we have tried and tested some of its features.
Exciting as it may seem that a software product with such a significant industry share has incorporated Machine Translation plug-ins, which reflects SDL's growing openness towards this kind of technology, many of its users may not be aware of something that we consider pretty outrageous. We first noticed it when calling up one of the two MT plug-ins available in SDL Studio 2009, Google Translate and Language Weaver: the system took some time to process. Uhm, was it just translating? Why the latency, we wondered. Where and to what end was our data being processed?
When opening the system once more, we were astonished to discover that our suspicions were right. The system presents the user with a Question, which in our opinion should be properly flagged as a Real Warning. See the screenshot of this Question in Spanish, which would translate into English like this:
By connecting, you agree to send text to an external machine translation provider over the Internet and you declare that you will neither breach your agreement with the data owner nor fail to comply with the applicable laws. The usage [output] of this provider will be shown in the bilingual file.
Do you wish to connect?
Yes / No
[tick option] Do not ask me again.
How many translators in a rush will not pay enough attention to this? How many of them, consciously or unconsciously, will be sending out their material (or rather, their clients' material) to the servers of the two above-mentioned MT providers without informing their clients? In short, the battle for data sources has moved to the translator's desktop, picking up output for MT training even if it means breaching NDAs and confidentiality clauses. How many clients are simply not aware that their sensitive data is being sent over the Internet and used by third parties commercially and for purposes beyond their knowledge? Does a Question carry enough legal weight to make the translator aware of the consequences of his/her actions? Above all, why is this feature not explained clearly when the software is purchased?
We all know what Google's mission and philosophy are, and that Google Translate has become almost as omnipresent as the search engine itself. It therefore comes as no surprise that many translators resort to it via the search engine giant's website and now also within SDL Studio 2009. The fact that translator input will be used as a basis to enhance Google's engines, along with its consequences, is explained in a long software contract and warning. However, do SDL MT or Language Weaver have the same mission or philosophy as Google? There is a huge difference between providing an online post-editing environment with plain text as input and output and picking up human-approved segments from desktop applications.
Whatever the MT provider's commercial or development focus is, beware of exposing or disclosing your confidential data to others. This is the kind of awareness that we would like to raise here. You could run into serious trouble if you accept without informing your clients or taking extra precautions. If you are aware of this dubious procedure in SDL Studio 2009 and still wish to keep using it, at least take the time to remove any sensitive information from the files to be externally machine translated. Advise your clients and vendors that this may happen to their information.
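As a rough illustration of the precaution of removing sensitive information before a file leaves your desktop, here is a minimal sketch in Python. It is not part of SDL Studio 2009 or PangeaMT: the function names `redact` and `restore`, the placeholder format and the two regular expressions are all assumptions made for this example, and a real project would need rules agreed with the client (names, product codes, contract numbers and so on).

```python
import re

# Hypothetical patterns, for illustration only. They are deliberately
# simple and will not catch every e-mail address or phone number.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(segment, extra_terms=()):
    """Replace sensitive spans with numbered placeholders.

    Returns the redacted text and a mapping from placeholder to the
    original value, which stays on the local machine.
    """
    mapping = {}

    def store(label, value):
        key = f"[{label}_{len(mapping) + 1}]"
        mapping[key] = value
        return key

    # Pattern-based redaction (e-mail addresses, phone numbers).
    for label, pattern in PATTERNS.items():
        segment = pattern.sub(
            lambda m, label=label: store(label, m.group(0)), segment
        )

    # Client-specific terms supplied by the user (assumed to be literal strings).
    for term in extra_terms:
        segment = re.sub(
            re.escape(term), lambda m: store("TERM", m.group(0)), segment
        )

    return segment, mapping


def restore(segment, mapping):
    """Put the original values back into text that still carries placeholders."""
    for key, value in mapping.items():
        segment = segment.replace(key, value)
    return segment
```

For example, `redact("Contact jane.doe@example.com about the Acme merger.", extra_terms=["Acme"])` yields the text `Contact [EMAIL_1] about the [TERM_2] merger.` plus the mapping needed to restore it afterwards. Restoring only works if the MT system leaves the placeholders untouched, which is not guaranteed, so the output should still be checked by a human.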
We at Pangeanic strongly believe that MT doesn't have to compromise your confidentiality. Customized MT solutions, such as PangeaMT, are not only domain-specific but customer-focused! Data that is critical has to be handled with care and a sense of responsibility. We take the time to remove any sensitive information from the data provided by the client before it is used for customized engine training, and we appreciate the client's collaboration in giving us pointers about this. If the PangeaMT solution is to be used exclusively at the client's end or within an intranet, there is usually no special request from them in this respect. But as soon as the solution is going to be part of a corporate extranet or website, special measures are taken not to disclose sensitive data.
To sum up, we take your corporate data and, if you are a translator or an LSP, your customers' data, very seriously. Data gathering, selection and cleaning are essential steps in making a customized MT engine perform more accurately. Sensitive data handling should be part of any MT development or deployment practice. Whether our competitors think the same…? Well, you can now make up your own mind about it!