by Elia Yuste
Over the last year or so, TAUS has been tracking the experiences of companies pioneering a radically new approach to MT engine training. Pangeanic is one of the most notable cases: earlier this year we were announced as the first LSP to create a new business stream from TAUS Data Association (TDA) data. PangeaMT, Pangeanic's technology division for customized MT solutions and consulting, was then invited to take part in the proof-of-concept of the TAUS MT Trainer and to present its results at the TAUS Executive Forum in Copenhagen in late May 2010.
The idea behind the MT Trainer, a web-based facility from TAUS TDA due to launch later this year, is twofold. First, it aims to encourage the active use of TDA data for MT engine training. Second, it aims to connect MT service commissioners and providers under the TAUS umbrella: commissioners submit their data files (reference files for engine training and files to be translated), and providers return the MT output within a short time. The MT Trainer has a companion facility, the MT Evaluator, which lets the commissioner or client assess the uploaded MT output using standard automatic metrics.
To test the viability of this twofold initiative, the MT Trainer pilot was discussed among the selected partners and launched about two weeks before the Copenhagen meeting. The question was whether the MT customization workflow could be automated using client data together with TDA data. On one side, Adobe, eBay and McAfee were the three prospective MT commissioners, seeking trained engines and metrics to measure output quality. On the other, Languagelens, PangeaMT and Tilde were the three selected MT providers. All of us were able to deliver customized MT engines within 24 hours, and the quality of the resulting output was measured with BLEU scores. In Pangeanic's case, the challenges of both speed and acceptable quality were met without any problem.
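For readers less familiar with BLEU, the metric behind the pilot's quality figures, here is a minimal Python sketch of corpus-level BLEU: clipped n-gram precisions up to 4-grams, combined by a geometric mean and scaled by a brevity penalty that punishes output shorter than the reference. It assumes whitespace-tokenized text and a single reference per segment, and the function names are hypothetical. It is not the scoring code used in the pilot or by the MT Evaluator; production toolkits add smoothing, multiple references and standardized tokenization. Scores here range from 0 to 1 and are often reported multiplied by 100.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Sketch of corpus BLEU with one reference per segment and no smoothing.

    hypotheses, references: parallel lists of token lists.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams in the MT output, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if the output is longer than the reference, else exp(1 - r/c).
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


hyp = "the cat sat on a mat".split()
ref = "the cat sat on the mat".split()
print(round(corpus_bleu([hyp], [ref]), 3))  # → 0.537
```

In this toy example the precisions are 5/6, 3/5, 2/4 and 1/3, and since both sentences have the same length the brevity penalty is 1, giving a score of about 0.537 (roughly 53.7 on the usual 0 to 100 scale).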
If these two TDA services, the MT Trainer and the MT Evaluator, are well received and regularly used by members, they will drive more data uploads and downloads and reinforce the value of sharing relevant, domain-specific data for MT training. This should also lead to a much-desired increase in membership and in overall member activity within TAUS. For Pangeanic, it will mean more visibility in the MT arena, quicker access to high-calibre clients (whose content and domain specifics are, incidentally, already familiar to us), and a controlled environment in which to offer our MT services.
Beyond the MT Trainer & Evaluator proof-of-concept, the Copenhagen event sparked many fruitful discussions among MT practitioners and newcomers. Besides describing the ins and outs of our engine training for eBay in the MT Trainer pilot, we had interesting conversations about how PangeaMT has overcome some of Moses' shortcomings. Our TMX filter and inline mark-up parser were praised as features our industry badly needs, and they have made us stand out from the (S)MT crowd.
Other takeaways from the TAUS Copenhagen event included the convergence of MT, open platforms and application contexts (e.g. corporate support), insights into TDA members' experiences, and the collective wisdom gathered from forward-looking table discussions on a number of hot language-industry topics. A full report on the event can be found here and can also be downloaded from the TAUS website.