CORPORATE LANGUAGE SOLUTION
Processing large amounts of data in multiple formats
Multilingual voice recordings, texts and documents are processed into a single document database that can be searched using NLP to extract the information relevant to the facts of a legal case.
As part of the proceedings of a legal case, the contents of personal computers, voice recorders and printed texts are stored.
The transcribed language resources were the equivalent of 350 times the complete works of Shakespeare. Finding the needles in the multilingual haystack required an automated solution.
The Corporate Language Solution is built from several neural networks that process speech and text. The processes involve:
Transcription, speech-to-text conversion.
Translation, generating monolingual (English) versions of all language resources.
Sentiment Analysis, detecting the positive or negative tone of text excerpts.
Summarization, abstracting paragraphs into short sentences.
Indexing, locating and referencing entities (people, organizations, dates, locations, money amounts, key words) in the batch of documents.
Categorization, ranking and sorting documents according to class, category and relevance.
How does it work?
The Corporate Solution runs on 2-3 servers at the customer's premises. No interaction with external third parties is needed, and the information stays within the customer's data center.
Multiformat resources are loaded into the input area, and the first process transcribes them into single-format text:
◦ Images / Raster files are OCRed
◦ Speech is transcribed and actors are detected and referenced
◦ PDF, Word and PowerPoint formats are all converted to plain text files.
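The routing in this first step can be pictured as a lookup from file type to conversion step. The following is only a minimal sketch: the extension list, the step names and the function `choose_converter` are illustrative assumptions, not part of the actual product.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the conversion step
# described above (OCR, speech transcription, text extraction).
CONVERTERS = {
    ".png": "ocr",
    ".jpg": "ocr",
    ".tif": "ocr",
    ".wav": "speech_to_text",
    ".mp3": "speech_to_text",
    ".pdf": "text_extraction",
    ".docx": "text_extraction",
    ".pptx": "text_extraction",
}


def choose_converter(filename: str) -> str:
    """Return the name of the conversion step for a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    # Unknown formats are flagged rather than guessed at.
    return CONVERTERS.get(suffix, "unsupported")
```

A file such as `scan.TIF` would be routed to the OCR step, while a file with no recognized extension would be flagged as unsupported for manual review.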
Language Detection and Translation: The language is detected at paragraph level, and if it is not English, the text is sent through a neural network specifically designed to translate from the source language into English.
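As a rough illustration of paragraph-level routing, the sketch below detects a language for each paragraph and passes only non-English paragraphs to a translator. The stopword-based `detect_language` is a toy stand-in for a trained detector, and `translate` is a hypothetical callable supplied by the caller; neither reflects the actual models used.

```python
from typing import Callable, List

# Toy stopword lists used only for illustration; a production system
# would rely on a trained language-identification model instead.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "to", "in"},
    "es": {"el", "la", "de", "y", "es", "en", "los"},
    "fr": {"le", "la", "et", "les", "des", "est", "un"},
}


def detect_language(paragraph: str) -> str:
    """Guess the language by counting stopword hits; 'unknown' if none match."""
    words = paragraph.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def to_english(text: str, translate: Callable[[str, str], str]) -> List[str]:
    """Split text into paragraphs and translate every non-English one.

    `translate(paragraph, source_lang)` is an assumed interface; paragraphs
    whose language is 'unknown' are also passed to it with that label.
    """
    result = []
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # skip empty paragraphs
        lang = detect_language(paragraph)
        result.append(paragraph if lang == "en" else translate(paragraph, lang))
    return result
```

Detecting per paragraph rather than per document matters for mixed-language material, where a single file may switch languages partway through.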
Specific neural networks receive the monolingual input and produce records with the relevant results (sentiment, importance, referenced entities…) and a graph model that can later be used to find references in the raw data.
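The structure of the graph model is not described here. One plausible reading, shown purely as an assumption, is an index from each entity to the documents that mention it, plus a co-occurrence graph linking entities that appear in the same document. The function `build_entity_graph` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_entity_graph(
    records: Dict[str, List[str]],
) -> Tuple[Dict[str, Set[str]], Dict[Tuple[str, str], int]]:
    """Build an entity-to-documents index and a co-occurrence edge count.

    `records` maps a document id to the entities extracted from it.
    Edge keys are alphabetically ordered entity pairs.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for doc_id, entities in records.items():
        unique = sorted(set(entities))  # de-duplicate within a document
        for entity in unique:
            index[entity].add(doc_id)
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1  # one co-occurrence per shared document
    return index, edges


# Hypothetical extraction results for two documents.
records = {
    "doc1": ["Acme Corp", "J. Smith"],
    "doc2": ["J. Smith", "Acme Corp", "Geneva"],
}
index, edges = build_entity_graph(records)
```

With a structure like this, looking up an entity returns every raw document that references it, and the edge counts hint at which people, organizations and places tend to appear together.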
Support services manage the collaborative workflow by pipelining the data to and from the neural networks and distributing the load to make efficient use of the hardware and software resources.
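The pipelining and load distribution can be sketched with a small worker pool that runs each document through an ordered list of stages. This is a simplified illustration using Python's standard library; the names `run_pipeline` and `process_batch`, and the use of threads rather than separate servers, are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# A stage takes a document's text and returns the transformed text.
Stage = Callable[[str], str]


def run_pipeline(document: str, stages: Iterable[Stage]) -> str:
    """Pass one document through every stage in order."""
    for stage in stages:
        document = stage(document)
    return document


def process_batch(documents: List[str], stages: List[Stage], workers: int = 4) -> List[str]:
    """Process documents concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda doc: run_pipeline(doc, stages), documents))
```

In a real deployment each stage would call out to a model service, and the pool size would be tuned to the available hardware.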
Sometimes there’s simply too much data and there’s no way the relevant information can be found using traditional resources.
Cost and delivery time are both strong reasons to consider an automated analysis of language resources.
We like Pangeanic's work ethos and professionalism. They actively listen to their clients - and that helps them be the best every day to provide tailored language solutions. From my point of view, that's one of their greatest qualities.
Pangeanic makes the translation process easy... And they provide a friendly, fast translation service. Creating a database for all our translations was particularly useful so we could recycle translations and re-use content in other occasions and/or similar jobs.
The quality is excellent as usual. The source has been changed many times during the translation. Pangeanic was quick to respond to the changes and it was helpful.