Our language detector can successfully be used to:
Preprocess text before machine translation
Clean and enhance text in advance, improving the quality of the data used to train algorithms
Organize data (speech-to-text output, documents, etc.) before further processing
Extract bilingual texts from online resources for machine translation
Retrieve, group, and understand relevant information (user texts, e-mails, etc.) in a multilingual environment
Pangeanic's language detector accurately identifies both the language of the entire document and the language of each fragment, paragraph, or section.
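To illustrate the difference between document-level and fragment-level detection, the sketch below labels each paragraph separately and then picks the document language by the share of characters each language covers. This is a minimal sketch, not Pangeanic's implementation. The blank-line paragraph splitting, the length-weighted vote, and the stopword-based `toy_detect` function are all hypothetical stand-ins. `toy_detect` exists only so the example runs and would be replaced by a real detector.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical toy detector, used only so the example runs: it counts a few
# common function words per language. A real detector would replace it.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "es": {"el", "la", "y", "es", "de"},
    "fr": {"le", "la", "et", "est", "les"},
}


def toy_detect(text: str) -> str:
    """Return the language whose stopwords occur most often, or 'und' if none occur."""
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "und"


def detect_document(
    text: str, detect: Callable[[str], str] = toy_detect
) -> tuple[Optional[str], list[tuple[str, str]]]:
    """Label each paragraph, then choose the document language by character share."""
    # Hypothetical segmentation: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    labels = [(p, detect(p)) for p in paragraphs]
    # Each paragraph votes for its language with a weight equal to its length.
    weight: Counter = Counter()
    for paragraph, lang in labels:
        weight[lang] += len(paragraph)
    doc_lang = weight.most_common(1)[0][0] if weight else None
    return doc_lang, labels
```

A mixed English/Spanish text therefore gets one label per paragraph plus an overall label for the language that covers the most characters.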
Our language detector combines statistical and neural technologies to achieve the highest recognition accuracy. Our own algorithm is based on the vector space model, a mathematically well-founded approach.
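The description does not say how the statistical and neural components are combined. One simple possibility, shown here purely as an assumption, is linear interpolation of the per-language scores produced by the two models. The function name `combine_scores` and the weight `alpha` are hypothetical.

```python
def combine_scores(
    statistical: dict[str, float], neural: dict[str, float], alpha: float = 0.5
) -> dict[str, float]:
    """Interpolate per-language scores: alpha * statistical + (1 - alpha) * neural.

    A language missing from one model's output is treated as scoring 0.0 there.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    languages = statistical.keys() | neural.keys()
    return {
        lang: alpha * statistical.get(lang, 0.0) + (1 - alpha) * neural.get(lang, 0.0)
        for lang in languages
    }
```

With `alpha=1.0` only the statistical scores count; with `alpha=0.0` only the neural ones do.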
We build a multidimensional space in which the content of each document is represented as a vector, using n-grams to compute the frequencies that make up its coordinates. The algorithm then analyzes the positions of the relevant vectors in this space to determine how similar they are.
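The vector space idea can be sketched in a few lines of Python. The sketch assumes character trigrams as the n-grams and cosine similarity as the measure of closeness between vectors. Neither choice is confirmed by the description above. The three reference profiles are built from single toy sentences, whereas a real system would build them from large corpora. The names `ngram_vector`, `cosine`, `PROFILES`, and `identify` are illustrative.

```python
import math
from collections import Counter


def ngram_vector(text: str, n: int = 3) -> Counter:
    """Map text to a sparse vector of character n-gram frequencies."""
    padded = f" {text.lower()} "  # pad so word-initial and word-final n-grams are counted
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (1.0 means same direction)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical reference profiles, each built from one toy sentence.
PROFILES = {
    "en": ngram_vector("the quick brown fox jumps over the lazy dog and the cat is sleeping in the house"),
    "es": ngram_vector("el rápido zorro marrón salta sobre el perro perezoso y el gato duerme en la casa"),
    "de": ngram_vector("der schnelle braune fuchs springt über den faulen hund und die katze schläft im haus"),
}


def identify(text: str) -> dict[str, float]:
    """Rank candidate languages by cosine similarity between text and profile vectors."""
    vec = ngram_vector(text)
    scores = {lang: cosine(vec, profile) for lang, profile in PROFILES.items()}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

For example, `identify("the dog is sleeping in the house")` ranks `en` first, because almost all of its trigrams also occur in the English profile. Cosine similarity compares the direction of the vectors rather than their length, so short and long texts can be compared against the same profile.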
Finally, the combined output of the algorithm is corrected using dedicated linguistic rules developed by our team of expert linguists.
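The rules themselves are not described. The sketch below is a hypothetical example of a rule layer applied after statistical scoring. It adds a fixed bonus to a language when the text contains characters that are strong evidence for that language, which can help separate closely related languages such as Spanish and Portuguese. The `RULES` table, the bonus values, and `apply_rules` are invented for illustration.

```python
# Hypothetical rules: (language, characters that strongly suggest it, score bonus).
RULES = [
    ("es", set("ñ¿¡"), 0.2),
    ("pt", set("ãõ"), 0.2),
    ("de", set("ß"), 0.2),
]


def apply_rules(text: str, scores: dict[str, float]) -> str:
    """Adjust model scores with character-based rules and return the top language."""
    adjusted = dict(scores)
    chars = set(text.lower())
    for lang, markers, bonus in RULES:
        # Only boost languages the statistical model already proposed.
        if lang in adjusted and chars & markers:
            adjusted[lang] += bonus
    return max(adjusted, key=adjusted.get)
```

Here the model's slight preference for Portuguese in `apply_rules("¿Dónde está el niño?", {"es": 0.40, "pt": 0.45})` is overridden, and the function returns `"es"`.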
For evaluation purposes, we created a demo page that detects the most widely used languages. It achieves a language identification accuracy of 95% to 99% (typical competitor results: 86% to 96%), with an average processing speed of over 8,000 KB/s.