What does Pangeanic's automatic text classification consist of?
It is a set of modules that implement common classification tasks. The modules can be used together for text classification, or each one can work as a separate, high-level component.
The details are flexible: for example, you can choose which categorization algorithm to use, which features of the documents (words or other kinds) should be used, or how those features are chosen automatically, which format the documents are in, and so on.
How do I customize my module?
Customizing this module usually starts with obtaining a collection of pre-categorized documents from the organization. Pangeanic trains its deep neural networks to recognize the characteristics of each document and differentiate it from the others. This creates a "knowledge graph" representation, from which the categorizer learns to recognize a particular body of knowledge. The trained model is saved and can then be used to perform queries.
There are several ways to perform queries. The top-level text classification module returns an overall category for a document, using the top-level category classifier. Alternatively, you can use the interfaces of the individual category classifiers directly.
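The train, save and query cycle described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it replaces the neural networks with plain word counts, and the function names (train, save, load, query), the JSON file format and the sample documents are all hypothetical, not part of Pangeanic's product.

```python
import json
import os
import tempfile
from collections import Counter


def train(labelled_docs):
    """Build one word-count profile per category from pre-categorized documents."""
    profiles = {}
    for category, text in labelled_docs:
        profiles.setdefault(category, Counter()).update(text.lower().split())
    return profiles


def save(profiles, path):
    """Persist the trained profiles as JSON (hypothetical storage format)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(profiles, handle)


def load(path):
    """Reload the profiles written by save()."""
    with open(path, encoding="utf-8") as handle:
        return {cat: Counter(counts) for cat, counts in json.load(handle).items()}


def query(profiles, text):
    """Return the category whose profile shares the most word occurrences with the text."""
    words = text.lower().split()
    return max(profiles, key=lambda cat: sum(profiles[cat][w] for w in words))


# Hypothetical pre-categorized documents, for illustration only.
docs = [
    ("hr", "vacation policy and employee onboarding"),
    ("sales", "monthly sales report and customer orders"),
]
path = os.path.join(tempfile.mkdtemp(), "categorizer.json")
save(train(docs), path)

restored = load(path)
print(query(restored, "new employee vacation request"))
```

The point of the sketch is the separation of steps: training happens once on the organization's documents, and every later query only needs the saved result.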
Accuracy of text classification
Our semantic tool automatically classifies documents by content and organizes them into the categories of a general scheme such as Eurovoc, or into categories customized to your organization's structure, terminology and processes. Categories can be legal, compliance, human resources, research and development, accounting and finance, reports (sales, management, etc.), customer feedback, newsletters, and many more. The user is free to define the categories, because the categorization algorithms place no restrictions on them.
Pangeanic's text classification is an ideal solution for:
- Managing business/knowledge content
- Categorizing financial documentation
- Pre-classifying secure documents
- Evaluating new trends in business, science, and technology
- Managing enterprise information
- Identifying and analyzing the state of patent techniques
- Automated assistance systems
The Pangeanic Categorizer is available as a server application for on-site use or as SaaS.
The algorithms of the Pangeanic Categorizer are based on deep machine learning techniques. Our approach to document categorization runs in two phases: training and prediction.
In the training phase, the Pangeanic Categorizer builds a classifier by learning from a set of model documents for each category. Its learning algorithm uses a wide range of semantic features extracted from the documents:
- Words with grammatical category labels
- Noun phrases and their syntactic dependencies
- Complex semantic relationships detected by our linguistic processor
The training process creates models that, in the prediction phase, use the vector space model to categorize documents. Each incoming text is compared with the semantic features of each category model, and the degree of proximity between them is calculated. The document is assigned to the category with the highest relevance value.
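As a rough illustration of the vector space model, the sketch below represents each text as a term-frequency vector, sums the vectors of each category's model documents during training, and uses cosine similarity as the degree of proximity during prediction. It makes several assumptions: plain word counts stand in for the semantic features listed above, cosine similarity is one common choice of proximity measure rather than necessarily the one Pangeanic uses, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a sparse term-frequency vector (word -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(examples):
    """Training phase: sum the vectors of each category's model documents."""
    models = {}
    for category, text in examples:
        models.setdefault(category, Counter()).update(vectorize(text))
    return models


def predict(models, text):
    """Prediction phase: score every category and return the closest one."""
    doc = vectorize(text)
    scores = {category: cosine(doc, model) for category, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical model documents, for illustration only.
EXAMPLES = [
    ("finance", "quarterly invoice payment budget revenue"),
    ("finance", "annual budget forecast and tax payment"),
    ("legal", "contract clause liability agreement court"),
    ("legal", "compliance regulation agreement and liability"),
]

models = train(EXAMPLES)
category, scores = predict(models, "please review the payment and budget figures")
print(category, scores)
```

Here the returned scores play the role of the relevance values: the text is assigned to whichever category model it is closest to in the vector space.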