Classify text and documents automatically. Categorization and classification solve knowledge bottlenecks and tap into information silos.

Manually classifying data, from customer emails to financial or insurance claims, is time-consuming and error-prone. Our AI-powered text classification solution can help you automate this process, freeing up your time so you can focus on other tasks.

Talk to an expert


Automate text classification with our AI-powered solution

Do you have a large volume of emails or documents that need to be classified? No two needs are alike, which is why we build bespoke AI-powered text classification solutions for each client, according to their taxonomy and requirements. We help you automate tedious processes that don't scale. We use machine learning to learn the patterns in your data and add our expertise as computational linguists. Once our AI has learned these patterns, it can automatically classify new emails or documents into the appropriate categories.

What does Pangeanic's automatic text classification consist of?

It is a set of modules that implement common classification tasks. The modules can be used as part of a text classification workflow or on their own as separate, high-level components.

The details are flexible: for example, you can choose which categorization algorithm to use, which features of the documents (words or other kinds) should be used, how those features are selected automatically, and which format the documents are in.
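
As a rough illustration of what "choosing which features to use" can mean, the sketch below switches between two common feature types: whole words and character n-grams. It is not the Pangeanic API. The names `FEATURE_EXTRACTORS` and `extract_features`, and the two feature kinds, are assumptions made up for this example.

```python
import re
from collections import Counter


def word_features(text: str) -> Counter:
    """Bag of lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngram_features(text: str, n: int = 3) -> Counter:
    """Bag of overlapping character n-grams, often more robust on noisy text."""
    cleaned = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


# Hypothetical registry: maps a configuration value to a feature extractor.
FEATURE_EXTRACTORS = {
    "words": word_features,
    "char_ngrams": char_ngram_features,
}


def extract_features(text: str, kind: str = "words") -> Counter:
    """Return the feature counts for `text` using the configured feature kind."""
    try:
        extractor = FEATURE_EXTRACTORS[kind]
    except KeyError:
        raise ValueError(f"unknown feature kind: {kind!r}") from None
    return extractor(text)
```

Calling `extract_features(email_body, "char_ngrams")` instead of the default `"words"` changes the representation without touching the rest of a pipeline, which is the kind of flexibility described above.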


How do I customize my module?

The process of customizing this module usually involves obtaining a collection of pre-categorized documents from the organization. Pangeanic trains its deep neural networks to recognize the characteristics of each document and differentiate it from others. This creates a "knowledge graph" representation, which trains the categorizer to recognize a particular set of knowledge. This trained set is saved and can be used for performing queries.
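
The train, save, and query cycle can be pictured with a deliberately simplified sketch. It assumes the pre-categorized collection arrives as (text, category) pairs, and it replaces the deep neural networks mentioned above with plain per-category word counts so that it runs with the standard library alone. All function names, the JSON storage format, and the toy documents are invented for illustration.

```python
import json
import re
from collections import Counter, defaultdict

# Toy stand-in for a client's pre-categorized collection (invented data).
TOY_DOCS = [
    ("Invoice due at the end of the month", "finance"),
    ("Invoice paid in full", "finance"),
    ("Holiday request for August", "hr"),
]


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(labeled_docs):
    """Sum word counts per category from (text, category) pairs."""
    profiles = defaultdict(Counter)
    for text, category in labeled_docs:
        profiles[category].update(tokenize(text))
    # Plain dicts so the trained set can be serialized as JSON.
    return {category: dict(counts) for category, counts in profiles.items()}


def save_model(profiles, path):
    """Persist the trained category profiles to disk."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(profiles, fh)


def load_model(path):
    """Reload previously trained category profiles for querying."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

A typical use would be `save_model(train(TOY_DOCS), "model.json")` once, followed by `load_model("model.json")` whenever new documents need to be classified.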


There are several ways to perform queries. The top-level text classification module returns an overall category for each document. You can also use the interfaces of the individual category classifiers beneath it.

Accuracy of text classification

Our semantic tool automatically classifies documents by content and organizes them into general category schemes such as EuroVoc, or into categories customized according to your organization's structure, terminology and processes. Categories can be legal, compliance, human resources, research and development, accounting and finance, reports (sales, management, etc.), customer feedback, newsletters, and many more. The definition of the categories can be freely chosen by the user, as it is not restricted by the categorization algorithms.


Pangeanic's text classification is an ideal solution for:

  • Managing business and knowledge content
  • Categorizing financial documentation
  • Pre-classifying secure documents
  • Evaluating new trends in business, science, and technology
  • Improving your spam filtering
  • Organizing your email inbox
  • Managing enterprise information
  • Identifying and analyzing the state of the art in patents
  • Supporting automated assistance systems
  • Categorizing your documents for easier retrieval
  • Gaining insights into your customer data

The Pangeanic Categorizer is available as a server application for on-site use or as SaaS.

Categorization technology

The algorithms of the Pangeanic Categorizer are based on deep machine learning techniques. Our approach to document categorization is executed in two phases: training and prediction.

In the training phase, the Pangeanic Categorizer builds a classifier by learning from a set of model documents for each category. Its learning algorithm uses a wide range of semantic features extracted from the documents:

  • Words with grammatical category labels
  • Noun phrases and their syntactic dependencies
  • Complex semantic relationships detected by our linguistic processor

The training process creates a model for each category. In the prediction phase, the Categorizer applies the vector space model: each incoming text is compared with the semantic features of every category model, and the degree of proximity between them is calculated. The document is assigned to the category with the highest relevance value.
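
To make the prediction step concrete, here is a minimal sketch of vector space classification. It assumes plain word counts as features and cosine similarity as the proximity measure. The source does not say which measure or features the Pangeanic Categorizer actually uses, so both are assumptions, as are the function names and the two toy category vectors.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Represent a text as a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, category_vectors):
    """Return the best category and the proximity score for every category."""
    doc = vectorize(text)
    scores = {name: cosine(doc, vec) for name, vec in category_vectors.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy category models (invented); real models would come from training.
CATEGORY_VECTORS = {
    "finance": vectorize("invoice payment balance invoice tax"),
    "hr": vectorize("holiday leave payroll contract"),
}
```

For example, `classify("Please send the invoice and payment details", CATEGORY_VECTORS)` selects `"finance"`, because that text shares terms with the finance vector and none with the HR vector. The returned `scores` dictionary also exposes the per-category relevance values rather than only the winning label.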

Do you want to automatically categorize documents with knowledge classifiers?

Talk to an expert