Text and Transaction Data Processing

The global scientific community uses topic modeling to solve applied data analysis tasks. However, the market lacks a flexible, fast, and efficient tool for building hierarchical topic models of multimodal data. The goal is to create a tool for rapid analysis and prototyping that can be used as an add-on to the BigARTM topic modeling library.

Input Data

  • Text information about the collection of documents
  • Transactional information about customer behavior in the system
  • Additional multimodal information

Output Data

  • Multimodal model of soft hierarchical clustering
  • Topic data flow visualizer
  • Flexible tool for selecting model learning strategies
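
To make "soft hierarchical clustering" concrete, here is a minimal sketch in plain Python. It assumes a two-level hierarchy in which each document has a probability distribution over child topics and each child topic is linked to parent topics by probabilities p(parent | child). The topic names, the numbers, and the helper `parent_distribution` are hypothetical illustrations, not part of the tool or of BigARTM.

```python
def parent_distribution(child_probs, parent_given_child):
    """Aggregate a document's child-topic probabilities to the parent level.

    Computes p(parent | doc) = sum over children of
    p(parent | child) * p(child | doc).
    """
    parents = {}
    for child, p_child in child_probs.items():
        for parent, p_link in parent_given_child[child].items():
            parents[parent] = parents.get(parent, 0.0) + p_link * p_child
    return parents


# Hypothetical document: a soft assignment over three child topics.
doc_children = {"loans": 0.5, "cards": 0.3, "travel": 0.2}

# Hypothetical hierarchy links: p(parent | child) for each child topic.
links = {
    "loans": {"finance": 1.0},
    "cards": {"finance": 0.5, "leisure": 0.5},
    "travel": {"leisure": 1.0},
}

doc_parents = parent_distribution(doc_children, links)
```

The document is never forced into a single cluster: it keeps a weight for every topic at each level, and the parent-level weights still sum to one because each child's link probabilities do.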

Main Results

A flexible tool for building hierarchical topic models, supporting:

  • Text and transactional data
  • Automatic selection of modality weights
  • Adjustment of regularizer weights
  • Topic balancing by size (topic power)
  • Identification of topics in a stream of new data
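
As an illustration of what automatic selection of modality weights could look like, the sketch below runs an exhaustive grid search over candidate weights. This is an assumed, generic strategy, not necessarily the one the tool uses. The function `select_modality_weights` and the scoring function `toy_score` are hypothetical; in practice the scoring step would train a topic model with the given weights and return a quality measure.

```python
from itertools import product


def select_modality_weights(score_fn, modalities, grid):
    """Try every combination of grid values and keep the best-scoring one."""
    best_weights, best_score = None, float("-inf")
    for combo in product(grid, repeat=len(modalities)):
        weights = dict(zip(modalities, combo))
        score = score_fn(weights)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


def toy_score(weights):
    # Hypothetical stand-in for "train a model and measure its quality".
    # It peaks when text has weight 1.0 and transactions have weight 0.5.
    return -(weights["text"] - 1.0) ** 2 - (weights["transactions"] - 0.5) ** 2


best, score = select_modality_weights(
    toy_score,
    modalities=["text", "transactions"],
    grid=[0.25, 0.5, 1.0, 2.0],
)
```

The cost of a grid search grows exponentially with the number of modalities, so with many modalities a coordinate-wise or random search may be the more practical choice.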

Impact on the research process

  • Reducing prototyping time on new data by 60%
  • Increasing the speed of analyzing topic modeling results by 40%
  • Automatically obtaining a baseline at 90% of maximum quality

The final quality of the semantic core selection:

Quality criterion                          Value
Proportion of interpretable topics         80%
Number of hierarchy levels                 3
Proportion of automated hyperparameters    70%
Proportion of errors                       60%

Quality of the resulting soft clustering system:

Clustering quality criterion               Value
Accuracy                                   90%