Text and Transaction Data Processing
The global scientific community uses topic modeling to solve applied data analysis tasks. However, the market lacks a flexible, fast, and efficient tool for building hierarchical topic models of multimodal data. The goal is to create a tool for rapid analysis and prototyping that can be used as an add-on to the BigARTM topic modeling library.
Input Data
- Text information about the collection of documents
- Transactional information about the behavior of customers in the system
- Additional multimodal information
Output Data
- Multimodal model of soft hierarchical clustering
- Topic data flow visualizer
- Flexible tool for selecting model training strategies
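To make "soft hierarchical clustering" concrete, here is a minimal sketch. It assumes that each document has a probability distribution over lower-level topics and that each lower-level topic is softly linked to upper-level topics. The function `parent_membership` and the toy numbers are hypothetical illustrations, not part of the project or of BigARTM.

```python
def parent_membership(doc_child_probs, parent_given_child):
    """Marginalize soft child-topic membership up to parent topics.

    doc_child_probs[c]       -- p(child topic c | document), sums to 1
    parent_given_child[c][p] -- p(parent topic p | child topic c), each row sums to 1
    """
    num_parents = len(parent_given_child[0])
    return [
        sum(doc_child_probs[c] * parent_given_child[c][p]
            for c in range(len(doc_child_probs)))
        for p in range(num_parents)
    ]

# Toy document: soft membership in three lower-level topics.
doc_child_probs = [0.5, 0.3, 0.2]
# Hypothetical soft links: the middle child topic belongs equally to both parents.
parent_given_child = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
]

doc_parent_probs = parent_membership(doc_child_probs, parent_given_child)
print(doc_parent_probs)  # approximately [0.65, 0.35]
```

Unlike hard clustering, the document is not assigned to a single cluster: it keeps a probability for every topic at every level of the hierarchy.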
Main Results
A flexible tool for building hierarchical topic models, supporting:
- Text and transactional data
- Automatic selection of modality weights
- Adjustment of regularizer weights
- Balancing of topics by size (power)
- Identification of topics in a stream of new data
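The role of a modality weight can be sketched as follows. In multimodal topic modeling with additive regularization, the objective is commonly written as a weighted sum of per-modality log-likelihoods, so a larger weight makes that modality influence the topics more. The code below is a toy illustration under that assumption; the function names, token names, and numbers are hypothetical, and it does not use or reproduce the BigARTM API.

```python
import math

def modality_log_likelihood(counts, phi, theta):
    """Sum over documents d and tokens w of n_dw * ln(sum_t phi[w][t] * theta[t][d])."""
    total = 0.0
    for d, doc in enumerate(counts):
        for token, n_dw in doc.items():
            p_wd = sum(phi[token][t] * theta[t][d] for t in range(len(theta)))
            total += n_dw * math.log(p_wd)
    return total

def weighted_log_likelihood(modalities, theta, weights):
    """Weighted sum of per-modality log-likelihoods; weights[name] is the modality weight."""
    return sum(
        weights[name] * modality_log_likelihood(counts, phi, theta)
        for name, (counts, phi) in modalities.items()
    )

# theta[t][d] = p(topic t | document d): two topics, two documents.
theta = [[0.8, 0.1],
         [0.2, 0.9]]

# phi[token][t] = p(token | topic t), defined separately for each modality.
text_phi = {"loan": [0.9, 0.2], "travel": [0.1, 0.8]}
tx_phi = {"mcc_bank": [0.7, 0.3], "mcc_airline": [0.3, 0.7]}

# Token counts per document for each modality.
text_counts = [{"loan": 3, "travel": 1}, {"travel": 4}]
tx_counts = [{"mcc_bank": 2}, {"mcc_airline": 5}]

modalities = {
    "text": (text_counts, text_phi),
    "transactions": (tx_counts, tx_phi),
}

ll_text = modality_log_likelihood(text_counts, text_phi, theta)
ll_tx = modality_log_likelihood(tx_counts, tx_phi, theta)
combined = weighted_log_likelihood(
    modalities, theta, {"text": 1.0, "transactions": 2.0}
)
print(ll_text, ll_tx, combined)
```

Automatic selection of modality weights then amounts to searching over the `weights` dictionary for values that give the best quality on a chosen criterion; how the tool performs that search is not described here.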
Impact on the research process
- Reducing prototyping time on new data by 60%
- Increasing the speed of analyzing topic modeling results by 40%
- Automatically obtaining a baseline at 90% of maximum quality
The final quality of the semantic core selection:
Quality criterion | Quality |
---|---|
Proportion of interpretable topics | 80 % |
The number of hierarchy levels | 3 |
Proportion of automated hyperparameters | 70 % |
Proportion of errors | 60 % |
Quality of the resulting soft clustering system:
Clustering quality criterion | Quality |
---|---|
Accuracy | 90 % |