Topic modelling

A text collection is a set of documents, each of which is a multiset of terms. Terms are elements of the vocabulary, a finite set W.
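As a minimal sketch of these definitions, a multiset of terms can be held in a Python `Counter`. The toy documents and the names `raw_documents`, `documents` and `vocabulary` are made up for illustration; they are not part of VisARTM or BigARTM.

```python
from collections import Counter

# Hypothetical toy collection: each document is a list of already tokenized terms.
raw_documents = [
    ["topic", "model", "topic", "word"],
    ["word", "document", "word"],
]

# A multiset of terms maps each term to the number of times it occurs.
documents = [Counter(tokens) for tokens in raw_documents]

# The vocabulary W is the finite set of all terms seen in the collection.
vocabulary = sorted(set().union(*documents))
```

Word order is deliberately discarded here: only the term counts per document are kept.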

A probabilistic topic model is a mathematical model that describes each document and each term as a discrete probability distribution over a set of topics T.

To construct such distributions, the matrix F of word frequencies in documents is approximated by the product of two matrices, Φ and Θ. The columns of Φ are then the distributions of words within each topic, p(w|t), and the columns of Θ are the distributions of topics within each document, p(t|d).
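The factorization can be illustrated with a small pure-Python sketch of the EM algorithm for PLSA, the unregularized special case. This is not BigARTM's implementation. It assumes F is stored as a W x D list of rows, Φ is W x T and Θ is T x D with every column summing to 1; the function names and the toy counts are hypothetical.

```python
import random


def normalize_columns(matrix):
    """Scale every column of a list-of-rows matrix so that it sums to 1."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    for j in range(n_cols):
        total = sum(matrix[i][j] for i in range(n_rows))
        for i in range(n_rows):
            # Fall back to a uniform column if the column carries no mass.
            matrix[i][j] = matrix[i][j] / total if total > 0 else 1.0 / n_rows
    return matrix


def plsa(counts, n_topics, n_iters=50, seed=0):
    """Approximate a W x D count matrix by Phi (W x T) times Theta (T x D)."""
    rng = random.Random(seed)
    n_words, n_docs = len(counts), len(counts[0])
    phi = normalize_columns(
        [[rng.random() + 1e-3 for _ in range(n_topics)] for _ in range(n_words)])
    theta = normalize_columns(
        [[rng.random() + 1e-3 for _ in range(n_docs)] for _ in range(n_topics)])
    for _ in range(n_iters):
        n_wt = [[0.0] * n_topics for _ in range(n_words)]
        n_td = [[0.0] * n_docs for _ in range(n_topics)]
        for w in range(n_words):
            for d in range(n_docs):
                if counts[w][d] == 0:
                    continue
                # E-step: p(t | d, w) is proportional to phi[w][t] * theta[t][d].
                joint = [phi[w][t] * theta[t][d] for t in range(n_topics)]
                z = sum(joint)
                if z == 0:
                    continue
                for t in range(n_topics):
                    share = counts[w][d] * joint[t] / z
                    n_wt[w][t] += share
                    n_td[t][d] += share
        # M-step: renormalize the expected counts into distributions.
        phi = normalize_columns(n_wt)
        theta = normalize_columns(n_td)
    return phi, theta


# Toy W x D matrix: rows are words, columns are documents.
counts = [
    [4, 3, 0],
    [2, 5, 0],
    [0, 1, 6],
    [0, 0, 3],
]
phi, theta = plsa(counts, n_topics=2)
```

After the call, each column of `phi` is a distribution over words for one topic and each column of `theta` is a distribution over topics for one document, so the product of the two gives an estimate of p(w|d).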

One method for computing such a decomposition is ARTM (Additive Regularization of Topic Models). It is implemented in the open-source BigARTM library.
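For orientation, the optimization problem behind ARTM is usually written as below. The notation is taken from the ARTM literature and does not appear in this text: n_{dw} is the number of occurrences of term w in document d, R_i are regularizers and \tau_i are their non-negative weights.

```latex
\sum_{d \in D} \sum_{w \in d} n_{dw} \ln \sum_{t \in T} \varphi_{wt} \theta_{td}
  + \sum_{i} \tau_i R_i(\Phi, \Theta) \;\to\; \max_{\Phi, \Theta},
\qquad
\varphi_{wt} \ge 0, \;\; \sum_{w \in W} \varphi_{wt} = 1,
\qquad
\theta_{td} \ge 0, \;\; \sum_{t \in T} \theta_{td} = 1 .
```

With all \tau_i set to zero, the problem reduces to maximum-likelihood PLSA; the added regularizers are what the word "additive" in the name refers to.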

VisARTM

VisARTM is an interface for BigARTM. Its main purpose is visualizing topic models. It is aimed at two groups of users: researchers who build topic models themselves and want to visualize them for research purposes, and users who want to construct topic models without programming.

Main features of this service:

  • Uploading and preprocessing text collections.
  • Uploading prepared topic models.
  • Automatic building of topic models with BigARTM.
  • Automatic topic naming.
  • Automatic topic arranging (so-called topic spectrum).
  • Visualization of topic models:
    • Visualization of the topic distribution in a document.
    • Visualization of a topic as ranked lists of words and documents.
    • Temporal visualizations.
    • Hierarchical visualizations.
  • Search.
  • Assessment framework.
  • Automated research framework.
  • Conversion tools.