Linear Topic Modeling
Apr 17th, 2013
Interest in learning topics from a corpus of documents began with the publication of Latent Semantic Analysis (LSA) [Deerwester90], also called Latent Semantic Indexing (LSI) in the context of information retrieval.
LSA is a linear method based on the factorization of the document-word matrix.
In this model, documents are converted to the bag-of-words format, which ignores the ordering of words and keeps only the word counts.
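To make the bag-of-words step concrete, here is a minimal sketch of building a document-word count matrix. The function name `build_doc_word_matrix`, the whitespace tokenization, and the alphabetical vocabulary order are illustrative assumptions, not something the original text prescribes.

```python
from collections import Counter


def build_doc_word_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] counts vocab[j] in docs[i]."""
    # Assumption: naive tokenization (lowercase, split on whitespace).
    tokenized = [doc.lower().split() for doc in docs]
    # Sorted vocabulary gives every word a fixed column index.
    vocab = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # word order is discarded here
        matrix.append([counts[word] for word in vocab])  # missing words count 0
    return vocab, matrix


docs = ["the cat sat on the mat", "the dog sat"]
vocab, X = build_doc_word_matrix(docs)
```

Each row of `X` describes one document and each column one vocabulary word, so shuffling the words inside a document leaves its row unchanged.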
Singular Value Decomposition.
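The most common way is to use the Singular Value Decomposition (SVD) of the document-word matrix, written here as $X$: $X = U \Sigma V^\top$, where $U$ and $V$ have orthonormal columns and $\Sigma$ is a diagonal matrix of non-negative singular values.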
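As a rough illustration of LSA via SVD, the sketch below uses NumPy's `np.linalg.svd` and keeps only the `k` largest singular values. The function name `lsa`, the toy matrix, and the convention that documents are rows are assumptions made for this example.

```python
import numpy as np


def lsa(X, k):
    """Rank-k truncated SVD of a document-word matrix X (documents in rows)."""
    # Thin SVD: X = U @ diag(s) @ Vt, with s sorted in decreasing order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    doc_topics = U[:, :k] * s[:k]  # each row: a document in the k-dim latent space
    topic_words = Vt[:k, :]        # each row: a latent "topic" as weights over words
    return doc_topics, topic_words, s


# Toy corpus (made up): documents 0-1 use words 0-1, documents 2-3 use words 2-3.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

doc_topics, topic_words, s = lsa(X, k=2)
X_approx = doc_topics @ topic_words  # best rank-2 approximation in Frobenius norm
```

Note that the entries of `doc_topics` and `topic_words` can be negative, which is one reason the latent dimensions of LSA are hard to read as topics directly.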
Non-negative Matrix Factorization.
Another common approach is to use Non-negative Matrix Factorization (NMF) on the document-word matrix.
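The following is a minimal sketch of NMF, assuming the Frobenius-norm objective and the multiplicative update rules of Lee and Seung. The text does not say which NMF algorithm is meant, so this choice, the function name `nmf`, and the parameters `n_iter`, `eps`, and `seed` are assumptions for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative X (docs x words) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    # Random non-negative initialization; eps keeps entries strictly positive.
    W = rng.random((n_docs, k)) + eps
    H = rng.random((k, n_words)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity of W and H.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy document-word counts (made up for the example).
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

W, H = nmf(X, k=2)
error = np.linalg.norm(X - W @ H)  # Frobenius reconstruction error
```

Here each row of `H` can be read as a topic with non-negative word weights, and each row of `W` as the non-negative topic weights of a document. The result depends on the random initialization, since the updates only find a local optimum.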