
Introduction

  We illustrate the close relationship between the PSD model, the Poisson NMF model, the multinomial topic model and the LDA model. This relationship can be exploited to optimize the fitting algorithm.

Details

  The PSD model is closely related to the multinomial topic model. More precisely, the log-likelihood form of the PSD model is similar to the probabilistic latent semantic analysis (PLSA) model (Hofmann 2001), and the Bayesian posterior form of the PSD model is similar to the latent Dirichlet allocation (LDA) model (Blei, Ng, and Jordan 2003). The models are very similar in their representation, in the derivation of their fitting algorithms, and so on. Even fitting diploid genotype data with an algorithm designed for the multinomial topic model can give good results. This correspondence is practical, however, rather than a strict mathematical equivalence. The multinomial topic model, in turn, has been shown to be equivalent to the Poisson NMF model (Carbonetto et al. 2021), and both are widely used to analyze structure in single-cell genomics data, such as RNA sequencing data.
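  The equivalence between the Poisson NMF model and the multinomial topic model can be made concrete with a small rescaling. The following is a minimal sketch, not the implementation used by any package: the function names `poisson_to_multinom` and `reconstruct` and the toy matrices are hypothetical, and all column sums are assumed to be positive. Given Poisson NMF factors L (samples by topics) and F (features by topics), it normalizes the columns of F to sum to one and rescales the rows of L to sum to one, so that the Poisson rates factor as a per-sample size times a matrix of multinomial probabilities.

```python
def poisson_to_multinom(L, F):
    """Rescale Poisson NMF factors into multinomial topic model form.

    L is an n x K list of lists (loadings), F is an m x K list of lists
    (factors). Returns (L_star, F_star, s), where each row of L_star sums
    to 1, each column of F_star sums to 1, and s holds per-sample sizes.
    Assumes every column sum of F and every entry of s is positive.
    """
    K = len(F[0])
    # u[k] is the sum of column k of F.
    u = [sum(row[k] for row in F) for k in range(K)]
    # Normalize each column of F so that it sums to 1.
    F_star = [[row[k] / u[k] for k in range(K)] for row in F]
    # Move the column scales into L, then normalize each row of L.
    scaled = [[row[k] * u[k] for k in range(K)] for row in L]
    s = [sum(row) for row in scaled]
    L_star = [[v / s_i for v in row] for row, s_i in zip(scaled, s)]
    return L_star, F_star, s


def reconstruct(L, F):
    """Return the n x m matrix product of L and the transpose of F."""
    return [[sum(l * f for l, f in zip(lrow, frow)) for frow in F]
            for lrow in L]


# Toy factors (hypothetical values): 3 samples, 4 features, 2 topics.
L = [[1.0, 2.0], [0.5, 0.5], [3.0, 0.0]]
F = [[0.2, 1.0], [0.3, 2.0], [0.5, 1.0], [1.0, 0.0]]

L_star, F_star, s = poisson_to_multinom(L, F)
rates = reconstruct(L, F)             # Poisson rates
probs = reconstruct(L_star, F_star)   # multinomial probabilities
```

  In this sketch each row of `probs` sums to one, and `rates[i][j]` equals `s[i] * probs[i][j]`. Conditional on the total count of a sample, independent Poisson counts with these rates follow a multinomial distribution with the probabilities in the corresponding row of `probs`, which is the sense in which the two models give the same fit.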

Literature Cited

Blei, David M, Andrew Y Ng, and Michael I Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3 (Jan): 993–1022.
Carbonetto, Peter, Abhishek Sarkar, Zihao Wang, and Matthew Stephens. 2021. “Non-Negative Matrix Factorization Algorithms Greatly Improve Topic Model Fits.” arXiv Preprint arXiv:2105.13440.
Hofmann, Thomas. 2001. “Unsupervised Learning by Probabilistic Latent Semantic Analysis.” Machine Learning 42 (1): 177–96.