edu.stanford.nlp.cluster
Class PLSI

java.lang.Object
  |
  +--edu.stanford.nlp.cluster.AbstractClusteringMethod
        |
        +--edu.stanford.nlp.cluster.PLSI
All Implemented Interfaces:
ClusteringMethod

public class PLSI
extends AbstractClusteringMethod

Probabilistic Latent Semantic Indexing. PLSI (Hofmann, 1999) uses Expectation Maximization (EM) to maximize the log-likelihood of a generative latent-class model (the aspect model) for documents.


Field Summary
 
Fields inherited from class edu.stanford.nlp.cluster.AbstractClusteringMethod
clusters, db, method, nc, nd, nt
 
Constructor Summary
PLSI()
          Sets values for db, nt, and nd.
 
Method Summary
 SimpleClusters cluster(DataCollection data, int num_clusters)
          Clusters the data by performing the default number of EM iterations (60).
 double Epr_z_dw(int z, int d, int w)
          Expectation Step: calculates posterior probabilities for latent variables z based on current estimates of parameters.
 void initialize()
          Initialize each class with arbitrary priors P(z), P(w|z), P(d|z).
 void oneIteration()
          Maximization Step
 Cluster oneIteration(int z)
          Re-estimates parameters using posterior probabilities given in the E-step.
 void perform_n_iterations(int n)
          Loops through n iterations of EM
 
Methods inherited from class edu.stanford.nlp.cluster.AbstractClusteringMethod
cluster, evaluate, evaluate, initialize, toString, toXMLString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

PLSI

public PLSI()
Sets values for db, nt, and nd.

Method Detail

initialize

public void initialize()
Initialize each class with arbitrary priors P(z), P(w|z), P(d|z). These priors are set randomly, subject to the normalization constraints sum_over_z(P(z)) = 1, sum_over_zw(P(w|z)) = 1, and sum_over_zd(P(d|z)) = 1.
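
The random-but-normalized initialization described above can be illustrated with a minimal sketch. It assumes the usual aspect-model convention (Hofmann, 1999) in which P(w|z) and P(d|z) each sum to 1 for a fixed class z; the normalization actually used by this class may differ. The class PlsiInitSketch and its methods are hypothetical and not part of this package, and the meanings given to nc, nt, and nd (numbers of classes, terms, and documents) are assumptions based on the inherited field names.

```java
import java.util.Random;

/** Hypothetical sketch; not part of edu.stanford.nlp.cluster. */
class PlsiInitSketch {

    /** Returns a random probability vector of length n whose entries sum to 1. */
    static double[] randomDistribution(int n, Random rng) {
        double[] p = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            p[i] = rng.nextDouble() + 1e-3; // small offset avoids exact zeros
            sum += p[i];
        }
        for (int i = 0; i < n; i++) {
            p[i] /= sum; // normalize so the vector is a valid distribution
        }
        return p;
    }

    /** Returns one random distribution of length cols per row (one row per class). */
    static double[][] randomConditionals(int rows, int cols, Random rng) {
        double[][] table = new double[rows][];
        for (int r = 0; r < rows; r++) {
            table[r] = randomDistribution(cols, rng);
        }
        return table;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int nc = 3, nt = 5, nd = 4; // assumed: classes, terms, documents
        double[] pz = randomDistribution(nc, rng);        // P(z)
        double[][] pwz = randomConditionals(nc, nt, rng); // P(w|z), one row per z
        double[][] pdz = randomConditionals(nc, nd, rng); // P(d|z), one row per z
        System.out.println("P(z) has " + pz.length + " entries; P(w|z) is "
                + pwz.length + " x " + pwz[0].length + "; P(d|z) is "
                + pdz.length + " x " + pdz[0].length);
    }
}
```

Seeding the generator, as in main above, makes a run reproducible, which helps when comparing clusterings across runs.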


Epr_z_dw

public double Epr_z_dw(int z,
                       int d,
                       int w)
Expectation Step: calculates posterior probabilities for latent variables z based on current estimates of parameters.

Returns:
P(z|d,w), calculated from the current estimates of P(z), P(d|z), and P(w|z) in the aspect model
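
For reference, the aspect model of Hofmann (1999) gives the posterior as P(z|d,w) = P(z) P(d|z) P(w|z) / sum_over_z'(P(z') P(d|z') P(w|z')). The sketch below computes that quantity from plain arrays. It is not the implementation of Epr_z_dw; the class name, the array layout (pz[z], pdz[z][d], pwz[z][w]), and the zero-denominator handling are all assumptions made for illustration.

```java
/** Hypothetical sketch of the aspect-model E-step; not the code behind Epr_z_dw. */
class PlsiEStepSketch {

    /**
     * Posterior P(z|d,w) under the aspect model.
     * Assumed layout: pz[z] = P(z), pdz[z][d] = P(d|z), pwz[z][w] = P(w|z).
     */
    static double posterior(double[] pz, double[][] pdz, double[][] pwz,
                            int z, int d, int w) {
        double denom = 0.0;
        for (int k = 0; k < pz.length; k++) {
            denom += pz[k] * pdz[k][d] * pwz[k][w]; // joint P(d, w, z = k)
        }
        if (denom == 0.0) {
            return 0.0; // the pair (d, w) has zero probability under every class
        }
        return pz[z] * pdz[z][d] * pwz[z][w] / denom;
    }

    public static void main(String[] args) {
        double[] pz = {0.5, 0.5};
        double[][] pdz = {{0.8, 0.2}, {0.2, 0.8}};
        double[][] pwz = {{0.9, 0.1}, {0.1, 0.9}};
        // Class 0 strongly favors document 0 and term 0, so its posterior is close to 1.
        System.out.println("P(z=0|d=0,w=0) = " + posterior(pz, pdz, pwz, 0, 0, 0));
    }
}
```

For any fixed pair (d, w) with nonzero probability, the posteriors over all z sum to 1.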

oneIteration

public Cluster oneIteration(int z)
Re-estimates parameters using posterior probabilities given in the E-step.
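
A minimal sketch of re-estimation for a single class z, assuming the standard aspect-model updates (Hofmann, 1999): P(w|z) is proportional to sum_over_d(n(d,w) P(z|d,w)), P(d|z) is proportional to sum_over_w(n(d,w) P(z|d,w)), and P(z) is the class's share of the total count mass. The names PlsiMStepSketch, ClassParams, and reestimate are hypothetical. The sketch returns plain arrays rather than a Cluster, since the contents of the returned Cluster are not described here.

```java
/** Hypothetical sketch of M-step updates for one latent class z. */
class PlsiMStepSketch {

    /** Holds the re-estimated parameters for a single class. */
    static class ClassParams {
        double pz;    // P(z)
        double[] pdz; // P(d|z), indexed by document
        double[] pwz; // P(w|z), indexed by term
    }

    /**
     * Assumed inputs: counts[d][w] = n(d,w), the count of term w in document d;
     * post[d][w] = P(z|d,w) from the E-step for the class being updated.
     */
    static ClassParams reestimate(double[][] counts, double[][] post) {
        int nd = counts.length;
        int nt = counts[0].length;
        ClassParams p = new ClassParams();
        p.pdz = new double[nd];
        p.pwz = new double[nt];
        double classMass = 0.0; // sum over d,w of n(d,w) * P(z|d,w)
        double totalMass = 0.0; // sum over d,w of n(d,w)
        for (int d = 0; d < nd; d++) {
            for (int w = 0; w < nt; w++) {
                double weighted = counts[d][w] * post[d][w];
                p.pdz[d] += weighted;
                p.pwz[w] += weighted;
                classMass += weighted;
                totalMass += counts[d][w];
            }
        }
        if (classMass > 0.0) {
            for (int d = 0; d < nd; d++) p.pdz[d] /= classMass;
            for (int w = 0; w < nt; w++) p.pwz[w] /= classMass;
        }
        p.pz = totalMass > 0.0 ? classMass / totalMass : 0.0;
        return p;
    }

    public static void main(String[] args) {
        double[][] counts = {{2, 1}, {1, 2}};
        double[][] post = {{0.9, 0.5}, {0.5, 0.1}};
        ClassParams p = reestimate(counts, post);
        System.out.println("P(z) = " + p.pz + ", P(d=0|z) = " + p.pdz[0]
                + ", P(w=0|z) = " + p.pwz[0]);
    }
}
```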


oneIteration

public void oneIteration()
Maximization Step


perform_n_iterations

public void perform_n_iterations(int n)
Loops through n iterations of EM
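
To show what a loop over n EM iterations looks like end to end, here is a self-contained sketch of the aspect model on a small count matrix. It is an illustration under stated assumptions, not this class's code: PlsiEmLoopSketch, emIteration, performIterations, and logLikelihood are hypothetical names, and each emIteration folds the E-step posteriors directly into the M-step accumulators instead of splitting the work the way Epr_z_dw and oneIteration do.

```java
import java.util.Random;

/** Hypothetical sketch of an n-iteration EM loop for the aspect model. */
class PlsiEmLoopSketch {
    final double[][] n; // n[d][w]: count of term w in document d
    final int nc, nd, nt; // assumed: numbers of classes, documents, terms
    double[] pz;          // P(z)
    double[][] pdz, pwz;  // P(d|z) and P(w|z), one row per class

    PlsiEmLoopSketch(double[][] counts, int numClasses, long seed) {
        n = counts; nc = numClasses; nd = counts.length; nt = counts[0].length;
        Random rng = new Random(seed);
        pz = randomDist(nc, rng);
        pdz = new double[nc][];
        pwz = new double[nc][];
        for (int z = 0; z < nc; z++) {
            pdz[z] = randomDist(nd, rng);
            pwz[z] = randomDist(nt, rng);
        }
    }

    static double[] randomDist(int len, Random rng) {
        double[] p = new double[len];
        double s = 0.0;
        for (int i = 0; i < len; i++) { p[i] = rng.nextDouble() + 1e-3; s += p[i]; }
        for (int i = 0; i < len; i++) p[i] /= s;
        return p;
    }

    /** One EM iteration over all observed (d, w) pairs. */
    void emIteration() {
        double[] newPz = new double[nc];
        double[][] newPdz = new double[nc][nd], newPwz = new double[nc][nt];
        double[] joint = new double[nc];
        double total = 0.0;
        for (int d = 0; d < nd; d++) {
            for (int w = 0; w < nt; w++) {
                if (n[d][w] == 0.0) continue;
                total += n[d][w];
                double denom = 0.0;
                for (int z = 0; z < nc; z++) {
                    joint[z] = pz[z] * pdz[z][d] * pwz[z][w];
                    denom += joint[z];
                }
                if (denom == 0.0) continue;
                for (int z = 0; z < nc; z++) {
                    double weighted = n[d][w] * joint[z] / denom; // n(d,w) * P(z|d,w)
                    newPz[z] += weighted;
                    newPdz[z][d] += weighted;
                    newPwz[z][w] += weighted;
                }
            }
        }
        if (total == 0.0) return; // no observations: keep current parameters
        for (int z = 0; z < nc; z++) {
            if (newPz[z] > 0.0) {
                for (int d = 0; d < nd; d++) newPdz[z][d] /= newPz[z];
                for (int w = 0; w < nt; w++) newPwz[z][w] /= newPz[z];
            }
            newPz[z] /= total;
        }
        pz = newPz; pdz = newPdz; pwz = newPwz;
    }

    void performIterations(int iterations) {
        for (int i = 0; i < iterations; i++) emIteration();
    }

    /** Log-likelihood: sum over d,w of n(d,w) * log P(d,w). */
    double logLikelihood() {
        double ll = 0.0;
        for (int d = 0; d < nd; d++) {
            for (int w = 0; w < nt; w++) {
                if (n[d][w] == 0.0) continue;
                double p = 0.0;
                for (int z = 0; z < nc; z++) p += pz[z] * pdz[z][d] * pwz[z][w];
                ll += n[d][w] * Math.log(p);
            }
        }
        return ll;
    }

    public static void main(String[] args) {
        double[][] counts = {{4, 3, 0, 0}, {3, 5, 0, 1}, {0, 0, 6, 2}, {0, 1, 3, 4}};
        PlsiEmLoopSketch em = new PlsiEmLoopSketch(counts, 2, 42L);
        double before = em.logLikelihood();
        em.performIterations(60);
        System.out.println("log-likelihood: " + before + " -> " + em.logLikelihood());
    }
}
```

EM never decreases the log-likelihood from one iteration to the next, so comparing the value before and after the loop is a cheap sanity check on an implementation.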


cluster

public SimpleClusters cluster(DataCollection data,
                              int num_clusters)
Clusters the data by performing the default number of EM iterations (60).



Stanford NLP Group