edu.stanford.nlp.cluster
Class LSSA

java.lang.Object
  |
  +--edu.stanford.nlp.cluster.AbstractClusteringMethod
        |
        +--edu.stanford.nlp.cluster.LSSA
All Implemented Interfaces:
ClusteringMethod

public class LSSA
extends AbstractClusteringMethod

Latent State Sequence Analysis: a clustering method based on hidden Markov models.


Field Summary
 
Fields inherited from class edu.stanford.nlp.cluster.AbstractClusteringMethod
clusters, db, method, nc, nd, nt
 
Constructor Summary
LSSA()
          Allocates memory for the arrays and initializes all values.
 
Method Summary
 ScientificNotationDouble calculate_S(int z)
          Calculates sum_over_t(gamma(z)). The sum takes a long time to calculate and is used many times: it is the denominator in each calculation of a and in each calculation of b.
 SimpleClusters cluster(DataCollection data, int num_clusters)
          Performs the clustering algorithm and populates Clusters.
 ScientificNotationDouble gamma(int i, Entry entry)
           
 ScientificNotationDouble new_a(int i, int j)
           
 ScientificNotationDouble new_b(int i, int w)
           
 ScientificNotationDouble new_pi(int i)
           
 void new_pr_d_z()
           
 void new_pr_z()
          Sets pr_z for all clusters using the transition probability matrix.
 void oneIteration()
           
 void oneIteration(int z)
           
 ScientificNotationDouble p(Entry entry, int i, int j)
           
 void perform_n_iterations(int n)
           
 void populate_backward_trellis()
          Populates the backward trellis from the sequence of observations in the document collection being clustered. The algorithm is presented on page 331 of Manning and Schütze (1999).
 void populate_forward_trellis()
          Populates the forward trellis from the sequence of observations in the document collection being clustered. The algorithm is presented on page 327 of Manning and Schütze (1999).
 void populate_trellis()
           
 void setTermCounts()
          Helper function for new_pr_d_z. Sets term counts for all terms. It only needs to be called once, because term counts do not change between iterations.
 ScientificNotationDouble test()
           
 
Methods inherited from class edu.stanford.nlp.cluster.AbstractClusteringMethod
cluster, evaluate, evaluate, initialize, toString, toXMLString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

LSSA

public LSSA()
Allocates memory for the arrays and initializes all values.

Method Detail

populate_trellis

public void populate_trellis()

populate_forward_trellis

public void populate_forward_trellis()
Populates the forward trellis from the sequence of observations in the document collection being clustered. The algorithm is presented on page 327 of Manning and Schütze (1999).
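
The forward pass can be sketched as follows. This is a minimal illustration, not the LSSA source: the class name ForwardTrellisSketch, the plain double arrays, and the state-emission form (emission probabilities indexed by state and word, as the signature of new_b suggests) are all assumptions. It also uses unscaled doubles, which underflow on long sequences; LSSA appears to use ScientificNotationDouble instead.

```java
// Hypothetical sketch of a forward trellis for a state-emission HMM.
// None of these names come from the LSSA source.
final class ForwardTrellisSketch {

    // pi[i]   : initial probability of state i
    // a[i][j] : probability of moving from state i to state j
    // b[i][w] : probability of emitting word w while in state i
    // obs[t]  : index of the word observed at position t
    // Returns alpha, where alpha[t][i] = P(obs[0..t], state at t is i).
    static double[][] forward(double[] pi, double[][] a, double[][] b, int[] obs) {
        int n = pi.length;
        int len = obs.length;
        double[][] alpha = new double[len][n];
        if (len == 0) {
            return alpha; // nothing observed, empty trellis
        }
        // Initialization: start in state i and emit the first word.
        for (int i = 0; i < n; i++) {
            alpha[0][i] = pi[i] * b[i][obs[0]];
        }
        // Induction: sum over all predecessor states, then emit obs[t].
        for (int t = 1; t < len; t++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    sum += alpha[t - 1][i] * a[i][j];
                }
                alpha[t][j] = sum * b[j][obs[t]];
            }
        }
        return alpha;
    }

    // Total probability of the observation sequence: sum of the last column.
    static double likelihood(double[][] alpha) {
        if (alpha.length == 0) {
            return 1.0; // the empty sequence has probability 1
        }
        double total = 0.0;
        for (double v : alpha[alpha.length - 1]) {
            total += v;
        }
        return total;
    }
}
```

Calling ForwardTrellisSketch.forward(pi, a, b, obs) and then likelihood(alpha) gives the probability of the whole observation sequence under the model.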


populate_backward_trellis

public void populate_backward_trellis()
Populates the backward trellis from the sequence of observations in the document collection being clustered. The algorithm is presented on page 331 of Manning and Schütze (1999).
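
A backward pass might look like the sketch below. It is an illustration under assumptions, not the LSSA implementation: the class name BackwardTrellisSketch, the array layout, and the state-emission form are hypothetical, and plain doubles stand in for ScientificNotationDouble.

```java
// Hypothetical sketch of a backward trellis for a state-emission HMM.
// None of these names come from the LSSA source.
final class BackwardTrellisSketch {

    // a[i][j] : probability of moving from state i to state j
    // b[i][w] : probability of emitting word w while in state i
    // obs[t]  : index of the word observed at position t
    // Returns beta, where beta[t][i] = P(obs[t+1..end] | state at t is i).
    static double[][] backward(double[][] a, double[][] b, int[] obs) {
        int n = a.length;
        int len = obs.length;
        double[][] beta = new double[len][n];
        if (len == 0) {
            return beta; // nothing observed, empty trellis
        }
        // Initialization: no observations remain after the last position.
        for (int i = 0; i < n; i++) {
            beta[len - 1][i] = 1.0;
        }
        // Induction, right to left: move to state j, emit obs[t + 1], continue.
        for (int t = len - 2; t >= 0; t--) {
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int j = 0; j < n; j++) {
                    sum += a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j];
                }
                beta[t][i] = sum;
            }
        }
        return beta;
    }
}
```

As a consistency check, summing pi[i] * b[i][obs[0]] * beta[0][i] over all states i should give the same sequence probability as the forward pass.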


test

public ScientificNotationDouble test()

p

public ScientificNotationDouble p(Entry entry,
                                  int i,
                                  int j)

gamma

public ScientificNotationDouble gamma(int i,
                                      Entry entry)

calculate_S

public ScientificNotationDouble calculate_S(int z)
Calculates sum_over_t(gamma(z)). The sum takes a long time to calculate and is used many times: it is the denominator in each calculation of a and in each calculation of b.
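
One way to avoid recomputing this sum is to calculate it once per state and reuse it as the denominator. The sketch below assumes the gamma values are already available as a gamma[t][z] array; the class GammaSumSketch and its methods are hypothetical and are not part of LSSA.

```java
// Hypothetical sketch: compute sum over t of gamma[t][z] once per state z,
// then reuse it as the denominator of re-estimation formulas.
final class GammaSumSketch {

    private final double[] sums; // sums[z] = sum over t of gamma[t][z]

    // gamma[t][z] : posterior probability of being in state z at position t
    GammaSumSketch(double[][] gamma, int numStates) {
        sums = new double[numStates];
        for (double[] row : gamma) {
            for (int z = 0; z < numStates; z++) {
                sums[z] += row[z];
            }
        }
    }

    // Returns the precomputed sum for state z in constant time.
    double s(int z) {
        return sums[z];
    }

    // Divides a re-estimation numerator by the shared denominator for state z.
    double normalize(double numerator, int z) {
        return numerator / sums[z];
    }
}
```

With this layout, every update for state z pays for the summation only once, regardless of how many transition or emission entries are re-estimated.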


new_pi

public ScientificNotationDouble new_pi(int i)

new_a

public ScientificNotationDouble new_a(int i,
                                      int j)

new_b

public ScientificNotationDouble new_b(int i,
                                      int w)

oneIteration

public void oneIteration(int z)

new_pr_z

public void new_pr_z()
Sets pr_z for all clusters using the transition probability matrix.


setTermCounts

public void setTermCounts()
Helper function for new_pr_d_z. Sets term counts for all terms. It only needs to be called once, because term counts do not change between iterations.
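
Because the counts depend only on the data, they can be computed in a single pass before the first iteration. The sketch below shows that idea under assumptions: documents are represented as arrays of term indices, and the class TermCountSketch is hypothetical rather than taken from LSSA.

```java
// Hypothetical sketch: count how often each term occurs in the collection.
// The counts depend only on the data, so one pass before iterating suffices.
final class TermCountSketch {

    // docs[d][k]     : index of the k-th term occurrence in document d
    // vocabularySize : number of distinct term indices
    // Returns counts, where counts[w] = occurrences of term w in all documents.
    static int[] countTerms(int[][] docs, int vocabularySize) {
        int[] counts = new int[vocabularySize];
        for (int[] doc : docs) {
            for (int term : doc) {
                counts[term]++;
            }
        }
        return counts;
    }
}
```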


new_pr_d_z

public void new_pr_d_z()

oneIteration

public void oneIteration()

perform_n_iterations

public void perform_n_iterations(int n)

cluster

public SimpleClusters cluster(DataCollection data,
                              int num_clusters)
Description copied from interface: ClusteringMethod
Performs the clustering algorithm and populates Clusters.



Stanford NLP Group