java.lang.Object
  |
  +--edu.stanford.nlp.ie.pnp.PnpClassifier
Statistical classifier of unseen proper noun phrases. Supports training and testing on data files. Uses an n-gram word-length model, an n-gram character model, and a word model.
Standard usage: construct a classifier with PnpClassifier(String trainingFilename), then call getLogProb(String line, int category), getBestCategory(String line), or generateLine(int category).
Field Summary
static int[]   charBinCutoffs
static int     cn
static char    END_SYMBOL
static int[]   lengthBinCutoffs
static int     ln
static Random  rand
static char    START_SYMBOL
Constructor Summary
PnpClassifier(String trainingFilename)
    Constructs a new PnpClassifier which is trained on the given file.
Method Summary
String generateLine(int category)
    Generates a novel example of the given category, starting with (cn-1) start symbols and ending with an end symbol.
String generateWord(int wordLength, String initialContext, char finalChar, int category)
    Randomly generates a word of the given length, starting with the given initial context and ending with the given final char, by sampling from the char n-gram model of the given category.
int getBestCategory(String line)
    Returns the category that generates the given line with the highest probability.
double getEmpiricalProb(List lengthSequence, int category)
    Returns the empirical estimate of the probability of the last word length in the sequence given the sequence excluding that length, as observed within the given category.
double getEmpiricalProb(String charSequence, int category)
    Returns the empirical estimate of the probability of the last char in the sequence given the sequence excluding that char, as observed within the given category.
double getEmpiricalProb(String word, int wordLength, int category)
    Returns the empirical estimate of the probability of the given word given the word's length and the given category.
static String getEndMarkedString(String line)
    Returns the given line prepended with enough ' ' symbols to allow n-gram parsing.
double getInterpolatedProb(List lengthSequence, int category)
    Returns a linearly interpolated estimate of the probability of the last length in the sequence given the rest of it.
double getInterpolatedProb(String charSequence, int category)
    Returns a linearly interpolated estimate of the probability of the last char in the sequence given the rest of it.
double getLogProb(String line, int category)
    Computes and returns log P(line | category).
int getNumCategories()
    Returns the number of different categories represented in this classifier.
double getPriorProb(int category)
    Returns the empirical a priori probability of the given category, as observed in the training data (the fraction of the training data that belongs to that category).
static String getPureString(String word)
    Prunes the first (cn-1) chars from the beginning of the word as well as the final char.
double getScore(String line, int category)
    Returns the score for the given example as scored in the given category.
static List getWordLengths(String line)
    Takes an end-marked string and returns a list of Integers for the length of each word.
static List getWordsWithContext(String line)
    Takes an end-marked string and returns a List of strings, one for each word in the line.
static void main(String[] args)
    Trains and tests a PnpClassifier on the passed-in files.
protected void test(String testFilename)
    Runs the classifier on each line in the given test file and prints out the category with the highest score.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final int ln
public static final int cn
public static final char START_SYMBOL
public static final char END_SYMBOL
public static final Random rand
public static final int[] charBinCutoffs
public static final int[] lengthBinCutoffs
Constructor Detail
public PnpClassifier(String trainingFilename)
Method Detail
public int getBestCategory(String line)
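getBestCategory is summarized as returning the category that generates the line with the highest probability, which amounts to an argmax over categories. The snippet below is a minimal sketch of only that selection step. It assumes categories are numbered 0 to getNumCategories()-1 and that a per-category score (for example the value getScore or getLogProb returns) is passed in as a function; BestCategorySketch and bestCategory are hypothetical names, not part of this class.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical helper, not part of PnpClassifier: picks the highest-scoring category.
class BestCategorySketch {
    // Returns the index in [0, numCategories) with the largest score.
    // Ties go to the lower index, because only a strictly larger score replaces the current best.
    static int bestCategory(int numCategories, IntToDoubleFunction score) {
        if (numCategories <= 0) {
            throw new IllegalArgumentException("numCategories must be positive");
        }
        int best = 0;
        double bestScore = score.applyAsDouble(0);
        for (int c = 1; c < numCategories; c++) {
            double s = score.applyAsDouble(c);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up log-scores for three categories.
        double[] logScores = {-12.5, -9.1, -10.3};
        System.out.println(bestCategory(logScores.length, c -> logScores[c])); // prints 1
    }
}
```

Taking the argmax of log-scores gives the same category as taking it over raw probabilities, because the logarithm is strictly increasing.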
public double getScore(String line, int category)
public double getLogProb(String line, int category)
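getLogProb returns log P(line | category). The documentation does not say how the word-length, character, and word models are combined, so this sketch assumes only that the line's probability factors into a product of per-token conditional probabilities; under that assumption the log-probability is a sum of logs. LogProbSketch is a hypothetical helper, and it uses the natural log (Math.log), which may not be the base the real class uses.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not part of PnpClassifier.
class LogProbSketch {
    // Adds up the natural log of each token's probability. Summing logs instead of
    // multiplying probabilities keeps long lines from underflowing to 0.0.
    static <T> double logProb(List<T> tokens, ToDoubleFunction<T> prob) {
        double total = 0.0;
        for (T token : tokens) {
            double p = prob.applyAsDouble(token);
            if (p <= 0.0) {
                // One impossible token makes the whole line impossible.
                return Double.NEGATIVE_INFINITY;
            }
            total += Math.log(p);
        }
        return total;
    }

    public static void main(String[] args) {
        // Three tokens, each with probability 0.5: the result is 3 * ln(0.5), about -2.079.
        System.out.println(logProb(List.of('a', 'b', 'c'), ch -> 0.5));
    }
}
```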
public double getInterpolatedProb(String charSequence, int category)
public double getEmpiricalProb(String charSequence, int category)
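getEmpiricalProb(String charSequence, int category) is described as a relative-frequency estimate of the last char given the preceding chars, and getInterpolatedProb(String charSequence, int category) as a linear interpolation of such estimates. Below is a minimal single-category sketch of both, assuming plain count maps and fixed, caller-chosen positive weights (the lambdas). InterpolationSketch is hypothetical, and the documentation does not say how the real class picks its weights.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-category character model, not part of PnpClassifier.
class InterpolationSketch {
    private final int order;         // longest n-gram used (plays the role of cn)
    private final double[] lambdas;  // lambdas[k] weights the estimate that uses k history chars
    private final Map<String, Integer> ngramCounts = new HashMap<>();
    private final Map<String, Integer> contextCounts = new HashMap<>();

    InterpolationSketch(int order, double[] lambdas) {
        if (order < 1 || lambdas.length != order) {
            throw new IllegalArgumentException("need one lambda per history length 0..order-1");
        }
        this.order = order;
        this.lambdas = lambdas.clone();
    }

    // Counts every n-gram of length 1..order, and how often each (n-1)-char prefix
    // was followed by some char (the empty prefix counts every char).
    void train(String text) {
        for (int i = 0; i < text.length(); i++) {
            for (int n = 1; n <= order && i + n <= text.length(); n++) {
                String gram = text.substring(i, i + n);
                ngramCounts.merge(gram, 1, Integer::sum);
                contextCounts.merge(gram.substring(0, n - 1), 1, Integer::sum);
            }
        }
    }

    // Maximum-likelihood P(last char | preceding chars) = count(seq) / count(prefix).
    double empiricalProb(String seq) {
        if (seq.isEmpty()) {
            throw new IllegalArgumentException("sequence must contain at least one char");
        }
        int denom = contextCounts.getOrDefault(seq.substring(0, seq.length() - 1), 0);
        return denom == 0 ? 0.0 : ngramCounts.getOrDefault(seq, 0) / (double) denom;
    }

    // Weighted sum of the empirical estimates using the last 0, 1, ..., order-1 history
    // chars, renormalized when the sequence is too short to supply the full history.
    double interpolatedProb(String seq) {
        if (seq.isEmpty()) {
            throw new IllegalArgumentException("sequence must contain at least one char");
        }
        int maxHistory = Math.min(order - 1, seq.length() - 1);
        double p = 0.0;
        double weightSum = 0.0;
        for (int k = 0; k <= maxHistory; k++) {
            p += lambdas[k] * empiricalProb(seq.substring(seq.length() - 1 - k));
            weightSum += lambdas[k];
        }
        return p / weightSum;
    }

    public static void main(String[] args) {
        InterpolationSketch model = new InterpolationSketch(2, new double[] {0.5, 0.5});
        model.train("abab");
        // 0.5 * P(b) + 0.5 * P(b | a) = 0.5 * 0.5 + 0.5 * 1.0
        System.out.println(model.interpolatedProb("ab")); // prints 0.75
    }
}
```

Mixing in lower-order estimates keeps the probability nonzero for a char that was seen in training but never after this exact history: with the toy data above, "aa" never occurs, yet interpolatedProb("aa") is 0.25 rather than 0.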
public double getInterpolatedProb(List lengthSequence, int category)
public double getEmpiricalProb(List lengthSequence, int category)
public double getEmpiricalProb(String word, int wordLength, int category)
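getEmpiricalProb(String word, int wordLength, int category) estimates the probability of a word given its length within a category. One plausible reading, sketched here purely as an assumption, is the relative frequency of the word among all training words of that length; a full classifier would keep one such table per category. WordGivenLengthSketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-category word model, not part of PnpClassifier.
class WordGivenLengthSketch {
    private final Map<String, Integer> wordCounts = new HashMap<>();
    private final Map<Integer, Integer> lengthCounts = new HashMap<>();

    // Records one training occurrence of a word.
    void observe(String word) {
        wordCounts.merge(word, 1, Integer::sum);
        lengthCounts.merge(word.length(), 1, Integer::sum);
    }

    // Relative frequency of `word` among the training words whose length is wordLength.
    double empiricalProb(String word, int wordLength) {
        if (word.length() != wordLength) {
            return 0.0; // the word cannot have a length other than the one conditioned on
        }
        int denom = lengthCounts.getOrDefault(wordLength, 0);
        return denom == 0 ? 0.0 : wordCounts.getOrDefault(word, 0) / (double) denom;
    }

    public static void main(String[] args) {
        WordGivenLengthSketch model = new WordGivenLengthSketch();
        for (String w : new String[] {"Inc", "Ltd", "Inc", "Corp"}) {
            model.observe(w);
        }
        // "Inc" is 2 of the 3 three-letter training words.
        System.out.println(model.empiricalProb("Inc", 3));
    }
}
```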
public double getPriorProb(int category)
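getPriorProb is documented as the fraction of the training data that belongs to the given category. A minimal sketch, assuming each training example has already been reduced to an integer category label (how the real class reads labels from the training file is not documented here); PriorSketch and priors are hypothetical names.

```java
// Hypothetical helper, not part of PnpClassifier.
class PriorSketch {
    // labels[i] is the category index of the i-th training example.
    // Returns, for each category, the fraction of training examples in it.
    static double[] priors(int[] labels, int numCategories) {
        if (labels.length == 0) {
            throw new IllegalArgumentException("no training examples");
        }
        double[] prior = new double[numCategories];
        for (int label : labels) {
            prior[label] += 1.0;
        }
        for (int c = 0; c < numCategories; c++) {
            prior[c] /= labels.length;
        }
        return prior;
    }

    public static void main(String[] args) {
        double[] p = priors(new int[] {0, 1, 1, 2}, 3);
        System.out.println(java.util.Arrays.toString(p)); // prints [0.25, 0.5, 0.25]
    }
}
```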
public int getNumCategories()
public static String getEndMarkedString(String line)
public static String getPureString(String word)
See Also: getEndMarkedString(java.lang.String)
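getEndMarkedString and getPureString work as a pair. The summary says the line is prepended with enough symbols to allow n-gram parsing, and generateLine's summary mentions (cn-1) start symbols and a trailing end symbol. The sketch below assumes exactly that layout; getPureString's documented pruning (the first cn-1 chars and the final char) then undoes it for a single word. MarkerSketch, endMark, and pure are hypothetical names, and '^' and '$' are stand-ins for START_SYMBOL and END_SYMBOL, whose actual values are not given here.

```java
// Hypothetical helpers, not part of PnpClassifier. The start and end symbols are
// parameters because the real values of START_SYMBOL and END_SYMBOL are not documented.
class MarkerSketch {
    // Prepends (cn - 1) start symbols, so the first real char has a full
    // (cn - 1)-char history, and appends one end symbol.
    static String endMark(String line, int cn, char start, char end) {
        StringBuilder sb = new StringBuilder(line.length() + cn);
        for (int i = 0; i < cn - 1; i++) {
            sb.append(start);
        }
        return sb.append(line).append(end).toString();
    }

    // Drops the first (cn - 1) chars and the final char, as getPureString's summary describes.
    static String pure(String marked, int cn) {
        if (marked.length() < cn) {
            throw new IllegalArgumentException("string is shorter than its markers");
        }
        return marked.substring(cn - 1, marked.length() - 1);
    }

    public static void main(String[] args) {
        String marked = endMark("Acme", 3, '^', '$');
        System.out.println(marked);          // prints ^^Acme$
        System.out.println(pure(marked, 3)); // prints Acme
    }
}
```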
public static List getWordLengths(String line)
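getWordLengths turns an end-marked line into the sequence of word lengths that the length n-gram model (order ln) presumably consumes through getInterpolatedProb(List lengthSequence, int category). This sketch assumes the end-marked layout is (cn-1) start symbols, space-separated words, and one end symbol; that layout and the WordLengthSketch name are assumptions, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PnpClassifier.
class WordLengthSketch {
    // Assumes: (cn - 1) start symbols, words separated by spaces, then one end symbol.
    // Returns the length of each word, in order.
    static List<Integer> wordLengths(String marked, int cn) {
        String body = marked.substring(cn - 1, marked.length() - 1);
        List<Integer> lengths = new ArrayList<>();
        for (String word : body.split(" ")) {
            if (!word.isEmpty()) { // skip empty pieces produced by repeated spaces
                lengths.add(word.length());
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(wordLengths("^^Acme Widget Co$", 3)); // prints [4, 6, 2]
    }
}
```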
public static List getWordsWithContext(String line)
public String generateWord(int wordLength, String initialContext, char finalChar, int category)
public String generateLine(int category)
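generateWord and generateLine are described as sampling from a category's char n-gram model. Here is a minimal sketch of that generation loop, assuming the model is exposed as a function from the last (cn-1) chars to a table of next-char weights; that function, SamplingSketch, and its methods are hypothetical. It draws from a java.util.Random (the rand field is declared as a Random), so a seeded instance makes the output reproducible.

```java
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

// Hypothetical helper, not part of PnpClassifier.
class SamplingSketch {
    // Inverse-CDF sampling: draws u uniformly from [0, total weight) and returns
    // the first key whose cumulative weight exceeds u.
    static char sample(Map<Character, Double> weights, Random rand) {
        double total = 0.0;
        for (double w : weights.values()) {
            total += w;
        }
        if (!(total > 0.0)) {
            throw new IllegalArgumentException("weights must have a positive sum");
        }
        double u = rand.nextDouble() * total;
        double cumulative = 0.0;
        Character last = null;
        for (Map.Entry<Character, Double> e : weights.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (u < cumulative) {
                return last;
            }
        }
        return last; // reached only if rounding leaves u at or above the final cumulative sum
    }

    // Appends wordLength sampled chars to initialContext, each conditioned on the
    // previous (cn - 1) chars, then appends finalChar.
    static String generateWord(int wordLength, String initialContext, char finalChar, int cn,
                               Function<String, Map<Character, Double>> nextCharDist, Random rand) {
        StringBuilder sb = new StringBuilder(initialContext);
        for (int i = 0; i < wordLength; i++) {
            String history = sb.substring(Math.max(0, sb.length() - (cn - 1)));
            sb.append(sample(nextCharDist.apply(history), rand));
        }
        return sb.append(finalChar).toString();
    }

    public static void main(String[] args) {
        // Toy "model" that proposes 'a' or 'b' with equal weight, ignoring history.
        Map<Character, Double> uniform = Map.of('a', 1.0, 'b', 1.0);
        // Prints "^^", four chars drawn from {a, b}, then "$".
        System.out.println(generateWord(4, "^^", '$', 3, history -> uniform, new Random(7)));
    }
}
```

Inverse-CDF sampling accepts unnormalized weights, so the sketch does not require the model's next-char estimates to sum exactly to 1.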
protected void test(String testFilename) throws FileNotFoundException, IOException
Throws: FileNotFoundException, IOException
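test is documented as running the classifier on each line of the test file and printing the best category. Its declared exceptions match what opening a file with java.io.FileReader (FileNotFoundException) and reading it through a BufferedReader (IOException) can throw. The sketch below shows such a loop over any Reader, with a caller-supplied function standing in for getBestCategory; TestLoopSketch and classifyAll are hypothetical, and the helper returns the predictions instead of printing them so they can be checked.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper, not part of PnpClassifier.
class TestLoopSketch {
    // Reads `in` line by line and returns the predicted category of each line, in order.
    static List<Integer> classifyAll(Reader in, ToIntFunction<String> bestCategory) throws IOException {
        List<Integer> predictions = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                predictions.add(bestCategory.applyAsInt(line));
            }
        }
        return predictions;
    }

    public static void main(String[] args) throws IOException {
        // Toy classifier: category = line length modulo 2.
        List<Integer> out = classifyAll(new StringReader("a\nbb\nccc"), s -> s.length() % 2);
        for (int category : out) {
            System.out.println(category); // prints 1, 0, 1 on separate lines
        }
    }
}
```

To read an actual file, pass new FileReader(testFilename) as the Reader.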
public static void main(String[] args)
Usage: java PnpClassifier trainingFilename testFilename
See Also: PnpClassifier(String trainingFilename), test(String testFilename)