edu.stanford.nlp.ie.hmm
Class Extractor

java.lang.Object
  |
  +--edu.stanford.nlp.ie.hmm.Extractor

public class Extractor
extends Object

A command-line information extraction tool built using the HMM and Corpus classes, intended for demonstration, debugging, and testing. The Extractor trains and tests in a single run. Its input data file is a list of tagged documents separated by the ENDOFDOC token. It either performs cross-validation or trains on the first 70% of the documents and tests on the rest. Alternatively, it can test the HMM.extractFrom() functionality on a hardcoded text string.
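The default (non-cross-validation) data handling described above can be sketched as follows. This is a hypothetical illustration, not the Extractor source: the class `TrainTestSplit` and its methods are invented names, the input is assumed to be already tokenized, and parsing of the field tags inside each document is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not the Extractor source: splits a token stream into
 * documents at each ENDOFDOC token, then uses the first 70% for training.
 */
public class TrainTestSplit {

    /** Each ENDOFDOC token closes the current document (tag parsing is omitted). */
    static List<List<String>> splitDocuments(List<String> tokens) {
        List<List<String>> docs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String tok : tokens) {
            if (tok.equals("ENDOFDOC")) {
                docs.add(current);
                current = new ArrayList<>();
            } else {
                current.add(tok);
            }
        }
        if (!current.isEmpty()) {
            docs.add(current); // keep a trailing document that lacks a final ENDOFDOC
        }
        return docs;
    }

    /** Number of training documents: 70% of the total, rounded down (integer math avoids float error). */
    static int trainSize(int numDocs) {
        return numDocs * 7 / 10;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "a", "b", "ENDOFDOC", "c", "ENDOFDOC", "d", "ENDOFDOC");
        List<List<String>> docs = splitDocuments(tokens);
        int cut = trainSize(docs.size());
        List<List<String>> train = docs.subList(0, cut);          // first 70%
        List<List<String>> test = docs.subList(cut, docs.size());  // remaining 30%
        System.out.println(train.size() + " train, " + test.size() + " test");
    }
}
```

Because the split is positional rather than random, the order of documents in the data file determines which ones end up in the test set.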

To train and test separately and build a saved HMM file, use the Trainer and Tester classes.


Method Summary
static void main(String[] args)
          Train and test an extractor.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Detail

main

public static void main(String[] args)
Train and test an extractor. It can also run cross-validation, performing ten runs and reporting the average performance across them (adding a confidence interval might also be worthwhile).
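The ten-run averaging can be sketched as a contiguous ten-fold split, matching the -cv option in the usage notes. This is a hypothetical sketch: `CrossValidation`, `foldStart`, and `averageScore` are invented names, and the `evaluate` callback stands in for "train an HMM on all documents outside the test range, test on the range, return a score"; the real tool's scoring is not shown here.

```java
import java.util.function.BiFunction;

/**
 * Hypothetical sketch of contiguous ten-fold cross-validation: fold k tests on
 * the k-th contiguous tenth of the documents, trains on the rest, and the ten
 * scores are averaged.
 */
public class CrossValidation {

    static final int FOLDS = 10;

    /** Start index (inclusive) of fold k; fold k covers [foldStart(k), foldStart(k + 1)). */
    static int foldStart(int numDocs, int k) {
        return k * numDocs / FOLDS;
    }

    /**
     * Runs all folds. evaluate.apply(testStart, testEnd) is a hypothetical
     * stand-in for training on documents outside [testStart, testEnd) and
     * returning a score on the documents inside it.
     */
    static double averageScore(int numDocs, BiFunction<Integer, Integer, Double> evaluate) {
        double sum = 0.0;
        for (int k = 0; k < FOLDS; k++) {
            int testStart = foldStart(numDocs, k);
            int testEnd = foldStart(numDocs, k + 1);
            sum += evaluate.apply(testStart, testEnd);
        }
        return sum / FOLDS;
    }

    public static void main(String[] args) {
        int numDocs = 25;
        for (int k = 0; k < FOLDS; k++) {
            System.out.println("fold " + k + ": test docs ["
                + foldStart(numDocs, k) + ", " + foldStart(numDocs, k + 1) + ")");
        }
        // Dummy evaluator: the "score" is just the fraction of documents held out.
        double avg = averageScore(numDocs, (s, e) -> (e - s) / (double) numDocs);
        System.out.println("average = " + avg);
    }
}
```

Computing fold boundaries as `k * numDocs / 10` spreads any remainder across folds, so every document is tested exactly once even when the count is not divisible by ten.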
Usage: java edu.stanford.nlp.ie.hmm.Extractor [-cv|-ef|-s1|-sd|-se] dataFile targetField+
-ef With the "extractFrom" option, the HMM is trained on all of the given data, and the extractFrom functionality is then tested by extracting from a String hardcoded in the main method.
-cv With the "crossvalidate" option, the data is divided into ten contiguous tenths, and training and testing are run ten times, each time training on 90% of the data and testing on the remaining 10%.
With neither of the above options, training is done on the first 70% of the data, and testing on the remaining 30%.
-sd Use a simple, minimal default HMM structure with one prefix, one suffix, and one target state.
-s1 Use a more complex structure, built by Structure, with 3 prefix, 3 suffix, and 4 (2x2) target states.
-se Use an ergodic (fully connected) structure with 8 background and 4 target states.
With none of the above "s" options, a traditional default HMM structure hand-coded by Jim is used.
Details of the HMM's structure and its performance are printed.
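An ergodic structure like the one selected by -se can be pictured as a 12 x 12 transition matrix in which every entry is nonzero. This is a hypothetical sketch, not the Structure class's API: `ErgodicStructure`, `initTransitions`, and `isTarget` are invented names, start/end states are ignored, and the near-uniform random initialization is an assumption (a common choice, since perfectly uniform parameters leave the states symmetric and EM training cannot then differentiate them).

```java
import java.util.Random;

/**
 * Hypothetical sketch of an ergodic (fully connected) HMM topology with
 * 8 background and 4 target states. Not the Structure class's API.
 */
public class ErgodicStructure {

    static final int BACKGROUND = 8;
    static final int TARGET = 4;

    /** Every state may reach every state: all entries positive, each row sums to 1. */
    static double[][] initTransitions(int numStates, long seed) {
        Random rng = new Random(seed);
        double[][] trans = new double[numStates][numStates];
        for (int i = 0; i < numStates; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < numStates; j++) {
                // Near-uniform weight with small jitter so training can break symmetry between states.
                trans[i][j] = 1.0 + 0.1 * rng.nextDouble();
                rowSum += trans[i][j];
            }
            for (int j = 0; j < numStates; j++) {
                trans[i][j] /= rowSum; // normalize so row i is a probability distribution
            }
        }
        return trans;
    }

    /** Assumed convention: states 0..7 are background, states 8..11 emit target-field tokens. */
    static boolean isTarget(int state) {
        return state >= BACKGROUND;
    }

    public static void main(String[] args) {
        int n = BACKGROUND + TARGET;
        double[][] trans = initTransitions(n, 42L);
        int targets = 0;
        for (int s = 0; s < n; s++) {
            if (isTarget(s)) {
                targets++;
            }
        }
        System.out.printf("%d states (%d target), p(0 -> %d) = %.4f%n", n, targets, n - 1, trans[0][n - 1]);
    }
}
```

The contrast with -sd and -s1 is that those topologies constrain which transitions are allowed (prefix before target before suffix), whereas the ergodic structure leaves all ordering to be learned from data.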

Parameters:
args - Command line arguments, as explained above
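How `args` might be interpreted can be sketched as below. This is a hypothetical parser, not the Extractor's own code: `ExtractorArgs` and its fields are invented, and it assumes that a mode flag (-cv or -ef) may be combined with a structure flag (-sd, -s1, or -se), which the usage line does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of parsing "[options] dataFile targetField+". */
public class ExtractorArgs {
    String mode = "split";         // "split" (70/30 train/test), "cv", or "ef"
    String structure = "default";  // "default" (hand-coded), "sd", "s1", or "se"
    String dataFile;
    List<String> targetFields = new ArrayList<>();

    static ExtractorArgs parse(String[] args) {
        ExtractorArgs a = new ExtractorArgs();
        int i = 0;
        // Leading arguments that start with '-' are treated as options.
        for (; i < args.length && args[i].startsWith("-"); i++) {
            switch (args[i]) {
                case "-cv": a.mode = "cv"; break;
                case "-ef": a.mode = "ef"; break;
                case "-sd": a.structure = "sd"; break;
                case "-s1": a.structure = "s1"; break;
                case "-se": a.structure = "se"; break;
                default: throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        // targetField+ means the data file must be followed by at least one field name.
        if (args.length - i < 2) {
            throw new IllegalArgumentException("Expected dataFile followed by at least one targetField");
        }
        a.dataFile = args[i++];
        for (; i < args.length; i++) {
            a.targetFields.add(args[i]);
        }
        return a;
    }

    public static void main(String[] args) {
        // "speaker" and "location" are hypothetical target field names.
        ExtractorArgs a = parse(new String[] {"-cv", "-s1", "data.txt", "speaker", "location"});
        System.out.println(a.mode + " " + a.structure + " " + a.dataFile + " " + a.targetFields);
    }
}
```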


Stanford NLP Group