edu.stanford.nlp.ie.hmm
Class Extractor
java.lang.Object
|
+--edu.stanford.nlp.ie.hmm.Extractor
public class Extractor
extends Object
A command-line information extraction tool built using the HMM and Corpus
classes. This is a demonstration, debugging, and testing tool.
The Extractor trains and tests in a single run.
The data file it takes as input is a list of tagged documents separated by
the ENDOFDOC token. It either performs cross-validation or trains on the
first 70% of these documents and tests on the rest.
Alternatively, it can test the HMM.extractFrom() functionality on a
hardcoded text string.
To train and test separately and build a saved HMM file, use the Trainer
and Tester classes.
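The default 70%/30% split described above can be sketched as follows. This is only an illustration, not the Extractor's actual code: the class SplitSketch and its methods are hypothetical, the input is assumed to be whitespace-separated tokens (with any tags treated as ordinary tokens), and rounding the training count down is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group tokens into documents at each ENDOFDOC token,
// then use the first 70% of the documents for training.
class SplitSketch {

    /** Splits whitespace-separated tokens into documents terminated by ENDOFDOC. */
    public static List<List<String>> splitDocuments(String data) {
        List<List<String>> docs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : data.trim().split("\\s+")) {
            if (token.equals("ENDOFDOC")) {
                docs.add(current);              // close the current document
                current = new ArrayList<>();
            } else if (!token.isEmpty()) {
                current.add(token);
            }
        }
        if (!current.isEmpty()) {
            docs.add(current);                  // trailing document without a terminator
        }
        return docs;
    }

    /** Number of leading documents used for training (70%, rounded down: an assumption). */
    public static int trainCount(int numDocs) {
        return numDocs * 7 / 10;                // integer arithmetic avoids floating-point surprises
    }

    public static void main(String[] args) {
        List<List<String>> docs =
            splitDocuments("a b ENDOFDOC c ENDOFDOC d e f ENDOFDOC g ENDOFDOC");
        int n = trainCount(docs.size());
        List<List<String>> train = docs.subList(0, n);
        List<List<String>> test = docs.subList(n, docs.size());
        System.out.println("train docs: " + train.size() + ", test docs: " + test.size());
    }
}
```

Splitting at document boundaries rather than token counts keeps every document entirely in either the training set or the test set.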
Method Summary
static void main(String[] args)
Train and test an extractor.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
main
public static void main(String[] args)
Train and test an extractor. Cross-validation is supported: ten runs are
performed and the average performance is calculated (a confidence interval
is not yet reported).
Usage: java edu.stanford.nlp.ie.hmm.Extractor [-cv|-ef|-sd|-s1|-se] dataFile targetField+
-ef With the "extractFrom" option, it trains on all the given data, and
then tests the extractFrom functionality by doing extraction from a
String hardcoded within the main method.
-cv With the "crossvalidate"
option, the data is divided into contiguous tenths, and training is
done ten times on 90% of the data, with testing each time on the
remaining 10%.
With neither of the above options, training is done on the first 70% of
the data, with testing on the remaining 30%.
-sd Use a simple minimal default HMM structure, with one prefix, suffix,
and target.
-s1 Use a more complex structure, built by Structure,
involving 3 prefixes, 3 suffixes, and 4 (2x2) targets.
-se Use an ergodic (fully connected) structure with 8 background and
4 target states.
With none of the above "s" options, a traditional default
HMM structure handcoded by Jim is used.
Details of the HMM's structure and its performance are printed.
Parameters:
args - Command line arguments, as explained above
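The "crossvalidate" procedure described above (contiguous tenths, ten train/test runs, averaged performance) might look roughly like the sketch below. It is not the Extractor's implementation: the class CrossValidationSketch, its method names, and the way fold boundaries are rounded are all assumptions, and the per-fold score is a placeholder where a real run would train and evaluate an HMM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of cross-validation over contiguous folds.
class CrossValidationSketch {

    /** Returns {start, end} (end exclusive) of the contiguous test slice for a fold. */
    public static int[] testRange(int numDocs, int fold, int numFolds) {
        int start = (int) ((long) fold * numDocs / numFolds);
        int end = (int) ((long) (fold + 1) * numDocs / numFolds);
        return new int[] {start, end};
    }

    /** Everything outside [start, end) forms the training set. */
    public static <T> List<T> trainingSet(List<T> docs, int start, int end) {
        List<T> train = new ArrayList<>(docs.subList(0, start));
        train.addAll(docs.subList(end, docs.size()));
        return train;
    }

    /** Arithmetic mean of the per-fold scores (0.0 for an empty array). */
    public static double average(double[] scores) {
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return scores.length == 0 ? 0.0 : sum / scores.length;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            docs.add("doc" + i);
        }
        int numFolds = 10;
        double[] scores = new double[numFolds];
        for (int fold = 0; fold < numFolds; fold++) {
            int[] r = testRange(docs.size(), fold, numFolds);
            List<String> test = docs.subList(r[0], r[1]);
            List<String> train = trainingSet(docs, r[0], r[1]);
            // Placeholder: a real run would train on `train` and score on `test`.
            scores[fold] = 0.0;
            System.out.println("fold " + fold + ": train=" + train.size()
                + ", test=" + test.size());
        }
        System.out.println("average score = " + average(scores));
    }
}
```

Computing both boundaries from the fold index means the ten slices cover every document exactly once, even when the number of documents is not a multiple of ten.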
Stanford NLP Group