Extractor (Stanford JavaNLP API)

edu.stanford.nlp.ie.hmm
Class Extractor

java.lang.Object
  |
  +--edu.stanford.nlp.ie.hmm.Extractor

public class Extractor
extends Object

A command-line information extraction tool built using the HMM and Corpus classes. This is a demonstration, debugging, and testing tool. The Extractor both trains and tests in a single run. The data file it takes as input is a list of tagged documents separated by the ENDOFDOC token. It either does cross-validation, or it trains on the first 70% of these documents and tests on the rest. Alternatively, it can test the HMM.extractFrom() functionality on a hardcoded text string.

To train and test separately and build a saved HMM file, use the Trainer and Tester classes.
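The default 70%/30% split described above can be illustrated with a minimal sketch. This is not the Extractor's own code: the class name DocSplitSketch and its methods are hypothetical, tokens are assumed to be whitespace-separated strings, and the rounding used when 70% is not a whole number of documents is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of edu.stanford.nlp.ie.hmm.
public class DocSplitSketch {

    /** Groups a token stream into documents, closing each one at the ENDOFDOC token. */
    static List<List<String>> splitDocuments(List<String> tokens) {
        List<List<String>> docs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String tok : tokens) {
            if (tok.equals("ENDOFDOC")) {
                docs.add(current);
                current = new ArrayList<>();
            } else {
                current.add(tok);
            }
        }
        if (!current.isEmpty()) {
            docs.add(current); // tolerate a missing final separator
        }
        return docs;
    }

    /** Number of leading documents used for training (first 70%, rounded down by assumption). */
    static int trainCount(int numDocs) {
        return (numDocs * 7) / 10;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "a", "b", "ENDOFDOC", "c", "ENDOFDOC", "d", "e", "f", "ENDOFDOC");
        List<List<String>> docs = splitDocuments(tokens);
        int n = trainCount(docs.size());
        List<List<String>> train = docs.subList(0, n);
        List<List<String>> test = docs.subList(n, docs.size());
        System.out.println("train docs: " + train.size() + ", test docs: " + test.size());
    }
}
```

Keeping the split contiguous (first documents for training, the rest for testing) mirrors the description above; a shuffled split would be a different design.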

Method Summary

static void main(String[] args)
Train and test an extractor.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Detail

main

public static void main(String[] args)

Train and test an extractor. Cross-validation is supported: ten runs are performed and the average performance is calculated (a confidence interval is not yet reported, though one might be added).
Usage:

java edu.stanford.nlp.ie.hmm.Extractor [-cv|-ef|-s1|-sd]
                     dataFile targetField+

-ef With the "extractFrom" option, it trains on all the given data, and then tests the extractFrom functionality by doing extraction from a String hardcoded within the main method.
-cv With the "crossvalidate" option, the data is divided into contiguous tenths, and training is done ten times on 90% of the data, with testing on the remaining 10%.
With neither of the above options, training is done on the first 70% of the data, with testing on the remaining 30%.
-sd Use a simple minimal default HMM structure, with one prefix, suffix, and target.
-s1 Use a more complex structure, built by Structure, involving 3 prefixes, 3 suffixes, and 4 (2x2) targets.
-se Use an ergodic (fully connected) structure with 8 background and 4 target states.
With none of the above "s" options, a traditional default HMM structure handcoded by Jim is used.
Details of the HMM's structure and its performance are printed.
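As a rough illustration of the -cv option described above, the sketch below computes the contiguous test slice for each of the ten folds and averages per-fold scores. It is not taken from the Extractor source: the class CrossValidationSketch, its method names, and the way leftover documents are distributed when the count is not divisible by ten are all assumptions.

```java
// Hypothetical illustration only; not part of edu.stanford.nlp.ie.hmm.
public class CrossValidationSketch {

    static final int FOLDS = 10;

    /**
     * Returns {start, end} (end exclusive) of the contiguous test slice for a fold.
     * Every other document index belongs to that fold's training set.
     */
    static int[] testRange(int numDocs, int fold) {
        int start = (int) ((long) fold * numDocs / FOLDS);
        int end = (int) ((long) (fold + 1) * numDocs / FOLDS);
        return new int[] {start, end};
    }

    /** Arithmetic mean of the per-fold scores. */
    static double average(double[] scores) {
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }

    public static void main(String[] args) {
        int numDocs = 25; // made-up corpus size for the demonstration
        for (int fold = 0; fold < FOLDS; fold++) {
            int[] r = testRange(numDocs, fold);
            System.out.println("fold " + fold + ": test on documents ["
                    + r[0] + ", " + r[1] + "), train on the rest");
        }
    }
}
```

Computing slice boundaries from the fold index, rather than using a fixed slice width, ensures every document is tested exactly once even when the corpus size is not a multiple of ten.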

Parameters:
args - Command line arguments, as explained above