Package edu.stanford.nlp.ie.hmm

A package implementing HMMs for the purpose of information extraction.


Interface Summary
EmitMap Interface to model a state's emission distribution.
GeneralStructure A simple interface for anything that has a State array.
HasType Something that implements the HasType interface knows about HMM target types.
 

Class Summary
AnswerChecker Utility class for checking whether words pulled from the HMM match the answers created from an AnswerConstructor.
AnswerChecker.Range Represents a range [from,to) (same semantics as substring).
AnswerConstructor Takes a Collection of TypedTaggedWords (or a Collection of Words and a list of integers for the corresponding types) and pulls out the strings of each type.
ContextTrainer Trains a context HMM on the contexts of the given target states, representing each target state as atomic.
Corpus Class to handle a corpus of information extraction data.
DiscriminativeHMMDiffFunction Interface to optimization package for discriminatively learning the structure of an HMM.
Extractor A command line information extraction tool built using the HMM and Corpus classes.
HMM Class for a Hidden Markov Model information extraction tool.
HMMSingleFieldExtractor An interface between the KAON extraction world, and extraction of a single field via an HMM information extractor.
HMMTester Programmatically tests the quality of an HMM on a Corpus.
MergeTrainer Main class for building a single HMM by combining multiple target HMMs and a context HMM.
MultiStructure Class to model an HMM context structure.
State Class to model a single state in an HMM.
Structure Class to model an HMM structure.
StructureLearner A class to learn HMM structures by stochastic optimization.
TargetTrainer Trains a small target HMM on target sequences only.
Tester Test a trained, serialized HMM on a (tagged) testing file.
Trainer Trains HMM and saves it as a serialized object.
TypedTaggedDocument Document whose words are TypedTaggedWord objects.
TypedTaggedWord A TypedTaggedWord object contains a word, its tag, and its type.
WordTypeStripper Appliable that sets the type of a TypedTaggedWord to 0.
 

Package edu.stanford.nlp.ie.hmm Description

A package implementing HMMs for the purpose of information extraction. This work is based largely on work done by Freitag and McCallum. For more descriptions of ideas used, see:

The key classes are:

A variety of command line utility classes are available to build, train, and test HMM extractors:

Data format: The input utilities work with a simple document structure that looks like XML but is not actual XML; it is described in the documentation of the class Corpus. All documents are stored in one file, separated by the string ENDOFDOC on a line by itself. Within a document, fields for training are marked as XML-style elements.
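
As an illustration of the layout just described, the following is a minimal sketch of splitting such a file into documents and pulling out the marked fields. It is not the Corpus class's actual parsing code. The class name CorpusFormatSketch, both method names, the tolerance for whitespace around ENDOFDOC, and the sample text are all assumptions made for this example; the authoritative format is the one documented in Corpus.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of edu.stanford.nlp.ie.hmm. */
public class CorpusFormatSketch {

    /** Splits file contents into documents at lines that contain only ENDOFDOC. */
    public static List<String> splitDocuments(String corpusText) {
        List<String> docs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : corpusText.split("\\r?\\n")) {
            // Assumption: surrounding whitespace on the separator line is ignored.
            if (line.trim().equals("ENDOFDOC")) {
                addIfNonEmpty(docs, current);
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }
        // The last document may not be followed by a separator.
        addIfNonEmpty(docs, current);
        return docs;
    }

    private static void addIfNonEmpty(List<String> docs, StringBuilder buffer) {
        String doc = buffer.toString().trim();
        if (!doc.isEmpty()) {
            docs.add(doc);
        }
    }

    /** Returns the text inside every <field>...</field> element of one document. */
    public static List<String> fieldValues(String document, String field) {
        String name = Pattern.quote(field);
        Pattern element = Pattern.compile("<" + name + ">(.*?)</" + name + ">", Pattern.DOTALL);
        Matcher matcher = element.matcher(document);
        List<String> values = new ArrayList<>();
        while (matcher.find()) {
            values.add(matcher.group(1).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample text; real data lives in files such as acquisitions.txt.
        String corpus =
                "Foo Inc said it will buy <acquired>Acme Corp</acquired> .\n"
              + "ENDOFDOC\n"
              + "No marked field appears in this document .\n"
              + "ENDOFDOC\n";
        for (String doc : splitDocuments(corpus)) {
            System.out.println(fieldValues(doc, "acquired"));
        }
    }
}
```

A regular expression is used here rather than an XML parser because the format is only XML-like, so a strict parser may reject it.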

Use cases

Testing a single field extractor

Using cross validation, and a default structure:

java edu.stanford.nlp.ie.hmm.Extractor /u/nlp/data/iedata/acquisitions.txt acquired

Building a single field extractor from a target and context HMM, and testing it

One should be able to assemble an HMM from parts and test it using the steps below. However, this does not currently work (as of Oct 2002).

Making a target HMM with a fixed target structure

java edu.stanford.nlp.ie.hmm.TargetTrainer /u/nlp/data/iedata/acquisitions.txt acquired acquired-fixed.hmm

Making a target HMM with a learned target structure

Alternatively, one can learn the target HMM structure, as below. This takes considerably longer, though the cost is tolerable for a simple target HMM, and it yields a much bigger HMM structure.

java edu.stanford.nlp.ie.hmm.TargetTrainer -sl /u/nlp/data/iedata/acquisitions.txt acquired acquired-learned.hmm

Making the context HMM

java edu.stanford.nlp.ie.hmm.ContextTrainer /u/nlp/data/iedata/acquisitions.txt acquired-context.hmm acquired
(This didn't seem to work in the older code -- but doing java edu.stanford.nlp.ie.hmm.ContextTrainer -cc /u/nlp/data/iedata/acquisitions.txt acquired seems sensible, so I think it shouldn't be too far from working.)

Gluing the HMMs together

java edu.stanford.nlp.ie.hmm.MergeTrainer -f acquired-merged.hmm acquired-context.hmm acquired-fixed.hmm

Testing the merged HMM

java edu.stanford.nlp.ie.hmm.Tester /u/nlp/data/iedata/acquisitions.txt acquired acquired-merged.hmm

Since:
1.4


Stanford NLP Group