edu.stanford.nlp.ie.pcfg
Class IEMan

java.lang.Object
  |
  +--edu.stanford.nlp.ie.pcfg.IEMan

public class IEMan
extends Object

Information Extraction Manager. Performs information extraction and must first be fed training data. Each IEMan has its own IE parameters (e.g., smoothing ratios), so altering these parameters lets you compare two IEMans on the same data.


Field Summary
 List g
          not currently in use
 double grammarP1
          the mixing ratio between gram4 (the lexicalized, tagged grammar) and gram3 (the unlexicalized, tagged grammar)
 double grammarP2
          the mixing ratio between gram3 (the unlexicalized, tagged grammar) and the uniform grammar (see GLUtil)
 List l
           
 double lexiconP
          the mixing ratio between the PNPC lexicon (see PNPC, XPNPC) and the uniform lexicon (see GLUtil)
 int maxTagCombinationSize
          the maximum number of tags that can be identified per sentence
 int numAfterthoughts
          not used currently
 boolean repeat
          if true, the program reads the PNPC lexicon from rulesFN rather than generating it again.
static boolean sly
           
 
Constructor Summary
IEMan()
          constructs a new IEMan.
 
Method Summary
 List GetBestTagSets(Tree tree, int numParses)
          Gets the numParses best tag sets corresponding to a given tree.
 List GetGrammar(Tree tree)
          Gets the mixed grammar that will be used to parse this tree.
 List GetLexicon(Tree tree)
          Gets the mixed lexicon that will be used to parse this tree.
static HashMap GetMap(List rules)
          Creates a hashmap that links rules (without probabilities) to their XRules (rules + probabilities).
 List GetMissingGrammar(Tree tree)
          no longer used
static void main(String[] args)
          Analyzes training data and produces formatted grammars and lexicons.
 void ScoreTagSet(Tree tree, XTagSet tagSet)
          Calculates the probability that this tagset corresponds to this tree.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

sly

public static boolean sly

grammarP1

public double grammarP1
the mixing ratio between gram4 (the lexicalized, tagged grammar) and gram3 (the unlexicalized, tagged grammar)


grammarP2

public double grammarP2
the mixing ratio between gram3 (the unlexicalized, tagged grammar) and the uniform grammar (see GLUtil)


lexiconP

public double lexiconP
the mixing ratio between the PNPC lexicon (see PNPC, XPNPC) and the uniform lexicon (see GLUtil)
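
The three mixing ratios (grammarP1, grammarP2, lexiconP) can be read as interpolation weights. The following is a minimal sketch of that idea, not the IEMan implementation: it assumes plain linear interpolation, assumes each ratio is the weight placed on the more specific model, and assumes gram4 backs off to gram3, which in turn backs off to the uniform grammar. The class and method names (MixingSketch, interpolate, mixedRuleProbability) are hypothetical; the actual mixing is defined in the IEMan source and may differ.

```java
/** Hypothetical sketch; not part of edu.stanford.nlp.ie.pcfg. */
final class MixingSketch {
    private MixingSketch() {}

    /** Linear interpolation: weight * specific + (1 - weight) * general. */
    static double interpolate(double weight, double specific, double general) {
        if (weight < 0.0 || weight > 1.0) {
            throw new IllegalArgumentException("weight must be in [0, 1]: " + weight);
        }
        return weight * specific + (1.0 - weight) * general;
    }

    /**
     * Assumed nesting: gram4 is mixed with the result of mixing gram3
     * with the uniform grammar. The argument names mirror the IEMan
     * fields, but the formula itself is an assumption.
     */
    static double mixedRuleProbability(double grammarP1, double grammarP2,
                                       double pGram4, double pGram3, double pUniform) {
        double backoff = interpolate(grammarP2, pGram3, pUniform);
        return interpolate(grammarP1, pGram4, backoff);
    }
}
```

Under these assumptions, grammarP1 = 1 uses only gram4, while grammarP1 = 0 and grammarP2 = 1 uses only gram3. A lexicon probability could be mixed the same way with interpolate(lexiconP, pPnpc, pUniform).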


numAfterthoughts

public int numAfterthoughts
not used currently


maxTagCombinationSize

public int maxTagCombinationSize
the maximum number of tags that can be identified per sentence


repeat

public boolean repeat
If true, the program reads the PNPC lexicon from rulesFN rather than generating it again. This only works if you are parsing a single sentence (or if lexiconP == 0). Generating the PNPC lexicon takes a while (about a minute?), so setting repeat = true can save time, but keep in mind that generating the PNPC lexicon is a one-time-per-execution cost.


g

public List g
not currently in use


l

public List l


Constructor Detail

IEMan

public IEMan()
      throws IOException
Constructs a new IEMan. Reads probUntaggedGrammar, probGrammar, probHeadGrammar, and listLexicon from their respective files. Assumes that these files have already been written (see IEMan.main).

Method Detail

GetMap

public static HashMap GetMap(List rules)
Creates a hashmap that links rules (without probabilities) to their XRules (rules + probabilities). Currently uses LookupRules as keys, although this should probably be updated (see LookupRule).
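
The idea of keying a probability-carrying rule by its probability-free form can be sketched as follows. This is an illustration only: ScoredRule is a hypothetical stand-in for XRule, and a plain String key stands in for LookupRule, since the fields of the real classes are not documented here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; ScoredRule is not the real XRule class. */
final class RuleMapSketch {
    private RuleMapSketch() {}

    /** A rule (parent -> children) together with its probability. */
    static final class ScoredRule {
        final String parent;
        final List<String> children;
        final double probability;

        ScoredRule(String parent, List<String> children, double probability) {
            this.parent = parent;
            this.children = children;
            this.probability = probability;
        }

        /** Probability-free form of the rule, used as the map key. */
        String key() {
            return parent + " -> " + String.join(" ", children);
        }
    }

    /** Maps each rule's probability-free key to the scored rule. */
    static Map<String, ScoredRule> buildMap(List<ScoredRule> rules) {
        Map<String, ScoredRule> map = new HashMap<>();
        for (ScoredRule rule : rules) {
            // If two rules share a key, the later one replaces the earlier one.
            map.put(rule.key(), rule);
        }
        return map;
    }
}
```

Given such a map, looking up a rule's probability only needs the rule's shape, for example buildMap(rules).get("NP -> DT NN").probability.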


GetGrammar

public List GetGrammar(Tree tree)
                throws IOException
Gets the mixed grammar that will be used to parse this tree. See code for mixing details.

Throws: IOException

GetLexicon

public List GetLexicon(Tree tree)
                throws IOException
Gets the mixed lexicon that will be used to parse this tree. See code for mixing details.

Throws: IOException

ScoreTagSet

public void ScoreTagSet(Tree tree,
                        XTagSet tagSet)
Calculates the probability that this tagset corresponds to this tree. Sets tagSet.p (see XTagSet).


GetBestTagSets

public List GetBestTagSets(Tree tree,
                           int numParses)
                    throws IOException
Gets the numParses best tag sets corresponding to a given tree, where "best" is determined by the parse probability. Assumes that the sentence has already been parsed and that lexical information has already been added to the tree.

Throws: IOException
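
Selecting the numParses best tag sets by probability can be illustrated with a simple sort-and-truncate. This is a sketch under assumptions, not the IEMan algorithm: ScoredTagSet is a hypothetical stand-in for XTagSet (only its p field is taken from the documentation above), and IEMan may well use a k-best parser rather than scoring and sorting a candidate list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; ScoredTagSet is not the real XTagSet class. */
final class BestTagSetsSketch {
    private BestTagSetsSketch() {}

    /** A candidate tag set with its probability p. */
    static final class ScoredTagSet {
        final String label;
        final double p;

        ScoredTagSet(String label, double p) {
            this.label = label;
            this.p = p;
        }
    }

    /** Returns up to numParses candidates, highest probability first. */
    static List<ScoredTagSet> best(List<ScoredTagSet> candidates, int numParses) {
        if (numParses < 0) {
            throw new IllegalArgumentException("numParses must be >= 0: " + numParses);
        }
        // Copy first so the caller's list is left untouched.
        List<ScoredTagSet> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((ScoredTagSet t) -> t.p).reversed());
        int count = Math.min(numParses, sorted.size());
        return new ArrayList<>(sorted.subList(0, count));
    }
}
```

If fewer than numParses candidates exist, this sketch returns all of them rather than failing.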

GetMissingGrammar

public List GetMissingGrammar(Tree tree)
no longer used


main

public static void main(String[] args)
                 throws IOException
Analyzes training data and produces formatted grammars and lexicons. The user must run IEMan before performing information extraction. Running "IEMan 2" (i.e., IEMan.main(new String[] {"2"})) will produce a non-tagged, non-lexicalized grammar. Running "IEMan 3" will produce a tagged, non-lexicalized grammar and lexicon. Running "IEMan 4" will produce a tagged, lexicalized grammar. All three steps must be completed before doing any information extraction. NOTE: You might imagine that all three steps could be run in one execution, but I had problems running ExtractPTBRules.main more than once in a single execution.

Throws: IOException
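
Since the three training steps cannot share one execution, one way to automate them is to launch a separate JVM per step. The sketch below assumes the steps are run in the order 2, 3, 4, that a java launcher is on the PATH, and that the caller supplies a classpath containing IEMan; TrainingStepsSketch and its methods are hypothetical helpers, not part of this package.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; runs each IEMan training step in its own JVM. */
final class TrainingStepsSketch {
    static final String MAIN_CLASS = "edu.stanford.nlp.ie.pcfg.IEMan";

    private TrainingStepsSketch() {}

    /** Builds one command line per training step ("2", "3", "4"). */
    static List<List<String>> commands(String classpath) {
        List<List<String>> result = new ArrayList<>();
        for (String step : new String[] {"2", "3", "4"}) {
            result.add(Arrays.asList("java", "-cp", classpath, MAIN_CLASS, step));
        }
        return result;
    }

    /** Runs the steps one after another, stopping at the first failure. */
    static void runAll(String classpath) throws IOException, InterruptedException {
        for (List<String> command : commands(classpath)) {
            int exit = new ProcessBuilder(command).inheritIO().start().waitFor();
            if (exit != 0) {
                throw new IOException("Step failed (exit " + exit + "): " + command);
            }
        }
    }
}
```

Calling TrainingStepsSketch.runAll with a suitable classpath would then perform all three steps, each in a fresh process.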


Stanford NLP Group