java.lang.Object
  +--edu.stanford.nlp.ie.pcfg.TBMan
Treebank Manager. Handles all treebank operations.
Field Summary | |
static int |
lengthCutoff
The TBMan parses sentences as it reads them. |
static boolean |
parse
The TBMan parses sentences as it reads them. |
static int |
portNumber
The TBMan parses sentences as it reads them. |
List |
tags
A list of the tags in all the training and testing data. |
Constructor Summary | |
TBMan(String inFN,
String tbFN,
double split)
Calls TBMan(inFN, tbFN, split, null) |
|
TBMan(String inFN,
String tbFN,
double split,
String tag)
Constructs a new TBMan. |
Method Summary | |
static List |
GetSentences(List documents)
Breaks a list of documents into sentences. |
static List |
GetSentences(List documents,
boolean headlines)
Breaks a list of documents into sentences. |
List |
GetTestData(int seed)
Gets the test data. |
List |
GetTrainingData(int seed)
Gets the training data. |
List |
GetTrees(List sentences)
Gets the parse trees for a list of sentences. |
static void |
main(String[] args)
Tests class functionality. |
Tree |
Parse(List sentence)
Parses a sentence. |
static void |
Preprocess(String inFN,
String outFN)
Reads the Acquisitions data set as text and writes the training data as headline/paragraph pairs. |
void |
WriteTB(String fn)
Writes the treebank out to a file. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
public static int portNumber
public static boolean parse
public static int lengthCutoff
public List tags
Constructor Detail |
public TBMan(String inFN, String tbFN, double split, String tag) throws IOException
inFN - the name of the file containing the training/test data. The input is expected to have headlines and paragraphs on alternating lines.
tbFN - the name of the file containing parse trees, ideally the parse trees for the sentences in the training/test data. If not, the TBMan updates the file tbFN as it parses new sentences.
split - the percent of headline/paragraph pairs to use as training data. The remaining headline/paragraph pairs are used as test data.
tag - this constructor eliminates all tags in the training/test data except instances of "tag". If "tag" is null, this has no effect.
public TBMan(String inFN, String tbFN, double split) throws IOException
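The split parameter, together with the seed accepted by GetTrainingData(int seed) and GetTestData(int seed), describes a reproducible training/test partition. The following is a minimal sketch of one way such a partition could work, not TBMan's actual algorithm. The class and method names are hypothetical, and treating split as a fraction between 0 and 1 is an assumption (the description calls it a percent without giving a range).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; none of these names belong to TBMan.
class SeededSplitSketch {

    // Shuffles a copy of the items with a seeded generator, so the same
    // items and the same seed always produce the same order.
    static <T> List<T> shuffled(List<T> items, long seed) {
        List<T> copy = new ArrayList<>(items);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    // Assumption: split is a fraction in [0, 1], not a value in [0, 100].
    static int cut(int size, double split) {
        return (int) Math.round(size * split);
    }

    // The first part of the shuffled list is used as training data.
    static <T> List<T> trainingData(List<T> items, double split, long seed) {
        List<T> order = shuffled(items, seed);
        return new ArrayList<>(order.subList(0, cut(order.size(), split)));
    }

    // The remainder of the same shuffled list is used as test data.
    static <T> List<T> testData(List<T> items, double split, long seed) {
        List<T> order = shuffled(items, seed);
        return new ArrayList<>(order.subList(cut(order.size(), split), order.size()));
    }
}
```

Because both methods shuffle with the same seed before cutting, the two lists never overlap and together cover every input item, provided the caller passes the same seed to both.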
Method Detail |
public static void Preprocess(String inFN, String outFN) throws IOException
Throws: IOException
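The input format described for inFN (headlines and paragraphs on alternating lines) can be illustrated with a small sketch that groups lines into pairs. This is only an illustration of the format, not the Preprocess or TBMan code. The class name and method are hypothetical, and it assumes the headline comes first in each pair and that an unpaired trailing line is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the alternating-line format; not part of TBMan.
class HeadlineParagraphPairsSketch {

    // Groups lines into {headline, paragraph} pairs.
    // Assumption: even-indexed lines are headlines, odd-indexed lines are paragraphs.
    static List<String[]> pairLines(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        // The i + 1 bound drops a trailing headline that has no paragraph.
        for (int i = 0; i + 1 < lines.size(); i += 2) {
            pairs.add(new String[] { lines.get(i), lines.get(i + 1) });
        }
        return pairs;
    }
}
```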
public List GetTrainingData(int seed)
seed - the random seed. Given the same data and the same seed, the algorithm always splits the data the same way.
public List GetTestData(int seed)
seed - the random seed. Given the same data and the same seed, the algorithm always splits the data the same way.
public static List GetSentences(List documents, boolean headlines)
documents - a list of documents. Each document is a list holding a headline and a paragraph, where a headline is a sentence and a paragraph is a list of sentences.
headlines - if true, returns only headlines; otherwise, returns only the sentences from paragraphs.
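The nested document structure and the effect of the headlines flag may be easier to follow in code. The sketch below mirrors the description above under stated assumptions: a sentence is modeled as a list of word strings (the page does not say how sentences are represented), and the class and method names are hypothetical rather than TBMan's own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented behavior; not the TBMan source.
class SentenceExtractionSketch {

    // Each document is a two-element list: index 0 holds the headline (one
    // sentence) and index 1 holds the paragraph (a list of sentences).
    // Assumption: a sentence is a List<String> of words.
    @SuppressWarnings("unchecked")
    static List<List<String>> sentences(List<List<Object>> documents, boolean headlines) {
        List<List<String>> result = new ArrayList<>();
        for (List<Object> document : documents) {
            if (headlines) {
                // Only the headline sentence of each document.
                result.add((List<String>) document.get(0));
            } else {
                // Only the sentences that make up the paragraph.
                result.addAll((List<List<String>>) document.get(1));
            }
        }
        return result;
    }
}
```

With headlines set to true the result has exactly one sentence per document; with false its length is the total number of paragraph sentences across all documents.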
public static List GetSentences(List documents)
documents - a list of documents. Each document is a list holding a headline and a paragraph, where a headline is a sentence and a paragraph is a list of sentences.
public List GetTrees(List sentences)
public void WriteTB(String fn) throws IOException
Throws: IOException
public Tree Parse(List sentence)
public static void main(String[] args) throws IOException
Throws: IOException