edu.stanford.nlp.ie.pnp
Class DataGenerator
java.lang.Object
|
+--edu.stanford.nlp.ie.pnp.DataGenerator
- public class DataGenerator
- extends Object
Creates training/test/answer files from input files.
- See Also:
main(java.lang.String[])
Constructor Summary
DataGenerator(String[] args)
Constructs a new DataGenerator with the given args.
Method Summary
static void main(String[] args)
Generates training, test, and answer files from files, one per category.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DataGenerator
public DataGenerator(String[] args)
- Constructs a new DataGenerator with the given args.
- See Also:
main(java.lang.String[])
main
public static void main(String[] args)
- Generates training, test, and answer files from category files, one file per category.
Usage: java DataGenerator outdir source1 source2 [source3 ...]
The first argument is the directory in which the directory of generated files is created.
The remaining arguments are the names of category files. Each category file is a
list of lines, where each line is one example of the category. The name of the category
is the name of the category file minus its file extension (e.g. a file named "drug.txt"
defines a category named "drug"). This program reads the category files
and produces a set of train/test/answer files. Specifically, it makes pairwise tests
for each pair of categories, "1-all" tests in which one category is put up against
the union of all the other categories, and one "n-way" test in which every category is kept
as its own class. It creates several random folds for each situation; the number of folds
and the percentage of data held out for testing are hard-coded constants.
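The steps described above can be sketched roughly as follows. This is a minimal illustration and not the actual DataGenerator source: the class name DataGeneratorSketch, the helper methods, the configuration labels, and the constant values NUM_FOLDS and TEST_FRACTION are all assumptions made for the example. It shows how a category name might be derived from a file name, which test configurations result from a set of categories, and how one random train/test fold could be drawn.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DataGeneratorSketch {
    // Hypothetical values; the real class hard-codes its own constants.
    static final int NUM_FOLDS = 3;
    static final double TEST_FRACTION = 0.2;

    /** Category name = file name without directory and without its last extension. */
    static String categoryName(String path) {
        String name = new File(path).getName();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /** Lists the pairwise, 1-all, and n-way test configurations (labels are illustrative). */
    static List<String> testConfigurations(List<String> categories) {
        List<String> configs = new ArrayList<>();
        // One pairwise test for each unordered pair of categories.
        for (int i = 0; i < categories.size(); i++) {
            for (int j = i + 1; j < categories.size(); j++) {
                configs.add(categories.get(i) + "-vs-" + categories.get(j));
            }
        }
        // One 1-all test per category: that category against the union of the rest.
        for (String category : categories) {
            configs.add(category + "-vs-all");
        }
        // A single n-way test in which every category is its own class.
        configs.add("n-way");
        return configs;
    }

    /** Shuffles a copy of the examples and returns [train, test] for one random fold. */
    static <T> List<List<T>> split(List<T> examples, double testFraction, Random rng) {
        List<T> shuffled = new ArrayList<>(examples);
        Collections.shuffle(shuffled, rng);
        int numTest = (int) Math.round(shuffled.size() * testFraction);
        List<List<T>> trainAndTest = new ArrayList<>();
        trainAndTest.add(new ArrayList<>(shuffled.subList(numTest, shuffled.size())));
        trainAndTest.add(new ArrayList<>(shuffled.subList(0, numTest)));
        return trainAndTest;
    }

    public static void main(String[] args) {
        // Example category files; the names are made up for the demonstration.
        List<String> categories = new ArrayList<>();
        for (String file : Arrays.asList("drug.txt", "person.txt", "place.txt")) {
            categories.add(categoryName(file));
        }
        System.out.println("Configurations: " + testConfigurations(categories));

        List<String> examples = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");
        Random rng = new Random();
        for (int fold = 0; fold < NUM_FOLDS; fold++) {
            List<List<String>> parts = split(examples, TEST_FRACTION, rng);
            System.out.println("Fold " + fold + ": train=" + parts.get(0) + " test=" + parts.get(1));
        }
    }
}
```

With n categories this enumeration yields n(n-1)/2 pairwise tests, n 1-all tests, and one n-way test, each of which would be repeated once per fold.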
Stanford NLP Group