edu.stanford.nlp.ie.pnp
Class DataGenerator

java.lang.Object
  |
  +--edu.stanford.nlp.ie.pnp.DataGenerator

public class DataGenerator
extends Object

Creates training/test/answer files from input files.

See Also:
main(java.lang.String[])

Nested Class Summary
static class DataGenerator.Example
          Stores a category number and an example text.
 
Constructor Summary
DataGenerator(String[] args)
          Constructs a new DataGenerator with the given args.
 
Method Summary
static void main(String[] args)
          Generates training, test, and answer files from category files, one file per category.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DataGenerator

public DataGenerator(String[] args)
Constructs a new DataGenerator with the given args.

See Also:
main(java.lang.String[])

Method Detail

main

public static void main(String[] args)
Generates training, test, and answer files from category files, one file per category.

Usage: java DataGenerator outdir source1 source2 [source3 ...].

The first argument is the directory in which the directory of output files is created. The remaining arguments are the names of category files. Each category file is a list of lines, where each line is one example of the category. The name of a category is the name of its file minus the file extension (e.g. a file named "drug.txt" constitutes a category named "drug").

This program reads the category files and produces a set of train/test/answer files. Specifically, it makes pairwise tests for each pair of categories, "1-all" tests in which one category is put up against the union of all the other categories, and one "n-way" test in which every category is kept as its own class. It creates several random folds for each situation; the number of folds and the percentage of data held out for testing are hard-coded constants.
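
The scheme described above can be illustrated with a minimal sketch. This is not the DataGenerator source: the class name DataGeneratorSketch, the helper methods, the plan labels, and the values of NUM_FOLDS and TEST_FRACTION are all assumptions made for illustration, since the actual hard-coded constants and output file names are not documented here. The sketch only shows how a category name might be derived from a file name, how the pairwise, "1-all", and "n-way" situations could be enumerated, and how one random train/test fold could be drawn.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DataGeneratorSketch {

    // Hypothetical values: the real constants are hard-coded in DataGenerator
    // and are not given in this documentation.
    static final int NUM_FOLDS = 3;
    static final double TEST_FRACTION = 0.2;

    /** Category name = file name without directories and without its final extension. */
    static String categoryName(String path) {
        String name = new File(path).getName();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /** Enumerates the situations: every pair, one "1-all" per category, one n-way. */
    static List<String> testPlans(List<String> categories) {
        List<String> plans = new ArrayList<>();
        for (int i = 0; i < categories.size(); i++) {
            for (int j = i + 1; j < categories.size(); j++) {
                plans.add(categories.get(i) + "-vs-" + categories.get(j));
            }
        }
        for (String category : categories) {
            plans.add(category + "-vs-all");
        }
        plans.add("nway-" + String.join("-", categories));
        return plans;
    }

    /** Shuffles a copy of the examples and returns [train, test] for one random fold. */
    static <T> List<List<T>> split(List<T> examples, double testFraction, Random rng) {
        List<T> shuffled = new ArrayList<>(examples);
        Collections.shuffle(shuffled, rng);
        int testSize = (int) Math.round(shuffled.size() * testFraction);
        List<List<T>> folds = new ArrayList<>();
        folds.add(new ArrayList<>(shuffled.subList(testSize, shuffled.size()))); // train
        folds.add(new ArrayList<>(shuffled.subList(0, testSize)));               // test
        return folds;
    }

    public static void main(String[] args) {
        // Hypothetical category files; in the real program these come from the command line.
        List<String> categories = new ArrayList<>();
        for (String source : Arrays.asList("drug.txt", "place.txt", "person.txt")) {
            categories.add(categoryName(source));
        }
        Random rng = new Random();
        for (String plan : testPlans(categories)) {
            for (int fold = 0; fold < NUM_FOLDS; fold++) {
                // A real run would split the examples that belong to this plan.
                List<List<Integer>> parts =
                        split(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), TEST_FRACTION, rng);
                System.out.println(plan + " fold " + fold
                        + ": train=" + parts.get(0).size() + " test=" + parts.get(1).size());
            }
        }
    }
}
```

With n categories this enumeration yields n(n-1)/2 pairwise situations, n "1-all" situations, and one n-way situation, each repeated once per fold.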



Stanford NLP Group