edu.stanford.nlp.dbm
Class FileDataCollection

java.lang.Object
  |
  +--java.util.AbstractCollection
        |
        +--java.util.AbstractList
              |
              +--edu.stanford.nlp.dbm.AbstractDataCollection
                    |
                    +--edu.stanford.nlp.dbm.FileDataCollection
All Implemented Interfaces:
Collection, DataCollection, List
Direct Known Subclasses:
Contexts, Cranfield, DataSet, LocusLink, Medline, Ohsumed, USPDI

public abstract class FileDataCollection
extends AbstractDataCollection

A DataCollection whose data and features are stored in a file or directory. Subclasses need only implement fileIterator(), whose next() method returns the next data item from the file.
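The subclass contract described above can be illustrated with a minimal sketch. It does not use the real edu.stanford.nlp.dbm classes: SketchCollection and LinePerItemCollection are hypothetical stand-ins, items are plain Strings rather than Datum objects, the input is a String rather than a file on disk, and the assumption that populate() simply drains fileIterator() is not confirmed by this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the pattern only; NOT edu.stanford.nlp.dbm.FileDataCollection. */
abstract class SketchCollection {
    protected final List<String> data = new ArrayList<>();

    /** The single method a subclass supplies: next() yields one data item per call. */
    public abstract Iterator<String> fileIterator();

    /** Assumed behaviour: drain the iterator into the in-memory list. */
    public void populate() {
        Iterator<String> it = fileIterator();
        while (it.hasNext()) {
            data.add(it.next());
        }
    }

    public int size() { return data.size(); }

    public String get(int i) { return data.get(i); }
}

/** Treats every non-blank line of the input text as one data item. */
class LinePerItemCollection extends SketchCollection {
    private final String contents;

    LinePerItemCollection(String contents) { this.contents = contents; }

    @Override
    public Iterator<String> fileIterator() {
        final BufferedReader reader = new BufferedReader(new StringReader(contents));
        return new Iterator<String>() {
            // Look one item ahead so hasNext() can answer without side effects.
            private String next = advance();

            private String advance() {
                try {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        if (!line.trim().isEmpty()) {
                            return line.trim();
                        }
                    }
                    return null; // end of input
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() { return next != null; }

            @Override
            public String next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                String current = next;
                next = advance();
                return current;
            }
        };
    }
}
```

In this sketch all format-specific parsing lives in the iterator, so a subclass for another file format would change only fileIterator().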


Field Summary
protected  DataMatrix datamatrix
           
protected  int numdocs
           
 
Fields inherited from class edu.stanford.nlp.dbm.AbstractDataCollection
data, features, name
 
Fields inherited from class java.util.AbstractList
modCount
 
Constructor Summary
protected FileDataCollection()
          Creates an empty FileDataCollection.
  FileDataCollection(List db, String filename)
           
  FileDataCollection(String filename)
          Creates a FileDataCollection given a filename.
  FileDataCollection(String filename, int n)
          Creates a FileDataCollection from the first n documents in a file.
 
Method Summary
 int add(Datum d)
          Inserts a Datum into the data collection. The Datum is assigned the lowest unassigned index in the FileDataCollection, and that index is returned. Note: this allows duplicate objects to be stored under different indices.
 Matrix dataMatrix()
          Gets the feature matrix.
 List features()
          Returns an IndexedCollection of all the features.
abstract  Iterator fileIterator()
          Returns an iterator over the file in which the FileDataCollection is stored.
 void populate()
          Populates the FileDataCollection with information from a file.
protected  void set(DataMatrix dm)
           
protected  void set(int nd)
           
protected  void set(List db)
           
protected  void setDefaults(String filename)
          Sets the DataMatrix, List, and Processors to default values, with features and data stored in memory.
protected  void setDefaults(String filename, String dir)
          Sets the DataMatrix, List, and Processors to default values, with features and data stored on disk in the directory dir.
 
Methods inherited from class edu.stanford.nlp.dbm.AbstractDataCollection
get, name, size, toString, toXMLString
 
Methods inherited from class java.util.AbstractList
add, add, addAll, clear, equals, hashCode, indexOf, iterator, lastIndexOf, listIterator, listIterator, remove, removeRange, set, subList
 
Methods inherited from class java.util.AbstractCollection
addAll, contains, containsAll, isEmpty, remove, removeAll, retainAll, toArray, toArray
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.util.List
add, add, addAll, addAll, clear, contains, containsAll, equals, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, retainAll, set, subList, toArray, toArray
 

Field Detail

datamatrix

protected DataMatrix datamatrix

numdocs

protected int numdocs

Constructor Detail

FileDataCollection

protected FileDataCollection()
Creates an empty FileDataCollection.


FileDataCollection

public FileDataCollection(String filename)
Creates a FileDataCollection given a filename. Stores the data, features, and their respective indices in memory. Uses a standard Bag-of-Features Matrix.


FileDataCollection

public FileDataCollection(List db,
                          String filename)

FileDataCollection

public FileDataCollection(String filename,
                          int n)
Creates a FileDataCollection. Only the first n documents are taken from the file. The data, features, and their respective indices are stored in memory, and a standard Bag-of-Features Matrix is used. Processors are specified by the client.
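The "first n documents" behaviour can be pictured as capping how many items are pulled from an iterator. The following is only a sketch of that idea: FirstNSketch and takeFirst are hypothetical names, not part of edu.stanford.nlp.dbm, and the assumption that a file with fewer than n documents is read to the end without error is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper; not part of the documented API. */
final class FirstNSketch {
    /** Copies at most n items from the iterator, stopping early if it runs out. */
    static <T> List<T> takeFirst(Iterator<T> items, int n) {
        List<T> taken = new ArrayList<>();
        // Check the count first so no item beyond the n-th is ever consumed.
        while (taken.size() < n && items.hasNext()) {
            taken.add(items.next());
        }
        return taken;
    }
}
```

Because the count is tested before hasNext(), the underlying source is not read past the n-th document.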

Method Detail

set

protected void set(List db)

set

protected void set(DataMatrix dm)

set

protected void set(int nd)

setDefaults

protected void setDefaults(String filename)
Sets the DataMatrix, List, and Processors to default values, with features and data stored in memory.


setDefaults

protected void setDefaults(String filename,
                           String dir)
                    throws IOException
Sets the DataMatrix, List, and Processors to default values, with features and data stored on disk in the directory dir.

Throws:
IOException

add

public int add(Datum d)
Inserts a Datum into the data collection. The Datum is assigned the lowest unassigned index in the FileDataCollection, and that index is returned. Note: this allows duplicate objects to be stored under different indices.
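The index-returning behaviour of add can be sketched with a small append-only container. IndexedAddSketch is a hypothetical stand-in, not the library's implementation, and it assumes that indices are never freed, so the lowest unassigned index always equals the current size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in showing index assignment on insert. */
final class IndexedAddSketch<T> {
    private final List<T> items = new ArrayList<>();

    /** Stores the item at the lowest unassigned index and returns that index. */
    int add(T item) {
        int index = items.size(); // append-only, so the next free slot is size()
        items.add(item);          // no equality check: duplicates are stored again
        return index;
    }

    T get(int index) { return items.get(index); }

    int size() { return items.size(); }
}
```

Adding the same object twice in this sketch yields two distinct indices, which matches the note about duplicates above.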


dataMatrix

public Matrix dataMatrix()
Gets the feature matrix.

Specified by:
dataMatrix in interface DataCollection
Overrides:
dataMatrix in class AbstractDataCollection

features

public List features()
Returns an IndexedCollection of all the features.

Specified by:
features in interface DataCollection
Overrides:
features in class AbstractDataCollection

populate

public void populate()
Populates the FileDataCollection with information from a file.


fileIterator

public abstract Iterator fileIterator()
Returns an iterator over the file in which the FileDataCollection is stored.



Stanford NLP Group