Package edu.stanford.nlp.dbm

Classes for building and operating on documents and data collections.


Interface Summary
DataCollection Interface for data collections.
Datum Interface for Objects which can be described by their features.
Document Represents a text document as a list of Words with a title.
Featurizable Interface for Objects that can be described by their features.
IndexedSet List in which no duplicate Objects may be stored.
LabeledDataCollection Interface for hand-classified data collections.
MatrixWrapper Interface for a class of objects which construct and store a Matrix.
 

Class Summary
AbstractDataCollection Abstract Data Collection.
BasicDatum Basic implementation of the Datum interface that can be constructed with a Collection of features and one or more labels.
BasicDocument Basic implementation of Document that should be suitable for most needs.
BOFDataMatrix "Bag of Features" Feature Matrix.
Context One line, with the word as the first element and its context following it on the same line.
Contexts Contains methods dealing with populating a DBM given a file containing one Context per line, where the word to be disambiguated is at the top of the file.
ContextSet A collection of word contexts that does not allow duplicate words.
CranDocument Stores, processes, and allows access to a Document of the format specified in the Cranfield document collection.
Cranfield Contains methods dealing with populating a DBM given a file containing all Cranfield documents.
DataMatrix Class with methods to construct a DataMatrix (such as a term-document matrix), from the initial Objects (such as Documents).
DataSet A Data Collection that does not allow duplicate data.
DBIndexedSet Implementation of IndexedSet which uses a List and a SimpleDatabase: the List stores index-object pairs, and the SimpleDatabase stores object-index pairs.
DSDataMatrix DataMatrix for a DataSet, where duplicate Data are not allowed.
FileDataCollection DataCollection in which the Data and Features are stored in a File or Directory.
HTMLDocument The HTMLDocument class implements Document methods for an HTML encoded document.
LabelMatrix Wrapper for a Matrix whose columns are Label Vectors of Labeled Objects.
LocusLink A DataCollection where each Data Item is a LocusLink document about a gene.
LocusLinkDocument A LocusLink document about a gene with LocusLink ID locusID.
Medline Contains methods dealing with populating a DBM given a file containing all Medline documents.
MedlineDocument A Medline Document in Medline XML Format.
Ohsumed Contains methods dealing with populating a DBM given a file containing all Ohsumed documents.
OhsumedDocument Stores, processes, and allows access to a Document of the format specified in the Ohsumed document collection. NOTE: This needs to be converted to work with BasicDocument the way CranDocument does.
PersistentHashList Persistent List backed by a SimpleDatabase.
RestrictedDataMatrix "Restricted Bag of Features" Feature Matrix.
SimpleCollection Just like AbstractCollection, but it implements the size() method for you in the obvious way by using the Iterator.
USPDI A DataCollection where each Data Item is a USPDIDocument about a drug.
USPDIDocument A USPDIDocument is the relevant contents of a query on USP DI Drug Database for the drug given by drugName.
 

Package edu.stanford.nlp.dbm Description

Classes for building and operating on documents and data collections. Two of the basic interfaces are Document, which represents a document as a list of words with metadata, and DataCollection, which represents a collection of documents. The document class you will probably use most often is BasicDocument, which supports constructing documents from a variety of input sources. Several subclasses of BasicDocument handle special file formats and additional metadata. The DataCollection class you will probably use most often is FileDataCollection, which manages a group of files and lets you iterate through them or work with them in aggregate.
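
The relationship between a document (a list of words with a title) and a collection (something you can iterate over or treat in aggregate) can be illustrated with a minimal, self-contained sketch. The class names SketchDocument and SketchCollection, and all of their methods, are hypothetical stand-ins invented for this illustration; they are not the signatures of Document, BasicDocument, DataCollection, or FileDataCollection, whose actual methods are listed on the individual class pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins only: these are NOT the edu.stanford.nlp.dbm types.
final class DbmSketch {

    /** Stand-in for the Document idea: a list of words plus a title. */
    static final class SketchDocument {
        private final String title;
        private final List<String> words;

        SketchDocument(String title, String text) {
            this.title = title;
            // Split on whitespace; blank text yields an empty word list.
            String trimmed = text.trim();
            this.words = trimmed.isEmpty()
                    ? new ArrayList<String>()
                    : new ArrayList<String>(Arrays.asList(trimmed.split("\\s+")));
        }

        String title() { return title; }

        List<String> words() { return words; }
    }

    /** Stand-in for the DataCollection idea: an iterable group of documents. */
    static final class SketchCollection implements Iterable<SketchDocument> {
        private final List<SketchDocument> docs = new ArrayList<SketchDocument>();

        void add(SketchDocument doc) { docs.add(doc); }

        @Override
        public Iterator<SketchDocument> iterator() { return docs.iterator(); }

        /** An "in aggregate" operation: total word count over all documents. */
        int totalWords() {
            int total = 0;
            for (SketchDocument doc : docs) {
                total += doc.words().size();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        SketchCollection collection = new SketchCollection();
        collection.add(new SketchDocument("first", "a short example document"));
        collection.add(new SketchDocument("second", "another one"));

        // Iterate through the documents one at a time.
        for (SketchDocument doc : collection) {
            System.out.println(doc.title() + ": " + doc.words().size() + " words");
        }
        // Work with the collection in aggregate.
        System.out.println("total: " + collection.totalWords() + " words");
    }
}
```

In this sketch the collection holds documents in memory; a file-backed collection such as the one described above would presumably read each document from disk as it is iterated, but that behavior is an assumption and is not shown here.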

NOTE: The package name dbm is a historical artifact and will probably soon change to something like data.



Stanford NLP Group