Class OhsumedDocument

All Implemented Interfaces:
Cloneable, Collection, Datum, Document, Featurizable, Labeled, List, RandomAccess, Serializable

public class OhsumedDocument
extends BasicDocument

Stores, processes, and allows access to a Document of the format specified in the Ohsumed document collection. NOTE: This class needs to be converted to work with BasicDocument the way CranDocument does.

See Also:
Serialized Form

Field Summary
Fields inherited from class edu.stanford.nlp.dbm.BasicDocument
labels, originalText, title
Fields inherited from class java.util.AbstractList
modCount
Constructor Summary
OhsumedDocument()
Method Summary
 String abstractText()
          Returns the abstract text for this document.
 boolean hasAbstract()
          Returns true if this Document has an abstract.
 boolean in(Set UIDs)
          Returns true if this Document is in the given set of UIDs.
protected  void parse(String s)
          Tokenizes the given text to populate the list of Words this Document represents.
Methods inherited from class edu.stanford.nlp.dbm.BasicDocument
addLabel, asFeatures, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, label, labels, main, originalText, presentableText, setLabel, setLabels, setTitle, title
Methods inherited from class java.util.ArrayList
add, add, addAll, addAll, clear, clone, contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, removeRange, set, size, toArray, toArray, trimToSize
Methods inherited from class java.util.AbstractList
equals, hashCode, iterator, listIterator, listIterator, subList
Methods inherited from class java.util.AbstractCollection
containsAll, remove, removeAll, retainAll, toString
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.util.List
add, add, addAll, addAll, clear, contains, containsAll, equals, get, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, retainAll, set, size, subList, toArray, toArray

Constructor Detail


public OhsumedDocument()
Method Detail


protected void parse(String s)
Description copied from class: BasicDocument
Tokenizes the given text to populate the list of Words this Document represents. The default implementation uses a SimpleTokenizer and tokenizes the entirety of the text into words. Subclasses should override this method to parse documents in non-standard formats, and/or to pull the title of the document from the text. The given text may be empty ("") but will never be null.

Overrides:
parse in class BasicDocument
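
The override pattern described for parse can be sketched in a self-contained form. The classes below are hypothetical stand-ins, not the Stanford NLP classes: SketchDocument mimics only the list-of-words and title aspects of BasicDocument, and its whitespace split is a simplification standing in for SimpleTokenizer, whose actual behavior is not reproduced here.

```java
import java.util.ArrayList;

public class ParseOverrideSketch {

    /** Hypothetical stand-in for BasicDocument: a list of word tokens plus a title. */
    public static class SketchDocument extends ArrayList<String> {
        protected String title = "";

        /** Default behavior: split the entire text on whitespace. */
        protected void parse(String s) {
            for (String token : s.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    add(token);
                }
            }
        }

        /** Clears any previous tokens and re-populates the list from the text. */
        public SketchDocument init(String text) {
            clear();
            parse(text);
            return this;
        }

        public String title() {
            return title;
        }
    }

    /** Subclass for a non-standard format: first line is the title, the rest is the body. */
    public static class TitledSketchDocument extends SketchDocument {
        @Override
        protected void parse(String s) {
            int newline = s.indexOf('\n');
            if (newline < 0) {
                title = s.trim(); // only a title line, no body to tokenize
                return;
            }
            title = s.substring(0, newline).trim();
            super.parse(s.substring(newline + 1));
        }
    }

    public static void main(String[] args) {
        SketchDocument doc = new TitledSketchDocument().init("A Title\nsome body text");
        System.out.println(doc.title() + " -> " + doc);
    }
}
```

Because init calls parse, a subclass only needs to override parse to change both how the text is split and how the title is extracted. The empty-string case is handled without special treatment, matching the guarantee that the text may be empty but is never null.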


public boolean hasAbstract()
Returns true if this Document has an abstract.


public String abstractText()
Returns the abstract text for this document.


public boolean in(Set UIDs)
Returns true if this Document is in the given set of UIDs.
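
As a rough illustration of how hasAbstract, abstractText, and in might fit together, here is a minimal sketch. OhsumedRecordSketch is a hypothetical class, not the real OhsumedDocument. It assumes a record layout in which single-letter markers such as .U (identifier), .T (title), and .W (abstract) each sit on their own line with the value on the following line, and it assumes UIDs are compared as strings. The real class may store and compare identifiers differently. The sample record values are invented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OhsumedRecordSketch {
    // Field marker -> field value, e.g. ".U" -> "00000001" (assumed layout).
    private final Map<String, String> fields = new HashMap<>();

    /** Reads marker lines (".U", ".T", ".W", ...) and stores the line that follows each. */
    public OhsumedRecordSketch(String record) {
        String[] lines = record.split("\\R");
        for (int i = 0; i + 1 < lines.length; i++) {
            String marker = lines[i].trim();
            if (marker.matches("\\.[A-Z]")) {
                fields.put(marker, lines[i + 1].trim());
                i++; // skip the value line just consumed
            }
        }
    }

    /** True if an abstract field was present and non-empty. */
    public boolean hasAbstract() {
        String text = fields.get(".W");
        return text != null && !text.isEmpty();
    }

    /** The abstract text, or the empty string if there is none. */
    public String abstractText() {
        return fields.getOrDefault(".W", "");
    }

    /** True if this record's identifier is contained in the given set. */
    public boolean in(Set<String> uids) {
        String uid = fields.get(".U");
        return uid != null && uids.contains(uid);
    }

    public static void main(String[] args) {
        // Invented sample record, for illustration only.
        String record = ".I 1\n.U\n00000001\n.T\nA sample title\n.W\nA sample abstract.\n";
        OhsumedRecordSketch doc = new OhsumedRecordSketch(record);

        Set<String> wanted = new HashSet<>();
        wanted.add("00000001");

        if (doc.in(wanted) && doc.hasAbstract()) {
            System.out.println(doc.abstractText());
        }
    }
}
```

Checking hasAbstract before calling abstractText lets a caller tell a missing abstract apart from one that is merely short, and in gives a simple way to filter a collection down to a chosen subset of identifiers.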

Stanford NLP Group