edu.stanford.nlp.dbm
Class OhsumedDocument
java.lang.Object
  |
  +--java.util.AbstractCollection
        |
        +--java.util.AbstractList
              |
              +--java.util.ArrayList
                    |
                    +--edu.stanford.nlp.dbm.BasicDocument
                          |
                          +--edu.stanford.nlp.dbm.OhsumedDocument
- All Implemented Interfaces:
- Cloneable, Collection, Datum, Document, Featurizable, Labeled, List, RandomAccess, Serializable
public class OhsumedDocument
extends BasicDocument
Stores, processes, and provides access to a Document in the format used by the Ohsumed document collection.
NOTE: This class still needs to be converted to work with BasicDocument the way CranDocument does.
- See Also:
- Serialized Form
Method Summary
 String          abstractText()
                   Returns the abstract text for this document.
 boolean         hasAbstract()
                   Returns true if this Document has an abstract.
 boolean         in(Set UIDs)
                   Returns true if this Document is in UIDs.
 protected void  parse(String s)
                   Tokenizes the given text to populate the list of Words this
                   Document represents.
Methods inherited from class edu.stanford.nlp.dbm.BasicDocument
addLabel, asFeatures, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, label, labels, main, originalText, presentableText, setLabel, setLabels, setTitle, title
Methods inherited from class java.util.ArrayList
add, add, addAll, addAll, clear, clone, contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, removeRange, set, size, toArray, toArray, trimToSize
Methods inherited from interface java.util.List
add, add, addAll, addAll, clear, contains, containsAll, equals, get, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, retainAll, set, size, subList, toArray, toArray
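Because OhsumedDocument inherits the ArrayList and List methods listed above, a document's words can presumably be read with ordinary list calls. The sketch below does not use the Stanford classes: `WordListDocument` is a hypothetical stand-in that extends `ArrayList<String>`, and its whitespace split is an assumption standing in for the real tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Demo entry point, declared first so `java WordListDemo.java` runs it.
class WordListDemo {
    public static void main(String[] args) {
        WordListDocument doc = new WordListDocument("Aspirin reduces fever");
        System.out.println(doc.size());            // number of words: 3
        System.out.println(doc.get(0));            // first word: Aspirin
        System.out.println(doc.contains("fever")); // true
    }
}

// Hypothetical stand-in: like BasicDocument, the document *is* a list of its
// words, so inherited ArrayList methods (size, get, contains, ...) apply.
class WordListDocument extends ArrayList<String> {
    WordListDocument(String text) {
        // Assumption: a whitespace split stands in for the real tokenizer.
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            addAll(Arrays.asList(trimmed.split("\\s+")));
        }
    }
}
```

Treating the document as a list keeps word access uniform with any other `java.util.List` code.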
OhsumedDocument
public OhsumedDocument()
parse
protected void parse(String s)
- Description copied from class: BasicDocument
- Tokenizes the given text to populate the list of Words this Document
represents. The default implementation uses a SimpleTokenizer and tokenizes
the entirety of the text into words. Subclasses should override this method
to parse documents in non-standard formats and/or to pull the title of the
document from the text. The given text may be empty ("") but will never
be null.
- Overrides:
parse in class BasicDocument
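A minimal sketch of the override pattern described for parse, using hypothetical classes rather than the real BasicDocument API. `SketchDocument` tokenizes the whole text (a whitespace split stands in for SimpleTokenizer), and `TitledDocument` overrides `parse` to take the first line as the title. The `init(String)` and `title()` methods here are illustrative assumptions, not the library's signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;

class ParseOverrideDemo {
    public static void main(String[] args) {
        TitledDocument doc = new TitledDocument();
        doc.init("My Title\nbody text here");
        System.out.println(doc.title()); // My Title
        System.out.println(doc);         // [body, text, here]
    }
}

// Hypothetical base class mirroring the documented contract of parse(String).
class SketchDocument extends ArrayList<String> {
    protected String title = "";

    // Hypothetical entry point; the real init(...) overloads are not shown on this page.
    void init(String text) {
        clear();
        parse(text == null ? "" : text); // parse() is never handed null
    }

    // Default behaviour: tokenize the entire text into words.
    protected void parse(String s) {
        String trimmed = s.trim();
        if (!trimmed.isEmpty()) {
            addAll(Arrays.asList(trimmed.split("\\s+")));
        }
    }

    String title() {
        return title;
    }
}

// Subclass for a non-standard format: the first line is the title, the rest is body text.
class TitledDocument extends SketchDocument {
    @Override
    protected void parse(String s) {
        int nl = s.indexOf('\n');
        title = (nl < 0 ? s : s.substring(0, nl)).trim();
        super.parse(nl < 0 ? "" : s.substring(nl + 1)); // reuse default tokenizing for the body
    }
}
```

Delegating the body to `super.parse` keeps the default tokenization in one place while the subclass handles only the format-specific part.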
hasAbstract
public boolean hasAbstract()
- Returns true if this Document has an abstract.
abstractText
public String abstractText()
- Returns the abstract text for this document.
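This page does not say how OhsumedDocument locates the abstract. Assuming the SMART-style layout in which each field tag (for example `.T` for the title or `.W` for the abstract) starts a line and the field's text follows, the two methods could be sketched as below. `OhsumedRecord` is hypothetical, and the record strings and UID values are made-up samples.

```java
import java.util.HashMap;
import java.util.Map;

class AbstractDemo {
    public static void main(String[] args) {
        String withAbstract = ".I 1\n.U\n12345678\n.T\nA title\n.W\nAn abstract.\n";
        String withoutAbstract = ".I 2\n.U\n12345679\n.T\nAnother title\n";
        OhsumedRecord a = new OhsumedRecord(withAbstract);
        OhsumedRecord b = new OhsumedRecord(withoutAbstract);
        System.out.println(a.hasAbstract() + " " + a.abstractText()); // true An abstract.
        System.out.println(b.hasAbstract());                          // false
    }
}

// Hypothetical parser for one Ohsumed-style record. Assumes a field tag such as
// ".T" or ".W" begins a line; any text after the tag on that line, plus the
// following lines up to the next tag, form that field's value.
class OhsumedRecord {
    private final Map<String, String> fields = new HashMap<>();

    OhsumedRecord(String text) {
        String tag = null;
        StringBuilder value = new StringBuilder();
        for (String line : text.split("\\R")) {
            if (isTag(line)) {
                store(tag, value);                      // close the previous field
                tag = line.substring(0, 2);             // e.g. ".W"
                value.setLength(0);
                value.append(line.substring(2).trim()); // e.g. the "1" in ".I 1"
            } else if (tag != null) {
                if (value.length() > 0) value.append(' ');
                value.append(line.trim());
            }
        }
        store(tag, value);                              // close the last field
    }

    // A tag is '.', an upper-case letter, then end of line or a space.
    private static boolean isTag(String line) {
        return line.length() >= 2
                && line.charAt(0) == '.'
                && Character.isUpperCase(line.charAt(1))
                && (line.length() == 2 || line.charAt(2) == ' ');
    }

    private void store(String tag, StringBuilder value) {
        if (tag != null) fields.put(tag, value.toString());
    }

    boolean hasAbstract() {
        return !abstractText().isEmpty();
    }

    String abstractText() {
        return fields.getOrDefault(".W", "");
    }
}
```

Returning an empty string rather than null from `abstractText()` lets `hasAbstract()` be defined in terms of it.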
in
public boolean in(Set UIDs)
- Returns true if this Document is in UIDs.
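The page leaves open which identifier in compares and what element type the raw Set holds. Assuming a string UID (such as the identifier stored in an Ohsumed record), the check reduces to a set lookup. `UidDocument` is a hypothetical stand-in, and the UID values are made up.

```java
import java.util.Set;

class MembershipDemo {
    public static void main(String[] args) {
        UidDocument doc = new UidDocument("12345678");
        System.out.println(doc.in(Set.of("12345678", "87654321"))); // true
        System.out.println(doc.in(Set.of("00000000")));             // false
    }
}

// Hypothetical document that knows its own UID.
class UidDocument {
    private final String uid;

    UidDocument(String uid) {
        this.uid = uid;
    }

    // The documented signature takes a raw Set; Set<?> keeps the same
    // flexibility while avoiding raw-type warnings. A null set means "not in".
    boolean in(Set<?> uids) {
        return uids != null && uids.contains(uid);
    }
}
```

With a hash-based set, the lookup is expected constant time, so filtering a large collection against a set of relevant UIDs stays cheap.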
Stanford NLP Group