Class CranDocument

All Implemented Interfaces:
Cloneable, Collection, Datum, Document, Featurizable, Labeled, List, RandomAccess, Serializable

public class CranDocument
extends BasicDocument

Stores, processes, and provides access to a Document in the format used by the Cranfield document collection.

See Also:
Serialized Form

Field Summary
Fields inherited from class edu.stanford.nlp.dbm.BasicDocument
labels, originalText, title
Fields inherited from class java.util.AbstractList
modCount
Constructor Summary
CranDocument()
Method Summary
protected  void parse(String text)
          Parses the given text as a Cranfield document to extract the title and text.
Methods inherited from class edu.stanford.nlp.dbm.BasicDocument
addLabel, asFeatures, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, label, labels, main, originalText, presentableText, setLabel, setLabels, setTitle, title
Methods inherited from class java.util.ArrayList
add, add, addAll, addAll, clear, clone, contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, removeRange, set, size, toArray, toArray, trimToSize
Methods inherited from class java.util.AbstractList
equals, hashCode, iterator, listIterator, listIterator, subList
Methods inherited from class java.util.AbstractCollection
containsAll, remove, removeAll, retainAll, toString
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.util.List
add, add, addAll, addAll, clear, contains, containsAll, equals, get, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, retainAll, set, size, subList, toArray, toArray

Constructor Detail


public CranDocument()

Method Detail


protected void parse(String text)
Parses the given text as a Cranfield document to extract the title and text.

The second line of every CRAN document is the marker .T, which signals that the title follows. The lines after .T, up to the .A marker, make up the title. For example:

.T
experimental investigation of the aerodynamics of a wing in a slipstream .
.A
etc.

CRAN documents mark the abstract with .W. All lines following .W, up to the end of the document, belong to the abstract text; this method reads those lines to obtain the document text.
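
The marker rules described above can be sketched as a small line-by-line scanner. This is only an illustration under stated assumptions, not the actual body of CranDocument.parse: the class CranfieldSketch, the holder Parsed, and the choice to join lines with single spaces are all hypothetical, and the sample input (including its first line and author line) is invented for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of edu.stanford.nlp.dbm.
class CranfieldSketch {

    /** Holds the two pieces the description mentions: title and abstract text. */
    static final class Parsed {
        final String title;
        final String text;

        Parsed(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /** Which part of the document the scanner is currently in. */
    private enum Section { HEADER, TITLE, OTHER, ABSTRACT }

    /**
     * Collects the lines between .T and .A as the title, and every line
     * after .W as the abstract text. Lines are joined with single spaces
     * (an assumption of this sketch).
     */
    static Parsed parse(String raw) {
        List<String> titleLines = new ArrayList<>();
        List<String> textLines = new ArrayList<>();
        Section section = Section.HEADER;

        for (String line : raw.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            if (section == Section.HEADER && trimmed.equals(".T")) {
                section = Section.TITLE;      // title starts on the next line
            } else if (section == Section.TITLE && trimmed.equals(".A")) {
                section = Section.OTHER;      // title has ended
            } else if (section != Section.ABSTRACT && trimmed.equals(".W")) {
                section = Section.ABSTRACT;   // abstract runs to end of document
            } else if (section == Section.TITLE) {
                titleLines.add(trimmed);
            } else if (section == Section.ABSTRACT) {
                textLines.add(trimmed);
            }
            // Lines in HEADER or OTHER (for example, author lines) are skipped.
        }
        return new Parsed(String.join(" ", titleLines), String.join(" ", textLines));
    }

    public static void main(String[] args) {
        // Invented sample input; only the title wording comes from the description.
        String sample = ".I 1\n"
                + ".T\n"
                + "experimental investigation of the aerodynamics of a\n"
                + "wing in a slipstream .\n"
                + ".A\n"
                + "(author line)\n"
                + ".W\n"
                + "sample abstract line one\n"
                + "sample abstract line two\n";
        Parsed parsed = parse(sample);
        System.out.println("title: " + parsed.title);
        System.out.println("text:  " + parsed.text);
    }
}
```

Once the scanner reaches .W it stops looking for markers, which matches the statement that everything up to the end of the document is abstract text. A real implementation may treat whitespace and other markers differently.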

Overrides:
parse in class BasicDocument

Stanford NLP Group