java.lang.Object
  +--java.util.AbstractCollection
      +--java.util.AbstractList
          +--java.util.ArrayList
              +--edu.stanford.nlp.dbm.BasicDocument
Basic implementation of Document that should be suitable for most needs.
BasicDocument is an ArrayList that stores words; it tokenizes its text when
it is initialized through one of the init methods. Override parse(java.lang.String)
to support custom document formats or custom tokenization. BasicDocument should
only be used for documents that are small enough to store in memory.
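The override point described above can be sketched without the library on the classpath. In the example below, SketchDocument and CommaSeparatedDocument are hypothetical stand-ins, not the real BasicDocument, and the whitespace-splitting default is an assumption made only for illustration.

```java
import java.util.ArrayList;

// Hypothetical stand-in for BasicDocument: a list of words filled in by parse().
class SketchDocument extends ArrayList<String> {
    // Mirrors the init-returns-this style of the init methods documented here.
    SketchDocument init(String text) {
        clear();
        parse(text == null ? "" : text);
        return this;
    }

    // Default tokenization (an assumption): split on runs of whitespace.
    protected void parse(String text) {
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                add(token);
            }
        }
    }
}

// A subclass that supports a custom format by overriding parse().
class CommaSeparatedDocument extends SketchDocument {
    @Override
    protected void parse(String text) {
        for (String field : text.split(",")) {
            String token = field.trim();
            if (!token.isEmpty()) {
                add(token);
            }
        }
    }
}

class ParseOverrideDemo {
    public static void main(String[] args) {
        // prints [the, quick, fox]
        System.out.println(new SketchDocument().init("the quick fox"));
        // prints [alpha, beta, gamma]
        System.out.println(new CommaSeparatedDocument().init("alpha, beta ,gamma"));
    }
}
```

Only parse changes in the subclass; the init logic is inherited unchanged, which is the design the class description suggests.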
Field Summary | |
protected List |
labels
Label(s) for this document. |
protected String |
originalText
original text of this document (may be null). |
protected String |
title
title of this document (never null). |
Fields inherited from class java.util.AbstractList |
modCount |
Constructor Summary | |
BasicDocument()
Constructs a new (empty) BasicDocument. |
Method Summary | |
void |
addLabel(Label label)
Adds the given Label to the List of labels for this Document if it is not null. |
Collection |
asFeatures()
Returns this (the features are the list of words). |
BasicDocument |
init()
Calls init((String)null,null,true) |
BasicDocument |
init(File textFile)
Calls init(textFile,textFile.getCanonicalPath(),true) |
BasicDocument |
init(File textFile,
boolean keepOriginalText)
Calls init(textFile,textFile.getCanonicalPath(),keepOriginalText) |
BasicDocument |
init(File textFile,
String title)
Calls init(textFile,title,true) |
BasicDocument |
init(File textFile,
String title,
boolean keepOriginalText)
Inits a new BasicDocument by reading in the text from the given File. |
BasicDocument |
init(List words)
Calls init(words,null) |
BasicDocument |
init(List words,
String title)
Inits a new BasicDocument with the given list of words and title. |
BasicDocument |
init(Reader textReader)
Calls init(textReader,null,true) |
BasicDocument |
init(Reader textReader,
boolean keepOriginalText)
Calls init(textReader,null,keepOriginalText) |
BasicDocument |
init(Reader textReader,
String title)
Calls init(textReader,title,true) |
BasicDocument |
init(Reader textReader,
String title,
boolean keepOriginalText)
Inits a new BasicDocument by reading in the text from the given Reader. |
BasicDocument |
init(String text)
Calls init(text,null,true) |
BasicDocument |
init(String text,
boolean keepOriginalText)
Calls init(text,null,keepOriginalText) |
BasicDocument |
init(String text,
String title)
Calls init(text,title,true) |
BasicDocument |
init(String text,
String title,
boolean keepOriginalText)
Inits a new BasicDocument with the given text contents and title. |
BasicDocument |
init(URL textURL)
Calls init(textURL,textURL.toExternalForm(),true) |
BasicDocument |
init(URL textURL,
boolean keepOriginalText)
Calls init(textURL,textURL.toExternalForm(),keepOriginalText) |
BasicDocument |
init(URL textURL,
String title)
Calls init(textURL,title,true) |
BasicDocument |
init(URL textURL,
String title,
boolean keepOriginalText)
Inits a new BasicDocument by reading in the text from the given URL. |
Label |
label()
Returns the first label for this Document, or null if none have been set. |
Collection |
labels()
Returns the complete List of labels for this Document. |
static void |
main(String[] args)
For internal debugging purposes only. |
String |
originalText()
Returns the text originally used to construct this document, or null if there was no original text. |
protected void |
parse(String text)
Tokenizes the given text to populate the list of Words this Document represents. |
String |
presentableText()
Returns a "pretty" version of the words in this Document suitable for display. |
void |
setLabel(Label label)
Removes all currently assigned Labels for this Document then adds the given Label. |
void |
setLabels(Collection labels)
Removes all currently assigned labels for this Document then adds all of the given Labels. |
void |
setTitle(String title)
Sets the title of this Document to the given title. |
String |
title()
Returns the title of this document. |
Methods inherited from class java.util.ArrayList |
add, add, addAll, addAll, clear, clone, contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, removeRange, set, size, toArray, toArray, trimToSize |
Methods inherited from class java.util.AbstractList |
equals, hashCode, iterator, listIterator, listIterator, subList |
Methods inherited from class java.util.AbstractCollection |
containsAll, remove, removeAll, retainAll, toString |
Methods inherited from class java.lang.Object |
finalize, getClass, notify, notifyAll, wait, wait, wait |
Methods inherited from interface java.util.List |
add, add, addAll, addAll, clear, contains, containsAll, equals, get, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, retainAll, set, size, subList, toArray, toArray |
Field Detail |
protected String title
protected String originalText
protected final List labels
Constructor Detail |
public BasicDocument()
Method Detail |
public BasicDocument init(String text, String title, boolean keepOriginalText)
Inits a new BasicDocument with the given text contents and title. The text is
tokenized using parse(java.lang.String) to populate the list of words
("" is used if text is null). If specified, a reference to the
original text is also maintained so that the originalText() method returns the
text given to this method. Returns a reference to this BasicDocument
for convenience (so it behaves like a constructor, but is inherited).
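The delegation pattern among the init overloads can be illustrated with a small stand-in. ChainedDocument below is hypothetical and is not the real BasicDocument; mapping a null title to "" and splitting on whitespace are assumptions chosen to match the documented "never null" title and the null-text rule.

```java
import java.util.ArrayList;

// Hypothetical stand-in showing how shorter init overloads delegate to the full one.
class ChainedDocument extends ArrayList<String> {
    private String title = "";
    private String originalText;

    // Full form: every other overload ends up here.
    ChainedDocument init(String text, String title, boolean keepOriginalText) {
        this.title = (title == null) ? "" : title;        // assumption: null title becomes ""
        this.originalText = keepOriginalText ? text : null;
        clear();
        parse(text == null ? "" : text);                  // "" is used if text is null
        return this;                                      // allows new ChainedDocument().init(...)
    }

    ChainedDocument init(String text, String title) { return init(text, title, true); }
    ChainedDocument init(String text, boolean keepOriginalText) { return init(text, null, keepOriginalText); }
    ChainedDocument init(String text) { return init(text, null, true); }
    ChainedDocument init() { return init((String) null, null, true); }

    // Assumed default tokenization: split on runs of whitespace.
    protected void parse(String text) {
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                add(token);
            }
        }
    }

    String title() { return title; }
    String originalText() { return originalText; }
}

class InitChainDemo {
    public static void main(String[] args) {
        ChainedDocument doc = new ChainedDocument().init("to be or not", "Hamlet");
        System.out.println(doc.title() + ": " + doc);   // prints Hamlet: [to, be, or, not]
        System.out.println(doc.originalText());         // prints to be or not
    }
}
```

Because init returns this, construction and initialization fit in one expression, and subclasses inherit every overload without redeclaring constructors.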
public BasicDocument init(String text, String title)
public BasicDocument init(String text, boolean keepOriginalText)
public BasicDocument init(String text)
public BasicDocument init()
public BasicDocument init(Reader textReader, String title, boolean keepOriginalText) throws IOException
Throws: IOException
See Also: init(String,String,boolean)
public BasicDocument init(Reader textReader, String title) throws IOException
Throws: IOException
public BasicDocument init(Reader textReader, boolean keepOriginalText) throws IOException
Throws: IOException
public BasicDocument init(Reader textReader) throws IOException
Throws: IOException
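The Reader-based init methods point to init(String,String,boolean), which suggests the text is read fully into memory first and then handed to the String form. A minimal sketch of that reading step, assuming nothing about the actual implementation (ReaderText and readAll are hypothetical names):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderText {
    // Drains the Reader into a single String; the caller remains responsible for closing it.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(new StringReader("a small document"));
        System.out.println(text);   // prints a small document
    }
}
```

Reading everything up front is consistent with the note that this class suits only documents small enough to store in memory.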
public BasicDocument init(File textFile, String title, boolean keepOriginalText) throws FileNotFoundException, IOException
Throws: FileNotFoundException, IOException
See Also: init(String,String,boolean)
public BasicDocument init(File textFile, String title) throws FileNotFoundException, IOException
Throws: FileNotFoundException, IOException
public BasicDocument init(File textFile, boolean keepOriginalText) throws FileNotFoundException, IOException
Throws: FileNotFoundException, IOException
public BasicDocument init(File textFile) throws FileNotFoundException, IOException
Throws: FileNotFoundException, IOException
public BasicDocument init(URL textURL, String title, boolean keepOriginalText) throws IOException
Throws: IOException
See Also: init(String,String,boolean)
public BasicDocument init(URL textURL, String title) throws FileNotFoundException, IOException
Throws: FileNotFoundException, IOException
public BasicDocument init(URL textURL, boolean keepOriginalText) throws FileNotFoundException, IOException
Throws: FileNotFoundException, IOException
public BasicDocument init(URL textURL) throws FileNotFoundException, IOException
Throws: FileNotFoundException, IOException
public BasicDocument init(List words, String title)
public BasicDocument init(List words)
protected void parse(String text)
public Collection asFeatures()
Specified by: asFeatures in interface Featurizable
public Label label()
Specified by: label in interface Labeled
public Collection labels()
Specified by: labels in interface Labeled
public void setLabel(Label label)
Specified by: setLabel in interface Labeled
public void setLabels(Collection labels)
Specified by: setLabels in interface Labeled
public void addLabel(Label label)
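The label methods summarized on this page (label, labels, setLabel, setLabels, addLabel) can be sketched as follows. LabelBag is a hypothetical stand-in that uses a type parameter in place of the library's Label type; skipping null elements inside setLabels is an assumption, since the documentation only states the null rule for addLabel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the label handling described in the method summary.
class LabelBag<T> {
    private final List<T> labels = new ArrayList<>();

    // First label, or null if none have been set.
    T label() {
        return labels.isEmpty() ? null : labels.get(0);
    }

    // The complete list of labels.
    Collection<T> labels() {
        return labels;
    }

    // Removes all current labels, then adds the given one.
    void setLabel(T label) {
        labels.clear();
        addLabel(label);
    }

    // Removes all current labels, then adds each of the given ones.
    void setLabels(Collection<? extends T> newLabels) {
        labels.clear();
        if (newLabels != null) {
            for (T label : newLabels) {
                addLabel(label);   // assumption: null elements are skipped here too
            }
        }
    }

    // Adds the label only if it is not null.
    void addLabel(T label) {
        if (label != null) {
            labels.add(label);
        }
    }
}

class LabelBagDemo {
    public static void main(String[] args) {
        LabelBag<String> bag = new LabelBag<>();
        bag.addLabel("spam");
        bag.addLabel(null);                    // ignored
        bag.addLabel("urgent");
        System.out.println(bag.labels());      // prints [spam, urgent]
        bag.setLabel("ham");
        System.out.println(bag.label());       // prints ham
        bag.setLabels(Arrays.asList("a", "b"));
        System.out.println(bag.labels());      // prints [a, b]
    }
}
```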
public String title()
Specified by: title in interface Document
public void setTitle(String title)
public String originalText()
public String presentableText()
Returns a "pretty" version of the words in this Document suitable for display.
The default implementation returns the words in this Document separated
by spaces. Specifically, each element that is a Word has its Word.word()
printed, and other elements are skipped.
Subclasses that maintain additional information may wish to override this method.
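The default behavior described above (join the words with spaces, skip anything that is not a Word) might look roughly like the sketch below. SketchWord stands in for the library's Word class and PresentableText is a hypothetical helper; neither is part of the documented API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the library's Word class.
class SketchWord {
    private final String word;

    SketchWord(String word) {
        this.word = word;
    }

    String word() {
        return word;
    }
}

class PresentableText {
    // Joins the word() of each SketchWord with single spaces; other elements are skipped.
    static String presentableText(List<?> elements) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (Object element : elements) {
            if (element instanceof SketchWord) {
                if (!first) {
                    sb.append(' ');
                }
                sb.append(((SketchWord) element).word());
                first = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Object> elements = Arrays.asList(new SketchWord("Hello"), 42, new SketchWord("world"));
        System.out.println(presentableText(elements));   // prints Hello world
    }
}
```

A subclass that tracks extra per-word information (tags, offsets) would override this method to include that information in the display string.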
public static void main(String[] args)