edu.stanford.nlp.ie.hmm
Class TypedTaggedDocument
java.lang.Object
  |
  +--java.util.AbstractCollection
        |
        +--java.util.AbstractList
              |
              +--java.util.ArrayList
                    |
                    +--edu.stanford.nlp.dbm.BasicDocument
                          |
                          +--edu.stanford.nlp.ie.hmm.TypedTaggedDocument
- All Implemented Interfaces:
- Cloneable, Collection, Datum, Document, Featurizable, Labeled, List, RandomAccess, Serializable
- public class TypedTaggedDocument
- extends BasicDocument
Document whose words are TypedTaggedWord objects. When reading in text, all word types are assumed to be 0 (i.e., the background state).
- See Also:
getTypeSequence(), Serialized Form
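Because the class extends java.util.ArrayList (through BasicDocument), a document is itself a list of its words, and each word read from plain text starts with type 0. The sketch below models that relationship with stand-in classes; `SketchTypedWord`, `SketchTypedDocument`, and their fields are hypothetical names chosen for illustration, not the Stanford API.

```java
import java.util.ArrayList;

// Hypothetical stand-in for a typed, tagged word: the word text, an optional
// tag, and an integer state type (0 = background state).
class SketchTypedWord {
    final String word;
    final String tag;
    final int type;

    SketchTypedWord(String word, String tag, int type) {
        this.word = word;
        this.tag = tag;
        this.type = type;
    }

    // Words read from plain text carry no type information, so they default to 0.
    SketchTypedWord(String word) {
        this(word, null, 0);
    }
}

// Hypothetical stand-in for the document: like the real class, it is itself a
// List (here by extending ArrayList directly), so words are read with get(i) and size().
class SketchTypedDocument extends ArrayList<SketchTypedWord> {
}
```

Since the document is a list, ordinary List operations such as add, get, and size apply to its words directly.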
Method Summary
- int[] getTypeSequence()
  Returns an array representing the type of each word in this Document.
- protected void parse(String s)
  Tokenizes the given text to populate the list of Words this Document represents.
Methods inherited from class edu.stanford.nlp.dbm.BasicDocument
addLabel, asFeatures, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, label, labels, main, originalText, presentableText, setLabel, setLabels, setTitle, title
Methods inherited from class java.util.ArrayList
add, add, addAll, addAll, clear, clone, contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, removeRange, set, size, toArray, toArray, trimToSize
Methods inherited from interface java.util.List
add, add, addAll, addAll, clear, contains, containsAll, equals, get, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, retainAll, set, size, subList, toArray, toArray
TypedTaggedDocument
public TypedTaggedDocument()
parse
protected void parse(String s)
- Description copied from class: BasicDocument
- Tokenizes the given text to populate the list of Words this Document represents. The default implementation uses a SimpleTokenizer and tokenizes the entirety of the text into words. Subclasses should override this method to parse documents in non-standard formats and/or to pull the title of the document from the text. The given text may be empty ("") but will never be null.
- Overrides:
parse in class BasicDocument
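A subclass override in the spirit of this description might look like the following self-contained sketch. `SketchBaseDocument`, `SketchHeadlineDocument`, and the "first line is the title" convention are hypothetical, and splitting on whitespace is an assumed stand-in for SimpleTokenizer, whose exact rules are not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class mirroring the documented contract of parse(String):
// populate the word list from the text; the text may be "" but never null.
class SketchBaseDocument {
    protected final List<String> words = new ArrayList<>();
    protected String title = null;

    // Default behaviour: tokenize the whole text into words. Whitespace
    // splitting is an assumed stand-in for the real SimpleTokenizer.
    protected void parse(String s) {
        for (String token : s.trim().split("\\s+")) {
            if (!token.isEmpty()) { // "".split(...) yields one empty token
                words.add(token);
            }
        }
    }

    List<String> words() { return words; }
    String title() { return title; }
}

// Hypothetical subclass for a non-standard format: if the text has more than
// one line, the first line is taken as the title and only the rest is tokenized.
class SketchHeadlineDocument extends SketchBaseDocument {
    @Override
    protected void parse(String s) {
        String body = s;
        int newline = s.indexOf('\n');
        if (newline >= 0) {
            title = s.substring(0, newline).trim();
            body = s.substring(newline + 1);
        }
        super.parse(body); // reuse the default tokenization for the body
    }
}
```

Delegating to super.parse keeps the tokenization rule in one place, so the subclass only decides which part of the text counts as the title.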
getTypeSequence
public int[] getTypeSequence()
- Returns an array representing the type of each word in this Document. The i-th element in the returned array is the type of the i-th word if that word implements HasType (as it will if the document was constructed normally), or 0 (i.e., the background state) otherwise.
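The rule above amounts to an instanceof check with 0 as the fallback. The sketch below shows that logic in isolation; `HasTypeSketch`, its `type()` method, `TypedWordSketch`, and `TypeSequenceSketch` are hypothetical stand-ins, since the real HasType interface's method names are not given here.

```java
import java.util.List;

// Hypothetical stand-in for the HasType interface: a word that knows its state type.
interface HasTypeSketch {
    int type();
}

// A word that carries a type, as words in a normally constructed document do.
class TypedWordSketch implements HasTypeSketch {
    private final String word;
    private final int type;

    TypedWordSketch(String word, int type) {
        this.word = word;
        this.type = type;
    }

    @Override
    public int type() { return type; }

    @Override
    public String toString() { return word; }
}

final class TypeSequenceSketch {
    private TypeSequenceSketch() {}

    // Element i is the type of word i if it implements HasTypeSketch,
    // otherwise 0 (the background state).
    static int[] getTypeSequence(List<?> words) {
        int[] types = new int[words.size()];
        for (int i = 0; i < types.length; i++) {
            Object w = words.get(i);
            types[i] = (w instanceof HasTypeSketch) ? ((HasTypeSketch) w).type() : 0;
        }
        return types;
    }
}
```

Falling back to 0 rather than throwing means a document containing untyped words still yields a full-length sequence, with those positions treated as background.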
Stanford NLP Group