TypedTaggedDocument (Stanford JavaNLP API)

edu.stanford.nlp.ie.hmm
Class TypedTaggedDocument

java.lang.Object
  |
  +--java.util.AbstractCollection
        |
        +--java.util.AbstractList
              |
              +--java.util.ArrayList
                    |
                    +--edu.stanford.nlp.dbm.BasicDocument
                          |
                          +--edu.stanford.nlp.ie.hmm.TypedTaggedDocument

All Implemented Interfaces: Cloneable, Collection, Datum, Document, Featurizable, Labeled, List, RandomAccess, Serializable

public class TypedTaggedDocument
extends BasicDocument

Document whose words are TypedTaggedWord objects. When reading in text, all word types are assumed to be 0 (i.e. background state).

See Also: getTypeSequence(), Serialized Form

Field Summary

Fields inherited from class edu.stanford.nlp.dbm.BasicDocument

labels, originalText, title

Fields inherited from class java.util.AbstractList

modCount

Constructor Summary

TypedTaggedDocument()


Method Summary

int[] getTypeSequence()
          Returns an array representing the type of each word in this Document.

protected void parse(String s)
          Tokenizes the given text to populate the list of Words this Document represents.

Methods inherited from class edu.stanford.nlp.dbm.BasicDocument

addLabel, asFeatures, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, init, label, labels, main, originalText, presentableText, setLabel, setLabels, setTitle, title

Methods inherited from class java.util.ArrayList

add, add, addAll, addAll, clear, clone, contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, removeRange, set, size, toArray, toArray, trimToSize

Methods inherited from class java.util.AbstractList

equals, hashCode, iterator, listIterator, listIterator, subList

Methods inherited from class java.util.AbstractCollection

containsAll, remove, removeAll, retainAll, toString

Methods inherited from class java.lang.Object

finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface java.util.List

add, add, addAll, addAll, clear, contains, containsAll, equals, get, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, retainAll, set, size, subList, toArray, toArray

Constructor Detail

TypedTaggedDocument

public TypedTaggedDocument()

Method Detail

parse

protected void parse(String s)

Description copied from class: BasicDocument

Tokenizes the given text to populate the list of Words this Document represents. The default implementation uses a SimpleTokenizer and tokenizes the entirety of the text into words. Subclasses should override this method to parse documents in non-standard formats and/or to pull the title of the document from the text. The given text may be empty ("") but will never be null.

Overrides: parse in class BasicDocument
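
The override pattern described for parse can be illustrated with a small standalone sketch. It does not use the real BasicDocument or SimpleTokenizer: PlainDocument, TitledDocument, the init method shown here, and the "first line is the title" format are all hypothetical stand-ins, and whitespace splitting is only an assumed substitute for the library's tokenizer.

```java
import java.util.ArrayList;

public class ParseOverrideSketch {
    /** Hypothetical stand-in for a list-backed document; NOT the real BasicDocument. */
    public static class PlainDocument extends ArrayList<String> {
        protected String title = "";

        public String title() {
            return title;
        }

        /** Rebuilds the document from text; null is normalized to "" so parse never sees null. */
        public PlainDocument init(String text) {
            clear();
            parse(text == null ? "" : text);
            return this;
        }

        /** Default behavior (assumed): split the entire text on whitespace. */
        protected void parse(String s) {
            for (String token : s.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    add(token);
                }
            }
        }
    }

    /** Subclass for a made-up format in which the first line holds the title. */
    public static class TitledDocument extends PlainDocument {
        @Override
        protected void parse(String s) {
            int newline = s.indexOf('\n');
            if (newline < 0) {
                // No title line present: fall back to the default tokenization.
                super.parse(s);
                return;
            }
            title = s.substring(0, newline).trim();
            super.parse(s.substring(newline + 1));
        }
    }

    public static void main(String[] args) {
        PlainDocument doc = new TitledDocument().init("My Title\nhello brave new world");
        System.out.println(doc.title() + " -> " + doc);
    }
}
```

In this sketch the subclass only changes how the text is split into title and body, and delegates the actual tokenization back to the superclass.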

getTypeSequence

public int[] getTypeSequence()

Returns an array representing the type of each word in this Document. The ith element in the returned array is the type of the ith word if it implements HasType (as it should if you've constructed this normally), or 0 (i.e. background state) otherwise.
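
The fallback rule described above (use the word's type if it implements HasType, otherwise 0) can be sketched roughly as follows. This is a standalone illustration, not the library's implementation: the HasType interface shown here, its type() accessor, and the TypedWord class are hypothetical stand-ins whose names and signatures are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TypeSequenceSketch {
    /** Hypothetical stand-in for the library's HasType interface; the accessor name is assumed. */
    public interface HasType {
        int type();
    }

    /** Hypothetical stand-in for a typed word; not the real TypedTaggedWord. */
    public static final class TypedWord implements HasType {
        private final String word;
        private final int type;

        public TypedWord(String word, int type) {
            this.word = word;
            this.type = type;
        }

        @Override
        public int type() {
            return type;
        }

        @Override
        public String toString() {
            return word + "/" + type;
        }
    }

    /** The ith result is the type of the ith element, or 0 (background) if it has no type. */
    public static int[] typeSequence(List<?> words) {
        int[] types = new int[words.size()];
        for (int i = 0; i < words.size(); i++) {
            Object w = words.get(i);
            types[i] = (w instanceof HasType) ? ((HasType) w).type() : 0;
        }
        return types;
    }

    public static void main(String[] args) {
        List<Object> doc = new ArrayList<>();
        doc.add(new TypedWord("Stanford", 1));
        doc.add("in"); // plain String with no type information, so it maps to 0
        doc.add(new TypedWord("California", 2));
        System.out.println(Arrays.toString(typeSequence(doc))); // prints [1, 0, 2]
    }
}
```

Because untyped elements map to 0, the returned array always has the same length as the document, which keeps it aligned position by position with the word list.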