java.lang.Object
  |
  +--edu.stanford.nlp.trees.SentenceNormalizer
        |
        +--edu.stanford.nlp.trees.PennSentenceNormalizer
A class for sentence normalization of the Penn tag directory. It knows about the oddities of Penn Treebank POS files, such as long runs of equals signs and square brackets. It also interns strings. A Singleton.
Constructor Summary

PennSentenceNormalizer()
    Constructs a PennSentenceNormalizer object.
PennSentenceNormalizer(boolean divideOffTags, char tagDivider)
    Constructs a PennSentenceNormalizer object.
PennSentenceNormalizer(boolean divideOffTags, char tagDivider, boolean unescape, char escapeChar)
    Constructs a PennSentenceNormalizer object.
Method Summary

boolean endSentenceToken(String token, String prev, String next)
    Returns true if this token represents the end of a sentence.
Sentence normalizeSentence(Sentence sent, LabelFactory lf)
    Normalizes a sentence -- this method assumes that the argument it is passed is the whole (linguistic) Sentence.
String normalizeString(String word)
    Normalizes a word string that has been read in (and possibly interns it).

Methods inherited from class edu.stanford.nlp.trees.SentenceNormalizer
    eolIsSentenceEnd

Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public PennSentenceNormalizer()

public PennSentenceNormalizer(boolean divideOffTags, char tagDivider)
    Parameters:
        divideOffTags - true iff an unescaped tagDivider and all characters to the right of it should be cut off from words
        tagDivider - The character that separates words from their tags

public PennSentenceNormalizer(boolean divideOffTags, char tagDivider, boolean unescape, char escapeChar)
    Parameters:
        divideOffTags - true iff an unescaped tagDivider and all characters to the right of it should be cut off from words
        tagDivider - The character that separates words from their tags
        unescape - true if words should be unescaped, but at present this isn't implemented
        escapeChar - The character used to escape a following character

Method Detail
public String normalizeString(String word)
    Normalizes a word string that has been read in (and possibly interns it).
    Overrides:
        normalizeString in class SentenceNormalizer
    Parameters:
        word - The word to normalize
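
The parameter descriptions above (divideOffTags, tagDivider, escapeChar) together with the note that strings are interned suggest roughly the following behavior. This is a minimal, self-contained sketch and not the library's source: the class name TagDividerSketch is hypothetical, and cutting at the first unescaped divider is an assumption, since the documentation does not say which occurrence is used. As in the documented class, nothing is unescaped.

```java
// Hypothetical sketch; not the Stanford NLP implementation.
final class TagDividerSketch {
    private final boolean divideOffTags;
    private final char tagDivider;
    private final char escapeChar;

    TagDividerSketch(boolean divideOffTags, char tagDivider, char escapeChar) {
        this.divideOffTags = divideOffTags;
        this.tagDivider = tagDivider;
        this.escapeChar = escapeChar;
    }

    /** Cuts the word at the first unescaped tagDivider (if enabled), then interns it. */
    String normalizeString(String word) {
        if (divideOffTags) {
            for (int i = 0; i < word.length(); i++) {
                char c = word.charAt(i);
                if (c == escapeChar) {
                    i++; // skip the escaped character, so an escaped divider is kept
                } else if (c == tagDivider) {
                    word = word.substring(0, i); // drop the divider and everything after it
                    break;
                }
            }
        }
        // Interning means equal words share one String instance.
        return word.intern();
    }
}
```

With '/' as the divider and '\' as the escape character, "dog/NN" becomes "dog", while the escaped divider in "1\/2/CD" is left in place and only the trailing "/CD" is removed.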
public Sentence normalizeSentence(Sentence sent, LabelFactory lf)
    Normalizes a sentence -- this method assumes that the argument it is passed is the whole (linguistic) Sentence. It is normally implemented as a List-walking routine.
    Overrides:
        normalizeSentence in class SentenceNormalizer
    Parameters:
        sent - The sentence to be normalized
        lf - The LabelFactory to create new Labels (if needed)
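
A "List-walking routine" of this kind can be pictured as a single pass over the sentence's tokens. The sketch below is self-contained and uses a plain List<String> in place of the library's Sentence and LabelFactory types; the class name ListWalkSketch is hypothetical, and the rule that bracket tokens and runs of equals signs are dropped is an assumption based on the class description, not confirmed behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; List<String> stands in for the library's Sentence type.
final class ListWalkSketch {
    /** Walks the token list once, skipping Penn POS-file markup tokens. */
    static List<String> normalizeSentence(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String tok : tokens) {
            // Assumed markup: chunk brackets and separator rows of equals signs.
            if (tok.equals("[") || tok.equals("]") || tok.matches("=+")) {
                continue;
            }
            out.add(tok.intern());
        }
        return out;
    }
}
```

For the token list ["======", "[", "The/DT", "dog/NN", "]", "barks/VBZ", "./."], this returns the four word tokens and discards the separator row and brackets.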
public boolean endSentenceToken(String token, String prev, String next)
    Returns true if this token represents the end of a sentence.
    Overrides:
        endSentenceToken in class SentenceNormalizer
    Parameters:
        token - The String to be checked
        prev - The previous token
        next - The next token (lookahead)
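
The documentation does not say which tokens count as sentence-final or how prev and next are used. Purely as an illustration of why a lookahead argument can be useful, the following sketch applies a made-up rule: terminal punctuation ends a sentence unless a closing quote follows. The class name EndTokenSketch and the rule itself are assumptions, not the library's behavior.

```java
// Hypothetical sketch; the rule below is illustrative, not the library's.
final class EndTokenSketch {
    /** Returns true for ".", "?" or "!" unless the next token is a closing quote. */
    static boolean endSentenceToken(String token, String prev, String next) {
        boolean terminal = token.equals(".") || token.equals("?") || token.equals("!");
        if (!terminal) {
            return false;
        }
        // Lookahead: a closing quote belongs with the sentence it closes,
        // so the sentence is not finished yet. prev is unused in this sketch.
        return next == null || !(next.equals("''") || next.equals("\""));
    }
}
```

Here next may be null at the end of input, in which case terminal punctuation is treated as ending the sentence.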