java.lang.Object
  |
  +--edu.stanford.nlp.trees.SentenceNormalizer
        |
        +--edu.stanford.nlp.trees.PennSentenceNormalizer
A class for normalizing sentences from the Penn tag directory. This one knows about the quirks of Penn Treebank POS files, such as long runs of equals signs and square brackets. It also interns strings. A Singleton.
| Constructor Summary |
PennSentenceNormalizer()
    Constructs a PennSentenceNormalizer object.
PennSentenceNormalizer(boolean divideOffTags, char tagDivider)
    Constructs a PennSentenceNormalizer object.
PennSentenceNormalizer(boolean divideOffTags, char tagDivider, boolean unescape, char escapeChar)
    Constructs a PennSentenceNormalizer object.
| Method Summary |
boolean endSentenceToken(String token, String prev, String next)
    Returns true if this token represents the end of a sentence.
Sentence normalizeSentence(Sentence sent, LabelFactory lf)
    Normalizes a sentence. This method assumes that the argument it is passed is the whole (linguistic) Sentence.
String normalizeString(String word)
    Normalizes a word string that has been read in (and possibly interns it).
| Methods inherited from class edu.stanford.nlp.trees.SentenceNormalizer |
eolIsSentenceEnd
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
| Constructor Detail |
public PennSentenceNormalizer()
public PennSentenceNormalizer(boolean divideOffTags,
                              char tagDivider)
Parameters:
    divideOffTags - true iff an unescaped tagDivider and all characters to the right of it should be cut off from words
    tagDivider - The character that separates words from their tags
public PennSentenceNormalizer(boolean divideOffTags,
                              char tagDivider,
                              boolean unescape,
                              char escapeChar)
Parameters:
    divideOffTags - true iff an unescaped tagDivider and all characters to the right of it should be cut off from words
    tagDivider - The character that separates words from their tags
    unescape - true if words should be unescaped, but at present this isn't implemented
    escapeChar - The character used to escape a following character
| Method Detail |
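public String normalizeString(String word)
Description copied from class: SentenceNormalizer
    Normalizes a word string that has been read in (and possibly interns it).
Overrides:
    normalizeString in class SentenceNormalizer
Parameters:
    word - The word to normalize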
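The constructor parameters describe cutting a word off at an unescaped tagDivider, and the class description mentions interning. The following is a minimal sketch of that idea, not the library's implementation: the class TagDividerSketch and its methods firstUnescaped and normalize are hypothetical names. It assumes the cut happens at the first unescaped divider, and it leaves escape characters in place, since unescaping is documented as unimplemented.

```java
// Hypothetical illustration only; not part of edu.stanford.nlp.trees.
final class TagDividerSketch {

    /** Returns the index of the first unescaped divider in word, or -1 if there is none. */
    static int firstUnescaped(String word, char divider, char escapeChar) {
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c == escapeChar) {
                i++; // skip the character that the escape protects
            } else if (c == divider) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Optionally cuts off the tag (assumption: at the first unescaped divider),
     * then interns the result so equal words share one String instance.
     */
    static String normalize(String word, boolean divideOffTags,
                            char divider, char escapeChar) {
        String result = word;
        if (divideOffTags) {
            int cut = firstUnescaped(word, divider, escapeChar);
            if (cut >= 0) {
                result = word.substring(0, cut);
            }
        }
        return result.intern();
    }

    public static void main(String[] args) {
        System.out.println(normalize("dog/NN", true, '/', '\\'));   // prints dog
        System.out.println(normalize("1\\/2/CD", true, '/', '\\')); // prints 1\/2
    }
}
```

In the second call the escaped slash inside the word is skipped, so only the later, unescaped slash is treated as the tag boundary.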
public Sentence normalizeSentence(Sentence sent,
                                  LabelFactory lf)
    Normalizes a sentence. This method assumes that the argument it is passed is the whole (linguistic) Sentence. It is normally implemented as a List-walking routine.
Overrides:
    normalizeSentence in class SentenceNormalizer
Parameters:
    sent - The sentence to be normalized
    lf - The LabelFactory to create new Labels (if needed)
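A List-walking normalization of this kind might look roughly like the sketch below. It uses a plain List<String> in place of Sentence and omits the LabelFactory. The class SentenceWalkSketch and the helper isLayoutToken are hypothetical. Treating lone square brackets and runs of equals signs as layout tokens to be dropped is an assumption based on the class description, not confirmed behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of edu.stanford.nlp.trees.
final class SentenceWalkSketch {

    /** Assumption: a lone bracket or a run of '=' is file layout, not a word. */
    static boolean isLayoutToken(String token) {
        if (token.equals("[") || token.equals("]")) {
            return true;
        }
        if (token.isEmpty()) {
            return false;
        }
        for (int i = 0; i < token.length(); i++) {
            if (token.charAt(i) != '=') {
                return false;
            }
        }
        return true;
    }

    /** Walks the list once, dropping layout tokens and interning the rest. */
    static List<String> normalizeSentence(List<String> sent) {
        List<String> out = new ArrayList<>(sent.size());
        for (String token : sent) {
            if (isLayoutToken(token)) {
                continue;
            }
            out.add(token.intern());
        }
        return out;
    }
}
```

The input list is left unmodified; the sketch returns a new list so callers can keep the raw tokens if they need them.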
public boolean endSentenceToken(String token,
                                String prev,
                                String next)
    Returns true if this token represents the end of a sentence.
Overrides:
    endSentenceToken in class SentenceNormalizer
Parameters:
    token - The String to be checked
    prev - The previous token
    next - The next token (lookahead)
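To show how a lookahead token can affect an end-of-sentence decision, here is a small sketch with the same parameter shape. The rule it applies is an assumption for illustration and is not taken from the library: sentence-final punctuation ends the sentence unless the next token is a closing quote. The class EndSentenceSketch is hypothetical, and prev is accepted only to mirror the documented signature.

```java
// Hypothetical illustration only; not part of edu.stanford.nlp.trees.
final class EndSentenceSketch {

    /**
     * Assumed rule: ".", "?" or "!" ends a sentence, unless the following
     * token is a closing quote (''), which still belongs to the sentence.
     * The prev argument is unused in this sketch.
     */
    static boolean endSentenceToken(String token, String prev, String next) {
        boolean terminal = token.equals(".") || token.equals("?") || token.equals("!");
        if (!terminal) {
            return false;
        }
        // A null lookahead means end of input, so the sentence ends here.
        return next == null || !next.equals("''");
    }
}
```

With this rule, a period followed by '' is not reported as the end; the closing quote itself would need its own handling, which the sketch leaves out.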