PennSentenceNormalizer (Stanford JavaNLP API)

edu.stanford.nlp.trees
Class PennSentenceNormalizer

java.lang.Object
  |
  +--edu.stanford.nlp.trees.SentenceNormalizer
        |
        +--edu.stanford.nlp.trees.PennSentenceNormalizer

Direct Known Subclasses: PennSentenceMrgNormalizer

public class PennSentenceNormalizer
extends SentenceNormalizer

A class for sentence normalization of the Penn tag directory. It knows about the oddities of Penn Treebank pos files, such as the many equals signs and square brackets. It also interns strings. A Singleton.

Constructor Summary

PennSentenceNormalizer()
          Constructs a PennSentenceNormalizer object.

PennSentenceNormalizer(boolean divideOffTags, char tagDivider)
          Constructs a PennSentenceNormalizer object.

PennSentenceNormalizer(boolean divideOffTags, char tagDivider, boolean unescape, char escapeChar)
          Constructs a PennSentenceNormalizer object.

Method Summary

boolean endSentenceToken(String token, String prev, String next)
          Returns true if this token represents the end of a sentence.

Sentence normalizeSentence(Sentence sent, LabelFactory lf)
          Normalizes a sentence. This method assumes that the argument it is passed is the whole (linguistic) Sentence.

String normalizeString(String word)
          Normalizes a word string as read (and may intern it).

Methods inherited from class edu.stanford.nlp.trees.SentenceNormalizer

eolIsSentenceEnd

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

PennSentenceNormalizer

public PennSentenceNormalizer()

Constructs a PennSentenceNormalizer object.

PennSentenceNormalizer

public PennSentenceNormalizer(boolean divideOffTags,
                              char tagDivider)

Constructs a PennSentenceNormalizer object.
Parameters:
    divideOffTags - true iff an unescaped tagDivider and all characters to the right of it should be cut off from words
    tagDivider - The character that separates words from their tags

PennSentenceNormalizer

public PennSentenceNormalizer(boolean divideOffTags,
                              char tagDivider,
                              boolean unescape,
                              char escapeChar)

Constructs a PennSentenceNormalizer object.
Parameters:
    divideOffTags - true iff an unescaped tagDivider and all characters to the right of it should be cut off from words
    tagDivider - The character that separates words from their tags
    unescape - true if words should be unescaped (at present this is not implemented)
    escapeChar - The character used to escape a following character
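
The effect of divideOffTags can be illustrated with a minimal sketch. This is not the class's actual implementation: the helper name `divideOffTag` and the class `TagDividerSketch` are hypothetical, and the sketch assumes that the cut happens at the first unescaped tagDivider and that the escape character itself is left in place (since unescaping is documented as not implemented).

```java
/** Hypothetical illustration only; not part of the Stanford JavaNLP API. */
final class TagDividerSketch {

    private TagDividerSketch() {}

    /**
     * Cuts off the first unescaped tagDivider and every character to its right.
     * A character that directly follows escapeChar is treated as escaped and
     * is never considered a divider. Escape characters are kept in the result.
     */
    static String divideOffTag(String word, char tagDivider, char escapeChar) {
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c == escapeChar) {
                i++; // skip the escaped character, whatever it is
            } else if (c == tagDivider) {
                return word.substring(0, i); // drop the divider and the tag
            }
        }
        return word; // no unescaped divider: the word is returned unchanged
    }
}
```

Under these assumptions, `divideOffTag("dog/NN", '/', '\\')` yields `dog`, while a token whose first slash is escaped, such as `1\/2/CD`, is cut only at the second slash.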

Method Detail

normalizeString

public String normalizeString(String word)

Description copied from class: SentenceNormalizer

Normalizes a word string as read (and may intern it).

Overrides: normalizeString in class SentenceNormalizer

Parameters:
    word - The word to normalize
Returns:
    The normalized form

normalizeSentence

public Sentence normalizeSentence(Sentence sent,
                                  LabelFactory lf)

Normalizes a sentence. This method assumes that the argument it is passed is the whole (linguistic) Sentence. It is normally implemented as a List-walking routine.

Overrides: normalizeSentence in class SentenceNormalizer

Parameters:
    sent - The sentence to be normalized
    lf - the LabelFactory to create new Labels (if needed)
Returns:
    the normalized sentence
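
A List-walking routine of the kind described above might look like the following sketch. It deliberately avoids the real Sentence and LabelFactory types: a plain `List<String>` stands in for the sentence, and the per-word normalizer is passed in as a function. The class name `ListWalkSketch` and the convention that a deleted word normalizes to the empty string are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical illustration only; not part of the Stanford JavaNLP API. */
final class ListWalkSketch {

    private ListWalkSketch() {}

    /**
     * Walks the sentence once, normalizes each word, and keeps the words
     * that survive. Assumption: a word that normalization deletes comes
     * back as the empty string.
     */
    static List<String> normalizeSentence(List<String> sent,
                                          UnaryOperator<String> normalizeString) {
        List<String> out = new ArrayList<>(sent.size());
        for (String word : sent) {
            String norm = normalizeString.apply(word);
            if (!norm.isEmpty()) {
                out.add(norm.intern()); // interning mirrors the class description
            }
        }
        return out;
    }
}
```

The input list is left untouched and a new list is returned, which keeps the sketch free of side effects; whether the real method mutates its argument is not stated here.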

endSentenceToken

public boolean endSentenceToken(String token,
                                String prev,
                                String next)

Returns true if this token represents the end of a sentence. This method arguably belongs in another class, but it was placed here because other source-specific handling lives here. It is called on the token as read, _prior_ to normalization. This is more useful, because it can detect tokens that are deleted during the normalization process.

Overrides: endSentenceToken in class SentenceNormalizer

Parameters:
    token - The String to be checked
    prev - The previous token
    next - The next token (lookahead)
Returns:
    True if this token is a sentence end
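
The reason for testing the token before normalization can be shown with a small sketch. Everything in it is hypothetical: it assumes that a run of two or more equals signs marks a sentence boundary and that normalization deletes such a token by returning the empty string. The actual rules used by this class are not documented here, and the prev and next arguments are accepted but unused in the sketch.

```java
/** Hypothetical illustration only; not part of the Stanford JavaNLP API. */
final class SentenceEndSketch {

    private SentenceEndSketch() {}

    /** Assumption: a separator is a token made of two or more '=' characters. */
    static boolean isSeparator(String token) {
        if (token.length() < 2) {
            return false;
        }
        for (int i = 0; i < token.length(); i++) {
            if (token.charAt(i) != '=') {
                return false;
            }
        }
        return true;
    }

    /** Assumption: normalization deletes separators by returning "". */
    static String normalizeString(String word) {
        return isSeparator(word) ? "" : word;
    }

    /** Checks the raw token; prev and next are unused in this sketch. */
    static boolean endSentenceToken(String token, String prev, String next) {
        return isSeparator(token);
    }
}
```

With these assumptions, calling `endSentenceToken` on the raw token `=====` reports a sentence end, whereas calling it on the normalized form (the empty string) does not, so the boundary would be lost if the check ran after normalization.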

Stanford NLP Group