HTMLParser (Stanford JavaNLP API)


edu.stanford.nlp.web
Class HTMLParser

java.lang.Object
  |
  +--javax.swing.text.html.HTMLEditorKit.ParserCallback
        |
        +--edu.stanford.nlp.web.HTMLParser

Direct Known Subclasses: LocusLinkParser, USPDIParser

public class HTMLParser
extends HTMLEditorKit.ParserCallback

Parses an HTML document and returns its plain text (and title). The main entry point is the overloaded parse method (parse(URL url), parse(String text), or parse(Reader r)), which returns a String with the contents of an HTML page, without the tags. After calling parse, you can get the HTML title (the contents of the TITLE tag) by calling title().

Subclasses may override handleText(), handleComment(), handleStartTag(), and the other callback methods so that parse returns something other than the text of the web page, for example only part of the text, or only the links.
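The link-only case mentioned above can be sketched with the JDK callback API alone. This is a minimal illustration, not code from the Stanford library: the class name LinkCollector and its parse(Reader) return type are hypothetical, and the class extends HTMLEditorKit.ParserCallback directly (rather than HTMLParser) so that it compiles without the Stanford jar on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical class: collects the href of every <a> start tag.
class LinkCollector extends HTMLEditorKit.ParserCallback {
    private final List<String> links = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrSet, int pos) {
        if (tag == HTML.Tag.A) {
            Object href = attrSet.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
    }

    // Runs the JDK HTML parser over the reader and returns a copy of the links found.
    List<String> parse(Reader r) throws IOException {
        links.clear();
        // 'true' tells the parser to ignore any charset declared inside the document.
        new ParserDelegator().parse(r, this, true);
        return new ArrayList<>(links);
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><a href=\"http://example.com/a\">A</a></body></html>";
        System.out.println(new LinkCollector().parse(new StringReader(html)));
    }
}
```

A subclass of HTMLParser itself would presumably override the same handleStartTag method and keep the inherited parse overloads, but the details of how its parse method assembles the returned String are not documented on this page.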

Field Summary

protected StringBuffer textBuffer


protected String title


Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback

IMPLIED

Constructor Summary

HTMLParser()


Method Summary

void handleEndTag(HTML.Tag tag, int pos)
          Sets a flag if the end tag is the "TITLE" element end tag.

void handleStartTag(HTML.Tag tag, MutableAttributeSet attrSet, int pos)
          Sets a flag if the start tag is the "TITLE" element start tag.

void handleText(char[] data, int pos)


String parse(Reader r)


String parse(String text)


String parse(URL url)


String title()


Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback

flush, handleComment, handleEndOfLineString, handleError, handleSimpleTag

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

textBuffer

protected StringBuffer textBuffer

title

protected String title

Constructor Detail

HTMLParser

public HTMLParser()

Method Detail

handleText

public void handleText(char[] data,
                       int pos)

Overrides: handleText in class HTMLEditorKit.ParserCallback

handleStartTag

public void handleStartTag(HTML.Tag tag,
                           MutableAttributeSet attrSet,
                           int pos)

Sets a flag if the start tag is the "TITLE" element start tag.

Overrides: handleStartTag in class HTMLEditorKit.ParserCallback

handleEndTag

public void handleEndTag(HTML.Tag tag,
                         int pos)

Sets a flag if the end tag is the "TITLE" element end tag.

Overrides: handleEndTag in class HTMLEditorKit.ParserCallback
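How the three handlers above could cooperate is sketched below. This is an assumption about the mechanism, not the actual Stanford source: the class name TitleFlagSketch and the flag name inTitle are hypothetical, while textBuffer, title, title(), and the handler signatures mirror the members documented on this page. The start and end handlers toggle a flag, and handleText routes character data either to the title or to the text buffer depending on that flag.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical stand-in for a title-aware text extractor.
class TitleFlagSketch extends HTMLEditorKit.ParserCallback {
    protected StringBuffer textBuffer = new StringBuffer();
    protected String title = "";
    private boolean inTitle = false; // hypothetical flag name

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrSet, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = true;
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = false;
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (inTitle) {
            title = new String(data);           // text inside <title>...</title>
        } else {
            textBuffer.append(data).append(' '); // all other text, space separated
        }
    }

    // Resets state, runs the JDK parser, and returns the collected non-title text.
    public String parse(Reader r) throws IOException {
        textBuffer = new StringBuffer();
        title = "";
        inTitle = false;
        new ParserDelegator().parse(r, this, true);
        return textBuffer.toString().trim();
    }

    public String title() {
        return title;
    }

    public static void main(String[] args) throws IOException {
        TitleFlagSketch p = new TitleFlagSketch();
        String text = p.parse(new StringReader(
            "<html><head><title>My Page</title></head><body><p>Hello world</p></body></html>"));
        System.out.println("title: " + p.title());
        System.out.println("text:  " + text);
    }
}
```

Because the title is only filled in while parsing, title() is meaningful only after a call to parse, which matches the order of calls described in the class overview.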

parse

public String parse(URL url)
             throws IOException

Throws: IOException

parse

public String parse(String text)
             throws IOException

Throws: IOException

parse

public String parse(Reader r)
             throws IOException

Throws: IOException
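This page does not document how the three parse overloads relate to one another. One plausible arrangement, shown here purely as an assumption, is for the String and URL variants to wrap their input in a Reader and delegate to parse(Reader). The class name ParseOverloadsSketch is hypothetical, the String argument is assumed to hold HTML markup (following the parameter name text), and UTF-8 is an assumed encoding for URL content.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical class showing overloads that funnel into a single Reader-based parse.
class ParseOverloadsSketch extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void handleText(char[] data, int pos) {
        text.append(data).append(' ');
    }

    // The overload that does the actual work.
    public String parse(Reader r) throws IOException {
        text.setLength(0);
        new ParserDelegator().parse(r, this, true);
        return text.toString().trim();
    }

    // Assumption: the String holds HTML markup, not a URL.
    public String parse(String html) throws IOException {
        return parse(new StringReader(html));
    }

    // Assumption: the resource is decoded as UTF-8.
    public String parse(URL url) throws IOException {
        try (Reader r = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
            return parse(r);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new ParseOverloadsSketch()
            .parse("<html><body><p>hi there</p></body></html>"));
    }
}
```

All three variants declare IOException because the underlying Reader or URL stream may fail, which is consistent with the signatures listed above.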

title

public String title()


Stanford NLP Group
