HTMLLexer1

com.Ostermiller.Syntax.Lexer
Class HTMLLexer1

java.lang.Object
  com.Ostermiller.Syntax.Lexer.HTMLLexer1

All Implemented Interfaces:: Lexer

public class HTMLLexer1
extends Object
implements Lexer

HTMLLexer1 is an HTML 2.0 lexer, created with JFlex. An example of how it is used:

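  // getNextToken() is declared to return Token and can throw IOException.
  HTMLLexer1 shredder = new HTMLLexer1(System.in);
  Token t;
  // A null return marks the end of the input.
  while ((t = shredder.getNextToken()) != null){
      System.out.println(t);
  }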

There are two HTML lexers that come with this package. HTMLLexer is a basic HTML lexer that distinguishes tags, text, and comments. HTMLLexer1 also understands the structure of tags and can return the names and values from name-value pairs. It also recognizes text elements such as words and character references. The two are similar; which one you should use depends on your purpose. In my opinion, HTMLLexer1 is much better for syntax highlighting.

See Also:: HTMLLexer, HTMLToken1

Field Summary
`static int`	`COMMENT_DEF`
`static int`	`DOCTYPE`
`static int`	`FINISH_END_TAG`
`static int`	`PRE`
`static int`	`PRE_TAG`
`static int`	`SCRIPT`
`static int`	`SCRIPT_TAG`
`static int`	`START_DOC_TAG`
`static int`	`START_END_TAG`
`static int`	`START_EQUAL`
`static int`	`START_PRE_EQUAL`
`static int`	`START_PRE_VALUE`
`static int`	`START_SCRIPT_EQUAL`
`static int`	`START_SCRIPT_VALUE` lexical states
`static int`	`START_TAG`
`static int`	`START_TEXTAREA_EQUAL`
`static int`	`START_TEXTAREA_VALUE`
`static int`	`START_VALUE`
`static int`	`TAG`
`static int`	`TAG_END`
`static int`	`TEXTAREA`
`static int`	`TEXTAREA_TAG`
`static int`	`YYEOF` This character denotes the end of file
`static int`	`YYINITIAL`

Constructor Summary
`HTMLLexer1(InputStream in)` Creates a new scanner.
`HTMLLexer1(Reader in)` Creates a new scanner. There is also a java.io.InputStream version of this constructor.

Method Summary
`Token`	`getNextToken()` Resumes scanning until the next regular expression is matched, the end of input is encountered, or an I/O error occurs.
`Token`	`getNextToken(boolean returnComments, boolean returnWhiteSpace)` Variant of getNextToken() that lets you control whether whitespace and comments are returned as tokens.
`static void`	`main(String[] args)` Prints out tokens from a file or System.in.
`void`	`reset(Reader reader, int yyline, int yychar, int yycolumn)` Closes the current input stream, and resets the scanner to read from a new input stream.
`void`	`yybegin(int newState)` Enters a new lexical state.
`char`	`yycharat(int pos)` Returns the character at position `pos` from the matched text.
`void`	`yyclose()` Closes the input stream.
`int`	`yylength()` Returns the length of the matched text region.
`void`	`yypushback(int number)` Pushes the specified amount of characters back into the input stream.
`void`	`yyreset(Reader reader)` Resets the scanner to read from a new input stream.
`int`	`yystate()` Returns the current lexical state.
`String`	`yytext()` Returns the text matched by the current regular expression.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

YYEOF

public static final int YYEOF

This character denotes the end of file

See Also:: Constant Field Values

START_SCRIPT_VALUE

public static final int START_SCRIPT_VALUE

lexical states

See Also:: Constant Field Values

PRE_TAG

public static final int PRE_TAG

See Also:: Constant Field Values

TAG

public static final int TAG

See Also:: Constant Field Values

COMMENT_DEF

public static final int COMMENT_DEF

See Also:: Constant Field Values

TEXTAREA

public static final int TEXTAREA

See Also:: Constant Field Values

START_PRE_EQUAL

public static final int START_PRE_EQUAL

See Also:: Constant Field Values

START_END_TAG

public static final int START_END_TAG

See Also:: Constant Field Values

START_PRE_VALUE

public static final int START_PRE_VALUE

See Also:: Constant Field Values

TEXTAREA_TAG

public static final int TEXTAREA_TAG

See Also:: Constant Field Values

SCRIPT

public static final int SCRIPT

See Also:: Constant Field Values

START_TEXTAREA_EQUAL

public static final int START_TEXTAREA_EQUAL

See Also:: Constant Field Values

START_TEXTAREA_VALUE

public static final int START_TEXTAREA_VALUE

See Also:: Constant Field Values

TAG_END

public static final int TAG_END

See Also:: Constant Field Values

START_EQUAL

public static final int START_EQUAL

See Also:: Constant Field Values

FINISH_END_TAG

public static final int FINISH_END_TAG

See Also:: Constant Field Values

START_TAG

public static final int START_TAG

See Also:: Constant Field Values

SCRIPT_TAG

public static final int SCRIPT_TAG

See Also:: Constant Field Values

START_VALUE

public static final int START_VALUE

See Also:: Constant Field Values

PRE

public static final int PRE

See Also:: Constant Field Values

START_DOC_TAG

public static final int START_DOC_TAG

See Also:: Constant Field Values

YYINITIAL

public static final int YYINITIAL

See Also:: Constant Field Values

START_SCRIPT_EQUAL

public static final int START_SCRIPT_EQUAL

See Also:: Constant Field Values

DOCTYPE

public static final int DOCTYPE

See Also:: Constant Field Values

Constructor Detail

HTMLLexer1

public HTMLLexer1(Reader in)

Creates a new scanner. There is also a java.io.InputStream version of this constructor.

Parameters:: in - the java.io.Reader to read input from.

HTMLLexer1

public HTMLLexer1(InputStream in)

Creates a new scanner. There is also a java.io.Reader version of this constructor.

Parameters:: in - the java.io.InputStream to read input from.

Method Detail

getNextToken

public Token getNextToken(boolean returnComments,
                          boolean returnWhiteSpace)
                   throws IOException

Variant of getNextToken() that lets you control whether whitespace and comments are returned as tokens.

Throws:: IOException
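
The effect of the two flags can be sketched with stand-in types. This is only an illustration of the reading that a false flag causes the matching tokens to be skipped; `Kind`, `Tok`, and `filter` are hypothetical names and are not part of this package's Token API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: Kind and Tok are NOT the package's Token API.
final class TokenFilterSketch {

    enum Kind { TAG, TEXT, COMMENT, WHITESPACE }

    static final class Tok {
        final Kind kind;
        final String text;

        Tok(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Keeps every token except the comment or whitespace tokens the caller opted out of.
    static List<Tok> filter(List<Tok> tokens, boolean returnComments, boolean returnWhiteSpace) {
        List<Tok> kept = new ArrayList<>();
        for (Tok t : tokens) {
            if (t.kind == Kind.COMMENT && !returnComments) {
                continue;
            }
            if (t.kind == Kind.WHITESPACE && !returnWhiteSpace) {
                continue;
            }
            kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok(Kind.TAG, "<p>"));
        tokens.add(new Tok(Kind.WHITESPACE, " "));
        tokens.add(new Tok(Kind.COMMENT, "<!-- note -->"));
        tokens.add(new Tok(Kind.TEXT, "hi"));
        // Both flags false: only the tag and the text survive.
        for (Tok t : filter(tokens, false, false)) {
            System.out.println(t.kind + " " + t.text);
        }
    }
}
```

A syntax highlighter would probably pass true for both flags, since it needs to color every character of the input.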

main

public static void main(String[] args)

Prints out tokens from a file or System.in. If no arguments are given, System.in will be used for input. If one or more arguments are given, the first argument will be used as the name of the file to use as input.

Parameters:: args - program arguments, of which the first is a filename
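
The input selection described above can be sketched as follows. This is a minimal illustration using only java.io, not the actual body of main; `InputSelection` and `openInput` are hypothetical names, and the sketch simply echoes lines where a real driver would pass the Reader to the lexer.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper, not part of HTMLLexer1: picks the input source
// the way the description of main(String[]) reads.
final class InputSelection {

    // No arguments: read standard input. Otherwise: treat args[0] as a file name.
    static Reader openInput(String[] args) throws IOException {
        if (args.length == 0) {
            return new InputStreamReader(System.in);
        }
        return new FileReader(args[0]);
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(openInput(args))) {
            String line;
            while ((line = in.readLine()) != null) {
                // A real driver would hand the Reader to the lexer instead of echoing.
                System.out.println(line);
            }
        }
    }
}
```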

reset

public void reset(Reader reader,
                  int yyline,
                  int yychar,
                  int yycolumn)
           throws IOException

Closes the current input stream and resets the scanner to read from a new input stream. All internal variables are reset; the old input stream cannot be reused (the content of the internal buffer is discarded and lost). The lexical state is set to the initial state. Subsequent tokens read from the lexer will start with the line, char, and column values given here.

Specified by:: reset in interface Lexer

Parameters:: reader - The new input.; yyline - The line number of the first token.; yychar - The position (relative to the start of the stream) of the first token.; yycolumn - The position (relative to the line) of the first token.
Throws:: IOException - if an IOException occurs while switching readers.
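
When lexing is restarted partway through a document, for example to re-highlight only an edited region, the three position arguments have to be computed by the caller. The sketch below shows one way to derive them from a character offset. It assumes that all three values are zero-based and that only '\n' ends a line; neither assumption is confirmed by this page. `ResetPosition` and `positionAt` are hypothetical names.

```java
// Hypothetical helper, not part of the package: derives the yyline, yychar and
// yycolumn arguments for reset(...) from an offset into the full document text.
final class ResetPosition {

    // Returns {line, char, column} for `offset`.
    // ASSUMPTION: all three are zero-based and only '\n' terminates a line.
    static int[] positionAt(String text, int offset) {
        if (offset < 0 || offset > text.length()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        int line = 0;
        int lineStart = 0;
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
                lineStart = i + 1;
            }
        }
        return new int[] { line, offset, offset - lineStart };
    }

    public static void main(String[] args) {
        String text = "<p>\nhi</p>";
        int offset = 4; // the 'h' at the start of the second line
        int[] pos = positionAt(text, offset);
        System.out.println(pos[0] + " " + pos[1] + " " + pos[2]);
        // With the real lexer on the classpath, the call would look like:
        // lexer.reset(new java.io.StringReader(text.substring(offset)), pos[0], pos[1], pos[2]);
    }
}
```

Because reset also returns the lexer to its initial lexical state, the chosen offset should be a point where that state is valid, such as the start of a tag or of plain text rather than the middle of a tag.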

yyclose

public final void yyclose()
                   throws IOException

Closes the input stream.

Throws:: IOException

yyreset

public final void yyreset(Reader reader)

Resets the scanner to read from a new input stream. Does not close the old reader. All internal variables are reset; the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to YYINITIAL.

Parameters:: reader - the new input stream

yystate

public final int yystate()

Returns the current lexical state.

yybegin

public final void yybegin(int newState)

Enters a new lexical state.

Parameters:: newState - the new lexical state

yytext

public final String yytext()

Returns the text matched by the current regular expression.

yycharat

public final char yycharat(int pos)

Returns the character at position pos from the matched text. It is equivalent to yytext().charAt(pos), but faster.

Parameters:: pos - the position of the character to fetch. A value from 0 to yylength()-1.
Returns:: the character at position pos
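
The equivalence, and the likely reason for the speed difference, can be sketched with a simplified model of a scanner buffer. The class and field names below are hypothetical and the real generated scanner keeps more state; the point is only that yytext() has to build a new String while yycharat(pos) can index the buffer directly.

```java
// Simplified, hypothetical model of a scanner's match window over a char buffer.
// It is not the generated HTMLLexer1 code.
final class MatchedTextSketch {
    private final char[] buffer; // characters read so far
    private final int start;     // index of the first matched character
    private final int end;       // index just past the last matched character

    MatchedTextSketch(char[] buffer, int start, int end) {
        this.buffer = buffer;
        this.start = start;
        this.end = end;
    }

    // Allocates a new String covering the matched region.
    String yytext() {
        return new String(buffer, start, end - start);
    }

    // Reads one character straight from the buffer, with no allocation.
    char yycharat(int pos) {
        return buffer[start + pos];
    }

    int yylength() {
        return end - start;
    }

    public static void main(String[] args) {
        MatchedTextSketch match = new MatchedTextSketch("<a href>".toCharArray(), 3, 7);
        for (int i = 0; i < match.yylength(); i++) {
            // Both expressions name the same character.
            System.out.println(match.yycharat(i) + " " + match.yytext().charAt(i));
        }
    }
}
```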

yylength

public final int yylength()

Returns the length of the matched text region.

yypushback

public void yypushback(int number)

Pushes the specified number of characters back into the input stream. They will be read again by the next call of the scanning method.

Parameters:: number - the number of characters to be read again. This number must not be greater than yylength()!
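
The pushback behavior is analogous to java.io.PushbackReader in the standard library, which the sketch below uses to show the idea. This is an analogy only, not how the generated scanner implements yypushback; `PushbackDemo` and `rereadAfterPushback` are hypothetical names. The argument check mirrors the rule that the number pushed back must not exceed the length of the match.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Analogy only: PushbackReader is a standard-library class, not the scanner's own buffer.
final class PushbackDemo {

    // Reads matchLength characters (the "match"), pushes the last `number` of them
    // back, and returns the characters that the next read delivers.
    static String rereadAfterPushback(String input, int matchLength, int number) throws IOException {
        if (number < 0 || number > matchLength) {
            throw new IllegalArgumentException("number must be between 0 and the match length");
        }
        // The pushback buffer must have a positive size, hence Math.max(1, ...).
        try (PushbackReader in = new PushbackReader(new StringReader(input), Math.max(1, matchLength))) {
            char[] matched = new char[matchLength];
            int read = in.read(matched, 0, matchLength);
            if (read != matchLength) {
                throw new IllegalArgumentException("input is shorter than the match length");
            }
            // Return the tail of the match to the stream.
            in.unread(matched, matchLength - number, number);
            char[] again = new char[number];
            int reread = in.read(again, 0, number);
            return new String(again, 0, Math.max(0, reread));
        }
    }

    public static void main(String[] args) throws IOException {
        // Match "<b>", then push back its last character so it is read again.
        System.out.println(rereadAfterPushback("<b>bold</b>", 3, 1));
    }
}
```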

getNextToken

public Token getNextToken()
                   throws IOException

Resumes scanning until the next regular expression is matched, the end of input is encountered, or an I/O error occurs.

Specified by:: getNextToken in interface Lexer

Returns:: the next token
Throws:: IOException - if any I/O error occurs
