com.Ostermiller.Syntax.Lexer
Class HTMLLexer

java.lang.Object
  extended by com.Ostermiller.Syntax.Lexer.HTMLLexer
All Implemented Interfaces:
Lexer

public class HTMLLexer
extends Object
implements Lexer

HTMLLexer is an HTML 2.0 lexer created with JFlex. An example of how it is used:

  // getNextToken() is declared to return Token and may throw IOException.
  HTMLLexer shredder = new HTMLLexer(System.in);
  Token t;
  while ((t = shredder.getNextToken()) != null){
      System.out.println(t);
  }
  

There are two HTML lexers that come with this package. HTMLLexer is a basic HTML lexer that knows the difference between tags, text, and comments. HTMLLexer1 knows something about the structure of tags and can return names and values from name-value pairs. It also knows about text elements such as words and character references. The two are similar, and which one you should use depends on your purpose. In my opinion, HTMLLexer1 is much better for syntax highlighting.

See Also:
HTMLLexer1, HTMLToken

Field Summary
static int PRE
           
static int SCRIPT
          lexical states
static int TEXTAREA
           
static int YYEOF
          This character denotes the end of file
static int YYINITIAL
           
 
Constructor Summary
HTMLLexer(InputStream in)
          Creates a new scanner.
HTMLLexer(Reader in)
          Creates a new scanner. There is also a java.io.InputStream version of this constructor.
 
Method Summary
 Token getNextToken()
          Resumes scanning until the next regular expression is matched, the end of input is encountered, or an I/O error occurs.
 Token getNextToken(boolean returnComments, boolean returnWhiteSpace)
          Returns the next token, with control over whether comments and whitespace are returned as tokens.
static void main(String[] args)
          Prints out tokens from a file or System.in.
 void reset(Reader reader, int yyline, int yychar, int yycolumn)
          Closes the current input stream, and resets the scanner to read from a new input stream.
 void yybegin(int newState)
          Enters a new lexical state.
 char yycharat(int pos)
          Returns the character at position pos from the matched text.
 void yyclose()
          Closes the input stream.
 int yylength()
          Returns the length of the matched text region.
 void yypushback(int number)
          Pushes the specified number of characters back into the input stream.
 void yyreset(Reader reader)
          Resets the scanner to read from a new input stream.
 int yystate()
          Returns the current lexical state.
 String yytext()
          Returns the text matched by the current regular expression.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

YYEOF

public static final int YYEOF
This character denotes the end of file

See Also:
Constant Field Values

SCRIPT

public static final int SCRIPT
lexical states

See Also:
Constant Field Values

YYINITIAL

public static final int YYINITIAL
See Also:
Constant Field Values

PRE

public static final int PRE
See Also:
Constant Field Values

TEXTAREA

public static final int TEXTAREA
See Also:
Constant Field Values
Constructor Detail

HTMLLexer

public HTMLLexer(Reader in)
Creates a new scanner. There is also a java.io.InputStream version of this constructor.

Parameters:
in - the java.io.Reader to read input from.

HTMLLexer

public HTMLLexer(InputStream in)
Creates a new scanner. There is also a java.io.Reader version of this constructor.

Parameters:
in - the java.io.InputStream to read input from.
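
This page does not say which character encoding the InputStream constructor assumes. When the encoding matters, one option is to decode the bytes yourself and hand the result to the Reader constructor. The sketch below is a minimal illustration under that assumption. LexerInput, utf8Reader, and readAll are hypothetical names, and the HTMLLexer call appears only in a comment so that the block runs without the library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the package.
final class LexerInput {
    /** Wraps a byte stream in a Reader that decodes UTF-8 explicitly. */
    static Reader utf8Reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    /** Drains a Reader into a String; used only to show what a lexer would see. */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        Reader reader = utf8Reader(new ByteArrayInputStream(bytes));
        // With the library on the classpath, the reader would be passed on instead:
        //   HTMLLexer lexer = new HTMLLexer(reader);
        System.out.println(readAll(reader));
    }
}
```

The same Reader-based route also works for in-memory text through java.io.StringReader.
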
Method Detail

getNextToken

public Token getNextToken(boolean returnComments,
                          boolean returnWhiteSpace)
                   throws IOException
Returns the next token, with control over whether comments and whitespace are returned as tokens.

Throws:
IOException

main

public static void main(String[] args)
Prints out tokens from a file or System.in. If no arguments are given, System.in will be used for input. If one or more arguments are given, the first argument will be used as the name of the input file.

Parameters:
args - program arguments, of which the first is a filename
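
The input selection described above can be sketched as follows. This is not the actual body of main, only a guess at its shape from the description. InputChooser and choose are hypothetical names, and wrapping a missing file in UncheckedIOException is a choice made for the sketch.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the package.
final class InputChooser {
    /**
     * Returns System.in when no arguments are given; otherwise opens the
     * file named by the first argument. Any further arguments are ignored.
     */
    static InputStream choose(String[] args) {
        if (args.length == 0) {
            return System.in;
        }
        try {
            return new FileInputStream(args[0]);
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = choose(args);
        try {
            // With the library on the classpath, tokens would be printed here:
            //   HTMLLexer lexer = new HTMLLexer(in);
            System.out.println(in == System.in ? "reading System.in" : "reading " + args[0]);
        } finally {
            // Only close streams this sketch opened itself.
            if (in != System.in) {
                in.close();
            }
        }
    }
}
```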

reset

public void reset(Reader reader,
                  int yyline,
                  int yychar,
                  int yycolumn)
           throws IOException
Closes the current input stream and resets the scanner to read from a new input stream. All internal variables are reset, and the old input stream cannot be reused (the content of the internal buffer is discarded and lost). The lexical state is set to the initial state. Subsequent tokens read from the lexer will start with the line, char, and column values given here.

Specified by:
reset in interface Lexer
Parameters:
reader - The new input.
yyline - The line number of the first token.
yychar - The position (relative to the start of the stream) of the first token.
yycolumn - The position (relative to the line) of the first token.
Throws:
IOException - if an IOException occurs while switching readers.
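
When lexing restarts partway through a document, for example after an edit in a syntax-highlighting editor, the three position arguments have to be computed for the restart offset. The sketch below shows one way to do that. It assumes zero-based line and column numbers and treats only '\n' as a line terminator; this page confirms neither. RestartPosition and at are hypothetical names, and the reset call appears only in a comment.

```java
// Hypothetical helper, not part of the package.
final class RestartPosition {
    final int line;    // number of '\n' characters before the offset (assumed zero-based)
    final int charPos; // offset from the start of the text
    final int column;  // offset from the start of the current line (assumed zero-based)

    RestartPosition(int line, int charPos, int column) {
        this.line = line;
        this.charPos = charPos;
        this.column = column;
    }

    /** Computes line, char and column values for an offset into text. */
    static RestartPosition at(CharSequence text, int offset) {
        int line = 0;
        int lineStart = 0;
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
                lineStart = i + 1;
            }
        }
        return new RestartPosition(line, offset, offset - lineStart);
    }

    public static void main(String[] args) {
        String html = "<html>\n<body>\n  <p>Hi</p>\n";
        int offset = html.indexOf("<p>");
        RestartPosition p = at(html, offset);
        System.out.println(p.line + " " + p.charPos + " " + p.column);
        // With the library on the classpath, lexing could then restart at the offset:
        //   lexer.reset(new java.io.StringReader(html.substring(offset)),
        //               p.line, p.charPos, p.column);
    }
}
```

Because reset puts the lexer back into its initial lexical state, a restart offset is presumably only safe at a point where the initial state applies, not in the middle of a region lexed in the SCRIPT, PRE, or TEXTAREA states.
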

yyclose

public final void yyclose()
                   throws IOException
Closes the input stream.

Throws:
IOException

yyreset

public final void yyreset(Reader reader)
Resets the scanner to read from a new input stream. Does not close the old reader. All internal variables are reset, and the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to YYINITIAL.

Parameters:
reader - the new input stream

yystate

public final int yystate()
Returns the current lexical state.


yybegin

public final void yybegin(int newState)
Enters a new lexical state.

Parameters:
newState - the new lexical state

yytext

public final String yytext()
Returns the text matched by the current regular expression.


yycharat

public final char yycharat(int pos)
Returns the character at position pos from the matched text. It is equivalent to yytext().charAt(pos), but faster.

Parameters:
pos - the position of the character to fetch. A value from 0 to yylength()-1.
Returns:
the character at position pos

yylength

public final int yylength()
Returns the length of the matched text region.


yypushback

public void yypushback(int number)
Pushes the specified number of characters back into the input stream. They will be read again by the next call of the scanning method.

Parameters:
number - the number of characters to be read again. This number must not be greater than yylength()!
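
The methods yytext, yycharat, yylength, and yypushback all describe the currently matched region. The toy model below illustrates how they relate to each other. It is a stand-in written for this page, not the generated scanner's code: MatchWindow and its fields are hypothetical, and the exception thrown for an oversized pushback is an assumption.

```java
// Toy model of a matched region; not the generated scanner.
final class MatchWindow {
    private final char[] buffer;
    private final int start; // index of the first matched character
    private int end;         // index one past the last matched character

    MatchWindow(String input, int start, int end) {
        this.buffer = input.toCharArray();
        this.start = start;
        this.end = end;
    }

    /** Copies the matched region into a new String. */
    String yytext() {
        return new String(buffer, start, end - start);
    }

    /** Reads one character straight from the buffer, without building a String. */
    char yycharat(int pos) {
        return buffer[start + pos];
    }

    int yylength() {
        return end - start;
    }

    /** Shrinks the match from the right so the characters are scanned again later. */
    void yypushback(int number) {
        if (number < 0 || number > yylength()) {
            throw new IllegalArgumentException("cannot push back " + number + " characters");
        }
        end -= number;
    }

    public static void main(String[] args) {
        MatchWindow m = new MatchWindow("<b>bold</b>", 0, 3);
        System.out.println(m.yytext() + " " + m.yylength());
        System.out.println(m.yycharat(1) == m.yytext().charAt(1));
        m.yypushback(1);
        System.out.println(m.yytext() + " " + m.yylength());
    }
}
```

In this model yycharat avoids the String allocation that yytext().charAt(pos) would need, which is one plausible reason for the "but faster" remark above.
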

getNextToken

public Token getNextToken()
                   throws IOException
Resumes scanning until the next regular expression is matched, the end of input is encountered, or an I/O error occurs.

Specified by:
getNextToken in interface Lexer
Returns:
the next token
Throws:
IOException - if any I/O error occurs
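
The example at the top of this page loops until getNextToken() returns null, which suggests that null marks the end of input. Under that assumption, a small adapter can collect all tokens into a list and turn the checked IOException into an unchecked one. TokenDrain, Source, and drain are hypothetical names, and the sketch uses a stand-in token source so that it runs without the library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of the package.
final class TokenDrain {
    /** Anything that hands out tokens one at a time and returns null when exhausted. */
    interface Source<T> {
        T next() throws IOException;
    }

    /** Pulls tokens from the source until it returns null. */
    static <T> List<T> drain(Source<T> source) {
        List<T> tokens = new ArrayList<>();
        try {
            T t;
            while ((t = source.next()) != null) {
                tokens.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Stand-in source so the sketch runs without the library.
        Iterator<String> fake = Arrays.asList("<p>", "Hi", "</p>").iterator();
        List<String> tokens = drain(() -> fake.hasNext() ? fake.next() : null);
        System.out.println(tokens);
        // With the library on the classpath, the same helper would accept the lexer:
        //   List<Token> all = drain(lexer::getNextToken);
    }
}
```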