com.Ostermiller.Syntax.Lexer
Class HTMLLexer1

java.lang.Object
  extended by com.Ostermiller.Syntax.Lexer.HTMLLexer1
All Implemented Interfaces:
Lexer

public class HTMLLexer1
extends Object
implements Lexer

HTMLLexer1 is an HTML 2.0 lexer created with JFlex. An example of how it is used:

  HTMLLexer1 shredder = new HTMLLexer1(System.in);
  // getNextToken() is declared to return Token and to throw IOException
  Token t;
  while ((t = shredder.getNextToken()) != null){
      System.out.println(t);
  }
  

Two HTML lexers come with this package. HTMLLexer is a basic HTML lexer that distinguishes tags, text, and comments. HTMLLexer1 also knows something about the structure of tags and can return the names and values from name-value pairs. It also recognizes text elements such as words and character references. The two are similar, and which one you should use depends on your purpose. In my opinion, HTMLLexer1 is much better for syntax highlighting.

See Also:
HTMLLexer, HTMLToken1

Field Summary
static int COMMENT_DEF
           
static int DOCTYPE
           
static int FINISH_END_TAG
           
static int PRE
           
static int PRE_TAG
           
static int SCRIPT
           
static int SCRIPT_TAG
           
static int START_DOC_TAG
           
static int START_END_TAG
           
static int START_EQUAL
           
static int START_PRE_EQUAL
           
static int START_PRE_VALUE
           
static int START_SCRIPT_EQUAL
           
static int START_SCRIPT_VALUE
          lexical states
static int START_TAG
           
static int START_TEXTAREA_EQUAL
           
static int START_TEXTAREA_VALUE
           
static int START_VALUE
           
static int TAG
           
static int TAG_END
           
static int TEXTAREA
           
static int TEXTAREA_TAG
           
static int YYEOF
          This character denotes the end of file
static int YYINITIAL
           
 
Constructor Summary
HTMLLexer1(InputStream in)
          Creates a new scanner.
HTMLLexer1(Reader in)
          Creates a new scanner. There is also a java.io.InputStream version of this constructor.
 
Method Summary
 Token getNextToken()
          Resumes scanning until the next regular expression is matched, the end of input is encountered, or an I/O error occurs.
 Token getNextToken(boolean returnComments, boolean returnWhiteSpace)
          Returns the next token, letting you control whether whitespace and comments are returned as tokens.
static void main(String[] args)
          Prints out tokens from a file or System.in.
 void reset(Reader reader, int yyline, int yychar, int yycolumn)
          Closes the current input stream, and resets the scanner to read from a new input stream.
 void yybegin(int newState)
          Enters a new lexical state.
 char yycharat(int pos)
          Returns the character at position pos from the matched text.
 void yyclose()
          Closes the input stream.
 int yylength()
          Returns the length of the matched text region.
 void yypushback(int number)
          Pushes the specified amount of characters back into the input stream.
 void yyreset(Reader reader)
          Resets the scanner to read from a new input stream.
 int yystate()
          Returns the current lexical state.
 String yytext()
          Returns the text matched by the current regular expression.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

YYEOF

public static final int YYEOF
This character denotes the end of file

See Also:
Constant Field Values

START_SCRIPT_VALUE

public static final int START_SCRIPT_VALUE
lexical states

See Also:
Constant Field Values

PRE_TAG

public static final int PRE_TAG
See Also:
Constant Field Values

TAG

public static final int TAG
See Also:
Constant Field Values

COMMENT_DEF

public static final int COMMENT_DEF
See Also:
Constant Field Values

TEXTAREA

public static final int TEXTAREA
See Also:
Constant Field Values

START_PRE_EQUAL

public static final int START_PRE_EQUAL
See Also:
Constant Field Values

START_END_TAG

public static final int START_END_TAG
See Also:
Constant Field Values

START_PRE_VALUE

public static final int START_PRE_VALUE
See Also:
Constant Field Values

TEXTAREA_TAG

public static final int TEXTAREA_TAG
See Also:
Constant Field Values

SCRIPT

public static final int SCRIPT
See Also:
Constant Field Values

START_TEXTAREA_EQUAL

public static final int START_TEXTAREA_EQUAL
See Also:
Constant Field Values

START_TEXTAREA_VALUE

public static final int START_TEXTAREA_VALUE
See Also:
Constant Field Values

TAG_END

public static final int TAG_END
See Also:
Constant Field Values

START_EQUAL

public static final int START_EQUAL
See Also:
Constant Field Values

FINISH_END_TAG

public static final int FINISH_END_TAG
See Also:
Constant Field Values

START_TAG

public static final int START_TAG
See Also:
Constant Field Values

SCRIPT_TAG

public static final int SCRIPT_TAG
See Also:
Constant Field Values

START_VALUE

public static final int START_VALUE
See Also:
Constant Field Values

PRE

public static final int PRE
See Also:
Constant Field Values

START_DOC_TAG

public static final int START_DOC_TAG
See Also:
Constant Field Values

YYINITIAL

public static final int YYINITIAL
See Also:
Constant Field Values

START_SCRIPT_EQUAL

public static final int START_SCRIPT_EQUAL
See Also:
Constant Field Values

DOCTYPE

public static final int DOCTYPE
See Also:
Constant Field Values
Constructor Detail

HTMLLexer1

public HTMLLexer1(Reader in)
Creates a new scanner. There is also a java.io.InputStream version of this constructor.

Parameters:
in - the java.io.Reader to read input from.

HTMLLexer1

public HTMLLexer1(InputStream in)
Creates a new scanner. There is also a java.io.Reader version of this constructor.

Parameters:
in - the java.io.InputStream to read input from.
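
The descriptions above do not say which character encoding the InputStream constructor assumes. When the encoding of the input is known, one option is to decode the bytes yourself and use the Reader constructor instead. The following is a minimal sketch using only the JDK; the class and method names (ReaderSketch, utf8Reader, drain) are hypothetical, and the HTMLLexer1 call appears only as a comment so the sketch runs without the library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of com.Ostermiller.Syntax.Lexer.
class ReaderSketch {
    /** Wraps a byte stream in a Reader that decodes it as UTF-8. */
    static Reader utf8Reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    /** Reads a Reader to the end, only to show what a consumer of it would see. */
    static String drain(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        Reader reader = utf8Reader(new ByteArrayInputStream(bytes));
        // With the library on the classpath, the reader would be handed to the lexer:
        // HTMLLexer1 lexer = new HTMLLexer1(reader);
        System.out.println(drain(reader));
    }
}
```

The same Reader could equally be passed to reset or yyreset, both of which take a Reader.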
Method Detail

getNextToken

public Token getNextToken(boolean returnComments,
                          boolean returnWhiteSpace)
                   throws IOException
Returns the next token, letting you control whether whitespace and comments are returned as tokens.

Throws:
IOException
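
The effect of the two flags can be pictured as a skip loop over the token stream. The sketch below is an illustration under assumptions, not the library's implementation: Tok, Kind, next, and collect are hypothetical stand-ins, and it assumes that a false flag means tokens of that kind are silently skipped and that null marks the end of input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in types; the real Token class lives in com.Ostermiller.Syntax.Lexer.
class FilterSketch {
    enum Kind { TAG, TEXT, COMMENT, WHITE_SPACE }

    static final class Tok {
        final Kind kind;
        final String text;
        Tok(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    /** Returns the next token the caller asked for, or null when the source is exhausted. */
    static Tok next(Iterator<Tok> source, boolean returnComments, boolean returnWhiteSpace) {
        while (source.hasNext()) {
            Tok t = source.next();
            if (t.kind == Kind.COMMENT && !returnComments) continue;       // skip comments
            if (t.kind == Kind.WHITE_SPACE && !returnWhiteSpace) continue; // skip whitespace
            return t;
        }
        return null;
    }

    /** Drains the source through next(...) and collects the text of each returned token. */
    static List<String> collect(List<Tok> tokens, boolean returnComments, boolean returnWhiteSpace) {
        List<String> out = new ArrayList<>();
        Iterator<Tok> it = tokens.iterator();
        Tok t;
        while ((t = next(it, returnComments, returnWhiteSpace)) != null) {
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
            new Tok(Kind.TAG, "<p>"), new Tok(Kind.WHITE_SPACE, " "),
            new Tok(Kind.COMMENT, "<!-- note -->"), new Tok(Kind.TEXT, "hi"));
        System.out.println(collect(tokens, false, false)); // [<p>, hi]
        System.out.println(collect(tokens, true, false));  // [<p>, <!-- note -->, hi]
    }
}
```

A syntax highlighter would typically want both kinds returned so that every character of the input is covered by some token, while a tool that only inspects tags could skip both.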

main

public static void main(String[] args)
Prints out tokens from a file or System.in. If no arguments are given, System.in is used for input. If one or more arguments are given, the first argument is used as the name of the input file.

Parameters:
args - program arguments, of which the first is a filename

reset

public void reset(Reader reader,
                  int yyline,
                  int yychar,
                  int yycolumn)
           throws IOException
Closes the current input stream and resets the scanner to read from a new input stream. All internal variables are reset, and the old input stream cannot be reused (the content of the internal buffer is discarded and lost). The lexical state is set to the initial state. Subsequent tokens read from the lexer will start with the line, char, and column values given here.

Specified by:
reset in interface Lexer
Parameters:
reader - The new input.
yyline - The line number of the first token.
yychar - The position (relative to the start of the stream) of the first token.
yycolumn - The position (relative to the line) of the first token.
Throws:
IOException - if an IOException occurs while switching readers.

yyclose

public final void yyclose()
                   throws IOException
Closes the input stream.

Throws:
IOException

yyreset

public final void yyreset(Reader reader)
Resets the scanner to read from a new input stream. Does not close the old reader. All internal variables are reset, and the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to YYINITIAL.

Parameters:
reader - the new input stream

yystate

public final int yystate()
Returns the current lexical state.


yybegin

public final void yybegin(int newState)
Enters a new lexical state.

Parameters:
newState - the new lexical state

yytext

public final String yytext()
Returns the text matched by the current regular expression.


yycharat

public final char yycharat(int pos)
Returns the character at position pos from the matched text. It is equivalent to yytext().charAt(pos), but faster.

Parameters:
pos - the position of the character to fetch. A value from 0 to yylength()-1.
Returns:
the character at position pos

yylength

public final int yylength()
Returns the length of the matched text region.


yypushback

public void yypushback(int number)
Pushes the specified number of characters back into the input stream. They will be read again by the next call of the scanning method.

Parameters:
number - the number of characters to be read again. This number must not be greater than yylength().
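
The pushback contract is easiest to see on a toy scanner. The class below is a hypothetical stand-in, not HTMLLexer1: it matches runs of non-space characters and mimics only the relationship between yytext(), yylength(), and yypushback() described above. Throwing IllegalArgumentException for an oversized pushback is a choice made for this sketch, not a statement about what the generated scanner does.

```java
// Hypothetical stand-in, not part of com.Ostermiller.Syntax.Lexer.
class PushbackSketch {
    private final String input;
    private int start = 0; // start of the current match
    private int end = 0;   // end (exclusive) of the current match

    PushbackSketch(String input) { this.input = input; }

    /** Matches the next run of non-space characters; returns false at end of input. */
    boolean scan() {
        start = end;
        while (start < input.length() && input.charAt(start) == ' ') start++;
        end = start;
        while (end < input.length() && input.charAt(end) != ' ') end++;
        return end > start;
    }

    String yytext() { return input.substring(start, end); }

    int yylength() { return end - start; }

    /** Shrinks the current match so the last `number` characters are scanned again. */
    void yypushback(int number) {
        if (number < 0 || number > yylength()) {
            throw new IllegalArgumentException("number must be between 0 and yylength()");
        }
        end -= number;
    }

    public static void main(String[] args) {
        PushbackSketch s = new PushbackSketch("width=100 ok");
        s.scan();                        // matches "width=100"
        s.yypushback(4);                 // give "=100" back to the input
        System.out.println(s.yytext());  // width
        s.scan();                        // the pushed-back characters are matched again
        System.out.println(s.yytext());  // =100
    }
}
```

After the pushback, the matched text is four characters shorter, which is why the argument can never exceed yylength().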

getNextToken

public Token getNextToken()
                   throws IOException
Resumes scanning until the next regular expression is matched, the end of input is encountered, or an I/O error occurs.

Specified by:
getNextToken in interface Lexer
Returns:
the next token
Throws:
IOException - if an I/O error occurs