com.Ostermiller.util Java Utilities


com.Ostermiller.util
Class CSVLexer

java.lang.Object
  extended by com.Ostermiller.util.CSVLexer

public class CSVLexer
extends Object

Read files in comma separated value format. More information about this class is available from ostermiller.org. The use of this class is no longer recommended; use com.Ostermiller.util.CSVParser instead. That class has a cleaner API and methods for returning all the values on a line in a String array.

CSV is a file format used as a portable representation of a database. Each line is one entry or record, and the fields in a record are separated by commas. Commas may be preceded or followed by arbitrary space and/or tab characters, which are ignored.

If a field includes a comma or a new line, the whole field must be surrounded with double quotes. When the field is in quotes, any quote literals must be escaped by \". Backslash literals must be escaped by \\. Otherwise a backslash and the character following it will be treated as the following character, i.e. "\n" is equivalent to "n". Other escape sequences may be set using the setEscapes() method. Text that comes after quotes that have been closed but before the next comma will be ignored.

Empty fields are returned as a String of length zero: "". The following line has four empty fields and two non-empty fields in it. There is an empty field on each end, and two in the middle.

,second,, ,fifth,
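
The field count for that line can be checked with a minimal sketch. This is not CSVLexer's implementation: the class EmptyFieldsSketch and the method splitSimple are hypothetical names, and the sketch handles only unquoted fields. It splits on commas and strips the surrounding spaces and tabs, as the format description above says the lexer does.

```java
import java.util.ArrayList;
import java.util.List;

class EmptyFieldsSketch {
    // Splits one unquoted CSV line on commas and strips the spaces and tabs
    // around each field. Illustrative only: quotes and escapes are not handled.
    static List<String> splitSimple(String line) {
        List<String> fields = new ArrayList<>();
        // A limit of -1 keeps trailing empty strings, so the last empty field survives.
        for (String raw : line.split(",", -1)) {
            fields.add(raw.replaceAll("^[ \\t]+|[ \\t]+$", ""));
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = splitSimple(",second,, ,fifth,");
        System.out.println(fields.size() + " fields: " + fields);
    }
}
```

The field that contains only a space collapses to the empty string, which is why the line yields two empty fields in the middle rather than one.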

Blank lines are always ignored. Other lines will be ignored if they start with a comment character as set by the setCommentStart() method.

An example of how CSVLexer might be used:

 CSVLexer shredder = new CSVLexer(System.in);
 shredder.setCommentStart("#;!");
 shredder.setEscapes("nrtf", "\n\r\t\f");
 String t;
 while ((t = shredder.getNextToken()) != null) {
     System.out.println("" + shredder.getLineNumber() + " " + t);
 }
 

Since:
ostermillerutils 1.00.00
Author:
Stephen Ostermiller http://ostermiller.org/contact.pl?regarding=Java+Utilities

Field Summary
static int AFTER
           
static int BEFORE
          lexical states
static int COMMENT
           
static int YYEOF
          This character denotes the end of file
static int YYINITIAL
           
 
Constructor Summary
CSVLexer(InputStream in)
          Creates a new scanner.
CSVLexer(Reader in)
          Creates a new scanner. There is also a java.io.InputStream version of this constructor.
 
Method Summary
 void changeDelimiter(char newDelim)
          Change this Lexer so that it uses a new delimiter.
 void changeQuote(char newQuote)
          Change this Lexer so that it uses a new character for quoting.
 int getLineNumber()
          Get the line number that the last token came from.
 String getNextToken()
          Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
static void main(String[] args)
          Prints out tokens and line numbers from a file or System.in.
 void setCommentStart(String commentDelims)
          Set the characters that indicate a comment at the beginning of the line.
 void setEscapes(String escapes, String replacements)
          Specify escape sequences and their replacements.
 void yybegin(int newState)
          Enters a new lexical state.
 char yycharat(int pos)
          Returns the character at position pos from the matched text.
 void yyclose()
          Closes the input stream.
 int yylength()
          Returns the length of the matched text region.
 void yypushback(int number)
          Pushes the specified amount of characters back into the input stream.
 void yyreset(Reader reader)
          Resets the scanner to read from a new input stream.
 int yystate()
          Returns the current lexical state.
 String yytext()
          Returns the text matched by the current regular expression.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

YYEOF

public static final int YYEOF
This character denotes the end of file

See Also:
Constant Field Values

BEFORE

public static final int BEFORE
lexical states

See Also:
Constant Field Values

YYINITIAL

public static final int YYINITIAL
See Also:
Constant Field Values

COMMENT

public static final int COMMENT
See Also:
Constant Field Values

AFTER

public static final int AFTER
See Also:
Constant Field Values
Constructor Detail

CSVLexer

public CSVLexer(Reader in)
Creates a new scanner. There is also a java.io.InputStream version of this constructor.

Parameters:
in - the java.io.Reader to read input from.

CSVLexer

public CSVLexer(InputStream in)
Creates a new scanner. There is also a java.io.Reader version of this constructor.

Parameters:
in - the java.io.InputStream to read input from.
Method Detail

main

public static void main(String[] args)
Prints out tokens and line numbers from a file or System.in. If no arguments are given, System.in will be used for input. If arguments are given, the first argument will be used as the name of the input file.

Parameters:
args - program arguments, of which the first is a filename
Since:
ostermillerutils 1.00.00

changeDelimiter

public void changeDelimiter(char newDelim)
                     throws BadDelimiterException
Change this Lexer so that it uses a new delimiter.

The initial delimiter is a comma. The delimiter cannot be changed to a quote or other character that has special meaning in CSV.

Parameters:
newDelim - delimiter to which to switch.
Throws:
BadDelimiterException - if the character cannot be used as a delimiter.
Since:
ostermillerutils 1.00.00

changeQuote

public void changeQuote(char newQuote)
                 throws BadQuoteException
Change this Lexer so that it uses a new character for quoting.

The initial quote character is a double quote ("). The quote character cannot be changed to a comma or other character that has special meaning in CSV.

Parameters:
newQuote - character to use for quoting.
Throws:
BadQuoteException - if the character cannot be used as a quote.
Since:
ostermillerutils 1.00.00

setEscapes

public void setEscapes(String escapes,
                       String replacements)
Specify escape sequences and their replacements. Escape sequences set here are in addition to \\ and \", which are always valid escape sequences. This method allows standard escape sequences to be used. For example "\n" can be set to be a newline rather than an 'n'. A common way to call this method might be:
setEscapes("nrtf", "\n\r\t\f");
which would set the escape sequences to be the Java escape sequences. Characters that follow a \ that are not escape sequences will still be interpreted as that character.
The two arguments to this method must be the same length. If they are not, the longer of the two will be truncated.
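
The mapping rule can be illustrated with a short, self-contained sketch. It is not the library's code: EscapeSketch, buildTable, and unescape are hypothetical names, and the sketch assumes the escape at position i in the first string pairs with the replacement at position i in the second, with the surplus characters of the longer string dropped.

```java
import java.util.HashMap;
import java.util.Map;

class EscapeSketch {
    // Pairs escapes.charAt(i) with replacements.charAt(i). If the lengths
    // differ, the extra characters of the longer string are ignored, which
    // mirrors the truncation described above.
    static Map<Character, Character> buildTable(String escapes, String replacements) {
        Map<Character, Character> table = new HashMap<>();
        int n = Math.min(escapes.length(), replacements.length());
        for (int i = 0; i < n; i++) {
            table.put(escapes.charAt(i), replacements.charAt(i));
        }
        return table;
    }

    // Resolves backslash escapes inside the body of a quoted field. A character
    // with no table entry stands for itself, so \\ yields \ and \" yields ".
    static String unescape(String body, Map<Character, Character> table) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                char next = body.charAt(++i);
                char resolved = table.getOrDefault(next, next);
                out.append(resolved);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Character, Character> table = buildTable("nrtf", "\n\r\t\f");
        // The Java literal "a\\nb" is the four characters a, \, n, b.
        System.out.println(unescape("a\\nb", table).equals("a\nb"));
        System.out.println(unescape("a\\nb", new HashMap<>()).equals("anb"));
    }
}
```

With the table in place, backslash-n becomes a newline; with an empty table it falls back to a plain 'n', which matches the default behaviour described for the class.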

Parameters:
escapes - a list of characters that will represent escape sequences.
replacements - the list of replacement characters for those escape sequences.
Since:
ostermillerutils 1.00.00

setCommentStart

public void setCommentStart(String commentDelims)
Set the characters that indicate a comment at the beginning of the line. For example if the string "#;!" were passed in, all of the following lines would be comments:
 # Comment
 ; Another Comment
 ! Yet another comment
By default there are no comments in CSV files. Commas and quotes may not be used to indicate comment lines.
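
A minimal sketch of the rule, assuming a comment is recognised only when the very first character of the line is one of the comment characters (whether the lexer also tolerates leading whitespace is not stated here). CommentLineSketch and isComment are hypothetical names, not part of the library.

```java
class CommentLineSketch {
    // Returns true when the line's first character is one of the comment
    // start characters. An empty line is not treated as a comment here;
    // the lexer ignores blank lines separately.
    static boolean isComment(String line, String commentDelims) {
        return !line.isEmpty() && commentDelims.indexOf(line.charAt(0)) >= 0;
    }

    public static void main(String[] args) {
        String delims = "#;!";
        System.out.println(isComment("# Comment", delims));
        System.out.println(isComment("; Another Comment", delims));
        System.out.println(isComment("a,b,c", delims));
    }
}
```

Passing an empty string of comment characters makes every line a data line, which corresponds to the default of no comments.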

Parameters:
commentDelims - list of characters a comment line may start with.
Since:
ostermillerutils 1.00.00

getLineNumber

public int getLineNumber()
Get the line number that the last token came from.

New line breaks that occur in the middle of a token are not counted in the line number count.

If no tokens have been returned, the line number is undefined.

Returns:
line number of the last token.
Since:
ostermillerutils 1.00.00

yyclose

public final void yyclose()
                   throws IOException
Closes the input stream.

Throws:
IOException

yyreset

public final void yyreset(Reader reader)
Resets the scanner to read from a new input stream. Does not close the old reader. All internal variables are reset; the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to YYINITIAL.

Parameters:
reader - the new input stream

yystate

public final int yystate()
Returns the current lexical state.


yybegin

public final void yybegin(int newState)
Enters a new lexical state.

Parameters:
newState - the new lexical state

yytext

public final String yytext()
Returns the text matched by the current regular expression.


yycharat

public final char yycharat(int pos)
Returns the character at position pos from the matched text. It is equivalent to yytext().charAt(pos), but faster.

Parameters:
pos - the position of the character to fetch. A value from 0 to yylength()-1.
Returns:
the character at position pos

yylength

public final int yylength()
Returns the length of the matched text region.


yypushback

public void yypushback(int number)
Pushes the specified number of characters back into the input stream. They will be read again by the next call of the scanning method.

Parameters:
number - the number of characters to be read again. This number must not be greater than yylength()!

getNextToken

public String getNextToken()
                    throws IOException
Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.

Returns:
the next token, or null once the end of input has been reached (the loop in the class example relies on this)
Throws:
IOException - if any I/O-Error occurs



Copyright © 2001-2012 by Stephen Ostermiller