String Tokenizer

A drop in replacement for the java.util.StringTokenizer class. Because Sun is introducing a regular expression package in the Merlin release of java, StringTokenizer has become a legacy class and they have stopped making enhancements to it.

This implementation of StringTokenizer tries to fix the following bugs:

  • Bug 4086845, Bug 4140850, Bug 4233089, Bug 4287338 Report empty/null tokens - It would be nice to have the string "one,two,,four" have four tokens. This implementation has the option to return empty tokens as an empty string ("").
  • Bug 4232593 Performance improvements - StringTokenizer can be optimized for some common cases. The optimizations that were implemented in java.util.StringTokenizer involving maximum characters from this suggestion have also been implemented here. Also token count is cached between calls.
  • Bug 4232770 Null parameter handling not specified - The string to parse should not be null, but the delimiter lists may be null in this implementation.
  • Bug 4229450 Add a String[] toArray method - Convenience method added for returning all the tokens as an array of Strings.
  • Bug 4391919 getPosition() method - Exposes the internal state of the StringTokenizer. In addition, other internal state variables are protected and well documented so this class can be easily sub-classed.
  • Bug 4228047 Allow re-use to minimize heap activity - A setText() method was added for this purpose.
  • Bug 4238266, Bug 4080058, Bug 4280213, Bug 4310962, Bug 4338282, Bug 4377004, Bug 4359719 Fixed hasMoreElements() side-effect (no easy way to deal with code broken by this fix) - hasMoreElements() had an unexpected side-effect. Some people relied on this and the fix broke their code. In the fixed version there was no good way to recover the functionality of the broken code. Methods are introduced here to correct that (API taken from Bug 4359719).
  • Bug 4359719 token delimiters and nontoken delimiters - Some delimiters may be returned as tokens and at the same time, other should not be returned. This bug report contains code with a nice API that I have used for this implementation.
  • Bug 4215241 restOfText() method - Return the remaining un-tokenized portion of the string.
  • Change delimiters without returning token - Contains methods for changing the delimiters midstream while getting a token, counting tokens, or not associated with another action.
  • Peek at the next token - A peek() method is available for looking ahead at the next token. (Inspired by com.tinyplanet.docwiz.PeekStringTokenizer)