Class Lexer

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    final class Lexer
    extends java.lang.Object
    implements java.io.Closeable
    Lexical analyzer.
    Version:
    $Id: Lexer.java 1742468 2016-05-05 20:02:35Z britter $
    • Field Detail

      • DISABLED

        private static final char DISABLED
        Constant char to use for disabling comments, escapes and encapsulation. The value -2 (the char '\uFFFE') is used because it cannot be confused with the EOF signal (-1), and because U+FFFE is a Unicode noncharacter, so there should never be a collision with a real text char.
        See Also:
        Constant Field Values
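
        As a small illustration of the value described above, the sketch below narrows -2 to a char and compares it with the conventional EOF marker. The class DisabledCharDemo and its constants are hypothetical and are not part of Lexer.

```java
final class DisabledCharDemo {
    // Assumption: mirrors the documented value, -2 narrowed to a 16-bit char.
    static final char DISABLED = (char) -2;
    // Conventional end-of-stream value returned by java.io.Reader.read().
    static final int EOF = -1;

    /** Returns the char value as a hexadecimal string. */
    static String disabledAsHex() {
        return Integer.toHexString(DISABLED);
    }

    /** True only if the sentinel could be mistaken for EOF after widening to int. */
    static boolean collidesWithEof() {
        return DISABLED == EOF;
    }

    public static void main(String[] args) {
        System.out.println(disabledAsHex());
        System.out.println(collidesWithEof());
    }
}
```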
      • delimiter

        private final char delimiter
      • escape

        private final char escape
      • quoteChar

        private final char quoteChar
      • commentStart

        private final char commentStart
      • ignoreSurroundingSpaces

        private final boolean ignoreSurroundingSpaces
      • ignoreEmptyLines

        private final boolean ignoreEmptyLines
    • Method Detail

      • nextToken

        Token nextToken​(Token token)
                 throws java.io.IOException
        Returns the next token.

        A token corresponds to a term, a record change or an end-of-file indicator.

        Parameters:
        token - an existing Token object to reuse. The caller is responsible for initializing the Token.
        Returns:
        the next token found
        Throws:
        java.io.IOException - on stream access error
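
        The token-reuse contract described above can be sketched roughly as follows. Because Lexer and Token are package-private, the MiniLexer, Token and Type types here are hypothetical stand-ins, greatly simplified (comma delimiter, LF line ends, no quoting or escapes); only the calling pattern is the point.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class TokenReuseSketch {
    enum Type { TOKEN, EORECORD, EOF }

    /** Hypothetical stand-in for the real Token class. */
    static final class Token {
        Type type;
        final StringBuilder content = new StringBuilder();

        /** The caller clears the token before handing it back to the lexer. */
        void reset() {
            type = null;
            content.setLength(0);
        }
    }

    /** Hypothetical, simplified lexer; not the real implementation. */
    static final class MiniLexer {
        private final Reader in;

        MiniLexer(Reader in) {
            this.in = in;
        }

        Token nextToken(Token token) throws IOException {
            int ch;
            while ((ch = in.read()) != -1) {
                if (ch == ',') { token.type = Type.TOKEN; return token; }
                if (ch == '\n') { token.type = Type.EORECORD; return token; }
                token.content.append((char) ch);
            }
            token.type = Type.EOF;
            return token;
        }
    }

    /** Lexes the input with a single reused Token and records "TYPE:content" per token. */
    static List<String> lex(String input) {
        List<String> out = new ArrayList<>();
        MiniLexer lexer = new MiniLexer(new StringReader(input));
        Token token = new Token();
        try {
            do {
                token.reset(); // initialization is the caller's job
                lexer.nextToken(token);
                out.add(token.type + ":" + token.content);
            } while (token.type != Type.EOF);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("a,b\nc"));
    }
}
```

        Reusing one Token avoids allocating a new object per field, which is why the reset step falls to the caller in this sketch.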
      • parseSimpleToken

        private Token parseSimpleToken​(Token token,
                                       int ch)
                                throws java.io.IOException
        Parses a simple token.

        Simple tokens are tokens that are not surrounded by encapsulators. A simple token might contain escaped delimiters (such as \, or \;). The token is finished when one of the following conditions becomes true:

        • end of line has been reached (EORECORD)
        • end of stream has been reached (EOF)
        • an unescaped delimiter has been reached (TOKEN)
        Parameters:
        token - the current token
        ch - the current character
        Returns:
        the filled token
        Throws:
        java.io.IOException - on stream access error
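
        The three termination conditions listed above can be sketched as a loop over a String. This is a minimal illustration under stated assumptions, not the library's code: SimpleTokenSketch, Result and parseSimple are hypothetical names, only LF is treated as end of line, and an escape simply makes the following character literal.

```java
final class SimpleTokenSketch {
    enum Type { TOKEN, EORECORD, EOF }

    /** Hypothetical result holder: token type, content, and index to resume from. */
    static final class Result {
        final Type type;
        final String content;
        final int nextIndex;

        Result(Type type, String content, int nextIndex) {
            this.type = type;
            this.content = content;
            this.nextIndex = nextIndex;
        }
    }

    static Result parseSimple(String input, int start, char delimiter, char escape) {
        StringBuilder content = new StringBuilder();
        int i = start;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (ch == '\n') {
                // end of line reached
                return new Result(Type.EORECORD, content.toString(), i + 1);
            }
            if (ch == delimiter) {
                // unescaped delimiter reached
                return new Result(Type.TOKEN, content.toString(), i + 1);
            }
            if (ch == escape && i + 1 < input.length()) {
                // escaped character: take the next char literally (simplification)
                content.append(input.charAt(i + 1));
                i += 2;
                continue;
            }
            // ordinary character (a trailing escape at end of input is kept as-is here)
            content.append(ch);
            i++;
        }
        // end of stream reached
        return new Result(Type.EOF, content.toString(), i);
    }
}
```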
      • parseEncapsulatedToken

        private Token parseEncapsulatedToken​(Token token)
                                      throws java.io.IOException
        Parses an encapsulated token.

        Encapsulated tokens are surrounded by the given encapsulating string. The encapsulator itself might be included in the token using a doubling syntax (as in "", '') or using escaping (as in \", \'). Whitespace before and after an encapsulated token is ignored. The token is finished when one of the following conditions becomes true:

        • an unescaped encapsulator has been reached and is followed by optional whitespace, then:
          • delimiter (TOKEN)
          • end of line (EORECORD)
        • end of stream has been reached (EOF)
        Parameters:
        token - the current token
        Returns:
        a valid token object
        Throws:
        java.io.IOException - on invalid state: EOF before closing encapsulator or invalid character before delimiter or EOL
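
        The rules above (doubled encapsulators, optional whitespace after the closing encapsulator, and the two error states) can be sketched over a String as follows. EncapsulatedTokenSketch, Result and parseEncapsulated are hypothetical names; escape handling is left out, only LF counts as end of line, and the exception messages are illustrative.

```java
import java.io.IOException;

final class EncapsulatedTokenSketch {
    enum Type { TOKEN, EORECORD, EOF }

    /** Hypothetical result holder. */
    static final class Result {
        final Type type;
        final String content;

        Result(Type type, String content) {
            this.type = type;
            this.content = content;
        }
    }

    /** @param start index of the first character after the opening encapsulator */
    static Result parseEncapsulated(String input, int start, char quote, char delimiter)
            throws IOException {
        StringBuilder content = new StringBuilder();
        int i = start;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (ch != quote) {
                content.append(ch);
                i++;
                continue;
            }
            if (i + 1 < input.length() && input.charAt(i + 1) == quote) {
                // doubled encapsulator: one literal encapsulator in the token
                content.append(quote);
                i += 2;
                continue;
            }
            // closing encapsulator: skip optional whitespace after it
            i++;
            while (i < input.length() && (input.charAt(i) == ' ' || input.charAt(i) == '\t')) {
                i++;
            }
            if (i == input.length()) {
                return new Result(Type.EOF, content.toString());
            }
            char next = input.charAt(i);
            if (next == delimiter) {
                return new Result(Type.TOKEN, content.toString());
            }
            if (next == '\n') {
                return new Result(Type.EORECORD, content.toString());
            }
            throw new IOException("invalid character before delimiter or EOL");
        }
        throw new IOException("EOF before closing encapsulator");
    }
}
```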
      • mapNullToDisabled

        private char mapNullToDisabled​(java.lang.Character c)
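
        This method carries no description, but its name and signature suggest that a null Character (a feature left unconfigured) is mapped to the DISABLED sentinel. A minimal sketch under that assumption, with a hypothetical class name:

```java
final class NullToDisabledSketch {
    // Assumption: same sentinel as the DISABLED field, i.e. -2 narrowed to char.
    static final char DISABLED = (char) -2;

    /** Maps an unset (null) character option to the sentinel; otherwise unboxes it. */
    static char mapNullToDisabled(Character c) {
        return c == null ? DISABLED : c.charValue();
    }
}
```

        With this mapping, the is...() checks can compare an int against a primitive char without a null check on every character read.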
      • getCurrentLineNumber

        long getCurrentLineNumber()
        Returns the current line number.
        Returns:
        the current line number
      • getCharacterPosition

        long getCharacterPosition()
        Returns the current character position.
        Returns:
        the current character position
      • readEscape

        int readEscape()
                throws java.io.IOException
        Handles an escape sequence. The current character must be the escape character. On return, the next character is available by calling ExtendedBufferedReader.getLastChar() on the input stream.
        Returns:
        the unescaped character (as an int), or Constants.END_OF_STREAM if the character following the escape is invalid.
        Throws:
        java.io.IOException - if there is a problem reading the stream or the end of stream is detected: the escape character is not allowed at end of stream
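
        The contract above (return the unescaped character, return END_OF_STREAM for an invalid escape, throw at end of stream) might look roughly like the sketch below. The letter-to-control-character mapping and the set of meta characters accepted in the default branch are assumptions for illustration; ReadEscapeSketch and its local END_OF_STREAM constant are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;

final class ReadEscapeSketch {
    // Assumption: same value that Reader.read() returns at end of stream.
    static final int END_OF_STREAM = -1;

    /** Reads the character following an already-consumed escape character. */
    static int readEscape(Reader in, char delimiter, char escape, char quote)
            throws IOException {
        int ch = in.read();
        switch (ch) {
            case 'r': return '\r';
            case 'n': return '\n';
            case 't': return '\t';
            case 'b': return '\b';
            case 'f': return '\f';
            case '\r':
            case '\n':
            case '\t':
            case '\b':
            case '\f':
                // an escaped control character stands for itself
                return ch;
            case END_OF_STREAM:
                throw new IOException("end of stream while processing escape sequence");
            default:
                // assumed meta characters: delimiter, escape, and quote
                if (ch == delimiter || ch == escape || ch == quote) {
                    return ch;
                }
                // anything else is reported as an invalid escape
                return END_OF_STREAM;
        }
    }
}
```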
      • trimTrailingSpaces

        void trimTrailingSpaces​(java.lang.StringBuilder buffer)
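
        No description is given for this method; judging by its name and signature, it removes trailing whitespace from the buffer in place. A minimal sketch under that assumption, using Character.isWhitespace as the (assumed) whitespace test and a hypothetical class name:

```java
final class TrimTrailingSketch {
    /** Shrinks the buffer so that it no longer ends with whitespace. */
    static void trimTrailingSpaces(StringBuilder buffer) {
        int length = buffer.length();
        while (length > 0 && Character.isWhitespace(buffer.charAt(length - 1))) {
            length--;
        }
        if (length != buffer.length()) {
            // truncate in place instead of allocating a new string
            buffer.setLength(length);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("value  \t");
        trimTrailingSpaces(sb);
        System.out.println("[" + sb + "]");
    }
}
```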
      • readEndOfLine

        boolean readEndOfLine​(int ch)
                       throws java.io.IOException
        Greedily accepts \n, \r and \r\n. This checker silently consumes the second control character...
        Returns:
        true if the given or next character is a line-terminator
        Throws:
        java.io.IOException
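
        The greedy behavior described above can be approximated with a java.io.PushbackReader: when the given character is a CR, the sketch peeks at the next character and consumes it only if it is an LF. EndOfLineSketch is a hypothetical name, and the use of PushbackReader in place of the library's own reader is an assumption made to keep the example self-contained.

```java
import java.io.IOException;
import java.io.PushbackReader;

final class EndOfLineSketch {
    /**
     * Returns true if ch is LF or CR. For CR, a directly following LF is
     * consumed so that CR LF counts as a single line terminator.
     */
    static boolean readEndOfLine(PushbackReader in, int ch) throws IOException {
        if (ch == '\r') {
            int next = in.read();
            if (next != '\n' && next != -1) {
                // not part of the terminator: put it back for the next read
                in.unread(next);
            }
            return true;
        }
        return ch == '\n';
    }
}
```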
      • isClosed

        boolean isClosed()
      • isWhitespace

        boolean isWhitespace​(int ch)
        Returns:
        true if the given char is a whitespace character
      • isStartOfLine

        boolean isStartOfLine​(int ch)
        Checks if the current character represents the start of a line: a CR, an LF, or the start of the file.
        Parameters:
        ch - the character to check
        Returns:
        true if the character is at the start of a line.
      • isEndOfFile

        boolean isEndOfFile​(int ch)
        Returns:
        true if the given character indicates end of file
      • isDelimiter

        boolean isDelimiter​(int ch)
      • isEscape

        boolean isEscape​(int ch)
      • isQuoteChar

        boolean isQuoteChar​(int ch)
      • isCommentStart

        boolean isCommentStart​(int ch)
      • isMetaChar

        private boolean isMetaChar​(int ch)
      • close

        public void close()
                   throws java.io.IOException
        Closes resources.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException - If an I/O error occurs