Package org.apache.commons.csv
Class Lexer
- java.lang.Object
-
- org.apache.commons.csv.Lexer
-
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
final class Lexer extends java.lang.Object implements java.io.Closeable
Lexical analyzer.
- Version:
- $Id: Lexer.java 1742468 2016-05-05 20:02:35Z britter $
-
-
Field Summary
Fields
Modifier and Type               Field                    Description
private char                    commentStart
private char                    delimiter
private static char             DISABLED                 Constant char to use for disabling comments, escapes and encapsulation.
private char                    escape
private boolean                 ignoreEmptyLines
private boolean                 ignoreSurroundingSpaces
private char                    quoteChar
private ExtendedBufferedReader  reader                   The input stream
-
Constructor Summary
Constructors Constructor Description Lexer(CSVFormat format, ExtendedBufferedReader reader)
-
Method Summary
Modifier and Type          Method                                              Description
void                       close()                                             Closes resources.
(package private) long     getCharacterPosition()                              Returns the current character position
(package private) long     getCurrentLineNumber()                              Returns the current line number
(package private) boolean  isClosed()
(package private) boolean  isCommentStart(int ch)
(package private) boolean  isDelimiter(int ch)
(package private) boolean  isEndOfFile(int ch)
(package private) boolean  isEscape(int ch)
private boolean            isMetaChar(int ch)
(package private) boolean  isQuoteChar(int ch)
(package private) boolean  isStartOfLine(int ch)                               Checks if the current character represents the start of a line: a CR, LF or is at the start of the file.
(package private) boolean  isWhitespace(int ch)
private char               mapNullToDisabled(java.lang.Character c)
(package private) Token    nextToken(Token token)                              Returns the next token.
private Token              parseEncapsulatedToken(Token token)                 Parses an encapsulated token.
private Token              parseSimpleToken(Token token, int ch)               Parses a simple token.
(package private) boolean  readEndOfLine(int ch)                               Greedily accepts \n, \r and \r\n. This checker consumes silently the second control-character...
(package private) int      readEscape()                                        Handle an escape sequence.
(package private) void     trimTrailingSpaces(java.lang.StringBuilder buffer)
-
-
-
Field Detail
-
DISABLED
private static final char DISABLED
Constant char to use for disabling comments, escapes and encapsulation. The value -2 is used because it won't be confused with an EOF signal (-1), and because the Unicode value FFFE would be encoded as two chars (using surrogates) and thus there should never be a collision with a real text char.
- See Also:
- Constant Field Values
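The sentinel idea can be illustrated outside the library. The sketch below is not the Lexer source: the class name, the `matches` helper, and the body of `mapNullToDisabled` are assumptions made for illustration, and the only fact taken from the description is that the constant is the char obtained from -2.

```java
final class DisabledSentinelSketch {
    /** Stand-in for the private constant: (char) -2 is the char with code 0xFFFE. */
    static final char DISABLED = (char) -2;

    /** Value that java.io.Reader.read() returns at end of input. */
    static final int END_OF_STREAM = -1;

    /** Assumed behaviour suggested by the method name: null means "feature disabled". */
    static char mapNullToDisabled(Character c) {
        return c == null ? DISABLED : c.charValue();
    }

    /** Hypothetical helper: true if the int read from the stream equals the configured char. */
    static boolean matches(char configured, int ch) {
        return ch == configured;
    }

    public static void main(String[] args) {
        char quote = mapNullToDisabled(null);                 // quoting disabled
        System.out.println((int) DISABLED);                   // numeric value of the sentinel
        System.out.println(matches(quote, END_OF_STREAM));    // EOF compared with the sentinel
        System.out.println(matches(mapNullToDisabled('"'), '"'));
    }
}
```

Because the comparison happens on `int` values, the sentinel (65534 after widening) stays distinct from the end-of-stream value -1.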
-
delimiter
private final char delimiter
-
escape
private final char escape
-
quoteChar
private final char quoteChar
-
commentStart
private final char commentStart
-
ignoreSurroundingSpaces
private final boolean ignoreSurroundingSpaces
-
ignoreEmptyLines
private final boolean ignoreEmptyLines
-
reader
private final ExtendedBufferedReader reader
The input stream
-
-
Constructor Detail
-
Lexer
Lexer(CSVFormat format, ExtendedBufferedReader reader)
-
-
Method Detail
-
nextToken
Token nextToken(Token token) throws java.io.IOException
Returns the next token.
A token corresponds to a term, a record change or an end-of-file indicator.
- Parameters:
token - an existing Token object to reuse. The caller is responsible for initializing the Token.
- Returns:
- the next token found
- Throws:
java.io.IOException - on stream access error
-
parseSimpleToken
private Token parseSimpleToken(Token token, int ch) throws java.io.IOException
Parses a simple token. Simple tokens are tokens which are not surrounded by encapsulators. A simple token might contain escaped delimiters (as \, or \;). The token is finished when one of the following conditions becomes true:
- end of line has been reached (EORECORD)
- end of stream has been reached (EOF)
- an unescaped delimiter has been reached (TOKEN)
- Parameters:
token - the current token
ch - the current character
- Returns:
- the filled token
- Throws:
java.io.IOException - on stream access error
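The three termination conditions above can be sketched as a small standalone reader loop. This is an illustration under assumptions, not the Lexer implementation: `SimpleTokenSketch`, its `Token` and `Type` types, and the use of `java.io.PushbackReader` are all hypothetical, and an escaped character is simply taken literally.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

final class SimpleTokenSketch {
    enum Type { TOKEN, EORECORD, EOF }

    static final class Token {
        Type type;
        final StringBuilder content = new StringBuilder();
    }

    /** Reads one unquoted token; delimiter and escape are supplied by the caller. */
    static Token parseSimpleToken(PushbackReader in, char delimiter, char escape)
            throws IOException {
        Token token = new Token();
        while (true) {
            int ch = in.read();
            if (ch == -1) {                          // end of stream
                token.type = Type.EOF;
                return token;
            }
            if (ch == '\n' || ch == '\r') {          // end of line, \r\n counts once
                if (ch == '\r') {
                    int next = in.read();
                    if (next != '\n' && next != -1) {
                        in.unread(next);
                    }
                }
                token.type = Type.EORECORD;
                return token;
            }
            if (ch == delimiter) {                   // unescaped delimiter
                token.type = Type.TOKEN;
                return token;
            }
            if (ch == escape) {                      // escaped character is kept literally
                int escaped = in.read();
                if (escaped == -1) {
                    throw new IOException("escape character at end of stream");
                }
                token.content.append((char) escaped);
            } else {
                token.content.append((char) ch);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("a\\,b,c\nd"));
        Token t;
        do {
            t = parseSimpleToken(in, ',', '\\');
            System.out.println(t.type + " [" + t.content + "]");
        } while (t.type != Type.EOF);
    }
}
```

The token type tells the caller why reading stopped, so a record-level parser can decide whether to continue the current record, start a new one, or finish.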
-
parseEncapsulatedToken
private Token parseEncapsulatedToken(Token token) throws java.io.IOException
Parses an encapsulated token. Encapsulated tokens are surrounded by the given encapsulating-string. The encapsulator itself might be included in the token using a doubling syntax (as "", '') or using escaping (as in \", \'). Whitespace before and after an encapsulated token is ignored. The token is finished when one of the following conditions becomes true:
- an unescaped encapsulator has been reached, and is followed by optional whitespace then:
  - delimiter (TOKEN)
  - end of line (EORECORD)
- end of stream has been reached (EOF)
- Parameters:
token - the current token
- Returns:
- a valid token object
- Throws:
java.io.IOException - on invalid state: EOF before closing encapsulator or invalid character before delimiter or EOL
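A minimal sketch of the rules described above, assuming the opening encapsulator has already been consumed. The class and type names are hypothetical, only the doubling syntax is handled (escaping with `\"` is left out), and the exception messages are invented for the example.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

final class EncapsulatedTokenSketch {
    enum Type { TOKEN, EORECORD, EOF }

    static final class Token {
        Type type;
        final StringBuilder content = new StringBuilder();
    }

    /** Call after the opening quote has been read. */
    static Token parseEncapsulatedToken(PushbackReader in, char delimiter, char quote)
            throws IOException {
        Token token = new Token();
        // Phase 1: collect content up to the closing quote.
        while (true) {
            int ch = in.read();
            if (ch == -1) {
                throw new IOException("EOF reached before the closing encapsulator");
            }
            if (ch != quote) {
                token.content.append((char) ch);
                continue;
            }
            int next = in.read();
            if (next == quote) {                     // doubled quote: one literal quote
                token.content.append(quote);
                continue;
            }
            if (next != -1) {
                in.unread(next);                     // examined again in phase 2
            }
            break;
        }
        // Phase 2: optional whitespace, then delimiter, end of line or end of stream.
        while (true) {
            int ch = in.read();
            if (ch == delimiter) { token.type = Type.TOKEN; return token; }
            if (ch == -1) { token.type = Type.EOF; return token; }
            if (ch == '\n' || ch == '\r') {
                if (ch == '\r') {
                    int next = in.read();
                    if (next != '\n' && next != -1) {
                        in.unread(next);
                    }
                }
                token.type = Type.EORECORD;
                return token;
            }
            if (!Character.isWhitespace(ch)) {
                throw new IOException("invalid character after encapsulated token");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Input after the opening quote: say ""hi"""  ,x
        PushbackReader in = new PushbackReader(new StringReader("say \"\"hi\"\"\"  ,x"));
        Token t = parseEncapsulatedToken(in, ',', '"');
        System.out.println(t.type + " [" + t.content + "]");
    }
}
```

Splitting the work into two phases keeps the error cases separate: a missing closing encapsulator fails in the first loop, a stray character after it fails in the second.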
-
mapNullToDisabled
private char mapNullToDisabled(java.lang.Character c)
-
getCurrentLineNumber
long getCurrentLineNumber()
Returns the current line number
- Returns:
- the current line number
-
getCharacterPosition
long getCharacterPosition()
Returns the current character position
- Returns:
- the current character position
-
readEscape
int readEscape() throws java.io.IOException
Handle an escape sequence. The current character must be the escape character. On return, the next character is available by calling ExtendedBufferedReader.getLastChar() on the input stream.
- Returns:
- the unescaped character (as an int) or Constants.END_OF_STREAM if char following the escape is invalid.
- Throws:
java.io.IOException - if there is a problem reading the stream or the end of stream is detected: the escape character is not allowed at end of stream
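The return contract above (an unescaped character, an end-of-stream marker for an invalid sequence, an exception at end of input) can be sketched as follows. The letter mappings for `r`, `n`, `t`, `b` and `f`, the `metaChars` parameter, and the class name are assumptions chosen for illustration; the page itself does not list which sequences are valid.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class EscapeSketch {
    /** Value that java.io.Reader.read() returns at end of input. */
    static final int END_OF_STREAM = -1;

    /**
     * Reads the character that follows an escape character.
     * metaChars is a hypothetical list of characters that may be escaped literally.
     */
    static int readEscape(Reader in, String metaChars) throws IOException {
        int ch = in.read();
        switch (ch) {
            case 'r': return '\r';
            case 'n': return '\n';
            case 't': return '\t';
            case 'b': return '\b';
            case 'f': return '\f';
            case END_OF_STREAM:
                throw new IOException("EOF whilst processing escape sequence");
            default:
                // Assumed rule: only listed meta characters are valid after an escape.
                return metaChars.indexOf(ch) >= 0 ? ch : END_OF_STREAM;
        }
    }

    public static void main(String[] args) throws IOException {
        String meta = ",\"\\";
        System.out.println(readEscape(new StringReader("n"), meta) == '\n');
        System.out.println(readEscape(new StringReader(","), meta) == ',');
        System.out.println(readEscape(new StringReader("x"), meta) == END_OF_STREAM);
    }
}
```

Returning an `int` rather than a `char` is what allows the method to signal an invalid sequence with a value that no real character can take.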
-
trimTrailingSpaces
void trimTrailingSpaces(java.lang.StringBuilder buffer)
-
readEndOfLine
boolean readEndOfLine(int ch) throws java.io.IOException
Greedily accepts \n, \r and \r\n. This checker silently consumes the second control character...
- Returns:
- true if the given or next character is a line-terminator
- Throws:
java.io.IOException
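The greedy handling of \r\n can be sketched with a one-character lookahead. This is an illustration only: the Lexer reads from an ExtendedBufferedReader, whereas the sketch below assumes a `java.io.PushbackReader` and a hypothetical class name.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

final class EndOfLineSketch {
    /**
     * Returns true if ch is a line terminator. For a CR, a directly following LF
     * is consumed as well, so that \r\n is treated as a single terminator.
     */
    static boolean readEndOfLine(int ch, PushbackReader in) throws IOException {
        if (ch == '\r') {
            int next = in.read();
            if (next != '\n' && next != -1) {
                in.unread(next);                     // not an LF: leave it for the caller
            }
        }
        return ch == '\n' || ch == '\r';
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("\r\nX"));
        System.out.println(readEndOfLine(in.read(), in));   // CR followed by LF
        System.out.println((char) in.read());               // first character of the next line
    }
}
```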
-
isClosed
boolean isClosed()
-
isWhitespace
boolean isWhitespace(int ch)
- Returns:
- true if the given char is a whitespace character
-
isStartOfLine
boolean isStartOfLine(int ch)
Checks if the current character represents the start of a line: a CR, LF or is at the start of the file.
- Parameters:
ch - the character to check
- Returns:
- true if the character is at the start of a line.
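One way to read this check is as a test on the previously read character. The sketch below rests on that reading; the `NOTHING_READ` marker for "start of the file" and its value are hypothetical, since the page does not say how that state is represented.

```java
final class StartOfLineSketch {
    /** Hypothetical marker meaning that no character has been read yet. */
    static final int NOTHING_READ = -2;

    /** True if the previously read character puts the reader at the start of a line. */
    static boolean isStartOfLine(int previous) {
        return previous == '\n' || previous == '\r' || previous == NOTHING_READ;
    }

    public static void main(String[] args) {
        System.out.println(isStartOfLine(NOTHING_READ));   // start of the file
        System.out.println(isStartOfLine('\n'));           // just after a line break
        System.out.println(isStartOfLine('a'));            // in the middle of a line
    }
}
```

A check of this kind is useful for features that only apply at the beginning of a line, such as recognising a comment marker.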
-
isEndOfFile
boolean isEndOfFile(int ch)
- Returns:
- true if the given character indicates end of file
-
isDelimiter
boolean isDelimiter(int ch)
-
isEscape
boolean isEscape(int ch)
-
isQuoteChar
boolean isQuoteChar(int ch)
-
isCommentStart
boolean isCommentStart(int ch)
-
isMetaChar
private boolean isMetaChar(int ch)
-
close
public void close() throws java.io.IOException
Closes resources.
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException - If an I/O error occurs
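Because the class implements java.io.Closeable, it can be managed by a try-with-resources statement. Lexer itself is package-private, so the sketch below uses a hypothetical stand-in class that wraps a reader; the `closed` flag is an assumption and may not match how isClosed() is implemented.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ClosingSketch implements Closeable {
    private final Reader reader;
    private boolean closed;

    ClosingSketch(Reader reader) {
        this.reader = reader;
    }

    boolean isClosed() {
        return closed;
    }

    /** Closes the wrapped reader and records that this object is closed. */
    @Override
    public void close() throws IOException {
        closed = true;
        reader.close();
    }

    public static void main(String[] args) throws IOException {
        ClosingSketch sketch = new ClosingSketch(new StringReader("a,b"));
        try (ClosingSketch s = sketch) {
            System.out.println(s.isClosed());     // still open inside the block
        }
        System.out.println(sketch.isClosed());    // closed automatically on exit
    }
}
```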
-
-