Class CSVParser

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, java.lang.Iterable<CSVRecord>

    public final class CSVParser
    extends java.lang.Object
    implements java.lang.Iterable<CSVRecord>, java.io.Closeable
    Parses CSV files according to the specified format. Because CSV appears in many different dialects, the parser supports many formats by allowing the specification of a CSVFormat. The parser works record-wise: once a record has been parsed from the input stream, it is not possible to go back.

    Creating instances

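    There are several static factory methods that can be used to create instances for various types of resources:

      • parse(java.io.File, java.nio.charset.Charset, CSVFormat)
      • parse(String, CSVFormat)
      • parse(java.net.URL, java.nio.charset.Charset, CSVFormat)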

    Alternatively, parsers can be created by passing a Reader directly to one of the constructors. For those who like fluent APIs, parsers can be created using CSVFormat.parse(java.io.Reader) as a shortcut:

     for(CSVRecord record : CSVFormat.EXCEL.parse(in)) {
         ...
     }
     

    Parsing record-wise

    To parse a CSV input from a file, you write:

     File csvData = new File("/path/to/csv");
     CSVParser parser = CSVParser.parse(csvData, Charset.defaultCharset(), CSVFormat.RFC4180);
     for (CSVRecord csvRecord : parser) {
         ...
     }
     

    This will read and parse the contents of the file using the RFC 4180 format.

    To parse CSV input in a format like Excel, you write:

     CSVParser parser = CSVParser.parse(csvData, Charset.defaultCharset(), CSVFormat.EXCEL);
     for (CSVRecord csvRecord : parser) {
         ...
     }
     

    If the predefined formats don't match the format at hand, custom formats can be defined. More information about customising CSVFormats is available in the CSVFormat JavaDoc.

    Parsing into memory

    If parsing record-wise is not desired, the contents of the input can be read completely into memory.

     Reader in = new StringReader("a;b\nc;d");
     CSVParser parser = new CSVParser(in, CSVFormat.EXCEL);
     List<CSVRecord> list = parser.getRecords();
     

    There are two constraints that have to be kept in mind:

    1. Parsing into memory starts at the current position of the parser. If you have already parsed records from the input, those records will not end up in the in-memory representation of your CSV data.
    2. Parsing into memory may consume a lot of system resources depending on the input. For example, if you're parsing a 150 MB file of CSV data, the contents will be read completely into memory.
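
    The first constraint can be illustrated with plain java.io, independent of CSVParser. The sketch below is only an analogy: ReadRemainderSketch and remainingAfter are hypothetical names, and lines stand in for records. Whatever was consumed before the bulk read does not appear in the resulting list.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical analogy, not part of Commons CSV: lines stand in for records.
final class ReadRemainderSketch {

    /** Consumes alreadyRead lines one by one, then loads the remaining lines into a list. */
    static List<String> remainingAfter(String input, int alreadyRead) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            for (int i = 0; i < alreadyRead; i++) {
                reader.readLine(); // consumed here, so it is missing from the list below
            }
            List<String> rest = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                rest.add(line); // everything from the current position onward
            }
            return rest;
        }
    }
}
```

    With the input "a;b\nc;d\ne;f" and one line already consumed, the list holds only "c;d" and "e;f".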

    Notes

    Internal parser state is completely covered by the format and the reader-state.

    Version:
    $Id: CSVParser.java 1743529 2016-05-12 17:02:05Z ggregory $
    See Also:
    package documentation for more details
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private long characterOffset
      Lexer offset when the parser does not start parsing at the beginning of the source.
      private CSVFormat format  
      private java.util.Map<java.lang.String,​java.lang.Integer> headerMap
      A mapping of column names to column indices
      private Lexer lexer  
      private java.util.List<java.lang.String> record
      A record buffer for nextRecord().
      private long recordNumber
      The next record number to assign.
      private Token reusableToken  
    • Constructor Summary

      Constructors 
      Constructor Description
      CSVParser​(java.io.Reader reader, CSVFormat format)
      Customized CSV parser using the given CSVFormat
      CSVParser​(java.io.Reader reader, CSVFormat format, long characterOffset, long recordNumber)
      Customized CSV parser using the given CSVFormat
    • Method Summary

      Modifier and Type Method Description
      private void addRecordValue​(boolean lastRecord)  
      void close()
      Closes resources.
      long getCurrentLineNumber()
      Returns the current line number in the input stream.
      java.util.Map<java.lang.String,​java.lang.Integer> getHeaderMap()
      Returns a copy of the header map that iterates in column order.
      long getRecordNumber()
      Returns the current record number in the input stream.
      java.util.List<CSVRecord> getRecords()
      Parses the CSV input according to the given format and returns the content as a list of CSVRecords.
      private java.util.Map<java.lang.String,​java.lang.Integer> initializeHeader()
      Initializes the name to index mapping if the format defines a header.
      boolean isClosed()
      Gets whether this parser is closed.
      java.util.Iterator<CSVRecord> iterator()
      Returns an iterator on the records.
      (package private) CSVRecord nextRecord()
      Parses the next record from the current point in the stream.
      static CSVParser parse​(java.io.File file, java.nio.charset.Charset charset, CSVFormat format)
      Creates a parser for the given File.
      static CSVParser parse​(java.lang.String string, CSVFormat format)
      Creates a parser for the given String.
      static CSVParser parse​(java.net.URL url, java.nio.charset.Charset charset, CSVFormat format)
      Creates a parser for the given URL.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.lang.Iterable

        forEach, spliterator
    • Field Detail

      • headerMap

        private final java.util.Map<java.lang.String,​java.lang.Integer> headerMap
        A mapping of column names to column indices
      • lexer

        private final Lexer lexer
      • record

        private final java.util.List<java.lang.String> record
        A record buffer for nextRecord(). Grows as necessary and is reused.
      • recordNumber

        private long recordNumber
        The next record number to assign.
      • characterOffset

        private final long characterOffset
        Lexer offset when the parser does not start parsing at the beginning of the source. Usually used in combination with recordNumber.
      • reusableToken

        private final Token reusableToken
    • Constructor Detail

      • CSVParser

        public CSVParser​(java.io.Reader reader,
                         CSVFormat format)
                  throws java.io.IOException
        Customized CSV parser using the given CSVFormat

        If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.

        Parameters:
        reader - a Reader containing CSV-formatted input. Must not be null.
        format - the CSVFormat used for CSV parsing. Must not be null.
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either reader or format is null.
        java.io.IOException - If there is a problem reading the header or skipping the first record
      • CSVParser

        public CSVParser​(java.io.Reader reader,
                         CSVFormat format,
                         long characterOffset,
                         long recordNumber)
                  throws java.io.IOException
        Customized CSV parser using the given CSVFormat

        If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.

        Parameters:
        reader - a Reader containing CSV-formatted input. Must not be null.
        format - the CSVFormat used for CSV parsing. Must not be null.
        characterOffset - Lexer offset when the parser does not start parsing at the beginning of the source.
        recordNumber - The next record number to assign
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either reader or format is null.
        java.io.IOException - If there is a problem reading the header or skipping the first record
        Since:
        1.1
    • Method Detail

      • parse

        public static CSVParser parse​(java.io.File file,
                                      java.nio.charset.Charset charset,
                                      CSVFormat format)
                               throws java.io.IOException
        Creates a parser for the given File.

        Note: This method internally creates a FileReader using FileReader(java.io.File), which in turn relies on the default encoding of the JVM that is executing the code. If this is insufficient, create a URL to the file and use parse(URL, Charset, CSVFormat).

        Parameters:
        file - a CSV file. Must not be null.
        charset - the charset for the file.
        format - the CSVFormat used for CSV parsing. Must not be null.
        Returns:
        a new parser
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either file or format is null.
        java.io.IOException - If an I/O error occurs
      • parse

        public static CSVParser parse​(java.lang.String string,
                                      CSVFormat format)
                               throws java.io.IOException
        Creates a parser for the given String.
        Parameters:
        string - a CSV string. Must not be null.
        format - the CSVFormat used for CSV parsing. Must not be null.
        Returns:
        a new parser
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either string or format is null.
        java.io.IOException - If an I/O error occurs
      • parse

        public static CSVParser parse​(java.net.URL url,
                                      java.nio.charset.Charset charset,
                                      CSVFormat format)
                               throws java.io.IOException
        Creates a parser for the given URL.

        If you do not read all records from the given URL, you should call close() on the parser.

        Parameters:
        url - a URL. Must not be null.
        charset - the charset for the resource. Must not be null.
        format - the CSVFormat used for CSV parsing. Must not be null.
        Returns:
        a new parser
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if any of url, charset or format is null.
        java.io.IOException - If an I/O error occurs
      • addRecordValue

        private void addRecordValue​(boolean lastRecord)
      • close

        public void close()
                   throws java.io.IOException
        Closes resources.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException - If an I/O error occurs
      • getCurrentLineNumber

        public long getCurrentLineNumber()
        Returns the current line number in the input stream.

        ATTENTION: If your CSV input has multi-line values, the returned number does not correspond to the record number.

        Returns:
        current line number
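
        Why the line number and the record number drift apart can be shown with a small counting sketch. The class below is hypothetical and not part of Commons CSV. It assumes '\n' line breaks and double-quote quoting, treats a blank line as an empty record, and only counts physical lines and logical records in a string.

```java
// Hypothetical helper, not the library's implementation.
final class LineVersusRecordCount {

    /** Returns {recordCount, lineCount} for CSV text using '\n' breaks and '"' quotes. */
    static long[] count(String csv) {
        long records = 0;
        long lines = 0;
        boolean inQuotes = false;
        boolean pending = false; // true once content has been seen since the last record ended
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '\n') {
                lines++; // every line break ends a physical line
                if (inQuotes) {
                    pending = true; // the break belongs to a quoted value
                } else {
                    records++; // an unquoted break ends the record
                    pending = false;
                }
            } else {
                if (c == '"') {
                    inQuotes = !inQuotes; // an escaped "" toggles twice, so the state is preserved
                }
                pending = true;
            }
        }
        if (pending) { // final record without a trailing line break
            records++;
            lines++;
        }
        return new long[] {records, lines};
    }
}
```

        For the input a,"x\ny"\nb,c the sketch reports 2 records spread over 3 lines, because the first record contains a quoted value with an embedded line break.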
      • getHeaderMap

        public java.util.Map<java.lang.String,​java.lang.Integer> getHeaderMap()
        Returns a copy of the header map that iterates in column order.

        The map keys are column names. The map values are 0-based indices.

        Returns:
        a copy of the header map that iterates in column order.
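
        The shape of such a map can be sketched with java.util alone. HeaderMapSketch, index and copyOf are hypothetical names, and the use of LinkedHashMap is an assumption chosen because it preserves insertion order. The sketch is one way to obtain a name-to-index map that iterates in column order and whose copy can be modified without affecting the original.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the library's implementation.
final class HeaderMapSketch {

    /** Maps each column name to its 0-based index, preserving column order. */
    static Map<String, Integer> index(String... columnNames) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < columnNames.length; i++) {
            map.put(columnNames[i], i);
        }
        return map;
    }

    /** Returns an independent copy that iterates in the same order as the source map. */
    static Map<String, Integer> copyOf(Map<String, Integer> headerMap) {
        return new LinkedHashMap<>(headerMap);
    }
}
```

        Because the caller receives a copy, removing or adding entries on the returned map leaves the source map untouched.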
      • getRecordNumber

        public long getRecordNumber()
        Returns the current record number in the input stream.

        ATTENTION: If your CSV input has multi-line values, the returned number does not correspond to the line number.

        Returns:
        current record number
      • getRecords

        public java.util.List<CSVRecord> getRecords()
                                             throws java.io.IOException
        Parses the CSV input according to the given format and returns the content as a list of CSVRecords.

        The returned content starts at the current parse-position in the stream.

        Returns:
        list of CSVRecords, may be empty
        Throws:
        java.io.IOException - on parse error or input read-failure
      • initializeHeader

        private java.util.Map<java.lang.String,​java.lang.Integer> initializeHeader()
                                                                                  throws java.io.IOException
        Initializes the name to index mapping if the format defines a header.
        Returns:
        null if the format has no header.
        Throws:
        java.io.IOException - if there is a problem reading the header or skipping the first record
      • isClosed

        public boolean isClosed()
        Gets whether this parser is closed.
        Returns:
        whether this parser is closed.
      • iterator

        public java.util.Iterator<CSVRecord> iterator()
        Returns an iterator on the records.

        IOExceptions occurring during the iteration are wrapped in a RuntimeException. If the parser is closed, a call to next() will throw a NoSuchElementException.

        Specified by:
        iterator in interface java.lang.Iterable<CSVRecord>
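
        Since Iterator methods cannot declare checked exceptions, callers who care about I/O failures have to unwrap them. The sketch below shows the general pattern with a hypothetical stand-in iterator over lines (WrappingLineIterator is not part of Commons CSV, and the exact exception the library throws is not assumed beyond what is stated above). The iterator wraps an IOException in a RuntimeException, and the caller inspects getCause() to recover it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a record iterator; lines play the role of records.
final class WrappingLineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String next; // look-ahead buffer, null when nothing is buffered

    WrappingLineIterator(Reader reader) {
        this.reader = new BufferedReader(reader);
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            next = readLine();
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("No more lines");
        }
        String current = next;
        next = null;
        return current;
    }

    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            // Iterator methods cannot throw checked exceptions, so wrap it.
            throw new RuntimeException("IOException reading next line", e);
        }
    }

    /** Caller side: drains the iterator and rethrows a wrapped IOException as a checked one. */
    static int countLines(Reader reader) throws IOException {
        Iterator<String> it = new WrappingLineIterator(reader);
        int count = 0;
        try {
            while (it.hasNext()) {
                it.next();
                count++;
            }
        } catch (RuntimeException e) {
            if (e.getCause() instanceof IOException) {
                throw (IOException) e.getCause();
            }
            throw e;
        }
        return count;
    }
}
```

        The same try/catch shape can be placed around a for-each loop over any Iterable whose iterator wraps I/O failures this way.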
      • nextRecord

        CSVRecord nextRecord()
                      throws java.io.IOException
        Parses the next record from the current point in the stream.
        Returns:
        the next record, or null if the end of the stream has been reached
        Throws:
        java.io.IOException - on parse error or input read-failure