Skip to content

Latest commit

 

History

History
432 lines (358 loc) · 22 KB

README.md

File metadata and controls

432 lines (358 loc) · 22 KB

VBA-CSV interface

GitHub GitHub release (latest by date) DOI Github All Releases Follow Mentioned in Awesome VBA

Introductory words

The most powerful and comprehensive CSV/TSV/DSV data management library for VBA, providing parsing/writing capabilities compliant with RFC-4180 specifications and a complete set of tools for manipulating records and fields: dedupe, sort and filter records; rearrange, shift, merge and split fields. Is your data spread over two or more CSV files? Don't worry, here you will find Left, Right and Inner joins, and much more!

Advantages

  • RFC-4180 specs compliant.
  • Stable. Fully Test Driven Developed (TDD) library, (69/69 test passed), that includes 650+ line of code for testing. See VBA test library by Tim Hall.
  • Fast. Writes and reads files at the highest speed.
  • Memory-friendly. Files are processed using a custom stream technique, only 0.5MB are in memory at a time.
  • Robust. The library is not just a simple parser and writer, it is also a CSV data editor/manager.
  • UTF-8 encoding support. Do you have a CSV file, perhaps in chinese or some other foreign cyrillic language, downloaded from the Internet? This library is made to help you deal with it! You will be able to read and write UTF-8 encoded files in an easy way.
  • Easy to use. A few lines of code can do the work!
  • Automatic delimiter sniffer. Don't worry if you forgot the file configuration. The interface has a state-of-the-art methodology for sniffing delimiters. It is based on science!
  • Highly Configurable. User can configure the parser to work with a wide range of CSV files.
  • CSV data subsetting. Split CSV data into a set of files with related data.
  • Like SQL queries on CSV files. Use complex patterns to mimic SQL queries and filter data by criteria (=, <>, >=, <=, & (AND), |(OR)).
  • Flexible. Import only certain range of records from the given file, import fields (columns) by indexes or names, read records in sequential mode.
  • Dynamic Typing support. Turn CSV data field to a desired VBA data type.
  • Multi-level data sorting. Sort CSV imported data over multiple columns using the hyper-fast(100k records per second) Yaroslavskiy Dual-Pivot Quicksort like Java and also other methods like: IntroSort, HeapSort and Merge sort.
  • Microsoft Access compatible. The library has a version for those who feel in comfort working through DAO databases, download from here.

Getting started

If you don't know how to get started with VBA-CSV Interface class, visit the documentation repo for code hints, basic and more in-depth use of the library.

Visit the frequently asked questions section for the most common questions.

Using the Code

This section will attempt to analyze all the capabilities of the CSV interface

Import whole CSV file:

Sub CSVimport()
    Dim CSVint As CSVinterface

    Set CSVint = New CSVinterface
    With CSVint.parseConfig
        .path = "C:\Sample.csv"                ' Full path to the file, including its extension.
        .dialect.fieldsDelimiter = ","         ' Columns delimiter
        .dialect.recordsDelimiter = vbCrLf     ' Rows delimiter
    End With
    With CSVint
        .ImportFromCSV .parseConfig             ' Import the CSV to internal object
    End With
End Sub

Now suppose from the file "Sample.csv" the user only requires to import a specific range of records. It is possible to write a code like the one shown below:

Sub CSVimportRecordsRange()
    Dim CSVint As CSVinterface

    Set CSVint = New CSVinterface
    With CSVint.parseConfig
        .path = "C:\Sample.csv"                ' Full path to the file, including its extension.
        .dialect.fieldsDelimiter = ","         ' Columns delimiter
        .dialect.recordsDelimiter = vbCrLf     ' Rows delimiter
        .startingRecord = 10                   ' Start import on the tenth record
        .endingRecord = 20                     ' End of importation in the 20th record
    End With
    With CSVint
        .ImportFromCSV .parseConfig             ' Import the CSV to internal object
    End With
End Sub

If the user wants to sort the imported data, a code like the following can be written:

Sub CSVimportAndSort()
    Dim CSVint As CSVinterface

    Set CSVint = New CSVinterface
    With CSVint.parseConfig
        .path = "C:\Sample.csv"               ' Full path to the file, including its extension.
        .dialect.fieldsDelimiter = ","                ' Columns delimiter
        .dialect.recordsDelimiter = vbCrLf            ' Rows delimiter
    End With
    With CSVint
        .ImportFromCSV .parseConfig           ' Import the CSV to internal object
        .Sort SortingKeys:=-1                 ' Sort imported data on first column is descending order
    End With
End Sub

CSV data are mainly treated as text strings, what if the user wants to do some calculations on the data obtained from a given file? In this situation, the user can change the behavior of the parser to work in dynamic typing mode. Here's an example:

Sub CSVimportAndTypeData()
    Dim CSVint As CSVinterface

    Set CSVint = New CSVinterface
    With CSVint.parseConfig
        .path = "C:\Sample.csv"                       ' Full path to the file, including its extension.
        .dialect.fieldsDelimiter = ","                ' Columns delimiter
        .dialect.recordsDelimiter = vbCrLf            ' Rows delimiter
        .dynamicTyping = True                         ' Enable dynamic typing mode
        '@---------------------------------------------------------
        ' Configure dynamic typing
        .DefineTypingTemplate TypeConversion.ToDate, _
                                TypeConversion.ToLong, _
                                TypeConversion.ToDouble
        .DefineTypingTemplateLinks 6, _
                                    7, _
                                    10
        ' The dynamic typing mode will perform the following:
        '      * Over column 6 ---> String To Date data Type conversion
        '      * Over column 7 ---> String To Long data Type conversion
        '      * Over column 10 ---> String To Double data Type conversion
    End With
    With CSVint
        .ImportFromCSV .parseConfig             ' Import the CSV to internal object
    End With
End Sub

The quote character, used for escape fields, can be defined as one of them, according to an enumeration:

Sub SetQuoteChar()
    Dim CSVint As CSVinterface

    Set CSVint = New CSVinterface
    With CSVint.parseConfig.dialect
        .quoteToken = QuoteTokens.DoubleQuotes  ' 2 = ["] (Default)
        '.quoteToken = QuoteTokens.Apostrophe   ' 1 = [']
        '.quoteToken = QuoteTokens.Tilde        ' 3 = [~]
    End With
End Sub

Once the data is imported and saved to the internal object, the user can access it in the same way as a standard VBA array. An example would be:

Sub LoopData(ByRef CSVint As CSVinterface)
    With CSVint
        Dim iCounter As Long
        Dim cRecord() As Variant              ' Records are stored as a one-dimensional array.
        Dim cField As Variant
        
        For iCounter = 0 To CSVint.count - 1
            cRecord() = .item(iCounter)       ' Retrieves a record
            cField = .item(iCounter, 1)       ' Retrieves the 2nd field of the current record
        Next
    End With
End Sub

In addition, the user can use the CSVArrayList class to access the contents using code like this:

Sub LoopData2(ByRef CSVint As CSVinterface)
    With CSVint
        Dim iCounter As Long
        Dim cRecord() As Variant                    ' Records are stored as a one-dimensional array.
        Dim cField As Variant
        
        For iCounter = 0 To CSVint.count - 1
            cRecord() = .items.item(iCounter)       ' Retrieves a record
            cField = .items.item(iCounter)(1)       ' Retrieves the 2nd field of the current record
        Next
    End With
End Sub

However, it is sometimes disadvantageous to store data in containers other than VBA arrays. This becomes especially noticeable when it is required to write the information stored in Excel's own objects, such as spreadsheets, or VBA user forms, the case of list boxes, which allow to be filled in a single instruction using arrays. Then, the user can copy the information from the internal object using code like this:

Sub DumpData(ByRef CSVint As CSVinterface)
    Dim oArray() As Variant
    With CSVint
        .DumpToArray oArray            ' Dump the internal data into a two-dimensional array
        .DumpToJaggedArray oArray      ' Dump the internal data into a jagged array
        oArray = .items.items          ' Dump the internal data into a jagged array
        .DumpToSheet                   ' Dump the internal data into a new sheet
                                       ' using ThisWorkbook
        '@-------------------------------------------------------------------
        ' *NOTE: ONLY AVAILABLE FOR THE ACCESS VERSION OF THE CSV INTERFACE
        ' Dump the internal data into the Table1 in oAccessDB database.
        ' The method would create indexes in the 2nd and 3th fields.
        .DumpToAccessTable oAccessDB, _
                           "Table1", _
                            2, 3
    End With
End Sub

So far, in the examples addressed, the user has been allowed to choose between two actions:

  1. Import ALL records contained in a CSV file.
  2. Import a recordset, starting at record X and ending at record Y.

In both options, the user is forced to import all fields (columns) present in the file. Most CSV file parsers only offer the first option, but what if the user wants to save only the information that is relevant for a particular purpose, and what if user wants to store in memory only the records that meet a certain set of requirements?

An user may need to import 2 of X columns from a CSV file, in this case, the user can use something like:

Sub CSVimportDesiredColumns()
    Dim CSVint As CSVinterface

    Set CSVint = New CSVinterface
    With CSVint.parseConfig
        .path = "C:\Sample.csv"              ' Full path to the file, including its extension.
        .dialect.fieldsDelimiter = ","       ' Columns delimiter
        .dialect.recordsDelimiter = vbCrLf   ' Rows delimiter
    End With
    With CSVint
        .ImportFromCSV .parseConfig, _
                        1, "Revenue"         ' Import 1st and "Revenue" fields ONLY
    End With
End Sub

So, OK, let's imagine now that an user wants to apply some logic before saving the data, in which case they can step through the records in the CSV file one by one, using the sequential reader, as shown in the following example:

Sub CSVsequentialImport()
    Dim CSVint As CSVinterface
    Dim csvRecord As CSVArrayList
    
    Set CSVint = New CSVinterface
    With CSVint.parseConfig
        .path = "C:\Sample.csv"             ' Full path to the file, including its extension.
        .dialect.fieldsDelimiter = ","      ' Columns delimiter
        .dialect.recordsDelimiter = vbCrLf  ' Rows delimiter
    End With
    With CSVint
        .OpenSeqReader .parseConfig, _
                        1, "Revenue"        ' Import the 1st and "Revenue" fields using
                                            ' seq. reader
        Do
            Set csvRecord = .GetRecord
            '//////////////////////////////////////////////
            'Implement your logic here
            '//////////////////////////////////////////////
        Loop While Not csvRecord Is Nothing   ' Loop until the end of the file is reached
    End With
End Sub

Is there a way to sequentially fetch a set of records at a time instead of a single record? Currently, there is no built-in method to do that with a single instruction, as in the examples above, but with a few extra lines of code and the tools provided by the library, it is possible to achieve that goal. This is illustrated in the following example where the CSV file is streamed:

Sub CSVimportChunks()
    Dim CSVint As CSVinterface
    Dim StreamReader As CSVTextStream
            
    Set CSVint = New CSVinterface
    With CSVint.parseConfig
        .dialect.fieldsDelimiter = ","                      ' Columns delimiter
        .dialect.recordsDelimiter = vbCrLf                  ' Rows delimiter
    End With
    Set StreamReader = New CSVTextStream
    With StreamReader
        .endStreamOnLineBreak = True                        ' Instruct to find line breaks
        .OpenStream "C:\Sample.csv"                         ' Connect to CSV file
        Do
            .ReadText                                       ' Read a CSV chunk
            CSVint.ImportFromCSVString .bufferString, _
                                    CSVint.parseConfig, _
                                    1, "Revenue"            ' Import a set of records
            '//////////////////////////////////////
            'Implement your logic here
            '//////////////////////////////////////
        Loop While Not .atEndOfStream                       ' Continue until reach
                                                            ' the end of the CSV file.
    End With
    Set CSVint = Nothing
    Set StreamReader = Nothing
End Sub

So far, it has been outlined the way in which you can import the records from a CSV file sequentially, the following example shows how to filter the records, in a like SQL way, according to whether they meet a criterion set by the user:

Sub QueryCSV()
    Dim CSVint As CSVinterface
    Dim path As String
    Dim FilteredData As CSVArrayList
    
    Set CSVint = New CSVinterface
    path = Environ("USERPROFILE") & "\Desktop\Demo_100000records.csv"
    CSVint.parseConfig.Headers = False                                      		'The file has no header record/row
    CSVint.parseConfig.path = path
    If path <> vbNullString Then
        Set FilteredData = CSVint.Filter("f1='Asia' & f9>20 & f9<=50", path) 		'Select "Units sold" greater than 20 and less or 
																											'equal to 50 from Asian customers
        Set CSVint = Nothing
        Set FilteredData = Nothing
    End If
End Sub

In some situations, we may encounter a CSV file with a combination of vbCrLf, vbCr and vbLf as record delimiters. This can happen for many reasons, but the most common is by adding data to an existing CSV file without checking the configuration of the previously stored information. These cases will break the logic of many robust CSV parsers, including the demo of the 737K weekly downloaded Papa Parse. The next example shows how an user can import CSV files with mixed line break as record delimiter, an option that uses the multiEndOfLineCSV property of the parseConfig object to work with these special CSV files.

Sub ImportMixedLineEndCSV()
    Dim CSVint As CSVinterface
            
    Set CSVint = New CSVinterface
    With CSVint.parseConfig
        .path = "C:\Mixed Line Breaks.csv"
        .dialect.fieldsDelimiter = ","        ' Columns delimiter
        .dialect.recordsDelimiter = vbCrLf    ' Rows delimiter
        .multiEndOfLineCSV = True             ' All delimiters will be turned into vbLf
    End With
    With CSVint
        .ImportFromCSV .parseConfig
    End With
    Set CSVint = Nothing
End Sub

In all the above examples, an implicit assumption has been made, and that is that the user knows the configuration of the CSV file to be imported, so the question arises: can it be possible that the user does not know the configuration of the file to be imported? It is certainly possible, so how can the CSV interface help in these cases?

The tool includes a utility to sniff/guess field delimiters, record delimiters and quote character. This can be done with code like the following:

Sub DelimitersGuessing()
    Dim CSVint As CSVinterface

    Set CSVint = New CSVinterface
    With CSVint.parseConfig
        .path = "C:\Sample.csv"           ' Full path to the file, including its extension.
    End With
    With CSVint
        Set .parseConfig.dialect = .SniffDelimiters(.parseConfig)    ' Try to guess delimiters and save to internal
                                                                     ' parser configuration object.
        '@--------------------------------------------------------------
        ' *NOTE: the user can also create a custom configuration object
        '        and try to guess the delimiter with it.
    End With
End Sub

With VBA CSV interface, many things can be done, for example, an user can perform like SQL joins such as:

Sub JoinTwoCSV()
    Dim csv1 As CSVinterface
    Dim csv2 As CSVinterface
    Dim rTable As CSVArrayList
    
    '@--------------------------------------------------
    ' Import data from CSV files
    Set csv1 = New CSVinterface
    With csv1
        .parseConfig.delimitersGuessing = True
        .parseConfig.path = Environ("USERPROFILE") & "\Desktop\Sales details.csv"
        .ImportFromCSV .parseConfig
    End With
    Set csv2 = New CSVinterface
    With csv2
        .parseConfig.delimitersGuessing = True
        .parseConfig.path = Environ("USERPROFILE") & "\Desktop\Sales revenues.csv"
        .ImportFromCSV .parseConfig
    End With
    '@--------------------------------------------------
    ' Perform a like SQL Left join on imported data.
    With csv1.items
    	 ' Join 1st, "Region", and 3th to 5th fields of left table with "Total_Revenue" field from the right table,
	 ' on "Order_ID" of both tables if Total_Revenue is  greater than 3000000
	 ' and Region is equal to "Central America and the Caribbean"
        Set rTable = .LeftJoin(csv1.items, csv2.items, _
                   	"{1,Region,3-5};{Total_Revenue}", _
                	"Order_ID;Order_ID", _
                	"t2.Total_Revenue>3000000 & t1.Region='Central America and the Caribbean'")
			
        ' The "*" symbol can be used to retrieve all fields
        ' from a given table like this.
        
            'Set rTable = .LeftJoin(csv1.items, csv2.items, _
                        "{*};{Total_Revenue}", _
                        "Order_ID;Order_ID", _
                        "t2.Total_Revenue>3000000 & t1.Region='Central America and the Caribbean'")
                        
    End With
    '@--------------------------------------------------
    ' Write the result in a spreadsheet.
    csv1.DumpToSheet DataSource:=rTable
End Sub

Contributing

In order to contribute within this project, please see the guidance for contributing.

Benchmark

The benchmark results for VBA-CSV Interface are available at this site.

Dependencies

The library is composed of the following class modules:

All dependencies are written in pure VBA.

Limitations

Visit this site in order to known around CSV file size considerations.

Licence

Copyright (C) 2020-2024 W. García.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/gpl-3.0.html.