CsvTitan is a high-performance C++ library for processing extremely large CSV files with SQL-like operations. Its primary focus is processing speed on large datasets. The library itself is written entirely in standard C++ and uses no third-party libraries.
- Sort (Completed)
  - Efficiently sorts large CSV files.
- Join (In Progress)
  - Enables joining multiple CSV files based on common columns.
- Filter with Simple Queries (In Progress)
  - Provides functionality to filter rows using simple query syntax.
- Pivot (In Progress)
  - Allows pivoting of data for summarization and analysis.
- Unit (In Progress)
  - Comprehensive unit operations for CSV file manipulations.
  - Int (Completed)
  - Double (Completed)
  - String (In Development)
  - Datetime (In Development)
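
To make the Sort feature and the Int unit concrete, here is a minimal sketch of sorting CSV rows by an integer column using only the standard library. It is not CsvTitan's API: the names `split_csv_line` and `sort_rows_by_int_column` are hypothetical, the splitter ignores quoted fields, and everything is held in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split one CSV line on commas.
// Simplification: quoted fields and embedded commas are not handled.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));  // last field, may be empty
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}

// Hypothetical function: sort data rows (header excluded) by the integer
// value in column `key_column`. Rows with equal keys keep their input order.
std::vector<std::string> sort_rows_by_int_column(std::vector<std::string> rows,
                                                 std::size_t key_column) {
    // Parse each key once up front instead of on every comparison.
    std::vector<std::pair<long long, std::string>> keyed;
    keyed.reserve(rows.size());
    for (auto& row : rows) {
        const std::vector<std::string> fields = split_csv_line(row);
        assert(key_column < fields.size());
        keyed.emplace_back(std::stoll(fields[key_column]), std::move(row));
    }

    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const std::pair<long long, std::string>& a,
                        const std::pair<long long, std::string>& b) {
                         return a.first < b.first;
                     });

    std::vector<std::string> sorted;
    sorted.reserve(keyed.size());
    for (auto& entry : keyed) {
        sorted.push_back(std::move(entry.second));
    }
    return sorted;
}
```

Calling `sort_rows_by_int_column({"3,c", "1,a", "2,b"}, 0)` returns the rows ordered by the first column. For files that do not fit in memory, a common approach is to sort fixed-size chunks this way, write each to disk, and merge the sorted chunks. Whether CsvTitan works this way is not stated here.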
CsvTitan uses CMake for building the library.
CsvTitan uses GoogleTest (gtest) for unit testing.
- Data File: 2021_Yellow_Taxi_Trip_Data.csv (~33 million records)
- Time Taken: Approximately 7 minutes 20 seconds
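
For a rough sense of scale, the figures above correspond to about 75,000 records per second. The sketch below shows the arithmetic, assuming exactly 33,000,000 records and exactly 440 seconds (7 min 20 s); both inputs are approximate in the report, and `records_per_second` is a hypothetical helper, not part of the library.

```cpp
// Hypothetical helper: average number of records processed per second.
constexpr double records_per_second(double records, double seconds) {
    return records / seconds;
}

// Assumed inputs: ~33 million records, 7 min 20 s = 440 s.
constexpr double kReportedThroughput = records_per_second(33000000.0, 440.0);
```

This is an average over the whole run and says nothing about how the time splits between reading, processing, and writing.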
Contributions are welcome! Please fork the repository and submit a pull request.
CsvTitan is licensed under the MIT License.
Feel free to reach out if you have any questions or need further assistance. Happy coding!