Description
Is your feature request related to a problem? Please describe.
When CsvReader and CsvWriter are used to deserialize to or serialize from strongly typed objects, ObjectRecordCreator and ObjectRecordWriter are used internally to compile an expression into a delegate at runtime. The compiled delegates are cached for the lifetime of the CsvReader and CsvWriter instances, which provides a significant performance improvement over dynamic object deserialization and serialization when reading or writing multiline Csv files.
However, in the opposite scenario, when deserializing or serializing a large number of small (possibly single-line) Csv files, there is a significant performance drawback: each incoming request requires creating a new instance of CsvReader or CsvWriter, and each new instance internally compiles a new delegate. Delegate compilation is a relatively expensive operation for reading just a single line, and there is also possible lock contention inside the CLR when multiple delegates are being compiled in parallel.
In a scenario with just a few data types but a large number of incoming requests, each carrying a single-line Csv payload, most of the compute resources are spent compiling delegates. In rare cases the service was also hitting lock contention inside the CLR, causing a significant performance impact.
Describe the solution you'd like
Currently, the compiled delegate references a specific instance of CsvReader or CsvWriter through a ConstantExpression, so otherwise compatible delegates cannot be reused for reading from or writing to a different reader or writer instance. Changing the delegate to take the CsvReader or CsvWriter instance as a parameter would make it reusable for deserializing to or serializing from the same data types. Compiled delegates could then be cached beyond the lifetime of a specific reader or writer instance, as long as the cache key takes all possible differences (such as a changed column order) into account.
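To illustrate the idea, here is a minimal sketch of the difference between a delegate bound to one reader through a ConstantExpression and one that takes the reader as a ParameterExpression. It does not use the library's internals: `FakeReader`, `Person`, `DelegateCachingSketch`, and the `(nameIndex, cityIndex)` cache key are hypothetical stand-ins chosen for the example, and a real cache key would have to cover every setting that influences the generated expression.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

// Hypothetical stand-in for a reader; NOT the library's CsvReader.
public sealed class FakeReader
{
    private readonly string[] _fields;
    public FakeReader(string line) { _fields = line.Split(','); }
    public string GetField(int index) { return _fields[index]; }
}

// Hypothetical record type used only for this sketch.
public sealed class Person
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class DelegateCachingSketch
{
    // Process-wide cache. The key here is only the column order (an assumption
    // for this sketch); a real key must capture everything that changes the
    // expression tree.
    private static readonly ConcurrentDictionary<(int, int), Func<FakeReader, Person>> Cache =
        new ConcurrentDictionary<(int, int), Func<FakeReader, Person>>();

    // Instance-bound variant: the reader is baked in as a ConstantExpression,
    // so the compiled delegate can only ever read from that one reader.
    public static Func<Person> CompileBound(FakeReader reader, int nameIndex, int cityIndex)
    {
        var body = BuildBody(Expression.Constant(reader), nameIndex, cityIndex);
        return Expression.Lambda<Func<Person>>(body).Compile();
    }

    // Parameterized variant: the reader is a lambda parameter, so one compiled
    // delegate can serve any number of reader instances.
    public static Func<FakeReader, Person> GetOrCompile(int nameIndex, int cityIndex)
    {
        // Note: GetOrAdd may run the factory more than once under a race;
        // wrapping the value in Lazy<T> would avoid duplicate compilation.
        return Cache.GetOrAdd((nameIndex, cityIndex), key =>
        {
            var readerParam = Expression.Parameter(typeof(FakeReader), "reader");
            var body = BuildBody(readerParam, key.Item1, key.Item2);
            return Expression.Lambda<Func<FakeReader, Person>>(body, readerParam).Compile();
        });
    }

    // Builds: new Person { Name = reader.GetField(i), City = reader.GetField(j) }
    private static Expression BuildBody(Expression reader, int nameIndex, int cityIndex)
    {
        var getField = typeof(FakeReader).GetMethod(nameof(FakeReader.GetField));
        return Expression.MemberInit(
            Expression.New(typeof(Person)),
            Expression.Bind(
                typeof(Person).GetProperty(nameof(Person.Name)),
                Expression.Call(reader, getField, Expression.Constant(nameIndex))),
            Expression.Bind(
                typeof(Person).GetProperty(nameof(Person.City)),
                Expression.Call(reader, getField, Expression.Constant(cityIndex))));
    }
}
```

With the parameterized variant, a request handler would call `DelegateCachingSketch.GetOrCompile(0, 1)` once per column layout and then invoke the returned delegate with a fresh `FakeReader` for each payload, so compilation happens once per layout rather than once per request.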
Describe alternatives you've considered
For our specific scenario, I was able to extend the existing ObjectRecordCreator, ObjectRecordWriter, RecordCreatorFactory, RecordWriterFactory, and ExpressionManager classes, and to hook the extended versions in through an extended version of ObjectResolver. This modifies the compiled delegates and caches them for the lifetime of the process, achieving the desired performance improvement by reusing a handful of delegates instead of compiling a new one for each incoming request. The solution works but feels a bit hacky, and it requires careful review on every future library upgrade. It would be nice to have the same caching support built into the library to support high-throughput processing of numerous small Csv files, similar to the performance improvements already available for processing large multiline Csv files.