Add Performance Benchmarks for `PartitionedDataset.load()`

## Description

In https://github.com/kedro-org/kedro-plugins/pull/1070, we modified `PartitionedDataset.load()` to always invalidate its internal partition list cache before scanning the filesystem.

This was necessary to fix bugs (see issues [#4164](https://github.com/kedro-org/kedro/issues/4164) and #623) where a stale cache caused `load()` to fail, particularly when used with `ParallelRunner`.

While the fix works, always performing a filesystem scan may add overhead compared with reusing a cached list (even though that cached list could previously be stale). We expect the impact to be negligible for small datasets or fast filesystems, but it could be noticeable for datasets with a very large number of partitions or those on slow or high-latency storage (e.g., S3).
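To make the trade-off concrete, here is a minimal sketch of the two behaviours. `ToyPartitionLister` and `demo_staleness` are hypothetical stand-ins written for this issue; they are not Kedro's implementation and only mimic the "reuse cached list" vs. "invalidate, then rescan" pattern on a local temporary directory.

```python
import tempfile
from pathlib import Path


class ToyPartitionLister:
    """Hypothetical stand-in for a partition lister; NOT Kedro's actual code."""

    def __init__(self, root, invalidate_on_load):
        self._root = Path(root)
        self._invalidate_on_load = invalidate_on_load
        self._cached = None  # last listing, reused when invalidation is off

    def list_partitions(self):
        # New behaviour: drop the cached listing before every scan.
        if self._invalidate_on_load:
            self._cached = None
        # Old behaviour: scan only when nothing is cached yet.
        if self._cached is None:
            self._cached = sorted(
                p.name for p in self._root.iterdir() if p.is_file()
            )
        return self._cached


def demo_staleness():
    """Add a partition between two listings and compare both behaviours."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "part_0.csv").write_text("a\n")

        cached = ToyPartitionLister(root, invalidate_on_load=False)
        fresh = ToyPartitionLister(root, invalidate_on_load=True)
        cached.list_partitions()
        fresh.list_partitions()

        # A new partition appears after the first listing.
        (root / "part_1.csv").write_text("b\n")
        return cached.list_partitions(), fresh.list_partitions()
```

In this toy setup the cached lister misses the partition added after its first listing, while the invalidating lister pays for a second directory scan and sees it. The benchmarks requested here are meant to quantify that extra scan.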

## Possible Implementation

Implement performance benchmarks specifically targeting `PartitionedDataset.load()` to:

- Measure the performance difference between the old caching behaviour and the new behaviour (always re-scanning).

- Measure the `load()` time under various conditions:
  - Local filesystem vs. Remote filesystem (e.g., mocked S3).
  - Small number of partitions vs. Very large number of partitions.
  - Repeated `load()` calls on the same dataset instance.

The goal is a set of benchmark scenarios covering the conditions above.
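As a starting point for discussion, the scenarios could be shaped roughly like the sketch below. It is only an illustration under stated assumptions: it times a plain directory listing rather than `PartitionedDataset.load()`, covers the local filesystem only, and all function names (`make_partitions`, `scan`, `time_repeated_listing`, `run_benchmark`) are hypothetical. A real benchmark would call `load()` on an actual dataset instance and add a remote (e.g., mocked S3) variant.

```python
import tempfile
import time
from pathlib import Path
from statistics import median


def make_partitions(root, count):
    """Create `count` tiny partition files under `root`."""
    for i in range(count):
        (root / f"part_{i:06d}.csv").write_text("x\n")


def scan(root):
    """Stand-in for the filesystem scan performed on load (an assumption)."""
    return sorted(p.name for p in root.iterdir() if p.is_file())


def time_repeated_listing(root, repeats, rescan_every_call):
    """Time `repeats` listings on the same 'instance', with or without a cache."""
    cached = None
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        if rescan_every_call or cached is None:
            cached = scan(root)
        _ = len(cached)  # touch the listing, as a caller would
        timings.append(time.perf_counter() - start)
    return timings


def run_benchmark(partition_counts=(10, 1000), repeats=5):
    """Return median seconds per call, keyed by (partition_count, rescan)."""
    results = {}
    for count in partition_counts:
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            make_partitions(root, count)
            for rescan in (False, True):
                timings = time_repeated_listing(root, repeats, rescan)
                results[(count, rescan)] = median(timings)
    return results


if __name__ == "__main__":
    for (count, rescan), seconds in sorted(run_benchmark().items()):
        mode = "rescan" if rescan else "cached"
        print(f"{count:>6} partitions, {mode}: {seconds:.6f}s median per call")
```

The median is used here so that the first, cache-filling call does not dominate the "cached" result; reporting the first call separately may also be worthwhile.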

