This repository was archived by the owner on Jan 12, 2024. It is now read-only.

Automatically disable caching of local data catalog sources #3

@zaneselvans

Reading Parquet files stored on the local filesystem through the current PUDL catalog still results in caching. This slows things down dramatically and quickly uses an enormous amount of disk space. In development especially, when we've just generated data locally, it would be nice to work with it through the same mechanism as remote data (the data catalog), but not if that means a bunch of unnecessary caching happening continuously in the background.

Identify a way to disable caching when we're working with local data. Ideally this would happen automatically, without the user having to think about it. Maybe it's as simple as making the simplecache:: prefix on urlpath conditional on the value of PUDL_INTAKE_PATH, using Jinja templating features?
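
To make the idea concrete, here is a rough Python sketch of the decision itself: add the simplecache:: prefix only when the base path points at a remote filesystem. The function names, the set of schemes treated as local, and the example paths are all assumptions for illustration, not anything that exists in the catalog today. Whether the same condition can be expressed directly in the catalog's Jinja templating is still the open question.

```python
import os
from urllib.parse import urlsplit

# Schemes treated as "local" in this sketch (an assumption, adjust as needed).
LOCAL_SCHEMES = {"", "file", "local"}


def is_local(path: str) -> bool:
    """Guess whether a path or URL refers to the local filesystem."""
    scheme = urlsplit(path).scheme.lower()
    # A one-letter "scheme" is most likely a Windows drive letter.
    return scheme in LOCAL_SCHEMES or len(scheme) == 1


def catalog_urlpath(base: str, filename: str) -> str:
    """Build a urlpath, adding simplecache:: only for remote base paths."""
    url = f"{base.rstrip('/')}/{filename}"
    return url if is_local(base) else f"simplecache::{url}"


if __name__ == "__main__":
    # PUDL_INTAKE_PATH comes from the issue; the fallback is a placeholder.
    base = os.environ.get("PUDL_INTAKE_PATH", "/tmp/pudl-example")
    print(catalog_urlpath(base, "example.parquet"))
```

With something like this, a locally generated dataset would be read in place, while a remote base path would keep going through the cache.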

If that's not possible, then maybe caching can be turned off with an argument that the user passes to the data source.
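
A minimal sketch of that fallback, assuming a hypothetical use_cache argument that strips a leading caching protocol from the urlpath before it is opened. The function name and the argument are made up for illustration; how such a flag would actually be threaded through to the data source is not worked out here.

```python
# fsspec caching protocols that can be chained in front of a URL with "::".
CACHE_PROTOCOLS = ("simplecache", "filecache", "blockcache")


def resolve_urlpath(urlpath: str, use_cache: bool = True) -> str:
    """Return urlpath unchanged, or without its caching prefix if use_cache is False."""
    if use_cache:
        return urlpath
    protocol, sep, rest = urlpath.partition("::")
    # Only strip the first link of the chain, and only if it is a cache layer.
    if sep and protocol in CACHE_PROTOCOLS:
        return rest
    return urlpath
```

The default keeps the current behavior, so only users working with local data would need to pass use_cache=False.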

Labels

inframundo
intake: Intake data catalogs
performance: Make data go faster by using less memory, disk, network, compute, etc.