Check for existence of output file on reducing step #539

Closed
@delucchi-cmu

Description

Dask is weird, and will re-run a whole batch of tasks if one fails (or the worker needs to restart for any reason).

This resulted in a bunch of error messages on a recent very large import run that were ultimately not really errors:

Task:  <Task 'reduce_pixel_shards-c53b3b8c9aa4e70e44dcdfa7b7ff9801' reduce_pixel_shards(, ...)>
Exception: "FileNotFoundError('/data3/epyc/data3/hats/skymapper/photometry/intermediate/order_6/dir_20000/pixel_21738')"

The output file (dataset/Norder=6/Dir=20000/Npix=21738.parquet) does in fact exist in the final catalog, and everything else about the import pipeline succeeded. This was mostly just annoying and made me think that the pipeline had failed when it didn't.
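A minimal sketch of the kind of check the title asks for, assuming the reducing step knows its final output path before it touches the intermediate shards. The names `reduce_if_needed` and `reduce_fn` are hypothetical and are not taken from the import pipeline's code; the idea is only that a re-run task finding its output already written should return quietly instead of raising.

```python
from pathlib import Path


def reduce_if_needed(output_file, reduce_fn):
    """Run reduce_fn only when output_file does not exist yet.

    Hypothetical helper for illustration. Returns True if the
    reduction ran, False if it was skipped.
    """
    output_path = Path(output_file)
    if output_path.exists():
        # An earlier attempt already wrote this pixel, so a re-run
        # of the task becomes a no-op rather than an error.
        return False
    # reduce_fn is expected to read the intermediate shards and
    # write the result to output_path.
    reduce_fn(output_path)
    return True
```

With something like this in place, a second attempt at the same pixel would be skipped rather than failing on the missing intermediate directory. It does not tell apart a complete output file from a partially written one, so a real fix may need a stricter check.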

Labels

bug (Something isn't working), interface (User-friendliness, ease of use, API)

Projects

Status

Done
