Failure when using reader class defined in a Jupyter notebook #542

Open
@hombit

Description

Bug report

hats-import fails when I try to use a reader class defined in a notebook.

from hats_import.catalog.arguments import ImportArguments
from dask.distributed import Client
from hats_import.pipeline import pipeline_with_client
from hats_import.catalog.file_readers import CsvReader

columns = "object_id,raj2000,dej2000,e_raj2000,e_dej2000,smss_j,mean_epoch,rms_epoch,glon,glat,flags,nimaflags,ngood,u_flags,u_nimaflags,u_ngood,u_nclip,v_flags,v_nimaflags,v_ngood,v_nclip,g_flags,g_nimaflags,g_ngood,g_nclip,r_flags,r_nimaflags,r_ngood,r_nclip,i_flags,i_nimaflags,i_ngood,i_nclip,z_flags,z_nimaflags,z_ngood,z_nclip,class_star,chi2_psf,flags_psf,radius_petro,mean_fwhm,u_psf,e_u_psf,u_petro,e_u_petro,u_apc05,e_u_apc05,u_mmvar,v_psf,e_v_psf,v_petro,e_v_petro,v_apc05,e_v_apc05,v_mmvar,g_psf,e_g_psf,g_petro,e_g_petro,g_apc05,e_g_apc05,g_mmvar,r_psf,e_r_psf,r_petro,e_r_petro,r_apc05,e_r_apc05,r_mmvar,i_psf,e_i_psf,i_petro,e_i_petro,i_apc05,e_i_apc05,i_mmvar,z_psf,e_z_psf,z_petro,e_z_petro,z_apc05,e_z_apc05,z_mmvar,self_id1,self_dist1,self_id2,self_dist2,self_id3,self_dist3,cnt_self_15,ebmv_sfd,ebmv_gnilc,ebmv_g_err,gaia_dr3_id1,gaia_dr3_dist1,gaia_dr3_id2,gaia_dr3_dist2,cnt_gaia_dr3_15,twomass_key,twomass_dist,allwise_cntr,allwise_dist,catwise_id,catwise_dist,refcat2_id,refcat2_dist,ps1_dr1_id,ps1_dr1_dist,galex_guv_id,galex_guv_dist,vhs_dr6_id,vhs_dr6_dist,ls_dr9_id,ls_dr9_dist,des_dr2_id,des_dr2_dist,nsc_dr2_id,nsc_dr2_dist,splus_dr3_id,splus_dr3_dist".split(',')

class MyCsvReader(CsvReader):
    pass

args = ImportArguments(
    ra_column="raj2000",
    dec_column="dej2000",
    input_file_list=[
        "/data3/epyc/data3/hats/raw/skymapper/photometry/SMSS_DR4.master_sample.csv",
    ],
    file_reader=MyCsvReader(columns=columns),
    output_artifact_name="tmp",
    output_path="/tmp",
)
with Client(n_workers=1) as client:
    pipeline_with_client(args, client)

Output:

Failed MAPPING stage with file /data3/epyc/data3/hats/raw/skymapper/photometry/SMSS_DR4.master_sample.csv
  worker address: tcp://127.0.0.1:39734
Can't get attribute 'MyCsvReader' on <module '__main__' (built-in)>

This looks like expected behavior with pickle, but cloudpickle may help. It is a popular drop-in replacement for pickle that could be faster and can serialize and deserialize "local" code objects, such as a class defined in a notebook, although it requires the same Python interpreter version on both sides for that. See this NB for a demonstration.
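
The mechanism can be reproduced without hats-import or Dask. The following is a minimal sketch, not the library's code: `MyReader` is a hypothetical stand-in for the notebook-defined reader class, and the two subprocesses are assumptions standing in for the notebook kernel (which defines the class in `__main__`) and a worker process (whose `__main__` does not have it). Plain pickle stores the class by reference, so the worker side fails; if cloudpickle happens to be installed, the same round trip is repeated with it, since it stores such classes by value.

```python
import importlib.util
import subprocess
import sys

# "Notebook" side: the class is defined in __main__ and an instance is pickled.
# {dumper} is filled in with either "pickle" or "cloudpickle".
PRODUCER = """
import sys
import {dumper} as dumper

class MyReader:  # hypothetical stand-in for a reader class defined in a notebook
    def __init__(self, columns):
        self.columns = columns

sys.stdout.write(dumper.dumps(MyReader(["ra", "dec"])).hex())
"""

# "Worker" side: a fresh interpreter whose __main__ has no MyReader.
CONSUMER = """
import pickle
import sys

payload = bytes.fromhex(sys.stdin.read())
try:
    obj = pickle.loads(payload)
except AttributeError as exc:
    print(f"AttributeError: {exc}")
else:
    print(f"loaded columns={obj.columns}")
"""


def run(code, stdin_text=""):
    """Run `code` in a fresh interpreter and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def round_trip(dumper):
    """Serialize in one process with `dumper`, deserialize in another with pickle."""
    payload_hex = run(PRODUCER.format(dumper=dumper))
    return run(CONSUMER, payload_hex)


# Plain pickle: the worker cannot resolve __main__.MyReader.
pickle_result = round_trip("pickle")
print("pickle:     ", pickle_result)

# cloudpickle is a third-party package; skip this part if it is not installed.
cloudpickle_result = None
if importlib.util.find_spec("cloudpickle") is not None:
    cloudpickle_result = round_trip("cloudpickle")
    print("cloudpickle:", cloudpickle_result)
```

With plain pickle the worker side reports an `AttributeError` naming `MyReader`, which matches the failure above. Whether switching the serializer is the right fix inside hats-import is a separate question that this sketch does not answer.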

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, information about my environment, and any applicable data others will need to reproduce the problem.
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Project status: No status
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
