Description
So I hate to write this one, but many of my students are refusing to use RELION, and I'm not convinced that the results they get from other software are always the optimal solution. So I was trying to get a dataset running in RELION after it had been fully processed in cryoSPARC.
Anyway, the issue is that the same STAR file runs fine through 2D classification (C2D) but fails in 3D classification (C3D). I suspect that there is something wrong with how I cooked up the STAR file; for example, do I need to have the _rlnCtfDataAreCtfPremultiplied flag? I did try adding it, but it didn't help. The failure is also consistent across different machines.
Essentially, it makes it through the first iteration and then fails at either the maximization step or the expectation step in the 2nd iteration.
Any help would be appreciated. This issue is going to come up more and more: I think RELION still does the best job, but many people now process everything from start to finish in cryoSPARC, so I would like to get to the bottom of it.
Environment:
- OS: Mint 22.1
- MPI runtime: (Open MPI) 4.1.6
- RELION version: 5.0.0-commit-1fdfb9, Precision: BASE=double, CUDA-ACC=single
- Memory: [512 GB]
- GPU: [RTX4000 ada]
Dataset:
- Box size: [360 px]
- Pixel size: [0.733 Å/px]
- Number of particles: [~500,000]
- Description: [GPCR]
Job options:
- Type of job: [C3D]
- Number of MPI processes: [5]
- Number of threads: [1]
relion_refine_mpi --o Class3D/job001/run --i J55_newoptics.star --ref J55_010_volume_map.mrc --firstiter_cc --trust_ref_size --ini_high 16 --dont_combine_weights_via_disc --pool 30 --pad 1 --ctf --iter 25 --tau2_fudge 4 --particle_diameter 170 --fast_subsets --K 3 --flatten_solvent --zero_mask --strict_highres_exp 4 --blush --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym C1 --norm --scale --j 1 --gpu "" --pipeline_control Class3D/job001/
Error message:
munmap_chunk(): invalid pointer
[piastri:1781041] *** Process received signal ***
corrupted double-linked list
[piastri:1781043] *** Process received signal ***
[piastri:1781043] Signal: Aborted (6)
[piastri:1781043] Signal code: (-6)
[piastri:1781041] Signal: Aborted (6)
[piastri:1781041] Signal code: (-6)
[piastri:1781041] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x7936c9c45330]
[piastri:1781041] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x11c)[0x7936c9c9eb2c]
[piastri:1781041] [ 2] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x1e)[0x7936c9c4527e]
[piastri:1781041] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xdf)[0x7936c9c288ff]
[piastri:1781041] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x297b6)[0x7936c9c297b6]
[piastri:1781041] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0xa8ff5)[0x7936c9ca8ff5]
[piastri:1781041] [ 6] /lib/x86_64-linux-gnu/libc.so.6(+0xa947c)[0x7936c9ca947c]
[piastri:1781041] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_free+0xca)[0x7936c9caddfa]
[piastri:1781041] [ 8] /apps/relion/build/bin/relion_refine_mpi(_ZN13MultidimArrayIdE14coreDeallocateEv+0x65)[0x6513f6f9dd65]
[piastri:1781041] [ 9] /apps/relion/build/bin/relion_refine_mpi(_ZN11MlWsumModel4packER13MultidimArrayIdERiS3_b+0x53a)[0x6513f718ef9a]
[piastri:1781041] [10] /apps/relion/build/bin/relion_refine_mpi(_ZN14MlOptimiserMpi22combineAllWeightedSumsEv+0x624)[0x6513f6fc7e34]
[piastri:1781041] [11] /apps/relion/build/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0x665)[0x6513f6fe2c75]
[piastri:1781041] [12] /apps/relion/build/bin/relion_refine_mpi(main+0x81)[0x6513f6f8f021]
[piastri:1781041] [13] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7936c9c2a1ca]
[piastri:1781041] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7936c9c2a28b]
[piastri:1781041] [15] /apps/relion/build/bin/relion_refine_mpi(_start+0x25)[0x6513f6f92385]
[piastri:1781041] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 1781041 on node piastri exited on signal 6 (Aborted).
--------------------------------------------------------------------------
Example of input STAR file:
data_optics
loop_
_rlnVoltage #1
_rlnImagePixelSize #2
_rlnSphericalAberration #3
_rlnAmplitudeContrast #4
_rlnOpticsGroup #5
_rlnImageSize #6
_rlnImageDimensionality #7
_rlnOpticsGroupName #8
300.000000 0.733000 2.700000 0.100000 1 360 2 ptcls_tilt_group0000
300.000000 0.733000 2.700000 0.100000 2 360 2 ptcls_tilt_group0001
300.000000 0.733000 2.700000 0.100000 3 360 2 ptcls_tilt_group0002
300.000000 0.733000 2.700000 0.100000 4 360 2 ptcls_tilt_group0003
data_particles
loop_
_rlnImageName #1
_rlnAngleRot #2
_rlnAngleTilt #3
_rlnAnglePsi #4
_rlnOriginXAngst #5
_rlnOriginYAngst #6
_rlnDefocusU #7
_rlnDefocusV #8
_rlnDefocusAngle #9
_rlnPhaseShift #10
_rlnCtfBfactor #11
_rlnOpticsGroup #12
_rlnRandomSubset #13
_rlnClassNumber #14
000001@J54/extract/009103141063873076924_FoilHole_29471620_Data_29469929_0_20250429_122313_EER_patch_aligned_doseweighted_particles.mrcs 110.093651 102.101463 111.729355 -0.37108 0.364595 22905.933594 22860.035156 268.941101 0.000000 0.000000 1 2 1
000002@J54/extract/009103141063873076924_FoilHole_29471620_Data_29469929_0_20250429_122313_EER_patch_aligned_doseweighted_particles.mrcs -30.59649 98.797394 -96.29035 0.618469 -0.70093 23362.289062 23316.390625 268.941101 0.000000 0.000000 1 1 1
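Since the STAR file was assembled by hand, a quick structural sanity check may help rule out simple formatting problems before digging into the crash itself. The sketch below is not part of RELION and is not a known fix for this error. `parse_star` and `check_star` are hypothetical helper names, and the parser assumes every block is a plain `loop_` table with whitespace-separated fields, as in the excerpt above. It only verifies that each data row has as many fields as there are column labels, and that every `_rlnOpticsGroup` value used in `data_particles` is defined in `data_optics`.

```python
def parse_star(text):
    """Parse simple loop_-style STAR text into {block_name: (labels, rows)}.

    Assumption: every block is a loop_ table with whitespace-separated fields.
    """
    blocks = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines, and the loop_ keyword itself.
        if not line or line.startswith("#") or line == "loop_":
            continue
        if line.startswith("data_"):
            name = line[len("data_"):]
            blocks[name] = ([], [])
        elif name is None:
            continue  # ignore anything before the first data_ block
        elif line.startswith("_"):
            # Column label, e.g. "_rlnVoltage #1": keep only the label.
            blocks[name][0].append(line.split()[0])
        else:
            blocks[name][1].append(line.split())
    return blocks


def check_star(blocks):
    """Return a list of human-readable problems found in the parsed blocks."""
    problems = []
    # 1. Every data row should have exactly one field per column label.
    for name, (labels, rows) in blocks.items():
        for i, row in enumerate(rows, start=1):
            if len(row) != len(labels):
                problems.append(
                    f"{name} row {i}: {len(row)} fields, {len(labels)} labels"
                )
    # 2. Every optics group used by a particle should exist in data_optics.
    if "optics" in blocks and "particles" in blocks:
        o_labels, o_rows = blocks["optics"]
        p_labels, p_rows = blocks["particles"]
        if "_rlnOpticsGroup" in o_labels and "_rlnOpticsGroup" in p_labels:
            oi = o_labels.index("_rlnOpticsGroup")
            pi = p_labels.index("_rlnOpticsGroup")
            known = {r[oi] for r in o_rows if len(r) > oi}
            used = {r[pi] for r in p_rows if len(r) > pi}
            for group in sorted(used - known):
                problems.append(
                    f"particles reference undefined optics group {group}"
                )
    return problems
```

It could be run against the input file with something like `check_star(parse_star(open("J55_newoptics.star").read()))`. An empty list only means these two checks passed; it says nothing about whether the metadata values themselves are what RELION expects.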