Hi all, I'm trying to implement a diffusion kernel but getting unexpected behaviour when running on a computing cluster. I would appreciate any ideas! Every time I run the code on my local machine I get a different set of trajectories, as expected from the random-walk implementation. If instead I run the script as a task distributed via slurm on a computing cluster, I get identical trajectories every time, i.e. the same random walks. It appears as though the random number generator is being seeded somewhere along the line, but I'm not sure where.
Both systems are running parcels version 3.1.4. I've dug around for similar issues (e.g. #1008), but in my case the random number generator is still working; it just happens to give the same sequence every time.
MWE
```python
import parcels
import numpy as np
import math
from os import environ
from parcels import ParcelsRandom

# only needed for the commented-out multi-run plotting below
# import matplotlib.pyplot as plt
# import xarray as xr

# ensure gcc compiler
environ['CC'] = 'gcc'

# get dummy data
example_dataset_folder = parcels.download_example_dataset("MovingEddies_data")


# define diffusion kernel
def DiffusionUniform2D(particle, fieldset, time):  # pragma: no cover
    dWx = ParcelsRandom.normalvariate(0, math.sqrt(math.fabs(particle.dt)))
    dWy = ParcelsRandom.normalvariate(0, math.sqrt(math.fabs(particle.dt)))
    bx = math.sqrt(2 * fieldset.Kh_zonal[particle])
    by = math.sqrt(2 * fieldset.Kh_meridional[particle])
    particle_dlon += bx * dWx  # noqa
    particle_dlat += by * dWy  # noqa


# function to set up fieldset
def get_fieldset():
    kh_zonal = 5e3  # in m^2/s
    kh_meridional = 5e3  # in m^2/s
    fieldset = parcels.FieldSet.from_parcels(f"{example_dataset_folder}/moving_eddies")
    fieldset.add_constant_field('Kh_zonal', kh_zonal, mesh='flat')
    fieldset.add_constant_field('Kh_meridional', kh_meridional, mesh='flat')
    return fieldset


# function to set up particle set
def get_pset(fieldset):
    pset = parcels.ParticleSet.from_list(
        fieldset=fieldset,  # the fields on which the particles are advected
        pclass=parcels.JITParticle,  # the type of particles (JITParticle or ScipyParticle)
        lon=[3.3e5, 3.3e5],  # a vector of release longitudes
        lat=[1e5, 2.8e5],  # a vector of release latitudes
    )
    return pset


# function to set up output_file
def get_output_file(pset, name="tempParticles.zarr"):
    output_file = pset.ParticleFile(
        name=name,  # the file name
        outputdt=np.timedelta64(20 * 60, 's'),  # the time step of the outputs
    )
    return output_file


kernels = [parcels.AdvectionRK4, DiffusionUniform2D]

## Multiple runs within one script
# filename = "tempParticles.zarr"
# fig, ax = plt.subplots()
# ax.set_xlabel("Zonal distance [m]")
# ax.set_ylabel("Meridional distance [m]")
# for i in range(5):
#     fieldset = get_fieldset()
#     pset = get_pset(fieldset)
#     output_file = get_output_file(pset, filename)
#     pset.execute(
#         kernels,  # simply combine the Kernels in a list
#         runtime=np.timedelta64(2, 'D'),
#         dt=np.timedelta64(300, 's'),
#         output_file=output_file,
#     )
#     ds = xr.open_zarr(filename)
#     ax.plot(ds.lon.T, ds.lat.T, ".-")
# plt.show()

# single run mode
filename = "rngtest_hpc_singlerun1.zarr"
fieldset = get_fieldset()
pset = get_pset(fieldset)
output_file = get_output_file(pset, filename)
pset.execute(
    kernels,  # simply combine the Kernels in a list
    runtime=np.timedelta64(2, 'D'),
    dt=np.timedelta64(300, 's'),
    output_file=output_file,
)
```
Output on local machine
Running this several times locally gives me a different set of trajectories each time; the code is working as expected.
Output on HPC cluster
Running the commented-out loop gives a different set of trajectories on each iteration (rngtest_hpc_multi1.zarr, rngtest_hpc_multi2.zarr), as expected, but executing the whole script multiple times gives the same trajectories every time. I've tested this by executing two copies of the script sequentially within the same slurm sbatch job (rngtest_hpc_singlerun1.zarr, rngtest_hpc_singlerun2.zarr), and by running it again later in a second sbatch job (rngtest_hpc_singlerun3.zarr); all three produce identical random walks.
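For concreteness, this is roughly how I'm comparing the outputs (a minimal sketch assuming the single-run files above exist; it prints True on the cluster and False locally):

```python
# Rough identity check between two runs of the script.
# File names as produced by the single-run tests described above.
import numpy as np
import xarray as xr

ds1 = xr.open_zarr("rngtest_hpc_singlerun1.zarr")
ds2 = xr.open_zarr("rngtest_hpc_singlerun2.zarr")

# True on the cluster (identical walks), False on my local machine
print(np.array_equal(ds1.lon.values, ds2.lon.values, equal_nan=True))
```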
It's not clear to me where the seed is being set; I guess it must come from the execution environment, or am I missing something obvious? As a workaround I guess I can seed the RNG from a numpy RNG, which doesn't seem to exhibit the same behaviour.
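In case it's useful, the workaround I have in mind is something like the sketch below. ParcelsRandom.seed() is the existing parcels call; drawing the seed value from numpy's default generator (which pulls fresh OS entropy in each process) is just my assumption about a reasonable source:

```python
# Workaround sketch: re-seed ParcelsRandom before pset.execute(), so each
# slurm task starts from fresh OS entropy instead of a shared default seed.
# ParcelsRandom.seed() is parcels API; the numpy seed source is an assumption.
import numpy as np
from parcels import ParcelsRandom

seed = int(np.random.default_rng().integers(0, 2**31 - 1))  # fresh per-process value
ParcelsRandom.seed(seed)
```

Calling this once near the top of the script, before pset.execute(), should be enough, since the diffusion kernel draws everything from ParcelsRandom.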
Thanks!