Description
Dear IntroUnet developers,
We have run into further problems when evaluating on larger chromosomes (windows).
To replicate the results shown in the publication, it is necessary to simulate 1 Mb chromosomes from the ArchIE model.
The readme does not explain how to do this; however, a minor modification of `msmodified` produces the required simulations. The further steps are more difficult:
There are some candidate scripts for processing larger chromosomes simulated with `msmodified` in the `src/data` folder, chiefly `format_genomes.py` and `format_windowed.py`.
`format_genomes.py` seemingly works only with SLiM input.
`format_windowed.py` loads `msmodified` data, but it imports classes/functions that do not exist (it seems that, e.g., an older version of `introunet` is referenced):
introNets/src/data/format_windowed.py, line 15 in f204924
introNets/src/data/format_windowed.py, lines 112 to 114 in f204924
`format_unet_data_h5` does not exist, and no `Formatter` can be found in the rest of the files either. The class that should probably be referenced is located in `data_functions.py` and is currently named `TwoPopAlignmentFormatter`, not `Formatter` (and the `format` function of the `TwoPopAlignmentFormatter` class does not accept a `zero` parameter; it has to be `include_zeros`). One can either modify `format_genomes.py` so that the formatting function for `msmodified` is called, or change the names used in `format_windowed.py`.
With such modifications, one gets at least an apparently reasonable output in both cases.
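For illustration, the renaming workaround can be sketched as a small compatibility shim. This is only a sketch under assumptions: the `TwoPopAlignmentFormatter` below is a stand-in with a made-up body (the real class lives in `data_functions.py` and its constructor and `format` signature are not reproduced here), and `rename_kwargs` is a hypothetical helper, not part of the repository.

```python
import functools


def rename_kwargs(**renames):
    """Hypothetical helper: map old keyword names to new ones before the call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in renames.items():
                if old in kwargs:
                    if new in kwargs:
                        raise TypeError(f"got both {old!r} and {new!r}")
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Stand-in for the real class in data_functions.py. Only the keyword
# name `include_zeros` is taken from the report; everything else is assumed.
class TwoPopAlignmentFormatter:
    def format(self, include_zeros=False):
        return {"include_zeros": include_zeros}


class Formatter(TwoPopAlignmentFormatter):
    """Alias under the old name that format_windowed.py tries to import."""

    @rename_kwargs(zero="include_zeros")
    def format(self, *args, **kwargs):
        return super().format(*args, **kwargs)


# Old-style call as used in format_windowed.py, forwarded to the new keyword.
result = Formatter().format(zero=True)
```

In the actual repository one would import the real `TwoPopAlignmentFormatter` instead of the stand-in; directly editing the call sites in `format_windowed.py` is the simpler alternative.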
Nonetheless, the prediction step is not successful:
It is almost impossible to get `evaluate_unet_windowed.py` to work, which seems to be the appropriate script for predicting and evaluating (this is not stated in the readme). Using input created with `msmodified` and prepared with the modified `format_windowed.py` (which seems to be the most appropriate script for formatting the larger chromosomes), `evaluate_unet_windowed.py` throws index-out-of-bounds errors related to the numpy arrays containing the predicted values. It may be that only minor modifications of the parameters etc. are necessary, but in any case, this severely limits usability and, in particular, hinders the replication of the results shown in the publication (it should be stressed again that the readme does not indicate which scripts to use for formatting and for predicting/evaluating chromosome windows).
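We have not pinned down the cause of the index errors. As a generic illustration only (not taken from `evaluate_unet_windowed.py`), one common source of such errors in windowed prediction is a final window that runs past the end of the array. The sketch below shows one way to choose window starts so that every window fits and to average overlapping per-window predictions; the function names and the parameters `n_sites`, `window` and `step` are hypothetical.

```python
def window_starts(n_sites, window, step):
    """Start indices such that every window [s, s + window) fits in n_sites."""
    if n_sites < window:
        raise ValueError("chromosome is shorter than one window")
    starts = list(range(0, n_sites - window + 1, step))
    # If the stride leaves uncovered sites at the end, add one final
    # window that is flush with the last site.
    if starts[-1] + window < n_sites:
        starts.append(n_sites - window)
    return starts


def average_windows(window_preds, starts, n_sites):
    """Average overlapping per-window predictions into one per-site track."""
    total = [0.0] * n_sites
    count = [0] * n_sites
    for start, preds in zip(starts, window_preds):
        for offset, value in enumerate(preds):
            total[start + offset] += value
            count[start + offset] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]


starts = window_starts(n_sites=11, window=4, step=3)
track = average_windows([[1.0] * 4 for _ in starts], starts, n_sites=11)
```

Whether the script's errors stem from this kind of boundary handling or from a mismatch in the expected array shapes would need to be checked against the actual code.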
Furthermore, the default `archie.config` is not usable when following the readme instructions.
To run the training command successfully,
python3 src/models/train.py --config training_configs/archie.config --ifile archie_euclidean.hdf5 --odir training_results/archie_i1
one has to make a minor modification to the default `archie.config` file: `n_steps` has to be set to `None`.
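To make the change concrete, here is a minimal sketch of how such a setting might be read, assuming the config is an INI-style file parsed with Python's `configparser`. The section name `training_params` and the helper `read_optional_int` are hypothetical, and the way `train.py` actually interprets the value is not shown here.

```python
import configparser

# Hypothetical fragment; only the option name `n_steps` and the value
# `None` come from the report, the section name is an assumption.
EXAMPLE_CONFIG = """
[training_params]
n_steps = None
"""


def read_optional_int(config, section, option):
    """Return None for 'None' or an empty value, otherwise the integer."""
    raw = config.get(section, option, fallback="None").strip()
    return None if raw.lower() in ("none", "") else int(raw)


config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)
n_steps = read_optional_int(config, "training_params", "n_steps")
```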
Besides, it should be noted that the default value of `pos_bce_logits_weight` (which should be equal to the ratio between positive and negative examples) differs fundamentally from the value one gets on average from an `msmodified` simulation with default parameters (approx. 16-17).