
Support for win32 #32

Open

TommyJW opened this issue Oct 18, 2018 · 6 comments

Comments

@TommyJW commented Oct 18, 2018

Looks like the dependency modisco/cluster/phenograph/core.py is based on an old copy of the upstream codebase:

#copied from https://github.com/jacoblevine/PhenoGraph/blob/master/phenograph/core.py

The most recent version of that repo includes support for win32 systems.

I did try to modify local copies of:
modisco/cluster/core.py (edit: not sure this file was actually modified; not near my system right now)
modisco/cluster/__init__.py
modisco/affinitymat/transformers.py

so that they import the copy of jacoblevine/PhenoGraph installed from pip instead (to get the up-to-date code), but the TF MoDISco TAL GATA example then breaks because other functions it relies on are missing from upstream's phenograph modules.

Could modisco be updated to require jacoblevine/PhenoGraph as a dependency and add hooks or overrides for the extra functionality, instead of extending the modules directly through copy/paste? (A rough sketch of what I mean is below.)

Alternatively, is there a known list of all the functions that were added to phenograph's files that I could use to build this?
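For illustration, something like this is what I have in mind. The wrapper below is hypothetical, not existing modisco code, and upstream's phenograph.cluster is not a drop-in replacement for the bundled functions (that gap is exactly the problem):

# Hypothetical sketch only -- not existing modisco code. Idea: prefer the
# pip-installed PhenoGraph (which handles win32) where its public API is
# enough, and fall back to the copy bundled with modisco for the extra
# functionality that only exists there.
try:
    import phenograph  # upstream jacoblevine/PhenoGraph, `pip install PhenoGraph`
except ImportError:
    phenograph = None

from modisco.cluster.phenograph import cluster as bundled_cluster


def run_clustering(graph, **louvain_kwargs):
    if phenograph is not None:
        # Upstream public API: phenograph.cluster(data, ...) -> (communities, graph, Q).
        # NOTE: not a drop-in replacement for the bundled functions -- the
        # missing functions are what this issue is about.
        communities, _, _ = phenograph.cluster(graph)
        return communities
    # Fallback: the bundled, modified copy (ships osx/linux binaries only),
    # forwarding whatever arguments (n_runs, seed, ...) the caller supplies.
    return bundled_cluster.runlouvain_average_runs_given_graph(graph, **louvain_kwargs)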

@TommyJW (Author) commented Oct 18, 2018

Traceback from the unmodified project:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-ba9250154fe7> in <module>()
----> 1 x.runModisco()
      2 print("DONE")

<ipython-input-2-77f7eee30b6d> in runModisco(self)
    238                 contrib_scores={"task": self.scores},
    239                 hypothetical_contribs={"task": self.hyp_contrib_scores},
--> 240                 one_hot=self.X_test)
    241 
    242     def exportKerasModelJSON(self):

~\Anaconda3\lib\site-packages\modisco\tfmodisco_workflow\workflow.py in __call__(self, task_names, contrib_scores, hypothetical_contribs, one_hot)
    308                 other_comparison_track_names=[])
    309 
--> 310             seqlets_to_patterns_result = seqlets_to_patterns(metacluster_seqlets)
    311             metacluster_idx_to_submetacluster_results[metacluster_idx] =\
    312                 SubMetaclusterResults(

~\Anaconda3\lib\site-packages\modisco\tfmodisco_workflow\seqlets_to_patterns.py in __call__(self, seqlets)
    598                 sys.stdout.flush()
    599 
--> 600             cluster_results = clusterer(density_adapted_affmat)
    601             cluster_results_sets.append(cluster_results)
    602             num_clusters = max(cluster_results.cluster_indices+1)

~\Anaconda3\lib\site-packages\modisco\cluster\core.py in __call__(self, orig_affinity_mat)
     99         all_start = time.time()
    100         if (self.affmat_transformer is not None):
--> 101             affinity_mat = self.affmat_transformer(orig_affinity_mat)
    102         else:
    103             affinity_mat = orig_affinity_mat

~\Anaconda3\lib\site-packages\modisco\affinitymat\transformers.py in __call__(self, affinity_mat)
     93 
     94     def __call__(self, affinity_mat):
---> 95         return self.func(affinity_mat)
     96 
     97 

~\Anaconda3\lib\site-packages\modisco\affinitymat\transformers.py in <lambda>(x)
     84     def chain(self, other_affmat_post_processor):
     85         return AdhocAffMatTransformer(
---> 86                 func = lambda x: other_affmat_post_processor(self(x))) 
     87 
     88 

~\Anaconda3\lib\site-packages\modisco\affinitymat\transformers.py in __call__(self, affinity_mat)
    364                 parallel_threads=self.parallel_threads,
    365                 seed=self.seed,
--> 366                 verbose=self.verbose)

~\Anaconda3\lib\site-packages\modisco\cluster\phenograph\cluster.py in runlouvain_average_runs_given_graph(graph, n_runs, level_to_return, parallel_threads, verbose, max_clusters, tic, seed)
    157                     max_clusters=max_clusters, n_runs=n_runs,
    158                     seed=seed, parallel_threads=parallel_threads,
--> 159                     verbose=verbose)
    160     if (tic is not None):
    161         print("PhenoGraph complete in {} seconds".format(time.time() - tic))

~\Anaconda3\lib\site-packages\modisco\cluster\phenograph\core.py in runlouvain_average_runs(filename, n_runs, level_to_return, verbose, max_clusters, seed, parallel_threads)
    383 
    384     (lpath, community_binary, hierarchy_binary) =\
--> 385         get_paths_and_run_convert(filename)
    386 
    387     coocc_count = None

~\Anaconda3\lib\site-packages\modisco\cluster\phenograph\core.py in get_paths_and_run_convert(filename)
    214     else:
    215         raise RuntimeError("Operating system could not be determined or is not supported. "
--> 216                            "sys.platform == {}".format(sys.platform))
    217     # Prepend appropriate path separator
    218     convert_binary = os.path.sep + convert_binary

RuntimeError: Operating system could not be determined or is not supported. sys.platform == win32
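
For reference, the error is raised by the platform check in the bundled core.py when it looks up the precompiled Louvain binaries. Roughly (paraphrased; the binary names below are placeholders, and the win32 branch shown is the missing piece, not existing code):

import os
import sys

# Paraphrased from modisco/cluster/phenograph/core.py::get_paths_and_run_convert;
# binary names are placeholders, not the actual filenames shipped with modisco.
if sys.platform.startswith("linux"):
    convert_binary = "convert_linux"        # bundled Linux build
elif sys.platform == "darwin":
    convert_binary = "convert_mac"          # bundled OSX build
elif sys.platform == "win32":
    # This branch does not exist in the bundled copy, which is why the
    # RuntimeError above fires on Windows. Supporting win32 would require a
    # Windows build of the (modified) Louvain binaries, e.g.:
    convert_binary = "convert_windows.exe"  # hypothetical Windows build
else:
    raise RuntimeError("Operating system could not be determined or is not "
                       "supported. sys.platform == {}".format(sys.platform))

# Prepend appropriate path separator (as in the bundled core.py)
convert_binary = os.path.sep + convert_binary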

@AvantiShri (Collaborator)

Oof, it’s actually more complex than that; I had to modify the Louvain community detection binaries for tfmodisco. PhenoGraph ships precompiled binaries, but I was only able to prepare precompiled binaries for osx and linux because I literally don’t have access to a Windows system. I can look into modifying setup.py to compile things on the fly. Part of the problem is that there are three binaries: two of which I modified, and one that was modified by the author of PhenoGraph. However, the author of PhenoGraph has not released the source code for those modifications (despite multiple requests, jacoblevine/PhenoGraph#9), which is why I was relying on shipping precompiled binaries rather than having users compile from source.

In the meantime, are you able to compile the code here from source: https://github.com/kundajelab/modisco_louvain? If so, you could just update your version of tfmodisco to use the Windows binaries you compile.

@AvantiShri (Collaborator)

(Let me know if you are able to compile from source, and I will provide more detailed instructions on how to modify tfmodisco to use the binaries)

@akundaje

Av - I have a windows machine if you want to try it out.

@TommyJW (Author) commented Oct 18, 2018

Due to the deadlines I'm working with, I've already started building a Linux environment. Assuming that goes well, the current build will suit my needs. However, I wanted to open this issue to help track the need and raise its visibility.

@AvantiShri (Collaborator)

Thanks Tommy! If your datasets are not too big, I might also suggest working in a Colab notebook. There are many ways to get your data into a Colab notebook; one is to upload it to a publicly available server and then download it in the notebook with curl, e.g.:
!curl -L "https://drive.google.com/uc?authuser=0&id=file_id_goes_here&export=download" > scores_cnn_class1_test.h5

Here's an example where I run TF-MoDISco in a Colab notebook on SVM-derived importance scores: https://github.com/kundajelab/igsvm/blob/master/lsgkmexplain_NFE2.ipynb
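
For smaller files, another option is Colab's built-in upload widget, so you don't need to host the file anywhere (illustrative snippet; the filename in the comment is just an example):

# Run inside a Colab notebook cell: opens a browser file picker, saves the
# chosen files into the notebook's working directory, and returns a dict
# mapping each uploaded filename to its contents.
from google.colab import files

uploaded = files.upload()
print(list(uploaded.keys()))  # e.g. ['scores_cnn_class1_test.h5']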
