
Support for win32 #32

Open

TommyJW opened this issue Oct 18, 2018 · 6 comments

Comments

@TommyJW commented Oct 18, 2018

Looks like the dependency modisco/cluster/phenograph/core.py is based on an old copy of the upstream codebase:

#copied from https://github.com/jacoblevine/PhenoGraph/blob/master/phenograph/core.py

The most recent version of that repo includes support for win32 systems.

I did try to modify local copies of:
modisco/cluster/core.py (edit: not sure this file was actually modified; not near my system right now)
modisco/cluster/__init__.py
modisco/affinitymat/transformers.py

so that they import the copy of jacoblevine/PhenoGraph installed from pip instead (to get the up-to-date code), but the TF MoDISco TAL GATA example then breaks because other functions it relies on are missing from upstream's phenograph modules.

Could modisco be updated to require jacoblevine/PhenoGraph as a dependency and add hooks or overrides for the extra functionality, instead of extending the modules directly through copy/paste? (A rough sketch of what I mean is below.)

Alternatively, is there a known list of all the functions that were added to phenograph's files that I could use to build this?
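For illustration, something like this is what I have in mind. The wrapper below is hypothetical, not existing modisco code, and upstream's phenograph.cluster is not a drop-in replacement for the bundled functions (that gap is exactly the problem):

# Hypothetical sketch only -- not existing modisco code. Idea: prefer the
# pip-installed PhenoGraph (which handles win32) where its public API is
# enough, and fall back to the copy bundled with modisco for the extra
# functionality that only exists there.
try:
    import phenograph  # upstream jacoblevine/PhenoGraph, `pip install PhenoGraph`
except ImportError:
    phenograph = None

from modisco.cluster.phenograph import cluster as bundled_cluster


def run_clustering(graph, **louvain_kwargs):
    if phenograph is not None:
        # Upstream public API: phenograph.cluster(data, ...) -> (communities, graph, Q).
        # NOTE: not a drop-in replacement for the bundled functions -- the
        # missing functions are what this issue is about.
        communities, _, _ = phenograph.cluster(graph)
        return communities
    # Fallback: the bundled, modified copy (ships osx/linux binaries only),
    # forwarding whatever arguments (n_runs, seed, ...) the caller supplies.
    return bundled_cluster.runlouvain_average_runs_given_graph(graph, **louvain_kwargs)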

@TommyJW (Author) commented Oct 18, 2018

Traceback from the unmodified project:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-ba9250154fe7> in <module>()
----> 1 x.runModisco()
      2 print("DONE")

<ipython-input-2-77f7eee30b6d> in runModisco(self)
    238                 contrib_scores={"task": self.scores},
    239                 hypothetical_contribs={"task": self.hyp_contrib_scores},
--> 240                 one_hot=self.X_test)
    241 
    242     def exportKerasModelJSON(self):

~\Anaconda3\lib\site-packages\modisco\tfmodisco_workflow\workflow.py in __call__(self, task_names, contrib_scores, hypothetical_contribs, one_hot)
    308                 other_comparison_track_names=[])
    309 
--> 310             seqlets_to_patterns_result = seqlets_to_patterns(metacluster_seqlets)
    311             metacluster_idx_to_submetacluster_results[metacluster_idx] =\
    312                 SubMetaclusterResults(

~\Anaconda3\lib\site-packages\modisco\tfmodisco_workflow\seqlets_to_patterns.py in __call__(self, seqlets)
    598                 sys.stdout.flush()
    599 
--> 600             cluster_results = clusterer(density_adapted_affmat)
    601             cluster_results_sets.append(cluster_results)
    602             num_clusters = max(cluster_results.cluster_indices+1)

~\Anaconda3\lib\site-packages\modisco\cluster\core.py in __call__(self, orig_affinity_mat)
     99         all_start = time.time()
    100         if (self.affmat_transformer is not None):
--> 101             affinity_mat = self.affmat_transformer(orig_affinity_mat)
    102         else:
    103             affinity_mat = orig_affinity_mat

~\Anaconda3\lib\site-packages\modisco\affinitymat\transformers.py in __call__(self, affinity_mat)
     93 
     94     def __call__(self, affinity_mat):
---> 95         return self.func(affinity_mat)
     96 
     97 

~\Anaconda3\lib\site-packages\modisco\affinitymat\transformers.py in <lambda>(x)
     84     def chain(self, other_affmat_post_processor):
     85         return AdhocAffMatTransformer(
---> 86                 func = lambda x: other_affmat_post_processor(self(x))) 
     87 
     88 

~\Anaconda3\lib\site-packages\modisco\affinitymat\transformers.py in __call__(self, affinity_mat)
    364                 parallel_threads=self.parallel_threads,
    365                 seed=self.seed,
--> 366                 verbose=self.verbose)

~\Anaconda3\lib\site-packages\modisco\cluster\phenograph\cluster.py in runlouvain_average_runs_given_graph(graph, n_runs, level_to_return, parallel_threads, verbose, max_clusters, tic, seed)
    157                     max_clusters=max_clusters, n_runs=n_runs,
    158                     seed=seed, parallel_threads=parallel_threads,
--> 159                     verbose=verbose)
    160     if (tic is not None):
    161         print("PhenoGraph complete in {} seconds".format(time.time() - tic))

~\Anaconda3\lib\site-packages\modisco\cluster\phenograph\core.py in runlouvain_average_runs(filename, n_runs, level_to_return, verbose, max_clusters, seed, parallel_threads)
    383 
    384     (lpath, community_binary, hierarchy_binary) =\
--> 385         get_paths_and_run_convert(filename)
    386 
    387     coocc_count = None

~\Anaconda3\lib\site-packages\modisco\cluster\phenograph\core.py in get_paths_and_run_convert(filename)
    214     else:
    215         raise RuntimeError("Operating system could not be determined or is not supported. "
--> 216                            "sys.platform == {}".format(sys.platform))
    217     # Prepend appropriate path separator
    218     convert_binary = os.path.sep + convert_binary

RuntimeError: Operating system could not be determined or is not supported. sys.platform == win32
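
For reference, the error is raised by the platform check in the bundled core.py when it looks up the precompiled Louvain binaries. Roughly (paraphrased; the binary names below are placeholders, and the win32 branch shown is the missing piece, not existing code):

import os
import sys

# Paraphrased from modisco/cluster/phenograph/core.py::get_paths_and_run_convert;
# binary names are placeholders, not the actual filenames shipped with modisco.
if sys.platform.startswith("linux"):
    convert_binary = "convert_linux"        # bundled Linux build
elif sys.platform == "darwin":
    convert_binary = "convert_mac"          # bundled OSX build
elif sys.platform == "win32":
    # This branch does not exist in the bundled copy, which is why the
    # RuntimeError above fires on Windows. Supporting win32 would require a
    # Windows build of the (modified) Louvain binaries, e.g.:
    convert_binary = "convert_windows.exe"  # hypothetical Windows build
else:
    raise RuntimeError("Operating system could not be determined or is not "
                       "supported. sys.platform == {}".format(sys.platform))

# Prepend appropriate path separator (as in the bundled core.py)
convert_binary = os.path.sep + convert_binary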

@AvantiShri (Collaborator)

Oof, it’s actually more complex than that; I had to modify the Louvain community detection binaries for tfmodisco. PhenoGraph ships precompiled binaries, but I was only able to prepare precompiled binaries for osx and linux because I literally don’t have access to a Windows system. I can look into modifying setup.py to compile things on the fly. Part of the problem is that there are three binaries: two of which I modified, and one that was modified by the author of PhenoGraph. However, the author of PhenoGraph has not released the source code for those modifications (despite multiple requests, jacoblevine/PhenoGraph#9), which is why I was relying on shipping precompiled binaries rather than having users compile from source.

In the meantime, are you able to compile the code here from source: https://github.com/kundajelab/modisco_louvain? If so, you could just update your version of tfmodisco to use the Windows binaries you compile.

@AvantiShri (Collaborator)

(Let me know if you are able to compile from source, and I will provide more detailed instructions on how to modify tfmodisco to use the binaries)

@akundaje

Av - I have a windows machine if you want to try it out.

@TommyJW (Author) commented Oct 18, 2018

Due to the deadlines I'm working with, I've already started building a Linux environment. Assuming that goes well, the current build will suit my needs. However, I wanted to open this issue to help track the need and raise its visibility.

@AvantiShri (Collaborator)

Thanks Tommy! If your datasets are not too big, I might also suggest working in a Colab notebook. There are many ways to get your data into a Colab notebook; one is to upload it to a publicly available server and then download it in the notebook with curl, e.g.:
!curl -L "https://drive.google.com/uc?authuser=0&id=file_id_goes_here&export=download" > scores_cnn_class1_test.h5

Here's an example where I run TF-MoDISco in a Colab notebook on SVM-derived importance scores: https://github.com/kundajelab/igsvm/blob/master/lsgkmexplain_NFE2.ipynb
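
For smaller files, another option is Colab's built-in upload widget, so you don't need to host the file anywhere (illustrative snippet; the filename in the comment is just an example):

# Run inside a Colab notebook cell: opens a browser file picker, saves the
# chosen files into the notebook's working directory, and returns a dict
# mapping each uploaded filename to its contents.
from google.colab import files

uploaded = files.upload()
print(list(uploaded.keys()))  # e.g. ['scores_cnn_class1_test.h5']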
