Weighted model #113

Open · wants to merge 12 commits into master

Conversation

michalk8

Hi @davidsebfischer,
I've started working on the weighted model based on your notes (thanks a lot) and I think I've gotten most of the stuff in numpy right, though I haven't tested this yet.

Questions:

  • how shall closedform_glm_mean and closedform_glm_scale be modified to include weights?

Todos:

  • check if the propagation of weights is correct (see the sketch after this list)
  • write tests (not only for NB)
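
As a reference for the first todo, the propagation I have in mind is simply scaling each observation's log-likelihood contribution by its weight; a minimal numpy sketch (the function name, argument names and the phi parameterisation below are placeholders, not the existing batchglm API):

```python
import numpy as np
from scipy.special import gammaln

def weighted_nb_ll(x, mu, phi, weights):
    """Weighted negative binomial log-likelihood (minimal sketch).

    x, mu, phi: arrays broadcastable to (observations x features) with counts,
    means and inverse-dispersion (size) parameters; weights: (observations,).
    Each observation's contribution is scaled by its weight before summation.
    """
    ll = (
        gammaln(x + phi) - gammaln(x + 1) - gammaln(phi)
        + phi * np.log(phi / (phi + mu))
        + x * np.log(mu / (phi + mu))
    )
    # propagate weights: scale per-observation contributions, then sum per feature
    return np.sum(weights[:, None] * ll, axis=0)
```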

Todos to discuss:

  • remove dead code
  • typing
  • implement the rest of the models so that they match the TF backend
  • numba - are there any bottlenecks/how does the performance compare to e.g. tf/dask or using sparse?
  • software engineering stuff: pre-commit, travis, black - what's the min. python version you support?

My recommendation for now would be to keep this PR to just the weighted NB model in numpy + tests + SWE stuff; the rest could go into 2 separate PRs (1 for TF, 1 for the rest of the models and numba, if desired/needed).

Related CellRank issue: theislab/cellrank#377

P.S. I like the repo's structure (i.e. the api/external/pkg_constants split - it nicely avoids cyclic imports). The only ugly thing is that you have to import .api.

michalk8 commented on the diff:

        np.einsum('ob,of->fob', xh, w),
        xh
    )
    w = self.jac_weight_b_j(j=j)  # (observations x features)

This seemed like a bug, so I've changed it (I think this is one of the unused functions, based on what PyCharm told me).
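
For reference, a toy example of what that einsum produces (the shapes and values below are made up for illustration; only the einsum signature comes from the diff):

```python
import numpy as np

# toy shapes: 5 observations, 3 model parameters, 4 features
xh = np.random.rand(5, 3)   # design/constraint matrix, (observations x params)
w = np.random.rand(5, 4)    # jacobian weights, (observations x features)

# 'ob,of->fob': for each feature f, scale the rows of xh by that
# feature's per-observation weights
res = np.einsum('ob,of->fob', xh, w)
assert res.shape == (4, 5, 3)
assert np.allclose(res[0], xh * w[:, [0]])
```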


michalk8 commented Sep 23, 2020

Hi @davidsebfischer
I was wondering about the following:

  • is there a reason why the estimator always needs dask (the design_loc is always a dask array)?
  • when the data matrix is sparse and as_dask=False, it gets broken (see https://github.com/theislab/batchglm/pull/113/files#diff-16f723c756df0788db6b339a29976b1fR196)

I think both of these are very related and could be solved essentially by doing this: https://github.com/theislab/batchglm/pull/113/files#diff-2a60c6b6af3d9ce6c634d470674d6a55R49 (just remove the True and replace the compute() calls with "if dask, then compute"; see the sketch below).
Or, for sparse matrices, only allow dask (I already check for that).
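
By "if dask, then compute" I mean a small helper roughly like this (a sketch only; _maybe_compute is a hypothetical name, not something that exists in the repo yet):

```python
import dask.array as da
import numpy as np

def _maybe_compute(arr):
    """Materialise a dask array, pass anything else through unchanged.

    Hypothetical helper: lets the same code path handle both dask-backed and
    plain numpy/sparse inputs instead of calling .compute() unconditionally.
    """
    return arr.compute() if isinstance(arr, da.Array) else arr

# usage
x = da.ones((10, 4), chunks=(5, 4))
y = np.ones((10, 4))
assert isinstance(_maybe_compute(x), np.ndarray)
assert _maybe_compute(y) is y
```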

@davidsebfischer

how shall closedform_glm_mean and closedform_glm_scale be modified to include weights?

Sorry, I did not see this question before. closedform_glm_mean should still be possible with weighting, if the weights are applied to the observations in this linear system? closedform_glm_scale is method of moments, so this should also work. In the worst case you can always disable these at first when weights are used and default to easier initialisations.
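
Concretely, the weighting I have in mind for these closed forms is along these lines (a sketch only; the real closedform_glm_mean/closedform_glm_scale work on the design matrix and link space rather than on a precomputed grouping, and the helper names below are placeholders):

```python
import numpy as np

def weighted_group_mean(x, weights, groups):
    """Weighted per-group mean, one value per (group, feature).

    x: (observations x features), weights: (observations,),
    groups: (observations,) integer group labels from the location design.
    """
    n_groups = groups.max() + 1
    mu = np.zeros((n_groups, x.shape[1]))
    for g in range(n_groups):
        mask = groups == g
        w = weights[mask]
        mu[g] = (w[:, None] * x[mask]).sum(axis=0) / w.sum()
    return mu

def weighted_mom_scale(x, mu, weights, groups):
    """Weighted method-of-moments NB dispersion per (group, feature)."""
    n_groups = groups.max() + 1
    phi = np.zeros((n_groups, x.shape[1]))
    for g in range(n_groups):
        mask = groups == g
        w = weights[mask][:, None]
        var = (w * (x[mask] - mu[g]) ** 2).sum(axis=0) / w.sum()
        # NB: var = mu + mu^2 / phi  =>  phi = mu^2 / (var - mu)
        phi[g] = mu[g] ** 2 / np.maximum(var - mu[g], 1e-8)
    return phi
```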

P.S. I like the repo's structure (i.e. the api/external/pkg_constants split - it nicely avoids cyclic imports). The only ugly thing is that you have to import .api.

definitely also open to suggestions here!

is there a reason why the estimator always needs dask (the design_loc is always a dask array)?

we can change that!

when the data matrix is sparse and as_dask=False, it gets broken (see https://github.com/theislab/batchglm/pull/113/files#diff-16f723c756df0788db6b339a29976b1fR196)
I think both of these are very related and could be solved essentially by doing this: https://github.com/theislab/batchglm/pull/113/files#diff-2a60c6b6af3d9ce6c634d470674d6a55R49 (just remove the True and replace the compute() calls with "if dask, then compute").
Or for sparse matrices, only allow dask (I already check for that).

cool, we can definitely do that!

@michalk8

we can change that!
cool, we can definitely do that!

Ok, I've started with typing + defining some utility functions. There are 4-5 more things I'd like to do:

  • I'd also like to remove the sys.stdout writes (in favour of print/logging; see the sketch after this list)
  • f-strings (there's a backport for Python3.6 https://github.com/asottile/future-fstrings)
  • enable pre-commit (including black formatting, if you don't mind it)
  • later down the line: docstring polishing (some arguments are missing, style is inconsistent [e.g. use numpy-style])
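
For the sys.stdout point, what I mean concretely is a plain logging call (a minimal sketch; the logger name and the helper below are placeholders):

```python
import logging

logger = logging.getLogger("batchglm")  # placeholder logger name

def report_progress(iteration: int, ll: float) -> None:
    # instead of e.g. sys.stdout.write("iteration %i, ll: %f\n" % (iteration, ll))
    logger.info("iteration %d, ll: %f", iteration, ll)
```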

Do you have any tests locally so that I might check if things are breaking? If not, I will start with some simple unit tests.

@davidsebfischer

* I'd also like to remove the sys.stdout (in favour of print/log)

Ok for me, but then we have to do this consistently across the package in this PR.

* f-strings (there's a backport for Python3.6 https://github.com/asottile/future-fstrings)

Yes, I have also thought about this before - great!

* enable pre-commit (including black formatting, if you don't mind it)

Nice, let's try!

* later down the line: docstring polishing (some arguments are missing, style is inconsistent [e.g. use numpy-style])

Yes, we can build this up as we go; this is also a work in progress for internal functions!

Do you have any tests locally so that I might check if things are breaking? If not, I will start with some simple unit tests.

I haven't automated running these, but I have tests for most functionalities; we could set up continuous integration if you want - happy to spend some time doing that! Important tests for example concern accuracy of the parameter estimates: https://github.com/theislab/batchglm/blob/master/batchglm/unit_test/test_acc_glm_all_numpy.py
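
In the same spirit, a simplified sketch of such an accuracy test (the class name and the simulated data below are placeholders; the real tests use the package's simulator and estimator classes rather than a hand-rolled weighted mean):

```python
import numpy as np
import unittest

class TestWeightedNbAccuracy(unittest.TestCase):
    def test_weighted_mean_recovery(self):
        """Weighted estimates should recover the simulated parameters."""
        rng = np.random.RandomState(0)
        mu_true, phi_true = 8.0, 4.0
        # simulate NB counts with mean mu_true and size phi_true
        x = rng.negative_binomial(n=phi_true,
                                  p=phi_true / (phi_true + mu_true),
                                  size=(2000, 1))
        weights = rng.uniform(0.5, 1.5, size=2000)

        # weighted mean as a stand-in for the fitted location parameter
        mu_hat = np.sum(weights[:, None] * x, axis=0) / weights.sum()
        np.testing.assert_allclose(mu_hat, mu_true, rtol=0.1)

if __name__ == "__main__":
    unittest.main()
```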
