Refactor ObsDim out of LearnBase and update related interfaces #44
I also think the docs on this topic could use some love (which I am willing to do). Currently, they are hosted in MLDataPattern.jl. Would it make sense to create some docs for LearnBase.jl and add a forward pointer to the section in the MLDataPattern.jl docs? Just so that someone looking to implement this interface can find the correct descriptions.
Lovely @darsnack. Can you confirm that the code changes are mainly moving code from other.jl to observation.jl? I agree with you 💯 percent that we need more love in the docs here (#38). In my opinion, we should be concentrating the documentation efforts in LearnBase.jl as opposed to downstream packages.
Also, I would like to ask the other JuliaML members for permission to add @darsnack to the organization. He is putting a lot of effort into improving the ML ecosystem in Julia, and it would be nice to give him access to infrastructure setups, etc.
Yeah the code changes are removing the |
Since Also, I believe things will be clearer if we consistently make |
What about the case where someone wants to implement If we drop support for this, then the developer needs to implement Also, a non-keyword based function is required to support the above behavior, since we can't dispatch on the type of keyword arguments. |
As long as Also, the core logic in |
I do see your point. So would the change be to make the interface
Agreed. I created JuliaML/MLDataPattern.jl#50, which removes a lot of the excess code.
Yes, and all the complicated dispatch routes get simplified to |
Just a note on that change. If we use keyword arguments as the interface, then we can't dispatch on the type of LearnBase.getobs(data, idx; obsdim = default_obsdim(data)) = getobs(data, idx, obsdim) |
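The forwarding pattern in the comment above can be illustrated with a toy function. This is only a sketch: `describe_obsdim` is a hypothetical name chosen for the example and is not part of LearnBase.jl. Julia selects a method from the positional arguments alone, so the keyword method does nothing except pass `obsdim` along positionally, where its type can be dispatched on.

```julia
# Hypothetical toy function for illustration; not part of LearnBase.jl.

# Positional methods: these are the ones that can dispatch on the type of `obsdim`.
describe_obsdim(x, obsdim::Int) = "observations along dimension $obsdim"
describe_obsdim(x, ::Nothing) = "no observation dimension"

# Keyword method: keyword arguments take no part in dispatch,
# so it only forwards `obsdim` to the positional methods above.
describe_obsdim(x; obsdim = nothing) = describe_obsdim(x, obsdim)

describe_obsdim([1 2; 3 4]; obsdim = 2)  # dispatches on obsdim::Int
describe_obsdim("abc")                   # dispatches on ::Nothing
```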
Instead, we could dispatch on getobs(tup::Tuple, indices; obsdims::Tuple=default_obsdim(tup)) = map((x, d)->getobs(x, indices, d), tup, obsdims)
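A self-contained sketch of the tuple method quoted above might look like the following. The `default_obsdim` fallbacks and the array method are assumptions added here so that the snippet runs on its own; they stand in for whatever LearnBase.jl ends up defining.

```julia
# Stand-in definitions so the snippet runs on its own; assumptions, not LearnBase.jl source.
default_obsdim(x) = nothing
default_obsdim(x::AbstractArray) = ndims(x)
# Assumed tuple fallback: one default per element.
default_obsdim(tup::Tuple) = map(default_obsdim, tup)

# Arrays: slice along the given observation dimension.
getobs(x::AbstractArray, idx, obsdim::Int) = selectdim(x, obsdim, idx)

# Tuple method from the comment above: one obsdim per element of the tuple.
getobs(tup::Tuple, indices; obsdims::Tuple = default_obsdim(tup)) =
    map((x, d) -> getobs(x, indices, d), tup, obsdims)

X = [1 3 5; 2 4 6]   # 2 features × 3 observations
y = [10, 20, 30]     # 3 targets

xs, ys = getobs((X, y), 2:3)                        # default obsdims are (2, 1)
xt, yt = getobs((X', y), 2:3; obsdims = (1, 1))     # observations along the first dimension
```

Here the per-element `obsdims` tuple lets a feature matrix and a target vector use different observation dimensions in the same call.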
I have nothing of value to add here. You guys are doing a great job as far as I am concerned
@juliohm As far as I am concerned, I trust your judgment in these matters. Also, I just noticed you were a "member" of the org, so I upgraded your membership status, as you are as much an "owner" as I have ever been. Thank you for all that you do.
Just an update: I have ported the changes from this PR to MLLabelUtils. MLDataPattern is in progress. I have finished refactoring the portions of MLDataPattern that don't involve the targets. Now that MLLabelUtils is ready, I can refactor the targets. Hoping to get this done this week so I can stop having it hang around the back of my brain.
Although I haven't participated, I'm so glad to see the enormous progress on the Flux side.
What is missing here instead? Changes can be propagated with no hurry in downstream packages once we merge this PR and tag a major release |
I've merged #45 to migrate from Travis/AppVeyor to GitHub CI. Thanks @CarloLucibello for the update. I will try to close and reopen this PR to see if the GitHub Actions are triggered now.
Think I will need to rebase for CI to trigger correctly? @CarloLucibello no changes left, but since this is just an interface, we can't test whether it will work without refactoring the downstream packages. |
c952f4b to 64b73b7
This is in response to #41 and #42. Based on discussions, I removed `ObsDim` from LearnBase. Correspondingly, I propose we update the interfaces that relied on it to use an `obsdim` keyword argument instead. Here's a summary of what that means:

- `getobs(x, idx, ::ObsDim.Undefined)` => `getobs(x, idx, obsdim::Nothing)`: LearnBase.jl now contains the default routing that maps `getobs(x, idx, ::Nothing)` to `getobs(x, idx)`. A developer implementing the interface can still choose to only implement `getobs(x, idx)` if the observation dimension is nonsensical.
- `getobs(x, idx; obsdim = default_obsdim(x))`: This is the keyword version of `getobs`. By default, `default_obsdim(x) = nothing`, and for arrays `default_obsdim(x) = ndims(x)` (in line with popular packages like Flux.jl). Most importantly, the docs will make it clear that the non-keyword argument functions are the ones to implement. This keyword argument version is for end-user convenience.
- `getobs(x, obsdim)`: This is no longer going to be supported. It conflicts for dispatch with `getobs(x, idx)`, and the routing logic required to make multiple dispatch work is not worth it in my opinion. I believe this function and the corresponding `getobs(x)` existed for cases where fetching all observations at once is more efficient than fetching them one at a time. I think we can still support this optimization by dispatching on `getobs(x, 1:nobs(x))`, since `1:nobs(x) isa UnitRange`.
- `getobs(x)`: Same as above.

Additionally, `nobs` is now restricted to `StatsBase`. For example, in MLDataPattern.jl, I implement `StatsBase.nobs` instead of `LearnBase.nobs`.

I think that covers the bulk of the changes. I've done some crude REPL regression testing to make sure that these changes don't cause performance penalties for common use cases like data stored in arrays.
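As a rough sketch of how the pieces summarized above could fit together (stand-in definitions written for this example, not the actual LearnBase.jl source): the `Lines` container is hypothetical, and the local `nobs` is a placeholder for `StatsBase.nobs` so that the snippet has no package dependencies.

```julia
# Stand-in definitions for illustration; not the LearnBase.jl source.

# No observation dimension unless a container opts in.
default_obsdim(x) = nothing
# Arrays default to their last dimension.
default_obsdim(x::AbstractArray) = ndims(x)

# Default routing: an `obsdim` of `nothing` falls back to the two-argument method.
getobs(x, idx, ::Nothing) = getobs(x, idx)

# Keyword convenience method for end users; it only forwards positionally.
getobs(x, idx; obsdim = default_obsdim(x)) = getobs(x, idx, obsdim)

# Arrays: slice along the observation dimension.
getobs(x::AbstractArray, idx, obsdim::Int) = selectdim(x, obsdim, idx)

# Hypothetical container for which an observation dimension is nonsensical.
struct Lines
    items::Vector{String}
end
nobs(d::Lines) = length(d.items)  # placeholder for StatsBase.nobs

# The implementer writes only the two-argument method...
getobs(d::Lines, idx) = d.items[idx]
# ...and may add a bulk path for contiguous ranges such as 1:nobs(d).
getobs(d::Lines, idx::UnitRange) = view(d.items, idx)

A = [1 2 3; 4 5 6]
col = getobs(A, 2)              # default obsdim is ndims(A) == 2, so column 2
row = getobs(A, 1; obsdim = 1)  # row 1

d = Lines(["a", "b", "c"])
one = getobs(d, 2, nothing)     # routed to getobs(d, 2)
all_obs = getobs(d, 1:nobs(d))  # hits the UnitRange method
```

Note that in this sketch the keyword form is only exercised on arrays; a type that defines its own two-argument `getobs` without keywords would not accept `obsdim` as a keyword.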
cc @juliohm @Evizero @oxinabox @racinmat