
Reorganize codebase #130

Merged 25 commits from srcrefactor into master on Apr 20, 2020

Conversation

@juliohm (Member) commented Apr 13, 2020

This PR moves source code around to improve the organization of concepts in the package. This exercise has already helped me identify various points of improvement, such as the implementation of the meta losses "scaled", "ordinal" and "weightedbinary".

There is a single breaking change here, which is the removal of the OrdinalHingeLoss type alias. I don't think we should export this specific combination. Users can still construct OrdinalMarginLoss{HingeLoss} as with other losses.

@juliohm (Member, Author) commented Apr 13, 2020

@joshday I know this PR has a lot of changes to review, but I would appreciate it if you could take a quick look at it before we proceed with additional improvements. Below I summarize the changes at a high level so that you don't waste too much time reading the code. I can guarantee that the tests pass locally as before, and that this is mainly a reorganization of the source code.

Reorganization

Previously the source file supervised.jl was very large. It contained a set of traits for the abstract types SupervisedLoss, DistanceLoss and MarginLoss, multiple docstrings for the various methods of value, value!, deriv, deriv!, deriv2, and deriv2!, and implementations for the various aggregation modes None, Sum, Mean, WeightedSum and WeightedMean.

My changes split this file into several small files of digestible size, including separate files for the different aggregation modes. I've also eliminated docstrings that were repeated across different methods of the same function. The plan is to move these docstrings to the LearnBase.jl interface package.

Upcoming changes

The present PR is self-contained. After it is merged, I have a few more changes planned in LearnBase.jl, following JuliaML/LearnBase.jl#35, before I can continue working on LossFunctions.jl.

I've started reorganizing LearnBase.jl to facilitate future contributions. The new codebase will be split into small files for the different concepts. In particular, I've already created a costs.jl file and migrated all the docstrings from LossFunctions.jl there (in my local repo; a sketch of the pattern follows the list below). I've also noticed that some traits have never been implemented here, which is a sign that those traits could be removed from the interface temporarily. My immediate next steps after this PR is merged are:

  1. Address "Refactoring of codebase" (JuliaML/LearnBase.jl#35)
  2. Refactor LossFunctions.jl again to use the new LearnBase.jl
  3. Add support for CategoricalArrays.jl in LossFunctions.jl
  4. Come back to writing this paper that depends on these PRs :)
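
As mentioned above, here is a sketch of the docstring-migration pattern I have in mind for costs.jl in LearnBase.jl: the interface package declares the generic function and carries the user-facing docstring, while LossFunctions.jl only adds methods. The docstring text and the commented method below are illustrative only, not the exact contents of the file.

# In LearnBase.jl (costs.jl): declare the generic function and document it
"""
    value(loss, target, output)

Return the value of `loss` at the given `target` and predicted `output`.
"""
function value end

# In LossFunctions.jl: only add methods for the concrete losses, e.g. (illustrative)
# LearnBase.value(loss::L2DistLoss, target::Number, output::Number) = abs2(output - target)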

@juliohm requested review from joshday and removed the request for oxinabox (April 13, 2020 22:11)
@johnnychen94 (Member) commented:

The documentation failure can be fixed by removing docs/Manifest.toml; it locks a different UUID for LossFunctions (ref: ad125ad)

@juliohm (Member, Author) commented Apr 15, 2020

Thanks, I will take a look at it today. Basically I need to update Manifest.toml to dev the parent folder (] dev ..). I think I had used the UUID from LearnBase.jl in a previous commit, but that is fixed now, thanks to JuliaRegistrator, which pointed out the issue.
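
For reference, the sequence from inside the docs folder, after deleting the stale Manifest.toml, is roughly the following (standard Pkg REPL commands, nothing specific to this package):

] dev ..
] instantiate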

@juliohm (Member, Author) commented Apr 17, 2020

This PR is finally ready after a great amount of refactoring in LearnBase.jl and here. There are a number of code simplifications that I would like to share with you before we proceed and merge.

Improvements

  • The interface for all types of losses is now fully defined in LearnBase.jl. This means that the docstrings for end users (e.g. value, deriv, deriv2) all live there. LossFunctions.jl now only implements the concepts, and adds methods which may depend on additional packages (e.g. GPU implementations). This also means that the concept of AggregationMode and the submodule AggMode now live in LearnBase.jl for reuse in other implementation packages like PenaltyFunctions.jl (see the usage sketch after this list).

  • The source code is now organized with the different concepts in mind so that potential new contributors can navigate it more easily. In particular, we have a clear separation between loss types and meta-loss types. This refactoring also helped identify source files that need further work, such as sparse.jl, which currently only contains a single method for margin-based losses with sparse arrays.
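
As referenced above, here is a minimal usage sketch of the interface after this PR. It only uses functions and losses already exported by the packages (value, deriv, deriv2, HingeLoss, L2DistLoss), and I'm assuming the (loss, target, output) argument order used in the existing docstrings.

using LossFunctions

margin = HingeLoss()     # margin-based loss
dist   = L2DistLoss()    # distance-based loss

value(margin, 1.0, 0.7)  # loss value for target 1.0 and predicted output 0.7
deriv(dist, 1.0, 0.7)    # first derivative with respect to the output
deriv2(dist, 1.0, 0.7)   # second derivative with respect to the output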

Breaking

  • Scaled losses are now a single family, as opposed to multiple types such as ScaledDistanceLoss, ScaledMarginLoss, etc. The function scaled(l, a) has been deprecated in favor of the simple constructor ScaledLoss(l, a), or alternatively the a * l syntax (see the sketch after this list).

  • Weighted margin losses are now constructed with WeightedMarginLoss(l, w). Previously we had a WeightedBinaryLoss type with some typos in the constructors that allowed weighting non-margin-based losses. We also had a free-floating function weightedloss(l, w), which has been deprecated in favor of the constructor for consistency.

  • The value_deriv function has been deprecated in favor of the simple value and deriv pair. This small optimization was not used by any downstream package, and it increased the maintenance burden. I've also deprecated the methods of value, deriv and deriv2 that accepted a single array of precomputed "agreements", like value(l, rand(100)). These can be useful for internal optimizations, and we can always reintroduce them internally if we find the need. End users will rarely precompute margins like yt = y.*t and then call the methods above; they will more likely use the non-allocating versions such as value(l, t, y, AggMode.Mean()).
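
Concretely, a short sketch of the new constructors and calls described in this list (the numeric values are arbitrary; the names are exactly the ones mentioned above):

using LossFunctions

base = HingeLoss()

# scaled losses: one family, via the constructor or the a * l syntax
s1 = ScaledLoss(base, 2)
s2 = 2 * base

# weighted margin losses via the constructor
w = WeightedMarginLoss(base, 0.3)

# instead of value_deriv or precomputed agreements, evaluate on targets and
# outputs directly, optionally with an aggregation mode
t = rand([-1.0, 1.0], 100)   # targets
y = randn(100)               # predicted outputs
value(base, t, y, AggMode.Mean())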

Other functions that could perhaps be deprecated are the mutating versions value!, deriv!, and deriv2!. I didn't deprecate them in this PR because that requires more thought. I understand that we can always achieve the same non-allocating result with broadcasting, @. buffer = value(l, t, y), for normal arrays, but I am not sure whether this line would also work with sparse arrays, for example. Because I don't have enough experience with sparse arrays in Julia, I decided not to touch these functions and to keep them for now.
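
For completeness, this is the broadcasting alternative I have in mind for dense arrays; whether it behaves sensibly for sparse inputs is exactly the open question above.

using LossFunctions

l = L2DistLoss()
t = randn(100)        # targets
y = randn(100)        # predicted outputs
buffer = similar(y)

# element-wise evaluation into a preallocated buffer, as quoted above
@. buffer = value(l, t, y)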

I've also updated the documentation to reflect the latest version of the code. In order to build it successfully, you need to get the master version of LearnBase.jl from inside the docs folder:

] add LearnBase#master
] instantiate
include("make.jl")

I'd appreciate it if you could take a second look at this PR. The plan is to merge it by the end of the weekend. Meanwhile I will be working on adding support for CategoricalArrays.jl, which is the last item needed for the paper I am working on.

@juliohm merged commit b0763e9 into master on Apr 20, 2020
@juliohm deleted the srcrefactor branch on April 20, 2020 12:03