Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

code style #227

Open
CarloLucibello opened this issue Apr 24, 2023 · 7 comments
Open

code style #227

CarloLucibello opened this issue Apr 24, 2023 · 7 comments

Comments

@CarloLucibello
Copy link
Member

CarloLucibello commented Apr 24, 2023

I'm having a hard time understanding how a simple model like ResNet(18) is created.

I think in general for a library like this it would be good to favor

  • simplicity
  • readability
  • locality

at the cost of some code duplication and verbosity. This is basically the philosophy adopted by hugging face transformers.
So each model should have its own file and define its own submodules.

@darsnack
Copy link
Member

This is exactly the opposite approach of what we set out to do with the Metalhead re-write in 0.6. Copying and pasting a bottleneck block over and over is good for pedagogy or examples like in the model zoo, but it isn't great software practice. If we saw that level of repetition in a codebase, someone would end up writing tooling or a framework to generalize that.

There's a balance between HF transformers and generalizing to the point of confusion. Perhaps we've gone too far in one direction, but right now I think code organization/naming and documentation are what's keeping the current codebase from being clear to fresh eyes.

@darsnack
Copy link
Member

We made an effort to separate the building blocks from the models. But maybe we can reorganize to have the building block files (mbconv.jl, etc.) live right next to the model file that uses them. So still following the model + sub-module organization that Carlo suggested above.

@theabhirath
Copy link
Member

I think most of the readability question can easily be addressed by good docs. I was short on time and secretly hoping that Pollen would be more mature by the time I could find a block to work on this, but it looks like that might take a while longer. I am busy for a little bit more but over the course of this summer I hope to polish the docs, make other improvements and get a proper v0.8 release in.

@darsnack
Copy link
Member

It's not all on you. I will make time this week to do these cleanup steps so that additional contributors (like Carlo) can help, and we can release 0.8 soon (it is way overdue).

@CarloLucibello
Copy link
Member Author

now that I'm more familiar with the library I can see the logic of it

@CarloLucibello
Copy link
Member Author

CarloLucibello commented Apr 27, 2023

Reopening with another stylistic comment. Models here rely a lot on Chain, Parallel and SkipConnection instead of defining custom blocks and their forward pass. This is fine, makes the implementation concise, but as a byproduct the instantiated model becomes very hard to "navigate" by field access.

As an example, below I access a specific layer inside two equivalent implementations of ViT, one from here and the other from torchvision:

julia> pymodel.encoder.layers[0].ln_1
Python LayerNorm: LayerNorm((768,), eps=1e-06, elementwise_affine=True)

julia> jlmodel.layers[1][5][1][1].layers[1]
LayerNorm(768)      # 1_536 parameters

@darsnack
Copy link
Member

darsnack commented Apr 27, 2023

One thing we want to do but have not done yet is to use the named syntax for Chain and Parallel. So we should have

Chain(resblock0 = Parallel(..., residual = ..., bottleneck = ...), ...)

Similar to Flax but without defining new types or monolith types that "do it all." Part of the reason that syntax got added to Flux is to address this style issue in Metalhead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants