Add SimVP model computational graph #1549


Closed
wants to merge 14 commits

Conversation

easyeasydev
Collaborator

@easyeasydev easyeasydev commented Dec 5, 2024

Description of changes:

This PR is to add the computational graph of the SimVP model.

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

This change is Reviewable


codecov bot commented Dec 5, 2024

Codecov Report

Attention: Patch coverage is 77.45098% with 23 lines in your changes missing coverage. Please review.

Project coverage is 60.59%. Comparing base (3ee5e48) to head (7d39b8b).

Files with missing lines Patch % Lines
lib/models/src/models/simvp/simvp.cc 77.45% 23 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1549      +/-   ##
==========================================
+ Coverage   60.47%   60.59%   +0.11%     
==========================================
  Files         606      607       +1     
  Lines       14725    14827     +102     
==========================================
+ Hits         8905     8984      +79     
- Misses       5820     5843      +23     
Flag Coverage Δ
unittests 60.59% <77.45%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
lib/models/src/models/simvp/simvp.cc 77.45% <77.45%> (ø)

@lockshaw lockshaw changed the base branch from repo-refactor to master December 16, 2024 08:35
Collaborator

@lockshaw lockshaw left a comment


Have you double checked the model using the model viewer to make sure everything looks as expected?

Also, there need to be a lot more comments/links referencing the original code--the goal with these is to make it trivial for someone else a year or two in the future to look through the code without you and easily double check everything against the reference implementation without having to track a whole bunch of stuff down. Ideally also some assertions on tensor shapes.

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @easyeasydev and @reyna-abhyankar)


lib/models/src/models/simvp/simvp.cc line 24 at r1 (raw file):

std::vector<bool> create_simvp_samplings(size_t N_S, bool reverse) {
  size_t N_S_even_floor = (N_S / 2) * 2;

Pull out a separate function named round_down_to_nearest_even to improve readability

Code quote:

 size_t N_S_even_floor = (N_S / 2) * 2;

lib/models/src/models/simvp/simvp.cc line 26 at r1 (raw file):

  size_t N_S_even_floor = (N_S / 2) * 2;

  auto const change_to_true = [&](size_t idx) -> bool {

Is this actually the name used in the reference implementation? It seems rather weird.

Code quote:

change_to_true

lib/models/src/models/simvp/simvp.cc line 37 at r1 (raw file):

  }

  return samplings;

In all of these PRs, use functions from utils/containers more frequently to improve readability.

Suggestion:

  return transform(range(N_S_even_floor), change_to_true);

lib/models/src/models/simvp/simvp.cc line 43 at r1 (raw file):

                                  SimVPConfig const &config,
                                  tensor_guid_t const &input,
                                  size_t in_dim,

Prefer int over size_t


lib/models/src/models/simvp/simvp.cc line 68 at r1 (raw file):

                         SimVPConfig const &config,
                         tensor_guid_t const &input) {
  size_t C = config.in_shape.at(1); // Channel

It seems like in_shape has a fixed number of dimensions, so it would probably be better to use four named fields (or even an additional struct) rather than a vector

Code quote:

  size_t C = config.in_shape.at(1); // Channel

lib/models/src/models/simvp/simvp.cc line 81 at r1 (raw file):

  tensor_guid_t latent = enc1;

  for (size_t i = 1; i < samplings.size(); i++) {

Suggestion:

  for (bool sampling : subvec(samplings, 1, std::nullopt)) {

lib/models/src/models/simvp/simvp.cc line 88 at r1 (raw file):

                                 config.hid_S,
                                 config.spatio_kernel_enc,
                                 samplings[i],

Suggestion:

                                 samplings.at(i)

lib/models/src/models/simvp/simvp.cc line 105 at r1 (raw file):

                                        float drop_path,
                                        float init_value) {
  return input;

Seems to be missing an actual implementation?


lib/models/src/models/simvp/simvp.cc line 134 at r1 (raw file):

                                      float drop,
                                      float drop_path) {
  if (config.model_type != "gSTA") {

Prefer a dtgen enum instead of a string for the model type


lib/models/src/models/simvp/simvp.cc line 147 at r1 (raw file):

  // Downsample
  z = create_simvp_gsta_meta_block(
      cgb, config, z, channel_in, channel_hid, mlp_ratio, drop, drop_path);

Add argument name comments for all invocations of these many-argument functions for readability

Suggestion:

  z = create_simvp_gsta_meta_block(
      /*cgb=*/cgb, 
      /*config=*/config, 
      /*input=*/z, 
      /*in_channels=*/channel_in, 
      /*out_channels=*/channel_hid, 
      /*mlp_ratio=*/mlp_ratio, 
      /*drop=*/drop, 
      /*drop_path=*/drop_path);

lib/models/src/models/simvp/simvp.cc line 150 at r1 (raw file):

  // Middle layers
  for (size_t i = 1; i < config.N_T - 1; i++) {

Suggestion:

  for (int i : range(1, config.N_T - 1)) {

lib/models/src/models/simvp/simvp.cc line 153 at r1 (raw file):

    z = create_simvp_gsta_meta_block(
        cgb, config, z, channel_hid, channel_hid, mlp_ratio, drop, drop_path);
  }

Considering that this f(f(f(f(...f(x_0)...)))) pattern keeps showing up in model definitions, it might be nice to pull it out into a separate function in utils/containers named something like "primitive_recurse" or "recurse_n" or something. Thoughts? I'm sure @Marsella8 would be happy to add such a function

Code quote:

  for (size_t i = 1; i < config.N_T - 1; i++) {
    z = create_simvp_gsta_meta_block(
        cgb, config, z, channel_hid, channel_hid, mlp_ratio, drop, drop_path);
  }

lib/models/src/models/simvp/simvp.cc line 168 at r1 (raw file):

  std::cout << "hid shape: " << cgb.get_shape(hid) << std::endl;
  std::cout << "skip shape: " << cgb.get_shape(skip) << std::endl;

Remove prints.

Also, I assume if you were looking at the shapes then you know what they should be, so you should add in asserts on the tensor shapes where possible to improve readability and reduce the possibility of bugs

Code quote:

  std::cout << "hid shape: " << cgb.get_shape(hid) << std::endl;
  std::cout << "skip shape: " << cgb.get_shape(skip) << std::endl;

lib/models/src/models/simvp/simvp.cc line 174 at r1 (raw file):

  tensor_guid_t latent = hid;
  for (size_t i = 0; i < samplings.size() - 1; i++) {

Suggestion:

  for (bool sampling : subvec(samplings, 0, samplings.size() - 1)) {

lib/models/src/models/simvp/simvp.cc line 192 at r1 (raw file):

                                          config.spatio_kernel_dec,
                                          false,
                                          samplings[samplings.size() - 1]);

Suggestion:

                                          samplings.back());

lib/models/src/models/simvp/simvp.cc line 207 at r1 (raw file):

  size_t W = config.in_shape.at(3); // Width

  // std::cout << "B T C H W: " << B << " " << T << " " << C << " " << H << " "

Remove prints


lib/models/src/models/simvp/simvp.cc line 219 at r1 (raw file):

  auto [embed, skip] = create_simvp_encoder(cgb, config, input);

  // std::cout << "embed shape: " << cgb.get_shape(embed) << std::endl;

Remove prints, add assertions on shape


lib/models/src/models/simvp/simvp.cc line 232 at r1 (raw file):

                                              config.drop_path);

  // TODO: need to reshape hid here

What's the plan for all of these TODOs? Are you waiting on something else to be implemented?


lib/models/include/models/simvp/simvp_config.struct.toml line 28 at r1 (raw file):

[[fields]]
name = "batch_size"
type = "size_t"

Prefer int over size_t


lib/models/include/models/simvp/simvp_config.struct.toml line 43 at r1 (raw file):

[[fields]]
name = "N_T"

Let's keep variable names lower-case

Suggestion:

name = "n_t"

lib/models/include/models/simvp/simvp.h line 18 at r1 (raw file):

std::vector<bool> create_simvp_samplings(size_t N_S, bool reverse = false);

tensor_guid_t create_simvp_convsc(ComputationGraphBuilder &cgb,

Would be nice to have links here in the docstrings to wherever the equivalent code in OpenSTL is for each of these functions

Contributor

@Marsella8 Marsella8 left a comment


Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @easyeasydev, @lockshaw, and @reyna-abhyankar)


lib/models/src/models/simvp/simvp.cc line 153 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Considering that this f(f(f(f(...f(x_0)...)))) pattern keeps showing up in model definitions, it might be nice to pull it out into a separate function in utils/containers named something like "primitive_recurse" or "recurse_n" or something. Thoughts? I'm sure @Marsella8 would be happy to add such a function

See #1563

Collaborator Author

@easyeasydev easyeasydev left a comment


Reviewable status: 2 of 6 files reviewed, 21 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)


lib/models/include/models/simvp/simvp.h line 18 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Would be nice to have links here in the docstrings to wherever the equivalent code in OpenSTL is for each of these functions

Added the links to docstrings.


lib/models/include/models/simvp/simvp_config.struct.toml line 28 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer int over size_t

Done


lib/models/include/models/simvp/simvp_config.struct.toml line 43 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Let's keep variable names lower-case

This was actually to match the original implementation. Do we want to follow the OpenSTL convention, or change it here?


lib/models/src/models/simvp/simvp.cc line 24 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Pull out a separate function named round_down_to_nearest_even to improve readbility

Done.


lib/models/src/models/simvp/simvp.cc line 26 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Is this actually the name used in the reference implementation? It seems rather weird.

Hmm, actually this is a name I chose; this function was a bit nonintuitive in the original implementation.

I have changed this function name to change_to_true_at_idx


lib/models/src/models/simvp/simvp.cc line 37 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

In all of these PRs use functions from utils/containers more frequently to improve readbility.

TBH, I worry this function may be a bit harder to understand after converting to the transform call. But I have updated it, and it is definitely more concise.


lib/models/src/models/simvp/simvp.cc line 43 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer int over size_t

Could you remind me why we try to avoid size_t in FlexFlow? I remember there's some limitation in the FF codebase, right?


lib/models/src/models/simvp/simvp.cc line 68 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

It seems like in_shape has a fixed number of dimension, so it would probably be better to use four named fields (or even an additional struct) rather than a vector

This was mainly to mimic the original implementation. I can change this if we really want. https://github.com/chengtan9907/OpenSTL/blob/b658dab3da427c8750c8595316e7ae9d70b818df/examples/tutorial.ipynb


lib/models/src/models/simvp/simvp.cc line 105 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Seems to be missing an actual implementation?

Hmm, missed this somehow. Will add this.


lib/models/src/models/simvp/simvp.cc line 134 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer an dtgen enum instead of a string for the model type

Done.


lib/models/src/models/simvp/simvp.cc line 147 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Add argument name comments for all invocations of these many-argument functions for readability

Done.


lib/models/src/models/simvp/simvp.cc line 168 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Remove prints.

Also, I assume if you were looking at the shapes then you know what they should be, so you should add in asserts on the tensor shapes where possible to improve readability and reduce the possibility of bugs

This function looks like this because we may need a reshape op for this model. Without a reshape op, I wasn't sure how to implement this the same way as the reference implementation.


lib/models/src/models/simvp/simvp.cc line 207 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Remove prints

Same as above; leaving this due to the missing reshape op.


lib/models/src/models/simvp/simvp.cc line 219 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Remove prints, add assertions on shape

Same as above; leaving this due to the missing reshape op.


lib/models/src/models/simvp/simvp.cc line 232 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What's the plan for all of these TODOs? Are you waiting on something else to be implemented?

Same as above; leaving this due to the missing reshape op.


lib/models/src/models/simvp/simvp.cc line 81 at r1 (raw file):

  tensor_guid_t latent = enc1;

  for (size_t i = 1; i < samplings.size(); i++) {

Wasn't even aware of this subvec function before...

But for this case, I think changing to subvec actually makes the code a bit more confusing. The explicit loop clearly indicates what it is doing, whereas people may need some effort to understand what subvec really does.

But I can still change this if you think subvec is better (by mimicking the Python style).


lib/models/src/models/simvp/simvp.cc line 88 at r1 (raw file):

                                 config.hid_S,
                                 config.spatio_kernel_enc,
                                 samplings[i],

Done.


lib/models/src/models/simvp/simvp.cc line 150 at r1 (raw file):

  // Middle layers
  for (size_t i = 1; i < config.N_T - 1; i++) {

Done.


lib/models/src/models/simvp/simvp.cc line 174 at r1 (raw file):

  tensor_guid_t latent = hid;
  for (size_t i = 0; i < samplings.size() - 1; i++) {

Similar to the above, I suspect subvec will make this code harder to understand, but I can change it if we do want subvec.


lib/models/src/models/simvp/simvp.cc line 192 at r1 (raw file):

                                          config.spatio_kernel_dec,
                                          false,
                                          samplings[samplings.size() - 1]);

Done.

Collaborator

@reyna-abhyankar reyna-abhyankar left a comment


Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 26 unresolved discussions (waiting on @easyeasydev and @lockshaw)


lib/models/include/models/simvp/simvp_config.struct.toml line 28 at r1 (raw file):

Previously, easyeasydev wrote…

Done

We now have a nonnegative_int type that should be used over int/size_t whenever applicable, so you can use that instead.


lib/models/src/models/simvp/simvp.cc line 68 at r1 (raw file):

Previously, easyeasydev wrote…

This was mainly to mimic the original implementation. I can change this if we really want. https://github.com/chengtan9907/OpenSTL/blob/b658dab3da427c8750c8595316e7ae9d70b818df/examples/tutorial.ipynb

How about using TensorShape (since that's actually what in_shape is)?


lib/models/src/models/simvp/simvp.cc line 105 at r1 (raw file):

Previously, easyeasydev wrote…

Hmm, missed this somehow. Will add this.

Still missing


lib/models/src/models/simvp/simvp.cc line 134 at r1 (raw file):

Previously, easyeasydev wrote…

Done.

Also https://reviewable.io/reviews/flexflow/flexflow-train/1549#-OKXAcOJ2qsSRySR1KS2


lib/models/src/models/simvp/simvp.cc line 16 at r2 (raw file):

      /*N_S=*/4,
      /*N_T=*/4,
      /*model_type=*/FlexFlow::SimVPModelType::gSTA,

Don't think FlexFlow:: is needed here


lib/models/src/models/simvp/simvp.cc line 23 at r2 (raw file):

      /*spatio_kernel_dec=*/3,
      /*in_shape=*/
      {10, 3, 32, 32},

Where is this shape obtained from?


lib/models/src/models/simvp/simvp.cc line 151 at r2 (raw file):

  tensor_guid_t z = embed;

Missing implementation for stochastic depth decay rule? https://github.com/chengtan9907/OpenSTL/blob/b658dab3da427c8750c8595316e7ae9d70b818df/openstl/models/simvp_model.py#L220


lib/models/src/models/simvp/simvp.cc line 206 at r2 (raw file):

                                 config.hid_S,
                                 config.spatio_kernel_dec,
                                 false,

Add comment for what this boolean value is (same for next line)


lib/models/include/models/simvp/simvp.h line 63 at r2 (raw file):

// Refer to
// https://github.com/chengtan9907/OpenSTL/blob/b658dab3da427c8750c8595316e7ae9d70b818df/openstl/models/simvp_model.py#L100
tensor_guid_t create_simvp_middle_net(ComputationGraphBuilder &cgb,

This definition doesn't match the link provided. Did you mean https://github.com/chengtan9907/OpenSTL/blob/b658dab3da427c8750c8595316e7ae9d70b818df/openstl/models/simvp_model.py#L211

Also there seem to be two "middle nets", one that's inception-based and one that isn't. Which one(s) are we trying to add in this PR?
