
Functional API (keras.models.Model) yields erroneous graph when initialized by the input attribute of another Model constructed from an intermediate symbolic tensor. #20668

Open
briango28 opened this issue Dec 19, 2024 · 1 comment

Comments

@briango28

When a keras.models.Model instance is initialized from the input attribute of another Model that was itself constructed by passing an intermediate symbolic tensor (the output of an external layer) as its input, the resulting Model incorrectly includes the layers that were used to construct that intermediate input tensor.

I'm afraid the above sentence will amount to little more than gibberish without an example, so please refer to the following notebook:
https://colab.research.google.com/drive/1lfif-YosIn4wgzL8t8WjX520C_0eDTGV?usp=sharing

The notebook illustrates a simplified version of a text processing scenario, in which the full model is split into a preprocessing portion and a trainable portion.

@dhantule dhantule added the keras-team-review-pending Pending review by a Keras team member. label Jan 8, 2025
@harshaljanjani
Contributor

harshaljanjani commented Jan 15, 2025

Hello @briango28, thanks for posting the issue, and apologies for the delayed response.

The resulting Model incorrectly includes layers that were used to construct the intermediate input tensor.

Since you're referencing an input tensor tied to the computational graph of trainable_model, that graph includes every layer and operation upstream of the input (input_layer and string_lookup, even though they were not explicitly defined in the intermediate model), hence the behavior. The simple fix, as you also pointed out in the gist, is to pass the input_indices tensor itself: even though it carries dependencies of its own, it is treated as an atomic unit, so to speak. The behavior may therefore be as intended; when no inherited history is wanted, the tensor is used directly as the model input, and no extra upstream layers are included.
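To make the mechanism concrete without depending on a Keras install, here is a minimal toy sketch of how symbolic-tensor history works (all names are illustrative, not the Keras API): each tensor records the layer that produced it, and building a model walks backward from the outputs until it hits a tensor designated as an input. If the designated inputs sit upstream of an intermediate tensor, the walk pulls in the preprocessing layers; if the intermediate tensor itself is the boundary, they are excluded.

```python
# Toy model of Keras-style symbolic tensors: each tensor remembers the layer
# (and the tensors) that produced it, mirroring how KerasTensor keeps history.

class Tensor:
    def __init__(self, producer=None, parents=()):
        self.producer = producer  # layer that produced this tensor (None for raw inputs)
        self.parents = parents    # tensors the producer consumed

class Layer:
    def __init__(self, name):
        self.name = name
    def __call__(self, x):
        return Tensor(producer=self, parents=(x,))

def layers_between(inputs, output):
    """Collect every layer on a backward walk from `output`, stopping at
    any tensor in `inputs` (the boundary a Model's input set defines)."""
    found, stack = [], [output]
    while stack:
        t = stack.pop()
        if t in inputs or t.producer is None:
            continue
        found.append(t.producer.name)
        stack.extend(t.parents)
    return found

raw = Tensor()                         # analogous to a raw Input tensor
indices = Layer("string_lookup")(raw)  # intermediate preprocessing output
embedded = Layer("embedding")(indices)

# Treating the intermediate tensor `indices` as the model input cuts the
# walk there: no preprocessing layers are included.
print(layers_between({indices}, embedded))  # ['embedding']

# But `indices` still carries its full history, so a walk that does not
# stop at it (the reported behavior) pulls in string_lookup as well.
print(layers_between({raw}, embedded))      # ['embedding', 'string_lookup']
```

This is only a sketch of the graph-tracing idea, but it shows why passing the intermediate tensor itself as the input is the fix: the tensor acts as an atomic boundary, and its own construction history is never traversed.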
