Failure running a SavedModel exported from a tf.Module with a Keras model as an instance variable #20095

@ivansoban

The TensorFlow team advised me to post this issue here. I restate the issue below.

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No, because the sample code produces a core dump.

Source

binary

TensorFlow version

2.17.0

Custom code

No

OS platform and distribution

Linux Ubuntu 22.04.4 LTS

Mobile device

No response

Python version

3.10.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

Saving a tf.Module with tf.saved_model.save, when the module holds a Keras model in an instance variable, produces a SavedModel that fails with FAILED_PRECONDITION when run using saved_model_cli or libtensorflow.

In TensorFlow v2.15.0, the behavior is as expected: graph execution proceeds without errors and produces the expected results.

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np

SHAPE = (1, 5)

class TestModel(tf.Module):
    def __init__(self):
        super().__init__()
        self.dense_layer = tf.keras.layers.Dense(10)

    @tf.function(input_signature=[tf.TensorSpec(shape=SHAPE, dtype=tf.float32)])
    def run(self, x):
        return self.dense_layer(x)


module = TestModel()
sample_input = tf.random.normal(SHAPE, dtype=tf.float32)
module.run(sample_input)

np.save('sample_input.npy', sample_input.numpy())
tf.saved_model.save(module, "test_model")

# To reproduce, run the following:
# python test.py && saved_model_cli run --dir test_model --tag_set serve --signature_def serving_default --inputs 'x=sample_input.npy'
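
For comparison, here is a minimal sketch of running the same SavedModel through the TF2 loader, which the deprecation warning in the log recommends (`tf.saved_model.load`), instead of the session-based path used by saved_model_cli. It is not part of the original report. The helper name `run_with_tf2_loader` is hypothetical, and the sketch assumes the `serving_default` signature and the input name `x` from the command above. Whether this path succeeds on 2.17.0 was not verified.

```python
import numpy as np
import tensorflow as tf


def run_with_tf2_loader(model_dir, input_path):
    """Run the serving_default signature via tf.saved_model.load.

    Hypothetical diagnostic helper: model_dir and input_path are assumed to be
    the "test_model" directory and "sample_input.npy" file from the script above.
    """
    # Load with the TF2 object-based loader rather than the TF1 session loader.
    loaded = tf.saved_model.load(model_dir)
    infer = loaded.signatures["serving_default"]

    # Signature functions take their inputs as keyword arguments by name.
    x = tf.constant(np.load(input_path), dtype=tf.float32)
    outputs = infer(x=x)

    # Convert the returned dict of tensors to NumPy arrays for inspection.
    return {name: tensor.numpy() for name, tensor in outputs.items()}
```

After running the script above, calling `run_with_tf2_loader("test_model", "sample_input.npy")` would show whether the failure is specific to the session-based loader or also occurs with the TF2 loader.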

Relevant log output

2024-08-01 15:25:35.204057: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-01 15:25:35.261217: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-01 15:25:35.278801: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-01 15:25:35.313892: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-01 15:25:37.270110: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-08-01 15:25:38.898105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4281 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:3b:00.0, compute capability: 7.0
2024-08-01 15:25:38.898868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 944 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB, pci bus id: 0000:d8:00.0, compute capability: 7.0
WARNING:tensorflow:From /home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py:716: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.saved_model.load` instead.
W0801 15:25:38.903731 139977869341120 deprecation.py:50] From /home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py:716: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.saved_model.load` instead.
INFO:tensorflow:Restoring parameters from test_model/variables/variables
I0801 15:25:38.936800 139977869341120 saver.py:1417] Restoring parameters from test_model/variables/variables
2024-08-01 15:25:38.941206: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-08-01 15:25:39.144248: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
	 [[{{function_node __inference_run_106}}{{node dense_1/Add/ReadVariableOp}}]]
2024-08-01 15:25:39.144340: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
	 [[{{function_node __inference_run_106}}{{node dense_1/Add/ReadVariableOp}}]]
	 [[StatefulPartitionedCall/_21]]
2024-08-01 15:25:39.144423: I tensorflow/core/framework/local_rendezvous.cc:423] Local rendezvous recv item cancelled. Key hash: 12615348601576968325
Traceback (most recent call last):
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1401, in _do_call
    return fn(*args)
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1384, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1477, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
  (0) FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
	 [[{{function_node __inference_run_106}}{{node dense_1/Add/ReadVariableOp}}]]
	 [[StatefulPartitionedCall/_21]]
  (1) FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
	 [[{{function_node __inference_run_106}}{{node dense_1/Add/ReadVariableOp}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/iantolic-soban/tf_bug/.venv/bin/saved_model_cli", line 8, in <module>
    sys.exit(main())
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py", line 1340, in main
    app.run(smcli_main)
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py", line 1338, in smcli_main
    args.func()
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py", line 1036, in run
    run_saved_model_with_feed_dict(
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py", line 721, in run_saved_model_with_feed_dict
    outputs = sess.run(output_tensor_names_sorted, feed_dict=inputs_feed_dict)
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 971, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1214, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1394, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1420, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.FailedPreconditionError: Graph execution error:

2 root error(s) found.
  (0) FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
	 [[{{node dense_1/Add/ReadVariableOp}}]]
	 [[StatefulPartitionedCall/_21]]
  (1) FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
	 [[{{node dense_1/Add/ReadVariableOp}}]]
0 successful operations.
0 derived errors ignored.
