
MLIR failure error when using DenseHashTable in tf.map_fn #2246

Open
adriangay opened this issue Jul 23, 2024 · 2 comments
Assignees: janasangeetha
Labels: stale (to be closed automatically if no activity), stat:awaiting response

Comments

@adriangay

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google GKE 1.28.10-gke.1075000, but also occurs with Docker on Mac OSX 14.5
  • TensorFlow Serving installed from (source or binary): Docker Hub container tensorflow/serving:2.11.0
  • TensorFlow Serving version: 2.11, but occurs on versions up to 2.16.1
  • TensorFlow version (of saved model): 2.13.1

Describe the problem

When TFS starts up serving a model whose signature contains the code described below, we observe the following messages when Grappler runs, before TFS finishes initialising:

error: 'tfg.While' op body function argument #6 type 'tensor<!tf_type.resource<tensor<!tf_type.string>>>' is not compatible with corresponding operand type: 'tensor<!tf_type.resource<tensor<!tf_type.string>, tensor<i32>>>'
2024-04-16 08:14:58.492146: E external/org_tensorflow/tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] tfg_optimizer{any(tfg-consolidate-attrs,tfg-toposort,tfg-shape-inference{graph-version=0},tfg-prepare-attrs-export)} failed: INVALID_ARGUMENT: MLIR Graph Optimizer failed:

Source code / logs

In the code below, a DenseHashTable is built on every Predict request from an N-element string tensor, sorted_list: the table's keys are the strings, and its values are simply their indexes in sorted_list. Also on every request, rails_to_sort, a 2D RaggedTensor of strings, is passed to tf.map_fn, which iterates over its rows. The mapped function, reorder_rail, looks each row up in the sorted_list DenseHashTable to find the positions of matching elements, then sorts the row according to those matches. The output is the sorted version of the input RaggedTensor.

The problem is not related to the sort itself.
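For illustration only, the per-row reordering that reorder_rail implements can be sketched in plain Python, with no TensorFlow (reorder_rail_py is a hypothetical helper, not part of the model): elements of a row that appear in sorted_list are reordered among themselves to follow sorted_list's order, while non-matching elements keep their positions.

```python
def reorder_rail_py(sorted_list, rail):
    # Position of each key in sorted_list; mirrors the DenseHashTable
    # lookup in the TF graph (-1 for keys that are absent).
    pos = {k: i for i, k in enumerate(sorted_list)}
    lookup = [pos.get(x, -1) for x in rail]
    # Row positions whose elements were found in sorted_list.
    match_positions = [i for i, v in enumerate(lookup) if v != -1]
    # Matched elements, re-emitted in sorted_list order.
    matched = sorted(lookup[i] for i in match_positions)
    out = list(rail)
    for p, idx in zip(match_positions, matched):
        out[p] = sorted_list[idx]
    return out

print(reorder_rail_py(["a", "b", "c", "d"], ["c", "x", "a", "d"]))
# ['a', 'x', 'c', 'd']
```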

The following tf.function is part of the concrete function traced on the model's serving signature:

@tf.function
def sort_rail(sorted_list, rails_to_sort):
    @tf.function
    def reorder_rail(rail):
        # Look up each element of the row; misses return the default value -1.
        lookup_indexes = lookup_table.lookup(rail)
        match_mask = ~tf.equal(lookup_indexes, -1)
        # Positions within the row whose elements appear in sorted_list.
        match_indexes = tf.where(condition=match_mask)
        extracted_match_indexes = tf.gather(
            lookup_indexes, match_indexes[:, 0], name="gather_extracted_match_indexes"
        )
        extracted_sorted_list = tf.gather(
            sorted_list, extracted_match_indexes, name="gather_extracted_sorted_list"
        )
        # Re-emit the matched elements in sorted_list order.
        sorted_match_indexes = tf.argsort(extracted_match_indexes)
        reordered_extracted_sorted_list = tf.gather(
            extracted_sorted_list,
            sorted_match_indexes,
            name="gather_reordered_extracted_sorted_list",
        )
        # Scatter the reordered matches back into their positions in the row.
        composite_rail = tf.tensor_scatter_nd_update(
            tensor=rail,
            indices=match_indexes,
            updates=reordered_extracted_sorted_list,
            name="composite_rail_scatter",
        )
        return composite_rail

    # Built afresh on every request: string keys map to their index in sorted_list.
    lookup_table = tf.lookup.experimental.DenseHashTable(
        key_dtype=tf.string,
        value_dtype=tf.int32,
        default_value=-1,
        empty_key="$",
        deleted_key="£",
    )
    lookup_table.insert(sorted_list, tf.range(0, tf.size(sorted_list)))

    ragged_rails = tf.map_fn(
        reorder_rail,
        rails_to_sort,
        parallel_iterations=50,
        fn_output_signature=tf.RaggedTensorSpec(shape=[None], dtype=tf.string),
        name="rails_to_sort_map_fn",
    )
    return ragged_rails

The graph node generated for the map_fn's While loop is as follows:

node_def {
      name: "rails_to_sort_map_fn/while"
      op: "While"
      input: "rails_to_sort_map_fn/while/loop_counter:output:0"
      input: "rails_to_sort_map_fn/strided_slice:output:0"
      input: "rails_to_sort_map_fn/Const:output:0"
      input: "rails_to_sort_map_fn/TensorArrayV2_1:handle:0"
      input: "rails_to_sort_map_fn/strided_slice:output:0"
      input: "rails_to_sort_map_fn/TensorArrayUnstack/TensorListFromTensor:output_handle:0"
      input: "MutableDenseHashTable:table_handle:0"
      input: "default_value:output:0"
      input: "sorted_list"
      input: "^lookup_table_insert/LookupTableInsertV2"
      attr {
        key: "T"
        value {
          list {
            type: DT_INT32
            type: DT_INT32
            type: DT_INT32
            type: DT_VARIANT
            type: DT_INT32
            type: DT_VARIANT
            type: DT_RESOURCE
            type: DT_INT32
            type: DT_STRING
          }
        }
      }
      attr {
        key: "_lower_using_switch_merge"
        value {
          b: true
        }
      }
      attr {
        key: "_num_original_outputs"
        value {
          i: 9
        }
      }
      attr {
        key: "_read_only_resource_inputs"
        value {
          list {
          }
        }
      }
      attr {
        key: "body"
        value {
          func {
            name: "rails_to_sort_map_fn_while_body_18098"
          }
        }
      }
      attr {
        key: "cond"
        value {
          func {
            name: "rails_to_sort_map_fn_while_cond_18097"
          }
        }
      }
      attr {
        key: "output_shapes"
        value {
          list {
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
              dim {
                size: 828
              }
            }
          }
        }
      }
      attr {
        key: "parallel_iterations"
        value {
          i: 50
        }
      }
    }

The error message refers to argument #6, MutableDenseHashTable:table_handle:0, whose type is of course 'tensor<!tf_type.resource<tensor<!tf_type.string>, tensor<i32>>>', since we initialised the table with string keys and int32 values. But Grappler seems to think it should be 'tensor<!tf_type.resource<tensor<!tf_type.string>>>'.
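Conceptually, the MLIR verifier is comparing the While op's operand types against its body function's argument types position by position. A hypothetical plain-Python sketch of that check (not the actual TFG implementation; the type strings are simplified stand-ins for the MLIR types), using the operand order from the node_def above:

```python
def find_type_mismatches(operand_types, body_arg_types):
    # Report each argument position whose body-function type differs
    # from the corresponding While operand type (hypothetical sketch).
    return [i for i, (op_t, arg_t)
            in enumerate(zip(operand_types, body_arg_types))
            if op_t != arg_t]

# Operand types of the While node, in input order (simplified notation).
operands = ["i32", "i32", "i32", "variant", "i32", "variant",
            "resource<string, i32>", "i32", "string"]
body_args = list(operands)
# The body signature has dropped the table's value dtype, as in the error.
body_args[6] = "resource<string>"
print(find_type_mismatches(operands, body_args))
# [6]
```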

The error does not seem to affect the functionality of the graph, i.e. it works as intended in unit tests and when served 'live' with TFS.

We would like to be reassured that the error will have no functional impact, but also to understand whether this failure affects graph optimization, i.e. does it stop optimization entirely, so that we are losing potential performance gains?

@janasangeetha janasangeetha self-assigned this Dec 10, 2024
@janasangeetha

Hi @adriangay
Sorry for responding to this issue so late. I can see you have raised an issue in the AI Forum. Please let us know if you were able to find a solution or are still looking for help.
Thank you!


This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Dec 21, 2024