
MLIR failure error when using DenseHashTable in tf.map_fn #2246

Open
adriangay opened this issue Jul 23, 2024 · 2 comments
Assignees: janasangeetha
Labels: stale (to be closed automatically if no activity), stat:awaiting response

Comments

@adriangay

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google GKE 1.28.10-gke.1075000, but also occurs with Docker on Mac OSX 14.5
  • TensorFlow Serving installed from (source or binary): Docker Hub container tensorflow/serving:2.11.0
  • TensorFlow Serving version: 2.11, but occurs on versions up to 2.16.1
  • TensorFlow version (of saved model): 2.13.1

Describe the problem

When TFS starts up serving a model whose signature contains the code described below, we observe the following messages when Grappler runs, before TFS finishes initialising:

error: 'tfg.While' op body function argument #6 type 'tensor<!tf_type.resource<tensor<!tf_type.string>>>' is not compatible with corresponding operand type: 'tensor<!tf_type.resource<tensor<!tf_type.string>, tensor<i32>>>'
2024-04-16 08:14:58.492146: E external/org_tensorflow/tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] tfg_optimizer{any(tfg-consolidate-attrs,tfg-toposort,tfg-shape-inference{graph-version=0},tfg-prepare-attrs-export)} failed: INVALID_ARGUMENT: MLIR Graph Optimizer failed:

Source code / logs

In the code below, a DenseHashTable is built on every Predict request from an N-element string tensor, sorted_list: the table's keys are the strings, and its values are simply their indexes in sorted_list. Also on every request, rails_to_sort, a 2D RaggedTensor of strings, is passed to tf.map_fn, which iterates over its rows. The mapped function, reorder_rail, looks each row up in the sorted_list DenseHashTable to find the positions of matching elements, then sorts the row according to those matches. The output is the sorted version of the input RaggedTensor.

The problem is not related to the sort itself.
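For illustration only, the per-row reordering that reorder_rail implements can be sketched in plain Python, with no TensorFlow (reorder_rail_py is a hypothetical helper, not part of the model): elements of a row that appear in sorted_list are reordered among themselves to follow sorted_list's order, while non-matching elements keep their positions.

```python
def reorder_rail_py(sorted_list, rail):
    # Position of each key in sorted_list; mirrors the DenseHashTable
    # lookup in the TF graph (-1 for keys that are absent).
    pos = {k: i for i, k in enumerate(sorted_list)}
    lookup = [pos.get(x, -1) for x in rail]
    # Row positions whose elements were found in sorted_list.
    match_positions = [i for i, v in enumerate(lookup) if v != -1]
    # Matched elements, re-emitted in sorted_list order.
    matched = sorted(lookup[i] for i in match_positions)
    out = list(rail)
    for p, idx in zip(match_positions, matched):
        out[p] = sorted_list[idx]
    return out

print(reorder_rail_py(["a", "b", "c", "d"], ["c", "x", "a", "d"]))
# ['a', 'x', 'c', 'd']
```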

The following tf.function is part of the concrete function traced on the model's serving signature:

@tf.function
def sort_rail(sorted_list, rails_to_sort):
    @tf.function
    def reorder_rail(rail):
        # Look up each element of the row; misses return the default value -1.
        lookup_indexes = lookup_table.lookup(rail)
        match_mask = ~tf.equal(lookup_indexes, -1)
        # Positions within the row whose elements appear in sorted_list.
        match_indexes = tf.where(condition=match_mask)
        extracted_match_indexes = tf.gather(
            lookup_indexes, match_indexes[:, 0], name="gather_extracted_match_indexes"
        )
        extracted_sorted_list = tf.gather(
            sorted_list, extracted_match_indexes, name="gather_extracted_sorted_list"
        )
        # Re-emit the matched elements in sorted_list order.
        sorted_match_indexes = tf.argsort(extracted_match_indexes)
        reordered_extracted_sorted_list = tf.gather(
            extracted_sorted_list,
            sorted_match_indexes,
            name="gather_reordered_extracted_sorted_list",
        )
        # Scatter the reordered matches back into their positions in the row.
        composite_rail = tf.tensor_scatter_nd_update(
            tensor=rail,
            indices=match_indexes,
            updates=reordered_extracted_sorted_list,
            name="composite_rail_scatter",
        )
        return composite_rail

    # Built afresh on every request: string keys map to their index in sorted_list.
    lookup_table = tf.lookup.experimental.DenseHashTable(
        key_dtype=tf.string,
        value_dtype=tf.int32,
        default_value=-1,
        empty_key="$",
        deleted_key="£",
    )
    lookup_table.insert(sorted_list, tf.range(0, tf.size(sorted_list)))

    ragged_rails = tf.map_fn(
        reorder_rail,
        rails_to_sort,
        parallel_iterations=50,
        fn_output_signature=tf.RaggedTensorSpec(shape=[None], dtype=tf.string),
        name="rails_to_sort_map_fn",
    )
    return ragged_rails

The graph node generated for the map_fn's While loop is as follows:

node_def {
      name: "rails_to_sort_map_fn/while"
      op: "While"
      input: "rails_to_sort_map_fn/while/loop_counter:output:0"
      input: "rails_to_sort_map_fn/strided_slice:output:0"
      input: "rails_to_sort_map_fn/Const:output:0"
      input: "rails_to_sort_map_fn/TensorArrayV2_1:handle:0"
      input: "rails_to_sort_map_fn/strided_slice:output:0"
      input: "rails_to_sort_map_fn/TensorArrayUnstack/TensorListFromTensor:output_handle:0"
      input: "MutableDenseHashTable:table_handle:0"
      input: "default_value:output:0"
      input: "sorted_list"
      input: "^lookup_table_insert/LookupTableInsertV2"
      attr {
        key: "T"
        value {
          list {
            type: DT_INT32
            type: DT_INT32
            type: DT_INT32
            type: DT_VARIANT
            type: DT_INT32
            type: DT_VARIANT
            type: DT_RESOURCE
            type: DT_INT32
            type: DT_STRING
          }
        }
      }
      attr {
        key: "_lower_using_switch_merge"
        value {
          b: true
        }
      }
      attr {
        key: "_num_original_outputs"
        value {
          i: 9
        }
      }
      attr {
        key: "_read_only_resource_inputs"
        value {
          list {
          }
        }
      }
      attr {
        key: "body"
        value {
          func {
            name: "rails_to_sort_map_fn_while_body_18098"
          }
        }
      }
      attr {
        key: "cond"
        value {
          func {
            name: "rails_to_sort_map_fn_while_cond_18097"
          }
        }
      }
      attr {
        key: "output_shapes"
        value {
          list {
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
            }
            shape {
              dim {
                size: 828
              }
            }
          }
        }
      }
      attr {
        key: "parallel_iterations"
        value {
          i: 50
        }
      }
    }

The error message refers to argument #6, MutableDenseHashTable:table_handle:0, whose type is of course 'tensor<!tf_type.resource<tensor<!tf_type.string>, tensor<i32>>>', since we initialised the table with string keys and int32 values. But Grappler seems to think it should be 'tensor<!tf_type.resource<tensor<!tf_type.string>>>'.
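Conceptually, the MLIR verifier is comparing the While op's operand types against its body function's argument types position by position. A hypothetical plain-Python sketch of that check (not the actual TFG implementation; the type strings are simplified stand-ins for the MLIR types), using the operand order from the node_def above:

```python
def find_type_mismatches(operand_types, body_arg_types):
    # Report each argument position whose body-function type differs
    # from the corresponding While operand type (hypothetical sketch).
    return [i for i, (op_t, arg_t)
            in enumerate(zip(operand_types, body_arg_types))
            if op_t != arg_t]

# Operand types of the While node, in input order (simplified notation).
operands = ["i32", "i32", "i32", "variant", "i32", "variant",
            "resource<string, i32>", "i32", "string"]
body_args = list(operands)
# The body signature has dropped the table's value dtype, as in the error.
body_args[6] = "resource<string>"
print(find_type_mismatches(operands, body_args))
# [6]
```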

The error does not seem to affect the functionality of the graph, i.e. it works as intended in unit tests and when served 'live' with TFS.

We would like to be reassured that the error will have no functional impact, but also to understand whether this failure affects graph optimization, i.e. does it stop optimization entirely, so that we are losing potential performance gains?

@janasangeetha janasangeetha self-assigned this Dec 10, 2024
@janasangeetha

Hi @adriangay
Sorry for responding to this issue so late. I can see you have raised an issue in the AI Forum. Please let us know if you were able to find a solution or are still looking for help.
Thank you!


This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Dec 21, 2024