Skip to content

Conversation

@mariankrotil
Copy link
Contributor

Description

We now return component-config references in bucket and table detail tools. We udpated the output with createdBy, lastUpdatedBy reference fields targeting configs which affected the given storage item. This information is fetched from the given items' metadata. Additionally, we improved the search inner methods to enable searching in stringified configurations (raw jsons in strings) and use this improved search to look for table-ids usage across different components / transformations storage configurations. When getting a table detail, we can optionally retrieve and compute usedBy component references for a given table to see where the table is used (input / output mappings of different components).

Linear: AI-2161-buckets-tables-references

Change Type

  • Major (breaking changes, significant new features)
  • Minor (new features, enhancements, backward compatible)
  • Patch (bug fixes, small improvements, no new features)

Summary

  • Update search methods to include matched patterns and search in config payloads (raw jsons)
  • Add search method for id in configurations based on the item type using inner search functions
  • Add metadata fetcher / parser to get component-configs references which created and lastly updated given bucket and table, wire this to the get_buckets and get_tables tool outputs.
  • Add include_usage optional parameter to get_tables tool which if enabled runs search for given table-id usage within configs in components / transformations.

Testing

  • Tested with Cursor AI desktop (Streamable-HTTP transports)

Optional testing

  • Tested with Cursor AI desktop (all transports)
  • Tested with claude.ai web and canary-orion MCP (SSE and Streamable-HTTP)
  • Tested with In Platform Agent on canary-orion
  • Tested with RO chat on canary-orion

Checklist

  • Self-review completed
  • Unit tests added/updated (if applicable)
  • Integration tests added/updated (if applicable)
  • Project version bumped according to the change type (if applicable)
  • Documentation updated (if applicable)

@linear
Copy link

linear bot commented Jan 12, 2026

Copy link
Contributor

@vita-stejskal vita-stejskal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a lot of nice changes, but some parts feel a bit too complex .

async def _fetch_configurations(
client: KeboolaClient, patterns: list[re.Pattern], item_types: Iterable[ItemType] | None = None
) -> list[SearchHit]:
async def _fetch_configurations(client: KeboolaClient, spec: SearchSpec) -> list[SearchHit]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should no longer be marked as private, if it's used/called from outside of this module.

@staticmethod
def _stringify(value: JsonDict) -> str:
try:
return json.dumps(value, sort_keys=True, default=str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use ensure_ascii=False.

)

if self.return_matched_patterns:
return list(matches)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return values of this function are pretty confusing. According to the docstring you should always return list(matches), but if not self.return_matched_patterns the function returns only the first match.

Should the return_matched_patterns be renamed to return_all_matched_patterns? If so, then you should change the for-loop above and break out of it as soon as some pattern matches.

component_type
for item in self.item_types
if item in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES
for component_type in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES[item]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easier to read if you used for compnent_type in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES.get(item, []) and removed the if-statement.

"""
Checks configuration fields within specified scopes for pattern matches.
:param cfg: The search configuration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this. There is no cfg parameter in this function.

'fully_qualified_name': db_table_info.fqn.identifier if db_table_info else None,
'links': links,
'created_by': created_by,
'last_updated_by': last_updated_by,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why didn't you use the validator function for this too so that the created_by and last_updated_by fields are handled the same way in TableDetail and in BucketDetail?

# Add the component usage to the table detail
if include_usage:
usage_by_ids = await find_id_usage(
client, table_ids, ['component', 'transformation'], ['storage.input', 'storage.output']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't use table_ids here. First, it may contain IDs of tables that don't exist or live on some other branch and there is no point in finding their usages.

Second, and that's more important, if you are on a branch you need to translate the branch-specific table IDs to their "production" (or "main") branch equivalents. So, you should calculate the target_ids parameter for the find_id_usage() function like this:

target_ids = [table.prod_id for table in tables_by_id.values()]

client, table_ids, ['component', 'transformation'], ['storage.input', 'storage.output']
)
for usage_by_id in usage_by_ids:
if usage_by_id.target_id in tables_by_id and usage_by_id.usage_references:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can usage_by_id.usage_references really be empty? I don't think so, but if it can then it is a bug in the find_id_usage() function. It should not return "empty" usage objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this file is the other way around. It should be test_tools.py and not tools_test.py. Did you actually run those tests? Did pytest find this file and run its tests?

assert result[0].table_id == expected_first_table_id


class TestSearchSpec:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please tell Claude or your preferred coding agent to rewrite these tests as a single parameterized test function. It'll be so much easier to read and maintain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants