Ai 2161 include component usage references in buckets and tables tools #349


Open

mariankrotil wants to merge 8 commits into main from AI-2161-include-links-in-get-tables

Contributor

mariankrotil commented Jan 12, 2026

Description

We now return component-config references in the bucket and table detail tools. The output gains createdBy and lastUpdatedBy reference fields that point to the configurations which created and last updated the given storage item; this information is read from the item's metadata. We also improved the inner search methods so they can search in stringified configurations (raw JSON in strings), and we use this improved search to look for table-ID usage across the storage configurations of different components / transformations. When getting a table detail, you can optionally retrieve and compute usedBy component references to see where the table is used (input / output mappings of different components).

Linear: AI-2161-buckets-tables-references

Change Type

Major (breaking changes, significant new features)
Minor (new features, enhancements, backward compatible)
Patch (bug fixes, small improvements, no new features)

Summary

Update search methods to include matched patterns and to search in config payloads (raw JSON)
Add a search method that looks for an ID in configurations based on the item type, using the inner search functions
Add a metadata fetcher / parser that gets the component-config references which created and last updated a given bucket or table, and wire it to the get_buckets and get_tables tool outputs
Add an optional include_usage parameter to the get_tables tool which, if enabled, searches for usage of the given table ID within the configs of components / transformations
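
A minimal sketch of the usage search summarized above, not the code from this PR. It assumes each configuration is already a parsed dict with `storage.input` / `storage.output` sections; the helper names `get_scope` and `find_table_usage` and the sample configs are hypothetical.

```python
import json
import re
from typing import Any


def get_scope(config: dict[str, Any], scope: str) -> Any:
    """Walk a dotted path such as 'storage.input'; return the nested value or None."""
    node: Any = config
    for key in scope.split('.'):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def find_table_usage(
    configs: dict[str, dict[str, Any]], table_id: str, scopes: list[str]
) -> dict[str, list[str]]:
    """Return {config_id: [scopes that mention table_id]}, skipping configs with no match."""
    # Match the ID only as a complete JSON string value, so that
    # 'in.c-main.users' does not also match 'in.c-main.users_v2'.
    pattern = re.compile('"' + re.escape(table_id) + '"')
    usage: dict[str, list[str]] = {}
    for config_id, config in configs.items():
        matched = []
        for scope in scopes:
            section = get_scope(config, scope)
            if section is None:
                continue
            # Stringify the section so IDs nested at any depth are searchable.
            if pattern.search(json.dumps(section, sort_keys=True, ensure_ascii=False)):
                matched.append(scope)
        if matched:
            usage[config_id] = matched
    return usage


# Hypothetical sample data.
configs = {
    'extractor-1': {'storage': {'output': {'tables': [{'destination': 'in.c-main.users'}]}}},
    'transformation-1': {'storage': {'input': {'tables': [{'source': 'in.c-main.users'}]}}},
    'writer-1': {'storage': {'input': {'tables': [{'source': 'in.c-main.orders'}]}}},
}
usage = find_table_usage(configs, 'in.c-main.users', ['storage.input', 'storage.output'])
```

Configs without a match are left out of the result entirely, so callers never receive an empty usage entry.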

Testing

Tested with Cursor AI desktop (Streamable-HTTP transports)

Optional testing

Tested with Cursor AI desktop (all transports)
Tested with claude.ai web and canary-orion MCP (SSE and Streamable-HTTP)
Tested with In Platform Agent on canary-orion
Tested with RO chat on canary-orion

Checklist

Self-review completed
Unit tests added/updated (if applicable)
Integration tests added/updated (if applicable)
Project version bumped according to the change type (if applicable)
Documentation updated (if applicable)

mariankrotil added 8 commits

January 6, 2026 14:24


          AI-2161 feat: add usage utility methods retrieving data from componen…

854357c

…t configurations


          AI-2161 feat: use usage methods in tool outputs for buckets and tables


          AI-2161 feat: add methods getting table/bucket usage, wire features t…

8fea69d

…o the search inner methods


          AI-2161 feat: add usage and references to the storage tooling

f33e99f


          AI-2161 chore: bump version

ad8e2e5


          AI-2161 fix: apply tox

d754ab0


          AI-2161 test: follow src structure, add tests for search

30ec06a


          AI-2161 refactor: remove unused file

52b5497

linear bot commented Jan 12, 2026

AI-2161 Include upstream links in table lists/detail

mariankrotil requested a review from vita-stejskal

January 12, 2026 18:12

vita-stejskal reviewed


Contributor

vita-stejskal left a comment

It's a lot of nice changes, but some parts feel a bit too complex.

src/keboola_mcp_server/tools/search/tools.py

    
              async def _fetch_configurations(

                  client: KeboolaClient, patterns: list[re.Pattern], item_types: Iterable[ItemType] | None = None

              ) -> list[SearchHit]:

              async def _fetch_configurations(client: KeboolaClient, spec: SearchSpec) -> list[SearchHit]:

Contributor

vita-stejskal Jan 15, 2026

This function should no longer be marked as private, if it's used/called from outside of this module.

src/keboola_mcp_server/tools/search/tools.py

    
                  @staticmethod

                  def _stringify(value: JsonDict) -> str:

                      try:

                          return json.dumps(value, sort_keys=True, default=str)

Contributor

vita-stejskal Jan 15, 2026

You should use ensure_ascii=False.
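
One likely reason for this suggestion, shown as a small sketch with made-up sample values: with the default `ensure_ascii=True`, `json.dumps` escapes non-ASCII characters as `\uXXXX`, so a pattern containing the literal text would not match the stringified value.

```python
import json

# Hypothetical config fragment containing non-ASCII text.
value = {'name': 'Zákazníci', 'table': 'in.c-main.zákazníci'}

escaped = json.dumps(value, sort_keys=True, default=str)
readable = json.dumps(value, sort_keys=True, default=str, ensure_ascii=False)

# The escaped form contains 'Z\u00e1kazn\u00edci', not the literal word,
# so a plain substring (or regex) search for the original text misses it.
found_in_escaped = 'Zákazníci' in escaped
found_in_readable = 'Zákazníci' in readable
```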

src/keboola_mcp_server/tools/search/tools.py

    
                      )

                      if self.return_matched_patterns:

                          return list(matches)

Contributor

vita-stejskal Jan 15, 2026

The return values of this function are pretty confusing. According to the docstring you should always return list(matches), but if not self.return_matched_patterns the function returns only the first match.

Should the return_matched_patterns be renamed to return_all_matched_patterns? If so, then you should change the for-loop above and break out of it as soon as some pattern matches.
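
A sketch of what the suggested behavior could look like, assuming the flag is renamed as proposed. The function name `match_patterns` and its signature are hypothetical, not the code under review.

```python
import re


def match_patterns(
    text: str, patterns: list[re.Pattern], return_all_matched_patterns: bool = False
) -> list[str]:
    """Always return a list of matched pattern strings.

    Stops at the first match unless all matches are requested.
    """
    matches: list[str] = []
    for pattern in patterns:
        if pattern.search(text):
            matches.append(pattern.pattern)
            if not return_all_matched_patterns:
                break  # one match is enough to decide that the item is a hit
    return matches


patterns = [re.compile('foo'), re.compile('bar'), re.compile('baz')]
first_only = match_patterns('foo bar', patterns)
all_matches = match_patterns('foo bar', patterns, return_all_matched_patterns=True)
```

The return type is the same in both modes; only the number of collected matches differs.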

src/keboola_mcp_server/tools/search/tools.py

    
                                  component_type

                                  for item in self.item_types

                                  if item in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES

                                  for component_type in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES[item]

Contributor

vita-stejskal Jan 15, 2026

It would be easier to read if you used for component_type in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES.get(item, []) and removed the if-statement.
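
For comparison, both forms side by side. The mapping contents and `item_types` below are hypothetical stand-ins for the real values.

```python
# Hypothetical contents; the real mapping lives in the search module.
SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES = {
    'component': ['extractor', 'writer', 'application'],
    'transformation': ['transformation'],
}
item_types = ['component', 'table', 'transformation']

# Current form: membership check followed by indexing.
with_if = [
    component_type
    for item in item_types
    if item in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES
    for component_type in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES[item]
]

# Suggested form: .get() with an empty default makes unknown items contribute nothing.
with_get = [
    component_type
    for item in item_types
    for component_type in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES.get(item, [])
]
```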

src/keboola_mcp_server/tools/search/tools.py

    
                      """

                      Checks configuration fields within specified scopes for pattern matches.

                      :param cfg: The search configuration.

Contributor

vita-stejskal Jan 15, 2026

Remove this. There is no cfg parameter in this function.

src/keboola_mcp_server/tools/storage.py

    
                          'fully_qualified_name': db_table_info.fqn.identifier if db_table_info else None,

                          'links': links,

                          'created_by': created_by,

                          'last_updated_by': last_updated_by,

Contributor

vita-stejskal Jan 15, 2026

Why didn't you use the validator function for this too so that the created_by and last_updated_by fields are handled the same way in TableDetail and in BucketDetail?

src/keboola_mcp_server/tools/storage.py

    
                      # Add the component usage to the table detail

                      if include_usage:

                          usage_by_ids = await find_id_usage(

                              client, table_ids, ['component', 'transformation'], ['storage.input', 'storage.output']

Contributor

vita-stejskal Jan 15, 2026

You can't use table_ids here. First, it may contain IDs of tables that don't exist or live on some other branch and there is no point in finding their usages.

Second, and that's more important, if you are on a branch you need to translate the branch-specific table IDs to their "production" (or "main") branch equivalents. So, you should calculate the target_ids parameter for the find_id_usage() function like this:

target_ids = [table.prod_id for table in tables_by_id.values()]
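
A small sketch of why this matters, assuming the table model exposes `id` and `prod_id` as the line above implies. The `Table` dataclass and the sample IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Table:  # hypothetical stand-in for the real table model
    id: str       # ID as seen on the current branch
    prod_id: str  # equivalent ID on the production (main) branch


# Caller-supplied IDs; the second table does not exist, so it never reaches tables_by_id.
table_ids = ['in.c-1234-main.users', 'in.c-1234-main.missing']
tables_by_id = {
    'in.c-1234-main.users': Table(id='in.c-1234-main.users', prod_id='in.c-main.users'),
}

# Search only for tables that were actually found, using their production IDs.
target_ids = [table.prod_id for table in tables_by_id.values()]

# Results come back keyed by production ID, so keep a map back to the branch ID.
id_by_prod_id = {table.prod_id: table.id for table in tables_by_id.values()}
```

If usages are looked up by `prod_id`, the later `usage_by_id.target_id in tables_by_id` check would also need to go through a mapping like `id_by_prod_id`, since `tables_by_id` is keyed by branch-specific IDs in this sketch.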

src/keboola_mcp_server/tools/storage.py

    
                              client, table_ids, ['component', 'transformation'], ['storage.input', 'storage.output']

                          )

                          for usage_by_id in usage_by_ids:

                              if usage_by_id.target_id in tables_by_id and usage_by_id.usage_references:

Contributor

vita-stejskal Jan 15, 2026

Can usage_by_id.usage_references really be empty? I don't think so, but if it can then it is a bug in the find_id_usage() function. It should not return "empty" usage objects.

tests/search/tools_test.py

Contributor

vita-stejskal Jan 15, 2026

The name of this file is the other way around. It should be test_tools.py and not tools_test.py. Did you actually run those tests? Did pytest find this file and run its tests?

tests/search/tools_test.py

    
                          assert result[0].table_id == expected_first_table_id

              class TestSearchSpec:

Contributor

vita-stejskal Jan 15, 2026

Please tell Claude or your preferred coding agent to rewrite these tests as a single parameterized test function. It'll be so much easier to read and maintain.

