-
Notifications
You must be signed in to change notification settings - Fork 21
Ai 2161 include component usage references in buckets and tables tools #349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…o the search inner methods
vita-stejskal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a lot of nice changes, but some parts feel a bit too complex .
| async def _fetch_configurations( | ||
| client: KeboolaClient, patterns: list[re.Pattern], item_types: Iterable[ItemType] | None = None | ||
| ) -> list[SearchHit]: | ||
| async def _fetch_configurations(client: KeboolaClient, spec: SearchSpec) -> list[SearchHit]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should no longer be marked as private, if it's used/called from outside of this module.
| @staticmethod | ||
| def _stringify(value: JsonDict) -> str: | ||
| try: | ||
| return json.dumps(value, sort_keys=True, default=str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use ensure_ascii=False.
| ) | ||
|
|
||
| if self.return_matched_patterns: | ||
| return list(matches) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return values of this function are pretty confusing. According to the docstring you should always return list(matches), but if not self.return_matched_patterns the function returns only the first match.
Should the return_matched_patterns be renamed to return_all_matched_patterns? If so, then you should change the for-loop above and break out of it as soon as some pattern matches.
| component_type | ||
| for item in self.item_types | ||
| if item in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES | ||
| for component_type in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES[item] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easier to read if you used for compnent_type in SEARCH_ITEM_TYPE_TO_COMPONENT_TYPES.get(item, []) and removed the if-statement.
| """ | ||
| Checks configuration fields within specified scopes for pattern matches. | ||
| :param cfg: The search configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this. There is no cfg parameter in this function.
| 'fully_qualified_name': db_table_info.fqn.identifier if db_table_info else None, | ||
| 'links': links, | ||
| 'created_by': created_by, | ||
| 'last_updated_by': last_updated_by, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why didn't you use the validator function for this too so that the created_by and last_updated_by fields are handled the same way in TableDetail and in BucketDetail?
| # Add the component usage to the table detail | ||
| if include_usage: | ||
| usage_by_ids = await find_id_usage( | ||
| client, table_ids, ['component', 'transformation'], ['storage.input', 'storage.output'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't use table_ids here. First, it may contain IDs of tables that don't exist or live on some other branch and there is no point in finding their usages.
Second, and that's more important, if you are on a branch you need to translate the branch-specific table IDs to their "production" (or "main") branch equivalents. So, you should calculate the target_ids parameter for the find_id_usage() function like this:
target_ids = [table.prod_id for table in tables_by_id.values()]
| client, table_ids, ['component', 'transformation'], ['storage.input', 'storage.output'] | ||
| ) | ||
| for usage_by_id in usage_by_ids: | ||
| if usage_by_id.target_id in tables_by_id and usage_by_id.usage_references: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can usage_by_id.usage_references really be empty? I don't think so, but if it can then it is a bug in the find_id_usage() function. It should not return "empty" usage objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this file is the other way around. It should be test_tools.py and not tools_test.py. Did you actually run those tests? Did pytest find this file and run its tests?
| assert result[0].table_id == expected_first_table_id | ||
|
|
||
|
|
||
| class TestSearchSpec: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please tell Claude or your preferred coding agent to rewrite these tests as a single parameterized test function. It'll be so much easier to read and maintain.
Description
We now return component-config references in bucket and table detail tools. We udpated the output with createdBy, lastUpdatedBy reference fields targeting configs which affected the given storage item. This information is fetched from the given items' metadata. Additionally, we improved the search inner methods to enable searching in stringified configurations (raw jsons in strings) and use this improved search to look for table-ids usage across different components / transformations storage configurations. When getting a table detail, we can optionally retrieve and compute usedBy component references for a given table to see where the table is used (input / output mappings of different components).
Linear: AI-2161-buckets-tables-references
Change Type
Summary
get_bucketsandget_tablestool outputs.include_usageoptional parameter toget_tablestool which if enabled runs search for given table-id usage within configs in components / transformations.Testing
Streamable-HTTPtransports)Optional testing
canary-orionMCP (SSEandStreamable-HTTP)canary-orioncanary-orionChecklist