-
Notifications
You must be signed in to change notification settings - Fork 775
[OPIK-2031] Add tags support to datasets #2709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[OPIK-2031] Add tags support to datasets #2709
Conversation
Backend Tests Results 184 files 184 suites 20m 46s ⏱️ Results for commit ea1158a. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for tagging datasets across the backend API, persistence layer, filtering, sorting, and tests.
- Adds a new
tags
column via Liquibase and updates the domain model and DAO to persist and query tags. - Extends the filtering and sorting infrastructure to handle
tags
on datasets. - Updates API resources, criteria, and integration tests to accept and validate dataset tags.
Reviewed Changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
DatasetsResourceTest.java | Updated tests to cover sorting and filtering by tags |
000022_add_tags_to_datasets.sql | Liquibase migration adding the tags column |
FilterStrategy.java | Added DATASET strategy for dataset-level filters |
FilterQueryBuilder.java | Mapped DatasetField.TAGS to the DB column |
DatasetService.java | Wired filter query builder into service find methods |
DatasetDAO.java | Extended SQL queries and mapper for tags and filters |
DatasetCriteria.java | Added filters list to criteria |
SortingFactoryDatasets.java | Included TAGS in sortable fields |
DatasetsResource.java | Accepts filters query param and populates criteria |
DatasetFilter.java | New class representing dataset filter payloads |
DatasetField.java | Enum for dataset filterable fields |
DatasetUpdate.java | Added tags field to update payload |
Dataset.java | Added tags field to the dataset API model |
Comments suppressed due to low confidence (4)
apps/opik-backend/src/test/java/com/comet/opik/api/resources/v1/priv/DatasetsResourceTest.java:1923
- List in Java does not have a reversed() method. Consider using Collections.reverse on a mutable list or ListUtils.reversed() to produce a reversed view.
List<Dataset> expectedDatasets = getExpectedDatasets.apply(datasets).reversed();
apps/opik-backend/src/test/java/com/comet/opik/api/resources/v1/priv/DatasetsResourceTest.java:1936
- List in Java does not provide a getFirst() method. You should use datasets.get(0) to access the first element.
.value(datasets.getFirst().tags().iterator().next())
apps/opik-backend/src/test/java/com/comet/opik/api/resources/v1/priv/DatasetsResourceTest.java:1938
- List in Java does not provide a getFirst() method; use datasets.get(0) when creating the singleton list.
(Function<List<Dataset>, List<Dataset>>) datasets -> List.of(datasets.getFirst())),
apps/opik-backend/src/main/java/com/comet/opik/api/filter/DatasetField.java:9
- The symbol TAGS_QUERY_PARAM is not defined in this enum. Define a constant for the request query parameter name (e.g., "tags") or reference the correct constant.
TAGS(TAGS_QUERY_PARAM, FieldType.STRING_STATE_DB),
@RequiredArgsConstructor | ||
@Getter | ||
public enum DatasetField implements Field { | ||
TAGS(TAGS_QUERY_PARAM, FieldType.STRING_STATE_DB), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the "," ? or we can just ; in the same line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better for extending to additional values. Nicer to review in those cases as well.
public class DatasetFilter extends FilterImpl { | ||
|
||
public static final TypeReference<List<DatasetFilter>> LIST_TYPE_REFERENCE = new TypeReference<>() { | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plz make it the same line as 13 (dont break line)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be like it is, and these are handled and formatted automatically by spotless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a couple of minor recommendations.
@RequiredArgsConstructor | ||
@Getter | ||
public enum DatasetField implements Field { | ||
TAGS(TAGS_QUERY_PARAM, FieldType.STRING_STATE_DB), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better for extending to additional values. Nicer to review in those cases as well.
public class DatasetFilter extends FilterImpl { | ||
|
||
public static final TypeReference<List<DatasetFilter>> LIST_TYPE_REFERENCE = new TypeReference<>() { | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be like it is, and these are handled and formatted automatically by spotless.
private static final String DATASET_ID_DB = "id"; | ||
private static final String DATASET_NAME_DB = "name"; | ||
private static final String DATASET_VISIBILITY_MODE_DB = "visibility"; | ||
private static final String DATASET_CREATED_AT_DB = "created_at"; | ||
private static final String DATASET_LAST_UPDATED_AT_DB = "last_updated_at"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: id, name, created_at and last_updated_at already exists. I'd reuse them and make the name neutral of the DB, as we follow the same naming convention in both. If you prefer not to, I wouldn't tie this naming to the dataset. I'd just rename them for the state DB.
|
||
private final String queryParamField; | ||
private final FieldType type; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: add new line to all files. Maybe we can automate this with spotless.
Details
Add tags support to datasets
Testing
Integration tests