Skip to content

[OPIK-2031] Add tags support to datasets #2709

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

BorisTkachenko
Copy link
Contributor

Details

Add tags support to datasets

Testing

Integration tests

@BorisTkachenko BorisTkachenko self-assigned this Jul 11, 2025
@BorisTkachenko BorisTkachenko requested a review from a team as a code owner July 11, 2025 13:21
Copy link
Contributor

Backend Tests Results

  184 files    184 suites   20m 46s ⏱️
4 282 tests 4 279 ✅ 3 💤 0 ❌
4 271 runs  4 268 ✅ 3 💤 0 ❌

Results for commit ea1158a.

@YarivHashaiComet YarivHashaiComet requested a review from Copilot July 13, 2025 06:57
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for tagging datasets across the backend API, persistence layer, filtering, sorting, and tests.

  • Adds a new tags column via Liquibase and updates the domain model and DAO to persist and query tags.
  • Extends the filtering and sorting infrastructure to handle tags on datasets.
  • Updates API resources, criteria, and integration tests to accept and validate dataset tags.

Reviewed Changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated no comments.

Show a summary per file
File Description
DatasetsResourceTest.java Updated tests to cover sorting and filtering by tags
000022_add_tags_to_datasets.sql Liquibase migration adding the tags column
FilterStrategy.java Added DATASET strategy for dataset-level filters
FilterQueryBuilder.java Mapped DatasetField.TAGS to the DB column
DatasetService.java Wired filter query builder into service find methods
DatasetDAO.java Extended SQL queries and mapper for tags and filters
DatasetCriteria.java Added filters list to criteria
SortingFactoryDatasets.java Included TAGS in sortable fields
DatasetsResource.java Accepts filters query param and populates criteria
DatasetFilter.java New class representing dataset filter payloads
DatasetField.java Enum for dataset filterable fields
DatasetUpdate.java Added tags field to update payload
Dataset.java Added tags field to the dataset API model
Comments suppressed due to low confidence (4)

apps/opik-backend/src/test/java/com/comet/opik/api/resources/v1/priv/DatasetsResourceTest.java:1923

  • List in Java does not have a reversed() method. Consider using Collections.reverse on a mutable list or ListUtils.reversed() to produce a reversed view.
            List<Dataset> expectedDatasets = getExpectedDatasets.apply(datasets).reversed();

apps/opik-backend/src/test/java/com/comet/opik/api/resources/v1/priv/DatasetsResourceTest.java:1936

  • List in Java does not provide a getFirst() method. You should use datasets.get(0) to access the first element.
                                    .value(datasets.getFirst().tags().iterator().next())

apps/opik-backend/src/test/java/com/comet/opik/api/resources/v1/priv/DatasetsResourceTest.java:1938

  • List in Java does not provide a getFirst() method; use datasets.get(0) when creating the singleton list.
                            (Function<List<Dataset>, List<Dataset>>) datasets -> List.of(datasets.getFirst())),

apps/opik-backend/src/main/java/com/comet/opik/api/filter/DatasetField.java:9

  • The symbol TAGS_QUERY_PARAM is not defined in this enum. Define a constant for the request query parameter name (e.g., "tags") or reference the correct constant.
    TAGS(TAGS_QUERY_PARAM, FieldType.STRING_STATE_DB),

@RequiredArgsConstructor
@Getter
public enum DatasetField implements Field {
TAGS(TAGS_QUERY_PARAM, FieldType.STRING_STATE_DB),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the "," ? or we can just ; in the same line?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better for extending to additional values. Nicer to review in those cases as well.

public class DatasetFilter extends FilterImpl {

public static final TypeReference<List<DatasetFilter>> LIST_TYPE_REFERENCE = new TypeReference<>() {
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz make it the same line as 13 (dont break line)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be like it is, and these are handled and formatted automatically by spotless.

Copy link
Collaborator

@andrescrz andrescrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple of minor recommendations.

@RequiredArgsConstructor
@Getter
public enum DatasetField implements Field {
TAGS(TAGS_QUERY_PARAM, FieldType.STRING_STATE_DB),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better for extending to additional values. Nicer to review in those cases as well.

public class DatasetFilter extends FilterImpl {

public static final TypeReference<List<DatasetFilter>> LIST_TYPE_REFERENCE = new TypeReference<>() {
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be like it is, and these are handled and formatted automatically by spotless.

Comment on lines +72 to +76
private static final String DATASET_ID_DB = "id";
private static final String DATASET_NAME_DB = "name";
private static final String DATASET_VISIBILITY_MODE_DB = "visibility";
private static final String DATASET_CREATED_AT_DB = "created_at";
private static final String DATASET_LAST_UPDATED_AT_DB = "last_updated_at";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: id, name, created_at and last_updated_at already exists. I'd reuse them and make the name neutral of the DB, as we follow the same naming convention in both. If you prefer not to, I wouldn't tie this naming to the dataset. I'd just rename them for the state DB.


private final String queryParamField;
private final FieldType type;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: add new line to all files. Maybe we can automate this with spotless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants