
Add uniqueness restriction on names #567

Open

niemela wants to merge 1 commit into master from stricter-naming

Conversation

Member

@niemela niemela commented Dec 19, 2025

Closes #539

I don't feel we have consensus on the issue yet, but this is how it could be done.

Please chime in...

@niemela niemela requested review from Matistjati and Tagl December 19, 2025 02:03
It is good practice to use a numbered prefix such as `00`, `01`, `02`, `03`, and so on, to get the desired order of test cases, while keeping the file names descriptive.
Remember that the numbered prefixes should be zero padded to the same length to get the expected lexicographical order.

Test case file names (the base name of the `.in` file) must be unique across the entire `data/` directory, unless the test cases are equivalent.
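As an illustration of the zero-padded prefix convention and the proposed uniqueness rule (all file names here are invented for the example), a `data/` directory might look like:

```text
data/
  sample/
    01-tiny.in
    01-tiny.ans
  secret/
    01-small-random.in
    01-small-random.ans
    02-large-random.in
    02-large-random.ans
```

Here every base name is unique across the whole tree; under the proposed rule, `01-tiny` could only appear again elsewhere under `data/` if the two test cases were equivalent.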
Collaborator

Maybe we should be a tiny bit more specific with the definition of equivalence? Surely we don't care about output validator flags in this definition, right?
For any two test cases, if the contents of their .in and .files directory are equivalent, as well as the args sequence in the .yaml file, then the input of the two test cases is equivalent. For any two test cases, if their input, output validator arguments and the contents of their .ans files are equivalent, then the test cases are equivalent.

At the very least, we should say "if their inputs are equivalent". Additionally, we should probably either copy paste the definition or link to it.

Collaborator

@Tagl Tagl Dec 19, 2025

Surely we don't care about output validator flags in this definition, right?

Agreed, I want to be able to reuse a .in file with different output validator flags.

Member Author

Ok, maybe we do, but then we have two different kinds of equivalence. The one you want to use is "the inputs are equivalent", the one we already have defined and that I used is "the test cases are equivalent". The latter allows a judge system to reuse the results of judging the test case, the former does not. This is why I would like to use that definition.

Collaborator

Ok, maybe we do, but then we have two different kinds of equivalence.

Yes. For a concrete example: Sweden has a problem asking "find the min and max possible thing for a given input", with a subtask "you only need to find the max correctly". I would argue that the most correct solution in this instance is that this property is part of the group via output_validator_flags, not of any test case itself, and we want to be able to reuse them (problem in question: https://po2punkt0.kattis.com/problems/robottavlingen)

Member Author

Ah... I see you have not read the entirety of me and @thorehusfeldt's discussion in #523 😏.

In that discussion the consensus seems to go towards output_validator_flags being part of "the test case". I think @thorehusfeldt is arguing from a point of "the sameness of a test case should imply the sameness of the judgement of said test case", and I would agree with that. It feels strange to say that you could pass a test case and then fail "the same" test case? They are quite obviously not the same then.

So, IMO, what you are talking about is identical input, not identical test cases. I would argue that that can be sufficiently handled by symlinks or copying?

Collaborator

The problem format explicitly allows systems to assume determinism though.

This is also broken in a different way: it breaks every single randomized submission. The judge is basically allowed to rerun a submission an infinite number of times and check that it fails 0 times (so basically until it fails). Clearly that is not what we want, right?

We very specifically mean: run this test case, and run it exactly once.

So I guess the discussion here is: can we change it to at most once for 'identical' testcases (for some definition of identical).

I would suggest that every data/**/*.in corresponds to exactly 1 run, as I have always understood it.

If you want to avoid this, use require_pass: easier_group instead to be explicit.

If we want some kind of uniqueness constraints for names of test cases I would want the following meaning "If two files have the same base name + extension they should have the same content".

This sounds reasonable. Note that it does not imply the converse: two files with the same content need not have the same base name.
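The constraint proposed above ("same base name + extension implies same content") is mechanically checkable. A minimal sketch of such a check (the function name and approach are invented here, not part of any existing tool):

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_name_collisions(data_dir):
    """Return base names of .in files that appear more than once under
    data_dir with *different* content.

    Under the proposed rule, reusing a base name is only allowed when
    the file contents are identical, so any name returned here would
    violate the constraint.
    """
    digests_by_name = defaultdict(set)
    for path in Path(data_dir).rglob("*.in"):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        digests_by_name[path.name].add(digest)
    return sorted(name for name, digests in digests_by_name.items()
                  if len(digests) > 1)
```

Note this only checks the `.in` contents; extending it to `.ans` files, `.files` directories, and args would be needed for the stronger "test cases are equivalent" definition discussed above.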

Collaborator

@Matistjati Matistjati Feb 27, 2026

So I guess the discussion here is: can we change it to at most once for 'identical' testcases (for some definition of identical).

Playing the devil's advocate: if your submission gets TLE on codeforces, it will be rerun under certain conditions. Even if it's strange, is this something we want to forbid?

require_pass: easier_group

This is not sufficient. For example: Hamiltonian path, with 50% of the points for deciding existence and 100% for outputting an actual path. The behavior we want here for strict subgroups is to rerun every test case and take the min (which can be optimized by the determinism assumption). If we remove the determinism assumption, we could of course patch this by assigning the caching behavior to symlinks.

Collaborator

Am I missing something or would a setting on the problem such as

assume_deterministic: false

which defaults to true not solve this?
I feel like this is always a configuration on the problem and in almost all cases you should be allowed to assume determinism without any real issues.

If you want more granularity you could even set it on the test group instead of the problem itself.
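A sketch of what that could look like in problem.yaml (the `assume_deterministic` key is only a proposal from this thread, not part of the current format):

```yaml
# Hypothetical problem.yaml fragment.
type: pass-fail
# Proposed key, default true: whether judges may cache/reuse the result
# of running a submission on identical test cases.
assume_deterministic: false
```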

Collaborator

if your submission gets TLE on codeforces, it will be rerun under certain conditions.

Can you expand on this?

50% of the points for deciding existence, and 100% for outputting an actual path.

Ah, interesting motivating example. I hadn't considered this so far.

assume_deterministic: false

That's sort of fine, but then it requires implementations to actually implement and read this flag at the right times; I'm not sure that will happen in practice.


A problem remains though: there can be fundamentally non-deterministic solutions to e.g. the Hamiltonian path problem. If you assume a non-deterministic solution is deterministic, that kinda sounds like undefined behaviour.
Instead, I'd argue this is somewhat orthogonal to whether the solution is deterministic, and the flag should be called dedup_identical_testcases which could default to true I suppose. (Or it could default to true when groups are used and default to false otherwise, but that is kinda complex.)

Contributor

Playing the devil's advocate: if your submission gets TLE on codeforces, it will be rerun under certain conditions. Even if it's strange, is this something we want to forbid?

I am not sure if this is still a thing, because it was abused by heavy/random submissions... (it was better to time out instead of producing WA because then you got another shot). So yeah, I would want to forbid this.

assume_deterministic: false

This does not at all solve the issue I have. If you assume something that is not true you will get undefined behavior. We should definitely avoid this.

As far as I can see we have two somewhat independent issues:

  • The first thing is about assuming determinism. This does not solve the issue that it was intended to solve and produces a lot of other issues. Why don't we simply define what we (as far as I can tell) all want:
    • The judging should behave as if the submission was executed on each unique testcase exactly once.
    • Note that this still allows stuff like lazy judging because this is indistinguishable
    • And judges can obviously still overrule that and do rejudgings
  • Now the second part is about what makes a testcase unique (and hence allows caching). Here we have two opinions:
    • It's either the name/path, or
    • it's the files/args/... everything that in some way influences the submission/validator/verdict

For the second part I am technically fine with both options, but I personally prefer option 1. Mostly because I have not seen any good argument for option 2. The arguments I saw were all of the form "we want to reuse the outcome of a previous run" but then it is clear to me that we should "reuse the outcome of a previous run" and not "reuse a testcase".


Development

Successfully merging this pull request may close these issues.

Stricter limits on naming test cases

5 participants