Feature Request: Enhanced Support for "Hack-Like Problems" with Non-WA Verdicts #38

baluteshih · 2024-11-12T17:52:55Z

For problems where the goal is to "provide the original problem and a fake solution, and try to construct a test case to make it fail," it's straightforward to include both the correct and fake solutions in the checker to achieve "Wrong Answer (WA) Hacking" behavior, as demonstrated in this example.

However, if the expected verdicts are TLE, RE, or other non-WA results, implementing this requires unconventional workarounds, such as using clock() (as in this example) or try statements. Currently, there seems to be no standardized way to handle this.
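The kind of workaround described above can be sketched as follows. This is only an illustration, assuming the checker can call the fake solution in-process; `classify_run`, `fake_solution`, and `time_limit_s` are hypothetical names rather than part of any judge API, and Python's `time.perf_counter()` stands in for `clock()`.

```python
import time

def classify_run(fake_solution, hack_input, expected, time_limit_s=1.0):
    """Run the fake solution inside the checker and guess its verdict.

    All names here are hypothetical; this is not a judge API.
    """
    start = time.perf_counter()
    try:
        answer = fake_solution(hack_input)
    except Exception:
        # A crash inside the fake solution is reported as a runtime error.
        return "RE"
    elapsed = time.perf_counter() - start
    if elapsed > time_limit_s:
        # Measured only after the call returns; a runaway run is never interrupted.
        return "TLE"
    return "AC" if answer == expected else "WA"
```

The sketch also shows why this is fragile: the timing happens inside the checker's own process and limits, and nothing stops a fake solution that never returns.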

Since there is already a "Custom Summary" feature, adding an option to "run the judge's program only" and detect its verdict directly could simplify the process. One possible approach is to use the "Number of Execution Stages" feature: the judge's program can be placed in the second stage, since its input is the output of the first stage.


adrien1018 · 2024-11-13T05:59:49Z

My current thought is to add an option to run a judge-provided program as the last stage, and a "hack wrapper" option that does the following things:

- Give the original verdict if the last stage is not run (previous stages failed; default behavior of multistage)
- Give WA if the judge-provided stage exited with status code 1 (or whatever constant number)
- Give WA if the final verdict of the normal judge flow is AC
- Give AC otherwise

This way, the problem setter can just write a program that calls exit(1) if the output format is incorrect and otherwise runs the code to be hacked. This also naturally handles the case where the problem being hacked is a special judge problem.

Does this design sound reasonable?
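The four rules above can be written out as a small decision function. This is a minimal sketch of the proposal, not an existing feature: `hack_wrapper_verdict`, its parameters, and `HACK_FAIL_EXIT_CODE` are hypothetical names chosen for illustration.

```python
# Assumed constant; the proposal says "1 (or whatever constant number)".
HACK_FAIL_EXIT_CODE = 1

def hack_wrapper_verdict(original_verdict, last_stage_ran, last_stage_exit_code=None):
    """Map the normal judge flow's result to the hack problem's verdict."""
    if not last_stage_ran:
        # A previous stage failed: keep the default multistage verdict.
        return original_verdict
    if last_stage_exit_code == HACK_FAIL_EXIT_CODE:
        # The judge-provided program rejected the hack input.
        return "WA"
    if original_verdict == "AC":
        # The program being hacked survived, so the hack failed.
        return "WA"
    # Any other verdict (WA, TLE, RE, ...) means the hack succeeded.
    return "AC"
```

For example, `hack_wrapper_verdict("TLE", True, None)` returns `"AC"`, because a timeout in the hacked program counts as a successful hack.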

baluteshih · 2024-11-13T06:13:01Z

> Give WA if the judge-provided stage exited with status code 1 (or whatever constant number)

This seems unnecessary, as output format validation can already be handled within the special judge program. Adding this feature may also introduce problems, especially when validation is time-consuming: the validation would then eat into the judge-provided program's time limit.

The other proposed design ideas sound excellent!

adrien1018 · 2024-11-13T07:54:09Z

Oh, so in that case the special judge would need to validate the output after the first stage, and then do a manual comparison to determine AC after the second stage?

baluteshih · 2024-11-13T07:57:08Z

Yes, this is a more reasonable process in my opinion.
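The flow agreed on here might look roughly like the sketch below. It simulates both stages in one process purely for illustration; every function name and the callables `is_valid_input`, `run_fake`, and `run_correct` are hypothetical stand-ins, not part of the judge.

```python
def check_after_first_stage(hack_input, is_valid_input):
    """Stage 1: the contestant's output is the hack input; validate its format."""
    return "AC" if is_valid_input(hack_input) else "WA"

def check_after_second_stage(fake_output, correct_output):
    """Stage 2: manually compare the hacked program's output with the reference."""
    return "AC" if fake_output == correct_output else "WA"

def final_verdict(hack_input, is_valid_input, run_fake, run_correct):
    stage1 = check_after_first_stage(hack_input, is_valid_input)
    if stage1 != "AC":
        # The last stage is not run, so the original verdict is kept.
        return stage1
    correct_output = run_correct(hack_input)
    try:
        normal = check_after_second_stage(run_fake(hack_input), correct_output)
    except Exception:
        # A crash in the hacked program stands in for an RE verdict.
        normal = "RE"
    # Hack wrapper: AC in the normal flow means the hack failed.
    return "WA" if normal == "AC" else "AC"
```

With this split, format validation is charged to the special judge after the first stage, so it does not consume the time limit of the judge-provided program.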
