Feature Request: Enhanced Support for "Hack-Like Problems" with Non-WA Verdicts #38

Open
baluteshih opened this issue Nov 12, 2024 · 4 comments

Comments

@baluteshih

baluteshih commented Nov 12, 2024

For problems where the goal is to "provide the original problem and a fake solution, and try to construct a test case to make it fail," it's straightforward to include both the correct and fake solutions in the checker to achieve "Wrong Answer (WA) Hacking" behavior, as demonstrated in this example.

However, if the expected verdict is TLE, RE, or another non-WA result, implementing this requires unconventional workarounds, such as timing the fake solution with clock() inside the checker (as in this example) or wrapping it in try statements. Currently, there seems to be no standardized way to handle this.
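To make the workaround concrete, here is a minimal sketch of a checker that embeds the fake solution, times it, and catches its exceptions. It is written in Python for brevity, with `time.process_time()` standing in for C's clock(). The names `emulate_verdict` and `quadratic_fake_solution` are hypothetical and not part of any judge API.

```python
import time


def emulate_verdict(solution, hack_input, time_limit_s):
    """Run `solution` inside the checker and guess the verdict it would receive.

    This mirrors the clock()/try workaround: the checker itself has to pay
    for the full run, and a slow solution is only detected after it finishes.
    """
    start = time.process_time()  # CPU time, analogous to C's clock()
    try:
        solution(hack_input)
    except Exception:
        return "RE"  # emulated runtime error
    elapsed = time.process_time() - start
    return "TLE" if elapsed > time_limit_s else "OK"


def quadratic_fake_solution(values):
    # Hypothetical code to be hacked: counts inversions in O(n^2).
    count = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] > values[j]:
                count += 1
    return count
```

The sketch also shows why this is awkward: the emulated limit has to fit inside the checker's own limits, and crashes that are not ordinary exceptions (for example a stack overflow) cannot be caught this way.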

Since there is already a "Custom Summary" feature, adding an option to "run the judge's program only" and detect its verdict directly could simplify the process. One possible approach is to build on the "Number of Execution Stages" feature: the judge's program could be placed in the second stage, since its input is the output of the first stage.

@adrien1018
Member

My current thought is to add an option to run a judge-provided program as the last stage, and a "hack wrapper" option that does the following things:

  • Give the original verdict if the last stage is not run (previous stages failed; default behavior of multistage)
  • Give WA if the judge-provided stage exited with status code 1 (or whatever constant number)
  • Give WA if the final verdict of the normal judge flow is AC
  • Give AC otherwise

This way, the problem setter can just write a program that calls exit(1) if the output format is incorrect and otherwise runs the code to be hacked. This also naturally handles the case where the problem being hacked is a special judge problem.

Does this design sound reasonable?
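
The four rules above can be sketched as a single verdict-mapping function. This is only an illustration of the proposal, applying the rules in the order listed. The function name, its parameters, and the verdict strings are assumptions, not an existing interface.

```python
from typing import Optional

# Assumed constant; the proposal says "1 (or whatever constant number)".
JUDGE_REJECT_EXIT_CODE = 1


def hack_wrapper_verdict(
    last_stage_ran: bool,
    normal_verdict: str,
    judge_exit_code: Optional[int],
) -> str:
    """Map the normal judge-flow verdict to the hacker's verdict."""
    if not last_stage_ran:
        # A previous stage failed: keep the default multistage behavior.
        return normal_verdict
    if judge_exit_code == JUDGE_REJECT_EXIT_CODE:
        # The judge-provided program rejected the hack input's format.
        return "WA"
    if normal_verdict == "AC":
        # The code being hacked survived, so the hack failed.
        return "WA"
    # The code being hacked got WA, TLE, RE, ...: the hack succeeded.
    return "AC"
```

For example, `hack_wrapper_verdict(True, "TLE", None)` yields "AC" under these assumptions, because the code being hacked exceeded its time limit.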

@baluteshih
Author

  • Give WA if the judge-provided stage exited with status code 1 (or whatever constant number)

This seems unnecessary, as output format validation can already be handled within the special judge program. Adding this feature may also cause problems, especially when validation is time-consuming: the time spent on validation would count against the judge-provided program's time limit, leaving less for the code being hacked.

The other proposed design ideas sound excellent!

@adrien1018
Member

Oh, so in that case the special judge would need to validate the output after the first stage, and then do a manual comparison to determine AC after the second stage?

@baluteshih
Author

Yes, this is a more reasonable process in my opinion.
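
A rough sketch of the two-step special judge discussed above, assuming it is invoked once after each stage. The hack-input format (a count n followed by n integers), the bound `max_n`, and both function names are hypothetical.

```python
def check_after_first_stage(hack_input: str, max_n: int = 1000) -> str:
    """Validate the hacker's output before it is fed to the code being hacked."""
    try:
        numbers = [int(token) for token in hack_input.split()]
    except ValueError:
        return "WA"  # a token is not an integer
    if not numbers:
        return "WA"  # empty output
    n = numbers[0]
    if not (1 <= n <= max_n) or len(numbers) != n + 1:
        return "WA"  # count out of range or does not match the data
    return "AC"


def check_after_second_stage(hacked_output: str, reference_output: str) -> str:
    """Ordinary token-wise comparison of the hacked code's output with the
    correct answer. A hack wrapper would then turn AC into WA, and any other
    verdict into AC."""
    same = hacked_output.split() == reference_output.split()
    return "AC" if same else "WA"
```

With this split, an invalid hack input fails the first stage, so the last stage never runs and its time limit is spent only on the code being hacked.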
