Disallow new steps/actions if the isCompleted flag is set #31
Comments
Should that be handled on the server side?
It sounds like a good idea to handle it on the server side. It's also been suggested that we make this a flag that can be set, so folks can choose whether to continue running after task failure -- that should be easy enough to add, as long as we make sure that folks report this in their papers, and add some very obvious output somewhere to remind folks which mode they're using. :)
That seems weird. What is the need for ignoring task failure? Do they want a no-goal variation, i.e. just an exploration mode?
Hey Marc and Peter, I actually really want the feature for "ignoring task failure" so that we can have an env that enables agents to learn from their failures and acquire the knowledge to avoid such failures. I can manually set this to False, but it seems that I cannot change the negative score (-1) back to its original score on the server side, even though the game can be continued. Any suggestions? More ideas on this issue: Thank you very much! :D
The issue with focusing is that it's a critical part of the evaluation for most tasks, and removing the hard criterion that the agent has to focus on the right thing would allow an agent to quickly game the tasks. For example, if the task is to measure something, and focus on box A if it's greater than some threshold, and focus on box B if it's less than some threshold, the agents will quickly learn to just focus on box A then B and get 100% task performance in two steps. Adding the focus mechanism was a method of (a) ensuring that the actions are intentional, and (b) providing a method of scoring that doesn't rely on natural language generation, but instead uses an analog of a forced choice task. If you remove the failure of the forced choice task, then it's like being able to select every multiple choice answer on a multiple choice test. :)
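To make the multiple-choice analogy concrete, here is a toy sketch of the two scoring regimes. The function names, box labels, and score values are hypothetical and are not taken from the ScienceWorld code; they only illustrate why a policy that ignores the measurement does well once the hard failure is removed.

```python
def score_first_choice_only(focus_sequence, correct_box):
    """Forced choice: only the first focus counts, and a wrong one fails the task."""
    if not focus_sequence:
        return 0.0
    return 1.0 if focus_sequence[0] == correct_box else -1.0


def score_any_choice(focus_sequence, correct_box):
    """No hard failure: the task counts as solved if the correct box is ever focused."""
    return 1.0 if correct_box in focus_sequence else 0.0


# A "blind" policy that never looks at the measurement (hypothetical).
blind_policy = ["box A", "box B"]

for correct in ("box A", "box B"):
    strict = score_first_choice_only(blind_policy, correct)
    lenient = score_any_choice(blind_policy, correct)
    print(f"correct={correct}: forced choice={strict}, no failure={lenient}")
```

Under the forced-choice rule the blind policy fails whenever box B is the right answer, while without the failure criterion it gets full credit in both cases.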
Thank you very much for the explanations, Peter! :D Really appreciate it.
That seems like an issue. The agent should be able to focus on the task object even if it is already in its inventory.
Hmm, I think in this case the object it's focusing on isn't a task object, right @yuchenlin? Could you copy/paste the playthrough, and we could figure out if there's an issue?
Certain conditions cause the isCompleted flag to be set -- for example, a negative score in the task, signifying task failure:
ScienceWorld/scienceworld/scienceworld.py
Line 328 in 10dd21a
Right now we set the isCompleted flag to true, but if an agent doesn't check for this, it can still continue to send commands to the environment. There is a report that this might let the tasks be gamed (especially the forced-choice tasks), as the agent might be able to take steps that ultimately further increase the score. We should likely modify the step() code so that, once the isCompleted flag is set, no further steps are processed, to prevent any issues with agents reporting erroneously high scores in the future.
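A minimal sketch of what such a guard could look like, written as a wrapper around a toy environment. This is not the actual ScienceWorld step() implementation: `ToyEnv`, `CompletionGuard`, the `allow_steps_after_completion` flag, and the score values are all hypothetical names chosen for illustration.

```python
class ToyEnv:
    """Hypothetical stand-in environment; not the ScienceWorld API."""

    def __init__(self):
        self.score = 0
        self.is_completed = False

    def step(self, action):
        # Toy rule: focusing on box B fails the task, focusing on box A solves it.
        if action == "focus on box B":
            self.score, self.is_completed = -100, True
        elif action == "focus on box A":
            self.score, self.is_completed = 100, True
        return f"You {action}.", self.score, self.is_completed


class CompletionGuard:
    """Wraps an env so that step() stops processing actions once it is completed."""

    def __init__(self, env, allow_steps_after_completion=False):
        self.env = env
        self.allow_steps_after_completion = allow_steps_after_completion
        self._final_score = None  # score frozen when completion is first seen
        if allow_steps_after_completion:
            # Make the non-default mode very obvious in the output.
            print("WARNING: steps after task completion are ALLOWED in this run.")

    def step(self, action):
        if self._final_score is not None and not self.allow_steps_after_completion:
            # Do not forward the action; return the frozen score instead.
            return "Task already completed; action ignored.", self._final_score, True
        observation, score, is_completed = self.env.step(action)
        if is_completed and self._final_score is None:
            self._final_score = score
        return observation, score, is_completed


guarded = CompletionGuard(ToyEnv())
first = guarded.step("focus on box B")   # fails the task
second = guarded.step("focus on box A")  # ignored by the guard
print(first)
print(second)
```

The wrapper freezes the score at the moment completion is first observed, so a later action cannot turn a failure into a success. The optional flag mirrors the suggestion in the comments to let users opt in to continuing after failure, with a loud warning so the mode gets reported.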