Add support for Modal #237

Merged
merged 3 commits into from
Jan 11, 2025

Contributor

@azliu0 azliu0 commented Oct 23, 2024

What does this implement/fix? Explain your changes.

Adds support for running the evaluation harness on Modal Labs.

Adding the --modal flag to the normal run command runs the instances remotely on Modal rather than locally. The current images resolve ~300/300 tasks on Lite and ~495/500 tasks on Verified.
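
As a rough illustration of the flag described above, here is a minimal sketch of how a command-line switch can select between a local and a remote execution path. It is not the code from this PR: `run_instances_locally`, `run_instances_on_modal`, and the `--instance_ids` argument are hypothetical stand-ins, and treating `--modal` as a bare boolean switch is an assumption (the description does not say whether the flag takes a value). No Modal call is made.

```python
import argparse


def run_instances_locally(instance_ids):
    """Hypothetical stand-in for the existing local execution path."""
    return {instance_id: "local" for instance_id in instance_ids}


def run_instances_on_modal(instance_ids):
    """Hypothetical stand-in for the remote path; nothing is sent to Modal here."""
    return {instance_id: "modal" for instance_id in instance_ids}


def build_parser():
    parser = argparse.ArgumentParser(description="Evaluation harness (sketch)")
    # Hypothetical argument: which instances to evaluate.
    parser.add_argument("--instance_ids", nargs="+", default=[])
    # Assumption: --modal is a boolean switch that defaults to local execution.
    parser.add_argument("--modal", action="store_true",
                        help="run instances remotely instead of locally")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Pick the execution backend once, then hand it the same list of instances.
    runner = run_instances_on_modal if args.modal else run_instances_locally
    return runner(args.instance_ids)


print(main(["--modal", "--instance_ids", "sympy__sympy-11870"]))
```

Keeping the backend choice in one place means the rest of the command (instance selection, reporting) can stay identical for both paths.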

Benefits of running this way:

  • users don't have to set up their own infrastructure; everything runs in the cloud ☁️
  • everything runs in parallel, so building and running larger numbers of instances is effortless
  • with full image caching, Lite runs in ~14 minutes (~7 minutes when sympy__sympy-11870 is excluded) and Verified runs in ~7 minutes
  • Modal's image caching is built in and follows essentially the same layered abstraction as the images already in the repo
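
The timing figures above suggest that one slow instance can dominate a fully parallel run. A small sketch of that reasoning follows, under simplifying assumptions: every instance gets its own worker, and build and scheduling overhead are ignored. The function name and the per-instance durations are made up for illustration; the durations loosely echo the ~14 and ~7 minute figures and are not measurements.

```python
def estimated_wall_clock(durations, parallel):
    """Estimate total wall-clock minutes from per-instance durations.

    With one worker per instance, the run finishes when the slowest
    instance finishes; run serially, the durations add up.
    """
    if not durations:
        return 0.0
    values = durations.values()
    return max(values) if parallel else sum(values)


# Illustrative durations in minutes (not measured values).
durations = {
    "sympy__sympy-11870": 14.0,
    "hypothetical-instance-a": 7.0,
    "hypothetical-instance-b": 3.0,
}
without_outlier = {
    name: minutes
    for name, minutes in durations.items()
    if name != "sympy__sympy-11870"
}

print(estimated_wall_clock(durations, parallel=True))        # 14.0
print(estimated_wall_clock(without_outlier, parallel=True))  # 7.0
print(estimated_wall_clock(durations, parallel=False))       # 24.0
```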

@wenting-zhao
Contributor

I am curious whether Modal plans to fix the remaining 10/300?

justinchiu-cohere added a commit to justinchiu-cohere/SWE-bench that referenced this pull request Dec 18, 2024
@pawalt

pawalt commented Jan 10, 2025

@wenting-zhao as of the most recent commit, we achieve 300/300 on Lite and 495/500 on Verified.

@carlosejimenez carlosejimenez self-assigned this Jan 11, 2025
@carlosejimenez carlosejimenez merged commit 25f5620 into swe-bench:main Jan 11, 2025
1 check passed
@carlosejimenez
Member

Thanks @azliu0 and @pawalt!
Just tested it out and it's great!

@john-b-yang
Member

Support added back in d1ede7d
