Switch OpenAI Image Generation to use new gpt-image-1 model #897

Merged: 17 commits into develop from feature/new-openai-image-model on Jun 11, 2025

@dkotter dkotter commented Apr 23, 2025

Description of the Change

OpenAI recently added a new image generation model to its API, gpt-image-1, which can replace the dall-e-3 and dall-e-2 models. We upgraded from dall-e-2 to dall-e-3 in #717, and it seems worthwhile to now upgrade to gpt-image-1, as the generated images are of better quality (and pricing is similar).

There are some differences between these models that we need to account for:

  1. Different quality options. New: auto, low, medium, high. Old: hd, standard
  2. Different size options. New: 1024x1024, 1536x1024, 1024x1536. Old: 1024x1024, 1792x1024, 1024x1792
  3. New model doesn't support the style options at all (vivid and natural)
  4. New model only returns base64 encoded images, not an image URL
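
To illustrate the fourth difference, here is a minimal Python sketch (not the plugin's PHP code) of reading an image from a response that carries base64 data instead of a URL. The `data[0].b64_json` field follows the shape of OpenAI's Images API response; the helper name `decode_image_payload` and the sample payload are hypothetical.

```python
import base64
import binascii


def decode_image_payload(response: dict) -> bytes:
    """Return raw image bytes from an Images API-style response dict."""
    try:
        encoded = response["data"][0]["b64_json"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("response has no base64 image data") from exc
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("image data is not valid base64") from exc


# Stand-in payload: a real response holds full PNG bytes, not this short marker.
sample = {"data": [{"b64_json": base64.b64encode(b"\x89PNG demo").decode("ascii")}]}
image_bytes = decode_image_payload(sample)
```

The decoded bytes would then need to be written to a file or uploaded, since there is no hosted URL to download from.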

For new users, this won't matter, but for existing users, there's code in place that automatically uses the new options when needed. For example, if an existing user has an image size of 1792x1024 saved, generating an image will force the size to 1024x1024. They'll need to go to the Feature settings screen to update their defaults.
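
The fallback described above could look roughly like the following Python sketch. It is not the plugin's PHP implementation. The 1024x1024 size fallback comes from the description; falling back to `auto` for an unsupported quality is an assumption, and `normalize_settings` is a hypothetical name.

```python
NEW_SIZES = ("1024x1024", "1536x1024", "1024x1536")
NEW_QUALITIES = ("auto", "low", "medium", "high")

FALLBACK_SIZE = "1024x1024"  # stated in the PR description
FALLBACK_QUALITY = "auto"    # assumption: the description does not name a quality fallback


def normalize_settings(saved: dict) -> dict:
    """Map saved (possibly dall-e-era) settings onto values gpt-image-1 accepts."""
    size = saved.get("size")
    quality = saved.get("quality")
    return {
        "model": "gpt-image-1",
        "size": size if size in NEW_SIZES else FALLBACK_SIZE,
        "quality": quality if quality in NEW_QUALITIES else FALLBACK_QUALITY,
        # "style" (vivid/natural) is dropped: the new model does not support it
    }


legacy = {"size": "1792x1024", "quality": "hd", "style": "vivid"}
normalized = normalize_settings(legacy)
```

Because the saved settings are only read and never rewritten here, the user's stored values stay intact until they update their defaults themselves.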

Also worth noting I've updated the use of DALLE to be more general (OpenAI Images) in multiple places, as this makes more sense to what we're doing. I did not deprecate anything and I'm personally fine with that as it's not likely someone is directly using the Provider class. I did leave things like settings alone so none of those configurations will be lost.

How to test the Change

  1. Turn on and configure the Image Generation Feature
  2. Ensure you can generate images in the stand-alone page and within the media modal
  3. Ensure the various settings work as expected
  4. If desired, configure an environment using the current released version of the plugin and then update to this PR. Test and ensure image generation still works even if using old settings

Changelog Entry

Added - Support for the new OpenAI gpt-image-1 image generation model
Developer - Rename the DallE Provider class to Images. If you directly extend that class yourself, you'll need to update your code to account for this. Also updated a handful of other references to DALLE to Images

Credits

Props @dkotter

@dkotter dkotter self-assigned this Apr 23, 2025
@github-actions github-actions bot added this to the 3.4.0 milestone Apr 23, 2025

- class DallE extends Provider {
+ class Images extends Provider {

I am changing the class name here but don't think we need to worry about deprecating this as it's unlikely anyone was directly extending this class

@dkotter dkotter marked this pull request as ready for review June 5, 2025 17:21
@dkotter dkotter requested review from jeffpaul and a team as code owners June 5, 2025 17:21
@github-actions github-actions bot added the needs:code-review This requires code review. label Jun 5, 2025

dkotter commented Jun 5, 2025

Couple things to keep in mind here:

  1. As noted here, your account needs to be verified to use the gpt-image-1 model. There may be some sites using image generation that aren't verified and will need to complete that before things will work. I think it's worth the minor pain there, as the generated images are of much higher quality
  2. You can get way more specific in the image generation prompt, and the more specific you get, the longer it will take for the image to be generated. We currently have a timeout of 60 seconds on the request and in my testing, that seemed okay. But we may want to consider increasing that limit. There's also a chance some sites have server-level limits (max execution time) that could be hit
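
As a rough illustration of the timeout concern, the Python sketch below wraps a request in a 60-second limit and turns a timeout into a readable error. It is not the plugin's code: `generate_image`, the injected `send` callable, and the error wording are all hypothetical, and the transport is simulated so the example runs without network access.

```python
from typing import Callable

REQUEST_TIMEOUT_SECONDS = 60  # the request limit mentioned above


def generate_image(
    send: Callable[[dict, float], dict],
    payload: dict,
    timeout: float = REQUEST_TIMEOUT_SECONDS,
) -> dict:
    """Call the transport and report a timeout as an error result."""
    try:
        return {"ok": True, "response": send(payload, timeout)}
    except TimeoutError:
        return {
            "ok": False,
            "error": (
                f"Image generation did not finish within {timeout:g} seconds. "
                "Try a shorter prompt or raise the request timeout."
            ),
        }


def slow_transport(payload: dict, timeout: float) -> dict:
    """Simulated transport that always times out."""
    raise TimeoutError("simulated slow response")


result = generate_image(slow_transport, {"prompt": "a very detailed prompt"})
```

This only covers the HTTP request itself timing out. A server-level execution limit would end the process before any handler like this could run.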

As an example complex prompt, I pulled this from OpenAI's cookbook to test. Took about 45 seconds for me locally:

Render a realistic image of this character:
Blobby Alien Character Spec
Name: Glorptak (or nickname: "Glorp")
Visual Appearance
Body Shape: Amorphous and gelatinous. Overall silhouette resembles a teardrop or melting marshmallow, shifting slightly over time. Can squish and elongate when emotional or startled.
Material Texture: Semi-translucent, bio-luminescent goo with a jelly-like wobble. Surface occasionally ripples when communicating or moving quickly.
Color Palette:
- Base: Iridescent lavender or seafoam green
- Accents: Subsurface glowing veins of neon pink, electric blue, or golden yellow
- Mood-based color shifts (anger = dark red, joy = bright aqua, fear = pale gray)
Facial Features:
- Eyes: 3–5 asymmetrical floating orbs inside the blob that rotate or blink independently
- Mouth: Optional—appears as a rippling crescent on the surface when speaking or emoting
- No visible nose or ears; uses vibration-sensitive receptors embedded in goo
- Limbs: None by default, but can extrude pseudopods (tentacle-like limbs) when needed for interaction or locomotion. Can manifest temporary feet or hands.
Movement & Behavior
Locomotion:
- Slides, bounces, and rolls.
- Can stick to walls and ceilings via suction. When scared, may flatten and ooze away quickly.
Mannerisms:
- Constant wiggling or wobbling even at rest
- Leaves harmless glowing slime trails
- Tends to absorb nearby small objects temporarily out of curiosity

jeffpaul commented Jun 5, 2025

> Also worth noting I've updated the use of DALLE to be more general (OpenAI Images) in multiple places, as this makes more sense to what we're doing. I did not deprecate anything and I'm personally fine with that as it's not likely someone is directly using the Provider class. I did leave things like settings alone so none of those configurations will be lost.

That all sounds good to me.

jeffpaul commented Jun 5, 2025

> As noted here, your account needs to be verified to use the gpt-image-1 model. There may be some sites using image generation that aren't verified and will need to complete that before things will work. I think it's worth the minor pain there, as the generated images are of much higher quality

Do we have any handling (or even a way with the OAI API) to detect someone's not verified and to throw a notice for them to complete that to ensure the image gen features continue to work as expected?

jeffpaul commented Jun 5, 2025

> You can get way more specific in the image generation prompt, and the more specific you get, the longer it will take for the image to be generated. We currently have a timeout of 60 seconds on the request and in my testing, that seemed okay. But we may want to consider increasing that limit. There's also a chance some sites have server-level limits (max execution time) that could be hit

Perhaps, as part of the in-context error messaging, we add a simple note that if they're seeing timeouts beyond 60 seconds, they should consider filtering at the app or server level to adjust that limitation?

dkotter commented Jun 5, 2025

> Do we have any handling (or even a way with the OAI API) to detect someone's not verified and to throw a notice for them to complete that to ensure the image gen features continue to work as expected?

Not that I can find (as far as an API that tells us that). We could try and make an image generation request and use that response but not sure I love the idea of automatically doing that (as it costs them money).

> Perhaps, as part of the in-context error messaging, we add a simple note that if they're seeing timeouts beyond 60 seconds, they should consider filtering at the app or server level to adjust that limitation?

If the request times out, an error message will already show (though that's just the HTTP request timing out, not a PHP-level timeout, which I don't think we can capture).

jeffpaul commented Jun 9, 2025

> Not that I can find (as far as an API that tells us that). We could try and make an image generation request and use that response but not sure I love the idea of automatically doing that (as it costs them money).

I agree.

@jeffpaul jeffpaul requested review from Sidsector9 and removed request for jeffpaul June 9, 2025 15:05
@github-actions github-actions bot added the needs:refresh This requires a refreshed PR to resolve. label Jun 9, 2025
@github-actions github-actions bot removed the needs:refresh This requires a refreshed PR to resolve. label Jun 9, 2025

@Sidsector9 Sidsector9 left a comment

Code looks good, the PR tested well for me 👍

@dkotter dkotter merged commit 4eb0d6b into develop Jun 11, 2025
19 checks passed
@dkotter dkotter deleted the feature/new-openai-image-model branch June 11, 2025 16:34
Labels
needs:code-review This requires code review.