This Python script processes images in a specified folder, sends them to the OpenAI API and saves the responses as text files.
- Image Encoding: Encodes images into base64 format for API requests.
- API Interaction: Sends images and prompts to the OpenAI API to generate descriptions or answers related to the images.
- MIME Type Handling: Determines the correct MIME type for various image formats (.jpg, .jpeg, .png, .gif, .bmp).
- Folder Management: Automatically creates necessary folders (
imagesandimages/answers) if they don't exist. - Error Handling: Includes basic error handling for file operations and API requests.
-
Python 3.8 or higher
-
OpenAI Python library (
pip install openai) -
python-dotenvlibrary (pip install python-dotenv) -
An OpenAI API key
-
A
.envfile in the root directory of the script containing the following:OPENAI_API_KEY=<your_openai_api_key> ROLE_PROMPT=<your_system_role_prompt> CONTENT_PROMPT=<your_user_content_prompt>OPENAI_API_KEY: Your OpenAI API key.ROLE_PROMPT: The role prompt (system prompt) to use for the OpenAI API.CONTENT_PROMPT: The content prompt (user prompt) to use for the OpenAI API.OPENAI_MODEL: The model to use for the OpenAI API requests (optional, defaults togpt-4o-mini).
-
Clone the repository:
git clone <repository_url> cd <repository_name>
-
Install dependencies:
pip install -r requirements.txt
(Assuming you have a
requirements.txtfile withopenaiandpython-dotenv) -
Create a
.envfile:- Create a file named
.envin the root directory of your project. - Add your OpenAI API key, role prompt, and content prompt to the
.envfile as described in the "Prerequisites" section.
- Create a file named
-
Place images in the
imagesfolder:- Put the images you want to process into the
imagesfolder, which will be created automatically in the same directory as the script if it doesn't exist.
- Put the images you want to process into the
-
Run the script:
python main.py
-
Find the responses:
- The script will process each image in the
imagesfolder. - For each image (e.g.,
image1.jpg), a corresponding text file (e.g.,image1_answer.txt) will be created in theimages/answersfolder containing the response from the OpenAI API.
- The script will process each image in the
- Takes an image path as input.
- Opens the image in binary read mode (
"rb"). - Reads the image content.
- Encodes the image data into a base64 string using
base64.b64encode(). - Decodes the base64 string to UTF-8 for compatibility with JSON.
- Returns the base64 encoded image string.
- Takes an image path and its extension as input.
- Defines a dictionary
mime_typesto map image file extensions to their corresponding MIME types. - Calls
encode_image()to get the base64 representation of the image. - Sends a request to the OpenAI API using
client.chat.completions.create().- Specifies the model as
"gpt-4o-mini". - Constructs the message with a system role and a user role.
- System role includes the
role_promptdefined in the.envfile. - User role includes the
content_promptand the image data. - The image data is formatted as an
image_urlwith the appropriate MIME type and the base64 encoded image.
- System role includes the
- Sets
max_tokensto 300 to limit the response length.
- Specifies the model as
- Prints the raw API response.
- Extracts the content of the response (the description or answer) from
response.choices[0].message.content. - Returns the extracted content.
- Takes a file path as input.
- Uses
mimetypes.guess_type()to determine the MIME type of the file based on its extension. - Returns
Trueif the MIME type starts with"image", indicating it's an image file; otherwise, returnsFalse.
- Takes a folder path as input.
- Iterates through each file in the specified folder using
os.listdir(). - For each file, checks if it's an image using
is_image(). - If it's an image:
- Extracts the file name and extension using
os.path.splitext(). - Constructs the full path to the image file.
- Constructs the path for the corresponding answer file in the
images/answersfolder. - Checks if an answer file already exists. If not:
- Calls
image_requests()to get the response from the OpenAI API. - Writes the response to the answer file.
- Prints a message indicating that the image was processed and the answer was saved.
- Calls
- Extracts the file name and extension using
- Takes a folder path as input.
- Checks if the folder exists using
os.path.exists(). - If the folder doesn't exist, it creates it using
os.makedirs().
- Ensures that the code inside this block is executed only when the script is run directly (not imported as a module).
- Gets the current working directory using
os.getcwd()and sets it asapp_path. - Constructs the path to the
imagesfolder. - Calls
check_folder()to create theimagesfolder and theimages/answerssubfolder if they don't exist. - Calls
process_images_files()to process the images in theimagesfolder. - Includes a
try...exceptblock to catch any exceptions during the process and print an error message.
- The script assumes you are using the
gpt-4o-minimodel. You can modify themodelparameter inimage_requests()if you want to use a different model. - The script currently has a hardcoded
max_tokensvalue of 300. You might need to adjust this based on your needs and the complexity of the expected responses. - Make sure to replace the placeholder values in the
.envfile with your actual API key and prompts. - This is a basic implementation. You can extend it further by adding features like batch processing, more sophisticated error handling, logging, and user interface elements.
If you have any feedback about the project, please let me know. I am always looking for ways to improve the user experience.