Skip to content

batch api doesn't work as expected. #1645

@Jeffwan

Description

@Jeffwan

🐛 Describe the bug

I feel it does work as expected.. I can not find jobs, the model name is actually wrong but executor seems process the inputs.

metadata server logs

INFO:     10.1.0.1:63376 - "GET /healthz HTTP/1.1" 200 OK
INFO:     10.1.0.1:63364 - "GET /readyz HTTP/1.1" 200 OK
2025-10-10 22:19:45,008 - batch.py:232 - create_batch - INFO - {"input_file_id": "102983c4-92ef-4de9-a03b-8e05066b16fd", "endpoint": "/v1/chat/completions", "completion_window": 86400, "session_id": "5b042868-88e6-44bb-9677-765d2a1c0fe5", "event": "Creating batch", "logger": "aibrix.metadata.api.v1.batch", "level": "info", "timestamp": "2025-10-10 22:19:45 UTC"}
2025-10-10 22:19:45,009 - job_manager.py:549 - job_committed_handler - DEBUG - {"category": "_pending_jobs", "event": "Job added to a category", "logger": "aibrix.batch.job_manager", "level": "debug", "timestamp": "2025-10-10 22:19:45 UTC"}
2025-10-10 22:19:45,010 - job_manager.py:557 - job_committed_handler - INFO - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "event": "Add job to scheduler", "logger": "aibrix.batch.job_manager", "level": "info", "timestamp": "2025-10-10 22:19:45 UTC"}
2025-10-10 22:19:45,010 - batch.py:254 - create_batch - INFO - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "session_id": "5b042868-88e6-44bb-9677-765d2a1c0fe5", "event": "Batch created successfully", "logger": "aibrix.metadata.api.v1.batch", "level": "info", "timestamp": "2025-10-10 22:19:45 UTC"}
INFO:     127.0.0.1:52722 - "POST /v1/batches HTTP/1.1" 200 OK
2025-10-10 22:19:45,045 - scheduler.py:188 - schedule_next_job - INFO - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "event": "Job scheduler is scheduling job", "logger": "aibrix.batch.scheduler", "level": "info", "timestamp": "2025-10-10 22:19:45 UTC"}
2025-10-10 22:19:45,046 - job_manager.py:777 - start_execute_job - DEBUG - {"old_category": "_pending_jobs", "new_category": "_in_progress_jobs", "event": "Job moved to a new category", "logger": "aibrix.batch.job_manager", "level": "debug", "timestamp": "2025-10-10 22:19:45 UTC"}
2025-10-10 22:19:45,049 - job_driver.py:104 - execute_job - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "event": "Temp files not created, creating...", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:45 UTC"}
2025-10-10 22:19:45,054 - job_driver.py:107 - execute_job - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "temp_output_file_id": "bdb4fc4d-a367-4d02-b7cf-cd3f9f536d8d", "temp_error_file_id": "f34f37e1-37b2-4365-b1b4-83a6f0483d6d", "event": "Confirmed temp files", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:45 UTC"}
2025-10-10 22:19:45,055 - job_driver.py:161 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "opts": null, "event": "Start processing job", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:45 UTC"}
2025-10-10 22:19:45,059 - adapter.py:117 - read_job_next_input_data - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line_no": 0, "requset": {"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain quantum computing in simple terms."}], "max_tokens": 1000}, "_request_index": 0}, "event": "Locked and will processing request in the job", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:45 UTC"}
2025-10-10 22:19:45,059 - job_driver.py:231 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line": 0, "request_id": 0, "custom_id": "request-1", "event": "Executing job request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:45 UTC"}
2025-10-10 22:19:46,060 - job_driver.py:249 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 0, "response": {"id": "c5388", "error": null, "response": {"status_code": 200, "request_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b-0", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain quantum computing in simple terms."}], "max_tokens": 1000}}, "custom_id": "request-1"}, "event": "Got request response", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:46 UTC"}
2025-10-10 22:19:46,063 - adapter.py:211 - write_job_output_data - DEBUG - {"event": "Stored result for job af7a30e8-7f65-4d60-b4be-d3773c5c431b request 0, status: output:38b55bb1c391362a8ce09d621f734c5f", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:46 UTC"}
2025-10-10 22:19:46,063 - job_driver.py:259 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 0, "event": "Job request executed", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:46 UTC"}
2025-10-10 22:19:46,063 - job_driver.py:278 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "next_unexecuted": 1, "last_line_no": 0, "event": "Confirmed next request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:46 UTC"}
2025-10-10 22:19:46,064 - adapter.py:117 - read_job_next_input_data - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line_no": 1, "requset": {"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a creative writing assistant."}, {"role": "user", "content": "Write a short story about a robot discovering emotions."}], "max_tokens": 1000}, "_request_index": 1}, "event": "Locked and will processing request in the job", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:46 UTC"}
2025-10-10 22:19:46,064 - job_driver.py:231 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line": 1, "request_id": 1, "custom_id": "request-2", "event": "Executing job request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:46 UTC"}
2025-10-10 22:19:47,064 - job_driver.py:249 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 1, "response": {"id": "78cad", "error": null, "response": {"status_code": 200, "request_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b-1", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a creative writing assistant."}, {"role": "user", "content": "Write a short story about a robot discovering emotions."}], "max_tokens": 1000}}, "custom_id": "request-2"}, "event": "Got request response", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:47 UTC"}
2025-10-10 22:19:47,067 - adapter.py:211 - write_job_output_data - DEBUG - {"event": "Stored result for job af7a30e8-7f65-4d60-b4be-d3773c5c431b request 1, status: output:2b0419ab8dd65dbfa36207e6e591c5a6", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:47 UTC"}
2025-10-10 22:19:47,067 - job_driver.py:259 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 1, "event": "Job request executed", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:47 UTC"}
2025-10-10 22:19:47,067 - job_driver.py:278 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "next_unexecuted": 2, "last_line_no": 1, "event": "Confirmed next request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:47 UTC"}
2025-10-10 22:19:47,069 - adapter.py:117 - read_job_next_input_data - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line_no": 2, "requset": {"custom_id": "request-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a code reviewer."}, {"role": "user", "content": "Review this Python function: def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)"}], "max_tokens": 1000}, "_request_index": 2}, "event": "Locked and will processing request in the job", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:47 UTC"}
2025-10-10 22:19:47,069 - job_driver.py:231 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line": 2, "request_id": 2, "custom_id": "request-3", "event": "Executing job request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:47 UTC"}
2025-10-10 22:19:48,070 - job_driver.py:249 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 2, "response": {"id": "64276", "error": null, "response": {"status_code": 200, "request_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b-2", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a code reviewer."}, {"role": "user", "content": "Review this Python function: def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)"}], "max_tokens": 1000}}, "custom_id": "request-3"}, "event": "Got request response", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:48 UTC"}
2025-10-10 22:19:48,077 - adapter.py:211 - write_job_output_data - DEBUG - {"event": "Stored result for job af7a30e8-7f65-4d60-b4be-d3773c5c431b request 2, status: output:33d0f8a4c027f48e2adfde83125ad063", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:48 UTC"}
2025-10-10 22:19:48,078 - job_driver.py:259 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 2, "event": "Job request executed", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:48 UTC"}
2025-10-10 22:19:48,078 - job_driver.py:278 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "next_unexecuted": 3, "last_line_no": 2, "event": "Confirmed next request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:48 UTC"}
2025-10-10 22:19:48,081 - adapter.py:117 - read_job_next_input_data - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line_no": 3, "requset": {"custom_id": "request-4", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a cooking instructor."}, {"role": "user", "content": "How do I make perfect scrambled eggs?"}], "max_tokens": 1000}, "_request_index": 3}, "event": "Locked and will processing request in the job", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:48 UTC"}
2025-10-10 22:19:48,081 - job_driver.py:231 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line": 3, "request_id": 3, "custom_id": "request-4", "event": "Executing job request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:48 UTC"}
2025-10-10 22:19:49,082 - job_driver.py:249 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 3, "response": {"id": "9c63f", "error": null, "response": {"status_code": 200, "request_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b-3", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a cooking instructor."}, {"role": "user", "content": "How do I make perfect scrambled eggs?"}], "max_tokens": 1000}}, "custom_id": "request-4"}, "event": "Got request response", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:49 UTC"}
2025-10-10 22:19:49,084 - adapter.py:211 - write_job_output_data - DEBUG - {"event": "Stored result for job af7a30e8-7f65-4d60-b4be-d3773c5c431b request 3, status: output:851d461e0ba7a2bc8d19304b16a3e4dd", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:49 UTC"}
2025-10-10 22:19:49,085 - job_driver.py:259 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 3, "event": "Job request executed", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:49 UTC"}
2025-10-10 22:19:49,085 - job_driver.py:278 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "next_unexecuted": 4, "last_line_no": 3, "event": "Confirmed next request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:49 UTC"}
2025-10-10 22:19:49,085 - adapter.py:117 - read_job_next_input_data - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line_no": 4, "requset": {"custom_id": "request-5", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a travel advisor."}, {"role": "user", "content": "What are the top 5 must-see attractions in Tokyo for first-time visitors?"}], "max_tokens": 1000}, "_request_index": 4}, "event": "Locked and will processing request in the job", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:49 UTC"}
2025-10-10 22:19:49,086 - job_driver.py:231 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line": 4, "request_id": 4, "custom_id": "request-5", "event": "Executing job request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:49 UTC"}
INFO:     10.1.0.1:63462 - "GET /readyz HTTP/1.1" 200 OK
2025-10-10 22:19:50,086 - job_driver.py:249 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 4, "response": {"id": "b88e2", "error": null, "response": {"status_code": 200, "request_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b-4", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a travel advisor."}, {"role": "user", "content": "What are the top 5 must-see attractions in Tokyo for first-time visitors?"}], "max_tokens": 1000}}, "custom_id": "request-5"}, "event": "Got request response", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:50 UTC"}
2025-10-10 22:19:50,089 - adapter.py:211 - write_job_output_data - DEBUG - {"event": "Stored result for job af7a30e8-7f65-4d60-b4be-d3773c5c431b request 4, status: output:df08e4270092a29d3e1d271fcc9a2c3c", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:50 UTC"}
2025-10-10 22:19:50,089 - job_driver.py:259 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 4, "event": "Job request executed", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:50 UTC"}
2025-10-10 22:19:50,089 - job_driver.py:278 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "next_unexecuted": 5, "last_line_no": 4, "event": "Confirmed next request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:50 UTC"}
2025-10-10 22:19:50,090 - adapter.py:117 - read_job_next_input_data - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line_no": 5, "requset": {"custom_id": "request-6", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a fitness coach."}, {"role": "user", "content": "Design a 30-minute beginner workout routine that requires no equipment."}], "max_tokens": 1000}, "_request_index": 5}, "event": "Locked and will processing request in the job", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:50 UTC"}
2025-10-10 22:19:50,090 - job_driver.py:231 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line": 5, "request_id": 5, "custom_id": "request-6", "event": "Executing job request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:50 UTC"}
2025-10-10 22:19:51,089 - job_driver.py:249 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 5, "response": {"id": "e40fc", "error": null, "response": {"status_code": 200, "request_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b-5", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a fitness coach."}, {"role": "user", "content": "Design a 30-minute beginner workout routine that requires no equipment."}], "max_tokens": 1000}}, "custom_id": "request-6"}, "event": "Got request response", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:51 UTC"}
2025-10-10 22:19:51,092 - adapter.py:211 - write_job_output_data - DEBUG - {"event": "Stored result for job af7a30e8-7f65-4d60-b4be-d3773c5c431b request 5, status: output:5fbb0219c7f4c93a8afa22d015ee667b", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:51 UTC"}
2025-10-10 22:19:51,092 - job_driver.py:259 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 5, "event": "Job request executed", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:51 UTC"}
2025-10-10 22:19:51,093 - job_driver.py:278 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "next_unexecuted": 6, "last_line_no": 5, "event": "Confirmed next request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:51 UTC"}
2025-10-10 22:19:51,094 - adapter.py:117 - read_job_next_input_data - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line_no": 6, "requset": {"custom_id": "request-7", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a history teacher."}, {"role": "user", "content": "Explain the causes and consequences of the Industrial Revolution."}], "max_tokens": 1000}, "_request_index": 6}, "event": "Locked and will processing request in the job", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:51 UTC"}
2025-10-10 22:19:51,094 - job_driver.py:231 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line": 6, "request_id": 6, "custom_id": "request-7", "event": "Executing job request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:51 UTC"}
2025-10-10 22:19:52,094 - job_driver.py:249 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 6, "response": {"id": "0e4c9", "error": null, "response": {"status_code": 200, "request_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b-6", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a history teacher."}, {"role": "user", "content": "Explain the causes and consequences of the Industrial Revolution."}], "max_tokens": 1000}}, "custom_id": "request-7"}, "event": "Got request response", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:52 UTC"}
2025-10-10 22:19:52,097 - adapter.py:211 - write_job_output_data - DEBUG - {"event": "Stored result for job af7a30e8-7f65-4d60-b4be-d3773c5c431b request 6, status: output:0da7d46aff987e2f04cae0f284b31123", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:52 UTC"}
2025-10-10 22:19:52,097 - job_driver.py:259 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 6, "event": "Job request executed", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:52 UTC"}
2025-10-10 22:19:52,097 - job_driver.py:278 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "next_unexecuted": 7, "last_line_no": 6, "event": "Confirmed next request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:52 UTC"}
2025-10-10 22:19:52,098 - adapter.py:117 - read_job_next_input_data - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line_no": 7, "requset": {"custom_id": "request-8", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a language tutor."}, {"role": "user", "content": "Teach me the most important Spanish phrases for ordering food at a restaurant."}], "max_tokens": 1000}, "_request_index": 7}, "event": "Locked and will processing request in the job", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:52 UTC"}
2025-10-10 22:19:52,098 - job_driver.py:231 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line": 7, "request_id": 7, "custom_id": "request-8", "event": "Executing job request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:52 UTC"}
2025-10-10 22:19:53,099 - job_driver.py:249 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 7, "response": {"id": "6c415", "error": null, "response": {"status_code": 200, "request_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b-7", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a language tutor."}, {"role": "user", "content": "Teach me the most important Spanish phrases for ordering food at a restaurant."}], "max_tokens": 1000}}, "custom_id": "request-8"}, "event": "Got request response", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:53 UTC"}
2025-10-10 22:19:53,102 - adapter.py:211 - write_job_output_data - DEBUG - {"event": "Stored result for job af7a30e8-7f65-4d60-b4be-d3773c5c431b request 7, status: output:b0f30a6c8006398e5d8b251ccee8f275", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:53 UTC"}
2025-10-10 22:19:53,102 - job_driver.py:259 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 7, "event": "Job request executed", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:53 UTC"}
2025-10-10 22:19:53,102 - job_driver.py:278 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "next_unexecuted": 8, "last_line_no": 7, "event": "Confirmed next request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:53 UTC"}
2025-10-10 22:19:53,103 - adapter.py:117 - read_job_next_input_data - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line_no": 8, "requset": {"custom_id": "request-9", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a science explainer."}, {"role": "user", "content": "Why do leaves change color in autumn? Explain the biological process."}], "max_tokens": 1000}, "_request_index": 8}, "event": "Locked and will processing request in the job", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:53 UTC"}
2025-10-10 22:19:53,103 - job_driver.py:231 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line": 8, "request_id": 8, "custom_id": "request-9", "event": "Executing job request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:53 UTC"}
2025-10-10 22:19:54,104 - job_driver.py:249 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 8, "response": {"id": "cfd29", "error": null, "response": {"status_code": 200, "request_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b-8", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a science explainer."}, {"role": "user", "content": "Why do leaves change color in autumn? Explain the biological process."}], "max_tokens": 1000}}, "custom_id": "request-9"}, "event": "Got request response", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:54 UTC"}
2025-10-10 22:19:54,107 - adapter.py:211 - write_job_output_data - DEBUG - {"event": "Stored result for job af7a30e8-7f65-4d60-b4be-d3773c5c431b request 8, status: output:3e005aa24a747bf672bc46800e366d49", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:54 UTC"}
2025-10-10 22:19:54,107 - job_driver.py:259 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 8, "event": "Job request executed", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:54 UTC"}
2025-10-10 22:19:54,107 - job_driver.py:278 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "next_unexecuted": 9, "last_line_no": 8, "event": "Confirmed next request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:54 UTC"}
2025-10-10 22:19:54,107 - adapter.py:117 - read_job_next_input_data - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line_no": 9, "requset": {"custom_id": "request-10", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a financial advisor."}, {"role": "user", "content": "What are the basic principles of investing for a complete beginner?"}], "max_tokens": 1000}, "_request_index": 9}, "event": "Locked and will processing request in the job", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:54 UTC"}
2025-10-10 22:19:54,108 - job_driver.py:231 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "line": 9, "request_id": 9, "custom_id": "request-10", "event": "Executing job request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:54 UTC"}
INFO:     10.1.0.1:63472 - "GET /healthz HTTP/1.1" 200 OK
INFO:     10.1.0.1:63468 - "GET /readyz HTTP/1.1" 200 OK
2025-10-10 22:19:55,109 - job_driver.py:249 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 9, "response": {"id": "06442", "error": null, "response": {"status_code": 200, "request_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b-9", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a financial advisor."}, {"role": "user", "content": "What are the basic principles of investing for a complete beginner?"}], "max_tokens": 1000}}, "custom_id": "request-10"}, "event": "Got request response", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:55 UTC"}
2025-10-10 22:19:55,111 - adapter.py:211 - write_job_output_data - DEBUG - {"event": "Stored result for job af7a30e8-7f65-4d60-b4be-d3773c5c431b request 9, status: output:6117aa0bb5c3f60eebe1ef62dbda68fa", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:55 UTC"}
2025-10-10 22:19:55,112 - job_driver.py:259 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "request_id": 9, "event": "Job request executed", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:55 UTC"}
2025-10-10 22:19:55,112 - job_driver.py:278 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "next_unexecuted": 10, "last_line_no": 9, "event": "Confirmed next request", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:55 UTC"}
2025-10-10 22:19:55,112 - job_driver.py:295 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "total": 10, "next_unexecuted": -1, "event": "Confirmed total requests", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:55 UTC"}
2025-10-10 22:19:55,112 - job_driver.py:305 - execute_worker - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "total": 10, "state": "finalizing", "event": "Worker completed, job state:", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:55 UTC"}
2025-10-10 22:19:55,114 - adapter.py:229 - finalize_job_output_data - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "prefix": "batch:af7a30e8-7f65-4d60-b4be-d3773c5c431b:done/", "keys": ["batch:af7a30e8-7f65-4d60-b4be-d3773c5c431b:done/0", "batch:af7a30e8-7f65-4d60-b4be-d3773c5c431b:done/1", "batch:af7a30e8-7f65-4d60-b4be-d3773c5c431b:done/2", "batch:af7a30e8-7f65-4d60-b4be-d3773c5c431b:done/3", "batch:af7a30e8-7f65-4d60-b4be-d3773c5c431b:done/4", "batch:af7a30e8-7f65-4d60-b4be-d3773c5c431b:done/5", "batch:af7a30e8-7f65-4d60-b4be-d3773c5c431b:done/6", "batch:af7a30e8-7f65-4d60-b4be-d3773c5c431b:done/7", "batch:af7a30e8-7f65-4d60-b4be-d3773c5c431b:done/8", "batch:af7a30e8-7f65-4d60-b4be-d3773c5c431b:done/9"], "event": "Metastore keys found during job finalizing", "logger": "aibrix.batch.storage.adapter", "level": "debug", "timestamp": "2025-10-10 22:19:55 UTC"}
2025-10-10 22:19:55,114 - adapter.py:259 - finalize_job_output_data - INFO - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "launched": 10, "total": 10, "event": "Finalizing job output data using metastore keys", "logger": "aibrix.batch.storage.adapter", "level": "info", "timestamp": "2025-10-10 22:19:55 UTC"}
2025-10-10 22:19:55,141 - job_driver.py:129 - execute_job - DEBUG - {"job_id": "af7a30e8-7f65-4d60-b4be-d3773c5c431b", "event": "Completed job", "logger": "aibrix.batch.job_driver", "level": "debug", "timestamp": "2025-10-10 22:19:55 UTC"}
2025-10-10 22:19:55,143 - job_manager.py:1002 - apply_job_changes - DEBUG - {"event": "Job status synced to job entity manager", "logger": "aibrix.batch.job_manager", "level": "debug", "timestamp": "2025-10-10 22:19:55 UTC"}
2025-10-10 22:19:55,144 - job_manager.py:642 - job_updated_handler - DEBUG - {"old_state": "finalizing", "new_state": "finalized", "finalizing_needed": false, "event": "job_updated_handler passed state transition", "logger": "aibrix.batch.job_manager", "level": "debug", "timestamp": "2025-10-10 22:19:55 UTC"}
2025-10-10 22:19:55,144 - job_manager.py:659 - job_updated_handler - DEBUG - {"old_category": "_in_progress_jobs", "new_category": "_done_jobs", "event": "Job moved to a new category", "logger": "aibrix.batch.job_manager", "level": "debug", "timestamp": "2025-10-10 22:19:55 UTC"}

Steps to Reproduce

  1. create file
  2. create batch
    curl -X POST http://${ENDPOINT}/v1/batches \
      -H "Content-Type: application/json" \
      -d '{
        "input_file_id": "102983c4-92ef-4de9-a03b-8e05066b16fd",
        "endpoint": "/v1/chat/completions",
        "completion_window": "24h"
      }'

{"id":"af7a30e8-7f65-4d60-b4be-d3773c5c431b","object":"batch","endpoint":"/v1/chat/completions","errors":null,"input_file_id":"102983c4-92ef-4de9-a03b-8e05066b16fd","completion_window":"24h","status":"created","output_file_id":null,"error_file_id":null,"created_at":1760134785,"in_progress_at":null,"expires_at":1760221185,"finalizing_at":null,"completed_at":null,"failed_at":null,"expired_at":null,"cancelling_at":null,"cancelled_at":null,"request_counts":null,"metadata":null}%

Expected behavior

it should fail actually, since we do not have the model endpoint

Environment

nightly

Metadata

Metadata

Assignees

Labels

area/batchkind/bugSomething isn't workingpriority/critical-urgentHighest priority. Must be actively worked on as someone's top priority right now.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions