[Workers AI] Batch-api #21413

Merged 49 commits on Apr 10, 2025
9c43fe2
CLI
daisyfaithauma Mar 31, 2025
3b60888
Initial documentation
daisyfaithauma Apr 4, 2025
2e5f403
Update src/content/docs/workers-ai/get-started/workers-wrangler.mdx
kodster28 Apr 4, 2025
5cebaeb
removed the why
daisyfaithauma Apr 7, 2025
3218dc0
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 7, 2025
27a9944
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 7, 2025
4d6d683
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 7, 2025
4377c68
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 7, 2025
8435e21
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 7, 2025
1e9e606
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 7, 2025
2911331
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 7, 2025
c335075
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 7, 2025
95838b8
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 7, 2025
508fde3
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 7, 2025
837b314
supported models
daisyfaithauma Apr 8, 2025
dc23ec7
rest API
daisyfaithauma Apr 8, 2025
0b01f90
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 8, 2025
fe3c647
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 9, 2025
bcb0d0f
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 9, 2025
ac4e9a3
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 9, 2025
cb9c4de
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 9, 2025
578b436
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 9, 2025
870d2fa
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 9, 2025
c1ff1e2
minor fixes
daisyfaithauma Apr 9, 2025
ac0ffe2
curl fix
daisyfaithauma Apr 9, 2025
44c162a
typescript fixes
daisyfaithauma Apr 10, 2025
cbb0802
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 10, 2025
707c17c
Update src/content/docs/workers-ai/features/async-batch-api.mdx
daisyfaithauma Apr 10, 2025
7712905
file restructure and added template
daisyfaithauma Apr 10, 2025
4b22ee5
deleted file
daisyfaithauma Apr 10, 2025
244d0ff
Update src/content/docs/workers-ai/features/batch-api/get-started.mdx
daisyfaithauma Apr 10, 2025
5ea7532
Update src/content/docs/workers-ai/features/batch-api/get-started.mdx
daisyfaithauma Apr 10, 2025
b85bae9
edits
daisyfaithauma Apr 10, 2025
3c1fc05
template link
daisyfaithauma Apr 10, 2025
c461b93
Update src/content/docs/workers-ai/features/batch-api/get-started.mdx
daisyfaithauma Apr 10, 2025
855ad6a
Update src/content/docs/workers-ai/features/batch-api/get-started.mdx
daisyfaithauma Apr 10, 2025
03088a8
Update src/content/docs/workers-ai/features/batch-api/get-started.mdx
daisyfaithauma Apr 10, 2025
7a9ed2f
Update src/content/docs/workers-ai/features/batch-api/get-started.mdx
daisyfaithauma Apr 10, 2025
b2b8ca9
Update src/content/docs/workers-ai/features/batch-api/get-started.mdx
daisyfaithauma Apr 10, 2025
1e190b6
Update src/content/docs/workers-ai/features/batch-api/get-started.mdx
daisyfaithauma Apr 10, 2025
741b59b
Update src/content/docs/workers-ai/features/batch-api/index.mdx
kodster28 Apr 10, 2025
1c8f8ce
Added beta badge
kodster28 Apr 10, 2025
f5ca4bc
Small updates
kodster28 Apr 10, 2025
50501e9
update
kodster28 Apr 10, 2025
02b681c
change title of code block
kodster28 Apr 10, 2025
7192946
Updated response
kodster28 Apr 10, 2025
eff4fbe
match order
kodster28 Apr 10, 2025
4bed317
Updates
kodster28 Apr 10, 2025
b6fe2d6
Remove unused components
kodster28 Apr 10, 2025
41 changes: 41 additions & 0 deletions src/content/docs/workers-ai/features/batch-api/index.mdx
@@ -0,0 +1,41 @@
---
pcx_content_type: configuration
title: Asynchronous Batch API
sidebar:
order: 1
group:
badge: Beta
---

import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";

Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.

Batch processing is useful for large workloads such as summarization or embeddings when there is no human interaction. Using the Batch API guarantees that your requests are fulfilled eventually, rather than erroring out if Cloudflare does not have enough capacity at a given time.

When you send a batch request, the API immediately acknowledges receipt with a status like `queued` and provides a unique `request_id`. This ID is later used to poll for the final responses once the processing is complete.

You can use the Batch API by creating and deploying a Cloudflare Worker that leverages the [Batch API with the AI binding](/workers-ai/features/batch-api/workers-binding/), by using the [REST API](/workers-ai/features/batch-api/rest-api/) directly, or by starting from a [template](https://github.com/craigsdennis/batch-please-workers-ai).

:::note[Note]

Ensure that the total payload is under 10 MB.

:::
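Since requests over this limit are rejected, it can help to measure the serialized payload before sending it. The following is a minimal sketch; the 10 MB figure comes from the note above, while the helper name is illustrative:

```typescript
// Illustrative helper: returns true if a request payload serializes to under 10 MB.
const MAX_BATCH_BYTES = 10 * 1024 * 1024;

export function fitsBatchLimit(payload: unknown): boolean {
	// TextEncoder gives the UTF-8 byte length, which is what goes over the wire.
	const bytes = new TextEncoder().encode(JSON.stringify(payload)).byteLength;
	return bytes < MAX_BATCH_BYTES;
}
```

If a batch is too large, split it into multiple smaller batch requests rather than dropping prompts.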

## Demo application

If you want to get started quickly, click the button below:

[![Deploy to Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/craigsdennis/batch-please-workers-ai)

This will create a repository in your GitHub account and deploy a ready-to-use Worker that demonstrates how to use Cloudflare's Asynchronous Batch API. The template includes preconfigured AI bindings and examples for sending and retrieving batch requests with and without external references. Once deployed, you can visit the live Worker and start experimenting with the Batch API immediately.

## Supported Models

- [@cf/meta/llama-3.3-70b-instruct-fp8-fast](/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/)
- [@cf/baai/bge-small-en-v1.5](/workers-ai/models/bge-small-en-v1.5/)
- [@cf/baai/bge-base-en-v1.5](/workers-ai/models/bge-base-en-v1.5/)
- [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/)
- [@cf/baai/bge-m3](/workers-ai/models/bge-m3/)
- [@cf/meta/m2m100-1.2b](/workers-ai/models/m2m100-1.2b/)
90 changes: 90 additions & 0 deletions src/content/docs/workers-ai/features/batch-api/rest-api.mdx
@@ -0,0 +1,90 @@
---
pcx_content_type: how-to
title: REST API
sidebar:
order: 4
---

If you prefer to work directly with the REST API instead of a [Cloudflare Worker](/workers-ai/features/batch-api/workers-binding/), follow the steps below:

## 1. Sending a Batch Request

Make a POST request using the following pattern. You can pass `external_reference` as a unique ID per-prompt that will be returned in the response.

```bash title="Sending a batch request" {11,15,19}
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3?queueRequest=true" \
--header "Authorization: Bearer $API_TOKEN" \
--header 'Content-Type: application/json' \
--json '{
"requests": [
{
"query": "This is a story about Cloudflare",
"contexts": [
{
"text": "This is a story about an orange cloud",
"external_reference": "story1"
},
{
"text": "This is a story about a llama",
"external_reference": "story2"
},
{
"text": "This is a story about a hugging emoji",
"external_reference": "story3"
}
]
}
]
}'
```

```json output {4}
{
"result": {
"status": "queued",
"request_id": "768f15b7-4fd6-4498-906e-ad94ffc7f8d2",
"model": "@cf/baai/bge-m3"
},
"success": true,
"errors": [],
"messages": []
}
```

## 2. Retrieving the Batch Response

After receiving a `request_id` from your initial POST, you can poll for the results with another POST request:

```bash title="Retrieving a response"
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3?queueRequest=true" \
--header "Authorization: Bearer $API_TOKEN" \
--header 'Content-Type: application/json' \
--json '{
"request_id": "<uuid>"
}'
```

```json output
{
"result": {
"responses": [
{
"id": 0,
"result": {
"response": [
{ "id": 0, "score": 0.73974609375 },
{ "id": 1, "score": 0.642578125 },
{ "id": 2, "score": 0.6220703125 }
]
},
"success": true,
"external_reference": null
}
],
"usage": { "prompt_tokens": 12, "completion_tokens": 0, "total_tokens": 12 }
},
"success": true,
"errors": [],
"messages": []
}
```
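While the batch is still processing, this endpoint returns a `queued` or `running` status instead of results, so a client typically polls until the body contains a `responses` array. A minimal sketch of that readiness check, assuming the in-progress and finished body shapes shown in the examples above (the helper name is illustrative):

```typescript
// Illustrative helper: decide from a poll response body whether results are ready.
// Assumes an in-progress poll returns { result: { status: "queued" | "running" } }
// and a finished poll returns { result: { responses: [...] } }.
interface PollBody {
	result?: { status?: string; responses?: unknown[] };
}

export function batchIsComplete(body: PollBody): boolean {
	return Array.isArray(body.result?.responses);
}
```

A polling loop would call the endpoint, pass the parsed JSON to this helper, and sleep between attempts until it returns `true`.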
118 changes: 118 additions & 0 deletions src/content/docs/workers-ai/features/batch-api/workers-binding.mdx
@@ -0,0 +1,118 @@
---
pcx_content_type: how-to
title: Workers Binding
sidebar:
order: 2
---

import {
Render,
PackageManagers,
TypeScriptExample,
WranglerConfig,
CURL,
} from "~/components";

You can use Workers Bindings to interact with the Batch API.

## Send a Batch request

Send your initial batch inference request by composing a JSON payload containing an array of individual inference requests and the `queueRequest: true` property (which controls queueing behavior).

:::note[Note]

Ensure that the total payload is under 10 MB.

:::

```ts {26} title="src/index.ts"
export interface Env {
AI: Ai;
}
export default {
async fetch(request, env): Promise<Response> {
const embeddings = await env.AI.run(
"@cf/baai/bge-m3",
{
requests: [
{
query: "This is a story about Cloudflare",
contexts: [
{
text: "This is a story about an orange cloud",
},
{
text: "This is a story about a llama",
},
{
text: "This is a story about a hugging emoji",
},
],
},
],
},
{ queueRequest: true },
);

return Response.json(embeddings);
},
} satisfies ExportedHandler<Env>;
```

```json output {4}
{
"status": "queued",
"model": "@cf/baai/bge-m3",
"request_id": "000-000-000"
}
```

You will get a response with the following values:

- **`status`**: Indicates that your request is queued.
- **`request_id`**: A unique identifier for the batch request.
- **`model`**: The model used for the batch inference.

Of these, the `request_id` is important for when you need to [poll the batch status](#poll-batch-status).

### Poll batch status

Once your batch request is queued, use the `request_id` to poll for its status. During processing, the API returns a status of `queued` or `running`, indicating that the request is still in the queue or being processed.

```ts {3} title="src/index.ts"
export interface Env {
AI: Ai;
}

export default {
async fetch(request, env): Promise<Response> {
const status = await env.AI.run("@cf/baai/bge-m3", {
request_id: "000-000-000",
});

return Response.json(status);
},
} satisfies ExportedHandler<Env>;
```

```json output
{
"responses": [
{
"id": 0,
"result": {
"response": [
{ "id": 0, "score": 0.73974609375 },
{ "id": 1, "score": 0.642578125 },
{ "id": 2, "score": 0.6220703125 }
]
},
"success": true,
"external_reference": null
}
],
"usage": { "prompt_tokens": 12, "completion_tokens": 0, "total_tokens": 12 }
}
```

When the inference is complete, the API returns a final HTTP status code of `200` along with an array of responses. Each response object corresponds to an individual input prompt, identified by an `id` that maps to the index of the prompt in your original request.
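Because each `id` is the index of the prompt in the original request, results can be joined back to their inputs once the batch completes. A sketch of that join, following the response shape in the example output above (the helper name is illustrative):

```typescript
interface BatchResponse {
	id: number;
	result: unknown;
	success: boolean;
	external_reference: string | null;
}

// Pair each response with the input it answers, using the index-based id.
export function joinResults<T>(inputs: T[], responses: BatchResponse[]) {
	return responses.map((r) => ({
		input: inputs[r.id],
		result: r.result,
		success: r.success,
		reference: r.external_reference,
	}));
}
```

If you supplied `external_reference` values when sending the batch, `reference` gives you a second, order-independent way to correlate results.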