Commit b8df335: Add devcontainer for running the sample

Parent: d44081f
File tree: 8 files changed, +184, -0 lines

.devcontainer/Dockerfile (+24 lines)

```dockerfile
# Use a base image that supports Python.
FROM mcr.microsoft.com/vscode/devcontainers/python:1-3.12-bullseye

# Install Python dependencies
COPY requirements.txt /tmp/pip-tmp/
RUN pip3 --disable-pip-version-check --no-cache-dir install -r /tmp/pip-tmp/requirements.txt \
    && rm -rf /tmp/pip-tmp

# Install additional tools and dependencies
RUN apt-get update \
    && apt-get upgrade -y \
    && export DEBIAN_FRONTEND=noninteractive \
    && apt-get -y install --no-install-recommends unzip jq poppler-utils

# Install yq
RUN wget -qO /usr/local/bin/yq "https://github.com/mikefarah/yq/releases/download/v4.25.1/yq_linux_amd64" \
    && chmod +x /usr/local/bin/yq

# Default to bash shell
ENV SHELL=/bin/bash \
    DOCKER_BUILDKIT=1

# Mount for docker-in-docker
VOLUME [ "/var/lib/docker" ]
```
.devcontainer/devcontainer.json (+55 lines)

```json
{
    "name": "Document Data Extraction Prompt Flow Evaluation",
    "build": {
        "dockerfile": "Dockerfile",
        "context": ".."
    },
    "features": {
        "ghcr.io/devcontainers/features/git:1": {
            "version": "latest",
            "ppa": "false"
        },
        "ghcr.io/devcontainers/features/azure-cli:1": {},
        "ghcr.io/azure/azure-dev/azd:0": {},
        "ghcr.io/devcontainers/features/git-lfs:1": {
            "version": "latest"
        },
        "ghcr.io/devcontainers/features/github-cli:1": {
            "version": "latest"
        },
        "ghcr.io/devcontainers/features/docker-in-docker:2": {
            "version": "latest"
        },
        "./local-features/setup": "latest"
    },
    "overrideFeatureInstallOrder": [
        "ghcr.io/devcontainers/features/git",
        "ghcr.io/devcontainers/features/azure-cli",
        "ghcr.io/azure/azure-dev/azd",
        "./local-features/setup",
        "ghcr.io/devcontainers/features/git-lfs",
        "ghcr.io/devcontainers/features/github-cli",
        "ghcr.io/devcontainers/features/docker-in-docker"
    ],
    "remoteUser": "vscode",
    "containerUser": "vscode",
    "forwardPorts": [],
    "otherPortsAttributes": {
        "onAutoForward": "ignore"
    },
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.vscode-pylance",
                "ms-python.python",
                "ms-python.debugpy",
                "ms-toolsai.jupyter",
                "tomoki1207.pdf",
                "ms-azuretools.vscode-bicep",
                "ms-vscode.vscode-node-azure-pack",
                "GitHub.vscode-pull-request-github",
                "prompt-flow.prompt-flow"
            ]
        }
    }
}
```
.devcontainer/local-features/setup (feature definition referenced from devcontainer.json; +11 lines)

```json
{
    "id": "local-setup",
    "name": "Setup for Local Environment",
    "installsAfter": [
        "ghcr.io/devcontainers/features/azure-cli"
    ],
    "install": {
        "app": "",
        "file": "install.sh"
    }
}
```
.devcontainer/local-features/setup/install.sh (+34 lines)

```bash
#!/usr/bin/env bash

USERNAME=${USERNAME:-"vscode"}

set -eux

if [ "$(id -u)" -ne 0 ]; then
    echo 'Script must be run as root. Use sudo, su, or add "USER root" to your Dockerfile before running this script.'
    exit 1
fi

export DEBIAN_FRONTEND=noninteractive

# Run a command as $USERNAME when executing as root; otherwise run it directly.
# (The else branch uses `bash -c` so a multi-word command string is parsed,
# rather than treated as a single command name.)
sudo_if() {
    COMMAND="$*"
    if [ "$(id -u)" -eq 0 ] && [ "$USERNAME" != "root" ]; then
        su - "$USERNAME" -c "$COMMAND"
    else
        bash -c "$COMMAND"
    fi
}

install_azcli_extension() {
    EXTENSION_NAME=$1

    sudo_if "az extension add -n $EXTENSION_NAME"
    sudo_if "az extension update -n $EXTENSION_NAME"
}

# Install the Azure CLI Machine Learning extension
install_azcli_extension ml

# Register the Bash kernel with Jupyter
sudo_if "python3 -m bash_kernel.install"
```

.gitignore (+1 line)

```diff
@@ -168,3 +168,4 @@ cython_debug/
 
 # Outputs
 *Outputs.json
+tests/**/*.jpg
```

.vscode/extensions.json (+13 lines)

```json
{
    "recommendations": [
        "ms-python.vscode-pylance",
        "ms-python.python",
        "ms-python.debugpy",
        "ms-toolsai.jupyter",
        "tomoki1207.pdf",
        "ms-azuretools.vscode-bicep",
        "ms-vscode.vscode-node-azure-pack",
        "GitHub.vscode-pull-request-github",
        "prompt-flow.prompt-flow"
    ]
}
```

README.md (+44 lines)

# Document Data Extraction with GPT-4o and Evaluation using Prompt Flow

This sample demonstrates [how to use GPT-4o](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4o-and-gpt-4-turbo) to extract structured JSON data from PDF documents and evaluate the extracted data using the [Prompt Flow](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/flow-bulk-test-evaluation) feature in Azure AI Studio.

The approach builds on the understanding that [Azure OpenAI GPT-4o is effective at analyzing document images and extracting structured JSON objects](https://github.com/Azure-Samples/azure-openai-gpt-4-vision-pdf-extraction-sample) from them, based on an extraction prompt that includes an expected output schema. Evaluating document data extraction with Prompt Flow in Azure AI Studio offers the following advantages:

- **Automated evaluation**: Custom Prompt Flow evaluations let you create an automated run that evaluates multiple test cases in parallel, providing a comprehensive report and analysis of all the results in one place.
- **Prompt engineering testing**: Much like traditional test cases for code, you can create various extraction prompt scenarios to evaluate changes in the prompt's performance, including variations in the schema, the GPT model parameters, and the rules for extracting data.
- **Simplicity**: Prompt Flow narrows the scope of data extraction evaluation to discrete tasks in your AI application's workflow, making it easier to evaluate and improve your extraction prompts in a controlled environment before integrating changes into your application.

The [Sample notebook](./Sample.ipynb) contains all the steps needed to deploy the infrastructure and run the sample in your Azure subscription, providing a dedicated learning environment for understanding how to use GPT-4o for document data extraction and how to evaluate the extracted data using Prompt Flow in Azure AI Studio.

> [!IMPORTANT]
> Running the evaluation prompt flow for each test case with GPT-4o accrues token-based charges, just as it would in application code. Images are tokenized by splitting each high-resolution image into separate 512 px tiles. For more information, see the [Azure OpenAI image token overview](https://learn.microsoft.com/en-us/azure/ai-services/openai/overview#image-tokens-gpt-4-turbo-with-vision).
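
To get a rough feel for those charges, an image's token count can be estimated from its tile count. The sketch below is an illustration only, assuming the tiling rules published in the linked overview for high-detail images (85 base tokens plus 170 tokens per 512 px tile, after the image is fitted within 2048 px and its shortest side is scaled to at most 768 px); `estimate_image_tokens` is a hypothetical helper, not part of the sample.

```python
import math

def estimate_image_tokens(width: int, height: int) -> int:
    """Estimate tokens for one high-detail image using the published
    GPT-4 Turbo with Vision tiling rules (assumed here to apply to GPT-4o)."""
    # Fit the image within a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Scale so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # Count the 512 px tiles needed to cover the scaled image.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles

# Example: an A4 page rendered at 200 DPI (1654 x 2339 px)
# scales down to roughly 768 x 1086 px, covering 2 x 3 tiles.
print(estimate_image_tokens(1654, 2339))  # 85 + 170 * 6 = 1105
```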
## Getting Started

### Prerequisites

The sample repository comes with a [**Dev Container**](https://code.visualstudio.com/docs/remote/containers) that contains all the necessary tools and dependencies to run the sample. To use the Dev Container, you need the following installed on your local machine:

- [**Visual Studio Code**](https://code.visualstudio.com/download)
- [**Docker Desktop**](https://www.docker.com/products/docker-desktop)
- The [**Remote - Containers**](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension for Visual Studio Code

### Run the sample notebook

Before running the notebook, open the project in Visual Studio Code and start the Dev Container. This ensures that all the necessary dependencies are installed and the environment is ready to run the notebook.

Once the Dev Container is running, open the [**Sample.ipynb**](./Sample.ipynb) notebook and follow the instructions in it to run the sample.

> [!NOTE]
> The sample guides you through deploying the necessary infrastructure, deploying the Prompt Flows to Azure AI Studio, and finally running the evaluation for the document data extraction.

### Clean up resources

After you have finished running the sample, clean up the resources using the following steps:

1. Run the `az group delete` command to delete the resource group and all the resources within it.

   ```bash
   az group delete --name <resource-group-name> --yes --no-wait
   ```

`<resource-group-name>` is the name of the resource group, which can be found in the **resourceGroupInfo** JSON object in the [**EnvironmentOutputs.json**](./EnvironmentOutputs.json) file created after running the Sample notebook.
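
If you prefer to script the clean-up, the resource group name can be read from that file first. A minimal sketch, assuming **resourceGroupInfo** exposes the group name under a `name` key; the exact shape of the file depends on the deployment outputs written by the notebook.

```python
import json

# EnvironmentOutputs.json is written by the Sample notebook.
with open("EnvironmentOutputs.json") as f:
    outputs = json.load(f)

# "resourceGroupInfo" comes from the README above; the nested "name"
# key is an assumption about the shape of the deployment outputs.
resource_group = outputs["resourceGroupInfo"]["name"]
print(f"az group delete --name {resource_group} --yes --no-wait")
```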

requirements.txt (+2 lines)

```diff
@@ -1,7 +1,9 @@
 azure-ai-resources==1.0.0.b8
 azure-identity==1.17.1
+bash_kernel==0.9.3
 ipykernel==6.29.4
 notebook==7.2.1
+pdf2image==1.17.0
 promptflow==1.13.0
 promptflow-tools==1.4.0
 python-dotenv==1.0.1
```
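
Of the two new dependencies, `bash_kernel` backs the Jupyter kernel registered by the feature's install script above, and `pdf2image` (together with the `poppler-utils` package installed in the Dockerfile) converts PDF pages into images that can be sent to GPT-4o. A minimal sketch of that conversion step, with hypothetical file paths:

```python
from pdf2image import convert_from_path

# Render each page of the PDF as a PIL image.
# Requires poppler, provided by poppler-utils in the dev container.
pages = convert_from_path("tests/invoice.pdf", dpi=200)  # hypothetical input

for i, page in enumerate(pages):
    # Saved .jpg files match the tests/**/*.jpg pattern added to .gitignore.
    page.save(f"tests/invoice_page_{i}.jpg", "JPEG")
```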
