Python program to transform Shopify bulk output object set into a format suitable to load into a Bloomreach Discovery Catalog.
Example Usage:
python3 src/main.py \
--shopify-url="stg-store.myshopify.com" \
--shopify-pat="shpat_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
--br-environment="staging" \
--br-account-id="6490" \
--br-catalog-name="test_us_feed" \
--br-api-token="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
--output-dir="/Users/user/output"
The program will:
- Submit a Bulk Operation job via GraphQL to the Shopify store using an access token (PAT) that has sufficient privileges
- If there is a current Bulk Operation job already running, the script will continue to retry until it can successfully submit a job
- Poll for the completion of the Bulk Operation job to retrieve the URL of the jsonl file that contains a dump of all product, variant, collection, and metafield data needed
- Transform that file into an additional file that aggregates in memory the individual product, variant, collection, and metafield data into a single product model
- Transform that single product model into an additional generic Bloomreach product model
- Transform the generic model into an additional specific business logic model
- Transform the final product model into an additional Bloomreach patch file that can be used as a patch for a full feed
- Submit the patch file as a full feed via the Discovery Feed API
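The polling step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses a plain loop rather than the polling library listed in the requirements, and `fetch_status` is a hypothetical callable standing in for whatever GraphQL call returns the current bulk operation's `status` and `url`.

```python
import time
from typing import Callable, Dict, Optional

# Bulk operation states after which no output file will be produced.
TERMINAL_FAILURES = {"FAILED", "CANCELED", "EXPIRED"}


def wait_for_bulk_operation(
    fetch_status: Callable[[], Dict[str, Optional[str]]],
    interval_seconds: float = 5.0,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the bulk operation completes, then return the JSONL URL.

    fetch_status is a hypothetical callable returning a dict with
    "status" and "url" keys for the current bulk operation.
    """
    for _ in range(max_polls):
        operation = fetch_status()
        status = operation.get("status")
        if status == "COMPLETED":
            url = operation.get("url")
            if not url:
                # A missing URL typically means the query matched no objects.
                raise RuntimeError("Bulk operation completed with no output file")
            return url
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Bulk operation ended with status {status}")
        # Any other state (for example CREATED or RUNNING): wait and ask again.
        sleep(interval_seconds)
    raise TimeoutError("Bulk operation did not finish in time")
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.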
Additional details about the transform phases
- Transforms Shopify bulk output of products and their associated objects (metafields, collections, variants, variant metafields) into a single aggregated product record.
- Transforms Shopify aggregated products into the Bloomreach product model with no reserved attribute mappings, apart from setting product and variant identifiers. The product and variant identifiers may be specified prior to running; they default to handle for the product identifier and sku for the variant identifier. All other Shopify properties are prefixed with a namespace to prevent collisions with any Bloomreach reserved attributes. Product properties are prefixed with "sp.", product metafield properties with "spm.", variant properties with "sv.", and variant metafield properties with "svm.". This output may be loaded directly into a Bloomreach Discovery catalog as is.
- Transforms generic products with custom logic specific to an individual catalog. This is essentially a placeholder script for any transformations that need to be made on top of the generic product transforms. For instance, if Shopify product tags are used in a special way, custom transforms can be created. Generic transforms can also be overridden when catalog-specific behavior is needed. The values of the Shopify-prefixed attributes should not be modified.
- Transforms Bloomreach products into a Bloomreach Discovery catalog patch, where each patch operation is an Add Product operation. This patch can be used as a Full or Delta feed data source, either directly in an API request or via SFTP.
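The phases above can be compressed into one small sketch. It is an illustration under several assumptions, not the project's actual scripts:

- every bulk output line carries an `id` GID and child lines carry `__parentId`
- parents appear before their children
- metafield attributes are named from the metafield `key` alone
- an Add Product operation has the shape `{"op": "add", "path": "/products/<id>", "value": {...}}`

The function names are hypothetical, and collections and the catalog-specific phase are skipped for brevity.

```python
import json
from typing import Dict, Iterable, List

STRUCTURAL_KEYS = ("variants", "metafields")


def aggregate(lines: Iterable[str]) -> List[dict]:
    """Fold flat bulk-output lines into one nested record per product."""
    by_id: Dict[str, dict] = {}
    products: List[dict] = []
    for line in lines:
        if not line.strip():
            continue
        obj = json.loads(line)
        parent_id = obj.pop("__parentId", None)
        obj_id = obj.get("id", "")
        if parent_id is None:
            # Assumption: top-level lines are products.
            obj.update(variants=[], metafields=[])
            products.append(obj)
        elif "/ProductVariant/" in obj_id:
            obj["metafields"] = []
            by_id[parent_id]["variants"].append(obj)
        elif "/Metafield/" in obj_id:
            by_id[parent_id]["metafields"].append(obj)
        else:
            continue  # collections and other objects are skipped in this sketch
        by_id[obj_id] = obj
    return products


def prefixed(obj: dict, prop_prefix: str, meta_prefix: str) -> dict:
    """Namespace properties and metafields to avoid reserved-attribute clashes."""
    attrs = {prop_prefix + k: v for k, v in obj.items() if k not in STRUCTURAL_KEYS}
    for metafield in obj["metafields"]:
        attrs[meta_prefix + metafield["key"]] = metafield["value"]
    return attrs


def to_patch_op(product: dict, product_id_key: str = "handle",
                variant_id_key: str = "sku") -> dict:
    """Build one Add Product patch operation (shape is an assumption)."""
    variants = {
        variant[variant_id_key]: {"attributes": prefixed(variant, "sv.", "svm.")}
        for variant in product["variants"]
    }
    return {
        "op": "add",
        "path": "/products/" + product[product_id_key],
        "value": {"attributes": prefixed(product, "sp.", "spm."), "variants": variants},
    }
```

Writing `json.dumps(to_patch_op(p))` for each aggregated product, one per line, would produce a patch file in JSONL form.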
You will need an access token from Shopify.
To get the access token, set up a custom app with the read_products scope.
See the Shopify documentation: "Create and install a custom app" and "Generate access tokens for custom apps in the Shopify admin".
Exact steps:
- Go to store admin interface
- Click Settings in bottom of left nav
- Click Apps and sales channels in the left nav
- Click Develop apps
- Click Create an app
- Give it a name like Bloomreach Test
- Click Configure Admin API scopes
- Select read_products as a scope
- Click Save
- Click Install app and confirm
- Click Reveal token once and use the value as the access token for this reference code.
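Once you have the token, a request to the Admin GraphQL API might be built as in the sketch below. It uses only the standard library rather than the ShopifyAPI package listed in the requirements. The function name and the default API version `2024-01` are assumptions for illustration, so substitute a version your store supports.

```python
import json
import urllib.request


def build_graphql_request(shop_url: str, access_token: str, query: str,
                          api_version: str = "2024-01") -> urllib.request.Request:
    """Build (but do not send) an authenticated Admin GraphQL request."""
    # api_version is an assumed placeholder; use one supported by your store.
    url = f"https://{shop_url}/admin/api/{api_version}/graphql.json"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The custom app token is passed in this header.
            "X-Shopify-Access-Token": access_token,
        },
    )


# A small query that only needs the read_products scope.
SMOKE_TEST_QUERY = "{ products(first: 1) { edges { node { id handle } } } }"

# To actually send it:
# request = build_graphql_request("stg-store.myshopify.com", token, SMOKE_TEST_QUERY)
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

If the response contains an `errors` entry that mentions access, the app is probably missing the read_products scope.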
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt
- Python 3 (3.8 or later)
- jsonlines
- ShopifyAPI
- polling
To run tests and work with jsonl files:
There is a template_env file in the root dir that contains the required environment variables.
Copy this file, rename it to something ending in .env, and fill in the empty variable values.
This file can be combined with the run_feed.sh bash script, which runs the app with the given environment variables.
Don't check in your environment file, as it contains sensitive keys.
If it has a suffix of .env, it will be ignored via the .gitignore file.
BASH_ENV=staging.env ./run_feed.sh
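Because an unfilled variable in the environment file is easy to miss, a small pre-flight check can help. The sketch below is a hypothetical helper, not part of the project. Apart from BR_OUTPUT_DIR, which appears in the Docker example, the variable names are placeholders, so take the real list from template_env.

```python
import os
from typing import Iterable, List, Mapping


def missing_variables(required: Iterable[str],
                      env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


# Placeholder names for illustration; the real ones are listed in template_env.
REQUIRED = ["BR_OUTPUT_DIR", "EXAMPLE_SHOPIFY_URL", "EXAMPLE_API_TOKEN"]

# Example use before starting a feed run:
# missing = missing_variables(REQUIRED)
# if missing:
#     raise SystemExit("Missing environment variables: " + ", ".join(missing))
```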
# build image and give it a tag
docker build -t shopify-to-bloomreach .
# source in environment variables
. staging.env
# run image with a mounted volume (pass --rm if you want to auto cleanup)
docker run --env-file docker.env.list --env BR_OUTPUT_DIR=/feed_data --mount source=feed_data,target=/feed_data shopify-to-bloomreach
# view pretty formatted patch.jsonl
jq -C ' . ' patch.jsonl | less -R
# flatten out patch for easy grepping and analysis (memory intensive)
jq -s ' . ' patch.jsonl | gron | sort > patch.gron.sorted.txt
# vimdiff is handy for a small, non-semantic difference check
vimdiff <(cat patch.expected.selected.jsonl | jq -s . | gron | sort) <(cat patch.selected.jsonl | jq -s . | gron | sort)
The commands below assume you've already created environment files for each of the 4 environments based on the template_env file.
BASH_ENV=test_staging.env ./run_feed.sh
BASH_ENV=test_production.env ./run_feed.sh
WARNING, OPERATES ON REAL ACCOUNT
BASH_ENV=staging.env ./run_feed.sh
WARNING, OPERATES ON REAL ACCOUNT
BASH_ENV=production.env ./run_feed.sh