Commit 60e8730

Test WASM Translations in CI (#927)
* Consolidate .gitignore file to inference directory

  This commit consolidates all preexisting inference-related gitignore directives into the `.gitignore` file in the `inference` directory.

* Add eslint to project

  This commit adds an eslint config for the WASM tests. I think this overall adheres to Mozilla's code-format preferences and is good enough for a first pass. I have found the linting config to be a bit finicky, so my preference would be to improve the linting in a follow-up, if needed or desired at all.

* Add new lint tasks for eslint

  This commit adds new tasks to run the eslint linter on relevant JavaScript files in the project, and hooks the tasks up to a kind.yml file to run in CI.

* Rename bergamot-translator directive to bergamot-translator-source

  The name of the file that we use in the mozilla-unified source tree is `bergamot-translator.js`, but the name of the file generated here is `bergamot-translator-worker.js`. I wanted the names to match, so I am renaming the CMake directives that dictate the generated file's names such that the generated WASM code will be `bergamot-translator.js`. This is the first step of that process.

* Rename bergamot-translator-worker directive to bergamot-translator

  This is the second step of the previous commit, which renames the WASM-related directives to simply be `bergamot-translator`. This results in the generated JavaScript file being `bergamot-translator.js` instead of `bergamot-translator-worker.js`.

* Move thread-count default logic into build_wasm.py

  Building the WASM within the Docker container fails on multiple threads only if the host operating system is macOS, so I have moved that default logic into the script itself. The default can still be overridden by passing the `-j` flag, but rather than call sites having to know to do the "right" thing for macOS, I'm making it the default intrinsic behavior within the script.

* Prepare Bergamot module for mozilla-unified in build-bergamot.py

  This moves the logic that currently lives in the mozilla-unified tree, which adds the license header and wraps the generated WASM JavaScript module in a function. This will be paired with a downstream PR that removes this step on the mozilla-unified end.

* Add TypeScript bindings for Bergamot

  This commit adds some TypeScript bindings to the test directory that match the generated JS. I spent some time trying to get emscripten to generate these automatically, but I gave up on my time-boxed effort.

* Add support for `git-lfs` to base docker image

  This commit adds support for pulling files via `git-lfs` to the Dockerfile for the base docker image. In order to pull the files, we need to install `git-lfs` from apt, and also add github.com to the list of known ssh hosts.

* Add a subset of models for testing using `git-lfs`

  This commit adds the gzipped artifacts for

  * `enes`
  * `enfr`
  * `esen`
  * `fren`

  These are used for testing for the moment, but I view this as a temporary solution that is good enough for this PR. In the future, we will need to merge the `firefox-translations-models` repository here.

* Add test-wasm.py script

  This commit adds a Python script for testing the WASM, which runs the WASM build script (if needed), and then invokes the test runner.

* Extract test models from archives in test-wasm.py

  This commit modifies the new `test-wasm.py` script to extract the model artifacts from their gzipped files in the `models` test directory. The non-gzip artifacts are ignored in the .gitignore, as well as removed in the clean script.

* Copy WASM build artifacts to test directory in test-wasm.py

  This commit takes the WASM artifacts generated by the build script and copies them to a directory for use in tests.

* Produce hash of generated JS in test-wasm.py

  This commit computes a hash of the generated JavaScript, since the test runner adds it to the worker global scope using `eval`. This ensures that our test runner will only `eval` the intended script.

* Add Web Worker simulation infrastructure

  This commit adds a minimal API surface of the WorkerGlobalScope API functionality that we use for Translations within Firefox, wrapping the Node.js worker_threads equivalent behavior underneath. This allows us to test the generated code in a Node.js environment with the same API calls that we use in Firefox.

* Add Translations Engine and worker implementation

  This commit adds a simplified and minimal implementation of our Translations Engine from the mozilla-unified source tree, which is capable of starting a web-worker translator between a given language pair and translating a single message at a time.

* Add test cases for current translations WASM bindings

  Adds test cases that test the current translation functionality end-to-end, including plain-text translations and HTML translations.
1 parent db60f54 commit 60e8730

43 files changed, 2796 additions and 110 deletions

.gitattributes

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+inference/**/*.gz filter=lfs diff=lfs merge=lfs -text

Taskfile.yml

Lines changed: 13 additions & 8 deletions
@@ -101,16 +101,19 @@ tasks:
 
   inference-test-wasm:
     desc: Run inference build-wasm JS tests.
-    deps:
-      - task: inference-build-wasm
-        vars:
-          # When the host system is macOS, the WASM build fails when
-          # building with multiple threads in the Docker container.
-          # If the host system is macOS, pass -j 1.
-          CLI_ARGS: '{{if eq (env "HOST_OS") "Darwin"}}-j 1{{end}}'
     cmds:
       - >-
-        cd inference/wasm/tests && npm install && npm run test
+        ./inference/scripts/test-wasm.py {{.CLI_ARGS}}
+
+  lint-eslint:
+    desc: Checks the styling of the JS code with eslint.
+    cmds:
+      - cd ./inference/wasm/tests && npm install && npm run lint
+
+  lint-eslint-fix:
+    desc: Fixes the styling of the JS code with eslint.
+    cmds:
+      - cd ./inference/wasm/tests && npm install && npm run lint:fix
 
   lint-black:
     desc: Checks the styling of the Python code with Black.
@@ -141,12 +144,14 @@ tasks:
   lint-fix:
     desc: Fix all automatically fixable errors. This is useful to run before pushing.
     cmds:
+      - task: lint-eslint-fix
       - task: lint-black-fix
       - task: lint-ruff-fix
 
   lint:
     desc: Run all available linting tools.
     cmds:
+      - task: lint-eslint
       - task: lint-black
       - task: lint-ruff
 

inference/.gitignore

Lines changed: 11 additions & 8 deletions
@@ -15,15 +15,18 @@ compile_commands.json
 CTestTestfile.cmake
 _deps
 
+# Build paths
+build
+build-local
+build-native
+build-wasm
 
-/build
-/build-local
-/build-native
-/build-wasm
-models
-wasm/test_page/node_modules
-wasm/module/worker/bergamot-translator-worker.*
-wasm/module/browsermt-bergamot-translator-*.tgz
+# WASM
+wasm/tests/generated
+wasm/tests/models/**/*.bin
+wasm/tests/models/**/*.spm
+wasm/tests/node_modules
+wasm/tests/.vitest-reports
 
 # VSCode
 .vscode

inference/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -162,7 +162,7 @@ if(COMPILE_WASM)
 -sEXPORTED_FUNCTIONS=[_int8PrepareAFallback,_int8PrepareBFallback,_int8PrepareBFromTransposedFallback,_int8PrepareBFromQuantizedTransposedFallback,_int8PrepareBiasFallback,_int8MultiplyAndAddBiasFallback,_int8SelectColumnsOfBFallback]
 # Necessary for mozintgemm linking. This prepares the `wasmMemory` variable ahead of time as
 # opposed to delegating that task to the wasm binary itself. This way we can link MozIntGEMM
-# module to the same memory as the main bergamot-translator module.
+# module to the same memory as the main bergamot-translator-source module.
 -sIMPORTED_MEMORY=1
 # Dynamic execution is either frowned upon or blocked inside browser extensions
 -sDYNAMIC_EXECUTION=0

inference/scripts/build-wasm.py

Lines changed: 82 additions & 8 deletions
@@ -18,11 +18,12 @@
 MARIAN_PATH = os.path.join(THIRD_PARTY_PATH, "browsermt-marian-dev")
 EMSDK_PATH = os.path.join(THIRD_PARTY_PATH, "emsdk")
 EMSDK_ENV_PATH = os.path.join(EMSDK_PATH, "emsdk_env.sh")
-WASM_PATH = os.path.join(BUILD_PATH, "bergamot-translator-worker.wasm")
-JS_PATH = os.path.join(BUILD_PATH, "bergamot-translator-worker.js")
+WASM_ARTIFACT = os.path.join(BUILD_PATH, "bergamot-translator.wasm")
+JS_ARTIFACT = os.path.join(BUILD_PATH, "bergamot-translator.js")
 PATCHES_PATH = os.path.join(INFERENCE_PATH, "patches")
 BUILD_DIRECTORY = os.path.join(INFERENCE_PATH, "build-wasm")
-GEMM_SCRIPT = os.path.join(INFERENCE_PATH, "wasm", "patch-artifacts-import-gemm-module.sh")
+WASM_PATH = os.path.join(INFERENCE_PATH, "wasm")
+GEMM_SCRIPT = os.path.join(WASM_PATH, "patch-artifacts-import-gemm-module.sh")
 DETECT_DOCKER_SCRIPT = os.path.join(SCRIPTS_PATH, "detect-docker.sh")
 
 patches = [
@@ -95,6 +96,56 @@ def revert_git_patch(repo_path, patch_path):
     subprocess.check_call(["git", "apply", "-R", "--reject", patch_path], cwd=PROJECT_ROOT_PATH)
 
 
+def prepare_js_artifact():
+    """
+    Prepares the Bergamot JS artifact for use in Gecko by adding the proper license header
+    to the file, including helpful memory-growth logging, and wrapping the generated code
+    in a single function that both takes and returns the Bergamot WASM module.
+    """
+    # Start with the license header and function wrapper
+    source = (
+        "\n".join(
+            [
+                "/* This Source Code Form is subject to the terms of the Mozilla Public",
+                " * License, v. 2.0. If a copy of the MPL was not distributed with this",
+                " * file, You can obtain one at http://mozilla.org/MPL/2.0/. */",
+                "",
+                "function loadBergamot(Module) {",
+                "",
+            ]
+        )
+        + "\n"
+    )
+
+    # Read the original JS file and indent its content
+    with open(JS_ARTIFACT, "r", encoding="utf8") as file:
+        for line in file:
+            source += " " + line
+
+    # Close the function wrapper
+    source += "\n return Module;\n}"
+
+    # Use the Module's printing
+    source = source.replace("console.log(", "Module.print(")
+
+    # Add instrumentation to log memory size information
+    source = source.replace(
+        "function updateGlobalBufferAndViews(buf) {",
+        """
+          function updateGlobalBufferAndViews(buf) {
+            const mb = (buf.byteLength / 1_000_000).toFixed();
+            Module.print(
+              `Growing wasm buffer to ${mb}MB (${buf.byteLength} bytes).`
+            );
+        """,
+    )
+
+    print(f"\n📄 Updating {JS_ARTIFACT} in place")
+    # Write the modified content back to the original file
+    with open(JS_ARTIFACT, "w", encoding="utf8") as file:
+        file.write(source)
+
+
 def build_bergamot(args: Optional[list[str]]):
     if args.clobber and os.path.exists(BUILD_PATH):
         shutil.rmtree(BUILD_PATH)
@@ -127,7 +178,18 @@ def run_shell(command):
     print("\n🏃 Running CMake for Bergamot\n")
     run_shell(f"emcmake cmake -DCOMPILE_WASM=on -DWORMHOLE=off {flags} {INFERENCE_PATH}")
 
-    cores = args.j if args.j else multiprocessing.cpu_count()
+    if args.j:
+        # If -j is specified explicitly, use it.
+        cores = args.j
+    elif os.getenv("HOST_OS") == "Darwin":
+        # There is an issue building with multiple cores when the Linux Docker container is
+        # running on a macOS host system. If the Docker container was created with HOST_OS
+        # set to Darwin, we should use only 1 core to build.
+        cores = 1
+    else:
+        # Otherwise, build with as many cores as we have.
+        cores = multiprocessing.cpu_count()
+
     print(f"\n🏃 Building Bergamot with emmake using {cores} cores\n")
 
     try:
@@ -142,14 +204,14 @@ def run_shell(command):
         subprocess.check_call(["bash", GEMM_SCRIPT, BUILD_PATH])
 
         print("\n✅ Build complete\n")
-        print(" " + JS_PATH)
-        print(" " + WASM_PATH)
+        print(" " + JS_ARTIFACT)
+        print(" " + WASM_ARTIFACT)
 
         # Get the sizes of the build artifacts.
-        wasm_size = os.path.getsize(WASM_PATH)
+        wasm_size = os.path.getsize(WASM_ARTIFACT)
         gzip_size = int(
             subprocess.run(
-                f"gzip -c {WASM_PATH} | wc -c",
+                f"gzip -c {WASM_ARTIFACT} | wc -c",
                 check=True,
                 shell=True,
                 capture_output=True,
@@ -158,6 +220,8 @@ def run_shell(command):
         print(f" Uncompressed wasm size: {to_human_readable(wasm_size)}")
         print(f" Compressed wasm size: {to_human_readable(gzip_size)}")
 
+        prepare_js_artifact()
+
     finally:
         print("\n🖌️ Reverting the source code patches\n")
         for repo_path, patch_path in patches[::-1]:
@@ -167,6 +231,16 @@ def run_shell(command):
 def main():
     args = parser.parse_args()
 
+    if (
+        os.path.exists(BUILD_PATH)
+        and os.path.isdir(BUILD_PATH)
+        and os.listdir(BUILD_PATH)
+        and not args.clobber
+    ):
+        print(f"\n🏗️ Build directory {BUILD_PATH} already exists and is non-empty.\n")
+        print(" Pass the --clobber flag to rebuild if desired.")
+        return
+
     if not os.path.exists(THIRD_PARTY_PATH):
         os.mkdir(THIRD_PARTY_PATH)
 
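
The core-count precedence introduced in `build-wasm.py` (explicit `-j`, then the macOS-host workaround, then all available cores) can be read as a small pure function. The following is only a sketch of that precedence: `choose_core_count` and its parameter names are hypothetical and do not appear in the commit.

```python
import multiprocessing
import os
from typing import Optional


def choose_core_count(requested: Optional[int], host_os: Optional[str]) -> int:
    """Hypothetical helper mirroring the -j / HOST_OS precedence in the diff."""
    if requested:
        # An explicit -j value always wins.
        return requested
    if host_os == "Darwin":
        # Workaround described in the commit: a Linux container on a
        # macOS host builds with a single core.
        return 1
    # Otherwise use every core the machine reports.
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    # HOST_OS is the environment variable the diff itself reads.
    print(choose_core_count(None, os.getenv("HOST_OS")))
```

Keeping the decision in one function would make the macOS special case testable without running a build, which is one possible reason to factor it this way.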

inference/scripts/clean.sh

Lines changed: 9 additions & 8 deletions
@@ -10,20 +10,21 @@ cd "$(dirname $0)/.."
 # List of directories to clean
 dirs=("build-local" "build-wasm" "emsdk")
 
-# Flag to track if any directories were cleaned
-cleaned=false
-
 # Check and remove directories
 for dir in "${dirs[@]}"; do
   if [ -d "$dir" ]; then
     echo "Removing $dir..."
     rm -rf "$dir"
-    cleaned=true
   fi
 done
 
-# If no directories were cleaned, print a message
-if [ "$cleaned" = false ]; then
-  echo "Nothing to clean"
-fi
+echo "Removing generated wasm artifacts..."
+rm -rf wasm/tests/generated/*.js
+rm -rf wasm/tests/generated/*.wasm
+rm -rf wasm/tests/generated/*.sha256
+
+echo "Removing extracted model files..."
+rm -rf wasm/tests/models/**/*.bin
+rm -rf wasm/tests/models/**/*.spm
 
+echo
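
In bash, `**` only recurses into nested directories when the `globstar` option is enabled; otherwise it behaves like a single `*`. Whether `clean.sh` enables that option is not visible in this hunk. As a point of comparison, the same cleanup can be sketched in Python with an explicitly recursive glob. The function name `remove_extracted_models` and the `EXTRACTED_SUFFIXES` constant are hypothetical and are not part of the commit.

```python
from pathlib import Path

# Suffixes of the files that test-wasm.py extracts from the .gz archives.
EXTRACTED_SUFFIXES = (".bin", ".spm")


def remove_extracted_models(models_dir: Path) -> list[Path]:
    """Delete extracted model files at any depth and return what was removed."""
    removed = []
    for path in sorted(models_dir.rglob("*")):
        if path.is_file() and path.suffix in EXTRACTED_SUFFIXES:
            path.unlink()
            removed.append(path)
    return removed
```

The gzipped archives are left untouched because their suffix is `.gz`, so a later test run can extract them again.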

inference/scripts/test-wasm.py

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
+#!/usr/bin/env python3
+import argparse
+import hashlib
+import os
+import shutil
+import subprocess
+import sys
+
+SCRIPTS_PATH = os.path.realpath(os.path.dirname(__file__))
+INFERENCE_PATH = os.path.dirname(SCRIPTS_PATH)
+BUILD_PATH = os.path.join(INFERENCE_PATH, "build-wasm")
+WASM_PATH = os.path.join(INFERENCE_PATH, "wasm")
+WASM_TESTS_PATH = os.path.join(WASM_PATH, "tests")
+GENERATED_PATH = os.path.join(WASM_TESTS_PATH, "generated")
+MODELS_PATH = os.path.join(WASM_TESTS_PATH, "models")
+WASM_ARTIFACT = os.path.join(BUILD_PATH, "bergamot-translator.wasm")
+JS_ARTIFACT = os.path.join(BUILD_PATH, "bergamot-translator.js")
+JS_ARTIFACT_HASH = os.path.join(GENERATED_PATH, "bergamot-translator.js.sha256")
+
+
+def calculate_sha256(file_path):
+    sha256_hash = hashlib.sha256()
+    with open(file_path, "rb") as f:
+        for byte_block in iter(lambda: f.read(4096), b""):
+            sha256_hash.update(byte_block)
+    return sha256_hash.hexdigest()
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Test WASM by building and handling artifacts.",
+        formatter_class=argparse.RawTextHelpFormatter,
+    )
+
+    parser.add_argument("--clobber", action="store_true", help="Clobber the build artifacts")
+    parser.add_argument(
+        "--debug",
+        action="store_true",
+        help="Build with debug symbols, useful for profiling",
+    )
+    parser.add_argument(
+        "-j",
+        type=int,
+        help="Number of cores to use for building (default: all available cores)",
+    )
+    args = parser.parse_args()
+
+    build_wasm_script = os.path.join(SCRIPTS_PATH, "build-wasm.py")
+    build_command = [sys.executable, build_wasm_script]
+    if args.clobber:
+        build_command.append("--clobber")
+    if args.debug:
+        build_command.append("--debug")
+    if args.j:
+        build_command.extend(["-j", str(args.j)])
+
+    print("\n🚀 Starting build-wasm.py")
+    subprocess.run(build_command, check=True)
+
+    print("\n📥 Pulling translations model files with git lfs\n")
+    subprocess.run(["git", "lfs", "pull"], cwd=MODELS_PATH, check=True)
+    print(f" Pulled all files in {MODELS_PATH}")
+
+    print("\n📁 Copying generated build artifacts to the WASM test directory\n")
+
+    os.makedirs(GENERATED_PATH, exist_ok=True)
+    shutil.copy2(WASM_ARTIFACT, GENERATED_PATH)
+    shutil.copy2(JS_ARTIFACT, GENERATED_PATH)
+
+    print(f" Copied the following artifacts to {GENERATED_PATH}:")
+    print(f" - {JS_ARTIFACT}")
+    print(f" - {WASM_ARTIFACT}")
+
+    print(f"\n🔑 Calculating SHA-256 hash of {JS_ARTIFACT}\n")
+    hash_value = calculate_sha256(JS_ARTIFACT)
+    with open(JS_ARTIFACT_HASH, "w") as hash_file:
+        hash_file.write(f"{hash_value} {os.path.basename(JS_ARTIFACT)}\n")
+    print(f" Hash of {JS_ARTIFACT} written to")
+    print(f" {JS_ARTIFACT_HASH}")
+
+    print("\n📂 Decompressing model files required for WASM testing\n")
+    subprocess.run(["gzip", "-dkrf", MODELS_PATH], check=True)
+    print(f" Decompressed models in {MODELS_PATH}\n")
+
+    print("\n🔧 Installing npm dependencies for WASM JS tests\n")
+    subprocess.run(["npm", "install"], cwd=WASM_TESTS_PATH, check=True)
+
+    print("\n📊 Running Translations WASM JS tests\n")
+    subprocess.run(["npm", "run", "test"], cwd=WASM_TESTS_PATH, check=True)
+
+    print("\n✅ test-wasm.py completed successfully.\n")
+
+
+if __name__ == "__main__":
+    main()
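
`test-wasm.py` writes the hash file as `<hex digest> <file name>` so that the test runner can refuse to `eval` anything other than the intended script. The consuming side lives in the JavaScript test runner, which is not shown in this excerpt. The following Python sketch only illustrates the check under the assumption that the hash file has exactly that layout; `sha256_of` and `matches_recorded_hash` are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 4096-byte blocks, as calculate_sha256 does in the diff."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(4096), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_recorded_hash(artifact: Path, hash_file: Path) -> bool:
    """Return True only if the recorded name and digest both match the artifact."""
    # Assumed layout: "<hex digest> <file name>\n"; any run of whitespace
    # between the two fields is accepted.
    fields = hash_file.read_text(encoding="utf8").strip().split(maxsplit=1)
    if len(fields) != 2:
        return False
    recorded_hash, recorded_name = fields
    return recorded_name.strip() == artifact.name and recorded_hash == sha256_of(artifact)
```

A caller would load and evaluate the script only when `matches_recorded_hash` returns `True`, and fail the test run otherwise.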

inference/src/tests/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ if(NOT MSVC)
   set(TEST_BINARIES async blocking intgemm-resolve wasm)
   foreach(binary ${TEST_BINARIES})
     add_executable("${binary}" "${binary}.cpp")
-    target_link_libraries("${binary}" bergamot-translator)
+    target_link_libraries("${binary}" bergamot-translator-source)
     set_target_properties("${binary}" PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/tests/")
   endforeach(binary)
 

inference/src/tests/units/CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -11,9 +11,9 @@ foreach(test ${UNIT_TESTS})
   target_include_directories("run_${test}" PRIVATE ${CATCH_INCLUDE_DIR} "${CMAKE_SOURCE_DIR}/src")
 
   if(CUDA_FOUND)
-    target_link_libraries("run_${test}" ${EXT_LIBS} marian ${EXT_LIBS} marian_cuda ${EXT_LIBS} Catch bergamot-translator)
+    target_link_libraries("run_${test}" ${EXT_LIBS} marian ${EXT_LIBS} marian_cuda ${EXT_LIBS} Catch bergamot-translator-source)
   else(CUDA_FOUND)
-    target_link_libraries("run_${test}" marian ${EXT_LIBS} Catch bergamot-translator)
+    target_link_libraries("run_${test}" marian ${EXT_LIBS} Catch bergamot-translator-source)
   endif(CUDA_FOUND)
 
   if(msvc)
