
Commit 1bfc7ca

feat: add CLI support for json2xml-py (#266)
* feat: add CLI support for json2xml-py

  - Add json2xml/cli.py with the same flags as the Go version
  - Add console script entry point as json2xml-py
  - Add comprehensive CLI tests (15 tests)
  - Add benchmark scripts to compare Python vs Go performance

  Flags: -w/--wrapper, -r/--root, -p/--pretty, -t/--type, -i/--item-wrap,
  -x/--xpath, -c/--cdata, -l/--list-headers, -u/--url, -s/--string,
  -o/--output, -v/--version, -h/--help

* fix: address PR review feedback

  - Add cdata and list_headers parameters to the Json2xml class (bug fix)
  - Pass the --cdata and --list-headers CLI flags to the Json2xml converter
  - Make benchmark paths configurable via environment variables
  - Remove unused imports (os, pytest, CaptureFixture)
  - Add tests for cdata, list_headers, and stdin error paths
  - Total tests: 210 (19 CLI tests)

* docs: add comprehensive benchmark results

  Compare performance across:
  - CPython 3.14.2 (baseline)
  - CPython 3.15.0a4 (1.16x faster)
  - PyPy 3.10.16 (1.22x faster)
  - Go json2xml-go (7.34x faster)

  Key findings:
  - Go is 7-20x faster than the Python implementations
  - CPython 3.15 shows a 13-35% improvement over 3.14
  - PyPy excels at large inputs but has JIT overhead for small ones

* docs: add benchmarks to ReadTheDocs documentation

  - Add benchmarks.rst with the full performance comparison
  - Include it in the documentation index for ReadTheDocs visibility
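A quick, hedged sketch of the new CLI (flag spellings are taken from the list above; the wrapper name `person` and the output path `out.xml` are made-up example values, and the exact XML shape depends on the converter's defaults):

```bash
# Convert an inline JSON string, pretty-printed, with a custom wrapper element
json2xml-py -s '{"name": "John", "age": 30}' -p -w person

# Convert a file and write the XML to an output path
json2xml-py -o out.xml examples/bigexample.json
```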
1 parent 0b603c7 commit 1bfc7ca

10 files changed · +1867 −0 lines changed

BENCHMARKS.md

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
# json2xml Benchmark Results

Comprehensive performance comparison between Python implementations and the Go version of json2xml.

## Test Environment

- **Machine**: Apple Silicon (aarch64)
- **OS**: macOS
- **Date**: January 14, 2026

### Implementations Tested

| Implementation | Version | Notes |
|----------------|---------|-------|
| CPython | 3.14.2 | Homebrew installation |
| CPython | 3.15.0a4 | Latest alpha via uv |
| PyPy | 3.10.16 | JIT-compiled Python |
| Go | 1.0.0 | json2xml-go |

## Test Data

| Size | Description | Bytes |
|------|-------------|-------|
| Small | Simple object `{"name": "John", "age": 30, "city": "New York"}` | 47 |
| Medium | `bigexample.json` (patent data) | 2,598 |
| Large | 1,000 generated records with nested structures | 323,130 |
| Very Large | 5,000 generated records with nested structures | 1,619,991 |
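As a reference point, the Small input above can be converted directly with the CLI added in this change (a sketch; the exact output formatting depends on the converter's defaults):

```bash
json2xml-py -s '{"name": "John", "age": 30, "city": "New York"}' -p
```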
## Results

### Individual Test Results

| Test | CPython 3.14.2 | CPython 3.15.0a4 | PyPy 3.10.16 | Go |
|------|----------------|------------------|--------------|-----|
| **Small JSON** (47 bytes) | 75.46ms | 55.74ms (**1.4x faster**) | 121.47ms (1.6x slower) | 3.69ms (**20.4x faster**) |
| **Medium JSON** (2.6KB) | 73.87ms | 57.98ms (**1.3x faster**) | 125.73ms (1.7x slower) | 4.32ms (**17.1x faster**) |
| **Large JSON** (323KB) | 419.67ms | 328.98ms (**1.3x faster**) | 517.51ms (1.2x slower) | 67.13ms (**6.3x faster**) |
| **Very Large JSON** (1.6MB) | 2.09s | 1.86s (**1.1x faster**) | 1.42s (**1.5x faster**) | 287.58ms (**7.3x faster**) |

### Summary (Average Across All Tests)

| Implementation | Avg Time | vs CPython 3.14.2 |
|----------------|----------|-------------------|
| **Go** | 90.68ms | **7.34x faster** 🚀 |
| **PyPy 3.10.16** | 545.58ms | **1.22x faster** |
| **CPython 3.15.0a4** | 575.45ms | **1.16x faster** |
| **CPython 3.14.2** | 665.23ms | baseline |
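The speedup column is the CPython 3.14.2 baseline average divided by each implementation's average: 665.23ms / 90.68ms ≈ 7.34 for Go, 665.23ms / 545.58ms ≈ 1.22 for PyPy, and 665.23ms / 575.45ms ≈ 1.16 for CPython 3.15.0a4.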
## Key Observations

### 1. Go is the Clear Winner

Go outperforms all Python implementations by a significant margin:

- **7.34x faster** than CPython 3.14.2 on average
- Up to **20x faster** for small inputs due to minimal startup overhead
- Consistent performance across all input sizes

### 2. CPython 3.15.0a4 Shows Promising Improvements

The latest Python alpha demonstrates consistent performance gains:

- **13-35% faster** than CPython 3.14.2 across all test sizes
- Improvements likely due to ongoing interpreter optimizations

### 3. PyPy Has Interesting Trade-offs

PyPy's JIT compiler creates a unique performance profile:

- **Slower for small/medium inputs**: JIT compilation overhead hurts quick, one-shot operations
- **Faster for very large inputs**: the JIT shines on the 5K-record test (1.5x faster than CPython)
- Best suited for long-running processes or batch processing

### 4. Startup Overhead Dominates Small Inputs

Python's interpreter startup time is significant:

- CPython takes **55-75ms** even for 47 bytes of JSON
- Go takes only **3.7ms** for the same operation
- For CLI tools processing small files, Go provides a much better user experience
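A rough way to observe the startup gap on your own machine (illustrative only; assumes the Go binary built in the steps below, and uses the `-s` flag both CLIs share):

```bash
# Most of the elapsed time on a tiny input is process startup;
# absolute numbers vary by machine
time python3 -m json2xml.cli -s '{"name": "John"}'
time ./json2xml-go -s '{"name": "John"}'
```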
## When to Use Each Implementation

| Use Case | Recommended |
|----------|-------------|
| CLI tool for small/medium files | **Go** (json2xml-go) |
| High-throughput batch processing | **Go** or **PyPy** |
| Integration with Python codebase | **CPython 3.15+** |
| One-off conversions in scripts | **CPython** (any version) |

## Running the Benchmarks

### Python Multi-Implementation Benchmark

```bash
# Set the Go CLI path
export JSON2XML_GO_CLI=/path/to/json2xml-go

# Run the benchmark
python benchmark_multi_python.py
```

### Simple Python vs Go Benchmark

```bash
# Set paths via environment variables (optional)
export JSON2XML_GO_CLI=/path/to/json2xml-go
export JSON2XML_EXAMPLES_DIR=/path/to/examples

# Run the benchmark
python benchmark.py
```

## Reproducing Results

1. Install the required Python versions using `uv`:

   ```bash
   uv python install 3.14 3.15.0a4 pypy@3.10.16
   ```

2. Build the Go binary:

   ```bash
   cd /path/to/json2xml-go
   go build -o json2xml-go ./cmd/json2xml-go
   ```

3. Run the multi-Python benchmark:

   ```bash
   cd /path/to/json2xml
   python benchmark_multi_python.py
   ```

benchmark.py

Lines changed: 264 additions & 0 deletions
@@ -0,0 +1,264 @@
#!/usr/bin/env python3
"""
Benchmark script for json2xml-py vs json2xml-go.

Compares performance of Python and Go implementations across
different JSON sizes.

Environment variables:
    JSON2XML_GO_CLI: Path to the json2xml-go binary (default: json2xml-go in PATH)
    JSON2XML_EXAMPLES_DIR: Path to examples directory (default: ./examples relative to script)
"""
from __future__ import annotations

import json
import os
import random
import shutil
import string
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# Base directory for repo-relative defaults
BASE_DIR = Path(__file__).resolve().parent

# Paths - configurable via environment variables
PYTHON_CLI = [sys.executable, "-m", "json2xml.cli"]
# An explicit env var wins; otherwise fall back to a PATH lookup so the
# documented "json2xml-go in PATH" default actually resolves.
GO_CLI = Path(os.environ.get("JSON2XML_GO_CLI") or shutil.which("json2xml-go") or "json2xml-go")
EXAMPLES_DIR = Path(os.environ.get("JSON2XML_EXAMPLES_DIR", str(BASE_DIR / "examples")))

# Colors for terminal output
class Colors:
    RED = "\033[0;31m"
    GREEN = "\033[0;32m"
    BLUE = "\033[0;34m"
    YELLOW = "\033[1;33m"
    CYAN = "\033[0;36m"
    BOLD = "\033[1m"
    NC = "\033[0m"  # No Color


def colorize(text: str, color: str) -> str:
    """Wrap text in color codes."""
    return f"{color}{text}{Colors.NC}"


def random_string(length: int = 10) -> str:
    """Generate a random string."""
    return "".join(random.choices(string.ascii_letters, k=length))


def generate_large_json(num_records: int = 1000) -> str:
    """Generate a large JSON payload for benchmarking."""
    data = []
    for i in range(num_records):
        item = {
            "id": i,
            "name": random_string(20),
            "email": f"{random_string(8)}@example.com",
            "active": random.choice([True, False]),
            "score": round(random.uniform(0, 100), 2),
            "tags": [random_string(5) for _ in range(5)],
            "metadata": {
                "created": "2024-01-15T10:30:00Z",
                "updated": "2024-01-15T12:45:00Z",
                "version": random.randint(1, 100),
                "nested": {
                    "level1": {
                        "level2": {"value": random_string(10)}
                    }
                },
            },
        }
        data.append(item)
    return json.dumps(data)


def run_benchmark(
    cmd: list[str],
    iterations: int = 10,
    warmup: int = 2
) -> dict[str, float]:
    """
    Run a benchmark for the given command.

    Returns dict with avg, min, max times in milliseconds.

    Note: cmd is always a list constructed internally by this script,
    not from external/user input. The subprocess calls are safe.
    """
    times = []

    # Warmup runs
    # Security: cmd is a list constructed internally, not from user input
    for _ in range(warmup):
        subprocess.run(cmd, capture_output=True, check=False)  # noqa: S603

    # Timed runs
    for _ in range(iterations):
        start = time.perf_counter()
        result = subprocess.run(cmd, capture_output=True, check=False)  # noqa: S603
        end = time.perf_counter()

        if result.returncode != 0:
            print(f"Error: {result.stderr.decode()}")
            continue

        duration_ms = (end - start) * 1000
        times.append(duration_ms)

    if not times:
        return {"avg": 0, "min": 0, "max": 0}

    return {
        "avg": sum(times) / len(times),
        "min": min(times),
        "max": max(times),
    }


def format_time(ms: float) -> str:
    """Format time in milliseconds."""
    if ms < 1:
        return f"{ms * 1000:.2f}µs"
    elif ms < 1000:
        return f"{ms:.2f}ms"
    else:
        return f"{ms / 1000:.2f}s"


def print_header(title: str) -> None:
    """Print a section header."""
    print(colorize("=" * 50, Colors.BLUE))
    print(colorize(f" {title}", Colors.BOLD))
    print(colorize("=" * 50, Colors.BLUE))


def print_result(name: str, result: dict[str, float]) -> None:
    """Print benchmark result."""
    print(f" {name}:")
    print(f" Avg: {format_time(result['avg'])} | "
          f"Min: {format_time(result['min'])} | "
          f"Max: {format_time(result['max'])}")

def main() -> int:
    """Run the benchmark suite."""
    print_header("json2xml Benchmark: Python vs Go")
    print()

    # Check prerequisites
    print(colorize("Checking prerequisites...", Colors.YELLOW))

    if not GO_CLI.exists():
        print(colorize(f"Error: Go binary not found at {GO_CLI}", Colors.RED))
        print("Please build it first: cd json2xml-go && go build -o json2xml-go ./cmd/json2xml-go")
        return 1

    print(colorize("✓ Prerequisites met", Colors.GREEN))
    print()

    # Test configurations
    iterations = 10
    results = {}

    # Create temp files for testing
    with tempfile.TemporaryDirectory() as tmpdir:
        # Small JSON - inline string
        small_json = '{"name": "John", "age": 30, "city": "New York"}'

        # Medium JSON - existing file
        medium_json_file = EXAMPLES_DIR / "bigexample.json"

        # Large JSON - generated
        large_json = generate_large_json(1000)
        large_json_file = Path(tmpdir) / "large.json"
        large_json_file.write_text(large_json)

        # Very large JSON
        very_large_json = generate_large_json(5000)
        very_large_json_file = Path(tmpdir) / "very_large.json"
        very_large_json_file.write_text(very_large_json)

        print(colorize("Test file sizes:", Colors.CYAN))
        print(f" Small: {len(small_json)} bytes (inline)")
        print(f" Medium: {medium_json_file.stat().st_size:,} bytes")
        print(f" Large: {large_json_file.stat().st_size:,} bytes (1000 records)")
        print(f" Very Large: {very_large_json_file.stat().st_size:,} bytes (5000 records)")
        print()

        # Benchmark: Small JSON (inline string)
        print(colorize("--- Small JSON (inline string) ---", Colors.BLUE))
        py_small = run_benchmark(PYTHON_CLI + ["-s", small_json], iterations)
        go_small = run_benchmark([str(GO_CLI), "-s", small_json], iterations)
        print_result("Python", py_small)
        print_result("Go", go_small)
        results["small"] = {"python": py_small, "go": go_small}
        print()

        # Benchmark: Medium JSON (file)
        print(colorize("--- Medium JSON (bigexample.json) ---", Colors.BLUE))
        py_medium = run_benchmark(PYTHON_CLI + [str(medium_json_file)], iterations)
        go_medium = run_benchmark([str(GO_CLI), str(medium_json_file)], iterations)
        print_result("Python", py_medium)
        print_result("Go", go_medium)
        results["medium"] = {"python": py_medium, "go": go_medium}
        print()

        # Benchmark: Large JSON (file)
        print(colorize("--- Large JSON (1000 records) ---", Colors.BLUE))
        py_large = run_benchmark(PYTHON_CLI + [str(large_json_file)], iterations)
        go_large = run_benchmark([str(GO_CLI), str(large_json_file)], iterations)
        print_result("Python", py_large)
        print_result("Go", go_large)
        results["large"] = {"python": py_large, "go": go_large}
        print()

        # Benchmark: Very Large JSON (file)
        print(colorize("--- Very Large JSON (5000 records) ---", Colors.BLUE))
        py_vlarge = run_benchmark(PYTHON_CLI + [str(very_large_json_file)], iterations)
        go_vlarge = run_benchmark([str(GO_CLI), str(very_large_json_file)], iterations)
        print_result("Python", py_vlarge)
        print_result("Go", go_vlarge)
        results["very_large"] = {"python": py_vlarge, "go": go_vlarge}
        print()

    # Summary
    print_header("SUMMARY")
    print()

    for size, data in results.items():
        py_avg = data["python"]["avg"]
        go_avg = data["go"]["avg"]

        if go_avg > 0:
            speedup = py_avg / go_avg
            speedup_str = colorize(f"{speedup:.1f}x faster", Colors.GREEN)
        else:
            speedup_str = "N/A"

        print(colorize(f"{size.replace('_', ' ').title()} JSON:", Colors.BOLD))
        print(f" Python: {format_time(py_avg)}")
        print(f" Go: {format_time(go_avg)}")
        print(f" Go is {speedup_str}")
        print()

    # Overall average speedup
    total_py = sum(r["python"]["avg"] for r in results.values())
    total_go = sum(r["go"]["avg"] for r in results.values())
    if total_go > 0:
        overall_speedup = total_py / total_go
        print(colorize(f"Overall: Go is {overall_speedup:.1f}x faster than Python", Colors.GREEN + Colors.BOLD))

    print()
    print(colorize("=" * 50, Colors.BLUE))
    print(colorize("Benchmark complete!", Colors.GREEN))
    print(colorize("=" * 50, Colors.BLUE))

    return 0


if __name__ == "__main__":
    sys.exit(main())
