Merged
Changes from 5 commits
4 changes: 4 additions & 0 deletions .github/workflows/rust-ci.yml
@@ -38,6 +38,10 @@ jobs:
        working-directory: rust
        run: cargo clippy --all-targets --all-features -- -D warnings

      - name: Run Rust unit tests
        working-directory: rust
        run: cargo test --no-default-features

  rust-test:
    name: Build & Test (${{ matrix.os }}, Python ${{ matrix.python-version }})
    runs-on: ${{ matrix.os }}
64 changes: 34 additions & 30 deletions BENCHMARKS.md
@@ -6,77 +6,81 @@ Comprehensive performance comparison between all json2xml implementations.

- **Machine**: Apple Silicon (M-series, aarch64)
- **OS**: macOS
- **Date**: January 16, 2026
- **Date**: January 28, 2026

### Implementations Tested

| Implementation | Type | Notes |
|----------------|------|-------|
| Python | Library | Pure Python (json2xml) |
| Rust | Library | Native extension via PyO3 (json2xml-rs) |
| Go | CLI | Standalone binary (json2xml-go) |
| Go | CLI | Standalone binary (json2xml-go v1.0.0) |
| Zig | CLI | Standalone binary (json2xml-zig) |

## Test Data

| Size | Description | Bytes |
|------|-------------|-------|
| Small | Simple object `{"name": "John", "age": 30, "city": "New York"}` | 47 |
| Medium | 10 generated records with nested structures | 3,212 |
| Medium | 10 generated records with nested structures | ~3,208 |
| bigexample.json | Real-world patent data | 2,018 |
| Large | 100 generated records with nested structures | 32,226 |
| Very Large | 1,000 generated records with nested structures | 323,126 |
| Large | 100 generated records with nested structures | ~32,205 |
| Very Large | 1,000 generated records with nested structures | ~323,119 |

## Results

### Performance Summary

| Test Case | Python | Rust | Go | Zig |
|-----------|--------|------|-----|-----|
| Small (47B) | 40.12µs | 1.45µs | 4.65ms | 3.74ms |
| Medium (3.2KB) | 2.14ms | 71.28µs | 4.07ms | 3.28ms |
| bigexample (2KB) | 819.46µs | 32.88µs | 4.02ms | 2.96ms |
| Large (32KB) | 21.08ms | 739.89µs | 4.05ms | 6.11ms |
| Very Large (323KB) | 212.61ms | 7.55ms | 4.38ms | 33.24ms |
| Small (47B) | 41.88µs | 1.66µs | 4.52ms | 2.80ms |
| Medium (3.2KB) | 2.19ms | 71.85µs | 4.33ms | 2.18ms |
| bigexample (2KB) | 854.38µs | 30.89µs | 4.28ms | 2.12ms |
| Large (32KB) | 21.57ms | 672.96µs | 4.47ms | 2.48ms |
| Very Large (323KB) | 216.52ms | 6.15ms | 4.44ms | 5.54ms |

### Speedup vs Pure Python

| Test Case | Rust | Go | Zig |
|-----------|------|-----|-----|
| Small (47B) | **27.6x** | 0.0x* | 0.0x* |
| Medium (3.2KB) | **30.0x** | 0.5x* | 0.7x* |
| bigexample (2KB) | **24.9x** | 0.2x* | 0.3x* |
| Large (32KB) | **28.5x** | 5.2x | 3.5x |
| Very Large (323KB) | **28.2x** | **48.5x** | 6.4x |
| Small (47B) | **25.2x** | 0.0x* | 0.0x* |
| Medium (3.2KB) | **30.5x** | 0.5x* | 1.0x* |
| bigexample (2KB) | **27.7x** | 0.2x* | 0.4x* |
| Large (32KB) | **32.1x** | 4.8x | **8.7x** |
| Very Large (323KB) | **35.2x** | **48.8x** | **39.1x** |

*CLI tools have process spawn overhead (~3-4ms) which dominates for small inputs
*CLI tools have process spawn overhead (~2-4ms) which dominates for small inputs
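
The speedup columns follow from dividing the pure-Python time by each implementation's time for the same test case. A minimal sketch of that arithmetic, using hypothetical helper names (`speedup`, `format_speedup`) that are not part of any json2xml project, with inputs taken from the updated Performance Summary rows:

```rust
/// Ratio of the Python time to another implementation's time.
/// Both arguments must use the same unit (microseconds here).
fn speedup(python_us: f64, other_us: f64) -> f64 {
    python_us / other_us
}

/// Formats the ratio the way the table prints it, e.g. "48.8x".
fn format_speedup(python_us: f64, other_us: f64) -> String {
    format!("{:.1}x", speedup(python_us, other_us))
}
```

For example, the Very Large row gives `format_speedup(216_520.0, 4_440.0)` for Go (216.52ms vs 4.44ms), and the Small row gives `format_speedup(41.88, 4_520.0)` for Go, which rounds down to `0.0x` because the ~4ms process start dwarfs the ~42µs Python call.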

## Key Observations

### 1. Rust Extension is the Best Choice for Python Users 🦀

The Rust extension (json2xml-rs) provides:
- **~28x faster** than pure Python consistently across all input sizes
- **~25-35x faster** than pure Python consistently across all input sizes
- **Zero process overhead** - called directly from Python
- **Automatic fallback** - pure Python used if Rust unavailable
- **Easy install**: `pip install json2xml[fast]`

### 2. Go Excels for Large CLI Workloads 🚀
### 2. Go Excels for Very Large CLI Workloads 🚀

For very large inputs (323KB+):
- **48.5x faster** than Python
- But ~3-4ms startup overhead hurts small file performance
- **48.8x faster** than Python
- But ~4ms startup overhead hurts small file performance
- Best for batch processing or large file conversions

### 3. Zig is Competitive but Has Trade-offs
### 3. Zig is Now Highly Competitive ⚡

- Consistent ~3ms startup overhead
- Good for medium-large files (3-6x faster than Python)
- Less optimized than Go for very large inputs
After recent optimizations:
- **39.1x faster** than Python for very large files
- **8.7x faster** for large files (32KB)
- Faster startup than Go (~2ms vs ~4ms)
- Best balance of startup time and throughput

### 4. Process Spawn Overhead Matters

CLI tools (Go, Zig) have ~3-4ms process spawn overhead:
CLI tools (Go, Zig) have process spawn overhead:
- Go: ~4ms startup overhead
- Zig: ~2ms startup overhead
- Dominates for small inputs (makes them appear slower than Python!)
- Negligible for large inputs where actual work dominates
- Rust extension avoids this entirely by being a native Python module
@@ -85,9 +89,9 @@ CLI tools (Go, Zig) have ~3-4ms process spawn overhead:

| Use Case | Recommended | Why |
|----------|-------------|-----|
| Python library calls | **Rust** (`pip install json2xml[fast]`) | 28x faster, no overhead |
| Small files via CLI | **Rust** via Python | CLI overhead dominates |
| Large files via CLI | **Go** (json2xml-go) | 48x faster for 300KB+ |
| Python library calls | **Rust** (`pip install json2xml[fast]`) | 25-35x faster, no overhead |
| Small files via CLI | **Zig** (json2xml-zig) | Fastest startup (~2ms) |
| Large files via CLI | **Go** or **Zig** | Both excellent (Go slightly faster) |
| Batch processing | **Go** or **Rust** | Both excellent |
| Pure Python required | **Python** (json2xml) | Always available |

@@ -104,7 +108,7 @@ pip install json2xml[fast]
go install github.com/vinitkumar/json2xml-go@latest

# Zig CLI
# See: github.com/nicholasgriffintn/json2xml-zig
# See: github.com/vinitkumar/json2xml-zig
```

## Running the Benchmarks
@@ -130,4 +134,4 @@ python benchmark_multi_python.py
## Related Projects

- **Go version**: [github.com/vinitkumar/json2xml-go](https://github.com/vinitkumar/json2xml-go)
- **Zig version**: [github.com/nicholasgriffintn/json2xml-zig](https://github.com/nicholasgriffintn/json2xml-zig)
- **Zig version**: [github.com/vinitkumar/json2xml-zig](https://github.com/vinitkumar/json2xml-zig)
8 changes: 6 additions & 2 deletions rust/Cargo.toml
@@ -7,10 +7,14 @@ license = "Apache-2.0"

[lib]
name = "json2xml_rs"
crate-type = ["cdylib"]
crate-type = ["cdylib", "rlib"]

[features]
default = ["python"]
python = ["pyo3/extension-module", "dep:pyo3"]

[dependencies]
pyo3 = { version = "0.27", features = ["extension-module"] }
pyo3 = { version = "0.27", optional = true }

[profile.release]
lto = true
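
Making `pyo3` optional behind a default `python` feature is what lets `cargo test --no-default-features` and the fuzz crate (`default-features = false`) build the core logic without linking against Python. The crate's actual `lib.rs` is not shown in this diff, so the following is only a sketch of how such gating is commonly laid out; `escape_amp_sketch`, `escape_amp_py`, and `json2xml_rs_sketch` are hypothetical names:

```rust
/// Pure-Rust helper (hypothetical): compiled in every configuration,
/// so unit tests and fuzz targets can call it without PyO3.
pub fn escape_amp_sketch(input: &str) -> String {
    input.replace('&', "&amp;")
}

/// PyO3 glue (hypothetical): compiled only when the `python` feature
/// is enabled, which is the default for the Python extension build.
#[cfg(feature = "python")]
mod python_bindings {
    use pyo3::prelude::*;

    #[pyfunction]
    fn escape_amp_py(input: &str) -> String {
        super::escape_amp_sketch(input)
    }

    #[pymodule]
    fn json2xml_rs_sketch(m: &Bound<'_, PyModule>) -> PyResult<()> {
        m.add_function(wrap_pyfunction!(escape_amp_py, m)?)?;
        Ok(())
    }
}
```

Adding `rlib` to `crate-type` serves the same goal: a `cdylib` alone cannot be consumed as a normal Rust dependency, which the fuzz crate needs.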
51 changes: 51 additions & 0 deletions rust/fuzz/Cargo.toml
@@ -0,0 +1,51 @@
[package]
name = "json2xml_rs-fuzz"
version = "0.0.0"
publish = false
edition = "2021"

[package.metadata]
cargo-fuzz = true

[dependencies]
libfuzzer-sys = "0.4"
arbitrary = { version = "1", features = ["derive"] }

[dependencies.json2xml_rs]
path = ".."
default-features = false

[[bin]]
name = "fuzz_escape_xml"
path = "fuzz_targets/fuzz_escape_xml.rs"
test = false
doc = false
bench = false

[[bin]]
name = "fuzz_wrap_cdata"
path = "fuzz_targets/fuzz_wrap_cdata.rs"
test = false
doc = false
bench = false

[[bin]]
name = "fuzz_is_valid_xml_name"
path = "fuzz_targets/fuzz_is_valid_xml_name.rs"
test = false
doc = false
bench = false

[[bin]]
name = "fuzz_make_valid_xml_name"
path = "fuzz_targets/fuzz_make_valid_xml_name.rs"
test = false
doc = false
bench = false

[[bin]]
name = "fuzz_make_attr_string"
path = "fuzz_targets/fuzz_make_attr_string.rs"
test = false
doc = false
bench = false
@@ -0,0 +1,2 @@


@@ -0,0 +1 @@
ë
@@ -0,0 +1 @@
]I]]
@@ -0,0 +1 @@
£¶¢§
@@ -0,0 +1 @@
]]J(
@@ -0,0 +1 @@
]](
@@ -0,0 +1 @@
]J(
@@ -0,0 +1 @@
]
23 changes: 23 additions & 0 deletions rust/fuzz/fuzz_targets/fuzz_escape_xml.rs
@@ -0,0 +1,23 @@
#![no_main]

use libfuzzer_sys::fuzz_target;
use json2xml_rs::escape_xml;

fuzz_target!(|data: &str| {
    let result = escape_xml(data);

    // Verify invariants:
    // 1. Result should not contain unescaped special chars
    assert!(!result.contains('&') || result.contains("&amp;") || result.contains("&quot;")
        || result.contains("&apos;") || result.contains("&lt;") || result.contains("&gt;"));

    // 2. Result should be valid (no panics occurred)
    // 3. If input had no special chars, output equals input
    if !data.contains('&') && !data.contains('"') && !data.contains('\'')
        && !data.contains('<') && !data.contains('>') {
        assert_eq!(result, data);
    }

    // 4. Output length should be >= input length (escaping only adds chars)
    assert!(result.len() >= data.len());
});
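
Invariant 1 in the target above only requires that some entity appears whenever the output contains `&`, so an output such as `a & b &lt;` would still pass. A stricter check is sketched below. It assumes `escape_xml` emits exactly the five entities named in the assertion; `escape_xml_sketch`, `entity_len_at`, and `is_fully_escaped` are hypothetical stand-ins, not the crate's code:

```rust
/// The five entities the fuzz target's assertion mentions.
const ENTITIES: [&str; 5] = ["&amp;", "&quot;", "&apos;", "&lt;", "&gt;"];

/// Hypothetical stand-in for the crate's `escape_xml`.
fn escape_xml_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

/// Length of the known entity that `s` starts with, if any.
fn entity_len_at(s: &str) -> Option<usize> {
    ENTITIES.iter().find(|e| s.starts_with(**e)).map(|e| e.len())
}

/// True if `s` has no raw `<`, `>`, `"`, `'`, and every `&` begins
/// one of the five known entities.
fn is_fully_escaped(s: &str) -> bool {
    let mut rest = s;
    while !rest.is_empty() {
        let c = rest.chars().next().unwrap();
        match c {
            '<' | '>' | '"' | '\'' => return false,
            '&' => match entity_len_at(rest) {
                Some(len) => rest = &rest[len..],
                None => return false,
            },
            _ => rest = &rest[c.len_utf8()..],
        }
    }
    true
}
```

In a fuzz target this would reduce to a single `assert!(is_fully_escaped(&result))`, provided the real `escape_xml` uses the same entity set.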
33 changes: 33 additions & 0 deletions rust/fuzz/fuzz_targets/fuzz_is_valid_xml_name.rs
@@ -0,0 +1,33 @@
#![no_main]

use libfuzzer_sys::fuzz_target;
use json2xml_rs::is_valid_xml_name;

fuzz_target!(|data: &str| {
    let result = is_valid_xml_name(data);

    // Verify invariants:
    // 1. Empty string is always invalid
    if data.is_empty() {
        assert!(!result);
    }

    // 2. String starting with digit is invalid
    if let Some(first) = data.chars().next() {
        if first.is_ascii_digit() {
            assert!(!result);
        }
    }

    // 3. String starting with "xml" (case-insensitive) is invalid
    if data.to_lowercase().starts_with("xml") {
        assert!(!result);
    }

    // 4. String containing spaces is invalid
    if data.contains(' ') {
        assert!(!result);
    }

    // 5. Function should never panic - reaching here means it didn't
});
42 changes: 42 additions & 0 deletions rust/fuzz/fuzz_targets/fuzz_make_attr_string.rs
@@ -0,0 +1,42 @@
#![no_main]

use libfuzzer_sys::fuzz_target;
use arbitrary::Arbitrary;
use json2xml_rs::make_attr_string;

#[derive(Arbitrary, Debug)]
struct AttrInput {
    attrs: Vec<(String, String)>,
}

fuzz_target!(|input: AttrInput| {
    let result = make_attr_string(&input.attrs);

    // Verify invariants:
    // 1. Empty attrs should produce empty string
    if input.attrs.is_empty() {
        assert!(result.is_empty());
        return;
    }

    // 2. Result should start with space (for XML formatting)
    assert!(result.starts_with(' '), "Attribute string should start with space");

    // 3. Each attribute should produce a ` key="value"`-like fragment.
    //    We check for the more specific pattern ` {key}="` to avoid
    //    passing on overlapping keys (e.g. "a" vs "aa") or malformed formatting.
    for (key, _value) in &input.attrs {
        let expected_fragment = format!(" {}=\"", key);
        assert!(
            result.contains(&expected_fragment),
            "Attribute fragment '{}' should appear in result '{}'",
            expected_fragment,
            result
        );
    }

Comment on lines +28 to +37

suggestion (testing): The comment about escaped values is not enforced; consider adding an explicit check to strengthen the fuzz target.

Since the fuzz target doesn’t currently check that escaping actually happens, it won’t catch regressions in escape_xml. Consider adding an assertion (or a minimal parser) that inspects attribute values and verifies they contain no unescaped &, <, >, ", or ' characters before concluding the output is valid.

Suggested change
    for (key, _value) in &input.attrs {
        let expected_fragment = format!(" {}=\"", key);
        assert!(
            result.contains(&expected_fragment),
            "Attribute fragment '{}' should appear in result '{}'",
            expected_fragment,
            result
        );
    }
    for (key, _value) in &input.attrs {
        let expected_fragment = format!(" {}=\"", key);
        assert!(
            result.contains(&expected_fragment),
            "Attribute fragment '{}' should appear in result '{}'",
            expected_fragment,
            result
        );
    }
    // Additionally, verify that attribute values are properly escaped:
    // - No raw <, >, " or ' characters may appear inside attribute values.
    // - Any '&' inside a value must be part of an entity (it must be followed
    //   by some characters and then a terminating ';' before the closing quote).
    for (key, _value) in &input.attrs {
        let expected_prefix = format!(" {}=\"", key);
        if let Some(start) = result.find(&expected_prefix) {
            let value_start = start + expected_prefix.len();
            if let Some(rel_end) = result[value_start..].find('"') {
                let value_end = value_start + rel_end;
                let value = &result[value_start..value_end];
                // 1. Forbid raw <, >, " and ' in attribute values.
                for forbidden in ['<', '>', '"', '\''] {
                    assert!(
                        !value.chars().any(|c| c == forbidden),
                        "Unescaped '{}' found in attribute value for key '{}' in '{}'",
                        forbidden,
                        key,
                        result
                    );
                }
                // 2. Ensure any '&' is part of something that at least looks like an entity:
                //    '&' must be followed by at least one non-';' character and then a ';'
                //    before the end of the value.
                let bytes = value.as_bytes();
                let mut i = 0;
                while i < bytes.len() {
                    if bytes[i] == b'&' {
                        // There must be at least one character after '&'
                        assert!(
                            i + 1 < bytes.len(),
                            "Dangling '&' at end of attribute value for key '{}' in '{}'",
                            key,
                            result
                        );
                        // Find the next ';' after '&'
                        let mut j = i + 1;
                        while j < bytes.len() && bytes[j] != b';' {
                            j += 1;
                        }
                        assert!(
                            j < bytes.len(),
                            "Found '&' in attribute value for key '{}' that is not terminated by ';' in '{}'",
                            key,
                            result
                        );
                        // Require at least one character between '&' and ';'
                        assert!(
                            j > i + 1,
                            "Empty entity reference '&;' in attribute value for key '{}' in '{}'",
                            key,
                            result
                        );
                        // Continue scanning after the ';'
                        i = j + 1;
                    } else {
                        i += 1;
                    }
                }
            }
        }
    }

    // 4. Values should be escaped (no raw & < > " ' in values)
    //    The make_attr_string calls escape_xml on values

    // 5. Function should never panic - reaching here means it didn't
});
30 changes: 30 additions & 0 deletions rust/fuzz/fuzz_targets/fuzz_make_valid_xml_name.rs
@@ -0,0 +1,30 @@
#![no_main]

use libfuzzer_sys::fuzz_target;
use json2xml_rs::make_valid_xml_name;

fuzz_target!(|data: &str| {
    let (name, attr) = make_valid_xml_name(data);

    // Verify invariants:
    // 1. The returned name must be a valid XML name OR be "key" with an attribute
    if name != "key" {
        // If we didn't fall back to "key", the name should be valid
        // (though it might have been transformed)
        assert!(!name.is_empty(), "Name should not be empty");
    }

    // 2. If attr is Some, name should be "key"
    if let Some((attr_name, _attr_value)) = attr {
        assert_eq!(name, "key", "Fallback name should be 'key'");
        assert_eq!(attr_name, "name", "Attribute key should be 'name'");
    }

    // 3. Purely numeric input should get 'n' prefix
    if !data.is_empty() && data.chars().all(|c| c.is_ascii_digit()) {
        assert!(name.starts_with('n'), "Numeric keys should get 'n' prefix");
    }

    // 4. Function should never panic - reaching here means it didn't
});
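
The invariants above describe a fallback chain: keep a valid name, prefix purely numeric keys with `n`, and otherwise fall back to the element name `key` with the original key carried in a `name` attribute. The real `make_valid_xml_name` is not part of this diff, so the sketch below is only one way such a chain could look. Both function names are hypothetical, the validity rule is a deliberately simplified ASCII-only approximation, and the space-to-underscore step is an assumption:

```rust
/// Simplified, ASCII-only validity rule (an assumption, not the crate's rule):
/// starts with a letter or '_', does not start with "xml" in any case,
/// and otherwise contains only letters, digits, '_', '-' or '.'.
fn is_valid_name_sketch(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    if name.to_lowercase().starts_with("xml") {
        return false;
    }
    chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.'))
}

/// Returns the element name plus an optional ("name", original_key) attribute.
fn make_valid_xml_name_sketch(key: &str) -> (String, Option<(String, String)>) {
    // 1. Already valid: use as-is.
    if is_valid_name_sketch(key) {
        return (key.to_string(), None);
    }
    // 2. Purely numeric: prefix with 'n'.
    if !key.is_empty() && key.chars().all(|c| c.is_ascii_digit()) {
        return (format!("n{}", key), None);
    }
    // 3. Assumed intermediate step: try replacing spaces with underscores.
    let underscored = key.replace(' ', "_");
    if is_valid_name_sketch(&underscored) {
        return (underscored, None);
    }
    // 4. Fallback: generic element name, original key kept as an attribute.
    ("key".to_string(), Some(("name".to_string(), key.to_string())))
}
```

With a reference like this, invariant 1 could be tightened from "name is non-empty" to "name passes the validity predicate whenever no attribute is returned", assuming the crate's `is_valid_xml_name` agrees with its own `make_valid_xml_name`.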
21 changes: 21 additions & 0 deletions rust/fuzz/fuzz_targets/fuzz_wrap_cdata.rs
@@ -0,0 +1,21 @@
#![no_main]

use libfuzzer_sys::fuzz_target;
use json2xml_rs::wrap_cdata;

fuzz_target!(|data: &str| {
    let result = wrap_cdata(data);

    // Verify invariants:
    // 1. Result must start with CDATA opening
    assert!(result.starts_with("<![CDATA["));

    // 2. Result must end with CDATA closing
    assert!(result.ends_with("]]>"));

    // 3. The ]]> sequence in input must be properly escaped
    //    (split into ]]]]><![CDATA[>)

    // 4. Result should be longer than or equal to input + CDATA wrapper (12 bytes)
    assert!(result.len() >= data.len() + 12);
});
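
Invariant 3 in the target above is stated in a comment but never asserted. One way to check it is a round trip: strip the outer wrapper, undo the split, and compare with the input. The sketch below assumes `wrap_cdata` uses exactly the `]]]]><![CDATA[>` split described in that comment; `wrap_cdata_sketch` and `unwrap_cdata_sketch` are hypothetical stand-ins, not the crate's functions:

```rust
const CDATA_OPEN: &str = "<![CDATA[";
const CDATA_CLOSE: &str = "]]>";
/// Replacement for a literal "]]>" in the payload: close the section
/// after "]]", then reopen a new one before ">".
const CDATA_SPLIT: &str = "]]]]><![CDATA[>";

/// Hypothetical stand-in for the crate's `wrap_cdata`.
fn wrap_cdata_sketch(input: &str) -> String {
    let escaped = input.replace(CDATA_CLOSE, CDATA_SPLIT);
    format!("{}{}{}", CDATA_OPEN, escaped, CDATA_CLOSE)
}

/// Inverse of the sketch above; returns None if the wrapper is missing.
fn unwrap_cdata_sketch(wrapped: &str) -> Option<String> {
    let inner = wrapped
        .strip_prefix(CDATA_OPEN)?
        .strip_suffix(CDATA_CLOSE)?;
    Some(inner.replace(CDATA_SPLIT, CDATA_CLOSE))
}
```

In the fuzz target, the equivalent assertion would be that unwrapping `result` yields `data`, which fails if any `]]>` from the input survives unsplit.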