17 commits
6299124
feat: Add free-threaded Python 3.14t support with parallel processing
vinitkumar Oct 23, 2025
12a6f77
fix: lint
vinitkumar Oct 23, 2025
d882c02
fix: correct type annotations for ids parameter and test assertions
vinitkumar Oct 23, 2025
75cf65d
test: add comprehensive tests for parallel processing to improve cove…
vinitkumar Oct 23, 2025
ba9314d
style: fix ruff linting errors - remove whitespace from blank lines
vinitkumar Oct 23, 2025
b802a7d
test: add tests for Decimal/Fraction types and edge cases to improve …
vinitkumar Oct 23, 2025
a3db3b2
fix: tests
vinitkumar Nov 3, 2025
76257e5
Fix type checker issues in parallel tests by adding type ignore comments
vinitkumar Nov 3, 2025
af3a56a
Fix type checker issues in parallel tests using cast instead of type …
vinitkumar Nov 3, 2025
c4402ad
Improve code coverage to 99% by adding missing tests and removing dea…
vinitkumar Nov 3, 2025
e9ccc12
Fix type error for sys._is_gil_enabled in tests
vinitkumar Nov 3, 2025
547694c
Update to Python 3.13+ support, focus on freethreaded version
vinitkumar Nov 3, 2025
f40212c
update coverage
vinitkumar Nov 3, 2025
3449c8a
Fix CI: Use uv venv to avoid --system installation issues
vinitkumar Nov 3, 2025
b2e9981
Fix CI: Use uv run for pytest to access venv packages
vinitkumar Nov 3, 2025
8fdab64
fix: Update pytest to 8.4.1+ for Python 3.14 compatibility and fix im…
vinitkumar Nov 3, 2025
8e5d68a
ci: Add ty typecheck to pythonpackage.yml lint job
vinitkumar Nov 3, 2025
Binary file modified .coverage
19 changes: 12 additions & 7 deletions .github/workflows/pythonpackage.yml
@@ -28,7 +28,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: [pypy-3.10, pypy-3.11, '3.10', '3.11', '3.12', '3.13', '3.14', '3.14t', '3.15.0-alpha.1']
python-version: ['3.13t', '3.14', '3.14t', '3.15.0-alpha.1']
os: [
ubuntu-latest,
windows-latest,
@@ -63,15 +63,16 @@ jobs:

- name: Install dependencies
run: |
uv pip install --system -e .
uv pip install --system pytest pytest-xdist pytest-cov
uv venv
uv pip install -e .
uv pip install pytest pytest-xdist pytest-cov

- name: Create coverage directory
run: mkdir -p coverage/reports

- name: Run tests
run: |
pytest --cov=json2xml --cov-report=xml:coverage/reports/coverage.xml --cov-report=term -xvs tests -n auto
uv run pytest --cov=json2xml --cov-report=xml:coverage/reports/coverage.xml --cov-report=term -xvs tests -n auto
env:
PYTHONPATH: ${{ github.workspace }}

@@ -98,10 +99,10 @@ jobs:
with:
persist-credentials: false

- name: Set up Python 3.12
- name: Set up Python 3.13
uses: actions/[email protected]
with:
python-version: '3.12'
python-version: '3.13'

- name: Install uv
uses: astral-sh/setup-uv@v6
@@ -114,8 +115,12 @@

- name: Install dependencies
run: |
uv pip install --system -e .
uv venv
uv pip install -e .

- name: Run ruff
run: uvx ruff check json2xml tests

- name: Run ty typecheck
run: uvx ty check json2xml tests

4 changes: 2 additions & 2 deletions AGENT.md
@@ -9,7 +9,7 @@
- Clean artifacts: `make clean`

## Architecture
- Main module: `json2xml/` with `json2xml.py` (main converter), `dicttoxml.py` (core conversion), `utils.py` (utilities)
- Main module: `json2xml/` with `json2xml.py` (main converter), `dicttoxml.py` (core conversion), `utils.py` (utilities), `parallel.py` (parallel processing)
- Core functionality: JSON to XML conversion via `Json2xml` class wrapping `dicttoxml`
- Tests: `tests/` with test files following `test_*.py` pattern

@@ -18,5 +18,5 @@
- Use pytest (no unittest), all tests in `./tests/` with typing annotations
- Import typing fixtures when TYPE_CHECKING: `CaptureFixture`, `FixtureRequest`, `LogCaptureFixture`, `MonkeyPatch`, `MockerFixture`
- Ruff formatting: line length 119, ignores E501, F403, E701, F401
- Python 3.10+ required, supports up to 3.14 (including 3.14t freethreaded)
- Python 3.13+ required, supports up to 3.14 (including 3.13t, 3.14t freethreaded)
- Dependencies: defusedxml, urllib3, xmltodict, pytest, pytest-cov
152 changes: 152 additions & 0 deletions BENCHMARK_RESULTS.md
@@ -0,0 +1,152 @@
# json2xml Performance Benchmark Results

## Test Environment

- **Machine**: macOS on ARM64 (Apple Silicon)
- **Date**: October 2025
- **Library Version**: 5.2.1 (with free-threaded optimization)

## Python Versions Tested

### Python 3.14.0 (Standard GIL)
- **Build**: CPython 3.14.0 (main, Oct 7 2025)
- **GIL Status**: Enabled (Standard)
- **Free-threaded**: No

### Python 3.14.0t (Free-threaded)
- **Build**: CPython 3.14.0 free-threading build (main, Oct 7 2025)
- **GIL Status**: Disabled
- **Free-threaded**: Yes

## Benchmark Methodology

Each test runs 5 iterations and reports the average time. Tests compare:
- **Serial processing**: Traditional single-threaded conversion (`parallel=False`)
- **Parallel processing**: Multi-threaded conversion with 2, 4, and 8 worker threads
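
For reference, the two modes correspond to calls along these lines (a sketch based on the `parallel` and `workers` parameters described in this PR; the exact `benchmark.py` harness may differ):

```python
from json2xml.json2xml import Json2xml

# Illustrative payload; the benchmark datasets are described below.
data = {"users": [{"id": i, "name": f"user-{i}"} for i in range(100)]}

# Serial: traditional single-threaded conversion
xml_serial = Json2xml(data, parallel=False).to_xml()

# Parallel: multi-threaded conversion with an explicit worker count
xml_parallel = Json2xml(data, parallel=True, workers=4).to_xml()
```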

### Test Datasets

| Dataset | Items | Description |
|---------|-------|-------------|
| **Small** | 10 | Simple key-value pairs |
| **Medium** | 100 | Nested dictionaries with lists |
| **Large** | 1,000 | Complex user objects with nested metadata |
| **XLarge** | 5,000 | Large array of objects with 20 fields each |

## Results

### Python 3.14 (Standard GIL) - Baseline

| Dataset | Serial Time | Parallel (2w) | Parallel (4w) | Parallel (8w) |
|---------|-------------|---------------|---------------|---------------|
| **Small** (10 items) | 0.25 ms | 0.40 ms (0.63x) | 0.51 ms (0.49x) | 0.44 ms (0.56x) |
| **Medium** (100 items) | 7.56 ms | 7.35 ms (1.03x) | 7.86 ms (0.96x) | 8.76 ms (0.86x) |
| **Large** (1K items) | 240.54 ms | 244.17 ms (0.99x) | 244.30 ms (0.98x) | 246.58 ms (0.98x) |
| **XLarge** (5K items) | 2354.32 ms | 2629.16 ms (0.90x) | 2508.42 ms (0.94x) | 2522.19 ms (0.93x) |

**Analysis**: As expected, with the GIL enabled, parallel processing provides **no speedup** and may even add slight overhead due to thread management costs. The GIL prevents true parallel execution of Python code.

### Python 3.14t (Free-threaded) - With Optimization

| Dataset | Serial Time | Parallel (2w) | Parallel (4w) | Parallel (8w) |
|---------|-------------|---------------|---------------|---------------|
| **Small** (10 items) | 0.25 ms | 0.51 ms (0.49x) | 0.69 ms (0.37x) | 0.63 ms (0.40x) |
| **Medium** (100 items) | 8.59 ms | 5.77 ms (**1.49x**) | 5.55 ms (🚀 **1.55x**) | 7.13 ms (1.21x) |
| **Large** (1K items) | 231.96 ms | 232.84 ms (1.00x) | 232.79 ms (1.00x) | 244.08 ms (0.95x) |
| **XLarge** (5K items) | 1934.75 ms | 2022.40 ms (0.96x) | 1926.55 ms (1.00x) | 1975.37 ms (0.98x) |

**Key Findings**:
- ✅ **Medium datasets show a 1.5x speedup** with 4 workers on free-threaded Python
- ✅ Free-threaded Python removes the GIL bottleneck, enabling true parallel execution
- ⚠️ Small datasets still carry thread overhead (not worth parallelizing)
- 🤔 Large/XLarge datasets show neutral results, likely due to an XML string concatenation bottleneck

## Performance Analysis

### Sweet Spot: Medium Datasets (100-1K items)

The **medium dataset with 4 workers** shows the best improvement:
- **Standard GIL**: 7.56 ms serial, 7.86 ms parallel (0.96x, no benefit)
- **Free-threaded**: 8.59 ms serial, 5.55 ms parallel (**1.55x speedup** 🚀)

This is the ideal use case for parallel processing.

### Why Don't Large Datasets Show More Improvement?

Potential bottlenecks for large datasets:
1. **String concatenation overhead**: Large XML strings being joined
2. **Pretty printing**: XML parsing and formatting (single-threaded)
3. **Memory allocation**: Large result strings
4. **I/O bottlenecks**: String building in Python

**Future optimizations** could address these by:
- Using more efficient string builders
- Parallelizing pretty-printing
- Chunk-based result assembly
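
As a rough illustration of the string-builder and chunk-assembly ideas above (a hypothetical sketch, not the library's current implementation; `convert_chunk` stands in for whatever per-chunk conversion the parallel module performs):

```python
from concurrent.futures import ThreadPoolExecutor


def convert_chunk(items: list[dict]) -> str:
    # Stand-in for the real per-chunk XML conversion.
    return "".join(f"<item><id>{item['id']}</id></item>" for item in items)


def assemble(items: list[dict], chunk_size: int = 250, workers: int = 4) -> str:
    # Convert fixed-size chunks in parallel, then join once at the end
    # instead of growing one large result string incrementally.
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(convert_chunk, chunks))  # results stay in chunk order
    return "<root>" + "".join(parts) + "</root>"
```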

### Optimal Configuration

Based on these results:
- A **4-worker** configuration gives the best performance on typical hardware
- **Automatic fallback** to serial for small datasets (< 100 items)
- **Enable parallel processing** for medium datasets (100-1K items)
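
The automatic fallback mentioned above boils down to a size check before spawning workers (a hypothetical sketch of that decision, with the 100-item threshold taken from the bullet above; the library's actual logic may differ):

```python
def should_parallelize(data, parallel_requested: bool, threshold: int = 100) -> bool:
    # Fall back to serial conversion for small payloads, even when
    # parallel=True was requested: thread overhead dominates below the threshold.
    size = len(data) if hasattr(data, "__len__") else 0
    return parallel_requested and size >= threshold
```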

## Speedup Comparison Chart

```
Medium Dataset (100 items) - Best Case

Standard GIL (Python 3.14):
Serial:   ███████████████████ 7.56 ms
Parallel: ████████████████████ 7.86 ms (0.96x - slower!)

Free-threaded (Python 3.14t):
Serial:   ██████████████████████ 8.59 ms
Parallel: ██████████████ 5.55 ms (1.55x faster! 🚀)
```

## Recommendations

### For Users

1. **Use Python 3.14t** for best performance with parallel processing
2. **Enable parallel processing** for medium-sized datasets:
```python
converter = Json2xml(data, parallel=True, workers=4)
```
3. **Keep default serial** for small datasets (automatic in library)
4. **Benchmark your specific use case** - results vary by data structure

### For Development

1. **Medium datasets are the sweet spot** - focus optimization efforts here
2. **Investigate string building** for large datasets
3. **Consider streaming API** for very large documents
4. **Profile memory usage** with parallel processing

## Running Benchmarks Yourself

### Standard Python 3.14
```bash
uv run --python 3.14 python benchmark.py
```

### Free-threaded Python 3.14t
```bash
uv run --python 3.14t python benchmark.py
```
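
If you want to reproduce the methodology without the PR's `benchmark.py`, a minimal timing harness could look like the following (a sketch that assumes the `parallel` and `workers` parameters described above; the dataset shape is illustrative):

```python
import time

from json2xml.json2xml import Json2xml


def make_dataset(items: int = 100) -> dict:
    # Roughly the shape of the "Medium" dataset: nested dicts with lists.
    return {
        f"record_{i}": {"id": i, "tags": ["a", "b", "c"], "meta": {"active": True}}
        for i in range(items)
    }


def average_ms(data: dict, runs: int = 5, **kwargs) -> float:
    # Average wall-clock conversion time over `runs` iterations, in milliseconds.
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        Json2xml(data, **kwargs).to_xml()
        total += time.perf_counter() - start
    return total / runs * 1000


if __name__ == "__main__":
    data = make_dataset(100)
    serial = average_ms(data, parallel=False)
    parallel = average_ms(data, parallel=True, workers=4)
    print(f"serial:   {serial:.2f} ms")
    print(f"parallel: {parallel:.2f} ms ({serial / parallel:.2f}x)")
```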

## Conclusion

✅ **Free-threaded Python 3.14t enables real performance gains**
- Up to **1.55x faster** for medium datasets
- Removes GIL bottleneck for CPU-bound XML conversion
- Production-ready with automatic fallback for small datasets

🎯 **Best use case**: Medium-sized JSON documents (100-1,000 items) with complex nested structures

🔮 **Future potential**: Further optimizations could extend these gains to large datasets

---

*Benchmarks run on: macOS ARM64, Python 3.14.0, October 2025*