
Use orjson for json_dumps #7391

Status: Open (5 commits to merge into base: master)


@EugeneChung EugeneChung commented Mar 31, 2025

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • New Query Runner (Data Source)
  • New Alert Destination
  • Other

Description

Following the discussion in #7339 (comment), I updated utils.json_dumps to use orjson for improved serialization performance.

Key implementation details:

  • orjson 3.10.15 is used, as it's the last version compatible with Python 3.8.
  • Pre-processing: Before calling orjson.dumps, data is pre-processed recursively using the existing custom JSONEncoder to maintain compatibility with Redash's current serialization specifications.
    • For instance, datetime serialization differs:
      • With the custom JSONEncoder: {"time": "2024-03-01T15:30:45.123"}
      • With orjson: {"time": "2024-03-01T15:30:45.123456"}
    • Note: Unlike the standard json module, orjson does not allow overriding serialization for the types it supports natively; the provided default function is never called for those types.
  • Option Mapping:
    • Default options are set to orjson.OPT_NON_STR_KEYS | orjson.OPT_UTC_Z; together with orjson's unescaped UTF-8 output, this aligns with ensure_ascii=False behavior.
    • The sort_keys parameter maps directly to OPT_SORT_KEYS.
  • Testing: Added pytest cases to validate behavior aligned with the existing JSONEncoder specifications.
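
The pre-processing step described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: the `preprocess` name is hypothetical, and it handles only containers and datetimes, assuming the millisecond truncation shown in the example above and a `Z` suffix for UTC offsets.

```python
from datetime import datetime, timezone


def preprocess(value):
    """Recursively convert values to JSON-ready types (hypothetical helper)."""
    if isinstance(value, dict):
        return {key: preprocess(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [preprocess(item) for item in value]
    if isinstance(value, datetime):
        text = value.isoformat()
        if value.microsecond:
            # Keep milliseconds only: drop the last three microsecond digits.
            text = text[:23] + text[26:]
        if text.endswith("+00:00"):
            # Assumption: UTC offsets are rendered as a trailing "Z".
            text = text[:-6] + "Z"
        return text
    return value


sample = {"time": datetime(2024, 3, 1, 15, 30, 45, 123456)}
print(preprocess(sample))  # {'time': '2024-03-01T15:30:45.123'}
```

Because the datetime has already been turned into a string, a serializer that does not consult a default function for native types would see only a plain `str` and emit it unchanged.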

How is this tested?

  • Unit tests (pytest, jest)
  • E2E Tests (Cypress)
  • Manually
  • N/A

For Athena and Trino, the result of select 1.0, cast('NaN' as double), cast('Infinity' as double), cast('-Infinity' as double) is 1.0, null, null, null as expected.
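
For context on why `null` is the expected output: JSON has no literal for NaN or Infinity, and the standard json module emits non-standard tokens for them by default, whereas orjson documents that it serializes such floats as `null`. The sketch below reproduces the expected result with the standard library only; `replace_non_finite` is a hypothetical helper for illustration, not part of this PR.

```python
import json
import math


def replace_non_finite(value):
    """Recursively replace NaN and +/-Infinity floats with None (hypothetical helper)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: replace_non_finite(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_non_finite(item) for item in value]
    return value


row = [1.0, float("nan"), float("inf"), float("-inf")]
print(json.dumps(row))  # [1.0, NaN, Infinity, -Infinity]
print(json.dumps(replace_non_finite(row), allow_nan=False))  # [1.0, null, null, null]
```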

Related Tickets & Documents

@eradman
Collaborator

eradman commented Mar 31, 2025

Basic test works

SELECT 'NaN'::float AS not_a_number, 'Inf'::float AS inf, now() AS date

@EugeneChung
Author

EugeneChung commented Apr 1, 2025

(venv) ~/project/private/redash git:[master]
ruff check --fix tests/test_utils.py
warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`:
  - 'ignore' -> 'lint.ignore'
  - 'select' -> 'lint.select'
  - 'mccabe' -> 'lint.mccabe'
  - 'per-file-ignores' -> 'lint.per-file-ignores'
All checks passed!
(venv) ~/project/private/redash git:[master]
ruff --version
ruff 0.11.2

@eradman
Collaborator

eradman commented Apr 11, 2025

Is there a difference in the way key-value pairs are formatted?

- Options: {"dbname":"testdb1","host":"example.com"}
+ Options: {"dbname": "testdb1", "host": "example.com"}

https://github.com/getredash/redash/actions/runs/14329251848/job/40418005004?pr=7391

@EugeneChung
Author

EugeneChung commented Apr 12, 2025

@eradman Yes. As I commented, orjson always uses compact separators. I'm going to fix the test.
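
To illustrate the formatting difference with the standard library alone (a sketch, not code from this PR): `json.dumps` puts a space after each comma and colon by default, and passing `separators=(",", ":")` yields the compact form that matches the whitespace-free output described above.

```python
import json

options = {"dbname": "testdb1", "host": "example.com"}

# Default separators are (", ", ": "), so the output contains spaces.
default_style = json.dumps(options)
# Compact separators drop the whitespace entirely.
compact_style = json.dumps(options, separators=(",", ":"))

print(default_style)  # {"dbname": "testdb1", "host": "example.com"}
print(compact_style)  # {"dbname":"testdb1","host":"example.com"}
```

Tests that compare serialized strings verbatim are sensitive to this difference; comparing the parsed objects instead would not be.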

@EugeneChung
Author

All previously failing tests now pass. (Screenshot attached.)

* add orjson 3.10.15
* before calling orjson.dumps, pre-process data using JSONEncoder to conform to the existing Redash serialization spec
* Support mapping of some json.dumps options
   * orjson.OPT_NON_STR_KEYS | orjson.OPT_UTC_Z is similar to ensure_ascii=False.
* add pytest cases