Fix: Preserve float columns in JSON loader when values are integer-like (e.g. 0.0, 1.0) #7635

ArjunJagdale · 2025-06-24T06:16:48Z

This PR fixes a bug in the JSON loader where columns containing float values like [0.0, 1.0, 2.0] were being implicitly coerced to int, due to pandas or Arrow type inference.

This caused issues downstream in statistics computation (e.g., dataset-viewer) where such columns were incorrectly labeled as "int" instead of "float".

🔍 What was happening:

When the JSON loader falls back to pandas_read_json() (after pa.read_json() fails), pandas/Arrow can coerce float values to integers if all values are integer-like (e.g., 0.0 == 0).
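To illustrate why integer-like floats are vulnerable, here is a toy model of a round-trip check of the kind that type inference can apply. The function `infer_column_kind` is hypothetical and is not the actual pandas or Arrow code; it only shows that a column such as `[0.0, 1.0, 2.0]` is indistinguishable from integers once the check is "does casting to int change any value?".

```python
def infer_column_kind(values):
    """Toy model of integer-friendly type inference (hypothetical, not pandas' code)."""
    # If every value survives a round trip through int unchanged,
    # this heuristic concludes that the column is an integer column.
    if all(float(int(v)) == v for v in values):
        return "int"
    return "float"


integer_like = [0.0, 1.0, 2.0]  # floats in the source JSON
fractional = [0.0, 1.5, 2.0]

print(infer_column_kind(integer_like))  # int
print(infer_column_kind(fractional))    # float
```

The first column loses its float type even though every value was written as a float in the source data.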

✅ What this PR does:

- Adds a check in the fallback path of _generate_tables()
- Ensures that columns made entirely of floats are preserved as "float64" even if they are integer-like (e.g. 0.0, 1.0)
- This prevents loss of float semantics when creating the Arrow table
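One way such a check could work is sketched below using only the standard library. This is an assumption about the general approach, not the code in this PR: the helper `all_float_columns` is a hypothetical name, and it relies on the fact that Python's `json.loads` returns `float` for `0.0` and `int` for `0`, so the original notation is still visible before any type inference runs.

```python
import json


def all_float_columns(raw_json):
    """Return names of columns whose non-null values were all written as JSON floats.

    Hypothetical helper for illustration; not part of the datasets library.
    """
    records = json.loads(raw_json)
    seen = {}  # column name -> True while every non-null value so far is a float
    for record in records:
        for name, value in record.items():
            if value is None:
                continue
            # json.loads yields float for 0.0 and int for 0
            seen[name] = seen.get(name, True) and isinstance(value, float)
    return {name for name, is_float in seen.items() if is_float}


raw = '[{"col": 0.0, "n": 1}, {"col": 1.0, "n": 2}, {"col": 2.0, "n": 3}]'
float_cols = all_float_columns(raw)
print(float_cols)  # {'col'}
```

In a pandas-based fallback, the returned column names could then be cast back explicitly, for example with `df[name].astype("float64")`, before the Arrow table is built.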

🧪 Reproducible Example:

[{"col": 0.0}, {"col": 1.0}, {"col": 2.0}]

Previously loaded as:

int

Now correctly loaded as:

float

Fixes #6937
