Fix: Preserve float columns in JSON loader when values are integer-like (e.g. 0.0, 1.0) #7635

Open
wants to merge 1 commit into
base: main
ArjunJagdale
Contributor

This PR fixes a bug in the JSON loader where columns containing integer-like float values such as [0.0, 1.0, 2.0] were implicitly coerced to int by pandas or Arrow type inference.

This caused issues downstream in statistics computation (e.g., dataset-viewer) where such columns were incorrectly labeled as "int" instead of "float".

🔍 What was happening:

When the JSON loader falls back to pandas_read_json() (after pa.read_json() fails), pandas/Arrow can coerce float values to integers if all values are integer-like (e.g., 0.0 == 0).
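The coercion described above can be illustrated with a small standard-library sketch. The helper `infer_integer_like` is hypothetical and only stands in for the kind of lossy inference being described; it is not the actual pandas or Arrow code path.

```python
import json


def infer_integer_like(values):
    """Hypothetical stand-in for lossy type inference.

    Downcasts a column to int when every value is a float with no
    fractional part; otherwise returns the column unchanged.
    """
    if all(isinstance(v, float) and v.is_integer() for v in values):
        return [int(v) for v in values]
    return values


# The JSON text itself is unambiguous: json.loads keeps 0.0 as a float.
records = json.loads('[{"col": 0.0}, {"col": 1.0}, {"col": 2.0}]')
column = [record["col"] for record in records]

# After integer-like inference the float type is gone.
coerced = infer_integer_like(column)
print(type(column[0]).__name__, "->", type(coerced[0]).__name__)
```

The point of the sketch is that the values compare equal before and after (0.0 == 0), so nothing looks wrong until something downstream inspects the column type.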

✅ What this PR does:

  • Adds a check in the fallback path of _generate_tables()
  • Ensures that columns made up entirely of floats are preserved as "float64", even when every value is integer-like (e.g. 0.0, 1.0)
  • Prevents loss of float semantics when the Arrow table is created
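One way such a check could work is sketched below using only the standard library. This is an illustration under stated assumptions, not the code in this PR: `float_column_names` and `restore_floats` are hypothetical helpers, and plain Python lists stand in for DataFrame columns. With a real pandas DataFrame, the restore step would typically be an `astype("float64")` on the affected columns.

```python
import json


def float_column_names(records):
    """Return names of columns whose non-null raw JSON values are all floats."""
    keys = {key for record in records for key in record}
    names = []
    for key in sorted(keys):
        values = [r[key] for r in records if r.get(key) is not None]
        if values and all(isinstance(v, float) for v in values):
            names.append(key)
    return names


def restore_floats(columns, names):
    """Cast the named columns back to float; leave other columns untouched."""
    restored = {}
    for name, values in columns.items():
        if name in names:
            restored[name] = [None if v is None else float(v) for v in values]
        else:
            restored[name] = values
    return restored


raw = '[{"col": 0.0, "n": 1}, {"col": 1.0, "n": 2}, {"col": 2.0, "n": 3}]'
records = json.loads(raw)

# Assumed output of a lossy reader: "col" has been downcast to int.
coerced = {"col": [0, 1, 2], "n": [1, 2, 3]}

fixed = restore_floats(coerced, float_column_names(records))
print(fixed)
```

Deciding from the raw JSON values rather than from the already-coerced table keeps genuine integer columns such as `n` as integers, while `col` regains its float type.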

🧪 Reproducible Example:

[{"col": 0.0}, {"col": 1.0}, {"col": 2.0}]

Previously loaded as:

  • int

Now correctly loaded as:

  • float

Fixes #6937

Development

Successfully merging this pull request may close these issues.

JSON loader implicitly coerces floats to integers
1 participant