@sudo-hannes
Contributor

Add to_dict method for binning table serialization

Summary

Implements a to_dict() method that converts optimal bins, split points, and transformations to dictionary format for easy serialisation and export.
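
As a rough illustration of the idea (not the code merged in this PR), the sketch below packs binning results into a plain dictionary that survives a JSON round trip. The helper name `binning_to_dict`, its parameters, and the dictionary keys are all hypothetical. The sample split points and WoE values are made up for the example.

```python
import json

import numpy as np


def binning_to_dict(name, dtype, splits, woe):
    """Hypothetical helper: pack binning results into JSON-friendly types."""
    return {
        "name": name,
        "dtype": dtype,
        # NumPy scalars are not JSON serializable, so cast to built-in floats.
        "splits": [float(s) for s in splits],
        "woe": [float(w) for w in woe],
        # k split points define k + 1 bins.
        "n_bins": len(splits) + 1,
    }


# Illustrative values only; they do not come from a fitted model.
splits = np.array([5.5, 12.0, 20.25])
woe = np.array([-0.8, -0.1, 0.3, 1.1])

payload = binning_to_dict("LSTAT", "numerical", splits, woe)
encoded = json.dumps(payload)
restored = json.loads(encoded)
```

Casting to built-in types up front is the design choice that matters here: `json.dumps` rejects NumPy arrays and most NumPy scalar types, so a dictionary meant for export should contain only plain Python values.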

@sudo-hannes sudo-hannes reopened this Aug 29, 2025
@sudo-hannes sudo-hannes changed the base branch from master to develop August 29, 2025 07:53
@guillermo-navas-palencia guillermo-navas-palencia added the enhancement New feature or request label Aug 29, 2025
@guillermo-navas-palencia guillermo-navas-palencia added this to the v0.21.0 milestone Aug 29, 2025
@sudo-hannes
Contributor Author

sudo-hannes commented Aug 29, 2025

@guillermo-navas-palencia

I saw that the test failed because "http://lib.stat.cmu.edu/datasets/boston" is no longer available. I found a similar dataset on Kaggle. When I use this dataset, all but one of the tests pass.

Can I open a separate PR to fix the test issue?

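import numpy as np
import pandas as pd


class Data:
    """Minimal container with data, target and feature names."""

    def __init__(self, data, target, feature_names):
        self.data = data
        self.target = target
        self.feature_names = feature_names


def load_boston():
    # Source URL used by the tests; reported above as no longer available.
    data_url = "http://lib.stat.cmu.edu/datasets/boston"
    raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
    # Each record spans two rows in the raw file: join the even rows with
    # the first two columns of the odd rows to get the 13 features.
    raw_data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
    # The third column of the odd rows holds the target.
    target = raw_df.values[1::2, 2]
    feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS',
                     'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
    return Data(raw_data, target, feature_names)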

@guillermo-navas-palencia
Owner

guillermo-navas-palencia commented Aug 29, 2025

Hi @sudo-hannes. Thanks for your contribution! Please feel free to work on that PR :). In addition, we could consider saving a csv file and loading it directly without relying on external sources.
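
The idea of loading a bundled CSV could look roughly like the sketch below. It is only an assumption about how such a loader might be written: the function name `load_from_csv`, the `target_column` parameter, and the column name `MEDV` are hypothetical, and the two-row sample stands in for a real file stored alongside the tests.

```python
import io

import pandas as pd


class Data:
    """Same minimal container as in the loader above."""

    def __init__(self, data, target, feature_names):
        self.data = data
        self.target = target
        self.feature_names = feature_names


def load_from_csv(path_or_buffer, target_column):
    """Hypothetical loader: read a local CSV with a header row."""
    df = pd.read_csv(path_or_buffer)
    # Every column except the target is treated as a feature.
    feature_names = [c for c in df.columns if c != target_column]
    return Data(
        df[feature_names].to_numpy(),
        df[target_column].to_numpy(),
        feature_names,
    )


# In-memory stand-in for a CSV file shipped with the test suite.
sample = io.StringIO("CRIM,RM,MEDV\n0.1,6.5,24.0\n0.2,6.1,21.6\n")
boston = load_from_csv(sample, target_column="MEDV")
```

Because `pd.read_csv` accepts both file paths and file-like objects, the same function works for a file on disk and for an in-memory buffer, so the tests would no longer depend on network access.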

@guillermo-navas-palencia guillermo-navas-palencia merged commit b00530b into guillermo-navas-palencia:develop Sep 1, 2025
12 checks passed
@guillermo-navas-palencia guillermo-navas-palencia mentioned this pull request Oct 26, 2025