Suppose that I have a node named "A0516" containing two categories (0 and 1) after discretization. Calling 'bn.predict_probability()' gives a DataFrame ('predictions') with two columns ('A0516_0' and 'A0516_1'). Unfortunately, the code inside 'roc_auc()' (quoted under "Relevant code snippet" below) turns it into 4 columns, leading to an error when roc_auc() is used.
The original purpose of 'x.lstrip(node + "_")' is to convert 'A0516_0' and 'A0516_1' into '0' and '1'. However, str.lstrip() removes any leading characters that appear in its argument, which is treated as a set of characters rather than a literal prefix. Since both '0' and '1' occur in the string "A0516", the two column names are stripped down to identical empty strings, the number of columns in 'predictions' doubles after the sorted-column selection, and the subsequent call to roc_curve() fails with the error shown in the log output below.
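To illustrate, here is a minimal, self-contained snippet (with hypothetical probability values; only pandas is needed) that reproduces the column doubling outside of CausalNex:
import pandas as pd

node = "A0516"
# Hypothetical probabilities shaped like the output of bn.predict_probability()
predictions = pd.DataFrame({"A0516_0": [0.7, 0.2], "A0516_1": [0.3, 0.8]})

# str.lstrip(chars) strips any leading characters contained in `chars`,
# not a literal prefix, so the category suffix is removed as well:
print("A0516_0".lstrip(node + "_"))  # '' (empty string)
print("A0516_1".lstrip(node + "_"))  # '' (empty string)

# Both columns are now named '', and selecting the sorted (duplicated) labels
# returns every matching column for each label, doubling the column count.
predictions.rename(columns=lambda x: x.lstrip(node + "_"), inplace=True)
predictions = predictions[sorted(predictions.columns)]
print(predictions.shape)  # (2, 4) instead of (2, 2)
Because predictions.values.ravel() now has twice as many entries as the ground truth, check_consistent_length() inside roc_curve() raises the ValueError shown in the log output below.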
ValueError Traceback (most recent call last)
Cell In[16], line 1
----> 1 roc, auc = roc_auc(bn, df_discrete, "A0516")
2 print(auc)
File ~\.conda\envs\causenet_python\lib\site-packages\causalnex\evaluation\evaluation.py:106, in roc_auc(bn, data, node)
103 predictions.rename(columns=lambda x: x.lstrip(node + "_"), inplace=True)
104 predictions = predictions[sorted(predictions.columns)]
--> 106 fpr, tpr, _ = metrics.roc_curve(
107 ground_truth.values.ravel(), predictions.values.ravel()
108 )
109 roc = list(zip(fpr, tpr))
110 auc = metrics.auc(fpr, tpr)
File ~\.conda\envs\causenet_python\lib\site-packages\sklearn\utils\_param_validation.py:214, in validate_params.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
208 try:
209 with config_context(
210 skip_parameter_validation=(
211 prefer_skip_nested_validation or global_skip_validation
212 )
213 ):
--> 214 return func(*args, **kwargs)
215 except InvalidParameterError as e:
216 # When the function is just a wrapper around an estimator, we allow
217 # the function to delegate validation to the estimator, but we replace
218 # the name of the estimator by the name of the function in the error
219 # message to avoid confusion.
220 msg = re.sub(
221 r"parameter of \w+ must be",
222 f"parameter of {func.__qualname__} must be",
223 str(e),
224 )
File ~\.conda\envs\causenet_python\lib\site-packages\sklearn\metrics\_ranking.py:1095, in roc_curve(y_true, y_score, pos_label, sample_weight, drop_intermediate)
993 @validate_params(
994 {
995 "y_true": ["array-like"],
(...)
1004 y_true, y_score, *, pos_label=None, sample_weight=None, drop_intermediate=True
1005 ):
1006 """Compute Receiver operating characteristic (ROC). 1007 1008 Note: this implementation is restricted to the binary classification task. (...) 1093 array([ inf, 0.8 , 0.4 , 0.35, 0.1 ]) 1094 """
-> 1095 fps, tps, thresholds = _binary_clf_curve(
1096 y_true, y_score, pos_label=pos_label, sample_weight=sample_weight
1097 )
1099 # Attempt to drop thresholds corresponding to points in between and
1100 # collinear with other points. These are always suboptimal and do not
1101 # appear on a plotted ROC curve (and thus do not affect the AUC).
(...)
1106 # but does not drop more complicated cases like fps = [1, 3, 7],
1107 # tps = [1, 2, 4]; there is no harm in keeping too many thresholds.
1108 if drop_intermediate and len(fps) > 2:
File ~\.conda\envs\causenet_python\lib\site-packages\sklearn\metrics\_ranking.py:806, in _binary_clf_curve(y_true, y_score, pos_label, sample_weight)
803 if not (y_type == "binary" or (y_type == "multiclass" and pos_label is not None)):
804 raise ValueError("{0} format is not supported".format(y_type))
--> 806 check_consistent_length(y_true, y_score, sample_weight)
807 y_true = column_or_1d(y_true)
808 y_score = column_or_1d(y_score)
File ~\.conda\envs\causenet_python\lib\site-packages\sklearn\utils\validation.py:407, in check_consistent_length(*arrays)
405 uniques = np.unique(lengths)
406 if len(uniques) > 1:
--> 407 raise ValueError(
408 "Found input variables with inconsistent numbers of samples: %r"
409 % [int(l) for l in lengths]
410 )
ValueError: Found input variables with inconsistent numbers of samples: [36, 72]
Code of Conduct
I agree to follow this project's Code of Conduct
Contact Details
[email protected]
Relevant code snippet
predictions = bn.predict_probability(data, node)
predictions.rename(columns=lambda x: x.lstrip(node + "_"), inplace=True)
predictions = predictions[sorted(predictions.columns)]
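For reference, a possible workaround (just a sketch applied to the snippet above, not an official patch) is to strip the literal '<node>_' prefix rather than a character set; on Python 3.9+ str.removeprefix() does this directly, and on 3.8 a slice works:
# Remove only the literal "<node>_" prefix, so 'A0516_0' -> '0' and 'A0516_1' -> '1'
predictions.rename(
    columns=lambda x: x[len(node) + 1:] if x.startswith(node + "_") else x,
    inplace=True,
)
predictions = predictions[sorted(predictions.columns)]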
CausalNex Version
0.12.1
Python Version
3.8.20