Open
Description
def feature_bin_stats(df_bin, feature, target):
    """calculate the detail info of a feature after bin
    Args:
        df_bin (dataframe has feature and target columns)
        feature (str)
        target (str)
    Returns:
        DataFrame: contains good, bad, badrate, prop, y_prop, n_prop, woe, iv
    """
    table = df_bin[[feature, target]].groupby([feature, target]).agg(len).unstack().reset_index()
    table = table.rename(columns = {0 : 'good', 1 : 'bad'})
    table['total'] = table['good'] + table['bad']
    table['badrate'] = table['bad'] / table['total']
    table['prop'] = table['total'] / table['total'].sum()
    # my question is about the next two lines
    table['y_prop'] = table['good'] / table['good'].sum()
    table['n_prop'] = table['bad'] / table['bad'].sum()
    table['woe'] = table.apply(lambda x : WOE(x['y_prop'], x['n_prop']), axis=1)
    table['iv'] = table.apply(lambda x : (x['y_prop'] - x['n_prop']) * WOE(x['y_prop'], x['n_prop']), axis=1)
    return table
Should y_prop refer to the bad proportion and n_prop to the good proportion? That is the convention used by the probability function in stats.py:
def probability(target, mask = None):
    """get probability of target by mask
    """
    if mask is None:
        return 1, 1
    counts_0 = np_count(target, 0, default = 1)
    counts_1 = np_count(target, 1, default = 1)
    sub_target = target[mask]
    sub_0 = np_count(sub_target, 0, default = 1)
    sub_1 = np_count(sub_target, 1, default = 1)
    y_prob = sub_1 / counts_1
    n_prob = sub_0 / counts_0
    return y_prob, n_prob
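Here y_prob is the fraction of 1s (bad) while n_prob is the fraction of 0s (good), which is the opposite of how y_prop and n_prop are computed in feature_bin_stats.
Please correct me if I am wrong. Thank you.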
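To make the question concrete, here is a minimal sketch of what changes between the two conventions. It assumes WOE is the natural log of the ratio of its two arguments (the definition of WOE is not shown above, so the woe helper below is a stand-in), and the data frame and its column names are made up for illustration. Under that assumption, swapping good and bad only flips the sign of each bin's WOE, while the IV is the same either way.

```python
import numpy as np
import pandas as pd


def woe(y_prop, n_prop):
    # Stand-in for the library's WOE; assumed to be log(y_prop / n_prop).
    return np.log(y_prop / n_prop)


# Hypothetical binned data: 0 = good, 1 = bad, matching the rename in feature_bin_stats.
df = pd.DataFrame({
    "bin": ["a"] * 10 + ["b"] * 10,
    "target": [0] * 8 + [1] * 2 + [0] * 4 + [1] * 6,
})

# Count good and bad rows per bin.
counts = pd.crosstab(df["bin"], df["target"]).rename(columns={0: "good", 1: "bad"})
good_prop = counts["good"] / counts["good"].sum()
bad_prop = counts["bad"] / counts["bad"].sum()

# Convention in feature_bin_stats: y_prop = good share, n_prop = bad share.
woe_good_first = woe(good_prop, bad_prop)
iv_good_first = ((good_prop - bad_prop) * woe_good_first).sum()

# Convention in probability(): y_prob = share of 1s (bad), n_prob = share of 0s (good).
woe_bad_first = woe(bad_prop, good_prop)
iv_bad_first = ((bad_prop - good_prop) * woe_bad_first).sum()

print(woe_good_first.round(4).to_dict())
print(woe_bad_first.round(4).to_dict())
print(round(iv_good_first, 4), round(iv_bad_first, 4))
```

So if the assumption about WOE holds, the iv column would not be affected by the naming, but the sign of the woe column would differ between feature_bin_stats and anything built on probability().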