Potential issue in function feature_bin_stats in stats.py #155

```
def feature_bin_stats(df_bin,feature,target):
    """calculate the detail info of a feature after bin

    Args:
        df_bin (dataframe has feature and target columns)
        feature (str)
        target (str)
    Returns:
        DataFrame: contains good, bad, badrate, prop, y_prop, n_prop, woe, iv
    """
    table = df_bin[[feature, target]].groupby([feature, target]).agg(len).unstack().reset_index()
    table = table.rename(columns = {0 : 'good', 1 : 'bad'}) 
    table['total'] = table['good'] + table['bad']
    table['badrate'] = table['bad'] / table['total']
    table['prop'] = table['total'] / table['total'].sum()
   
    # my question is here
    table['y_prop'] = table['good'] / table['good'].sum()
    table['n_prop'] = table['bad'] / table['bad'].sum()
   
    table['woe'] = table.apply(lambda x : WOE(x['y_prop'], x['n_prop']),axis=1)
    table['iv'] = table.apply(lambda x : (x['y_prop'] - x['n_prop']) * WOE(x['y_prop'], x['n_prop']), axis=1)
    return table
```


Should y_prop refer to the bad proportion and n_prop to the good proportion? Compare the definition of the probability function in stats.py:
```
def probability(target, mask = None):
    """get probability of target by mask
    """
    if mask is None:
        return 1, 1

    counts_0 = np_count(target, 0, default = 1)
    counts_1 = np_count(target, 1, default = 1)

    sub_target = target[mask]

    sub_0 = np_count(sub_target, 0, default = 1)
    sub_1 = np_count(sub_target, 1, default = 1)

    y_prob = sub_1 / counts_1
    n_prob = sub_0 / counts_0

    return y_prob, n_prob
```
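In probability, y_prob is the fraction of 1s (bad) and n_prob is the fraction of 0s (good), whereas in feature_bin_stats y_prop is computed from the good (0) column and n_prop from the bad (1) column. The two functions therefore appear to use opposite conventions.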
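
To make the practical effect concrete, here is a minimal sketch comparing the two conventions. It assumes that WOE is the natural log of the ratio of its two arguments; that function is not quoted in this issue, so `woe` below is a hypothetical stand-in, and the bin counts are made up for illustration. Under that assumption, swapping the two proportions flips the sign of each bin's WOE but leaves its IV contribution unchanged.

```python
import math


def woe(y_prop, n_prop):
    # Assumption: WOE is the log ratio of the two proportions.
    # Hypothetical stand-in for the WOE function used in stats.py.
    return math.log(y_prop / n_prop)


def iv_term(y_prop, n_prop):
    # Per-bin IV contribution, mirroring the lambda in feature_bin_stats.
    return (y_prop - n_prop) * woe(y_prop, n_prop)


# Made-up (good, bad) counts per bin, i.e. counts of target == 0 and target == 1.
bins = [(80, 20), (60, 40), (30, 70)]
total_good = sum(good for good, _ in bins)
total_bad = sum(bad for _, bad in bins)

rows = []
for good, bad in bins:
    good_prop = good / total_good  # share of all 0s that fall in this bin
    bad_prop = bad / total_bad     # share of all 1s that fall in this bin
    rows.append({
        # feature_bin_stats convention: y_prop comes from the good column
        "woe_bin_stats": woe(good_prop, bad_prop),
        "iv_bin_stats": iv_term(good_prop, bad_prop),
        # probability convention: y_prob comes from the 1s (bad)
        "woe_probability": woe(bad_prop, good_prop),
        "iv_probability": iv_term(bad_prop, good_prop),
    })

for row in rows:
    print(row)
```

If the assumption holds, the total IV of a feature would be the same either way, but the WOE values reported by feature_bin_stats would have the opposite sign from those derived via probability.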

Please correct me if I am wrong. Thank you.


