Inf values for variance explained during L2 regularization #614

jkim0731 · 2023-11-28T05:38:25Z

There is an edge case in which -Inf values are returned during the L2 regularization process.

This happens when the fit_trace_test values are all 0 for certain cells in certain splits.

e.g. oeid = 879331157

import json
from pathlib import Path
# gft is the GLM_fit_tools module (its import is not shown in this snippet)

result_dir = Path(r'\\allen\programs\braintv\workgroups\nc-ophys\visual_behavior\ophys_glm\v_24_events_all_L2_optimize_by_session')
with open(result_dir / 'run_params.json', 'r') as f:
    run_param = json.load(f)

session, fit, design = gft.load_fit_experiment(oeid, run_param)
num_splits = len(fit['ridge_splits'])
# For each CV split, print whether cell 78 has any nonzero value in the test trace
for split_ind in range(num_splits):
    test_split = fit['ridge_splits'][split_ind]
    fit_trace_test = fit['fit_trace_arr'][test_split, :]
    print((fit_trace_test[:, 78].values != 0).any())
# Same check for cell 18
for split_ind in range(num_splits):
    test_split = fit['ridge_splits'][split_ind]
    fit_trace_test = fit['fit_trace_arr'][test_split, :]
    print((fit_trace_test[:, 18].values != 0).any())

As a result, the test_cv values for these 2 cells are all -Inf.
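
To illustrate why an all-zero test trace can produce -Inf, here is a minimal sketch. It assumes variance explained is computed as 1 - var(residual) / var(trace); the actual formula in GLM_fit_tools.py may differ, and `variance_explained` is a hypothetical helper, not a function from the repository.

```python
import numpy as np

def variance_explained(trace, prediction):
    """Assumed definition: 1 - var(residual) / var(trace)."""
    trace = np.asarray(trace, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    var_total = np.var(trace)
    var_resid = np.var(trace - prediction)
    # Silence NumPy's divide-by-zero warning so the non-finite result is returned quietly.
    with np.errstate(divide="ignore", invalid="ignore"):
        return 1.0 - var_resid / var_total

# Hypothetical test split in which the cell has no events at all.
silent_trace = np.zeros(100)
# Any prediction that is not perfectly constant leaves nonzero residual variance.
prediction = np.linspace(0.0, 0.1, 100)

ve_silent = variance_explained(silent_trace, prediction)
print(ve_silent)
```

Under this assumed definition, the total variance is 0 while the residual variance is positive, so the ratio diverges and the result is -Inf.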

Line 329 of GLM_fit_tools.py sidesteps this with
fit['avg_L2_regularization'] = np.mean([fit['L2_grid'][x] for x in np.argmax(test_cv,1)])
Because every entry for these cells is -Inf, np.argmax returns 0 and lambda is set to L2_grid[0].
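
A small sketch of that np.argmax behavior, using a made-up `test_cv` array (cells by L2 values) and a hypothetical `L2_grid`; the numbers are illustrative only and do not come from any experiment.

```python
import numpy as np

# Hypothetical grid of L2 values and CV scores; rows are cells, columns are L2 values.
L2_grid = np.array([1.0, 10.0, 100.0, 1000.0])
test_cv = np.array([
    [0.10, 0.15, 0.12, 0.05],   # a normal cell with a clear best L2
    [-np.inf] * 4,              # a cell whose test traces were all zero
])

# argmax returns the first index of the maximum, so an all -Inf row gives 0.
best_idx = np.argmax(test_cv, axis=1)
chosen_l2 = L2_grid[best_idx]
print(best_idx, chosen_l2)
```

The second cell silently receives `L2_grid[0]` even though none of its scores are meaningful.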

Is this intended?
I think either the splits with an all-zero trace should be set to NaN, or the cell should be removed from the analysis.
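
One possible way to drop such cells from the session average is sketched below. `average_l2_over_valid_cells` is a hypothetical helper, not part of GLM_fit_tools.py, and it assumes `test_cv` has shape (cells, L2 values). Note that setting the scores to NaN alone would not be enough, since np.argmax also returns 0 for an all-NaN row; the cells have to be masked out explicitly.

```python
import numpy as np

def average_l2_over_valid_cells(test_cv, L2_grid):
    """Average the best L2 value over cells whose CV scores are all finite.

    Returns (average, valid_mask). The average is NaN if no cell is valid.
    """
    test_cv = np.asarray(test_cv, dtype=float)
    L2_grid = np.asarray(L2_grid, dtype=float)
    # A cell is usable only if none of its scores are -Inf, +Inf, or NaN.
    valid = np.isfinite(test_cv).all(axis=1)
    if not valid.any():
        return np.nan, valid
    best_idx = np.argmax(test_cv[valid], axis=1)
    return float(np.mean(L2_grid[best_idx])), valid

# Illustrative numbers only.
L2_grid = np.array([1.0, 10.0, 100.0])
test_cv = np.array([
    [0.1, 0.3, 0.2],
    [-np.inf, -np.inf, -np.inf],   # cell with all-zero test traces
    [0.0, 0.1, 0.4],
])

masked_avg, valid = average_l2_over_valid_cells(test_cv, L2_grid)
naive_avg = float(np.mean(L2_grid[np.argmax(test_cv, axis=1)]))
print(masked_avg, naive_avg)
```

In this toy example the naive average is pulled toward `L2_grid[0]` by the invalid cell, while the masked average ignores it.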

jkim0731 · 2023-11-28T06:18:24Z

For oeid = 1050725735,
31/123 cells have the same problem.
Note: the GLM for this experiment performed the worst among the Slc17a7-Cre VISp Novel (active) experiments.

alexpiet · 2023-11-28T19:42:35Z

Yes, this seems like a mistake. When evaluating the models, I check for splits with all 0s in the trace, but apparently I do not when determining the L2 parameter. Note that the analysis is all done with "optimize_by_session", so we take the average L2 value across all the cells in the session. The result of this mistake is that the L2 value is sometimes smaller, because the cells with 0 traces in the CV splits will have "L2_grid[0]" as their best-fitting L2 value. However, since we average across cells, it's probably a small effect.
