Inf values for variance explained during L2 regularization #614

Open
jkim0731 opened this issue Nov 28, 2023 · 2 comments

Comments

@jkim0731
Collaborator

There is an edge case where -Inf values are returned for variance explained during the L2 regularization process.

This happens when the fit_trace_test values are all 0 for certain cells in certain splits.

For example, with oeid = 879331157:

import json
from pathlib import Path

import visual_behavior_glm.GLM_fit_tools as gft  # assumed import path for the gft alias

oeid = 879331157
result_dir = Path(r'\\allen\programs\braintv\workgroups\nc-ophys\visual_behavior\ophys_glm\v_24_events_all_L2_optimize_by_session')
with open(result_dir / 'run_params.json', 'r') as f:
    run_param = json.load(f)

session, fit, design = gft.load_fit_experiment(oeid, run_param)
num_splits = len(fit['ridge_splits'])
# For each affected cell, print per split whether the test trace has any nonzero value
for cell_ind in (78, 18):
    for split_ind in range(num_splits):
        test_split = fit['ridge_splits'][split_ind]
        fit_trace_test = fit['fit_trace_arr'][test_split, :]
        print((fit_trace_test[:, cell_ind].values != 0).any())

As a result, the test_cv values for these 2 cells are all -Inf.
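
A minimal sketch of how a -Inf can arise, assuming variance explained is computed as 1 - var(residual) / var(trace). That formula and the helper name `variance_explained` are assumptions for illustration, not quoted from GLM_fit_tools.py. With an all-zero test trace the denominator is 0, so any nonzero prediction yields -Inf.

```python
import numpy as np

def variance_explained(trace, prediction):
    """Hypothetical helper: 1 - var(residual) / var(trace)."""
    # Silence the divide-by-zero warning so the -inf result is easy to inspect.
    with np.errstate(divide="ignore", invalid="ignore"):
        return 1.0 - np.var(trace - prediction) / np.var(trace)

# A test split in which the cell has no activity at all.
zero_trace = np.zeros(100)
# Any non-constant prediction leaves a residual with positive variance.
prediction = np.random.default_rng(0).normal(size=100)

ve = variance_explained(zero_trace, prediction)
print(ve)
```

If the prediction were also constant, the same expression would give 0 / 0 and therefore NaN instead of -Inf.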

L329 of GLM_fit_tools.py gets around this with
fit['avg_L2_regularization'] = np.mean([fit['L2_grid'][x] for x in np.argmax(test_cv,1)])
Because every test_cv entry for these cells is -Inf, np.argmax returns 0 (the first index among ties) and lambda is set to L2_grid[0].
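
The tie-breaking behavior can be checked in isolation. This is a toy sketch: the grid values and the `test_cv` array (rows are cells, columns are L2 grid entries) are made up for illustration.

```python
import numpy as np

# Hypothetical L2 grid and cross-validated scores (cells x grid values).
L2_grid = np.array([1.0, 10.0, 100.0, 1000.0])
test_cv = np.array([
    [0.01, 0.05, 0.03, 0.02],   # normal cell: best score at index 1
    [-np.inf] * 4,              # cell with all-zero test traces
])

# np.argmax returns the first index when all values are tied.
best_idx = np.argmax(test_cv, 1)
selected = L2_grid[best_idx]
print(best_idx, selected)
```

The second cell silently selects L2_grid[0], which is the smallest value in a grid sorted in ascending order.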

Is this intended?
I think either the splits with all-zero traces should be set to NaN, or the cell should be removed from the analysis.

@jkim0731
Collaborator Author

For oeid = 1050725735, 31 of 123 cells have the same problem.
Note: the GLM for this experiment performed the worst among Slc17a7-Cre VISp Novel (active) experiments.

@alexpiet
Collaborator

Yes, this seems like a mistake. When evaluating the models I check for splits with all 0s in the trace, but apparently I do not when determining the L2 parameter. Note that the analysis is all done on "optimize_by_session", so we take the average L2 value across all the cells in the session. The result of this mistake is that the L2 value is sometimes smaller, because the cells with 0 traces in the CV splits will have "L2_grid[0]" as their best-fitting L2 value. However, since we average across cells, it's probably a small effect.
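
One possible way to avoid the bias, sketched under assumptions: drop cells whose test_cv row contains non-finite values before averaging. The function `average_l2`, its `skip_invalid` flag, and the toy numbers are hypothetical and are not taken from GLM_fit_tools.py.

```python
import numpy as np

def average_l2(test_cv, L2_grid, skip_invalid=True):
    """Average the best L2 value across cells (rows of test_cv).

    With skip_invalid=True, cells with any non-finite score are excluded
    instead of defaulting to L2_grid[0].
    """
    test_cv = np.asarray(test_cv, dtype=float)
    L2_grid = np.asarray(L2_grid, dtype=float)
    if skip_invalid:
        valid = np.isfinite(test_cv).all(axis=1)
    else:
        valid = np.ones(test_cv.shape[0], dtype=bool)
    if not valid.any():
        return np.nan  # no usable cells in this session
    best_idx = np.argmax(test_cv[valid], axis=1)
    return float(np.mean(L2_grid[best_idx]))

# Toy example: two normal cells and one cell with all-zero test traces.
L2_grid = [1.0, 10.0, 100.0]
test_cv = [
    [0.01, 0.04, 0.02],
    [0.00, 0.02, 0.05],
    [-np.inf, -np.inf, -np.inf],
]
biased = average_l2(test_cv, L2_grid, skip_invalid=False)
fixed = average_l2(test_cv, L2_grid)
print(biased, fixed)
```

In this toy case the affected cell pulls the session average down, matching the direction of the bias described above. How large the effect is in practice depends on the fraction of affected cells.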
