The following items can be improved in this notebook:
The early sections "Recap" and "Dataset" are almost identical, so one of them is redundant
Exercise 1
Presumably the expectation is to split the data into train/test sets for the voxel selection as well as for the classifier. It might be worth emphasizing that using all the data for voxel selection is a common but subtle error; there are probably quite a few examples in the literature that got past less technical reviewers.
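For instance, a minimal sketch of fold-safe selection using a scikit-learn Pipeline (the toy data shapes and k=100 are placeholders, not the notebook's actual values):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 1000))   # toy data: 40 blocks x 1000 voxels
y = rng.integers(0, 2, size=40)       # two conditions

pipe = Pipeline([
    ('select', SelectKBest(f_classif, k=100)),  # fit on training folds only
    ('clf', LogisticRegression(max_iter=1000)),
])
# cross_val_score clones and refits the whole pipeline in each fold, so the
# voxel selection never sees the held-out fold, unlike running SelectKBest
# on all of X first and cross-validating afterwards.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```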
In this example, I consistently get slightly below chance performance. I believe that this is driven by the cross-validation, see:
Classification based hypothesis testing in neuroscience: Below-chance level classification rates and overlooked statistical properties of linear parametric classifiers. Human Brain Mapping, 2016.
Another subtle example of bias is given by Watts et al. 😊 Potholes and Molehills: Bias in the Diagnostic Performance of Diffusion-Tensor Imaging in Concussion. Radiology, 2014.
In 3.1 Grid search
Strictly, the dependence of the number of combinations on the granularity of the grid search is not exponential: with p hyperparameters and g candidate values each, the grid has g^p combinations, which is polynomial in the granularity g (it is exponential only in the number of parameters p).
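A quick check of that count with scikit-learn's ParameterGrid (the parameter names 'C' and 'gamma' are just dummies):

```python
from sklearn.model_selection import ParameterGrid

# Doubling the granularity g multiplies the number of combinations by 2**p
# (here p = 2 parameters), i.e. the count is g**p: polynomial in g,
# exponential only in the number of parameters p.
for g in (2, 4, 8):
    grid = ParameterGrid({'C': list(range(g)), 'gamma': list(range(g))})
    print(g, len(grid))  # prints: 2 4, 4 16, 8 64
```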
3.2 Regularization Example: L2 vs L1
L1 regularization now requires solver='saga' (or solver='liblinear') in the LogisticRegression call, because scikit-learn changed the default solver to 'lbfgs', which does not support the L1 penalty.
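For reference, a call that works on current scikit-learn (max_iter=5000 is an illustrative choice, not the notebook's value; 'liblinear' also accepts the L1 penalty):

```python
from sklearn.linear_model import LogisticRegression

# scikit-learn 0.22 changed the default solver from 'liblinear' to 'lbfgs',
# which does not support L1, so the solver must now be named explicitly.
clf_l1 = LogisticRegression(penalty='l1', solver='saga', max_iter=5000)
clf_l2 = LogisticRegression(penalty='l2')  # default 'lbfgs' handles L2 fine
```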
4. Build a Pipeline
As with 3.1, there seem to be a lot of parameter settings that give perfect accuracy. Maybe classifying by blocks is too easy, and since the number of blocks is relatively low, accuracy moves in big steps.
c_steps = [10e-1, 10e0, 10e1, 10e2] is confusing notation for the exponents: 10e0 is 10.0, not 1.0, so the list is actually [1, 10, 100, 1000]. Writing the values as 1e0 through 1e3, or generating them with np.logspace, would be clearer.
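A sketch of the clearer spellings (the variable name c_steps comes from the notebook):

```python
import numpy as np

# 10e-1 means 10 * 10**-1 == 1.0, so the original list is [1, 10, 100, 1000].
c_steps = [1e0, 1e1, 1e2, 1e3]        # same values, exponent as written
c_steps = np.logspace(0, 3, num=4)    # or generate the powers of 10 directly
```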