
05-Optimization: Enhancements and fixes #57

Open
@manojneuro

The following items can be improved in this notebook:

  • The early sections "Recap" and "Dataset" are almost identical, so one of them is redundant

  • Exercise 1
    Presumably the expectation is to separate the train and test sets both for the classifier and for the voxel selection. It might be worth emphasizing that using all the data for voxel selection is a common but subtle error: the selection step has then already seen the test labels, so the reported accuracy is optimistically biased. There are probably quite a few examples in the literature that got past less technical reviewers.
    In this example, I consistently get slightly below-chance performance. I believe that this is driven by the cross-validation; see:
    Classification based hypothesis testing in neuroscience: Below-chance level classification rates and overlooked statistical properties of linear parametric classifiers. HBM 2016
    Another subtle example of bias is given by Watts et al. 😊: Potholes and Molehills: Bias in the Diagnostic Performance of Diffusion-Tensor Imaging in Concussion. Radiology 2014

  • In 3.1 Grid search
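    Strictly, the number of combinations does not grow exponentially with the granularity of the grid search. With k hyperparameters and n values each, a full grid has n^k combinations, which is polynomial in n (the granularity) and exponential only in k (the number of hyperparameters).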

  • 3.2 Regularization Example: L2 vs L1
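    The L1 example now fails unless a solver that supports the L1 penalty, such as solver='saga' (or 'liblinear'), is passed to the LogisticRegression call. This is probably due to a change in scikit-learn's defaults: the default solver changed from 'liblinear' to 'lbfgs' in version 0.22, and 'lbfgs' does not support L1.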

  • 4. Build a Pipeline
    As with 3.1, many parameter settings seem to give perfect accuracy. Maybe classifying by blocks is too easy; also, the number of blocks is relatively low, so accuracy can only change in big steps.

  • c_steps = [10e-1, 10e0, 10e1, 10e2] is confusing notation for exponents: the values are 1, 10, 100 and 1000, which would be clearer as [1e0, 1e1, 1e2, 1e3]
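
For the Exercise 1 point about voxel selection, here is a minimal sketch of the bias on pure-noise data. It is not the notebook's code: the array sizes, the choice of `SelectKBest` with `f_classif`, and the linear SVM are all assumptions made for illustration. Selecting voxels on all samples before cross-validation typically gives accuracy well above chance, while refitting the selection inside each training fold stays near chance.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical sizes: few samples, many noise "voxels", no real signal
rng = np.random.default_rng(0)
n_samples, n_voxels, k = 40, 5000, 20
X = rng.standard_normal((n_samples, n_voxels))
y = np.repeat([0, 1], n_samples // 2)  # labels are unrelated to X

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Biased: voxels are picked using every sample, including later test samples
X_leaky = SelectKBest(f_classif, k=k).fit_transform(X, y)
acc_leaky = cross_val_score(SVC(kernel="linear"), X_leaky, y, cv=cv).mean()

# Unbiased: the selector is refit on the training part of each fold
pipe = make_pipeline(SelectKBest(f_classif, k=k), SVC(kernel="linear"))
acc_nested = cross_val_score(pipe, X, y, cv=cv).mean()

print(f"selection on all data: {acc_leaky:.2f}")
print(f"selection inside CV:   {acc_nested:.2f}")
```

Putting the selector inside a pipeline is the same pattern as in section 4, so the exercise could point forward to it.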
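
On the below-chance results in Exercise 1, one well-known way cross-validation alone can push accuracy below chance (not necessarily the mechanism at work in the notebook) is that holding out a sample makes its class the minority in the training set. The toy sketch below uses hypothetical balanced labels and a "classifier" that ignores the data and predicts the majority training class; with leave-one-out it is wrong every time.

```python
from collections import Counter

# Hypothetical balanced labels: 10 samples per class, no features at all
labels = [0] * 10 + [1] * 10

correct = 0
for i, held_out in enumerate(labels):
    # Leave-one-out: training labels are everything except sample i
    train = labels[:i] + labels[i + 1:]
    # Majority-class prediction; the held-out class is always the minority (9 vs 10)
    prediction = Counter(train).most_common(1)[0][0]
    correct += prediction == held_out

accuracy = correct / len(labels)
print(accuracy)  # 0.0
```

Real classifiers are less extreme, but a weak version of the same dependence between training and test folds may be enough to explain results slightly under 50%.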
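
For the grid search point in 3.1, a small sketch that counts grid points directly may help. The helper name `n_combinations` is made up for this illustration. It shows that refining the grid for a fixed number of hyperparameters grows the count polynomially, while adding hyperparameters grows it exponentially.

```python
from itertools import product


def n_combinations(values_per_param: int, n_params: int) -> int:
    """Count the points of a full grid with the same number of values per parameter."""
    grid = [range(values_per_param)] * n_params
    return sum(1 for _ in product(*grid))


# Two hyperparameters, increasingly fine grids: the count grows as n**2
for n in (2, 4, 8, 16):
    print(f"n={n:2d} values each, 2 params: {n_combinations(n, n_params=2)}")

# Fixed granularity of 4 values, more hyperparameters: the count grows as 4**k
for k in (1, 2, 3, 4):
    print(f"4 values each, k={k} params: {n_combinations(4, n_params=k)}")
```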
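
For 3.2, a possible fix is sketched below on synthetic data (the data, `C=0.1` and `max_iter=5000` are assumptions, not the notebook's values). It fits an L1-penalized `LogisticRegression` with the two solvers that I know support L1, and shows that leaving the solver at its default raises an error on recent scikit-learn versions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: only the first of 20 features carries signal
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = (X[:, 0] + 0.5 * rng.standard_normal(100) > 0).astype(int)

fitted = {}
for solver in ("liblinear", "saga"):
    clf = LogisticRegression(penalty="l1", C=0.1, solver=solver, max_iter=5000)
    fitted[solver] = clf.fit(X, y)
    n_zero = int(np.sum(clf.coef_ == 0))
    print(f"{solver}: {n_zero} of {clf.coef_.size} coefficients are exactly zero")

# With the default solver ('lbfgs' since 0.22) the L1 penalty is rejected
try:
    LogisticRegression(penalty="l1").fit(X, y)
except ValueError as err:
    print("default solver:", err)
```

Either solver should work for the notebook; 'saga' usually benefits from standardized features and a larger `max_iter`.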
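
For the `c_steps` item, a quick check of what the current literals evaluate to, next to two spellings that might read more clearly. The variable names other than `c_steps` are made up for this comparison.

```python
# Current notation: the mantissa is 10, so 10e-1 means 10 * 10**-1 = 1.0
c_steps = [10e-1, 10e0, 10e1, 10e2]

# Same values with mantissa 1, so the exponent is the power of ten
c_steps_sci = [1e0, 1e1, 1e2, 1e3]

# Same values written as explicit powers of ten
c_steps_pow = [10.0 ** p for p in range(4)]

print(c_steps)  # [1.0, 10.0, 100.0, 1000.0]
```

All three lists compare equal, so switching notation would not change the search.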
