Some suggestions to enhance the 04-dimensionality reduction notebook:
Personally, I prefer a different formulation for correlation. See eq. 3 in https://en.wikipedia.org/wiki/Pearson_correlation_coefficient. It's also easier to calculate, since all the (n−1) factors cancel.
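For reference, a minimal sketch of that formulation (function name is mine): the correlation becomes a normalized dot product of the mean-centered vectors, with no (n−1) anywhere.

```python
import numpy as np

def pearson_r(x, y):
    # Correlation as a normalized dot product of mean-centered vectors;
    # the 1/(n-1) factors in covariance and std cancel in the ratio.
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    return (x @ y) / np.sqrt((x @ x) * (y @ y))
```

This agrees with `np.corrcoef` up to floating-point error.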
In section 3.3 I had problems coloring the scatter plot by label: one label was always missing when I added `c=labels` to the scatter call. I eventually tracked this down to the colormap, with one label ending up white. Adding `cmap='plasma_r'` sorted it out. I think this issue came from the colorblind palette.
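Something like this sketch reproduces the fix (synthetic data and three hypothetical classes, since I don't have the notebook's variables here):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
labels = rng.integers(0, 3, size=60)  # three hypothetical classes

# An explicit colormap keeps every label visible; 'plasma_r' avoids
# the near-white end of the map that hid one class on my screen.
fig, ax = plt.subplots()
sc = ax.scatter(X[:, 0], X[:, 1], c=labels, cmap='plasma_r')
fig.colorbar(sc, ax=ax, label='label')
```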
In exercise 8 (the PCA challenge), I'm not sure what is being asked for. Given that with n=7 components I get perfect accuracy, I could try dropping components. Should this be a manual search, or an exhaustive search over all combinations of 1, 2, 3, 4, ... components?
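For the non-exhaustive reading, I'd sweep the number of retained leading components and cross-validate. A sketch of what I have in mind (using iris as a stand-in, since I don't have the notebook's dataset here):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Sweep n_components and record the cross-validated accuracy;
# the PCA is fit inside the pipeline, so each fold is clean.
scores = {}
for n in range(1, X.shape[1] + 1):
    pipe = make_pipeline(StandardScaler(), PCA(n_components=n), SVC())
    scores[n] = cross_val_score(pipe, X, y, cv=5).mean()

best_n = max(scores, key=scores.get)
```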
In the univariate feature selection section, using the ANOVA F-statistic for feature selection before fitting the SVC seems very circular. While it shouldn't contaminate the test set, it seems inevitable that it will give near-perfect performance on the training set. This is of course different from PCA, which is blind to the labels. This would seem to be a natural example for the nested cross-validation in the next section.
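A sketch of how that could look (again with iris as a stand-in): the ANOVA-F selection lives inside the pipeline, so each inner fold re-selects features on its own training split, and the outer loop scores a procedure that never saw its test data.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: tune k for the label-aware selector.
pipe = make_pipeline(SelectKBest(f_classif), SVC())
grid = GridSearchCV(pipe, {'selectkbest__k': [1, 2, 3, 4]}, cv=3)

# Outer loop: an honest estimate of the whole select-then-fit procedure.
nested = cross_val_score(grid, X, y, cv=5)
```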