Some suggestions to enhance the 04-dimensionality reduction notebook:
Personally, I prefer a different formulation for correlation. See eq. 3 in https://en.wikipedia.org/wiki/Pearson_correlation_coefficient. It's also easier to calculate, since all the (n−1) factors cancel.
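For reference, a minimal sketch of that formulation (function name is mine): the correlation becomes a normalized dot product of the mean-centered vectors, with no (n−1) anywhere.

```python
import numpy as np

def pearson_r(x, y):
    # Correlation as a normalized dot product of mean-centered vectors;
    # the 1/(n-1) factors in covariance and std cancel in the ratio.
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    return (x @ y) / np.sqrt((x @ x) * (y @ y))
```

This agrees with `np.corrcoef` up to floating-point error.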
In section 3.3 I had problems coloring the scatter plot by label: one label was always missing when I added `c=labels` to the scatter call. I eventually tracked this down to the colormap, with one label ending up white. Adding `cmap='plasma_r'` sorted it out. I think this issue came from the colorblind palette.
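Something like this sketch reproduces the fix (synthetic data and three hypothetical classes, since I don't have the notebook's variables here):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
labels = rng.integers(0, 3, size=60)  # three hypothetical classes

# An explicit colormap keeps every label visible; 'plasma_r' avoids
# the near-white end of the map that hid one class on my screen.
fig, ax = plt.subplots()
sc = ax.scatter(X[:, 0], X[:, 1], c=labels, cmap='plasma_r')
fig.colorbar(sc, ax=ax, label='label')
```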
In exercise 8 (the PCA challenge), I'm not sure what is being asked for. Given that with n=7 components I get perfect accuracy, I could try dropping components. Should this be a manual search, or an exhaustive search over all combinations of 1, 2, 3, 4, ... components?
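For the non-exhaustive reading, I'd sweep the number of retained leading components and cross-validate. A sketch of what I have in mind (using iris as a stand-in, since I don't have the notebook's dataset here):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Sweep n_components and record the cross-validated accuracy;
# the PCA is fit inside the pipeline, so each fold is clean.
scores = {}
for n in range(1, X.shape[1] + 1):
    pipe = make_pipeline(StandardScaler(), PCA(n_components=n), SVC())
    scores[n] = cross_val_score(pipe, X, y, cv=5).mean()

best_n = max(scores, key=scores.get)
```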
In the univariate feature selection section, using the ANOVA F-statistic for feature selection before fitting the SVC seems very circular. While it shouldn't contaminate the test set, it seems inevitable that it will give near-perfect performance on the training set. This is of course different from PCA, which is blind to the labels. This would seem to be a natural example for the nested cross-validation in the next section.
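A sketch of how that could look (again with iris as a stand-in): the ANOVA-F selection lives inside the pipeline, so each inner fold re-selects features on its own training split, and the outer loop scores a procedure that never saw its test data.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: tune k for the label-aware selector.
pipe = make_pipeline(SelectKBest(f_classif), SVC())
grid = GridSearchCV(pipe, {'selectkbest__k': [1, 2, 3, 4]}, cv=3)

# Outer loop: an honest estimate of the whole select-then-fit procedure.
nested = cross_val_score(grid, X, y, cv=5)
```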