Python Machine Learning - Code Examples

Chapter 4: Building Good Training Datasets – Data Preprocessing

Chapter Outline

  • Dealing with missing data
    • Identifying missing values in tabular data
    • Eliminating training examples or features with missing values
    • Imputing missing values
    • Understanding the scikit-learn estimator API
  • Handling categorical data
    • Nominal and ordinal features
    • Creating an example dataset
    • Mapping ordinal features
    • Encoding class labels
    • Performing one-hot encoding on nominal features
  • Partitioning a dataset into separate training and test sets
  • Bringing features onto the same scale
  • Selecting meaningful features
    • L1 and L2 regularization as penalties against model complexity
    • A geometric interpretation of L2 regularization
    • Sparse solutions with L1 regularization
    • Sequential feature selection algorithms
  • Assessing feature importance with random forests
  • Summary
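Three of the outlined steps fit in a few lines: mean imputation of missing values, standardization, and one-hot encoding of a nominal feature. This is a minimal, dependency-free sketch that shows what each step computes. It is not the chapter's code, which relies on pandas and scikit-learn. The helper names (`impute_mean`, `standardize`, `one_hot`) and the sample data are hypothetical. Standardization here uses the population standard deviation, which matches the convention of scikit-learn's `StandardScaler`.

```python
import math


def impute_mean(column):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def standardize(column):
    """Rescale to zero mean and unit population standard deviation (ddof=0)."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]


def one_hot(values):
    """Map each nominal value to a 0/1 indicator vector over the sorted categories."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical feature column with one missing entry
col = [1.0, None, 3.0, 5.0]
filled = impute_mean(col)    # [1.0, 3.0, 3.0, 5.0]
scaled = standardize(filled)  # zero mean, unit variance

# Hypothetical nominal feature (no inherent order, so one-hot rather than an ordinal mapping)
cats, encoded = one_hot(["green", "red", "blue", "green"])
print(cats)     # ['blue', 'green', 'red']
print(encoded)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

In a real workflow, the imputation mean and the scaling parameters are estimated on the training split only. The same values are then applied to the test split, so that no information from the test data leaks into preprocessing.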

Please refer to the README.md file in ../ch01 for more information about running the code examples.