Companion repository for CORE: Data Science and Machine Learning. This project collects curated books, articles, and practice material for building a career in data science.
It provides a prioritized list of the core skills, reading material, personal portfolio projects, and practice assignments that every new data scientist should cover.
- Learning Paths
- Combined Core Skills
- Certification Checklist
- Portfolio Project Checklist
- Next Steps
- Latest News
Start with Foundations, then dive deeper based on the role you want.
| Path | Focus | Highlights |
|---|---|---|
| Foundations | First principles and community | Library recommendations, data science communities, public datasets, resume & interview tools |
| Data Analyst | Situational awareness & reporting | Book recommendations, spreadsheet practice, SQL training, business intelligence |
| General Data Scientist | Statistical decision making with R | Book recommendations, cheatsheets, probability & statistics, R resources, web development |
| Machine Learning Engineer | Building and deploying ML products | Book recommendations, Python practice, math foundations, deployment, deep learning |
The foundational checklist for every data scientist lives in core-skills.md. Work through the list from top to bottom to ensure there are no knowledge gaps.
Certifications are tricky. They do not really demonstrate mastery, but they can make the difference in getting an interview. Here are our minimum recommended certifications. If you cannot afford to complete them, don't worry! Use the Kaggle courses and LinkedIn Assessments instead, and let your project portfolio show your competence!
| Category | Name | Link | Notes |
|---|---|---|---|
| All | 2023 CORE: Data Science and Machine Learning | Link | |
| Data Analyst | LinkedIn Excel | Link | |
| Data Analyst | Kaggle SQL | Link and Link | |
| Data Analyst | Tableau Data Analyst | Link | |
| General Data Scientist | LinkedIn R Assessment | Link | |
| Machine Learning Engineer | Andrew Ng's Intro ML Course | Link | |
| Cloud - ML | AWS Certified Machine Learning - Specialty | Link | Only need 1 of 3 |
| Cloud - ML | Google Professional Machine Learning Engineer | Link | Only need 1 of 3 |
| Cloud - ML | Azure Data Scientist Associate | Link | Only need 1 of 3 |
We recommend you use GitHub Pages and blogdown to host your portfolio as shown in the course. Recommended minimal list of hosted projects:
- 2x MS Excel dashboards - hosted as webpages
- 1x Tableau Public dashboard
- 1x Tableau Public story
- 2x EDA of a dataset using RMarkdown - published on Kaggle as well
- 2x EDA of a dataset and ML model development using Python - published on Kaggle as well
- 1x ML model deployed to the cloud using AWS EC2 (or similar) and a Docker container
The course walks you through each of these or provides the resources needed to complete them. Make sure you use novel datasets in your portfolio! If you only use the data from the course, your portfolio will look very similar to everyone else's.
If you have completed the certification checklist, built a resume, and hosted a project portfolio, you are ready to start work! The next step in your learning journey is to decide which of the job types you want to dive deeper into. Here is the recommended next step for each:
- Data Analyst - Work to become one of the Tableau Visionaries
- General Data Scientist - Create and publish an R Package to CRAN
- Machine Learning Engineer - Complete the fast.ai course 'Practical Deep Learning for Coders'
- Everyone - Compete in a competition on Kaggle
Stay tuned for updates!