- Teach: https://www.datacamp.com/teach/repositories/1545
- Campus: https://www.datacamp.com/courses/introduction-to-git-for-data-science
- Docs: https://authoring.datacamp.com
Version control is the beating heart of every productive programmer's workflow. It allows people to keep track of what they've done, and to share what they're currently doing with colleagues. This course is an introduction to version control with Git for data scientists who can already use the Unix shell to run simple commands and edit text files.
Please see the DataCamp Learner Profiles for details.
- Anya has been using Subversion on the Unix command line for years, but has never used Git. This course will show her the similarities and differences between the two.
- Catalina will take this course herself if she has time, as it would help her with her own research, but it is too advanced for her students.
- Jasmine has never used version control, and has only just completed DataCamp's "Introduction to the Unix Shell". Most of the ideas in this course will be new to her. She will be disappointed to discover that Git doesn't handle Microsoft file formats cleanly.
- Thanh has used Git from inside RStudio, but has never branched or merged. This course will show him what's going on under the hood when he does a commit, and how to collaborate with colleagues through GitHub.
- Yngve already knows the material in this course.
- Clone the Git repository whose URL you have been given.
- View the changes made in the most recent commit.
- Create a new branch called `rewriting-conclusion` and switch to it.
- In that branch, remove every occurrence of the word "not" from the last paragraph of the file `report.txt`.
- Commit your changes with the log message "Correcting conclusions".
- While still in that branch, pull in the content of the `rewriting-intro` branch from the repository you cloned.
- Merge the changes in that branch with the changes you just made, keeping your changes to the last paragraph and all of their changes to the other paragraphs.
- Push the merged result to a newly created branch called `rewriting-conclusion` in the repository you cloned.
Go into the directory `dental` and look at its history.
- What is the log message of the most recent commit?
- Who made that commit?
- Go into the directory `dental`.
- How many different commits have been made to `report.txt`?
- Go into the directory `dental`.
- What files were added in the most recent commit?
- How many lines were changed in the file `data/western.csv` in the most recent commit that affected it?
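The history questions above could be answered along the following lines. This is only a sketch: it builds a small throwaway repository because the real `dental` history is not reproduced here, so the commit messages, file contents, and author identity are all made-up placeholders.

```shell
set -eu

# Build a throwaway repository; the real "dental" repository has its own history.
work=$(mktemp -d)
git init -q "$work/dental"
cd "$work/dental"
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"   # hypothetical identity

mkdir data
printf 'Dental Work\n' > report.txt
git add report.txt
git commit -q -m "Starting report"

printf '2017-01-05,molar\n' > data/western.csv
git add data/western.csv
git commit -q -m "Adding western data"

printf 'More text\n' >> report.txt
printf '2017-02-11,incisor\n' >> data/western.csv
printf '2017-02-12,canine\n' > data/eastern.csv
git add .
git commit -q -m "Updating report and data"

# Log message and author of the most recent commit.
last_message=$(git log -1 --format=%s)
last_author=$(git log -1 --format=%an)

# Number of commits that touched report.txt.
report_commits=$(git log --oneline -- report.txt | wc -l | tr -d ' ')

# Files added (status "A") in the most recent commit.
added_files=$(git diff-tree --no-commit-id --name-only --diff-filter=A -r HEAD)

# Lines added to data/western.csv in the most recent commit that affected it.
western_commit=$(git log -1 --format=%H -- data/western.csv)
western_added=$(git diff --numstat "$western_commit~1" "$western_commit" -- data/western.csv | cut -f1)

echo "message: $last_message"
echo "author: $last_author"
echo "commits touching report.txt: $report_commits"
echo "files added: $added_files"
echo "lines added to western.csv: $western_added"
```

In the course itself a learner would more likely read the answers off plain `git log` and `git diff` output; the format options here just make the values easy to capture in variables.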
- Create a directory called `workspace`.
- Use `git init` to initialize Git in that directory.
- Move the four data files from `dental/data` to the top level of `workspace`, add them, and commit the change with "Starting to use Git" as the log message.
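One possible solution to this exercise is sketched below. It runs in a temporary directory and fabricates four stand-in data files; the names `northern.csv` and `southern.csv` and all file contents are assumptions, since only `eastern.csv` and `western.csv` are named in these notes.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Stand-in for dental/data; two of the file names and all contents are assumptions.
mkdir -p dental/data
for region in eastern northern southern western; do
    printf '2017-01-01,molar\n' > "dental/data/$region.csv"
done

# Create the new directory and turn it into a repository.
mkdir workspace
cd workspace
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"   # hypothetical identity

# Move the data files to the top level, stage them, and commit.
mv ../dental/data/*.csv .
git add .
git commit -q -m "Starting to use Git"

tracked=$(git ls-files | wc -l | tr -d ' ')
echo "tracked files: $tracked"
```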
- Go into the directory `dental`.
- Add a line containing the data shown below to the file `eastern.csv`. Do not add any blank lines before or after this line.
- Commit your change with the log message "Adding September's data."
2017-11-30,bicuspid
- Go into the directory `dental`.
- Undo the most recent two commits.
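The exercise does not say which command should be used. A minimal sketch with `git reset`, run against a throwaway repository with three placeholder commits, might look like this. It discards the two commits entirely; `git revert` would instead add new commits that reverse them, which is usually the safer choice for history that has already been shared.

```shell
set -eu

work=$(mktemp -d)
git init -q "$work/dental"
cd "$work/dental"
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"   # hypothetical identity

# Three placeholder commits, each appending one line to eastern.csv.
for n in 1 2 3; do
    printf 'entry %s\n' "$n" >> eastern.csv
    git add eastern.csv
    git commit -q -m "Commit $n"
done

before=$(git rev-list --count HEAD)

# Move the current branch back two commits and reset the files to match.
git reset -q --hard HEAD~2

after=$(git rev-list --count HEAD)
echo "commits before: $before, after: $after"
```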
- What branches are there in the directory `dental`?
- Which of these branches are you currently on?
- Go into the directory `dental`.
- Compare the contents of the `master` branch with the `summary-statistics` branch.
- Which files contain differences?
- Go into the directory `dental`.
- Create a new branch called `restarting`.
- Delete the file `report.txt` in that branch and commit your changes without affecting any other branch. Leave `restarting` as the active branch when you are finished.
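A sketch of one way to do this follows, using a throwaway repository with a single placeholder commit on `master`. The log message "Removing report" is an assumption, as the exercise does not specify one.

```shell
set -eu

work=$(mktemp -d)
git init -q "$work/dental"
cd "$work/dental"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"   # hypothetical identity

printf 'Dental Work\n' > report.txt
git add report.txt
git commit -q -m "Starting report"

# Create the new branch and switch to it in one step.
git checkout -q -b restarting

# Remove the file from the working directory and the staging area, then commit.
git rm -q report.txt
git commit -q -m "Removing report"           # hypothetical log message

current_branch=$(git rev-parse --abbrev-ref HEAD)
files_on_master=$(git ls-tree --name-only master)
echo "current branch: $current_branch"
echo "files still on master: $files_on_master"
```

Because the commit is made while `restarting` is checked out, `master` still contains `report.txt`.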
- Go into the directory `dental`.
- Merge the branch `summary-statistics` into the branch `master` using "Consolidating work" as the log message.
- Go into the directory `dental`.
- Merge the branch `alter-report-title-branch` into the branch `master`.
- Resolve the conflicts so that the final title is "Dental Work by Season".
- Commit the reconciled version with the log message "Integrating changes".
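The conflict exercise can be rehearsed end to end with a sketch like the one below. It fabricates a tiny `report.txt` and two competing title changes; the starting title, the title used on `master`, and the body text are all invented for illustration. A learner would normally resolve the conflict in an editor, whereas the script simply rewrites the file with the agreed content.

```shell
set -eu

work=$(mktemp -d)
git init -q "$work/dental"
cd "$work/dental"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"   # hypothetical identity

# Common starting point for both branches.
printf 'Dental Work\n\nBody text.\n' > report.txt
git add report.txt
git commit -q -m "Starting report"

# One title change on the side branch...
git checkout -q -b alter-report-title-branch
printf 'Dental Work by Season\n\nBody text.\n' > report.txt
git commit -q -a -m "Changing title"

# ...and a different change to the same line on master.
git checkout -q master
printf 'Seasonal Dental Surgeries\n\nBody text.\n' > report.txt
git commit -q -a -m "Changing title differently"

# The merge stops with a conflict, so its non-zero exit status is expected.
git merge alter-report-title-branch || true
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted files: $conflicted"

# Resolve by writing the agreed content, then stage and commit the result.
printf 'Dental Work by Season\n\nBody text.\n' > report.txt
git add report.txt
git commit -q -m "Integrating changes"

final_title=$(head -n 1 report.txt)
echo "final title: $final_title"
```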
Clone the repository `file:///tmp/dental` to create a repository called `dental` in your home directory.
- Go into the repository `dental`.
- Add a remote called `upstream` with the URL `file:///tmp/dental`.
- Go into the repository `dental`.
- Pull changes from the `master` branch of the remote `upstream` into the `master` branch of your repository.
- Go into the repository `dental`.
- Delete the file `report.txt`.
- Commit your change with the log message "Getting rid of report".
- Push your change to the remote repository `upstream`.
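The clone, remote, pull, and push exercises above can be strung together as in the following sketch. It does not touch `/tmp/dental`: a bare repository in a temporary directory stands in for it, and that substitution is an assumption made here because Git by default refuses a push to the checked-out branch of a non-bare repository. The seed content is a placeholder.

```shell
set -eu

work=$(mktemp -d)

# Seed repository with one placeholder commit on master.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"   # hypothetical identity
printf 'Dental Work\n' > report.txt
git add report.txt
git commit -q -m "Starting report"

# Bare copy that plays the role of file:///tmp/dental in the exercises.
git clone -q --bare "$work/seed" "$work/upstream.git"
upstream_url="file://$work/upstream.git"

# Clone it to create a repository called dental.
cd "$work"
git clone -q "$upstream_url" dental
cd dental
git config user.name "Example Author"
git config user.email "author@example.com"

# Add a second remote name for the same URL and list the remotes.
git remote add upstream "$upstream_url"
git remote -v

# Pull from the remote's master branch into the current (master) branch.
git pull -q upstream master

# Delete the file, commit, and push the change to the remote.
git rm -q report.txt
git commit -q -m "Getting rid of report"
git push -q upstream master

pushed_message=$(git --git-dir="$work/upstream.git" log -1 --format=%s master)
echo "latest commit upstream: $pushed_message"
```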
- Go into the repository `dental`.
- Pull changes from the `master` branch of the remote `upstream` into the `master` branch of your repository.
- Resolve the conflicts in `report.txt` so that the title is "Final Report: Regional Dental Work".
- Commit your resolution with the log message "Integrating changes".
- Viewing a project's history
  - Viewing the log with `git log`
  - Line-by-line history with `git blame`
  - Log messages
  - Viewing differences with `git diff`
  - Naming commits with hashes
  - Naming commits with `HEAD~N`
- Making changes
  - Viewing work in progress with `git status`
  - Saving changes to existing files with `git add` and `git commit`
  - Adding new files with `git add` and `git commit`
  - Canceling changes in progress
- Working with branches
  - Listing branches with `git branch`
  - Switching between branches with `git checkout`
  - Viewing differences between branches
  - Merging changes with `git merge`
  - Recognizing conflicts
  - Resolving conflicts
  - Avoiding conflicts
  - Undoing changes with `git reset`
  - Undoing changes with `git revert`
  - Tagging with `git tag`
- Managing repositories
  - Initializing a repository with `git init`
  - Ignoring files with `.gitignore`
  - Viewing and configuring preferences with `git config`
  - Listing remotes with `git remote`
  - Adding and removing remotes
  - Pulling from branches in remote repositories with `git pull`
  - Pushing to branches in remote repositories with `git push`
The "datasets" are:
- A repository called `dental` with some history and branches. This repository will be in the user's home directory for most exercises; for others, it will be moved to `/tmp/dental` for the user to clone and/or set as a remote.
Course Description
Version control is one of the power tools of programming. It allows you to keep track of what you did and when, undo any changes you have decided you don't want, and collaborate at scale with other people. This course will introduce you to Git, a modern version control tool that is very popular with data scientists and software developers alike, and show you how it can help you get more done in less time and with less pain.
Learning Objectives
- Explain the pros and cons of version control compared to alternatives like Dropbox and Google Docs.
- Create new repositories and turn existing projects into repositories.
- Configure basic settings in Git.
- View and explain a repository's history.
- Save changes to files.
- Resolve conflicts that arise when changing files.
- Create and navigate branches.
- Undo changes to files.
- Explain the relationships between commits, branches, and remote repositories.
- Pull changes from, and push changes to, remote repositories.
Prerequisites