Using the History

Up until now, we have been concerned with setting up a repository, putting a project under version control, and keeping track of the changes to the project files.

One of the reasons we have been doing this is that at some point we will want to go back through the project history to view or work with previous versions of our files.

Unintentional Changes

I am sure we have all done this. At some point while developing software, we accidentally delete an important file, modify a file and then decide we want it back the way it was, or delete some lines only to find that they contained an important bit of information. Sometimes we just want to restore the whole working directory to the way it was.

In this example, I have accidentally removed the loop in stats.py and replaced it with a comment saying that I have messed up.

The first step is to have a look at the current status of the working directory to see how it differs from the last commit:

$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#    modified:   stats.py
#
no changes added to commit (use "git add" and/or "git commit -a")

In this example, we see that stats.py has been modified. Good, that is the file I messed up and I want to restore it to the way it was. But before that, let me see what I changed:

$ git diff stats.py
diff --git a/stats.py b/stats.py
index da65fdd..ae4784d 100644
--- a/stats.py
+++ b/stats.py
@@ -4,9 +4,9 @@
 import sys

 values = []
-for line in open(sys.argv[1]):
-   value = int(line)
-   values.append(value)
+
+# I really messed up here and deleted this loop and I have
+# no idea how it opened the file and read data from it!

 total = sum(values)

The diff format marks removed lines with a leading - and added lines with a leading +. Unchanged lines are shown for context.
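
To see the - and + convention outside of git, here is a minimal Python 3 sketch that uses the standard library's difflib module to produce the same unified diff format. The before and after lists are made-up stand-ins for stats.py, not the real file, and difflib is used purely as an illustration; I am not suggesting it is what git does internally.

```python
import difflib

# Hypothetical "before" and "after" contents, loosely modelled on stats.py.
before = [
    "values = []\n",
    "for line in open(sys.argv[1]):\n",
    "    values.append(int(line))\n",
    "total = sum(values)\n",
]
after = [
    "values = []\n",
    "# I deleted the loop by mistake\n",
    "total = sum(values)\n",
]

# unified_diff yields file header lines, hunk markers, and the changed lines.
diff_lines = list(
    difflib.unified_diff(before, after, fromfile="a/stats.py", tofile="b/stats.py")
)

# Removed lines start with "-", added lines with "+"; the "---" and "+++"
# lines are file headers, so they are excluded here.
removed = [l for l in diff_lines if l.startswith("-") and not l.startswith("---")]
added = [l for l in diff_lines if l.startswith("+") and not l.startswith("+++")]

print("".join(diff_lines), end="")
```

The removed list ends up holding the two loop lines and the added list holds the comment, which mirrors what git diff reported for stats.py.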

Now, before you grab your mouse with the intention of cutting and pasting the good lines back into your file, let's see how easy it is to do this using git:

$ git checkout stats.py

Yes, it is that simple! And it works even if you accidentally delete a file!

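If you have modified numerous files in your project and want to restore them all to the way they were, just run git checkout . from the top of your working directory. It is worth running git status first to see what has been modified, because the changes you discard cannot be recovered. This also brings back tracked files that you have deleted. A shell wildcard such as git checkout * will not, because the shell only expands * to files that still exist; in that case you would need to name each deleted file explicitly.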

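If you have deleted many files and directories and want to restore your working directory to the state of the last commit, check what is missing with git status and then do a hard reset to HEAD. Be careful: a hard reset discards every uncommitted change to tracked files, whether staged or not: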

$ git status
# On branch master
# Changed but not updated:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#    deleted:    stats.py
#
no changes added to commit (use "git add" and/or "git commit -a")

$ git reset HEAD --hard
HEAD is now at 9a202a6 modified stats.py to output the average as well

Working with other Commits

There will be times when we will want to work with various versions (commits) of the repository. There are three ways in which we can identify commits made to the repository, using:

  • Absolute Commit IDs,
  • Relative Commit IDs, and
  • Tags.

Absolute Commit IDs

Every commit made in git is identified by a unique ID, a 40-character hexadecimal string representing a 160-bit SHA-1 hash derived from the information in the commit, including the author, the commit date and time, the commit’s parent(s), and the contents of the project at the time of the commit. You will have seen such strings whenever you made a commit or viewed the log:

$ git log
commit 9a202a66e7b2b93b12190dd0b56b594a60ebed22
Author: Your Name <your.name@yourdomain.com>
Date:   Thu Feb 7 11:36:27 2013 +1100

    modified stats.py to output the average as well

commit c6f2746af72c45d6f92741f51b5f4ae1241b79e2
Author: Your Name <your.name@yourdomain.com>
Date:   Tue Feb 5 16:36:40 2013 +1100

    put .gitignore under version control

commit aaad9ced7a33f5f8dc301411c2958f0267cfd82c
Author: Your Name <your.name@yourdomain.com>
Date:   Tue Feb 5 16:35:40 2013 +1100

    Initial commit of stats.py
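
To make the "40 characters, 160 bits" claim concrete, here is a minimal Python 3 sketch of how such an ID can be computed, assuming git's standard SHA-1 object format (a short header, a NUL byte, then the object's content). The helper name git_object_id and the commit body below are my own inventions for illustration: the tree ID, timestamps and message are made up, so the resulting ID will not match any commit in our log.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 ID for an object of this kind and body (hypothetical helper)."""
    # Git hashes a short header ("<kind> <size>" plus a NUL byte) followed by the body.
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical commit body: the tree ID, timestamps and message are made up.
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Your Name <your.name@yourdomain.com> 1360000000 +1100\n"
    b"committer Your Name <your.name@yourdomain.com> 1360000000 +1100\n"
    b"\n"
    b"Initial commit of stats.py\n"
)

commit_id = git_object_id("commit", commit_body)
print(commit_id)
print(len(commit_id), "hex characters =", len(commit_id) * 4, "bits")

# Changing anything in the body, even one word of the message, gives a different ID.
other_id = git_object_id("commit", commit_body.replace(b"Initial", b"First"))
print(other_id != commit_id)
```

Because the author, date, parent and content all feed into the hash, two different commits will, for all practical purposes, never share an ID.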

You can restore your working directory to the state that existed at any of these commits by running git checkout <commit-id> with the 40-character absolute ID of that commit. For example, to check out the initial version of the repository, which contains the version of stats.py that only calculated the sum, you would run:

$ git checkout aaad9ced7a33f5f8dc301411c2958f0267cfd82c
Note: checking out 'aaad9ced7a33f5f8dc301411c2958f0267cfd82c'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at aaad9ce... Initial commit of stats.py

$ cat stats.py
# Calculate the total of the values in a text file,
# formatted as one value per line.

import sys

values = []
for line in open(sys.argv[1]):
   value = int(line)
   values.append(value)

total = sum(values)

print len(values), 'values were read in'
print 'The sum of the input values is:', total

The warning message about being in a detached HEAD state is basically just saying that you are no longer on a branch. Any commits you make here can be lost, unless you create a branch, which we will cover next.

The SHA-1 hash derived from a commit is, for all intents and purposes, unique, which is why it can be used to reference that commit.

To restore your working directory to the latest version of your code:

$ git checkout master
Previous HEAD position was aaad9ce... Initial commit of stats.py
Switched to branch 'master'

The specifics of this command will be covered in more detail when we cover branches. For now, all the commits we have been making have been on a default branch called master. When you check out a branch, git gives you the latest commit on that branch.

One final point. If you think that copying and pasting 40 characters every time you want to check out a version of your project is tedious, you are not alone. There are at least three ways around this. The first is that git will accept the first few characters instead of the whole string, as long as the shortened reference is unique within the repository. So if we want to look at the log for commit aaad9ced7a33f5f8dc301411c2958f0267cfd82c, we can try:

$ git log a
fatal: ambiguous argument 'a': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions

$ git log aaad
commit aaad9ced7a33f5f8dc301411c2958f0267cfd82c
Author: Your Name <your.name@yourdomain.com>
Date:   Tue Feb 5 16:35:40 2013 +1100

    Initial commit of stats.py

Git will not accept fewer than four characters, and in a larger repository you may need more than four to identify a commit uniquely.
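
The idea of a unique shortened ID can be sketched in a few lines of Python 3. This is only a toy model using the three commit IDs from our log; resolve_prefix and MIN_PREFIX_LENGTH are hypothetical names of my own, and real git looks the prefix up among all the objects in the repository, not just the commits.

```python
# The three commit IDs from the log in this section.
COMMIT_IDS = [
    "9a202a66e7b2b93b12190dd0b56b594a60ebed22",
    "c6f2746af72c45d6f92741f51b5f4ae1241b79e2",
    "aaad9ced7a33f5f8dc301411c2958f0267cfd82c",
]

MIN_PREFIX_LENGTH = 4  # assumed minimum, matching the behaviour seen with "git log a"


def resolve_prefix(prefix, commit_ids):
    """Return the single full ID that starts with prefix (hypothetical helper)."""
    if len(prefix) < MIN_PREFIX_LENGTH:
        raise ValueError(f"prefix {prefix!r} is too short")
    matches = [cid for cid in commit_ids if cid.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"prefix {prefix!r} matches {len(matches)} commits")
    return matches[0]


print(resolve_prefix("aaad", COMMIT_IDS))
```

Calling resolve_prefix("a", COMMIT_IDS) raises an error in this sketch, just as git log a failed above.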

Relative Commit IDs

There are times when we can easily specify a commit relative to where we are in a branch. The currently checked-out commit or branch is referred to as the HEAD. The previous commit is HEAD~, two commits ago is HEAD~~ or HEAD~2, and so on. To look at the log for the current commit:

$ git log -1 HEAD
commit 9a202a66e7b2b93b12190dd0b56b594a60ebed22
Author: Your Name <your.name@yourdomain.com>
Date:   Thu Feb 7 11:36:27 2013 +1100

    modified stats.py to output the average as well

Two commits ago:

$ git log -1 HEAD~~
commit aaad9ced7a33f5f8dc301411c2958f0267cfd82c
Author: Your Name <your.name@yourdomain.com>
Date:   Tue Feb 5 16:35:40 2013 +1100

    Initial commit of stats.py

Note that the -1 option limits the output of git log to a single commit. In general, -<n> shows only the n most recent commits, starting from the commit you name (see git help log).
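
One way to picture relative IDs is as steps back along a chain of parent links. The Python 3 sketch below models only the simple forms used in this section (HEAD, HEAD~, HEAD~~ and HEAD~<n>) for a history with a single line of commits. The PARENTS table is built from our log, and resolve_relative is a hypothetical helper of my own, not a git function.

```python
import re

# Parent links for the three commits in our log (newest first).
PARENTS = {
    "9a202a66e7b2b93b12190dd0b56b594a60ebed22": "c6f2746af72c45d6f92741f51b5f4ae1241b79e2",
    "c6f2746af72c45d6f92741f51b5f4ae1241b79e2": "aaad9ced7a33f5f8dc301411c2958f0267cfd82c",
    "aaad9ced7a33f5f8dc301411c2958f0267cfd82c": None,  # the initial commit has no parent
}
HEAD = "9a202a66e7b2b93b12190dd0b56b594a60ebed22"


def resolve_relative(ref, head=HEAD, parents=PARENTS):
    """Resolve HEAD, HEAD~, HEAD~~ or HEAD~<n> to a full ID (hypothetical helper)."""
    match = re.fullmatch(r"HEAD((?:~\d*)*)", ref)
    if match is None:
        raise ValueError(f"unsupported reference {ref!r}")
    # Each bare "~" steps back one commit; "~<n>" steps back n commits.
    steps = sum(int(n) if n else 1 for n in re.findall(r"~(\d*)", match.group(1)))
    commit = head
    for _ in range(steps):
        commit = parents[commit]
        if commit is None:
            raise ValueError(f"{ref!r} goes back past the initial commit")
    return commit


print(resolve_relative("HEAD~~"))
print(resolve_relative("HEAD~2") == resolve_relative("HEAD~~"))
```

Both HEAD~~ and HEAD~2 land on the initial commit, which agrees with the git log -1 HEAD~~ output above.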

As well as viewing the log of a specific commit, you can view the changes that were made in that commit with git show <commit-id>, or view the differences between a previous commit and the current HEAD, for instance:

$ git diff HEAD~~
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..00b1d66
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+data.py
+data.txt
diff --git a/stats.py b/stats.py
index 28dfd6d..2193c9c 100644
--- a/stats.py
+++ b/stats.py
@@ -12,3 +12,4 @@ total = sum(values)

 print len(values), 'values were read in'
 print 'The sum of the input values is:', total
+print 'The average of the input values is:', total / len(values)

Tags

OK, the above information is no doubt very useful, but you may have noticed that it was git that called the tune and determined the names for our commits. What if we want to define meaningful names for commits? This is easily done, and in git these names are called tags.

In our example, we have two places in our project development that might be worth a tag. The initial version of stats.py that calculated the sum of the input data, and the latest version that calculates the sum and average. Let’s create two tags sum and sum_and_average to point to these two places in the development. We use git tag <my tag name> <commit ID> to do this:

$ git tag sum aaad9c

$ git tag sum_and_average 9a202a

Using git tag by itself will list all the tags currently in the repository:

$ git tag
sum
sum_and_average

And we can view a version of the log with branches and tags shown:

$ git log --decorate
commit 9a202a66e7b2b93b12190dd0b56b594a60ebed22 (HEAD, tag: sum_and_average, master)
Author: Your Name <your.name@yourdomain.com>
Date:   Thu Feb 7 11:36:27 2013 +1100

    modified stats.py to output the average as well

commit c6f2746af72c45d6f92741f51b5f4ae1241b79e2
Author: Your Name <your.name@yourdomain.com>
Date:   Tue Feb 5 16:36:40 2013 +1100

    put .gitignore under version control

commit aaad9ced7a33f5f8dc301411c2958f0267cfd82c (tag: sum)
Author: Your Name <your.name@yourdomain.com>
Date:   Tue Feb 5 16:35:40 2013 +1100

    Initial commit of stats.py

When we want to check out either of these versions of the code, we just give the appropriate tag to git, for example git checkout sum_and_average.

Other uses of tags could be to mark tested, working versions of your code (v1.0, v2.0), or snapshots of the code you used for a specific project, analysis (CMIP5_analysis), or paper submission (GRL_first_submission).

You can also quickly define tags during development, debugging and testing, using them like bookmarks and deleting them when they are no longer wanted.
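
Conceptually, a simple tag is just a name that points at a commit ID. The Python 3 sketch below is a toy model of that idea using the two tags from this section; the add_tag helper and the debug_bookmark name are hypothetical and only there for illustration, and the sketch says nothing about how git stores tags on disk.

```python
# A toy model: tag names mapped to the commit IDs they point at.
tags = {}


def add_tag(tags, name, commit_id):
    """Point a new tag name at a commit ID, refusing to reuse a name (hypothetical helper)."""
    if name in tags:
        raise ValueError(f"tag {name!r} already exists")
    tags[name] = commit_id


add_tag(tags, "sum_and_average", "9a202a66e7b2b93b12190dd0b56b594a60ebed22")
add_tag(tags, "sum", "aaad9ced7a33f5f8dc301411c2958f0267cfd82c")

# A temporary bookmark, created while debugging and removed afterwards.
add_tag(tags, "debug_bookmark", "c6f2746af72c45d6f92741f51b5f4ae1241b79e2")
del tags["debug_bookmark"]

# List the tag names in alphabetical order, then look one up.
print(sorted(tags))
print(tags["sum"])
```

Looking up a tag gives back the full commit ID, which is why a tag name can be used anywhere a commit ID is expected.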