Branches

In our project development we initially created a version of the code to calculate the sum of the values in a file. Subsequently, we extended that software to calculate the average as well. The changes were simple, and as we committed them the HEAD of the master branch always represented a working piece of code. This may not be the case when developing complex changes to your project. It is often better to ensure that the latest version of your code is always in a stable, working state while unstable development happens elsewhere, especially if other people are using your code.

One solution would be to develop your changes off-line, bring them into the working directory once they have been tested and are ready for release, and commit them all at once. There are two disadvantages to this approach. Firstly, the development itself should also be done under a VCS so that you can keep track of your changes. Secondly, the history of your development would be lost by making all the changes in one commit.

This workflow is handled in a VCS by using the concept of branches and merging. Branches allow you to work on several separate lines of development within the same repository. In essence, we create a branch of the code on which to do software development, leaving the main or master branch in a known, stable working state. When the development is ready for release, the main branch is then updated with the changes in the branch using a process called merging.

In this section, we are going to modify our code to also calculate and print the standard deviation of the data. However, we will do the development in a branch and later merge these changes back into the master branch.

Listing branches

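All development within git is done in the context of a branch. Your working directory is usually created by checking out a branch of the repository. A branch is a name that points to a single commit, the tip of the branch; because each commit records its parent, the history of the branch can be followed back from the tip like a linked list. The commit your working directory was checked out from is called the HEAD.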

You may recall that we glossed over branches earlier. By default, development is done in a branch called master. You can see what branches are available and which one the working directory is on by using git branch:

$ git branch
* master

Here we see that there is only one branch. The asterisk indicates which branch is currently checked out into the working directory.

A git repository can be represented as a directed acyclic graph (DAG). The diagram below represents our repository at the moment:

_images/node_01_sum_and_average.png

Each commit is represented by a circle including its absolute Commit ID. The master branch is shown, as are the two tags we have created.
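The same structure can be pictured as a toy model in Python. This is only a sketch and not how git stores its data internally: the names parents, branches and history are hypothetical, and the abbreviated commit IDs are those of the three commits in this tutorial's repository (yours will differ).

```python
# Toy model of a repository: each commit ID maps to the list of its parent IDs.
# Illustration only; git's real object store is more involved.
parents = {
    "aaad9ce": [],            # Initial commit of stats.py
    "c6f2746": ["aaad9ce"],   # put .gitignore under version control
    "9a202a6": ["c6f2746"],   # modified stats.py to output the average as well
}

# A branch is just a name that points at one commit (its tip).
branches = {"master": "9a202a6"}


def history(commit_id):
    """Return the commits reachable from commit_id, newest first.

    Follows only the first parent, which is enough for a linear history.
    """
    log = []
    while commit_id is not None:
        log.append(commit_id)
        commit_parents = parents[commit_id]
        commit_id = commit_parents[0] if commit_parents else None
    return log


print(history(branches["master"]))
```

Walking from the tip of master back through the parent links visits every commit on the branch, which is essentially the order in which git log lists them.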

Creating a branch

Here is how we create a branch named std_dev from the HEAD of the master branch, and check it out into our working directory:

$ git checkout -b std_dev master
Switched to a new branch 'std_dev'

This is actually a shortcut for git branch std_dev master followed by git checkout std_dev. We can see which branches exist and which one we are currently on with:

$ git branch
  master
* std_dev

$ git status
# On branch std_dev
nothing to commit (working directory clean)

You can see that we now have two branches and that the working directory is now on the std_dev branch. Any commits we make on this branch will not affect the master branch.
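Why commits on std_dev leave master alone can be sketched with a small Python model. It is a simplification under stated assumptions rather than git's actual implementation: parent_of, branches, current and commit are hypothetical names, and the ID "new-commit" is a made-up placeholder for a commit that has not been created yet.

```python
# Toy model (not git's real implementation): a branch name maps to the ID of
# its tip commit, and each commit maps to its parent commit.
parent_of = {"9a202a6": "c6f2746", "c6f2746": "aaad9ce", "aaad9ce": None}
branches = {"master": "9a202a6"}

# git checkout -b std_dev master: the new branch starts at master's tip
# and becomes the branch that is checked out.
branches["std_dev"] = branches["master"]
current = "std_dev"


def commit(new_id):
    """Record new_id on top of the current branch and advance that branch only."""
    parent_of[new_id] = branches[current]
    branches[current] = new_id


commit("new-commit")  # hypothetical placeholder ID

print(branches["master"])   # master still points at its old tip
print(branches["std_dev"])  # std_dev has moved on to the new commit
```

Creating the branch copies only a pointer, so both branches share all earlier commits, and a new commit moves just the branch that is checked out.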

We can also see that this new branch has not lost any of its previous history:

$ git log
commit 9a202a66e7b2b93b12190dd0b56b594a60ebed22
Author: Your Name <your.name@yourdomain.com>
Date:   Thu Feb 7 11:36:27 2013 +1100

    modified stats.py to output the average as well

commit c6f2746af72c45d6f92741f51b5f4ae1241b79e2
Author: Your Name <your.name@yourdomain.com>
Date:   Tue Feb 5 16:36:40 2013 +1100

    put .gitignore under version control

commit aaad9ced7a33f5f8dc301411c2958f0267cfd82c
Author: Your Name <your.name@yourdomain.com>
Date:   Tue Feb 5 16:35:40 2013 +1100

    Initial commit of stats.py

OK, now go ahead and modify stats.py to calculate and print the standard deviation of the values read in:

# Calculate the sum, average, and standard deviation
# of the values in a text file, formatted as one value per line.

import sys
import math

values = []
for line in open(sys.argv[1]):
    value = float(line)
    values.append(value)

total = sum(values)
average = total / len(values)

# Sum the squared differences from the average
diffSquared = []
for value in values:
    diff = value - average
    diffSquared.append(diff**2)

# Sample standard deviation (divides by n - 1)
stdDev = math.sqrt(sum(diffSquared) / (len(values) - 1))

print len(values), 'values were read in'
print 'The sum of the input values is:', total
print 'The average of the input values is:', average
print 'The standard deviation of the input values is:', stdDev
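One way to convince yourself that the calculation is right is to compare it with the standard library. The sketch below is separate from stats.py and assumes Python 3.5 or later (for the statistics module and math.isclose), unlike the Python 2 print statements used above; the function name sample_std_dev and the data values are made up for illustration.

```python
import math
import statistics


def sample_std_dev(values):
    """Sample standard deviation, dividing by n - 1 as stats.py does."""
    average = sum(values) / len(values)
    diff_squared = [(value - average) ** 2 for value in values]
    return math.sqrt(sum(diff_squared) / (len(values) - 1))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up example values
ours = sample_std_dev(data)
reference = statistics.stdev(data)  # also uses the n - 1 denominator

print(ours, reference)
assert math.isclose(ours, reference)
```

Because the denominator is n - 1, both this sketch and stats.py need at least two input values; a file with a single value would cause a division by zero.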

When you are happy that this update is working correctly, and you have checked your changes with git status and git diff, go ahead and commit the change:

$ git add stats.py

$ git commit -m"modified stats.py to also calculate and print out the standard deviation"
[std_dev 9b7129e] modified stats.py to also calculate and print out the standard deviation
 1 files changed, 15 insertions(+), 5 deletions(-)

Let's have a look at what a node diagram for our repository looks like now:

_images/node_02_std_dev_branch.png

After checking that you have a clean working directory, use git checkout to check out the various versions of stats.py in your repository: the master and std_dev branches, as well as the sum and sum_and_average tags.

Deleting a branch

Sometimes a branch we have been working on is no longer wanted. Perhaps we had a flash of brilliance about a new algorithm that did not work out in practice, or the branch has already been successfully merged into another branch.

The normal command to delete a branch is git branch -d <branch name>. However, git takes care to ensure you don’t delete a branch accidentally: it will refuse, with an error message, to delete a branch that has not been merged into the current branch. If you really want to delete a branch that has not been merged, use git branch -D <branch name>.
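The idea behind this safety check can be sketched in Python. This is a simplified model and not git's actual code: a branch counts as merged when its tip commit is reachable from the tip of the current branch. The commit IDs A to D and the names parents, branches, is_ancestor and safe_to_delete are all hypothetical.

```python
# Toy model: each commit ID maps to a list of parent IDs (hypothetical IDs).
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],   # tip of master
    "D": ["C"],   # tip of std_dev, one commit ahead of master
}
branches = {"master": "C", "std_dev": "D"}


def is_ancestor(ancestor, commit_id):
    """Return True if ancestor is commit_id or reachable from it via parents."""
    stack = [commit_id]
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def safe_to_delete(branch, current_branch):
    """Allow deletion only when the tip of `branch` is already reachable
    from the tip of `current_branch`, so no commits would be lost."""
    return is_ancestor(branches[branch], branches[current_branch])


print(safe_to_delete("std_dev", "master"))  # std_dev has an unmerged commit
print(safe_to_delete("master", "std_dev"))  # master is contained in std_dev
```

In this model, deleting std_dev while on master would be refused because commit D exists only on std_dev, which is the situation where git branch -D would be needed.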