Version Control Systems
Recently, I gave a presentation to my research group at the university on version control systems, specifically on why and how to use Git. Coming from a grad student, that may sound odd, so here’s some context. While the Geomatics department involves a non-trivial amount of programming, our program does not cover many software engineering practices or techniques. Certainly one of the greater tragedies of the department is that there’s simply not enough time to teach everybody both the underlying mathematics that we use day-to-day and how to organize and write software well. Since I have some experience with version control software, I decided it might be useful to present some of this knowledge to the other members of my group. I received some really positive feedback from the presentation, and decided to make a post about it, if only for future reference.
A tale of two developers
Before discussing what a version control system (VCS) does, it’s pertinent to consider why we might want one. If you’ve never used one before, you may never have identified or explicitly enumerated some of the experiences below. One of the biggest issues I find when convincing beginners to adopt a VCS is that they find it too much work, and don’t appreciate the differences in mindset or workflow that a VCS provides. With that in mind, I’m going to start with two tales: those of Joe Blub and Mary Blub.
Meet Joe Blub. Joe is a software developer, and although he sometimes makes mistakes, he is considered a very strong developer at his employer, BlubCo. Joe works alone on software that’s internal to the company, and his projects are small enough that he can maintain them alone, but large enough that he can’t keep every line of code on his screen or in his head at any given moment.
When it comes to developing software, nobody tries to stay on the bleeding edge like Joe does. He’s set himself up with a snazzy new text editor, and by pressing a key combination as simple as Control+Z, he can undo mistakes and walk back through his edit history. Joe also doesn’t keep many copies of his code around. When he edits a file, he doesn’t make a new copy each time. Instead, he saves over the previous copy once he’s done making his edits. After all, “Why needlessly copy your source code outside of the regularly scheduled backups?”, Joe always says. He doesn’t want the clutter of extra files on his computer, and so far, this has been working for him. After a long week of work, Joe goes out with his BlubCo peers and takes a well-deserved trip to the pub.
Unfortunately, when Joe returns on Monday morning, he notices that there has been a power outage at the BlubCo offices. Joe is nervous, because his desktop shut down unexpectedly, but he eventually opens up what he was working on last Friday. Much to his dismay, his software is no longer working. Moreover, he had been making big refactoring changes on Friday. As the cherry on top, Joe can no longer remember exactly what he had changed before he left on Friday evening, and the power outage has wiped out his edit history. Joe doesn’t know what to do.
Let’s shift gears to Mary’s story. Mary is a software developer at BlubCo just like Joe, but unlike Joe, Mary works in a team. Although this is Mary’s first job as a senior developer, she is noticing some serious shortcomings in how her team is organized. Specifically, coordination has become a major issue amongst her team members, as they are always passing modified code around the office. It seems like chaos, but work still somehow gets done by the end of the day. Here are some of the ways her team works:
Firstly, Mary’s team is set up so that only one person can edit the codebase at a time. They’ve noticed that if two or more people edit the code at the same time, they often run into conflicts. Two developers change the same file, and it becomes very difficult to decide whose edits are kept. For this reason, their team has shifted to allowing only one person to change things at a time. This means that for most of the day, most of Mary’s otherwise talented team sits idle, unable to move on to more productive things on their own.
To distribute code, Mary’s coworker has set up a file sharing service. They share their code by passing zip files of each version around, and use the file name to show which edits have been made recently. This helps Mary keep track of who made what change last, so she can ask for help if the software encounters problems. Currently, the zip file looks like this:
Where did it all go wrong?
Beyond these two stories, the future that unfolds is somewhat bleak. BlubCo, while hiring lots of talented engineers and developers, is knee-deep in a swamp of organizational problems. Let’s analyse some of the outcomes of Joe’s and Mary’s stories, starting with Joe again:
- After fixing the mess later that week, Joe’s attitude about his development shifts dramatically. He begins to dread refactoring his code, and fears any changes that he might make in the future.
- Joe reconsiders what he previously said about not making extra copies of files. Every time he wants to make a major change, he makes a new copy of the file in question.
- Eventually, he ends up commenting out sections of the code he isn’t using at the moment, just in case. His code becomes much more difficult to read and write, and he’s often left wondering whether a commented-out piece of code should stay commented out.
Joe has encountered some key problems here. What about Mary?
- Mary’s team still works as described above, but they’re incredibly slow. What’s more, because Mary constantly coordinates her team to avoid conflicts at all costs, she’s had to increase her budget, only to waste it on the additional bureaucracy.
- Mary’s peers are always in need of the latest version of the software. Changes they make to other versions sometimes get shipped to clients, but aren’t always present in newer versions. This causes Mary no end of pain, because she often finds old bugs resurfacing in the current code.
All that said, I’ve tried to lay out some of the key issues above. The first and foremost is developer fear. I see this often among my own peers, and it’s something I’ve wanted to discuss for some time. Scrapping, adding, or moving a piece of code should not be something you live in fear of.
The attitude itself is prevalent in all kinds of engineering, but hesitating to throw out what doesn’t work causes larger problems down the road. Perhaps Kent Beck or some of the Test-Driven Development guys can explain this better than I can, but being afraid of refactoring is one of the fastest ways to end up with a brittle codebase. But this isn’t about TDD. This is about versions. This is about not being afraid to try new things, and about not worrying that the version you had yesterday is going to disappear.
The next big issue is coordination and consistency. When I started my final-year undergrad project, our group was forced to use Git and GitHub as a means to document, organize, and evaluate our progress. Although I had some experience with Git at the time, I still struggled to set up a workflow the team could use without issues, and everyone dreaded it. However, in the second semester of the year-long project, something amazing happened. We became far better organized, and I no longer had to know what each individual was doing at a given time. Conflicts were resolved when they happened, and no effort was needed to try and prevent them beforehand.
All this was in contrast to another group in my year, who had no version control system and wrote their software not as a team, but as fragmented individuals working on different parts of the same project. Their organizational issues took hold very late in the project, and there was hell to pay when they tried to merge everything together. Fortunately they pulled through, but it took them much longer, and in my opinion, slowed them down irrecoverably.
Version Control Systems
Okay, so basically, we want to avoid developer fear, and we want to organize ourselves. Better yet, a formal system that helps us do this by alleviating the source of our fears and organizational issues would be a real boon. Effectively, we want a system with the following properties:
- Can track changes incrementally
- Allows us to revert changes incrementally
- Each modification should help define each version
- Versions should be transparent to a group
- Conflicts should only matter when they arise (not earlier)
This brings us to Git, my preferred VCS (or “the stupid content tracker”, as its own manual page calls it). There are plenty of other VCSs out there, such as Perforce, Subversion, and Mercurial. However, given Git’s distributed nature, how easy it makes branching, and the vast ecosystem of tools supporting it (compared with proprietary alternatives), I find it difficult to live without. Even the thought of needing to be connected to a server to commit changes disgusts me.
Okay, so I’m not going to give a full tutorial on how Git works, nor am I going to try to explain how rebasing works, or why and when you should use it. There are tons of great tutorials for learning Git proper, and I’ll list a few of them here:
You don’t need those for the remainder of this post, and maybe I’ll eventually post something interesting about Git, but for the most part the manual describes everything I use day-to-day. What I hope to do is show that Git satisfies the needs listed above.
Tracking Changes Incrementally
This one is easy. Git is all about tracking changes to files. If you make a change to a file, you can see the incremental changes directly using git diff. It works at any time, including after you commit: given two commits, it shows what changed between them. Using a service like Bitbucket or GitHub, you can even view these differences online. Take the following example:
The above is an example of one of my projects on Bitbucket. The differences in the files are shown plainly, and you can tell easily which lines were removed (the red ones) and which were added (the green ones). Each commit is incremental, and the VCS (Git in this case) helps you identify which changes happened when.
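To try this locally, here is a minimal sketch of that workflow. It assumes Git is installed and a POSIX shell is available. It creates a throwaway repository in a temporary directory. The file name notes.txt and the user name and email are made up for the demo.

```shell
set -e
# Work in a throwaway directory so nothing else on your machine is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for this demo
git config user.email "user@example.com"

printf 'first line\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Edit the file: git diff shows changes that are not yet committed.
printf 'first line\nsecond line\n' > notes.txt
git diff

# Commit the edit, then compare the two commits directly.
git commit -q -am "Add a second line"
git diff HEAD~1 HEAD
```

The first git diff shows uncommitted edits. The second compares two commits, which is the same kind of information Bitbucket or GitHub render in the browser.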
Revert Changes Incrementally
This one is a little more difficult to show graphically, but reverting a commit in Git is easy. Since each commit (or incremental change, if we’re using commits properly) has a unique hash-id, we can easily revert specific commits by using this identifier. To revert or undo a commit, we simply write:
$ git revert 777eac6
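777eac6 is the (abbreviated) hexadecimal hash-id of the commit you’re trying to undo. Rather than erasing that commit, git revert records a new commit that applies the opposite changes, so the history stays intact.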
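Here is a small, self-contained sketch of reverting, again in a throwaway repository. The file model.txt and its contents are hypothetical. The sketch looks up the short hash of the most recent commit, the same kind of identifier as 777eac6, and reverts it.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for this demo
git config user.email "user@example.com"

printf 'result = 1\n' > model.txt
git add model.txt
git commit -q -m "Initial model"

printf 'result = 999\n' > model.txt        # a change we will regret
git commit -q -am "Risky refactor"

# Abbreviated hash of the commit to undo (like 777eac6 in the text).
bad=$(git rev-parse --short HEAD)

# Record a new commit that reverses it. --no-edit skips the editor prompt.
git revert --no-edit "$bad"

cat model.txt          # back to: result = 1
git log --oneline      # three commits: the bad one is still in the history
```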
Each Modification Should Define Each Version
What I mean by this is that if we make a change, that technically constitutes a new version. Sure, it doesn’t have to be enumerated by some kind of versioning scheme, but logically the software is different, so the “version” of the software should change as well. Git handles this, as mentioned in the previous section, with unique hash-identifiers for each incremental change. The most important consequence of this is the history, which we can view using git log, or, if we’re using an online service such as Bitbucket, GitLab, or GitHub, online in a more… pretty way. Whether you view the history online is a matter of taste (some people love the command line), but no matter how you do it, it definitely improves your understanding of how the project evolves over time.
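The sketch below builds a short history and then prints it a few different ways. The commit messages and the log.txt file are invented for illustration. git log --oneline shows each version as an abbreviated hash plus a message. The --format and --date=short options let you choose which fields appear.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for this demo
git config user.email "user@example.com"

# Three edits, three commits, three versions.
for step in 1 2 3; do
  printf 'step %s\n' "$step" >> log.txt
  git add log.txt
  git commit -q -m "Step $step"
done

git log --oneline                                 # short hash + message per version
git log --format='%h %an %ad %s' --date=short     # hash, author, date, message
git log --oneline -- log.txt                      # only commits that touched log.txt
```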
Versions Should Be Transparent To A Group
Here, my intention is to say that it shouldn’t matter if I have a version at some commit (say 777eac6), and you have another (say 567eff2). If we both make changes to these versions and they later get merged into a common stream (in Git, this could be the master branch), the system should accept both of our changes unless some kind of conflict arises. This typically happens when both of us edit the same lines of the same file, but the details aren’t terribly important. Either way, I shouldn’t have to worry about which version my teammates have, as versions and changes should be incremental, and should ideally combine on their own, except in the edge cases where conflicts arise.
Git can achieve this through the use of branches and forks, alongside git merge. This allows your team to work independently without stepping on each other’s toes. It also gives you peace of mind, since you don’t have to keep track, in the back of your head, of which branch you’re on compared to your teammates. You keep track of your own branches, and I’ll keep track of mine.
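Here is a minimal sketch of two people working in parallel and merging into master. For simplicity, everything happens in one throwaway repository; in practice each person would have their own clone or fork. The branch names mary and joe and the file names are hypothetical. Because the two branches touch different files, Git merges them without asking anything.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the main line "master", as in the text
git config user.name "Example User"        # hypothetical identity for this demo
git config user.email "user@example.com"

printf 'shared setup\n' > README
git add README
git commit -q -m "Common starting point"

# "Mary" works on her own branch...
git checkout -q -b mary
printf 'solver code\n' > solver.txt
git add solver.txt
git commit -q -m "Mary: add solver"

# ...while "Joe" branches from the same starting point.
git checkout -q master
git checkout -q -b joe
printf 'plotting code\n' > plots.txt
git add plots.txt
git commit -q -m "Joe: add plots"

# Merge both streams into master. The edits don't overlap, so there are no conflicts.
git checkout -q master
git merge -q --no-edit mary
git merge -q --no-edit joe
git log --oneline --graph
```

Neither branch needed to know what the other was doing until the merge.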
Conflicts Should Only Matter When They Arise
As an extension of the last case, let’s throw away our fear of conflicts. They don’t matter until they happen, and at that point we can work together to resolve them. Merging branches can easily lead to lots of merge conflicts, but while we’re editing code, we don’t have to tiptoe around sections of it because somebody else might want to change something close by. Git does a great job of helping you with merge conflicts, and doesn’t nag you about them before you decide to merge.
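To show that conflicts only surface at merge time, the sketch below has two branches change the same line of a hypothetical config.txt. Both commits succeed without complaint, and only git merge stops to report the conflict. The resolution step here simply picks one value. It stands in for the conversation you would have with your teammate.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the main line "master", as in the text
git config user.name "Example User"        # hypothetical identity for this demo
git config user.email "user@example.com"

printf 'tolerance = 0.01\n' > config.txt
git add config.txt
git commit -q -m "Add config"

# Two people change the same line on different branches. Git doesn't object yet.
git checkout -q -b feature
printf 'tolerance = 0.001\n' > config.txt
git commit -q -am "Tighten tolerance"
git checkout -q master
printf 'tolerance = 0.05\n' > config.txt
git commit -q -am "Loosen tolerance"

# Only now does Git report the conflict. The merge exits non-zero, so test it with if.
if ! git merge --no-edit feature; then
  cat config.txt       # both versions, between <<<<<<<, ======= and >>>>>>> markers
  # Resolve by hand (here: keep one value), mark it resolved, and finish the merge.
  printf 'tolerance = 0.001\n' > config.txt
  git add config.txt
  git commit -q --no-edit
fi
```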
Hopefully this post helps convey the importance of version control, and gives you a new perspective on why you may want to adopt a VCS in your team, organization, or workflow. Common objections I hear often stem from the idea that version control will slow you down, is only necessary for large teams, or somehow makes it harder to get things done by complicating the process.
In general, I would argue that getting rid of developer fear, together with the enhanced organization and collaboration that a VCS brings to the table, actually speeds up your development overall. In large part, not being able to freely refactor, edit, modify, or otherwise create and scrap changes and ideas in the code will lead you down the path of Joe and Mary Blub. Sure, they’re smart and talented developers, but the pitfalls they faced were not for lack of talent, merely for lack of organization.
Had Joe and Mary used a VCS such as Git to its fullest extent, they surely could have overcome many of the problems they faced. At one point, I saw myself in Joe, and eventually found myself in Mary’s shoes as well. Now, I can guide others towards becoming more organized developers, or at least, that’s the idea. Don’t live in fear of your code. Don’t try to micromanage your peers. Just Git to it. :-)