February 13, 2005
Version Control with Subversion
As noted in recent posts on this site, controlling code is an essential part of software development. One widely used version control tool is the Concurrent Versions System (CVS). CVS is the de facto standard version control system in the open-source software development world, managing large, global projects such as Linux, Apache and Mozilla.
The team of developers at J&R Consulting has been using CVS for several years to meet the version control needs of our projects. One of our repositories now contains over 2000 files. I shudder to think what a mess the project would be without a tool to manage such large amounts of information.
CVS Benefits
Three of the primary benefits of using CVS are:
1. Retrieval of historical files
Every time a file is modified, CVS records the change and keeps the history needed to reproduce any version of that file at any time. So if a developer realizes that a recent modification has had adverse effects on the system, a previous version of that code can be retrieved.
2. Software release
CVS can be used to control the release of software. A project can be tagged at a given point in time and delivered to the customer - for example as a prototype of a system being built or as an early phase of a multi-phase project. At any point in the future, all the files that have made up that release can be retrieved.
3. Merging of simultaneous development efforts
Many software development projects involve the efforts of multiple developers. CVS can be used to automatically merge the contributions of more than one developer working on the same file. Our project has used this feature several times, saving time that would have been spent manually merging contributions and decreasing the risk of mistakes.
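To illustrate what an automatic merge involves, here is a minimal Python sketch of a line-based three-way merge. It is a simplified illustration, not the code CVS actually runs: the function names `changes` and `merge` are my own, and the sketch refuses to merge whenever two developers edit the same or adjacent lines.

```python
from difflib import SequenceMatcher


def changes(base, edited):
    """List the (start, end, replacement) edits that turn base into edited."""
    matcher = SequenceMatcher(None, base, edited, autojunk=False)
    return [
        (i1, i2, tuple(edited[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def merge(base, mine, theirs):
    """Combine two developers' edits to base; raise ValueError on a collision."""
    my_edits = changes(base, mine)
    their_edits = changes(base, theirs)
    for s1, e1, new1 in my_edits:
        for s2, e2, new2 in their_edits:
            same = (s1, e1, new1) == (s2, e2, new2)
            apart = e1 < s2 or e2 < s1  # at least one untouched line between
            if not same and not apart:
                raise ValueError(f"conflict near line {max(s1, s2) + 1}")
    merged = list(base)
    # Apply edits from the bottom up so earlier line numbers stay valid.
    for start, end, new in sorted(set(my_edits) | set(their_edits), reverse=True):
        merged[start:end] = new
    return merged
```

Given a common ancestor and two edited copies, `merge(base, mine, theirs)` returns the combined file when the edits are far enough apart, and raises an error so a person can step in when they are not.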
CVS Limitations
While CVS has served its purpose well, it has drawbacks that have been a source of frustration to members of our team. The biggest complaint by team members has been the inability to rename files and directories. We have encountered several situations where, partway through the development of a system, the folder names and/or structures that were defined at the beginning of the project turned out to be inadequate. CVS doesn't allow for the renaming of folders or moving files from one location to another. Instead, CVS recommends that you delete the folder or file in question and re-add it to the desired place in the repository. This has the dreadful side effect of losing all of the history of the moved files.
Another problem we've had with CVS is its inability to merge binary files. When a change is made to a plain text file, only the changes are stored on the server. Using diff algorithms, CVS can recreate any version of the file without needing to store every single version. Further, file versions that have gone down diverging paths can be automatically merged back to a main trunk.
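The idea of storing only the changes can be sketched in a few lines of Python. This is an illustration of delta storage in general, not CVS's actual storage format: the names `make_delta`, `apply_delta` and `rebuild` are hypothetical, and the sketch keeps the first version in full and replays deltas forward.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe new as copies of line ranges from old plus the lines that changed."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))     # reuse lines old[i1:i2]
        elif j2 > j1:
            delta.append(("add", new[j1:j2]))  # store only the changed lines
    return delta


def apply_delta(old, delta):
    """Rebuild the newer version from the older one and a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def rebuild(first_version, deltas, number):
    """Rebuild version `number` (1-based) by replaying deltas in order."""
    lines = list(first_version)
    for delta in deltas[: number - 1]:
        lines = apply_delta(lines, delta)
    return lines
```

With the first version and a list of deltas on hand, `rebuild(first_version, deltas, 3)` recreates the third version without any full copy of it ever having been stored.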
However, much of the code our team works with is binary, even in its "source" form. For example, both the source file and compiled file of an Oracle Form or Oracle Report are binary. Because the methods CVS uses to compare and merge files are line-based, it can only store full copies of a binary file. Every time we make even the simplest update to a form or report and commit the change to CVS, the entire new file must be uploaded and added to the server. This is obviously a waste of space and bandwidth. What's worse, changes that have already been made to a form or report on a diverging branch must be made again by hand.
The last complaint I will mention about CVS is the potential for selecting the incorrect version of a file for inclusion in a release. Our team has used several documents and tools to record file version numbers for important lifecycle phases (for example, to record versions of files that have undergone unit testing). When it comes time to release files, it is a painstaking process to reconcile the file versions in the repository with the tracking documents. Because our repositories are disconnected from the rest of our tools, it is possible that an incorrect file version could be tagged and released from the repository.
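The reconciliation step described above amounts to comparing two lists of file versions. A minimal Python sketch follows, assuming both lists have already been loaded into dictionaries that map a file path to a revision string. The function name, the dictionary layout and the sample paths are all hypothetical and are not taken from our actual tracking documents.

```python
def find_mismatches(approved, to_release):
    """Return {path: (approved revision, revision about to be released)}
    for every file where the two disagree. None marks a missing entry."""
    problems = {}
    for path in sorted(set(approved) | set(to_release)):
        expected = approved.get(path)
        actual = to_release.get(path)
        if expected != actual:
            problems[path] = (expected, actual)
    return problems


# Hypothetical sample data, for illustration only.
approved = {"forms/orders.fmb": "1.4", "reports/invoice.rdf": "1.2"}
to_release = {"forms/orders.fmb": "1.4", "reports/invoice.rdf": "1.3"}

for path, (expected, actual) in find_mismatches(approved, to_release).items():
    print(f"{path}: approved {expected}, about to release {actual}")
```

An empty result means the release matches the tracking record. Anything else is a file to check before tagging.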
CVS Replacement
Fortunately, there is a team of developers out there who share these same frustrations and have chosen to take matters into their own hands. Enter Subversion, whose goal is to be a "compelling replacement for CVS". Subversion supports everything [good] that CVS supports and improves on the negative aspects of CVS. From an end-user perspective, Subversion is very similar to CVS, and someone who is already familiar with CVS will feel right at home with Subversion. Best of all, Subversion can be used to address each of the specific complaints that our team has had with CVS.
First, Subversion supports directory and file renaming. And it not only supports the actions, it even versions the changes. Gone are the days when we are forced to live with an awkward organization of our code.
Second, Subversion handles binary files just as easily as it handles text files. It uses a binary diff algorithm, and is therefore much more efficient at transmitting and storing data. We will evaluate whether binary merges can actually be used in practice once we begin using the tool.
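To show why a binary diff saves space, here is a small Python sketch that describes a new file as byte ranges copied from the old file plus the bytes that are new. It is not Subversion's actual algorithm or wire format: the function names are hypothetical, and the standard library's `difflib` is used only because it keeps the example short (it would be far too slow for real files).

```python
from difflib import SequenceMatcher


def binary_delta(old: bytes, new: bytes):
    """Describe new as copies of byte ranges from old plus literal new bytes."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))  # offset and length in old
        elif j2 > j1:
            delta.append(("data", new[j1:j2]))   # bytes that must be sent
    return delta


def apply_binary_delta(old: bytes, delta) -> bytes:
    """Rebuild the new file from the old file and a delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(delta) -> int:
    """Count the bytes the delta has to carry itself."""
    return sum(len(op[1]) for op in delta if op[0] == "data")
```

For a small edit to a large file, `literal_bytes(delta)` is a small fraction of the file size, which is the saving in storage and bandwidth that full-copy storage of binaries gives up.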
Finally, Subversion may be useful in addressing the issues we've had with our version tracking tools being segregated from our repositories. Subversion has strong support for metadata through "properties". Properties can be attached to each file and folder in a repository, providing key-value pairs of information on that object.
This functionality has far-reaching uses. It could, for example, integrate our version tracking system with our repository. The version tracking data could quite literally be stored right alongside the actual files that make up the repository. And the combined versioning and tracking data can be output to formats that allow for extended functionality - for example XML. Someday, I would love to explore the implementation of an XML-based tracking and release system that reduces the chances of deploying incorrect versions of a file.
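As a rough sketch of that idea, the Python below treats each file's properties as a dictionary of key-value pairs and writes them out as XML. Everything specific here is an assumption made for illustration: the property names (such as `tracking:unit-tested`), the XML element names and the sample paths are hypothetical, and in practice the values would be read from the repository rather than typed in.

```python
import xml.etree.ElementTree as ET


def properties_to_xml(release_name, file_properties):
    """Serialize {path: {property name: value}} as an XML release manifest."""
    root = ET.Element("release", name=release_name)
    for path in sorted(file_properties):
        file_el = ET.SubElement(root, "file", path=path)
        for key, value in sorted(file_properties[path].items()):
            prop = ET.SubElement(file_el, "property", name=key)
            prop.text = value
    return ET.tostring(root, encoding="unicode")


def not_unit_tested(file_properties, key="tracking:unit-tested"):
    """List files whose (hypothetical) tracking property is not set to 'yes'."""
    return sorted(
        path for path, props in file_properties.items()
        if props.get(key) != "yes"
    )


# Hypothetical sample data, for illustration only.
sample = {
    "forms/orders.fmb": {"tracking:unit-tested": "yes"},
    "reports/invoice.rdf": {"tracking:unit-tested": "no"},
}
print(properties_to_xml("phase-1", sample))
print(not_unit_tested(sample))
```

Because the tracking data travels with the files, a release script could refuse to proceed while `not_unit_tested` still returns anything.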
I look forward to the coming days when our team evaluates Subversion and the ways that it strengthens our version control processes.
Posted by Rob Sullivan at February 13, 2005 09:53 AM