Migrating open source code repositories from CVS to GitHub
Removal Helpers
GitHub makes it easier for programmers to contribute to open source projects by simplifying and accelerating communications between project maintainers and people willing to contribute.
If you have ever tried sending a patch to an open source project, you will be familiar with the obstacles that can put off even the most motivated of developers. For example, you need to discover the project maintainer's email address and negotiate the various levels of moderation. If somebody in a position to make decisions actually finds the time to take a look, the patch format might be wrong, or the patch might collide with another, unpublished change.
Reducing the PITA Factor
GitHub's aim is to reduce the PITA (pain in the ass) factor [2]. Public code repositories that are hosted on GitHub use the branch- and merge-friendly Git revision control system that allows the open source community to apply changes, test locally, and – if successful – flow the code into the original project with little pain.
Creating a fork – spawning a copy of an open source project – is not a sneaky move on github.com. Here, forks are not a means of taking control of a project, but the recommended way of developing and testing new features and, finally, asking the project maintainer to add them to the main branch of the project.
GitHub hosts open source projects free of charge for public access, offering 300MB of disk space per developer. If you want to use the service for a non-public project, GitHub offers a number of commercial plans, which provide the small startup and its handful of developers with a steady source of income.
From CVS to Git
Listing 1, cvs2github, helps convert CVS repositories to Git and prepare them for publication on github.com. Git with the git-cvs add-on package already includes an import function. All cvs2github needs to do is use rsync to download the CVS repository and the CVSROOT directory from wherever it is being hosted to the local machine and then call git-cvsimport.
For this to happen, line 16 of the script creates a temporary directory in which rsync stores the local copy of the server's CVS repository files. The CVS server name and account information are stored in line 24. The directory in which the Git repository finally lands is set in line 20 with the $git_dir variable. Because developers create entries in the CVS repository with their Unix IDs but use a different ID on GitHub, lines 30 through 32 map the old Unix usernames to new GitHub IDs with an email address, using an author conversion file.
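As a minimal sketch of the author conversion file, the following snippet builds its text from a hash instead of a here document. The function name author_map_text and the data layout are my own illustration, not part of the listing; the line format (CVS ID, equals sign, name, and email address in angle brackets) is the one git-cvsimport expects with its -A option.

```perl
use strict;
use warnings;

# Build the text of an author conversion file for git-cvsimport -A.
# Each output line has the form:  cvs_id=Name <email>
# (author_map_text is a hypothetical helper, not part of cvs2github.)
sub author_map_text {
    my (%map) = @_;
    return join "", map {
        my ( $name, $email ) = @{ $map{$_} };
        "$_=$name <$email>\n";
    } sort keys %map;
}

# Two old CVS IDs mapped to one GitHub ID
print author_map_text(
    mschilli => [ 'mschilli', 'githubemail@mydomain.com' ],
    perlmeis => [ 'mschilli', 'githubemail@mydomain.com' ],
);
```

Keeping the mapping in a hash makes it easier to add further developers later; the script itself only needs the resulting text written to a file.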
Listing 1
cvs2github
01 #!/usr/bin/perl -w
02 #############################
03 # cvs2github - Turn cvs repos
04 # to github
05 # 2009 (m@perlmeister.com)
06 #############################
07 use strict;
08 use Getopt::Std;
09 use Pod::Usage;
10 use File::Temp
11   qw(tempdir tempfile);
12 use Sysadm::Install qw(:all);
13
14 my ($proj) = @ARGV;
15 my ($temp_dir) =
16   tempdir( CLEANUP => 1 );
17 my ( $fh, $author_conv_file )
18   = tempfile( UNLINK => 1 );
19 my ($home) = glob "~";
20 my $git_dir = "$home/DEV";
21
22 my $email =
23   'githubemail@mydomain.com';
24 my $cvs_loc = 'mikeschilli@some.cvs.server';
25 my $github_loc =
26   'git@github.com:mschilli';
27
28 blurt(
29   <<EOT, $author_conv_file );
30 mschilli=mschilli <$email>
31 perlmeis=mschilli <$email>
32 mikeschilli=mschilli <$email>
33 EOT
34
35 pod2usage("No project given")
36   unless defined $proj;
37
38 my $git_proj_name =
39   lc($proj) . "-perl";
40 my $git_path =
41   "$git_dir/$git_proj_name";
42
43 if ( -e $git_path ) {
44   die "Path $git_path already exists";
45 }
46
47 mkd $git_path;
48
49 for my $cvs_dir ( $proj,
50   "CVSROOT" )
51 {
52   sysrun(
53     "RSYNC_RSH=/usr/bin/ssh ".
54     "rsync -avz $cvs_loc:cvs".
55     "/$cvs_dir $temp_dir/");
56 }
57
58 cd $git_path;
59
60 sysrun( "git-cvsimport -A " .
61   "$author_conv_file -d $temp_dir $proj");
62
63 sysrun(
64   "git remote add origin " .
65   "$github_loc/$git_proj_name.git"
66 );
67
68 print
69   "Done: $git_proj_name\n";
70
71 __END__
72
73 =head1 NAME
74
75 cvs2github - Convert cvs projects to git
76
77 =head1 SYNOPSIS
78
79     cvs2github My-Project-Name
80
81 =head1 DESCRIPTION
82
83 cvs2github takes a project
84 checked into cvs and converts
85 it into a git repo ready for
86 github.
87
88 =head1 EXAMPLES
89
90     $ cvs2github Foo-Bar
Developer Names
Here, I am mapping three different IDs (mschilli, perlmeis, mikeschilli) to a single new GitHub ID, mschilli. If multiple developers have worked on a project, you need to convert all of their IDs to new GitHub IDs. The blurt function from the CPAN Sysadm::Install module stores the lines in a temporary file, which git-cvsimport accepts for mapping via the -A option.
Because cvs2github is intended for Perl modules, lines 38 and 39 convert project names to lowercase letters and add a -perl suffix; hence, Log-Log4perl becomes log-log4perl-perl. This is in line with the Debian naming scheme, keeps the namespace on GitHub clean, and helps avoid clashes between original projects and their Perl ports or wrappers.
Line 52 calls rsync twice: once to copy the project's CVS repository to the local machine and once for the CVSROOT metadata directory, because git-cvsimport requires both to make the transformation to Git. The RSYNC_RSH environment variable is set to the SSH client program on the local machine (ssh) because the client communicates with the server-side repository via SSH. A similar approach exists for SourceForge projects [3].
Importing the CVS repository in line 60 creates a Git repository below the directory specified in line 20 ($HOME/DEV/log-log4perl-perl in this case). The git remote add command called in line 63 registers the GitHub project to be created as the remote named origin, Git's default. Git uses this later to synchronize the local copy and the GitHub server version via push and pull.
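To check the naming rule and the generated commands before touching a real repository, a dry run can help. The following sketch only assembles the strings that the script would pass to the shell and runs nothing. The plan function, its option names, and the example paths are assumptions of mine for illustration; cvs_root stands for the complete user@host:path prefix of the remote CVS tree.

```perl
use strict;
use warnings;

# Dry-run sketch: derive the names and shell commands for one project
# without running rsync or git. The function and option names are
# illustrative and not part of Listing 1.
sub plan {
    my ( $proj, %opt ) = @_;

    # Same naming rule as the listing: lowercase plus "-perl"
    my $name = lc($proj) . "-perl";

    # One rsync call per directory: the project itself, then CVSROOT
    my @rsync;
    for my $dir ( $proj, "CVSROOT" ) {
        push @rsync, "RSYNC_RSH=/usr/bin/ssh rsync -avz "
          . "$opt{cvs_root}/$dir $opt{temp_dir}/";
    }

    return {
        name   => $name,
        path   => "$opt{git_dir}/$name",
        rsync  => \@rsync,
        import => "git-cvsimport -A $opt{authors} "
          . "-d $opt{temp_dir} $proj",
        remote => "git remote add origin "
          . "$opt{github_loc}/$name.git",
    };
}

# Hypothetical paths and account data, for illustration only
my $plan = plan(
    "Log-Log4perl",
    cvs_root   => 'mikeschilli@some.cvs.server:cvs',
    temp_dir   => "/tmp/cvs-copy",
    git_dir    => "/home/mike/DEV",
    authors    => "/tmp/authors.txt",
    github_loc => 'git@github.com:mschilli',
);

print "$_\n"
  for @{ $plan->{rsync} }, $plan->{import}, $plan->{remote};
```

Printing the commands first makes it easy to spot a wrong server path or project name before rsync starts copying data.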