Re: To diff or not to diff

From: ··········@tfeb.org
Subject: Re: To diff or not to diff
Date: Wed, 18 Aug 2004 11:11:19 +0000
Message-ID: <cfvdgn$irn@odah37.prod.google.com>

Ron Garret wrote:
> I am starting a multi-developer project and have come to the conclusion
> that none of the existing revision control systems meet my needs, so
> I've decided to write my own (in Lisp, of course).  There's a major
> design decision that I'm having trouble making: should I store revision
> chains as full files, or as diffs?  Storing original files uses more
> storage but makes the coding simpler.  Storing diffs makes the coding
> somewhat more complicated, but feels more elegant and less wasteful.
> Still, disk space is cheap, and redundancy is not necessarily a bad
> thing.  I thought I'd ask if anyone here had any experience with this
> sort of thing and had an opinion one way or the other.
>

I think this is a long way below the level of things you want to worry
about when designing a revision control system.  It was a big deal for
things like RCS and SCCS which really only dealt with things at the
individual file level.  It was a big deal because they had to run on
very slow machines with very limited disk space, and also because they
only did rather limited things.  Even then, I should think that safety
issues (locking and so on) were a larger issue. Modern systems do a
lot more, and although they may seem overcomplicated, most of the
things they do actually count for multi-developer projects. Issues
like support for parallel development without locks (everything?),
support for restructuring source trees (notably not CVS), atomic
commits (not CVS), merge tracking (not CVS or, shockingly, subversion)
and so forth are a lot harder to get right and more important.

If I was designing a revision control system then (a) I wouldn't, and
(b) I'd abstract the storage away behind an API, so I could change my
mind later.
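
As a rough illustration of point (b), here is a minimal Python sketch of
what hiding the storage behind an API might look like.  The names
RevisionStore, FullFileStore and DeltaStore are hypothetical, not taken
from any existing system, and the delta format is just what Python's
difflib happens to make easy, not RCS's or anyone else's.

```python
from abc import ABC, abstractmethod
from difflib import SequenceMatcher


class RevisionStore(ABC):
    """Hypothetical interface: callers never see how revisions are kept."""

    @abstractmethod
    def add(self, lines):
        """Store a new revision (a list of lines); return its number, from 0."""

    @abstractmethod
    def get(self, rev):
        """Return the full contents of revision `rev` as a list of lines."""


class FullFileStore(RevisionStore):
    """Keeps every revision whole: simple code, more storage."""

    def __init__(self):
        self._revs = []

    def add(self, lines):
        self._revs.append(list(lines))
        return len(self._revs) - 1

    def get(self, rev):
        return list(self._revs[rev])


class DeltaStore(RevisionStore):
    """Keeps the first revision whole and a forward delta for each later one."""

    def __init__(self):
        self._base = None
        self._deltas = []  # one delta per revision after the first
        self._head = None  # latest revision, kept to compute the next delta

    def add(self, lines):
        lines = list(lines)
        if self._base is None:
            self._base = lines
        else:
            ops = SequenceMatcher(a=self._head, b=lines, autojunk=False).get_opcodes()
            # Record only the changed regions: old range plus replacement lines.
            self._deltas.append(
                [(i1, i2, lines[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]
            )
        self._head = lines
        return len(self._deltas)

    def get(self, rev):
        if self._base is None or not 0 <= rev <= len(self._deltas):
            raise IndexError(rev)
        lines = list(self._base)
        for delta in self._deltas[:rev]:
            # Apply back to front so the earlier indices stay valid.
            for i1, i2, new in reversed(delta):
                lines[i1:i2] = new
        return lines
```

Code written against RevisionStore does not care which subclass it is
given, so the full-files-versus-diffs decision can be revisited later
without touching the rest of the system.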

And seriously, I wouldn't.  There are loads of competent systems out
there, and you don't have to use all of their features.  If you don't
want the stuff that CVS doesn't do (or does very badly), for instance,
then you can just use the things it does well, like versioning sets of
files with some gentle branching support. Or subversion, or any of a
myriad others.

The only reasons I can think of to write such a system now are:
curiosity (perfectly valid, of course), intending to write a big
competent system to fix problems with existing ones (subversion, say,
meta-CVS too?), because you have a radical new idea on how such a
system should work, or because reinventing wheels helps you avoid doing
real work.

--tim

From: Ron Garret
Subject: Re: To diff or not to diff
Date: Wed, 18 Aug 2004 16:14:59 +0000
Message-ID: <rNOSPAMon-8D16D8.09145818082004@nntp1.jpl.nasa.gov>

In article <··········@odah37.prod.google.com>,
 ···········@tfeb.org" <··········@tfeb.org> wrote:

> If I was designing a revision control system then (a) I wouldn't, and
> (b) I'd abstract the storage away behind an API, so I could change my
> mind later.
> 
> And seriously, I wouldn't.  There are loads of competent systems out
> there, and you don't have to use all of their features.  If you don't
> want the stuff that CVS doesn't do (or does very badly), for instance,
> then you can just use the things it does well, like versioning sets of
> files with some gentle branching support. Or subversion, or any of a
> myriad others.
> 
> The only reasons I can think of to write such a system now are:
> curiosity (perfectly valid, of course), intending to write a big
> competent system to fix problems with existing ones (subversion, say,
> meta-CVS too?), because you have a radical new idea on how such a
> system should work, or because reinventing wheels helps you avoid doing
> real work.

Sound advice on all counts.  Just FYI, the reasons I was considering 
rolling my own were:

1.  I thought CVS was the state of the art in open-source revision 
control.  (In retrospect I really should have known better.  As a Lisp 
programmer, assuming that what everyone uses is the best available is 
obviously a mistake.)

2.  I had some experience in recreating RCS (or at least the subset of 
RCS that I wanted to use), and found it was pretty easy.

3.  My requirements are actually pretty minimal, but very particular.  I 
want to be able to:

a) Checkpoint a file (without making it visible to others)
b) Roll back to a previous checkpoint
c) Push my changes up so that they are visible to others
d) Pull other people's pushed changes down so I can see them

CVS does (c) and (d) but not (a) and (b).  You can sort of emulate a-d 
with only c and d using branches, but there seems to be pretty universal 
consensus that CVS's branching model is badly broken.

I thought (and still think) that "diff3 -m" plus a little bookkeeping 
added to srcs is all I really need.
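
To make that concrete, here is a rough Python sketch of the bookkeeping
for (a) through (d) on a single file.  The Shared and Workspace classes
and the merge3 helper are hypothetical names made up for illustration;
merge3 assumes GNU diff3 is installed and on the PATH, and everything
is kept in memory rather than on disk.

```python
import subprocess
import tempfile
from pathlib import Path


def merge3(mine, base, theirs):
    """Three-way merge via `diff3 -m`; assumes GNU diff3 is on the PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, text in (("mine", mine), ("base", base), ("theirs", theirs)):
            path = Path(tmp) / name
            path.write_text(text)
            paths.append(str(path))
        result = subprocess.run(["diff3", "-m", *paths], capture_output=True, text=True)
    if result.returncode > 1:  # 0 = clean, 1 = conflicts marked in output, 2 = trouble
        raise RuntimeError(result.stderr)
    return result.stdout


class Shared:
    """The published history of one file, visible to every developer."""

    def __init__(self, text=""):
        self.revisions = [text]


class Workspace:
    """One developer's private view of that file."""

    def __init__(self, shared):
        self.shared = shared
        self.base = len(shared.revisions) - 1  # shared revision last synced with
        self.text = shared.revisions[self.base]
        self.checkpoints = [(self.text, self.base)]  # private to this workspace

    def checkpoint(self):  # (a)
        self.checkpoints.append((self.text, self.base))
        return len(self.checkpoints) - 1

    def rollback(self, n):  # (b)
        self.text, self.base = self.checkpoints[n]

    def push(self):  # (c)
        if self.base != len(self.shared.revisions) - 1:
            raise RuntimeError("pull first: shared history has moved on")
        self.shared.revisions.append(self.text)
        self.base += 1

    def pull(self):  # (d)
        base_text = self.shared.revisions[self.base]
        latest = self.shared.revisions[-1]
        if self.text == base_text:
            self.text = latest  # no local edits, just take the new version
        elif latest != base_text:
            self.text = merge3(self.text, base_text, latest)
        self.base = len(self.shared.revisions) - 1
        self.checkpoint()
```

The only state beyond the file contents is the number of the shared
revision each workspace last synced with, which is what gives diff3 its
common ancestor.  A real tool would also need locking around push and
some way to handle the conflict markers that diff3 leaves behind.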

4.  Writing code is much more fun (and educational) than reading 
documentation :-)

Nonetheless, I am looking seriously at all of the revision control
systems that I have learned are out there as a result of this
discussion.  Some of them look very promising.

Thanks for all the replies.

rg