%% Source: en/mq-collab.tex, changeset 312 (5561812fc5c9), parent 271 (8627f718517a), child 320 (97e929385442)
%% Changeset author: Mariusz Mazur <mmazur@kernel.pl>, Tue Aug 26 20:49:37 2008 +0200
%% Commit message: Fixed typo.
\chapter{Advanced uses of Mercurial Queues}
\label{chap:mq-collab}

While it's easy to pick up straightforward uses of Mercurial Queues,
use of a little discipline and some of MQ's less frequently used
capabilities makes it possible to work in complicated development
environments.

In this chapter, I will use as an example a technique I have used to
manage the development of an Infiniband device driver for the Linux
kernel.  The driver in question is large (at least as drivers go),
with 25,000 lines of code spread across 35 source files.  It is
maintained by a small team of developers.

While much of the material in this chapter is specific to Linux, the
same principles apply to any code base for which you're not the
primary owner, and upon which you need to do a lot of development.

\section{The problem of many targets}

The Linux kernel changes rapidly, and has never been internally
stable; developers frequently make drastic changes between releases.
This means that a version of the driver that works well with a
particular released version of the kernel will not even \emph{compile}
correctly against, typically, any other version.

To maintain a driver, we have to keep a number of distinct versions of
Linux in mind.
\begin{itemize}
\item One target is the main Linux kernel development tree.
  Maintenance of the code is in this case partly shared by other
  developers in the kernel community, who make ``drive-by''
  modifications to the driver as they develop and refine kernel
  subsystems.
\item We also maintain a number of ``backports'' to older versions of
  the Linux kernel, to support the needs of customers who are running
  older Linux distributions that do not incorporate our drivers.  (To
  \emph{backport} a piece of code is to modify it to work in an older
  version of its target environment than the version it was developed
  for.)
\item Finally, we make software releases on a schedule that is
  necessarily not aligned with those used by Linux distributors and
  kernel developers, so that we can deliver new features to customers
  without forcing them to upgrade their entire kernels or
  distributions.
\end{itemize}

\subsection{Tempting approaches that don't work well}

There are two ``standard'' ways to maintain a piece of software that
has to target many different environments.

The first is to maintain a number of branches, each intended for a
single target.  The trouble with this approach is that you must
maintain iron discipline in the flow of changes between repositories.
A new feature or bug fix must start life in a ``pristine'' repository,
then percolate out to every backport repository.  Backport changes are
more limited in the branches they should propagate to; a backport
change that is applied to a branch where it doesn't belong will
probably stop the driver from compiling.

The second is to maintain a single source tree filled with conditional
statements that turn chunks of code on or off depending on the
intended target.  Because these ``ifdefs'' are not allowed in the
Linux kernel tree, a manual or automatic process must be followed to
strip them out and yield a clean tree.  A code base maintained in this
fashion rapidly becomes a rat's nest of conditional blocks that are
difficult to understand and maintain.

Neither of these approaches is well suited to a situation where you
don't ``own'' the canonical copy of a source tree.  In the case of a
Linux driver that is distributed with the standard kernel, Linus's
tree contains the copy of the code that will be treated by the world
as canonical.  The upstream version of ``my'' driver can be modified
by people I don't know, without me even finding out about it until
after the changes show up in Linus's tree.

These approaches have the added weakness of making it difficult to
generate well-formed patches to submit upstream.

In principle, Mercurial Queues seems like a good candidate to manage a
development scenario such as the above.  While this is indeed the
case, MQ contains a few added features that make the job more
pleasant.

\section{Conditionally applying patches with guards}

Perhaps the best way to maintain sanity with so many targets is to be
able to choose specific patches to apply for a given situation.  MQ
provides a feature called ``guards'' (which originates with quilt's
\texttt{guards} command) that does just this.  To start off, let's
create a simple repository for experimenting in.
\interaction{mq.guards.init}
This gives us a tiny repository that contains two patches that don't
have any dependencies on each other, because they touch different files.

The idea behind conditional application is that you can ``tag'' a
patch with a \emph{guard}, which is simply a text string of your
choosing, then tell MQ to select specific guards to use when applying
patches.  MQ will then either apply, or skip over, a guarded patch,
depending on the guards that you have selected.

A patch can have an arbitrary number of guards;
each one is \emph{positive} (``apply this patch if this guard is
selected'') or \emph{negative} (``skip this patch if this guard is
selected'').  A patch with no guards is always applied.

\section{Controlling the guards on a patch}

The \hgxcmd{mq}{qguard} command lets you determine which guards should
apply to a patch, or display the guards that are already in effect.
Without any arguments, it displays the guards on the current topmost
patch.
\interaction{mq.guards.qguard}
To set a positive guard on a patch, prefix the name of the guard with
a ``\texttt{+}''.
\interaction{mq.guards.qguard.pos}
To set a negative guard on a patch, prefix the name of the guard with
a ``\texttt{-}''.
\interaction{mq.guards.qguard.neg}

\begin{note}
  The \hgxcmd{mq}{qguard} command \emph{sets} the guards on a patch; it
  doesn't \emph{modify} them.  What this means is that if you run
  \hgcmdargs{qguard}{+a +b} on a patch, then \hgcmdargs{qguard}{+c} on
  the same patch, the \emph{only} guard that will be set on it
  afterwards is \texttt{+c}.
\end{note}
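
To make this behaviour concrete, here is a short sketch of a session;
the patch name \texttt{goodbye.patch} and the guard names are
placeholders.  Because each invocation replaces the whole set of
guards, adding a guard means restating the ones you want to keep.  The
\texttt{--} in the last command stops \command{hg} from treating a
negative guard as a command line option.
\begin{codesample2}
  hg qguard goodbye.patch +a +b
  hg qguard goodbye.patch +c
  hg qguard goodbye.patch +a +b +c
  hg qguard --none goodbye.patch
  hg qguard goodbye.patch -- -quux
\end{codesample2}
After the second command, only \texttt{+c} is set.  The third command
restores \texttt{+a} and \texttt{+b} alongside it, the fourth drops
every guard from the patch, and the fifth leaves it with the single
negative guard \texttt{-quux}.
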

Mercurial stores guards in the \sfilename{series} file; the form in
which they are stored is easy both to understand and to edit by hand.
(In other words, you don't have to use the \hgxcmd{mq}{qguard} command if
you don't want to; it's okay to simply edit the \sfilename{series}
file.)
\interaction{mq.guards.series}
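
As a rough illustration of that format (the patch and guard names
here are invented), each guard follows the patch name on the same
line, prefixed with a ``\texttt{\#}'' character and its sign.  A patch
with no guards is simply listed by name.
\begin{codesample2}
  hello.patch #+foo
  goodbye.patch #-quux
  both.patch #+foo #-bar
  plain.patch
\end{codesample2}
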

\section{Selecting the guards to use}

The \hgxcmd{mq}{qselect} command determines which guards are active at a
given time.  The effect of this is to determine which patches MQ will
apply the next time you run \hgxcmd{mq}{qpush}.  It has no other effect; in
particular, it doesn't do anything to patches that are already
applied.

With no arguments, the \hgxcmd{mq}{qselect} command lists the guards
currently in effect, one per line of output.  Each argument is treated
as the name of a guard to apply.
\interaction{mq.guards.qselect.foo}
In case you're interested, the currently selected guards are stored in
the \sfilename{guards} file.
\interaction{mq.guards.qselect.cat}
We can see the effect the selected guards have when we run
\hgxcmd{mq}{qpush}.
\interaction{mq.guards.qselect.qpush}

A guard cannot start with a ``\texttt{+}'' or ``\texttt{-}''
character.  The name of a guard must not contain white space, but most
other characters are acceptable.  If you try to use a guard with an
invalid name, MQ will complain:
\interaction{mq.guards.qselect.error}
Changing the selected guards changes the patches that are applied.
\interaction{mq.guards.qselect.quux}
You can see in the example below that negative guards take precedence
over positive guards.
\interaction{mq.guards.qselect.foobar}
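
Since \hgxcmd{mq}{qselect} leaves applied patches alone, a change of
selection only shows up once the stack has been popped and pushed
again.  A minimal sketch of that cycle follows; the guard names
\texttt{foo} and \texttt{bar} are placeholders.
\begin{codesample2}
  hg qselect foo bar
  hg qpop --all
  hg qpush --all
  hg qselect --none
  hg qselect --series
\end{codesample2}
Here \texttt{--none} deselects every guard, and \texttt{--series}
lists the guards mentioned in the \sfilename{series} file rather than
the ones currently selected.
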

\section{MQ's rules for applying patches}

The rules that MQ uses when deciding whether to apply a patch
are as follows.
\begin{itemize}
\item A patch that has no guards is always applied.
\item If the patch has any negative guard that matches any currently
  selected guard, the patch is skipped.
\item If the patch has any positive guard that matches any currently
  selected guard, the patch is applied.
\item If the patch has positive guards, but none matches any currently
  selected guard, the patch is skipped.
\item If the patch has only negative guards, and none matches any
  currently selected guard, the patch is applied.
\end{itemize}
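
To see the rules at work, suppose the selected guards are
\texttt{foo} and \texttt{bar}, and the \sfilename{series} file holds
the four hypothetical patches below.  The two right-hand columns are
an annotation of what the rules imply, not output from MQ.
\begin{codesample2}
  base.patch              no guards            applied
  a.patch #+foo           +foo matches         applied
  b.patch #+foo #-bar     -bar matches         skipped
  c.patch #+quux          no guard matches     skipped
\end{codesample2}
The third patch is skipped even though \texttt{+foo} matches, because
the negative guard is checked first.
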

\section{Trimming the work environment}

In working on the device driver I mentioned earlier, I don't apply the
patches to a normal Linux kernel tree.  Instead, I use a repository
that contains only a snapshot of the source files and headers that are
relevant to Infiniband development.  This repository is~1\% the size
of a kernel repository, so it's easier to work with.

I then choose a ``base'' version on top of which the patches are
applied.  This is a snapshot of the Linux kernel tree as of a revision
of my choosing.  When I take the snapshot, I record the changeset ID
from the kernel repository in the commit message.  Since the snapshot
preserves the ``shape'' and content of the relevant parts of the
kernel tree, I can apply my patches on top of either my tiny
repository or a normal kernel tree.

Normally, the base tree atop which the patches apply should be a
snapshot of a very recent upstream tree.  This best facilitates the
development of patches that can easily be submitted upstream with few
or no modifications.

\section{Dividing up the \sfilename{series} file}

I categorise the patches in the \sfilename{series} file into a number
of logical groups.  Each section of like patches begins with a block
of comments that describes the purpose of the patches that follow.

The sequence of patch groups that I maintain follows.  The ordering of
these groups is important; I'll describe why after I introduce the
groups.
\begin{itemize}
\item The ``accepted'' group.  Patches that the development team has
  submitted to the maintainer of the Infiniband subsystem, and which
  he has accepted, but which are not present in the snapshot that the
  tiny repository is based on.  These are ``read only'' patches,
  present only to transform the tree into a state similar to that of
  the upstream maintainer's repository.
\item The ``rework'' group.  Patches that I have submitted, but that
  the upstream maintainer has requested modifications to before he
  will accept them.
\item The ``pending'' group.  Patches that I have not yet submitted to
  the upstream maintainer, but which we have finished working on.
  These will be ``read only'' for a while.  If the upstream maintainer
  accepts them upon submission, I'll move them to the end of the
  ``accepted'' group.  If he requests that I modify any, I'll move
  them to the beginning of the ``rework'' group.
\item The ``in progress'' group.  Patches that are actively being
  developed, and should not be submitted anywhere yet.
\item The ``backport'' group.  Patches that adapt the source tree to
  older versions of the kernel tree.
\item The ``do not ship'' group.  Patches that for some reason should
  never be submitted upstream.  For example, one such patch might
  change embedded driver identification strings to make it easier to
  distinguish, in the field, between an out-of-tree version of the
  driver and a version shipped by a distribution vendor.
\end{itemize}

Now to return to the reasons for ordering groups of patches in this
way.  We would like the lowest patches in the stack to be as stable as
possible, so that we will not need to rework higher patches due to
changes in context.  Putting patches that will never be changed first
in the \sfilename{series} file serves this purpose.

We would also like the patches that we know we'll need to modify to be
applied on top of a source tree that resembles the upstream tree as
closely as possible.  This is why we keep accepted patches around for
a while.

The ``backport'' and ``do not ship'' patches float at the end of the
\sfilename{series} file.  The backport patches must be applied on top
of all other patches, and the ``do not ship'' patches might as well
stay out of harm's way.
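
A \sfilename{series} file laid out along these lines might look like
the following sketch.  The patch names are invented for illustration;
lines that begin with ``\texttt{\#}'' are comments.
\begin{codesample2}
  # Accepted upstream, not yet in the base snapshot (read only)
  ib-fix-mtu.patch
  # Rework requested by the upstream maintainer
  ib-new-ioctl.patch
  # Pending: finished, waiting to be submitted
  ib-cleanup-headers.patch
  # In progress
  ib-experimental-dma.patch
  # Backports to older kernels
  backport-workqueue.patch
  # Do not ship upstream
  local-version-string.patch
\end{codesample2}
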

\section{Maintaining the patch series}

In my work, I use a number of guards to control which patches are to
be applied.

\begin{itemize}
\item ``Accepted'' patches are guarded with \texttt{accepted}.  I
  enable this guard most of the time.  When I'm applying the patches
  on top of a tree where the patches are already present, I can turn
  this guard off, and the patches that follow it will apply cleanly.
\item Patches that are ``finished'', but not yet submitted, have no
  guards.  If I'm applying the patch stack to a copy of the upstream
  tree, I don't need to enable any guards in order to get a reasonably
  safe source tree.
\item Those patches that need reworking before being resubmitted are
  guarded with \texttt{rework}.
\item For those patches that are still under development, I use
  \texttt{devel}.
\item A backport patch may have several guards, one for each version
  of the kernel to which it applies.  For example, a patch that
  backports a piece of code to~2.6.9 will have a~\texttt{2.6.9} guard.
\end{itemize}
This variety of guards gives me considerable flexibility in
determining what kind of source tree I want to end up with.  For most
situations, the selection of appropriate guards is automated during
the build process, but I can manually tune the guards to use for less
common circumstances.
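
As a sketch of how these guards combine, consider the following
\sfilename{series} fragment.  The patch names are invented, and the
selections shown afterwards are examples rather than the ones used in
the real build process.
\begin{codesample2}
  ib-fix-mtu.patch #+accepted
  ib-new-ioctl.patch #+rework
  ib-cleanup-headers.patch
  ib-experimental-dma.patch #+devel
  backport-workqueue.patch #+2.6.9 #+2.6.10
\end{codesample2}
With a \sfilename{series} file like this, preparing trees for two
different targets could look as follows.
\begin{codesample2}
  hg qpop --all
  hg qselect accepted
  hg qpush --all
  hg qpop --all
  hg qselect accepted 2.6.9
  hg qpush --all
\end{codesample2}
The first selection applies the accepted patch and the unguarded
patch; the second also applies the backport patch, because one of its
positive guards is now selected.
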

\subsection{The art of writing backport patches}

Using MQ, writing a backport patch is a simple process.  All such a
patch has to do is modify a piece of code that uses a kernel feature
not present in the older version of the kernel, so that the driver
continues to work correctly under that older version.

A useful goal when writing a good backport patch is to make your code
look as if it was written for the older version of the kernel you're
targeting.  The less obtrusive the patch, the easier it will be to
understand and maintain.  If you're writing a collection of backport
patches to avoid the ``rat's nest'' effect of lots of
\texttt{\#ifdef}s (hunks of source code that are only used
conditionally) in your code, don't introduce version-dependent
\texttt{\#ifdef}s into the patches.  Instead, write several patches,
each of which makes unconditional changes, and control their
application using guards.

There are two reasons to divide backport patches into a distinct
group, away from the ``regular'' patches whose effects they modify.
The first is that intermingling the two makes it more difficult to use
a tool like the \hgext{patchbomb} extension to automate the process of
submitting the patches to an upstream maintainer.  The second is that
a backport patch could perturb the context in which a subsequent
regular patch is applied, making it impossible to apply the regular
patch cleanly \emph{without} the earlier backport patch already being
applied.

\section{Useful tips for developing with MQ}

\subsection{Organising patches in directories}

If you're working on a substantial project with MQ, it's not difficult
to accumulate a large number of patches.  For example, I have one
patch repository that contains over 250 patches.

If you can group these patches into separate logical categories, you
can if you like store them in different directories; MQ has no
problems with patch names that contain path separators.
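
For instance, a \sfilename{series} file might refer to patches by
relative path, as in this sketch, with each patch file stored in the
matching subdirectory of the patch directory.  The directory and patch
names are invented.
\begin{codesample2}
  accepted/ib-fix-mtu.patch
  rework/ib-new-ioctl.patch
  backport/2.6.9/workqueue.patch
\end{codesample2}
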

\subsection{Viewing the history of a patch}
\label{mq-collab:tips:interdiff}

If you're developing a set of patches over a long time, it's a good
idea to maintain them in a repository, as discussed in
section~\ref{sec:mq:repo}.  If you do so, you'll quickly discover that
using the \hgcmd{diff} command to look at the history of changes to a
patch is unworkable.  This is in part because you're looking at the
second derivative of the real code (a diff of a diff), but also
because MQ adds noise to the process by modifying time stamps and
directory names when it updates a patch.

However, you can use the \hgext{extdiff} extension, which is bundled
with Mercurial, to turn a diff of two versions of a patch into
something readable.  To do this, you will need a third-party package
called \package{patchutils}~\cite{web:patchutils}.  This provides a
command named \command{interdiff}, which shows the differences between
two diffs as a diff.  Used on two versions of the same diff, it
generates a diff that represents the diff from the first to the second
version.

You can enable the \hgext{extdiff} extension in the usual way, by
adding a line to the \rcsection{extensions} section of your \hgrc.
\begin{codesample2}
  [extensions]
  extdiff =
\end{codesample2}
The \command{interdiff} command expects to be passed the names of two
files, but the \hgext{extdiff} extension passes the program it runs a
pair of directories, each of which can contain an arbitrary number of
files.  We thus need a small program that will run \command{interdiff}
on each pair of files in these two directories.  This program is
available as \sfilename{hg-interdiff} in the \dirname{examples}
directory of the source code repository that accompanies this book.
\excode{hg-interdiff}

With the \sfilename{hg-interdiff} program in your shell's search path,
you can run it as follows, from inside an MQ patch directory:
\begin{codesample2}
  hg extdiff -p hg-interdiff -r A:B my-change.patch
\end{codesample2}
Since you'll probably want to use this long-winded command a lot, you
can get \hgext{extdiff} to make it available as a normal Mercurial
command, again by editing your \hgrc.
\begin{codesample2}
  [extdiff]
  cmd.interdiff = hg-interdiff
\end{codesample2}
This directs \hgext{extdiff} to make an \texttt{interdiff} command
available, so you can now shorten the previous invocation of
\hgxcmd{extdiff}{extdiff} to something a little more wieldy.
\begin{codesample2}
  hg interdiff -r A:B my-change.patch
\end{codesample2}

\begin{note}
  The \command{interdiff} command works well only if the underlying
  files against which versions of a patch are generated remain the
  same.  If you create a patch, modify the underlying files, and then
  regenerate the patch, \command{interdiff} may not produce useful
  output.
\end{note}

The \hgext{extdiff} extension is useful for more than merely improving
the presentation of MQ~patches.  To read more about it, go to
section~\ref{sec:hgext:extdiff}.

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "00book"
%%% End: