en/mq-collab.tex
author Mariusz Mazur <mmazur@kernel.pl>
Tue Aug 26 20:49:37 2008 +0200 (2008-08-26)
     1 \chapter{Advanced uses of Mercurial Queues}
     2 \label{chap:mq-collab}
     3 
     4 While it's easy to pick up straightforward uses of Mercurial Queues,
     5 use of a little discipline and some of MQ's less frequently used
     6 capabilities makes it possible to work in complicated development
     7 environments.
     8 
     9 In this chapter, I will use as an example a technique I have used to
    10 manage the development of an Infiniband device driver for the Linux
    11 kernel.  The driver in question is large (at least as drivers go),
    12 with 25,000 lines of code spread across 35 source files.  It is
    13 maintained by a small team of developers.
    14 
    15 While much of the material in this chapter is specific to Linux, the
    16 same principles apply to any code base for which you're not the
    17 primary owner, and upon which you need to do a lot of development.
    18 
    19 \section{The problem of many targets}
    20 
    21 The Linux kernel changes rapidly, and has never been internally
    22 stable; developers frequently make drastic changes between releases.
    23 This means that a version of the driver that works well with a
    24 particular released version of the kernel will not even \emph{compile}
    25 correctly against, typically, any other version.
    26 
    27 To maintain a driver, we have to keep a number of distinct versions of
    28 Linux in mind.
    29 \begin{itemize}
    30 \item One target is the main Linux kernel development tree.
    31   Maintenance of the code is in this case partly shared by other
    32   developers in the kernel community, who make ``drive-by''
    33   modifications to the driver as they develop and refine kernel
    34   subsystems.
    35 \item We also maintain a number of ``backports'' to older versions of
    36   the Linux kernel, to support the needs of customers who are running
    37   older Linux distributions that do not incorporate our drivers.  (To
    38   \emph{backport} a piece of code is to modify it to work in an older
    39   version of its target environment than the version it was developed
    40   for.)
    41 \item Finally, we make software releases on a schedule that is
    42   necessarily not aligned with those used by Linux distributors and
    43   kernel developers, so that we can deliver new features to customers
    44   without forcing them to upgrade their entire kernels or
    45   distributions.
    46 \end{itemize}
    47 
    48 \subsection{Tempting approaches that don't work well}
    49 
    50 There are two ``standard'' ways to maintain a piece of software that
    51 has to target many different environments.
    52 
    53 The first is to maintain a number of branches, each intended for a
    54 single target.  The trouble with this approach is that you must
    55 maintain iron discipline in the flow of changes between repositories.
    56 A new feature or bug fix must start life in a ``pristine'' repository,
    57 then percolate out to every backport repository.  Backport changes are
    58 more limited in the branches they should propagate to; a backport
    59 change that is applied to a branch where it doesn't belong will
    60 probably stop the driver from compiling.
    61 
    62 The second is to maintain a single source tree filled with conditional
    63 statements that turn chunks of code on or off depending on the
    64 intended target.  Because these ``ifdefs'' are not allowed in the
    65 Linux kernel tree, a manual or automatic process must be followed to
    66 strip them out and yield a clean tree.  A code base maintained in this
    67 fashion rapidly becomes a rat's nest of conditional blocks that are
    68 difficult to understand and maintain.
    69 
    70 Neither of these approaches is well suited to a situation where you
    71 don't ``own'' the canonical copy of a source tree.  In the case of a
    72 Linux driver that is distributed with the standard kernel, Linus's
    73 tree contains the copy of the code that will be treated by the world
    74 as canonical.  The upstream version of ``my'' driver can be modified
    75 by people I don't know, without me even finding out about it until
    76 after the changes show up in Linus's tree.  
    77 
    78 These approaches have the added weakness of making it difficult to
    79 generate well-formed patches to submit upstream.
    80 
    81 In principle, Mercurial Queues seems like a good candidate to manage a
    82 development scenario such as the above.  While this is indeed the
    83 case, MQ contains a few added features that make the job more
    84 pleasant.
    85 
\section{Conditionally applying patches with guards}
    88 
    89 Perhaps the best way to maintain sanity with so many targets is to be
    90 able to choose specific patches to apply for a given situation.  MQ
    91 provides a feature called ``guards'' (which originates with quilt's
    92 \texttt{guards} command) that does just this.  To start off, let's
    93 create a simple repository for experimenting in.
    94 \interaction{mq.guards.init}
    95 This gives us a tiny repository that contains two patches that don't
    96 have any dependencies on each other, because they touch different files.
    97 
    98 The idea behind conditional application is that you can ``tag'' a
    99 patch with a \emph{guard}, which is simply a text string of your
   100 choosing, then tell MQ to select specific guards to use when applying
   101 patches.  MQ will then either apply, or skip over, a guarded patch,
   102 depending on the guards that you have selected.
   103 
   104 A patch can have an arbitrary number of guards;
   105 each one is \emph{positive} (``apply this patch if this guard is
   106 selected'') or \emph{negative} (``skip this patch if this guard is
   107 selected'').  A patch with no guards is always applied.
   108 
   109 \section{Controlling the guards on a patch}
   110 
   111 The \hgxcmd{mq}{qguard} command lets you determine which guards should
   112 apply to a patch, or display the guards that are already in effect.
   113 Without any arguments, it displays the guards on the current topmost
   114 patch.
   115 \interaction{mq.guards.qguard}
   116 To set a positive guard on a patch, prefix the name of the guard with
   117 a ``\texttt{+}''.
   118 \interaction{mq.guards.qguard.pos}
   119 To set a negative guard on a patch, prefix the name of the guard with
   120 a ``\texttt{-}''.
   121 \interaction{mq.guards.qguard.neg}
   122 
   123 \begin{note}
   124   The \hgxcmd{mq}{qguard} command \emph{sets} the guards on a patch; it
   125   doesn't \emph{modify} them.  What this means is that if you run
   126   \hgcmdargs{qguard}{+a +b} on a patch, then \hgcmdargs{qguard}{+c} on
   127   the same patch, the \emph{only} guard that will be set on it
   128   afterwards is \texttt{+c}.
   129 \end{note}
   130 
   131 Mercurial stores guards in the \sfilename{series} file; the form in
   132 which they are stored is easy both to understand and to edit by hand.
   133 (In other words, you don't have to use the \hgxcmd{mq}{qguard} command if
   134 you don't want to; it's okay to simply edit the \sfilename{series}
   135 file.)
   136 \interaction{mq.guards.series}
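
To make the stored form concrete, here is a minimal Python sketch of
reading one line of a \sfilename{series} file.  It assumes that each
guard is written after the patch name as a ``\texttt{\#}'' followed by
``\texttt{+}'' or ``\texttt{-}'' and the guard name, and that a line
beginning with ``\texttt{\#}'' is a comment.  The function name
\texttt{parse\_series\_line} is made up for this example; it is not
part of MQ, and MQ's own parser may differ in its details.

```python
import re

# Assumed guard syntax: '#', then '+' or '-', then a name that contains
# no white space or '#' and does not itself start with '+' or '-'.
GUARD_RE = re.compile(r"#([-+][^-+#\s][^#\s]*)")


def parse_series_line(line):
    """Return (patch_name, guards) for a series line, or None for a
    blank line or a comment line."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Everything before the first '#' is the patch name.
    name, _, rest = line.partition("#")
    guards = GUARD_RE.findall("#" + rest) if rest else []
    return name.strip(), guards


if __name__ == "__main__":
    for text in ["# accepted patches", "hello.patch #+foo #-bar", "goodbye.patch"]:
        print(repr(text), "->", parse_series_line(text))
```

The guards come back with their sign still attached, so the result can
be compared directly with what \hgxcmd{mq}{qguard} prints.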
   137 
   138 \section{Selecting the guards to use}
   139 
   140 The \hgxcmd{mq}{qselect} command determines which guards are active at a
   141 given time.  The effect of this is to determine which patches MQ will
   142 apply the next time you run \hgxcmd{mq}{qpush}.  It has no other effect; in
   143 particular, it doesn't do anything to patches that are already
   144 applied.
   145 
   146 With no arguments, the \hgxcmd{mq}{qselect} command lists the guards
   147 currently in effect, one per line of output.  Each argument is treated
   148 as the name of a guard to apply.
   149 \interaction{mq.guards.qselect.foo}
   150 In case you're interested, the currently selected guards are stored in
   151 the \sfilename{guards} file.
   152 \interaction{mq.guards.qselect.cat}
   153 We can see the effect the selected guards have when we run
   154 \hgxcmd{mq}{qpush}.
   155 \interaction{mq.guards.qselect.qpush}
   156 
   157 A guard cannot start with a ``\texttt{+}'' or ``\texttt{-}''
   158 character.  The name of a guard must not contain white space, but most
   159 other characters are acceptable.  If you try to use a guard with an
   160 invalid name, MQ will complain:
   161 \interaction{mq.guards.qselect.error} 
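
The naming rules above can be written down as a short Python check.
This is only a sketch of the two rules stated here; the function name
\texttt{check\_guard\_name} and the messages it returns are invented
for illustration and are not the ones MQ prints.

```python
def check_guard_name(name):
    """Return None if the name satisfies the rules in the text,
    otherwise a short (hypothetical) description of the problem."""
    if not name:
        return "guard name is empty"
    if name[0] in "+-":
        return "guard name starts with %r" % name[0]
    if any(ch.isspace() for ch in name):
        return "guard name contains white space"
    return None


if __name__ == "__main__":
    for candidate in ["foo", "2.6.9", "+foo", "two words"]:
        print(repr(candidate), "->", check_guard_name(candidate))
```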
   162 Changing the selected guards changes the patches that are applied.
   163 \interaction{mq.guards.qselect.quux} 
   164 You can see in the example below that negative guards take precedence
   165 over positive guards.
   166 \interaction{mq.guards.qselect.foobar}
   167 
   168 \section{MQ's rules for applying patches}
   169 
   170 The rules that MQ uses when deciding whether to apply a patch
   171 are as follows.
   172 \begin{itemize}
   173 \item A patch that has no guards is always applied.
   174 \item If the patch has any negative guard that matches any currently
   175   selected guard, the patch is skipped.
   176 \item If the patch has any positive guard that matches any currently
   177   selected guard, the patch is applied.
\item If the patch has positive guards, but none matches any currently
  selected guard, the patch is skipped.
\item If the patch has only negative guards, and none matches any
  currently selected guard, the patch is applied.
   180 \end{itemize}
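
These rules are compact enough to express as a small Python function.
The sketch below is not MQ's code; the name \texttt{should\_apply} and
the representation of guards as strings such as \texttt{+foo} are
assumptions made for the example.  It also assumes that a patch
carrying only negative guards, none of them selected, is applied,
which is how the built-in help for \hgxcmd{mq}{qguard} describes
negative guards.

```python
def should_apply(patch_guards, selected):
    """Decide whether a patch would be applied.

    patch_guards: guards on the patch, each with its sign, e.g. ["+foo", "-bar"].
    selected: names of the currently selected guards, e.g. ["foo"].
    """
    selected = set(selected)
    # A patch with no guards is always applied.
    if not patch_guards:
        return True
    negative = [g[1:] for g in patch_guards if g.startswith("-")]
    positive = [g[1:] for g in patch_guards if g.startswith("+")]
    # A matching negative guard wins over everything else.
    if any(name in selected for name in negative):
        return False
    # With positive guards present, at least one must be selected.
    if positive:
        return any(name in selected for name in positive)
    # Assumption: only negative guards, none selected, so apply.
    return True


if __name__ == "__main__":
    print(should_apply(["+foo", "-bar"], ["foo"]))
    print(should_apply(["+foo", "-bar"], ["foo", "bar"]))
```

Checking negative guards first is what gives them precedence over
positive ones.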
   181 
   182 \section{Trimming the work environment}
   183 
   184 In working on the device driver I mentioned earlier, I don't apply the
   185 patches to a normal Linux kernel tree.  Instead, I use a repository
   186 that contains only a snapshot of the source files and headers that are
relevant to Infiniband development.  This repository is~1\% of the
size of a kernel repository, so it's easier to work with.
   189 
   190 I then choose a ``base'' version on top of which the patches are
   191 applied.  This is a snapshot of the Linux kernel tree as of a revision
   192 of my choosing.  When I take the snapshot, I record the changeset ID
   193 from the kernel repository in the commit message.  Since the snapshot
   194 preserves the ``shape'' and content of the relevant parts of the
   195 kernel tree, I can apply my patches on top of either my tiny
   196 repository or a normal kernel tree.
   197 
   198 Normally, the base tree atop which the patches apply should be a
   199 snapshot of a very recent upstream tree.  This best facilitates the
   200 development of patches that can easily be submitted upstream with few
   201 or no modifications.
   202 
   203 \section{Dividing up the \sfilename{series} file}
   204 
   205 I categorise the patches in the \sfilename{series} file into a number
   206 of logical groups.  Each section of like patches begins with a block
   207 of comments that describes the purpose of the patches that follow.
   208 
   209 The sequence of patch groups that I maintain follows.  The ordering of
   210 these groups is important; I'll describe why after I introduce the
   211 groups.
   212 \begin{itemize}
   213 \item The ``accepted'' group.  Patches that the development team has
   214   submitted to the maintainer of the Infiniband subsystem, and which
   215   he has accepted, but which are not present in the snapshot that the
   216   tiny repository is based on.  These are ``read only'' patches,
  present only to bring the tree into a state similar to that of
  the upstream maintainer's repository.
   219 \item The ``rework'' group.  Patches that I have submitted, but that
   220   the upstream maintainer has requested modifications to before he
   221   will accept them.
   222 \item The ``pending'' group.  Patches that I have not yet submitted to
   223   the upstream maintainer, but which we have finished working on.
   224   These will be ``read only'' for a while.  If the upstream maintainer
   225   accepts them upon submission, I'll move them to the end of the
   226   ``accepted'' group.  If he requests that I modify any, I'll move
   227   them to the beginning of the ``rework'' group.
   228 \item The ``in progress'' group.  Patches that are actively being
   229   developed, and should not be submitted anywhere yet.
   230 \item The ``backport'' group.  Patches that adapt the source tree to
   231   older versions of the kernel tree.
   232 \item The ``do not ship'' group.  Patches that for some reason should
   233   never be submitted upstream.  For example, one such patch might
   234   change embedded driver identification strings to make it easier to
   235   distinguish, in the field, between an out-of-tree version of the
   236   driver and a version shipped by a distribution vendor.
   237 \end{itemize}
   238 
   239 Now to return to the reasons for ordering groups of patches in this
   240 way.  We would like the lowest patches in the stack to be as stable as
   241 possible, so that we will not need to rework higher patches due to
   242 changes in context.  Putting patches that will never be changed first
   243 in the \sfilename{series} file serves this purpose.
   244 
   245 We would also like the patches that we know we'll need to modify to be
   246 applied on top of a source tree that resembles the upstream tree as
   247 closely as possible.  This is why we keep accepted patches around for
   248 a while.
   249 
   250 The ``backport'' and ``do not ship'' patches float at the end of the
   251 \sfilename{series} file.  The backport patches must be applied on top
   252 of all other patches, and the ``do not ship'' patches might as well
   253 stay out of harm's way.
   254 
   255 \section{Maintaining the patch series}
   256 
   257 In my work, I use a number of guards to control which patches are to
   258 be applied.
   259 
   260 \begin{itemize}
   261 \item ``Accepted'' patches are guarded with \texttt{accepted}.  I
   262   enable this guard most of the time.  When I'm applying the patches
  on top of a tree where the patches are already present, I can turn
  this guard off, and the patches that follow will apply cleanly.
   265 \item Patches that are ``finished'', but not yet submitted, have no
   266   guards.  If I'm applying the patch stack to a copy of the upstream
   267   tree, I don't need to enable any guards in order to get a reasonably
   268   safe source tree.
   269 \item Those patches that need reworking before being resubmitted are
   270   guarded with \texttt{rework}.
   271 \item For those patches that are still under development, I use
   272   \texttt{devel}.
   273 \item A backport patch may have several guards, one for each version
   274   of the kernel to which it applies.  For example, a patch that
   275   backports a piece of code to~2.6.9 will have a~\texttt{2.6.9} guard.
   276 \end{itemize}
This variety of guards gives me considerable flexibility in
determining what kind of source tree I want to end up with.  For most
   279 situations, the selection of appropriate guards is automated during
   280 the build process, but I can manually tune the guards to use for less
   281 common circumstances.
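
As an illustration of how such automation might look, the following
Python sketch maps a build target to a list of guards and builds the
matching \hgxcmd{mq}{qselect} command line.  The target names
(\texttt{upstream}, \texttt{release}, \texttt{devel},
\texttt{backport}) and both function names are hypothetical, as is the
exact choice of guards for each target; only the guard names
themselves come from the list above.  The sketch builds the command
but does not run it.

```python
def guards_for(target, kernel_version=None):
    """Return the guard names to select for a (hypothetical) build target."""
    if target == "upstream":
        # The tree already contains the accepted patches, so select nothing;
        # unguarded patches still apply.
        return []
    if target == "release":
        return ["accepted"]
    if target == "devel":
        return ["accepted", "rework", "devel"]
    if target == "backport":
        if kernel_version is None:
            raise ValueError("a backport build needs a kernel version")
        # Backport patches are guarded with the kernel version they target.
        return ["accepted", kernel_version]
    raise ValueError("unknown target: %r" % (target,))


def qselect_command(guards):
    """Build the argument list for hg qselect without running it."""
    if not guards:
        # 'hg qselect --none' deselects all guards.
        return ["hg", "qselect", "--none"]
    return ["hg", "qselect"] + list(guards)


if __name__ == "__main__":
    print(" ".join(qselect_command(guards_for("backport", "2.6.9"))))
```

Because \hgxcmd{mq}{qselect} does not touch patches that are already
applied, a build script along these lines would pop all patches before
selecting guards and push them again afterwards.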
   282 
   283 \subsection{The art of writing backport patches}
   284 
   285 Using MQ, writing a backport patch is a simple process.  All such a
   286 patch has to do is modify a piece of code that uses a kernel feature
   287 not present in the older version of the kernel, so that the driver
   288 continues to work correctly under that older version.
   289 
   290 A useful goal when writing a good backport patch is to make your code
   291 look as if it was written for the older version of the kernel you're
   292 targeting.  The less obtrusive the patch, the easier it will be to
   293 understand and maintain.  If you're writing a collection of backport
   294 patches to avoid the ``rat's nest'' effect of lots of
   295 \texttt{\#ifdef}s (hunks of source code that are only used
   296 conditionally) in your code, don't introduce version-dependent
   297 \texttt{\#ifdef}s into the patches.  Instead, write several patches,
   298 each of which makes unconditional changes, and control their
   299 application using guards.
   300 
   301 There are two reasons to divide backport patches into a distinct
   302 group, away from the ``regular'' patches whose effects they modify.
   303 The first is that intermingling the two makes it more difficult to use
   304 a tool like the \hgext{patchbomb} extension to automate the process of
   305 submitting the patches to an upstream maintainer.  The second is that
   306 a backport patch could perturb the context in which a subsequent
   307 regular patch is applied, making it impossible to apply the regular
   308 patch cleanly \emph{without} the earlier backport patch already being
   309 applied.
   310 
   311 \section{Useful tips for developing with MQ}
   312 
   313 \subsection{Organising patches in directories}
   314 
   315 If you're working on a substantial project with MQ, it's not difficult
   316 to accumulate a large number of patches.  For example, I have one
   317 patch repository that contains over 250 patches.
   318 
If you can group these patches into separate logical categories, you
can, if you like, store them in different directories; MQ has no
problems with patch names that contain path separators.
   322 
   323 \subsection{Viewing the history of a patch}
   324 \label{mq-collab:tips:interdiff}
   325 
   326 If you're developing a set of patches over a long time, it's a good
   327 idea to maintain them in a repository, as discussed in
   328 section~\ref{sec:mq:repo}.  If you do so, you'll quickly discover that
   329 using the \hgcmd{diff} command to look at the history of changes to a
   330 patch is unworkable.  This is in part because you're looking at the
   331 second derivative of the real code (a diff of a diff), but also
   332 because MQ adds noise to the process by modifying time stamps and
   333 directory names when it updates a patch.
   334 
   335 However, you can use the \hgext{extdiff} extension, which is bundled
   336 with Mercurial, to turn a diff of two versions of a patch into
   337 something readable.  To do this, you will need a third-party package
   338 called \package{patchutils}~\cite{web:patchutils}.  This provides a
   339 command named \command{interdiff}, which shows the differences between
   340 two diffs as a diff.  Used on two versions of the same diff, it
   341 generates a diff that represents the diff from the first to the second
   342 version.
   343 
   344 You can enable the \hgext{extdiff} extension in the usual way, by
   345 adding a line to the \rcsection{extensions} section of your \hgrc.
\begin{codesample2}
  [extensions]
  extdiff =
\end{codesample2}
   350 The \command{interdiff} command expects to be passed the names of two
   351 files, but the \hgext{extdiff} extension passes the program it runs a
   352 pair of directories, each of which can contain an arbitrary number of
   353 files.  We thus need a small program that will run \command{interdiff}
   354 on each pair of files in these two directories.  This program is
   355 available as \sfilename{hg-interdiff} in the \dirname{examples}
   356 directory of the source code repository that accompanies this book.
   357 \excode{hg-interdiff}
   358 
   359 With the \sfilename{hg-interdiff} program in your shell's search path,
   360 you can run it as follows, from inside an MQ patch directory:
\begin{codesample2}
  hg extdiff -p hg-interdiff -r A:B my-change.patch
\end{codesample2}
Since you'll probably want to use this long-winded command a lot, you
can get \hgext{extdiff} to make it available as a normal Mercurial
command, again by editing your \hgrc.
\begin{codesample2}
  [extdiff]
  cmd.interdiff = hg-interdiff
\end{codesample2}
This directs \hgext{extdiff} to make an \texttt{interdiff} command
available, so you can now shorten the previous invocation of
\hgxcmd{extdiff}{extdiff} to something a little more wieldy.
\begin{codesample2}
  hg interdiff -r A:B my-change.patch
\end{codesample2}
   377 
   378 \begin{note}
   379   The \command{interdiff} command works well only if the underlying
   380   files against which versions of a patch are generated remain the
   381   same.  If you create a patch, modify the underlying files, and then
   382   regenerate the patch, \command{interdiff} may not produce useful
   383   output.
   384 \end{note}
   385 
   386 The \hgext{extdiff} extension is useful for more than merely improving
   387 the presentation of MQ~patches.  To read more about it, go to
   388 section~\ref{sec:hgext:extdiff}.
   389 
   390 %%% Local Variables: 
   391 %%% mode: latex
   392 %%% TeX-master: "00book"
   393 %%% End: