1 \documentclass[a4paper,10pt]{article}
2 \usepackage[english]{babel}
3 \usepackage[OT1]{fontenc}
4 \usepackage[utf8]{inputenc}
5 \usepackage{fancyhdr}
6 \usepackage{mydoc}
7 \usepackage{url}
8 \usepackage[pdftex,colorlinks=true,
9 pdfstartview=FitV,
10 linkcolor=blue,
11 citecolor=blue,
12 urlcolor=blue,
13 ]{hyperref}
15 \begin{document}
16 \pagestyle{fancy}
17 \fancyfoot{}
18 \fancyhead{}
19 \renewcommand{\sectionmark}[1]{\markboth{\sf\thesection.\ #1}{}}
20 \renewcommand{\subsectionmark}[1]{}
21 \fancyhead[R]{{\rmfamily\thepage}}
22 \newcommand{\Ref}[1]{section~\ref{#1}, p.~\pageref{#1}}
24 \title{Git in a Nutshell}
26 \subtitle{For Normal People (tm)}
27 \author{{\sf Jonas Jus\'elius}}
28 \address{
29 {\tt <jonas.juselius@chem.uit.no>}\\
30 {\sf Centre for Theoretical and Computational Chemistry}\\
31 {\sf University of Tromsø}\\
32 {\sf N-9037 University of Tromsø, Norway}
33 }
34 \years{2007}
35 \abstract{Rev 0.5.5:\\{\tt git clone http://git.jonas.iki.fi/git\_guides.git}\\
36 Corrections and improvements are most welcome via e-mail using the
37 {\tt git format-patch} facility.}
40 \maketitle
41 \tableofcontents
42 \newpage
44 \section*{About this document}
45 The intent of this document is to give an overview of git and to explain some of
46 the aspects of working with git that I feel are somewhat poorly explained
47 elsewhere.
48 What I feel is missing is a manual which explains what the commands are for,
49 how the pieces fit together and how to actually work with git on a daily
50 basis. This guide has been written with an audience consisting of the typical
51 academic programmer in mind, and relies implicitly on a repository setup as
52 outlined in the ``Git to CVS Migration Guide''.
54 All git commands have excellent man pages explaining in
55 excruciating detail the particular command. As you read through this manual,
56 it might be a good idea to have a quick look at the corresponding manual pages of
57 the commands, just to give you an idea of more advanced features lurking under
58 the surface. For example, to get all available information on the
59 \texttt{git checkout} command, run
60 \begin{verbatim}
61 $ man git-checkout
62 \end{verbatim}
63 Don't get overwhelmed by the amount of detail in the man pages. As
64 you gain experience and confidence working with git you will learn to
65 appreciate the finer details. Don't panic.
67 Finally, a disclaimer. I'm not an expert on git, nor do I have extensive
68 experience working with git. All comments, corrections and suggestions are
69 most welcome!
71 \section{Introduction to git}
72 Git is a very powerful, easy to use and flexible revision control system.
73 Although git has many advanced and flexible features, most basic day-to-day
74 operations are very simple to use.
76 Git has been designed in a very UNIX-like fashion; it's built from a set of
77 small, efficient programs which do one thing, and do it well. By combining
78 these programs, new high-level commands can be created to do more or less any
79 desired task. This structure is reflected in how git commands are executed.
80 There are usually two equivalent ways of executing a command, either by
81 executing the command directly
82 \begin{verbatim}
83 $ git-command args ...
84 \end{verbatim}
85 or by using the \texttt{git} command which wraps the most common commands for a
86 more CVS-like interface
87 \begin{verbatim}
88 $ git command args ...
89 \end{verbatim}
90 In this guide I'll use the second form throughout, mostly because modern
91 shells like \texttt{zsh} and \texttt{bash} have command line
92 completion features which interact very nicely with the \texttt{git} wrapper.
94 \subsection{Creating a repository}
95 Git makes it very easy to put projects (code, manuscripts, or whatever) under
96 revision control. Suppose you have a directory which contains a number of
97 files you want to have under revision control. The first step is to remove all
98 files that should not be under revision control, i.e. files that can be
99 generated from source (.o, .pdf, \ldots). When you have a ``clean'' directory
100 tree, simply in the top level project directory run
101 \begin{verbatim}
102 $ git init
103 $ git add .
104 $ git commit
105 # edit the commit message, save and quit.
106 \end{verbatim}
107 That's it!
108 If you don't want to clean your project directory,
109 you can instead specify the files to include by explicitly giving the file
110 names to \texttt{git add} instead of the current directory ('.').
112 Running \texttt{git init} in a directory sets up the repository and the
113 necessary files in a directory named \path{.git/} in the current directory.
114 Since everything is contained in the project directory, no special permissions
115 or groups are needed.
117 \subsection{Cloning a repository}
118 Ok, so you have a repository on a server somewhere, and you want to get your own
119 working copy and dig in! Git supports many different protocols (http, ssh,
120 git\ldots), but for simplicity I'll assume you have ssh access to the server.
121 To get your own repository, simply run
122 \begin{verbatim}
123 $ git clone me@myserver:/path/to/myrepo
124 \end{verbatim}
125 This creates a directory called \path{myrepo}, with a copy of the remote
126 repository in the current directory.
128 Git is quite different from CVS in most respects. When you clone a repository
129 you get the WHOLE repository, everything, not just a working copy like in CVS!
130 \texttt{git clone} is in principle almost the same as doing a\footnote{
131 Actually git clone is a bit more selective, and it also sets up various
132 administrative information, etc.}
133 \begin{verbatim}
134 $ scp -r me@myserver:/path/to/myrepo .
135 \end{verbatim}
136 Within this personal repository you can do \textit{whatever} you like, i.e.
137 create branches, delete branches, tags, and of course commit as much as you
138 like. It's only when you push to the master that others can see your
139 changes. In fact, your (cloned) repository can act as a master for someone
140 else! The division into master and client is really quite blurred and
141 artificial in git. When you clone a repository, the new copy will contain
142 administrative information in \path{.git/config} about where it was cloned from, and
143 has a slightly modified branch structure (as you will see), but apart from
144 that it's identical to its parent. And where we humans can't choose nor
145 change who our parents are, git has no problem in changing or removing the
146 parent(s).
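
To see what git has recorded about the parent, you can ask the clone itself.
A minimal sketch, assuming the clone from above ended up in \path{myrepo} and
that the default remote name 'origin' has not been changed:
\begin{verbatim}
$ cd myrepo
# list the configured remotes and their URLs
$ git remote -v
# the same information, read straight from .git/config
$ git config --get remote.origin.url
\end{verbatim}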
148 \subsection{Working in a repository}
149 With your repository in place, working with git is very easy. Work and edit
150 your files, and whenever you have completed something to the point you think
151 it could be worthy of a save, simply run
152 \begin{verbatim}
153 $ git commit -a
154 # edit the commit message, and save
155 \end{verbatim}
156 A commit does not make your changes public. The commits are local to your
157 repository, and unless you let somebody clone or pull changes from you,
158 they will remain hidden until you decide to publish them by pushing them
159 to a common server.
161 \section{Basic git}
162 The most basic operation under a revision control system is saving the state of
163 a project as a new revision. When you create new files
164 these also need to be placed under revision control. This is done with
165 \begin{verbatim}
166 $ git add file(s)
167 \end{verbatim}
168 This marks the files to be added to the repository next time you run
169 \texttt{git commit}.
171 In fact, you can run \texttt{git add} on files already
172 under revision control in the repository, to selectively mark them for
173 inclusion in the next commit. This is known as \emph{staging} files for a
174 commit. We thus have a number of ways of marking files for a commit, either by
175 directly specifying the files to \texttt{git commit} or by first adding them
176 with \texttt{git add} and then running \texttt{git commit} without any files.
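
As a rough sketch of the two variants (the file names \texttt{foo.c} and
\texttt{bar.c} are made up for illustration, and both are assumed to have
uncommitted modifications):
\begin{verbatim}
# stage only foo.c; other modified files stay out of the next commit
$ git add foo.c
# see what is staged and what is not
$ git status
$ git commit
# alternatively, skip the staging step and name the files directly
$ git commit bar.c
\end{verbatim}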
178 \subsection{Branches}
179 For most users, the use of branches is probably where git differs most from CVS.
180 Under git, branches are used extensively, and they are essentially
181 free. It costs nearly nothing to create branches, both in terms of time and
182 disk space. Also, there is no need to tag first and branch second, like in CVS.
184 It's very easy to create a new branch under git
185 \begin{verbatim}
186 $ git branch foobar
187 \end{verbatim}
188 This creates a branch named foobar which will be an exact copy of the latest
189 revision of the current branch. To start working under this new branch you
190 must first do a checkout
191 \begin{verbatim}
192 $ git checkout foobar
193 \end{verbatim}
194 This command switches to the new branch, and updates the files in the source
195 tree to the HEAD of the branch (HEAD is a symbolic tag representing
196 the current branch). For more detailed information on
197 how to specify revisions see \Ref{sec:revision}.
199 The checkout command is thus very
200 different from the \texttt{cvs checkout} command.
201 Since branches are so cheap and easy to use, it's often convenient to create a
202 new branch for a specific task. Such branches are known as topic branches.
203 Branches are also good for experimentation; suppose you have a wild idea you
204 want to try out, just create a branch, and discard it if it doesn't work out.
205 Suppose we want to base the experiment on a branch named 'work'. Here is how to
206 create the new branch and switch to it in one go:
207 \begin{verbatim}
208 $ git checkout -b wildidea work
209 # work, test, commit, work... realise it was a bad idea
210 $ git checkout work
211 # delete the wildidea branch (-D, since its commits were never merged)
212 $ git branch -D wildidea
213 \end{verbatim}
214 Deleting the branch 'wildidea' not only deletes the branch, but actually the
215 whole commit history and all changes on the branch, so you will never be able
216 to get back that information. So be sure that you are not throwing away
217 something you might want to save.
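
A detail worth knowing here, sketched under the assumption that the commits on
'wildidea' were never merged into 'work': the lowercase \texttt{-d} flag is the
cautious variant and refuses to delete such a branch, while the uppercase
\texttt{-D} deletes it without asking.
\begin{verbatim}
$ git checkout work
# refused: wildidea contains commits that are not merged into work
$ git branch -d wildidea
# really throw the experiment away
$ git branch -D wildidea
\end{verbatim}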
219 \subsection{Merging}
220 \label{sec:merge}
221 Obviously branches would be very cumbersome to work with unless it were
222 nearly trivial to incorporate changes in one branch into another. Git provides
223 two main ways\footnote{Actually, there is a third way called rebasing, but
224 that method is for experts only. See the \texttt{git rebase} man page for more
225 info.} to do this: merging and cherry
226 picking (for more information on cherry picking see \Ref{sec:cherry}).
228 Merging is the main mechanism for incorporating changes from one branch
229 into another. Merging is trivial if you work in a slightly disciplined
230 way, avoiding making changes to the same files in different branches without
231 synchronising first. Of course, it's not a problem modifying the same files,
232 but then you will have to select by hand which changes to retain. For more
233 information on how to resolve conflicts see \Ref{sec:conflict}.
235 Suppose that the branch 'wildidea' in the previous section worked out, and
236 that you want to merge the changes back into the 'work' branch:
237 \begin{verbatim}
238 $ git checkout work
239 $ git merge wildidea
240 \end{verbatim}
241 If the work branch has not been changed since wildidea was created from it,
242 this brings the work branch into
243 the exact same state as the wildidea branch. If, on the other hand, the work
244 branch has changed, two things can happen. If the changes do not overlap, the
245 changes in wildidea are cleanly incorporated. If the same parts of a file have
246 changed in both branches, you will have a conflict which needs to be resolved
247 (see \Ref{sec:conflict}).
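
If you want to know what a merge would bring in before you run it, the
following sketch may help. It assumes the 'work' and 'wildidea' branches from
above; the two- and three-dot forms are standard revision range notation.
\begin{verbatim}
# commits on wildidea that work does not have yet
$ git log work..wildidea
# the combined changes those commits introduce
$ git diff work...wildidea
$ git checkout work
$ git merge wildidea
# wildidea is now contained in work, so it can be deleted safely
$ git branch -d wildidea
\end{verbatim}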
249 Branch names don't have to be simple strings. In fact you can create branches
250 with sub-branches exactly like a directory tree. This can be useful if there
251 are many developers sharing a master repository. Every developer has
252 his/her own branch tree, e.g.
253 \begin{verbatim}
254 $ git branch $USER/work
255 $ git branch $USER/crazy_stuff
257 \end{verbatim}
259 As a last point, it's actually possible to merge many branches in one merge
260 operation. Such a merge is called an octopus merge, see the
261 \texttt{git-merge} man page for more info.
263 \subsection{Tags}
264 Tagging is a way of specifying a symbolic name to a specific revision (state)
265 of the source tree. This makes it much easier to access that particular
266 revision later.
268 Tags should not be checked out directly, but rather used to create branches
269 starting at the specified tag.
270 For example, it's a good idea to tag every release of a
271 project. Later if someone finds a bug, it's easy to either go back to that
272 state or to create a bugfix branch.
273 \begin{verbatim}
274 $ git tag version-1.12
276 $ git branch bugfix-1.12 version-1.12
277 $ git checkout bugfix-1.12
278 \end{verbatim}
279 Tags are not specific to a particular branch, but by using
280 ``directory tree'' like names one can emulate branch tags
281 \begin{verbatim}
282 $ git checkout crazy_stuff
283 $ git tag crazy_stuff/pre-release1
284 \end{verbatim}
285 Tags can also be specified retroactively by specifying a revision
286 \begin{verbatim}
287 $ git tag working-version7 HEAD~42
288 \end{verbatim}
289 Finally, tags are deleted by running
290 \begin{verbatim}
291 $ git tag -d working-version7
292 \end{verbatim}
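A few more tag operations that I find handy, shown as a sketch; the tag name
\texttt{version-1.13} and its message are invented for the example:
\begin{verbatim}
# list all tags, or only those matching a pattern
$ git tag -l
$ git tag -l 'version-*'
# an annotated tag also records the tagger, the date and a message
$ git tag -a -m "Release 1.13" version-1.13
# show the tag and the commit it points to
$ git show version-1.13
\end{verbatim}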
294 \subsection{Moving, renaming and deleting files}
295 As projects evolve, it often becomes necessary to move, rename or even retire
296 files. For example, to rename and move a file in the repository do
297 \begin{verbatim}
298 $ git mv foo.py raboof/bar.py
299 \end{verbatim}
300 If you want to delete a file or directory use
301 \begin{verbatim}
302 $ git rm myfile
303 \end{verbatim}
304 Note that this does not delete the file irrevocably, since going back to a
305 previous revision will bring it back (as it should).
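
To illustrate that last point, here is a small sketch. It assumes that the
commit removing \texttt{myfile} is the most recent one, so that the previous
revision is \texttt{HEAD\~{}1}:
\begin{verbatim}
$ git rm myfile
$ git commit -m "Retire myfile"
# later: bring back the version from just before that commit
$ git checkout HEAD~1 -- myfile
\end{verbatim}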
307 \section{Working with remote repositories}
308 Even though revision control can be very useful in a one-man universe, it's
309 when collaborating with others that git reveals its true power.
310 The two basic operations when working with a remote repository are the pull
311 and push operations, to retrieve and publish changes, respectively. Still, it's
312 good to keep in mind that your own repository is a full-fledged repository,
313 even when you are working with a remote repository. The only difference to
314 the ``master'' repository is that a symbolic pointer called 'origin' is set
315 up to point back to where the repository was cloned from\footnote{You can in
316 fact set up multiple remote repositories if you like, see \Ref{sec:remotes}}.
317 The symbolic name 'origin' is simply a shorthand for the URL of the remote
318 repository, i.e. if you do \texttt{git clone me@repos.foo.org:proj.git}, then
319 'origin' is set to \texttt{me@repos.foo.org:proj.git}.
322 \subsection{Remote branches in git}
323 \label{sec:branch}
324 When you clone a repository, the repository you cloned will be referred to as
325 a remote repository (even if it's on the same machine). The default remote
326 repository is called 'origin'. The repository you just cloned has at least one
327 branch ('master'), but probably more.
328 When you clone a repository all the branches in that repository are renamed
329 in your local copy by prefixing the branch name with '\texttt{origin/}'. Thus,
330 the 'master' branch in the remote repository will be named 'origin/master' in
331 your local copy.
332 These branches are called 'remote
333 branches' or 'remote-tracking branches',
334 to distinguish them from \emph{your} local
335 branches. The point of having these branches is that after an update,
336 they will always be in the exact same state as the corresponding branch in the
337 remote repository, i.e. they track the remote repository.
339 To view all remote branches, run
340 \begin{verbatim}
341 $ git branch -r
342 \end{verbatim}
343 Whenever you run
344 \texttt{git fetch origin} \emph{all} remote branches and tags
345 are updated to the exact
346 state of the remote repository. You can inspect the changes on those
347 branches by either using \texttt{gitk}, or a combination of \texttt{git log},
348 \texttt{git whatchanged}, \texttt{git diff} and \texttt{git annotate} (see
349 \Ref{sec:examine}).
350 For example, to inspect the latest changes on the most important remote branch:
351 \begin{verbatim}
352 $ git fetch origin
353 $ gitk origin/master
354 \end{verbatim}
355 Note that \texttt{git fetch} updates the remote
356 tracking branches, not your working branches! Never, ever, check out a remote
357 branch directly unless you absolutely want trouble.
359 \subsection{Tracking remote branches}
360 In order to incorporate changes in remote branches you need to merge your
361 working copy with the remote branch, e.g.
362 \begin{verbatim}
363 $ git checkout mybranch
364 $ git merge origin/master
365 \end{verbatim}
366 If you get a conflict, run
367 \texttt{git mergetool} which will fire up the merge tool of your choice. When
368 you have resolved all issues, run \texttt{git commit} to complete the merge
369 (see \Ref{sec:conflict}).
371 The \texttt{git pull} command basically does a \texttt{git fetch} and
372 \texttt{git merge} in one go. I recommend you \emph{do not} use it unless you
373 very carefully read the man page first, since it has some pitfalls!
375 As mentioned earlier, never work on a remote branch directly! This will
376 cause terrible problems when you try to sync with the remote repository.
377 Instead create a local branch \emph{from} the remote branch and do your work
378 on it instead:
379 \begin{verbatim}
380 $ git branch foobar origin/foobar
381 $ git checkout foobar
382 \end{verbatim}
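The two steps can also be combined, in the same way as for local branches.
A short sketch, assuming the remote has a branch called 'foobar':
\begin{verbatim}
# create foobar from origin/foobar and switch to it in one go
$ git checkout -b foobar origin/foobar
# list local and remote-tracking branches together
$ git branch -a
\end{verbatim}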
384 \subsection{Publishing changes}
385 At some point you will hopefully have produced a nice set of commits, leading
386 up to something you find worthy of sharing with others. Before you
387 push your changes to the remote server it is essential that you make sure that
388 you are in sync with the branch you are going to push to. For a push to work
389 it must result in a so-called fast-forward merge.
391 \emph{Fast-forward}\index{fast-forward} means that the files on
392 the other branch have not changed since the last pull. If a push fails
393 because it is not fast-forward, you must first fetch, merge and possibly
394 resolve any conflicts before (re)doing the push.
395 %% \index{fast-forward} is probably overkill for such a short
396 %% document; if 'fast-forward' is marked as term being defined
397 %% then all other definitions/explanations must be marked in
398 %% the same way. Perhaps instead of \emph{Fast-forward}
399 %% \define{Fast-forward} or \explain{Fast-formard} should be used.
401 When pushing changes I recommend being very explicit in order to avoid any
402 surprises, e.g. to push the changes on mybranch to an \emph{existing} branch
403 called myuserid in the repository where we cloned from (i.e. origin)
404 \begin{verbatim}
405 $ git push origin mybranch:myuserid
406 \end{verbatim}
407 %% Wouldn't it be better to configure remotes for push, with push refspec?
408 %% jj: yes, but it's a lot of explaining. Maybe an appendix?
409 If you have created tags that you want to make public you need to push them
410 explicitly
411 \begin{verbatim}
412 $ git push --tags origin
413 \end{verbatim}
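For the case where a push is refused because it is not a fast-forward, the
fetch, merge and push cycle could look roughly like this. The branch names
'mybranch' and 'myuserid' are the ones used above; everything else is just one
possible sequence, not the only way to do it.
\begin{verbatim}
$ git push origin mybranch:myuserid
# rejected: someone else has pushed to myuserid in the meantime
$ git fetch origin
$ git checkout mybranch
$ git merge origin/myuserid
# resolve any conflicts and commit, then try again
$ git push origin mybranch:myuserid
\end{verbatim}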
415 \subsection{Creating remote branches}
416 To create a new branch on a remote server, git
417 requires a very explicit syntax. To create a remote
418 branch and push the specified branch to it, do
419 \begin{verbatim}
420 $ git push origin <branch>:refs/heads/<branch>
421 \end{verbatim}
422 As a technical side note, the 'refs/heads' syntax refers to the actual
423 directory structure in the internal git repository in \path{.git/}.
424 Once you have created the branch, you can
425 use the same syntax as when operating on local branches.
427 \subsection{Deleting remote branches and tags}
428 If you for some reason want to delete a branch
429 on the remote server, just push an empty branch to the remote branch:
430 \begin{verbatim}
431 $ git push origin :<branch>
432 \end{verbatim}
434 To delete a tag on the master you need to explicitly push an empty tag
435 \begin{verbatim}
436 $ git push origin :refs/tags/<tag>
437 \end{verbatim}
439 \subsection{Cleaning up}
440 Branches come and go. Some branches have very long lifetimes and others just
441 exist for a short while. When remote branches get deleted, git does not
442 automatically register this when you do a \texttt{git fetch}, it just ignores
443 them. Thus, the number of stale remote branches can grow, cluttering the
444 branch listings. To get rid of all these stale branches simply run
445 \begin{verbatim}
446 $ git remote prune origin
447 \end{verbatim}
448 To delete a single stale branch you can use
449 \begin{verbatim}
450 $ git branch -d -r <remote>/<branch>
451 \end{verbatim}
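If you would rather look before you leap, pruning can be previewed first.
A small sketch, assuming the remote is called 'origin' and a reasonably recent
version of git:
\begin{verbatim}
# show what would be pruned, without deleting anything
$ git remote prune --dry-run origin
# or let fetch do the pruning while it updates the remote branches
$ git fetch --prune origin
\end{verbatim}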
454 \subsection{The aim of the game}
455 With so many possibilities for organising both repositories and branches it's
456 important not to forget that the ultimate aim should be to get your beautiful
457 new features merged with the master branch on the master server! It's like the
458 musketeers, ``All for one, one for all!''.
460 It's important to update often from the master repository, partly not to drift
461 too far away from the master branch, and also to incorporate all bug fixes
462 etc. At the same time it's also important to push to the master often and in
463 small increments, since this makes it a lot easier for other developers to
464 stay in sync with \emph{your} work.
466 %% Is the goal there description of centralized repository (CVS-like)
467 %% usage? The cycle: push, if it fails pull (and resolve conflict if
468 %% needed), repeat until push succeeds should follow CVS' update, if
469 %% it fails resolve conflicts, although with git you can ensure that
470 %% commits are small and do not break.
471 %%jj This is the idea :)
473 %% If it is not the case, perhaps describing a common git model where
474 %% main developers have private repository and public repository they
475 %% push into (and others pull from), and fringe developers use patches
476 %% sent by email would be good idea.
477 %%jj This could be added explicitly to the adcanced section
479 \subsection{A word of advice}
480 Git might seem a bit overwhelming in the beginning, so here are
481 some recommendations for how to work with git until you get used to it:
482 When you clone a remote master repository, you will get a set of remote
483 branches (see \Ref{sec:branch} for more info on branches), and one
484 local working branch called 'master'. This branch is in the exact state of
485 the remote master branch (i.e. 'origin/master' and 'master' are identical). I
486 suggest you keep it this way, and create a new working branch for yourself,
487 e.g.
488 \begin{verbatim}
489 $ git checkout -b work
490 \end{verbatim}
491 which creates branch 'work' and checks it out in one go.
492 Now work like you normally do, and every now and then (every morning for
493 example) run
494 \begin{verbatim}
495 $ git fetch origin
496 $ gitk origin/master
497 # if you like what you see
498 $ git pull origin master
499 \end{verbatim}
500 %% Using 'git pull origin' generates (if it is not fast-forward case)
501 %% commit message with the URL of remote; 'git merge origin/master'
502 %% would generate commit message with name of branch (origin/master).
504 Every time you have made changes which can be considered complete in some
505 sense (you know best), do a commit
506 \begin{verbatim}
507 $ git status
508 # hmm, what did I actually change?
509 $ git diff
510 # oh yes...
511 $ git add file(s)
512 $ git commit
513 \end{verbatim}
514 It's much better to make many small commits
515 than a few big ones, as you will see in \Ref{sec:cherry}.
516 Small commits also matter for commit review, and when finding bugs
517 using \texttt{git bisect} (see \Ref{sec:bisect}), or for \texttt{git blame}
518 analysis (see \Ref{sec:blame}).
520 \section{Examining repositories}
521 \label{sec:examine}
522 Every now and then, on a more or less daily basis, I tend to forget which
523 files I have modified so far. This frequently happens during debugging
524 sessions, where you easily end up all over the place in the hunt for the
525 offending piece of code. Then it can be highly convenient to get a listing
526 of files which have changed since the last commit. If you run
527 \begin{verbatim}
528 $ git status
529 \end{verbatim}
530 you will get a compact status report of which files have modifications, which
531 files are not under revision control and files staged for a commit (with
532 \texttt{git add}) but not committed yet. Often you have a bunch of files that
533 should not be under revision control and that you really don't care about,
534 e.g. object files and the like. To avoid having git status always list these
535 files you can edit the file \path{.gitignore} in the project directory and
536 list files and file patterns (one per line) that you want to ignore:
537 \begin{verbatim}
538 # example .gitignore file
539 *.[oa]
541 *.bak
543 \end{verbatim}
544 Have a look at \texttt{man gitignore} for a detailed description of how
545 patterns are interpreted.
546 It's usually a good idea to place the \path{.gitignore} file under revision
547 control too.
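
When a file is unexpectedly ignored (or not ignored), git can tell you which
pattern is responsible. The sketch below assumes a reasonably recent version
of git and uses a made-up object file \texttt{foo.o}:
\begin{verbatim}
# which rule, if any, makes git ignore this file?
$ git check-ignore -v foo.o
# list the ignored files as well in the status report
$ git status --ignored
# put the ignore file itself under revision control
$ git add .gitignore
$ git commit -m "Add .gitignore"
\end{verbatim}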
549 \subsection{Examining logs and changes}
550 Often it's nice to be able to browse the commit logs on a branch, either
551 to identify a particular commit or just to see what others might have
552 committed. To view the commit log run
553 \begin{verbatim}
554 $ git log
555 \end{verbatim}
556 If the information provided by git log is not enough, and viewing actual
557 changes is too much, then
558 \begin{verbatim}
559 $ git whatchanged
560 \end{verbatim}
561 shows the commit log \emph{including} a listing of which files had
562 modifications in a particular commit.
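
\texttt{git log} itself can also be tuned to show more or less detail. A few
variants I find useful, given as a sketch (\texttt{foobar.c} stands for any
file in your project):
\begin{verbatim}
# one line per commit, last five commits only
$ git log --oneline -5
# commit log plus a per-file summary of changed lines
$ git log --stat
# commit log with the full diff, restricted to one file
$ git log -p foobar.c
\end{verbatim}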
564 \subsection{Examining changes}
565 One of the most important aspects of revision control is that it allows you to
566 follow how files change over time. We have already looked at how to examine
567 the development history through logs and by listing which files have changed.
568 We shall now turn our focus on how to examine how the actual files change
569 between revisions. For this purpose git provides a very powerful command:
570 \begin{verbatim}
571 $ git diff
572 \end{verbatim}
573 Simply running this command without any arguments prints out the differences
574 between your working copy and the last checked in (staged)
575 version\footnote{A technical detail, which most people safely can ignore:
576 git-diff shows the difference between the index (staging area) and the working copy,
577 not between the HEAD (last commit) and the working copy.}.
578 This can be
579 extremely useful at times. \texttt{git diff} can also show differences between
580 arbitrary revisions, branches, tags and so on. To view the differences
581 between two branches
582 \begin{verbatim}
583 $ git diff branch1 branch2
584 \end{verbatim}
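The distinction between the working copy, the index and the last commit is
easier to see side by side. As a sketch, these are the three comparisons
involved:
\begin{verbatim}
# working copy vs. index: changes not yet staged
$ git diff
# index vs. last commit: what the next commit will contain
$ git diff --cached
# working copy vs. last commit: staged and unstaged changes together
$ git diff HEAD
\end{verbatim}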
586 Alternatively, when you finally have located a commit or a file revision that
587 you want to examine in some detail, the versatile \texttt{git show} command
588 can be useful. \texttt{git show} is also convenient for retrieving older
589 revisions of \emph{files}. Since git only deals with whole revisions, i.e. the
590 \emph{state} of the repository at a given time, it's in principle not possible
591 to retrieve the revision of a single file. Sometimes however it's convenient
592 to be able to reset a single file to an older state. So for example to save a
593 particular file 7 revisions back:
594 \begin{verbatim}
595 $ git show HEAD~7:foobar.c >foobar~7.c
596 \end{verbatim}
597 If you want to reset a file to a given state, you can directly use
598 \begin{verbatim}
599 $ git checkout HEAD~7 foobar.c
600 \end{verbatim}
602 \subsection{Searching a repository}
603 Quite often one has a need to search all or some of the files in a source
604 tree for a particular string. Of course, it's reasonably simple to quickly
605 filter the relevant files on the command line and \texttt{grep} for the
606 string. Things become substantially more complicated if you want to search in
607 an older revision. Luckily, git has a simple solution:
608 \begin{verbatim}
609 $ git grep regexp
610 \end{verbatim}
611 This will match the regexp against all files in the current revision.
612 \texttt{git grep} has a lot of flags to limit and refine searches.
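
For the older-revision case mentioned above, a sketch could look like this.
The revision and the \texttt{*.c} pattern are arbitrary examples:
\begin{verbatim}
# search the current files, printing line numbers
$ git grep -n regexp
# search the tree as it was 7 revisions ago, C files only
$ git grep -n regexp HEAD~7 -- '*.c'
\end{verbatim}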
615 \subsection{Who to blame}
616 \label{sec:blame}
617 We have all been there: someone has messed up your code and you don't know who
618 to blame\footnote{This is because in 9 out of 10 cases it's you yourself,
619 with a perfect, albeit short, memory.}.
620 Fortunately git comes to our rescue:
621 \begin{verbatim}
622 $ git blame file
623 \end{verbatim}
624 This command prints out the file with every line nicely annotated with who
625 changed it and when.
626 You can also use
627 \begin{verbatim}
628 $ git gui blame file
629 \end{verbatim}
630 for graphical blame.
632 If you want to find which commit introduced a given change, you can use
633 the so-called \emph{pickaxe}\index{pickaxe} search:
634 \begin{verbatim}
635 $ git log -S'<changed line>' file
636 \end{verbatim}
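Both commands can be narrowed down further. A sketch with an arbitrarily
chosen line range:
\begin{verbatim}
# annotate only lines 10 to 20 of the file
$ git blame -L 10,20 file
# pickaxe search, showing the matching commits together with their diffs
$ git log -p -S'<changed line>' file
\end{verbatim}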
639 \section{Using git for collaboration}
640 Revision management goes well beyond just source code management for a group
641 of programmers. Revision management is useful for most tasks which are
642 expected to evolve with time, like for example manuscripts.
643 Since git is very easy to set up, and supports a wide range of communication
644 protocols, git can be useful for many collaborative tasks. In the following
645 section we will examine how git can be used to collaborate in a highly
646 disconnected environment, where none of the participants have access to a
647 common server or each other's machines. This is a typical situation which
648 arises for shorter-term projects, like when collaborating on a scientific
649 manuscript. To facilitate this situation git offers a powerful e-mail facility
650 for communicating changes.
652 Suppose you are working on a LaTeX manuscript and you want to have the whole
653 manuscript under revision control:
654 \begin{verbatim}
655 $ cd ~/tex/manus/
656 $ git init
657 $ git add manus.tex fig1.ps fig2.ps
658 $ git commit
659 \end{verbatim}
660 That's it! Now you can work happily, and remember to commit every now and then
661 so that you can always go back in history if you need to.
663 At the point when you are ready to send the manuscript to your collaborators,
664 you can make an archive of the whole project and
665 send it by email to your collaborators\footnote{If the file is very big it's
666 probably better to provide a (hidden) link to your home page, as many mail
667 servers will not accept excessively large files}.
668 \begin{verbatim}
669 $ cd /tmp
670 $ git clone ~/tex/manus
671 $ tar cvzf manus.tgz manus
672 $ rm -rf manus
673 # mail and attach /tmp/manus.tgz
674 \end{verbatim}
675 %% I don't like to use git archive here, because it does not export the
676 %% _repository_
677 Alternatively you can use \texttt{git bundle} described below.
679 Now you and your collaborators continue to work on the manuscript. After some
680 time, and a number of commits, it's time to share your changes with the
681 others. The first step is to identify the commits you want to send. The
682 commits can easily be identified by running \texttt{git log}. Suppose you have
683 made 3 commits since you last distributed your changes:
684 \begin{verbatim}
685 $ mkdir patches/
686 $ git format-patch -3
687 \end{verbatim}
688 %% you can use '-o dir' to save patches to directory; this way you are
689 %% less likely to send some patches you don't want to send, while
690 %% keeping old patches around just in case they need to be resent
691 This creates 3 patch files in the current directory, which now can be attached
692 and sent to your collaborators using your favourite mailer. Alternatively you
693 can use \texttt{git send-email} to do the job.
694 \begin{verbatim}
695 $ git send-email --subject '[PATCH] my latest changes' --to foo@bar.org \
696 --cc raboof@foobar.edu *.patch
697 $ rm *.patch
698 \end{verbatim}
\texttt{git send-email} can also be configured using \texttt{git config} to
avoid having to write the long command line every time.
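A minimal sketch of such a configuration, reusing the made-up addresses from
the example above. The \texttt{sendemail.to} and \texttt{sendemail.cc}
settings are stored per repository here, and the \texttt{-o} option of
\texttt{git format-patch} collects the patches in a separate directory
(\path{patches/} is just an example name), so that stray files matching
\texttt{*.patch} are not sent by accident:
\begin{verbatim}
# this only needs to be configured once per repository
$ git config sendemail.to foo@bar.org
$ git config sendemail.cc raboof@foobar.edu
# write the last 3 commits as patch files into patches/
$ git format-patch -o patches/ -3
$ git send-email patches/*.patch
\end{verbatim}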
702 When you receive changes from your collaborators by e-mail, just save the
703 mail(s) in your project directory and apply the changes:
704 \begin{verbatim}
705 $ git am --3way mailfile(s)
706 $ rm mailfile(s)
707 \end{verbatim}
708 It might be a good idea to create and switch to a temporary branch before
709 applying the patches, since this gives you a better possibility to inspect the
changes before merging them with your main branch. Obviously, if there is a
conflict it has to be resolved as usual. When the conflict has been
resolved, you can continue applying the remaining patches with
\texttt{git am --continue}.
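A sketch of the temporary-branch approach suggested above. The branch name
\texttt{incoming} is just an example, and the main branch is assumed to be
called \texttt{master}:
\begin{verbatim}
$ git checkout -b incoming
$ git am --3way mailfile(s)
# inspect what the patches changed relative to master
$ git log -p master..incoming
# happy with the changes? merge them and remove the temporary branch
$ git checkout master
$ git merge incoming
$ git branch -d incoming
\end{verbatim}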
714 %% DRAFT >>>
715 \subsubsection{Using bundles}
716 Alternatively you can use \texttt{git bundle} for off-line
717 transport.
718 Instead of creating an archive, you can create an initial bundle with
719 \begin{verbatim}
720 $ git bundle create manus.bndl master
721 $ git tag -f sent-manus master
722 # second command marks when we send bundle
723 \end{verbatim}
724 The receiving side would do
725 \begin{verbatim}
726 $ git bundle verify manus.bndl
727 $ git fetch manus.bndl master:from-him/master
728 \end{verbatim}
If you want to send only the changes made since the last bundle, do
731 \begin{verbatim}
732 $ git bundle create manus.bndl sent-manus..master
733 $ git tag -f sent-manus master
734 # second command marks when we send bundle
735 \end{verbatim}
736 The receiving side does the same as before.
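Before fetching, the receiver can check which branches a bundle contains, and
afterwards inspect and merge the fetched commits. A sketch, reusing the branch
name \texttt{from-him/master} from above and assuming the receiver works on
\texttt{master}:
\begin{verbatim}
# list the branches stored in the bundle
$ git bundle list-heads manus.bndl
$ git fetch manus.bndl master:from-him/master
# show the commits that are new compared to the local master
$ git log master..from-him/master
$ git merge from-him/master
\end{verbatim}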
737 %% <<< DRAFT
739 \section{Advanced git}
740 \subsection{Configuring git}
742 %% Preferred way to configure git is to _edit_ config files;
743 %% git-config is for script (and examples), not for humans
745 Git uses a number of config files, system
746 wide, per user and per repository (see the git config man page for more info).
747 As a minimum you should set the following options:
748 \begin{verbatim}
749 $ git config --global user.name "Your Name"
750 $ git config --global user.email "my@email.com"
751 \end{verbatim}
These options are written to \path{~/.gitconfig}, and are used by git to
ensure that your commits carry a sensible author name and e-mail address,
since user names and mail addresses are not necessarily set properly on all
machines. In addition you might want to enable the following options:
756 \begin{verbatim}
757 $ git config --global color.branch auto
758 $ git config --global color.status auto
759 $ git config --global color.diff false
760 \end{verbatim}
If you want beautifully colored diffs, you probably need to add the
\texttt{-R} flag to the \texttt{LESS} environment variable, and change
\texttt{false} to \texttt{auto}.
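The same settings can also be made by editing the configuration file directly.
Assuming only the options shown above have been set, \path{~/.gitconfig} would
look roughly like this:
\begin{verbatim}
[user]
        name = Your Name
        email = my@email.com
[color]
        branch = auto
        status = auto
        diff = false
\end{verbatim}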
764 If you want to use some external (graphical) merge tool to resolve conflicts:
765 \begin{verbatim}
766 $ git config --global merge.tool meld
767 \end{verbatim}
768 \texttt{meld} is a fantastic merge tool, and I strongly suggest you have a
769 look at it. Other possibilities are \texttt{kdiff3} and \texttt{xxdiff}.
Git has many tunable bells and whistles. Please refer to the
\texttt{git config} man page for a more complete listing of configurable
options.
774 \subsection{Resolving conflicts}
775 \label{sec:conflict}
776 Every now and then you end up in a situation where files have overlapping or
777 incompatible changes. This will be flagged as a conflict by git, and you will
778 have to resolve the conflict before you can proceed. When you get a conflict,
779 git will insert markers in the file showing where the conflict occurred. For
780 example, working on branch foobar, you merge with changes on the master branch
781 and get a conflict. The problematic section in the offending file looks like
782 this:
783 \begin{verbatim}
784 <<<<<<< HEAD:foobar
785 This is the stuff I have in my working copy in branch foobar...
786 =======
787 This is what master currently looks like. Now you need to edit the file, pick
788 the part you want to retain and remove all the markers.
789 >>>>>>> master:foobar
790 \end{verbatim}
791 After you have resolved the conflict in your favorite editor, save and
792 recommit:
793 \begin{verbatim}
794 $ git add <file>
795 $ git commit
796 \end{verbatim}
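If it is unclear which files are affected, or if one side of the conflict
should simply win, the following sketch may help. \texttt{<file>} stands for
the conflicting file, as above. During a merge, \texttt{--ours} refers to the
branch you are on and \texttt{--theirs} to the branch being merged in.
\begin{verbatim}
$ git status                    # lists the unmerged files
$ git diff                      # shows the conflicting sections
$ git checkout --ours <file>    # keep the version from the current branch
$ git checkout --theirs <file>  # ...or the version from the other branch
$ git add <file>
$ git commit
\end{verbatim}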
797 A more convenient way to handle conflicting merges is to configure
798 \texttt{git mergetool} to launch your favourite diff/merge tool:
799 \begin{verbatim}
800 # this only needs to be configured once
$ git config --global merge.tool meld
$ git mergetool
$ git commit
804 \end{verbatim}
If you are not familiar with the general purpose graphical diff and merge
tool \texttt{meld}, I warmly recommend that you familiarise yourself with this
excellent piece of software!
809 \subsection{Specifying revisions}
810 \label{sec:revision}
811 Being able to specify revisions is important whenever we want to in any way
812 access older revisions. Git has a number of different ways of specifying a
certain revision, some of which we have seen already:
814 \begin{enumerate}
815 \item A branch name
816 \item A tag
817 \item The symbolic name HEAD
818 \item A revision relative to a branch, tag or HEAD
819 \item The SHA1 hash of a commit
820 \end{enumerate}
The first two forms are pretty obvious, but the others need a bit of
explanation. The symbolic name HEAD always points to the tip of the currently
checked out branch. HEAD is mostly useful for specifying
older revisions relative to it. There are a number of ways to specify
a relative revision, e.g.\ to specify the parent revision (i.e.\ one below
HEAD):
826 \begin{verbatim}
827 git show HEAD~1
828 \end{verbatim}
829 The general syntax is
830 \begin{verbatim}
831 git show <HEAD|tag|branch|SHA1>~N
832 \end{verbatim}
833 For more details see the \texttt{git-rev-parse} man page,
834 specifically the section ``Specifying revisions''.
835 %% ~ and ^ are NOT equivalent: HEAD~n means n-th ancestor of HEAD in
836 %% direct (first parent) line, HEAD^n means n-th parent of HEAD if
837 %% HEAD is merge commit (octopus is merge which has more than
838 %% _2_ parents).
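The two suffixes \verb|~| and \verb|^| are easy to mix up. The following
sketch shows the difference. The last line assumes that HEAD is a merge
commit, since only merge commits have a second parent:
\begin{verbatim}
$ git show HEAD^     # the first parent, same as HEAD~1
$ git show HEAD~2    # the grandparent, following first parents
$ git show HEAD^^    # the same commit as HEAD~2
$ git show HEAD^2    # the second parent of a merge commit
\end{verbatim}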
Every commit (every object in fact) is internally identified by a
cryptographic-strength one-way SHA1 hash consisting of 40 hex digits, which
\emph{uniquely} identifies any commit (or file). These hex digits can be found
by running
844 \begin{verbatim}
845 $ git log
846 commit 3ff678cc8abc29db9cb33ec9ca4e468496cb8063
847 Author: Jonas Juselius <jonas@iki.fi>
848 Date: Fri Nov 16 12:47:47 2007 +0100
850 Updated .pdf version.
854 \end{verbatim}
where the first line of every log entry consists of the word \texttt{commit}
followed by the hex digits identifying that particular revision.
Wherever a revision is needed one can give
the SHA1 hash to exactly specify the revision (in fact, the first 6--8
hex digits are usually enough):
861 \begin{verbatim}
862 git checkout bed6ba53
863 \end{verbatim}
865 Sometimes it's practical to be able to specify a revision range to limit the
866 output
867 \begin{verbatim}
868 git diff HEAD~4..HEAD~1
869 \end{verbatim}
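Ranges work with most commands that take revisions. As a sketch, using the
branch names \texttt{master} and \texttt{foobar} purely as examples:
\begin{verbatim}
# commits on foobar that are not yet on master
$ git log master..foobar
# the same, restricted to the three most recent ones
$ git log -3 master..foobar
# resolve a symbolic name to its full SHA1 hash
$ git rev-parse HEAD~1
\end{verbatim}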
871 \subsection{Cherry picking}
872 \label{sec:cherry}
There are lots of nice tricks we can play with
874 branches. Suppose you want to try out some idea, but you don't know exactly
875 how it will work out. Create a new branch from your current working branch and
876 check it out:
877 \begin{verbatim}
878 $ git branch foobar
879 $ git checkout foobar
880 \end{verbatim}
881 Now work hard and commit often. If it turns out that everything is good, and
882 you want to keep all changes, merge them with your working branch and push
883 them to the master:
884 \begin{verbatim}
885 $ git checkout work
886 $ git merge foobar
887 $ git push origin work:myuserid
888 \end{verbatim}
889 Now you can delete the foobar branch for ever and all times
890 \begin{verbatim}
891 $ git branch -d foobar
892 \end{verbatim}
If, on the other hand, it turns out that you had a crackpot idea, and you
don't want to see any of it anymore, delete the branch (with
\texttt{git branch -D}, since it has not been merged) and all changes,
commits and everything on the branch will be gone for ever! No trace of it.
But what if there are partially useful changes on foobar that you want to
keep, before discarding the rest? Well, that's when small commits are useful,
because you can cherry pick! Check out your work branch, fire up gitk on
foobar, and find the commits you would like to keep. Every commit is
identified by a long SHA1 hash (a long sequence of numbers and letters). Now
cherry pick the commits you want into the work branch:
902 \begin{verbatim}
903 $ git checkout work
904 $ gitk foobar &
905 # find commit, copy the SHA1 hash with the mouse
906 $ git cherry-pick SHA1
907 \end{verbatim}
Repeat the cherry pick as many times as you like. As you see, git gives a lot of
909 flexibility by using branches. Just be a bit careful in the beginning not to
910 make a mess and lose track of what you are doing.
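If you prefer the command line over gitk, the candidate commits can also be
listed with \texttt{git log}. A sketch, where \texttt{<SHA1>} stands for a
hash copied from the listing:
\begin{verbatim}
$ git checkout work
# commits that are on foobar but not on work, one per line
$ git log --pretty=oneline work..foobar
# apply the change, but stop before committing so it can be inspected
$ git cherry-pick -n <SHA1>
$ git diff --cached
$ git commit
# when nothing useful is left, discard the rest of the branch
$ git branch -D foobar
\end{verbatim}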
912 \subsection{Stashing}
913 Sometimes when you are working on a branch you temporarily need to switch to
914 another branch to test something, or maybe fix a bug. In a situation like this
915 you cannot just checkout the other branch, since then all your local changes
916 would get lost (don't worry, git will not allow you to do this). However, you
917 might not want to commit your changes either, since they are not ready or
918 complete. In situations like this git allows you to temporarily commit your
919 changes in a ``stash''. Running \texttt{git stash} saves your latest changes
and resets your working copy to its latest checked in state (the HEAD). When
you are ready to continue working you just apply the changes in the stash:
922 \begin{verbatim}
923 $ git stash save
924 $ git stash list
925 stash@{0}: WIP on mybranch: ad6d0aa... foo
$ git checkout otherbranch
928 $ git checkout mybranch
929 $ git stash apply
930 $ git stash clear
931 \end{verbatim}
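Note that \texttt{git stash clear} throws away \emph{all} stashed changes. A
slightly more careful variant of the last two steps, sketched here, is
\texttt{git stash pop}, which applies the most recent stash and removes only
that one, provided it applied cleanly:
\begin{verbatim}
$ git checkout mybranch
$ git stash pop
$ git stash list     # any older stashes are still there
\end{verbatim}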
934 \subsection{Repository maintenance}
935 When git was conceived, it was based on a very simple scheme for storing
936 revisions to files in the repository. Instead of actually figuring out how
937 files changed between revisions and just storing the differences, git just
938 stored a (compressed) copy of the whole file! This is obviously quite simple and efficient,
939 but very wasteful in terms of storage space. It also becomes inefficient as
940 the number of files in the repository grows.
Modern versions of git still retain this simple storage scheme by default.
943 This means that as your repository evolves with time it will grow
944 substantially in size. Fortunately git provides commands to convert the
945 individual objects in the database into a \emph{pack} file, which stores only
946 the differences between revisions. Once the pack has been generated, all the
947 old objects are unnecessary and can be pruned:
948 \begin{verbatim}
949 $ git repack
950 $ git prune
951 \end{verbatim}
952 In fact, git has a simpler command which will do this in one go, and
953 perform additional optimisations on the repository. To ``garbage collect'' and
954 fully optimise your repository run
955 \begin{verbatim}
956 $ git gc --prune --aggressive
957 \end{verbatim}
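To see what the clean-up achieves, the number and size of loose objects and
packs can be compared before and after. A sketch (the actual numbers depend
entirely on the repository):
\begin{verbatim}
# "count" and "size" refer to loose objects, "size-pack" to packs
$ git count-objects -v
$ git gc --prune --aggressive
$ git count-objects -v
\end{verbatim}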
958 %Current git tries to packs loose objects and repack, when needed, but
959 %it would not prune by itself. Pruning should be run on quescient
960 %repository.
962 \subsection{Exporting a repository}
963 Sometimes you want to package your source, e.g. for an official release, but
964 you certainly don't want to include the whole repository in the package. One
965 way to do this is to simply make a copy of the repository, checkout the right
966 branch, clean out all
967 generated and unnecessary files, delete the \path{.git} directory and make a tar
file. This process is much simplified by \texttt{git archive}, which
will create a .tar or .zip file of the wanted revision on the fly:
970 \begin{verbatim}
# to create an uncompressed archive
$ git archive --prefix=myprog-1.42/ HEAD > ../myprog_1.42.tar
# compressed archives are also trivial
$ git archive --prefix=myprog-1.42/ HEAD | gzip -c > ../myprog_1.42.tgz
975 \end{verbatim}
This creates an archive of the latest version (HEAD) on the current branch.
You can specify any branch or tag you like to export some other version.
Please note the trailing \texttt{/} in the prefix: without it the prefix is
not treated as a directory but glued directly onto every file name, and you
will get a bit of a surprise\ldots
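As a sketch of exporting something other than HEAD, assume that a
(hypothetical) tag \texttt{version-1.42} exists. The first command writes a
zip file instead of a tar file. The second lists the contents of the archive
without writing it to disk, which is a quick way to check the prefix:
\begin{verbatim}
$ git archive --format=zip --prefix=myprog-1.42/ version-1.42 \
      > ../myprog_1.42.zip
$ git archive --prefix=myprog-1.42/ version-1.42 | tar tf -
\end{verbatim}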
982 \subsection{Finding bugs}
983 \label{sec:bisect}
Everyone doing software development, either alone or in a group, has been in
the situation where a bug is suddenly found, with very little clue as to when
it was introduced, much less where it might be. It can be very tedious to
go back and figure out where and when the bug was introduced. Fortunately git
has a very clever mechanism to aid us in the process, using the command
\texttt{git bisect}. The way it works is by marking a starting revision as
bad, and then marking some \textit{known, working} revision as good.
\texttt{git bisect} will then check out a revision halfway between the two,
and you compile, test and mark it as either good or bad. A few of these
cycles, and the offending revision is found! Here is the process:
995 \begin{verbatim}
996 $ git bisect start
997 $ git bisect bad # current rev is bad
998 $ git bisect good version-1.21 # tagged version 1.21 works, guaranteed!
999 # now git will checkout a new revision
$ make; ./test.sh
# bad?
$ git bisect bad
$ make; ./test.sh
1005 \end{verbatim}
In fact, git will allow you to automate the whole process! If your test script
can determine if a version is working or not, then just let your script return
0 if the revision is working and 1 if it is bad, and run
1009 \begin{verbatim}
$ git bisect start
$ git bisect bad                   # current rev is bad
$ git bisect good version-1.21     # known to work
$ git bisect run ./test.sh
1012 \end{verbatim}
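What such a script looks like depends on the project. A minimal sketch,
assuming the program is built with \texttt{make} and that a (hypothetical)
command \texttt{./run\_tests} exits with a non-zero status when the bug is
present:
\begin{verbatim}
#!/bin/sh
# test.sh: exit status 0 = good, 1 = bad, 125 = cannot be tested

# if the revision does not even compile, tell git bisect to skip it
make || exit 125

# any failure of the test is reported as "bad"
./run_tests || exit 1
exit 0
\end{verbatim}
The special exit status 125 makes \texttt{git bisect run} skip a revision
instead of marking it as good or bad.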
1013 So, now that you have found the offending revision you probably want to go
1014 back to the latest revision and start debugging properly. To do this just
1015 execute
1016 \begin{verbatim}
1017 $ git bisect reset
1018 \end{verbatim}
1020 \subsection{Undoing commits and resetting}
1022 %% There are three ways of correcting: fine tune last commit with
1023 %% 'git commit --amend', discard last 3 changes using 'git reset
1024 %% --hard HEAD~3' (this removes history; use reflog if you made
1025 %% mistake when resetting); this should not be done if you have
1026 %% published this history, and 'git revert <commit>' which would
1027 %% revert _changes_ brought by commit, in practice doing cherry-pick
1028 %% of reverse of given commit.
1030 Sometimes we screw up. It's embarrassing. And we don't want anybody to know
1031 about it. Like committing something we should not have, or even worse,
1032 misspelling a commit message. Whatever. Or maybe we want to undo a merge or a
1033 pull which screwed up our repository, causing tons of conflicts or breaking
1034 things badly. There are two commands for undoing commits, \texttt{git revert}
1035 and \texttt{git reset}. These two differ in the sense that \texttt{git revert}
1036 will undo changes in a controlled, and in itself reversible manner, whereas
1037 \texttt{git reset} resets the branch to a specified state without leaving any
1038 trace in the history. Even if you reset, it's still possible to recover
1039 the ``lost'' commits through the reflog facility (see below).
Suppose you have been working for a while, committing regularly, and after a
while you realise that everything you have done the last few commits is utter
garbage, and you want to undo the last 3 revisions and start over:
\begin{verbatim}
$ git revert HEAD~3..HEAD
\end{verbatim}
This creates new commits which undo the changes introduced by each of the
last three commits, so that your files are back in the state they had three
revisions ago. Since nothing is removed from the history, you can go back
when you realise that the garbage you had produced actually was gold after
all.
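Note that \texttt{git revert} with a single revision undoes only the changes
introduced by that one commit, and leaves later commits alone. The following
sketch shows this, together with a variant that collects several reverts into
one commit:
\begin{verbatim}
# undo only the changes introduced by the commit before the last one
$ git revert HEAD~1
# undo the last three commits, but record the result as a single commit
$ git revert --no-commit HEAD~3..HEAD
$ git commit
\end{verbatim}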
Another scenario is when you realise you have an embarrassing typo in a commit
message, or that you forgot to include a file in the commit, or committed
too many files. Obviously you can always revert, but that's really quite
unnecessary and does not really achieve what you want. In this situation you
would amend the last commit. Internally this amounts to a soft reset, which
means that HEAD is reset to point to another recent revision while your
\emph{working copy} is left intact, followed by a new commit. To fix up
a commit in this manner you would do the following:
1058 \begin{verbatim}
1059 $ git commit file1 file2
# oops, let's fix the commit message, edit file1 and add file3
1061 $ vim file1
1062 $ git add file3
1063 $ git commit --amend file1 file2 file3
1064 \end{verbatim}
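For comparison, the same correction can be spelled out with an explicit soft
reset. This is only a sketch of what happens; \texttt{ORIG\_HEAD} is the name
under which git remembers the commit that HEAD pointed to before the reset:
\begin{verbatim}
# move HEAD one commit back, keeping index and working copy
$ git reset --soft HEAD~1
$ vim file1
$ git add file1 file3
# reuse the old commit message, opening it in the editor
$ git commit -c ORIG_HEAD
\end{verbatim}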
1065 %% git reset --soft is nowadays almost never used directly.
1066 %% I would use "git commit -a" instead of "git commit <file>",
1067 %% or "git add <file>; git commit"
If you really, really want to reset the state of both the repository and your
working copy you need to do a hard reset. Be warned, a hard reset removes
all commits after the specified revision from the branch, and throws away all
uncommitted changes to your files forever. There is no way of getting
uncommitted changes back again. To do a hard reset and
go two revisions back, and at the same time also reset your working copy to
that state:
1075 \begin{verbatim}
1076 $ git reset --hard HEAD~2
1077 \end{verbatim}
If you for some reason suddenly realise that the reset was a mistake, and you
want to go back to where you were before the reset, you can
reset back to the previous HEAD recorded in the reflog:
1081 \begin{verbatim}
1082 $ git reset --hard HEAD@{1}
1083 \end{verbatim}
1084 %% You can use "git reset --hard ORIG_HEAD" too
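The reflog itself can be listed, which is useful when more has happened since
the reset than a single step. A sketch; the entry number 2 is only an example
and has to be read off the listing:
\begin{verbatim}
$ git reflog
# each line shows an abbreviated hash, HEAD@{n} and the action taken;
# pick the entry you want to return to, e.g. number 2
$ git reset --hard HEAD@{2}
\end{verbatim}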
1087 \subsection{Working with multiple remotes}
1088 \label{sec:remotes}
Git provides a lot more flexibility than CVS. Due to its distributed nature
it's actually possible to have multiple ``master'' servers, or more
accurately, multiple remote repositories.
A typical situation in which several remotes are desirable is when the
central master server has limited push access, e.g.\ to ensure that only
working code is distributed to other developers. In a situation like this one
can set up multiple remotes to point to the repositories of the
people one is collaborating closely with.
The default remote repository is
called \texttt{origin}, but you can attach as many remotes as you like. To
register a new remote repository, simply run
1101 \begin{verbatim}
1102 $ git remote add <name> <myserver>:/path/to/git/proj.git
1103 \end{verbatim}
Now when you fetch, pull and push, pass your new remote name {\tt <name>}
instead of {\tt origin} to the corresponding command. The new development
cycle will typically be something like
1108 \begin{verbatim}
1109 $ git pull origin master
1110 # do some work
1111 $ git push myremote mybranch:mybranch
1112 # or to push all branches automatically
1113 $ git push --all myremote
1114 \end{verbatim}
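To keep track of which remotes are registered and what they contain, the
following commands may be helpful. The remote name \texttt{myremote} is the
same placeholder as above, and the last line assumes that the remote has a
branch called \texttt{master}:
\begin{verbatim}
$ git remote -v                     # list remotes with their URLs
$ git fetch myremote                # download, but do not merge
$ git branch -r                     # list remote tracking branches
$ git log master..myremote/master   # what is new on the remote?
\end{verbatim}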
A good solution might be to have a central storage area, with a true ``master''
repository, to which only a few people have write access. The other
1118 developers can create their own personal repositories in the storage area, and
1119 have full access to that repository. By setting up the \texttt{gitweb}
1120 interface it becomes very easy to track what everybody is doing through
1121 the web.
1123 To set up your own sub-master repository follow these steps:
1124 \begin{verbatim}
1125 $ ssh <myserver>
1126 $ cd /path/to/repos
1127 $ mkdir $USER
1128 $ git clone -s -l --bare proj.git $USER/proj.git
1129 $ cd $USER/proj.git
1130 $ git branch -a
1131 # remove any branches and tags you do not care about
1132 $ git branch -D <branch branch...>
1133 $ git tag -d <tag tag...>
1134 # edit description (for gitweb) to your liking
1135 $ vi description
1136 \end{verbatim}
1137 The -s and -l options to {\tt git clone} will cause git to set up the
1138 repository to use the master's object database as much as possible, so that it
1139 will take up very little space.
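The sharing is recorded in a plain text file inside the new repository, which
lists the object directory of the master repository. It can be inspected as
sketched below, using the same placeholder paths as above:
\begin{verbatim}
$ cd /path/to/repos/$USER/proj.git
# shows where the borrowed objects live
$ cat objects/info/alternates
\end{verbatim}
Because the objects are borrowed rather than copied, the master repository
must not be removed while the clone still relies on it.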
1141 %\subsection{Cleaning up commits using rebase}
1142 %To be written\ldots
1144 %\subsection{Submodules}
1145 %To be written\ldots
1147 %\section{Configuring the git web interface}
1148 %To be written\ldots
1150 %\section{Git administration}
1151 %To be written\ldots
1153 %\section{Git internals for Normal People (tm)}
1154 %To be written\ldots
1156 \end{document}