ideas

   1 * Introduction
   2
   3 ** Intended audience
   4 People already familiar with an SCM (CVS/Subversion).  Developers.
   5 Slides intended to be more standalone than presentation-based.
   6
   7 ** Normal SCM workflow
   8 Centralized server from which people checkout working copies and commit
   9 changes.  (diagram)
  10
  11 ** Distributed SCM
  12 Usually a centralized server (reference point) which people clone.
  13 What makes a truly distributed SCM:
  14   - The ability to exchange revisions without the centralized server.
  15   - Anybody can become the reference point of someone else.
  16 (diagram)
  17
  18 ** Revision numbers
  19 Incremental numbers no longer make sense in DSCM.  (explain with a diagram)
  20
  21 * Why Git is a good (D)SCM, why make the switch?
  22
  23 ** Problems with traditional SCMs
  24 Common problems:
  25   - Branch management
  26   - Commit policy
  27
  28 Branch management: branches are costly to create and a nightmare to
  29 maintain.  Merging branches is utterly cumbersome.  Typical example with
  30 SVN: one must manually track the merges so as not to merge the same thing
  31 twice.  Merging flattens the history (loses the commit logs, makes the
  32 history extremely hard to review, prevents "blame" from working properly,
  33 doesn't handle file renames).
  34
  35 Commit policy: without commit (write) access, one can't work without extra
  36 tools (e.g., quilt).  Who must be given write access?  What can be
  37 committed?  Typical example: "trunk" (main development line) must always
  38 work (compile, pass the testsuite...).  Problems: "Idea.  Hack, hack, hack.
  39 Almost good but not quite perfect.  Some problems must be solved before the
  40 commit.  Hack, hack, hack.  Send a huge commit."  Does not encourage code
  41 review.
  42
  43 ** What's good in Git
  44 Very lightweight branches where merging is made trivial.
  45
  46 Encourages code review (will demonstrate this later in the workflows.)
  47
  48 Cryptographically secure.  100% reliable.  GPG-signed tags.
  49
  50 Feature rich (git-grep, bisect, GUIs, gitweb, rerere, submodules, etc.)
  51
  52 Blazingly fast.  Space efficient.
  53
  54 Patches are first class objects.  Very good integration with emails (to
  55 send/receive and apply patches).
  56
  57 Interaction with other SCMs (SVN in particular).
  58
  59 The ability to work offline.  Exchange commits with others.
  60
  61 Made the UNIX way: lots of small programs.  Easily scriptable.
  62
  63 Huge and quickly growing community, many active developers, project moving
  64 at a fast pace and quickly spreading.
  65
  66 Cherry on the cake: colourful :)
  67
  68 ** What's not so good with Git
  69 Windows port is on its way but not as good as that of CVS/SVN.
  70
  71 Git is obviously harder to learn than SVN, because it has more concepts.
  72
  73 * The basics
  74
  75 ** The git commands
  76 "git-foo" vs "git foo" (completion in dumb shells, whether you have 1
  77 command in your PATH or 137 "git-*")
  78
  79 "git foo --help" = "man git-foo"
  80
  81 Git has two types of commands: Plumbing commands and Porcelain.  The
  82 "plumbings" are more low-level commands on top of which "porcelain" are
  83 written.  Git used to be very low-level and several tools came on top of it
  84 to make it a friendly SCM (e.g. the deprecated Cogito).  Since then, Git has
  85 evolved a lot and its numerous "porcelain" commands make it a real SCM.
  86
  87 ** Telling git who you are
  88   $ git config --global user.name "Your Name"
  89   $ git config --global user.email you@example.domain.com
  90 Stored in ~/.gitconfig
  91
  92 ** Creating a working copy
  93 Creating a repository (git init).  This operation will create an empty
  94 directory with a single .git directory in it.  That's where everything will
  95 be and there will be a single .git directory (unlike the .svn directories).
  96 Unlike SVN's .svn directories, you are allowed to look at the files in the
  97 .git directory, they're not like a "very-internal implementation detail"
  98 that you shouldn't use.  In particular, there is a file .git/config which
  99 contains configuration options specific for this repository (which override
 100 any global configuration).
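
A minimal sketch of the per-repository configuration mentioned above.  The
scratch directory (from mktemp) and the name/email values are placeholders
chosen for illustration, not part of the original notes.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q

# Without --global, "git config" writes to .git/config of this repository.
git config user.name "Project Persona"
git config user.email persona@example.com

# The repository-level value takes precedence over any global one.
name=$(git config user.name)
echo "$name"

# The setting is stored as plain text and can be inspected directly.
cat .git/config
```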
 101
 102 Cloning a (remote) repository (git clone). Supported protocols: HTTP(S)
 103 (read-only without DAV), rsync (r/w, but writing deprecated because
 104 non-atomic), SSH (needs Git installed in PATH on the remote host, can be
 105 restricted with git-shell), local paths (r/w), native Git protocol (ro).
 106
 107 ** Basic operations
 108 Importing a tree:
 109   $ git init
 110   $ git add .
 111   $ git commit -m "Initial import."
 112
 113 Standard operations:
 114   $ git add file
 115   $ git rm file
 116   $ git mv foo bar
 117   $ git diff
 118   $ git-gui
 119
 120 Sending changes
 121   $ git commit
 122     -s (adds a "Signed-off-by:" line)
 123     --amend
 124     --author "Name <mail@foo.com>" (committer not necessarily author)
 125 A commit is identified by its sha1 which verifies the entire tree, contents,
 126 history and everything that led to this commit.  Revisions can be
 127 abbreviated by providing only the first few characters.  If they uniquely
 128 match a commit, Git will figure it out.  You can also use the "commit^"
 129 syntax to express "the first parent of commit" (e.g.: "HEAD^") or "commit^N"
 130 for "the Nth parent of commit".  So if you want to go 3 revisions back, you
 131 can use "HEAD^^^" or "HEAD~3" which is more convenient.
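
The revision syntax can be tried out with a sketch like the following.  It
assumes a throwaway repository with four commits; the file name and the
identity are placeholders.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com

# Four commits, so that HEAD has at least three ancestors.
for i in 1 2 3 4; do
  echo "$i" > file
  git add file
  git commit -q -m "Commit $i"
done

full=$(git rev-parse HEAD)            # full sha1 of the current commit
short=$(git rev-parse --short HEAD)   # unique abbreviation of it
resolved=$(git rev-parse "$short")    # the abbreviation names the same commit

a=$(git rev-parse 'HEAD^^^')          # first parent, three times
b=$(git rev-parse 'HEAD~3')           # same thing, shorter
git log --format=%s -1 'HEAD~3'       # subject of the commit 3 revisions back
```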
 132
 133 Reviewing changes
 134   $ git log
 135     -Swhat
 136   $ git shortlog
 137   $ gitk
 138   $ git blame
 139     -Swhat
 140   $ git diff 'revision^!' == git diff revision^ revision
 141
 142 Undoing changes:
 143   $ git reset --hard (warning: discards local changes, like svn revert)
 144   $ git revert (create a commit that is the opposite of another, unlike svn
 145                 revert)
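
The difference between throwing away uncommitted work and reverting a commit
can be sketched as follows (scratch repository, placeholder file and
identity; "git reset --hard" is used here for the svn-revert-like case):

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com

echo good > file
git add file
git commit -q -m "Good state"
echo bad > file
git commit -q -a -m "Bad change"

# An uncommitted edit, thrown away (what "svn revert" would do).
echo scratch > file
git reset -q --hard
after_reset=$(cat file)

# Undo the committed "Bad change" by recording its opposite as a new commit.
git revert --no-edit HEAD > /dev/null
after_revert=$(cat file)
count=$(git rev-list --count HEAD)        # history grows, nothing is erased
```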
 146
 147 Setting up aliases:
 148   $ git config alias.st status
 149   $ git config alias.diffstat 'diff --stat'
 150   $ git config alias.diffw 'diff --ignore-all-space'
 151 You might want to set up many CVS/SVN-like aliases for ci, co, st, etc.
 152 Aliases can't override builtin commands, e.g.:
 153   $ git config alias.diff 'diff --patch-with-stat'
 154 If you then run "git diff", the alias will be silently ignored.
 155
 156 ** Understanding the index
 157 With SVN, HEAD is replicated in the ".svn" in each directory.  Operations
 158 such as diff can be done without network access.  The ".svn" serves as a
 159 cache.  The next commit is built by looking at the differences between the
 160 working copy and the version in the ".svn".  With CVS, network access is
 161 required but the result is identical.
 162
 163 Git has a single ".git" at the root and it also has a similar caching
 164 mechanism called the "index" in the binary file ".git/index".  When running
 165 "git diff" the working copy is compared to the tree stored in ".git/index".
 166 What is different with Git is that the index is used as a staging area to
 167 build the tree of the next commit.  By default, "git commit" doesn't do a
 168 full-tree commit like in other SCMs (can be done with -a however).  It only
 169 commits the content of the index.  Thus, you must first add your changes to
 170 the index before committing them.  Adding changes to the index can be done
 171 with "git add" (which has thus 2 roles: telling Git about new files,
 172 updating the content of the index).  This can look weird or impractical at
 173 first sight but ends up being very useful.  With git-gui or git-add
 174 --interactive, one can even select the hunks to schedule in the next commit.
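
A small sketch of the index acting as a staging area: a file is staged, then
edited again, and only the staged content ends up in the commit.  The
repository, file name and identity are placeholders.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com

echo one > file
git add file
git commit -q -m "Initial import."

echo two > file
git add file                # the index now holds "two"
echo three > file           # further edit, not added to the index

git diff --cached --stat    # HEAD vs index: what the next commit will contain
git diff --stat             # index vs working copy: what is not staged yet

git commit -q -m "Commit only what was staged"
committed=$(git show HEAD:file)
working=$(cat file)
```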
 175
 176 ** Understanding the basics of Git internals
 177 It helps to understand how Git works.  Let's dive into the .git directory.
 178 The "HEAD" file specifies the current HEAD ("ref: refs/heads/master", sort
 179 of symlink to the file "refs/heads/master" relative to the HEAD file).
 180
 181 The "objects" directory contains the real data of the repository.  By
 182 data we mean: file contents, trees (list of contents associated with names),
 183 commits (tree with attributes), and tags.  All these are identified by a
 184 (hopefully) unique sha1 sum.  Objects are immutable, and Git will never
 185 delete them unless explicitly asked to.
 186
 187 The "refs" directory contains references to these objects.  It has 2
 188 sub-directories: "heads" and "tags" to store the HEAD of the different
 189 branches and tags.  They basically contain files (possibly in
 190 sub-directories) which contain a 40-character hex sha1 (plus a \n).
 191
 192 Creating a commit is thus a matter of building a tree in the index and then
 193 adding the associated attributes (commit message, author, parent commits,
 194 etc.).
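
The object types described above can be inspected with "git cat-file".  This
is only a sketch on a throwaway repository; the file name "greeting" and the
identity are placeholders.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used below
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com

echo hello > greeting
git add greeting
git commit -q -m "Add greeting"

cat .git/HEAD                                # the symbolic reference
head_type=$(git cat-file -t HEAD)            # type of the object HEAD names
tree_type=$(git cat-file -t 'HEAD^{tree}')   # the tree attached to the commit
blob_type=$(git cat-file -t HEAD:greeting)   # the file content in that tree

git cat-file -p HEAD            # tree, author, committer, message
git cat-file -p 'HEAD^{tree}'   # mode, type, sha1 and name of each entry

# The name of a blob depends only on the content of the file.
blob=$(git hash-object greeting)
```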
 195
 196 ** Working with the index
 197 Adding/removing things from the index.  Diffing the index against the
 198 working copy or against HEAD.
 199
 200 ** Creating tags
 201 Lightweight (private) tags are only refs; annotated/signed tags are real objects.
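
The two kinds of tags can be told apart by asking Git for the type of object
each name refers to.  A sketch, with placeholder tag names and identity:

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com

echo hello > file
git add file
git commit -q -m "Initial import."

git tag light                        # only a ref pointing at the commit
git tag -a -m "Release 1.0" v1.0     # a tag object with its own message

light_type=$(git cat-file -t light)
annot_type=$(git cat-file -t v1.0)
echo "$light_type $annot_type"
```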
 202
 203 * Branches
 204
 205 ** The basics
 206 A branch is nothing more than a pointer to a point in history stored under
 207 .git/refs/heads/branch-name.  The convention is that "master" is the default
 208 branch and is thus comparable to SVN's "trunk".  There can be
 209 sub-directories under .git/refs/heads/ to express different namespaces of
 210 branches.
 211
 212 "git branch" shows the (local) branches and the current branch (which can
 213 also be known by looking at .git/HEAD).  Switching branches is as easy as
 214 doing "git checkout branch-name" and creating a branch is a matter of
 215 "git branch branch-name" or "git checkout -b new-branch branch-name" to
 216 create a new branch from branch-name's HEAD and switch to that new-branch.
 217
 218 <sample branch setup>
 219 <view in gitk --all>
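
One possible branch setup to experiment with, as a sketch.  The branch names
"topic" and "experiment" and the identity are placeholders.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used below
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com

echo hello > file
git add file
git commit -q -m "Initial import."

git branch topic                      # create a branch, stay on master
git checkout -q -b experiment topic   # create from topic's HEAD and switch

current=$(git symbolic-ref --short HEAD)
git branch                            # lists the branches, "*" marks current

# A branch is only a pointer: both names resolve to the same commit here.
m=$(git rev-parse master)
e=$(git rev-parse experiment)
```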
 220
 221 ** Merging branches
 222 Example of a merge with conflicts.  View the history.  The merge point is a
 223 commit with two parents.  Graphical viewers come in handy.  Of course, Git
 224 remembers the merge points.
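
A sketch of such a merge: both branches change the same line, the conflict
is resolved by hand, and the resulting commit has two parents.  Branch names,
file content and identity are placeholders.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used below
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com

echo base > file
git add file
git commit -q -m "Base"
git checkout -q -b topic
echo topic > file
git commit -q -a -m "Topic change"
git checkout -q master
echo master > file
git commit -q -a -m "Master change"

# Both sides changed the same line, so the merge stops with a conflict.
git merge topic > /dev/null 2>&1 || echo "conflict, as expected"

# Resolve by hand, add the result to the index and conclude the merge.
echo merged > file
git add file
git commit -q -m "Merge topic into master"

parents=$(git cat-file -p HEAD | grep -c '^parent ')
git log --graph --oneline
```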
 225
 226 ** Viewing branches
 227 GUIs, git-show-branch.
 228
 229 ** Remote branches
 230 It's possible to import branches from remote repositories.  They live under
 231 .git/refs/remotes and are not meant to be changed locally.  "git-fetch" is
 232 used to retrieve the state of remote branches into the local repository.  In
 233 order to change a remote branch, it must be forked first by creating a local
 234 branch: "git checkout -b my-branch remote-branch".  Once you have fetched a
 235 remote branch, you just git-merge with it.  Since it's a very common
 236 operation, git-pull does them both: fetch+merge in the current local
 237 branch.  By default when cloning a remote repository, a remote branch named
 238 "origin/HEAD" is created.  It's possible to add/remove remote branches with
 239 git-remote.  In particular, it's possible to fetch remote branches from
 240 multiple different repositories.
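
The fetch and merge steps can be sketched with two local repositories
standing in for the remote and the clone.  The directory names "upstream"
and "local" and the identity are placeholders.

```shell
# Two repositories side by side in a scratch directory (placeholder names).
work=$(mktemp -d) && cd "$work"
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # pin the branch name used below
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com
echo v1 > file
git add file
git commit -q -m "First"
cd ..

git clone -q upstream local
cd local
git branch -r                             # remote branches, e.g. origin/master

# Meanwhile, the remote repository moves on.
(cd ../upstream && echo v2 > file && git commit -q -a -m "Second")

git fetch -q origin                       # update origin/master only
behind=$(git rev-list --count master..origin/master)
git merge -q origin/master                # bring the local branch up to date
```

Running "git pull" in place of the last two commands would do the fetch and
the merge in one go.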
 241
 242 ** How do merges work?
 243 Git first finds a common ancestor between the two branches and then uses a
 244 standard 3-way merge algorithm.  It's possible to define custom merge
 245 drivers to automatically merge specific files (e.g. merge Open Document
 246 files or tarballs).  It is also possible to define custom diff drivers.
 247 Anyway, once the common ancestor is found, it is read and written in the
 248 index, along with the HEAD of the two branches to merge.  Each of these 3
 249 trees is stored in a different "stage" of the index.  Stages allow the index
 250 to contain multiple trees.  In particular, the stage 0 is used to build the
 251 next commit (that's where git-add puts its stuff).
 252 <look at the output of git-diff during a merge>
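
The stages can be observed while a conflicted merge is in progress.  A
sketch, with placeholder branch names, file content and identity:

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used below
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com

echo base > file
git add file
git commit -q -m "Base"
git checkout -q -b topic
echo topic > file
git commit -q -a -m "Topic change"
git checkout -q master
echo master > file
git commit -q -a -m "Master change"

git merge topic > /dev/null 2>&1 || true   # stops with a conflict

# Each unmerged path appears once per stage: mode, sha1, stage, name.
git ls-files --unmerged
stages=$(git ls-files --unmerged | awk '{printf "%s", $3}')

ancestor=$(git show :1:file)   # stage 1: common ancestor
ours=$(git show :2:file)       # stage 2: HEAD of the current branch
theirs=$(git show :3:file)     # stage 3: HEAD of the branch being merged
```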
 253
 254 * Workflows
 255 It's possible to use Git the SVN way but it's not what's most efficient.
 256 Here are some typical workflows that perform well in Git.
 257
 258 ** Simple private project (with 1 developer)
 259 Only one repository where changes are committed (trivial).
 260
 261 ** Simple published project (with 1 developer)
 262 Repository on a public server.  <explain "bare" repositories>.  The
 263 developer publishes changes with "git-push" <example>.  Warning: "git-push" is
 264 NOT the opposite of "git-pull" because pull = fetch + merge whereas "push"
 265 is actually the opposite of "fetch".
 266 FIXME: pitfalls of git-push (it pushes all refs, stashes, etc).
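
A sketch of publishing to a bare repository.  A local directory stands in
for the public server here; "public.git", "dev" and the identity are
placeholders.

```shell
# A bare "public" repository and a developer repository (placeholder names).
work=$(mktemp -d) && cd "$work"
git init -q --bare public.git     # repository data only, no working copy

git init -q dev
cd dev
git symbolic-ref HEAD refs/heads/master   # pin the branch name used below
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com
echo hello > file
git add file
git commit -q -m "Initial import."

git remote add origin ../public.git
git push -q origin master         # publish the branch named explicitly

pushed=$(git --git-dir=../public.git rev-parse master)
mine=$(git rev-parse master)
```

Naming the branch on the command line keeps the push limited to that branch.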
 267
 268 ** Small project (~10 developers)
 269 One public "reference" repository.  One maintainer (the "integrator") in
 270 charge of it.  Developers can either directly push to the public reference
 271 repository (CVS-like usage of Git, not recommended) or send their changes to
 272 the maintainer for review (git-format-patch / git-send-email for the
 273 developer, and git-am for the maintainer, or push to another per-developer
 274 public repository and ask the maintainer to pull from it).
 275
 276 The hierarchy is informal and can easily be changed or forked.  Nothing in
 277 Git requires it.
 278
 279 ** Large project (many developers)
 280 One public "reference" repository.  One maintainer (the "integrator") and
 281 several sub-maintainers.  The developers first send their changes to the
 282 sub-maintainer in charge of the module they're changing.  The sub-maintainer
 283 approves the change by signing it and pushing it in his own public
 284 repository.  Every once in a while, the integrator will pull from the
 285 sub-maintainers because he trusts them.
 286
 287 The hierarchy is based on trust between people and is easy to adjust as
 288 people join/leave the project.
 289
 290 * Working with Git on a daily basis: How to get the most out of it?
 291 Here, we address typical questions or problems that arise during daily
 292 development and explain how Git helps to solve them faster.  The topics
 293 discussed are sorted roughly from the most useful to the least
 294 useful.
 295
 296 ** Housekeeping
 297 As we'll see, you can easily rewrite the history with Git.  We already
 298 said that the objects are immutable and will never be deleted unless you
 299 explicitly ask for it.  Moreover, the fact that each object (file content,
 300 tree, commit) has its own file under .git/objects will quickly lead to
 301 degraded performance.  Git can pack the objects together so that they will
 302 consume much less space on the disk and accessing them will be faster.
 303 Therefore, you should repack your repository every once in a while.  The
 304 command "git-gc" does all the required housekeeping for you by packing
 305 together all the objects into one single big pack.  If you want to remove
 306 unreferenced loose objects, you can pass it the "--prune" option.  Beware
 307 though that using this option is not safe if someone else is working on the
 308 repository.  If you want to repack harder, you can use "git gc --aggressive"
 309 which will take more time but produce a much thinner pack (I've seen 200M
 310 .git repositories shrunk down to 12M!).
 311 Alternatively, you can run "git repack" to produce a smaller incremental
 312 pack.
 313 If you never repack, or if you end up having lots of small incremental
 314 packs, performance will degrade.  Some people did not repack for several
 315 months and observed slowdowns where git would take several seconds to
 316 perform basic operations (whereas it would do them instantaneously with
 317 nicely packed data).
 318 As of today, Git's HEAD contains a mechanism called "git-gc --auto" which is
 319 automatically invoked by some commands to do minimal housekeeping behind
 320 your back and warn you if your repository gets insanely unpacked, but this
 321 is only available in Git's HEAD (so you will have to wait until the next
 322 release).
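
The effect of packing can be observed with "git count-objects".  A sketch on
a tiny throwaway repository (five commits, each with one tree and one blob);
the identity is a placeholder.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com

for i in 1 2 3 4 5; do
  echo "$i" > file
  git add file
  git commit -q -m "Commit $i"
done

git count-objects -v    # "count" = loose objects, "in-pack" = packed ones
git gc -q               # pack everything reachable into a single pack
git count-objects -v

loose=$(git count-objects -v | awk '/^count:/ {print $2}')
packed=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
```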
 323
 324 ** Setting up a public repository
 325 FIXME
 326
 327 ** Working from multiple places
 328 Often people need to leave and pick up their work where it stopped, later
 329 and from another place.  It's the "Going back home" commit syndrome.  With
 330 Git it can be addressed by pushing a private branch to one's public
 331 repository or by using git-bundle.
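
A sketch of the git-bundle route: the branch is written to a single file
that can be carried around and cloned from elsewhere.  The names "office",
"home" and "work.bundle" and the identity are placeholders.

```shell
# "office" is where the work started, "home" is where it continues.
work=$(mktemp -d) && cd "$work"
git init -q office
cd office
git symbolic-ref HEAD refs/heads/master   # pin the branch name used below
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com
echo draft > notes
git add notes
git commit -q -m "Work in progress"

# Put the whole branch into one file (e.g. to carry on a USB stick).
git bundle create ../work.bundle master 2> /dev/null
cd ..

# At the other place, the bundle can be cloned like any repository.
git clone -q -b master work.bundle home 2> /dev/null
cd home
carried=$(cat notes)
```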
 332
 333 ** Topic branches
 334 Because it's so common to work on multiple independent changes at the same
 335 time and because Git fosters branches, it's very common to have many local
 336 (typically private) branches to work on different ideas.  They are called
 337 topic branches and usually work as follows:
 338   - Fork the current development line (usually "master"):
 339     git checkout -b fix-something master
 340     (the branch is named after the topic it will be about)
 341   - Hack, commit, hack, commit, hack.
 342   - Meanwhile, the "master" branch has changed.  2 scenarios:
 343       1. test the current code against the changes in master, just to see if
 344          it still works.
 345       2. "rebase" the current branch, that is, forward-port local commits to
 346           the updated upstream head.
 347   - once the topic is finished, simply fast-forward merge it back in
 348     "master" (after making sure "master" is up-to-date and having rebased
 349     the topic branch) and delete the topic branch.
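
The steps above can be sketched end to end on a throwaway repository.  The
file names and identity are placeholders; "fix-something" is the topic
branch from the list.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used below
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com
echo base > main.txt
git add main.txt
git commit -q -m "Base"

# Fork the development line and work on the topic.
git checkout -q -b fix-something master
echo fix > fix.txt
git add fix.txt
git commit -q -m "Fix something"

# Meanwhile, "master" changes.
git checkout -q master
echo more > other.txt
git add other.txt
git commit -q -m "Unrelated work on master"

# Forward-port the topic onto the updated master.
git checkout -q fix-something
git rebase -q master

# Fast-forward master to the finished topic and delete the topic branch.
git checkout -q master
git merge -q --ff-only fix-something
git branch -q -d fix-something
git log --oneline
```

With "--ff-only" the merge refuses to create a merge commit, so the history
of "master" stays linear.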
 350
 351 ** Detached head
 352 If you ever check out a given revision that is not the HEAD of a branch (such
 353 as a tag or by directly using the sha1 of a commit) you will be "detached"
 354 because it's not on any branch.  Be careful because any change you make
 355 there will not be referenced by any branch (it won't get lost, you can still
 356 give it a name with "git branch").
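
A sketch of committing on a detached HEAD and then giving the result a name.
The branch name "rescued" and the identity are placeholders.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com
echo one > file
git add file
git commit -q -m "First"
echo two > file
git commit -q -a -m "Second"

git checkout -q 'HEAD^'                   # a revision that is no branch head
git symbolic-ref -q HEAD || echo "HEAD is detached"

echo experiment > file
git commit -q -a -m "Experiment"          # referenced by no branch

git branch rescued                        # now it has a name
rescued=$(git rev-parse rescued)
```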
 357
 358 ** Doing an urgent fix
 359 You're in the middle of something and your boss comes to demand that you fix
 360 something *immediately*.  You could put your stuff in a temporary branch to
 361 put your changes away and then switch branch to do the emergency fix.  This
 362 involves many operations and is tedious.  git-stash will help by saving all your
 363 changes (whether they were saved in the index or still in "dirty" state in
 364 the working copy) in a stash and restore the working copy to HEAD
 365 (git-reset --hard).
 366
 367 <example>
 368
 369 git-stash can also be used to pull in a dirty tree
 370 <example>
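
One way the urgent-fix scenario could look, as a sketch.  The file names and
identity are placeholders.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com
echo stable > file
git add file
git commit -q -m "Stable state"

echo half-done > file           # in the middle of something
git stash > /dev/null           # put it away, working copy is back to HEAD
during=$(cat file)

echo fix > hotfix               # the urgent fix
git add hotfix
git commit -q -m "Urgent fix"

git stash pop > /dev/null       # pick up where you left off
after=$(cat file)
```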
 371
 372 ** Since when is it broken?
 373 FIXME: git-bisect / git-blame
 374
 375 ** Submodules
 376 FIXME
 377
 378 ** Git as a better SVN client
 379 If you have SVN installed with its Perl bindings, you will be able to use
 380 git-svn.  On Debian at least, git-svn comes in its own package, separate
 381 from git-core.  git-svn enables you to work with SVN repositories
 382 transparently.  It will basically clone an SVN repository (by checking out
 383 all its revisions or only a given subset) and let you fetch new revisions
 384 from and push your own commits to the SVN repository.  One of the goals of
 385 git-svn is to be truly transparent, no-one should be able to tell whether
 386 you're using SVN or Git.
 387   <examples>
 388
 389 ** Rewriting the history
 390 From what we've said on rebase earlier, it's clear that what happens in this
 391 case is that the history must be rewritten.  What rebase does is that it
 392 removes all the commits to be rebased and saves them somewhere.  Then it
 393 fast-forwards the current HEAD to the new HEAD.  Finally, it re-applies all
 394 the commits that were removed.  If there is a conflict, it will stop and let
 395 you fix the problem and do the commit.  Then you will have to invoke
 396 "git rebase --continue" to proceed with the remaining commits.  This will
 397 not delete commits from the repository, they will still be there but no
 398 longer reachable by this line of development.  If no other branch references
 399 them, they will be "dangling" and will have to be pruned (see later).
 400
 401 The history must never be rewritten once it has been made public, because
 402 people who took a copy of it (clone) will run into trouble upon their next
 403 fetch.  If the history has been published, use git-revert instead.
 404
 405 The last commit (HEAD) can be easily changed by using "git commit --amend".
 406 What this does is replace the previous commit with a new one that also
 407 includes the current index. <example>.  Once again, the previous HEAD will
 408 not be deleted from the repository.
 409
 410 Now rebase can be used to fix a mistake made in earlier (unpublished)
 411 commits (suppose the current branch is "master"):
 412   $ git tag bad HEAD~5
 413   $ git checkout bad
 414   $ <hack>
 415   $ git commit -a --amend
 416   $ git rebase --onto HEAD bad master
 417   $ git tag -d bad
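
The same recipe, as a sketch that can be run on a throwaway repository.  It
uses a shorter history (the mistake is in HEAD~2 rather than HEAD~5), one
file per commit so that nothing conflicts, and a placeholder identity.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used below
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com

for i in 1 2 3 4; do
  echo "$i" > "f$i"
  git add "f$i"
  git commit -q -m "Commit $i"
done

git tag bad 'HEAD~2'                      # the commit to fix ("Commit 2")
git checkout -q bad                       # detached HEAD at that commit
echo fixed > f2                           # the <hack> step
git commit -q -a --amend -m "Commit 2 (fixed)"
git rebase -q --onto HEAD bad master      # replay bad..master on the fix
git tag -d bad > /dev/null

subjects=$(git log --format=%s | tr '\n' ',')
fixed=$(git show 'HEAD~2:f2')
```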
 418
 419 ** Recovering from mistakes
 420 // FIXME: reflogs
 421 // Explain that stashes are implemented with the reflogs?
 422
 423 ** Splitting a repository
 424 Frequently, a project starts out as a main application, and after some time,
 425 some parts of it emerge as being rather independent and often useful to
 426 other projects.  Thus, it's often a good idea to extract these independent
 427 parts and make them live in their own repositories so they can be re-used as
 428 git-submodules.  The best thing to do, in order to preserve the full history
 429 of this part of the project, is to extract the part of the history that has
 430 to do with the given module and put it in its own repository.  This can be
 431 done with git-filter-branch, which has been specifically designed for huge
 432 history rewrites.
 433
 434 FIXME: What needs to be done exactly?
 435
 436 ** Rerere (Reuse recorded resolution)
 437 If you have a topic branch which frequently needs to be checked against the
 438 latest changes in, say, master but which you don't want to merge with master
 439 until it's finished (and you don't want to rebase it) you will frequently do
 440 this:
 441   $ git merge master
 442   $ <fix conflicts>
 443   $ git commit
 444   $ <test>
 445   $ git reset --hard HEAD^ # get rid of the merge
 446 The problem is that you will have the same conflicts to solve over and over
 447 again whenever you want to test your work against the changes in master.
 448 rerere will record the way you solve conflicts so that whenever you get into
 449 the same conflict again, it will re-use the recorded resolution.  All you
 450 need is to run "git config rerere.enabled true" to enable the recording and
 451 re-using of conflict resolutions.
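
A sketch of rerere at work: the same conflict is hit twice and the second
time the recorded resolution is applied to the file.  Branch names, file
content and identity are placeholders.

```shell
# Scratch repository in a temporary directory (placeholder location).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used below
git config user.name "Example Author"     # placeholder identity
git config user.email author@example.com
git config rerere.enabled true

echo base > file
git add file
git commit -q -m "Base"
git checkout -q -b topic
echo topic > file
git commit -q -a -m "Topic change"
git checkout -q master
echo master > file
git commit -q -a -m "Master change"
git checkout -q topic

git merge master > /dev/null 2>&1 || true   # conflict, rerere takes note
echo resolved > file                        # fix the conflict by hand
git add file
git commit -q -m "Test merge"               # the resolution is recorded
git reset -q --hard 'HEAD^'                 # get rid of the test merge

git merge master > /dev/null 2>&1 || true   # same conflict again
reused=$(cat file)                          # already resolved by rerere
```

The second merge still stops, so the result can be reviewed before it is
added and committed.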
 452
 453
 454
 455  LocalWords:  SCM CVS LocalWords workflow DSCM SCMs testsuite workflows GUIs mv
 456  LocalWords:  SVN gitweb rerere submodules submodule scriptable gitconfig init
 457  LocalWords:  gui shortlog gitk svn symlink rebase rebased
 458  Local Variables:
 459  mode: outline
 460  End: