Checkout Internals

Checkout has to handle a lot of different cases. It examines the differences between the target tree, the baseline tree and the working directory, plus the contents of the index, and groups files into five categories:

  1. UNMODIFIED - Files that match in all places.
  2. SAFE - Files where the working directory matches the baseline content, so they can be safely updated to the target.
  3. DIRTY/MISSING - Files where the working directory differs from the baseline but there is no conflicting change with the target. One example is a file that doesn't exist in the working directory: no data would be lost by writing it. Which action is taken with these files depends on the options you use.
  4. CONFLICTS - Files where changes in the working directory conflict with changes to be applied by the target. If conflicts are found, they prevent any other modifications from being made (although there are options to override that and force the update, of course).
  5. UNTRACKED/IGNORED - Files in the working directory that are untracked or ignored (i.e. only in the working directory, not the other places).
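As a rough illustration, the five categories can be expressed as a decision over a single path. This sketch assumes each location is reduced to a hypothetical content ID string (or None when the path is absent there). It only covers plain blobs and ignores trees, submodules, typechanges and the index, so it is a simplification and not libgit2's actual logic.

```python
from enum import Enum
from typing import Optional


class Category(Enum):
    UNMODIFIED = 1
    SAFE = 2
    DIRTY_MISSING = 3
    CONFLICT = 4
    UNTRACKED_IGNORED = 5


def classify(baseline: Optional[str], target: Optional[str],
             workdir: Optional[str]) -> Category:
    """Classify one path; each argument is a content ID, or None if absent.

    Simplified sketch: blobs only, no trees, submodules or index.
    """
    # Only present in the working directory.
    if baseline is None and target is None and workdir is not None:
        return Category.UNTRACKED_IGNORED
    # Same content everywhere (including "absent everywhere").
    if baseline == target == workdir:
        return Category.UNMODIFIED
    # Workdir still matches the baseline, so the target can be written.
    if workdir == baseline:
        return Category.SAFE
    # Workdir file is missing, or the target does not change this path.
    if workdir is None or target == baseline:
        return Category.DIRTY_MISSING
    # Workdir and target both changed the baseline content.
    return Category.CONFLICT
```

For example, a path that is B1 in the baseline, B2 in the target and B1 in the working directory comes out as SAFE, while B1/B2/B3 comes out as CONFLICT.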

Right now, this classification is done via three iterators (baseline, target, and working directory), with a final lookup in the index. At some point, this may move to a four-iterator version that incorporates the index better.
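Walking three sources in lockstep is essentially a sorted merge-join on path. The sketch below assumes each source is a pre-sorted list of hypothetical (path, content ID) pairs; real iterators also descend into directories and deal with case folding, and the final index lookup is left out.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical flat representation: (path, content ID), sorted by path.
Entry = Tuple[str, str]
Row = Tuple[str, Optional[str], Optional[str], Optional[str]]


def walk3(base: List[Entry], target: List[Entry],
          workdir: List[Entry]) -> Iterator[Row]:
    """Yield (path, base_id, target_id, workdir_id) in path order."""
    iters = [iter(base), iter(target), iter(workdir)]
    heads: List[Optional[Entry]] = [next(it, None) for it in iters]
    while any(h is not None for h in heads):
        # The smallest current path across all three sources.
        path = min(h[0] for h in heads if h is not None)
        ids: List[Optional[str]] = []
        for i, head in enumerate(heads):
            if head is not None and head[0] == path:
                ids.append(head[1])
                heads[i] = next(iters[i], None)  # advance only this source
            else:
                ids.append(None)  # path absent from this source
        yield (path, ids[0], ids[1], ids[2])
```

Each yielded row corresponds to one base/target/actual triple of the kind tabulated later in this document.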

The actual checkout is done in five phases (at least right now).

  1. The diff between the baseline and the target tree is used as a base list of possible updates to be applied.
  2. Iterate through the diff and the working directory, building a list of actions to be taken (and sending notifications about conflicts and dirty files).
  3. Remove any files / directories as needed (because alphabetical iteration means that an untracked directory will end up sorted after a blob that should be checked out with the same name).
  4. Update all blobs.
  5. Update all submodules (after phase 4, in case a new .gitmodules blob was checked out).
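The ordering constraint between phases 3, 4 and 5 can be sketched as a stable sort over a list of actions produced by phase 2. The action kinds and the Action type are invented for illustration; because Python's sorted() is stable, the iteration order is preserved within each phase.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical action kinds, mapped to the phase that performs them.
PHASE = {"remove": 3, "update_blob": 4, "update_submodule": 5}


@dataclass
class Action:
    kind: str
    path: str


def schedule(actions: List[Action]) -> List[Action]:
    """Order actions by phase; sorted() is stable, so the iteration
    order is kept within each phase."""
    return sorted(actions, key=lambda a: PHASE[a.kind])
```

With alphabetical input, the removal of an untracked directory foo/ is moved ahead of the write of a blob named foo, and submodule updates run only after every blob (including .gitmodules) has been written.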

Checkout could be driven either off a target-to-workdir diff or a baseline-to-target diff. There are pros and cons to each.

Target-to-workdir means the diff includes every file that could be modified, which simplifies bookkeeping, but the code to constantly refer back to the baseline gets complicated.

Baseline-to-target has simpler code because the diff defines the action to take, but needs special handling for untracked and ignored files, if they need to be removed.

The current checkout implementation is based on a baseline-to-target diff.
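To illustrate why a baseline-to-target diff needs extra handling, here is a minimal sketch that treats each tree as a flat mapping from path to content ID (an assumption; real trees are nested and carry modes). Paths that exist only in the working directory never appear in the diff, so they have to be found separately.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, str]  # hypothetical flat tree: path -> content ID


def baseline_to_target(baseline: Tree, target: Tree) -> List[Tuple[str, str]]:
    """Return (path, status) for every path whose content differs."""
    changes = []
    for path in sorted(set(baseline) | set(target)):
        old, new = baseline.get(path), target.get(path)
        if old == new:
            continue  # unchanged paths are not part of the diff
        if old is None:
            changes.append((path, "added"))
        elif new is None:
            changes.append((path, "removed"))
        else:
            changes.append((path, "modified"))
    return changes


def outside_diff(baseline: Tree, target: Tree, workdir: Tree) -> List[str]:
    """Workdir-only (untracked or ignored) paths, which the diff never lists."""
    return sorted(set(workdir) - set(baseline) - set(target))
```

Each diff entry maps directly to a candidate action, which is what keeps this approach simple; the price is the second pass over workdir-only paths.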

Picking Actions

The most interesting aspect of this is phase 2, picking the actions that should be taken. There are a lot of corner cases, so it may be easier to start by looking at the rules for a simple 2-iterator diff:

Key

  • B1, B2, B3 - blobs with different SHAs
  • Bi - ignored blob (WD only)
  • T1, T2, T3 - trees with different SHAs
  • Ti - ignored tree (WD only)
  • S1, S2 - submodules with different SHAs
  • Sd - dirty submodule (WD only)
  • x - nothing

Diff with 2 non-workdir iterators

  #    Old   New
  0    x     x     nothing
  1    x     B1    added blob
  2    x     T1    added tree
  3    B1    x     removed blob
  4    B1    B1    unmodified blob
  5    B1    B2    modified blob
  6    B1    T1    typechange blob -> tree
  7    T1    x     removed tree
  8    T1    B1    typechange tree -> blob
  9    T1    T1    unmodified tree
  10   T1    T2    modified tree (implies modified/added/removed blob inside)
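The rules in this table are mechanical enough to encode directly. The sketch below uses the labels from the key as strings (the leading letter gives the kind, and None stands for x) and returns the description from the last column, without the parenthetical note on case 10.

```python
from typing import Optional

KINDS = {"B": "blob", "T": "tree"}


def kind(label: str) -> str:
    # Labels follow the key: "B1" is a blob, "T2" is a tree.
    return KINDS[label[0]]


def diff_status(old: Optional[str], new: Optional[str]) -> str:
    """Describe one row of the two-iterator table (None stands for x)."""
    if old is None and new is None:
        return "nothing"
    if old is None:
        return "added " + kind(new)
    if new is None:
        return "removed " + kind(old)
    if kind(old) != kind(new):
        return f"typechange {kind(old)} -> {kind(new)}"
    return ("unmodified " if old == new else "modified ") + kind(old)
```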

Now, let's make the “New” iterator into a working directory iterator, so we replace “added” items with either untracked or ignored, like this:

Diff with non-work & workdir iterators

  #    Old   New
  0    x     x     nothing
  1    x     B1    untracked blob
  2    x     Bi    ignored file
  3    x     T1    untracked tree
  4    x     Ti    ignored tree
  5    B1    x     removed blob
  6    B1    B1    unmodified blob
  7    B1    B2    modified blob
  8    B1    T1    typechange blob -> tree
  9    B1    Ti    removed blob AND ignored tree as separate items
  10   T1    x     removed tree
  11   T1    B1    typechange tree -> blob
  12   T1    Bi    removed tree AND ignored blob as separate items
  13   T1    T1    unmodified tree
  14   T1    T2    modified tree (implies modified/added/removed blob inside)

Note: if there is a corresponding entry in the old tree, then a working directory item won't be ignored (i.e. no Bi or Ti for tracked items).
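The note above means the ignore check only matters for paths that have no entry in the old tree. A minimal sketch, assuming ignore rules are approximated with Python's fnmatch patterns (real .gitignore matching also has negation, anchoring and directory-only rules, which are not modeled here):

```python
from fnmatch import fnmatch
from typing import List


def workdir_status(path: str, in_old_tree: bool, ignore_patterns: List[str]) -> str:
    """Classify a working-directory path as tracked, ignored or untracked.

    Ignore rules are approximated with fnmatch patterns (an assumption).
    """
    if in_old_tree:
        # Tracked items are never reported as ignored: no Bi or Ti.
        return "tracked"
    if any(fnmatch(path, pattern) for pattern in ignore_patterns):
        return "ignored"
    return "untracked"
```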

Now, expand this to three iterators: a baseline tree, a target tree, and an actual working directory tree:

Checkout From 3 Iterators (2 not workdir, 1 workdir)

(base == old HEAD; target == what to checkout; actual == working dir)

  #    base  target  actual/workdir
  0    x     x       x               nothing
  1    x     x       B1/Bi/T1/Ti     untracked/ignored blob/tree (SAFE)
  2+   x     B1      x               add blob (SAFE)
  3    x     B1      B1              independently added blob (FORCEABLE-2)
  4*   x     B1      B2/Bi/T1/Ti     add blob with content conflict (FORCEABLE-2)
  5+   x     T1      x               add tree (SAFE)
  6*   x     T1      B1/Bi           add tree with blob conflict (FORCEABLE-2)
  7    x     T1      T1/Ti           independently added tree (SAFE+MISSING)
  8    B1    x       x               independently deleted blob (SAFE+MISSING)
  9-   B1    x       B1              delete blob (SAFE)
  10-  B1    x       B2              delete of modified blob (FORCEABLE-1)
  11   B1    x       T1/Ti           independently deleted blob AND untrack/ign tree (SAFE+MISSING !!!)
  12   B1    B1      x               locally deleted blob (DIRTY)
  13+  B1    B2      x               update to deleted blob (SAFE+MISSING)
  14   B1    B1      B1              unmodified file (SAFE)
  15   B1    B1      B2              locally modified file (DIRTY)
  16+  B1    B2      B1              update unmodified blob (SAFE)
  17   B1    B2      B2              independently updated blob (FORCEABLE-1)
  18+  B1    B2      B3              update to modified blob (FORCEABLE-1)
  19   B1    B1      T1/Ti           locally deleted blob AND untrack/ign tree (DIRTY)
  20*  B1    B2      T1/Ti           update to deleted blob AND untrack/ign tree (FORCEABLE-1)
  21+  B1    T1      x               add tree with locally deleted blob (SAFE+MISSING)
  22*  B1    T1      B1              add tree AND deleted blob (SAFE)
  23*  B1    T1      B2              add tree with delete of modified blob (FORCEABLE-1)
  24   B1    T1      T1              add tree with deleted blob (FORCEABLE-1)
  25   T1    x       x               independently deleted tree (SAFE+MISSING)
  26   T1    x       B1/Bi           independently deleted tree AND untrack/ign blob (FORCEABLE-1)
  27-  T1    x       T1              deleted tree (MAYBE SAFE)
  28+  T1    B1      x               deleted tree AND added blob (SAFE+MISSING)
  29   T1    B1      B1              independently typechanged tree -> blob (FORCEABLE-1)
  30+  T1    B1      B2              typechange tree->blob with conflicting blob (FORCEABLE-1)
  31*  T1    B1      T1/T2           typechange tree->blob (MAYBE SAFE)
  32+  T1    T1      x               restore locally deleted tree (SAFE+MISSING)
  33   T1    T1      B1/Bi           locally typechange tree->untrack/ign blob (DIRTY)
  34   T1    T1      T1/T2           unmodified tree (MAYBE SAFE)
  35+  T1    T2      x               update locally deleted tree (SAFE+MISSING)
  36*  T1    T2      B1/Bi           update to tree with typechanged tree->blob conflict (FORCEABLE-1)
  37   T1    T2      T1/T2/T3        update to existing tree (MAYBE SAFE)
  38+  x     S1      x               add submodule (SAFE)
  39   x     S1      S1/Sd           independently added submodule (SUBMODULE)
  40*  x     S1      B1              add submodule with blob conflict (FORCEABLE)
  41*  x     S1      T1              add submodule with tree conflict (FORCEABLE)
  42   S1    x       S1/Sd           deleted submodule (SUBMODULE)
  43   S1    x       x               independently deleted submodule (SUBMODULE)
  44   S1    x       B1              independently deleted submodule with added blob (SAFE+MISSING)
  45   S1    x       T1              independently deleted submodule with added tree (SAFE+MISSING)
  46   S1    S1      x               locally deleted submodule (SUBMODULE)
  47+  S1    S2      x               update locally deleted submodule (SAFE)
  48   S1    S1      S2              locally updated submodule commit (SUBMODULE)
  49   S1    S2      S1              updated submodule commit (SUBMODULE)
  50+  S1    B1      x               add blob with locally deleted submodule (SAFE+MISSING)
  51*  S1    B1      S1              typechange submodule->blob (SAFE)
  52*  S1    B1      Sd              typechange dirty submodule->blob (SAFE!?!?)
  53+  S1    T1      x               add tree with locally deleted submodule (SAFE+MISSING)
  54*  S1    T1      S1/Sd           typechange submodule->tree (MAYBE SAFE)
  55+  B1    S1      x               add submodule with locally deleted blob (SAFE+MISSING)
  56*  B1    S1      B1              typechange blob->submodule (SAFE)
  57+  T1    S1      x               add submodule with locally deleted tree (SAFE+MISSING)
  58*  T1    S1      T1              typechange tree->submodule (SAFE)

Each case number is followed by a space if no change is needed, ‘+’ if the case needs to write to disk, ‘-’ if something must be deleted, or ‘*’ if there should be a delete followed by a write.
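Reading the table can be sketched as a lookup keyed by the (base, target, actual) triple. Only a handful of rows are transcribed here, and the structure (a dict from triple to marker and tier) is purely illustrative; the marker is then translated into the disk operations described above.

```python
from typing import Dict, List, Optional, Tuple

# (base, target, actual); None stands for "x" in the table.
Key = Tuple[Optional[str], Optional[str], Optional[str]]

# A few rows transcribed from the table as (marker, tier).
CASES: Dict[Key, Tuple[str, str]] = {
    (None, "B1", None): ("+", "SAFE"),          # case 2
    ("B1", None, "B1"): ("-", "SAFE"),          # case 9
    ("B1", "B1", "B2"): (" ", "DIRTY"),         # case 15
    ("B1", "B2", "B1"): ("+", "SAFE"),          # case 16
    ("B1", "B2", "B3"): ("+", "FORCEABLE-1"),   # case 18
    ("B1", "T1", "B1"): ("*", "SAFE"),          # case 22
}

# What each marker asks for on disk.
MARKER_OPS: Dict[str, List[str]] = {
    " ": [],
    "+": ["write"],
    "-": ["delete"],
    "*": ["delete", "write"],
}


def operations(base: Optional[str], target: Optional[str],
               actual: Optional[str]) -> Tuple[str, List[str]]:
    """Return (tier, disk operations) for one transcribed case."""
    marker, tier = CASES[(base, target, actual)]
    return tier, MARKER_OPS[marker]
```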

Each case is labeled with one of six tiers:

  • SAFE == completely safe to update
  • SAFE+MISSING == safe, except that the workdir is missing the expected content
  • MAYBE SAFE == safe if the workdir tree matches (or is missing) the baseline content, which is unknown at this point
  • FORCEABLE == conflict unless FORCE is given (the table also uses the variants FORCEABLE-1 and FORCEABLE-2)
  • DIRTY == no conflict, but the change is not applied unless FORCE is given
  • SUBMODULE == no conflict, and no change is applied unless a deleted submodule dir is empty
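One way to read the tiers is as a decision function. This is a hypothetical sketch: the function and parameter names are invented, and treating a MAYBE SAFE mismatch as a conflict unless forced is an assumption, since the text only says the outcome is unknown until the workdir content is compared. checkout_allowed reflects the rule that any conflict prevents all other modifications.

```python
from typing import Iterable, Optional


def decide(tier: str, force: bool,
           workdir_matches_baseline: Optional[bool] = None,
           submodule_dir_empty: bool = False) -> str:
    """Return "apply", "skip" or "conflict" for a case with the given tier."""
    if tier in ("SAFE", "SAFE+MISSING"):
        return "apply"
    if tier == "MAYBE SAFE":
        if workdir_matches_baseline is None:
            raise ValueError("MAYBE SAFE needs the workdir/baseline comparison")
        # Assumption: a mismatch is treated like FORCEABLE.
        return "apply" if workdir_matches_baseline or force else "conflict"
    if tier.startswith("FORCEABLE"):  # FORCEABLE, FORCEABLE-1, FORCEABLE-2
        return "apply" if force else "conflict"
    if tier == "DIRTY":
        return "apply" if force else "skip"
    if tier == "SUBMODULE":
        return "apply" if submodule_dir_empty else "skip"
    raise ValueError(f"unknown tier: {tier}")


def checkout_allowed(decisions: Iterable[str]) -> bool:
    # Any conflict blocks every other modification.
    return "conflict" not in decisions
```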

Some slightly unusual circumstances:

  • 8 - The parent directory is only deleted along with the file, so an empty parent directory is left behind even though it would have been deleted if the file were present.
  • 11 - Core git does not consider this a conflict, but it attempts to delete T1, reports an “unable to unlink file” error, and still does not skip the rest of the operation.
  • 12 - Without FORCE, the file is left deleted (i.e. not restored), so the new workdir is dirty (and the warning message “D file” is printed); with FORCE, the file is restored.
  • 24 - This should be considered MAYBE SAFE since it is effectively 7 and 8 combined, but core git considers this a conflict unless forced.
  • 26 - This combines two cases (1 and 25, plus an implied 8 for the tree content) that are fine on their own, but core git treats the combination as a conflict unless forced. If forced, nothing actually has to be written, and the new blob is left as an untracked file.
  • 32 - This is the only case where the baseline and target values match and yet we will still write to the working directory. In all other cases, if baseline == target, we don't touch the workdir (it is either already right or is “dirty”). However, since this case also implies that a ?/B1/x case will exist as well, it can be skipped.
  • 41 - It's not clear how core git distinguishes this case from 39 (mode?).
  • 52 - Core git makes destructive changes without any warning when the submodule is dirty and the type changes to a blob.

Cases 3, 17, 24, 26, and 29 are all considered conflicts even though none of them will require making any updates to the working directory.