Thanks for sharing!
I’ve also written something on how to break up the last stage into small incremental steps - not as detailed as yours, but at least it provides the structure I wish I had when starting. Initially posted as a comment on the last stage; re-posting here for visibility.
In case it might help others, here’s the way I split things (numbers in parentheses are the number of lines changed in my Python solution, to give an indication of size):
- checkout-empty <commit> (42)
- unpack-objects (undeltified) (56)
- unpack-objects (REF_DELTA) (57)
- ls-remote <url> HEAD (25)
- clone <url> <dir> (45)
Stating the obvious: you want some local testing for each step.
checkout-empty: like git checkout except it assumes the current directory is empty (except for .git of course), so it just writes files and directories. This is a very natural continuation from the previous two steps: there we created commit and tree objects, here we’re reading them.
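A minimal sketch of what this step can look like, assuming loose objects only and plain files and directories (symlinks and the executable bit are ignored). The function names are made up for illustration; this is not the solution the line counts above refer to.

```python
import os
import zlib


def read_object(git_dir, sha):
    """Return (type, body) for a loose object stored under git_dir/objects."""
    path = os.path.join(git_dir, "objects", sha[:2], sha[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, body = raw.partition(b"\x00")  # header is b"<type> <size>"
    obj_type, _size = header.split(b" ")
    return obj_type.decode(), body


def parse_tree(body):
    """Yield (mode, name, sha_hex) for each entry of a tree object body."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        sha = body[nul + 1:nul + 21].hex()  # 20 raw bytes, not hex, in the tree
        pos = nul + 21
        yield mode, name, sha


def checkout_tree(git_dir, tree_sha, dest):
    """Write a tree into dest, assuming nothing there conflicts with it."""
    _, body = read_object(git_dir, tree_sha)
    for mode, name, sha in parse_tree(body):
        target = os.path.join(dest, name)
        if mode == "40000":  # sub-tree: create the directory and recurse
            os.mkdir(target)
            checkout_tree(git_dir, sha, target)
        else:  # treated as a regular blob in this sketch
            _, blob = read_object(git_dir, sha)
            with open(target, "wb") as f:
                f.write(blob)


def checkout_empty(git_dir, commit_sha, dest="."):
    """Find the commit's tree (first line is b"tree <sha>") and write it out."""
    _, body = read_object(git_dir, commit_sha)
    first_line = body.split(b"\n", 1)[0]
    tree_sha = first_line.split(b" ")[1].decode()
    checkout_tree(git_dir, tree_sha, dest)
```

Because the directory is assumed empty, there is no index handling and no conflict detection, which is what keeps the step small.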
unpack-objects (undeltified): like git unpack-objects but only for packfiles with no deltified objects. You can create such packfiles with git pack-objects --depth=0 for testing.
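A rough sketch of parsing such a packfile, assuming version 2 packs held fully in memory. It only returns the objects; writing them to .git/objects is left out. The names unpack_undeltified and read_type_and_size are hypothetical.

```python
import hashlib
import struct
import zlib

OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag"}


def read_type_and_size(data, pos):
    """Decode the per-object header; return (type_id, size, new_pos)."""
    byte = data[pos]
    pos += 1
    type_id = (byte >> 4) & 0x07  # bits 4-6 of the first byte
    size = byte & 0x0F            # low 4 bits of the size
    shift = 4
    while byte & 0x80:            # MSB set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return type_id, size, pos


def unpack_undeltified(data):
    """Yield (sha_hex, type_name, body) for each object of a delta-free pack."""
    magic, version, count = struct.unpack(">4sII", data[:12])
    if magic != b"PACK" or version != 2:
        raise ValueError("not a version 2 packfile")
    pos = 12
    for _ in range(count):
        type_id, size, pos = read_type_and_size(data, pos)
        if type_id not in OBJ_TYPES:
            raise ValueError(f"unsupported object type {type_id} (delta?)")
        # The compressed length is not stored, so let zlib find the end of
        # the stream and report what it did not consume.
        d = zlib.decompressobj()
        body = d.decompress(data[pos:]) + d.flush()
        pos = len(data) - len(d.unused_data)
        if len(body) != size:
            raise ValueError("size mismatch")
        type_name = OBJ_TYPES[type_id]
        header = f"{type_name} {size}".encode() + b"\x00"
        yield hashlib.sha1(header + body).hexdigest(), type_name, body
    # The pack ends with a SHA-1 of everything before it.
    if hashlib.sha1(data[:-20]).digest() != data[-20:]:
        raise ValueError("bad pack checksum")
```

Slicing data[pos:] for every object copies the rest of the pack each time, which is fine for small test packs but worth replacing with chunked reads for larger ones.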
unpack-objects (REF_DELTA): extend the previous step with support for REF_DELTA objects. With my version of git (2.43.0), git pack-objects uses this; also, we can make sure the packfile sent by the server will not use OFS_DELTA simply by not advertising support for it (as mentioned in the Haskell post linked by another comment - super useful even for those of us who don’t speak Haskell indeed).
For development, a good first step is to add two very similar files (not too small) with git hash-object -w and create a packfile that contains only those two objects with git pack-objects. For example, if one of the files is a prefix of the other, then the packfile will have the larger file undeltified, and the smaller one using a single “copy from base object” instruction. If the two files are the same size with a single difference in the middle, instructions will be “copy, add, copy”.
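To make the copy/add instructions concrete, here is a hedged sketch of applying a delta to its base object. It assumes the delta has already been zlib-decompressed and the base looked up (in a REF_DELTA entry, the 20-byte base object name sits between the object header and the compressed delta). apply_delta and read_varint are illustrative names.

```python
def read_varint(data, pos):
    """Decode a size stored 7 bits per byte, least significant group first."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base, delta):
    """Rebuild the target object from base and a decompressed delta."""
    base_size, pos = read_varint(delta, 0)
    target_size, pos = read_varint(delta, pos)
    if base_size != len(base):
        raise ValueError("delta does not match this base object")
    out = bytearray()
    while pos < len(delta):
        op = delta[pos]
        pos += 1
        if op & 0x80:
            # Copy: bits 0-3 say which offset bytes follow, bits 4-6 which
            # size bytes follow; absent bytes are zero.
            offset = size = 0
            for i in range(4):
                if op & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if op & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000
            out += base[offset:offset + size]
        elif op:
            # Add: the next `op` bytes of the delta are literal data.
            out += delta[pos:pos + op]
            pos += op
        else:
            raise ValueError("opcode 0 is reserved")
    if len(out) != target_size:
        raise ValueError("unexpected target size")
    return bytes(out)
```

With base b"hello world", the prefix case is the four-byte delta bytes([11, 5, 0x90, 5]): two sizes, then one copy of 5 bytes from offset 0.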
ls-remote: like git ls-remote. I just assumed the server speaks v2 and skipped capabilities discovery. For this and the next step, to understand the protocol I used a mix of (1) checking the resources and documentation (gitprotocol-* files) and (2) experimenting with mitmproxy (see other comments, super useful, thanks!) - the web interface allows you to edit and replay queries.
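A sketch of the pkt-line framing and an ls-refs request, assuming the smart HTTP transport with protocol v2 and no capability discovery, as described above. The helper names are hypothetical, and the ls_remote function is an untested assumption about how the pieces fit together against a real server.

```python
import urllib.request

FLUSH = b"0000"  # flush-pkt: end of message
DELIM = b"0001"  # delim-pkt: separates the command from its arguments


def pkt_line(payload):
    """Frame a payload: 4 hex digits of total length (prefix included)."""
    if isinstance(payload, str):
        payload = payload.encode()
    return b"%04x" % (len(payload) + 4) + payload


def read_pkt_lines(data):
    """Yield each payload; special packets (length < 4) are yielded as None."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length < 4:
            yield None
            pos += 4
        else:
            yield data[pos + 4:pos + length]
            pos += length


def ls_refs_request(prefix="HEAD"):
    """Body of a v2 ls-refs command restricted to refs starting with prefix."""
    return (pkt_line("command=ls-refs\n") + DELIM
            + pkt_line(f"ref-prefix {prefix}\n") + FLUSH)


def parse_ls_refs(data):
    """Map ref name to object id; extra attributes on a line are ignored."""
    refs = {}
    for payload in read_pkt_lines(data):
        if payload is None:
            continue
        sha, name = payload.decode().rstrip("\n").split(" ")[:2]
        refs[name] = sha
    return refs


def ls_remote(url, prefix="HEAD"):
    """POST the request to <url>/git-upload-pack (assumed endpoint layout)."""
    req = urllib.request.Request(
        url.rstrip("/") + "/git-upload-pack",
        data=ls_refs_request(prefix),
        headers={
            "Git-Protocol": "version=2",
            "Content-Type": "application/x-git-upload-pack-request",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return parse_ls_refs(resp.read())
```

Keeping the request builder and the response parser separate from the HTTP call makes both easy to test locally against bytes captured with mitmproxy.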
clone: same strategy as above regarding understanding the protocol. I found it useful to add no-progress to my query (so technically I implemented clone -q) to reduce clutter in the response. As mentioned above, make sure not to include ofs-delta. Then, except for fetch, which is new, the rest is just combining all the previous steps together:
- create directory, cd to it and run git init
- ls-remote
- fetch packfile
- unpack-objects
- checkout-empty
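For the fetch part, a minimal sketch under the same assumptions as before (protocol v2 over HTTP, a single wanted commit, no negotiation). It builds the request body and pulls the packfile bytes out of the side-band response; the function names are made up, and the framing helpers are repeated so the snippet stands alone.

```python
FLUSH = b"0000"
DELIM = b"0001"


def pkt_line(payload):
    """Frame a payload: 4 hex digits of total length (prefix included)."""
    if isinstance(payload, str):
        payload = payload.encode()
    return b"%04x" % (len(payload) + 4) + payload


def read_pkt_lines(data):
    """Yield each payload; special packets (length < 4) are yielded as None."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length < 4:
            yield None
            pos += 4
        else:
            yield data[pos + 4:pos + length]
            pos += length


def fetch_request(want_sha):
    """Body of a v2 fetch for one commit; ofs-delta is deliberately absent."""
    return (pkt_line("command=fetch\n") + DELIM
            + pkt_line("no-progress\n")
            + pkt_line(f"want {want_sha}\n")
            + pkt_line("done\n") + FLUSH)


def extract_packfile(response):
    """Concatenate band-1 data from the "packfile" section of the response."""
    pack = bytearray()
    in_packfile = False
    for payload in read_pkt_lines(response):
        if payload is None:
            continue
        if not in_packfile:
            # Skip any earlier sections until the packfile header line.
            in_packfile = payload == b"packfile\n"
            continue
        band, chunk = payload[0], payload[1:]
        if band == 1:      # pack data
            pack += chunk
        elif band == 3:    # fatal error reported by the server
            raise RuntimeError(chunk.decode(errors="replace"))
        # band 2 carries progress messages, ignored here
    return bytes(pack)
```

The bytes returned by extract_packfile are what the unpack-objects step consumes, so the two can be tested independently before being wired into clone.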