Steps for git-clone (implementing the Git protocol)

Hi everybody, I have written a doc on the steps to follow to complete the last stage of the codecrafters Git challenge: git clone.

The doc only describes how things should work and lays out the steps more clearly, so you still have the challenge of implementing them. Just keep in mind that it does include some instructions on how to implement some of the bit/byte manipulation.

The doc is not complete and I’m planning to update it from time to time with more information and better examples, and maybe add some code so that debugging is easier if you get stuck on a step.
I will post here whatever I add to the doc.

Hope you find it useful! :slight_smile:

Also, if you have any suggestions on what to improve, ideas on what to add, or if I made any mistakes, let me know so I can improve it! Or feel free to open a PR against the repository: GitHub - i27ae15/git-protocol-doc


Thanks for sharing the detailed guide! :+1:

Thanks for sharing!

I’ve also written something on how to break up the last stage into small incremental steps - not as detailed as yours, but at least it provides the structure I wish I had when starting. Initially posted as a comment on the last stage; re-posting here for visibility.

In case it might help others, here’s the way I split things (numbers in parentheses are number of lines changed in my Python solution, to give an indication of size):

  1. checkout-empty <commit> (42)
  2. unpack-objects (undeltified) (56)
  3. unpack-objects (REF_DELTA) (57)
  4. ls-remote <url> HEAD (25)
  5. clone <url> <dir> (45)

Stating the obvious: you want some local testing for each step.

checkout-empty: like git checkout except it assumes the current directory is empty (except for .git of course), so it just writes files and directories. This is a very natural continuation of the previous two steps, where we created commit and tree objects; here we’re reading them.
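
To make the reading side concrete, here is a minimal sketch of walking the entries of a tree object. It assumes SHA-1 object names (20 raw bytes per entry) and that the body has already been zlib-decompressed and stripped of its "tree <size>\0" header; parse_tree is a name made up for this example, not something from the challenge.

```python
def parse_tree(body: bytes):
    """Yield (mode, name, sha_hex) for each entry of a tree object's body.

    Each entry is laid out as: b"<mode> <name>\\0" followed by 20 raw SHA-1 bytes.
    """
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)       # end of the octal mode
        nul = body.index(b"\0", space)      # end of the file name
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        sha_hex = body[nul + 1:nul + 21].hex()
        pos = nul + 21                      # skip the 20-byte object name
        yield mode, name, sha_hex
```

A checkout can then recurse on entries whose mode is 40000 (sub-trees) and write a file for the others.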

unpack-objects (undeltified): like git unpack-objects but only for packfiles with no deltified objects. You can create such packfiles with git pack-objects --depth=0 for testing.
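
As a rough sketch of the parsing this step needs: a packfile starts with a 12-byte header (the magic "PACK", a version and an object count, both big-endian), and each object starts with a variable-length header holding its type and uncompressed size, followed by zlib data. The function names below are invented for the example, and the checksum trailer is not handled.

```python
import struct
import zlib

def read_pack_header(data: bytes):
    """Return (version, object_count) from the 12-byte packfile header."""
    magic, version, count = struct.unpack(">4sII", data[:12])
    if magic != b"PACK":
        raise ValueError("not a packfile")
    return version, count

def read_object_header(data: bytes, pos: int):
    """Return (type, size, new_pos) for the object header starting at pos."""
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7   # bits 4-6 of the first byte
    size = byte & 0x0F             # low 4 bits of the size
    shift = 4
    while byte & 0x80:             # MSB set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos

def read_compressed(data: bytes, pos: int):
    """Inflate one zlib stream starting at pos; return (content, new_pos)."""
    d = zlib.decompressobj()
    content = d.decompress(data[pos:]) + d.flush()
    consumed = len(data) - pos - len(d.unused_data)
    return content, pos + consumed
```

Types 1 to 4 are commit, tree, blob and tag; 6 and 7 are OFS_DELTA and REF_DELTA. Using a decompressobj is one way to find where the next object starts, since the header only records the uncompressed size.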

unpack-objects (REF_DELTA): extend the previous step with support for REF_DELTA objects. With my version of git (2.43.0), git pack-objects uses this; also, we can make sure the packfile sent by the server will not use OFS_DELTA simply by not advertising support for it (as mentioned in the Haskell post linked by another comment - super useful even for those of us who don’t speak Haskell indeed).

For development, a good first step is to add two very similar files (not too small) with git hash-object -w and create a packfile that contains only those two objects with git pack-objects. For example, if one of the files is a prefix of the other, then the packfile will have the larger file undeltified, and the smaller one using a single “copy from base object” instruction. If the two files are the same size with a single difference in the middle, instructions will be “copy, add, copy”.
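
The “copy, add, copy” case can be sketched as below. This assumes you have already read the 20-byte base object name that follows a REF_DELTA header, looked up the base content, and inflated the delta data; apply_delta and read_varint are names chosen for this example.

```python
def read_varint(data: bytes, pos: int):
    """Read a little-endian base-128 integer; return (value, new_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos

def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild the target object from its base and the inflated delta data."""
    base_size, pos = read_varint(delta, 0)
    target_size, pos = read_varint(delta, pos)
    if base_size != len(base):
        raise ValueError("delta does not match base object")
    out = bytearray()
    while pos < len(delta):
        op = delta[pos]
        pos += 1
        if op & 0x80:
            # Copy from base: bits 0-3 say which offset bytes are present,
            # bits 4-6 say which size bytes are present.
            offset = size = 0
            for i in range(4):
                if op & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if op & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000
            out += base[offset:offset + size]
        elif op:
            # Add: the next `op` bytes are literal data.
            out += delta[pos:pos + op]
            pos += op
        else:
            raise ValueError("reserved delta instruction 0")
    if len(out) != target_size:
        raise ValueError("unexpected target size")
    return bytes(out)
```

The type of the rebuilt object is the type of its base, so the result can be hashed and stored like any other object.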

ls-remote: like git ls-remote. I just assumed the server speaks v2 and skipped capabilities discovery. For this and the next step, to understand the protocol I used a mix of (1) checking the resources and documentation (gitprotocol-* files) and (2) experimenting with mitmproxy (see other comments, super useful, thanks!) - the web interface allows you to edit and replay queries.
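
Both requests and responses are framed as pkt-lines: four hex digits giving the total length (including those four digits), then the payload, with 0000 and 0001 reserved as flush and delimiter packets. A small sketch of that framing, with helper names invented for this example:

```python
FLUSH = b"0000"   # flush-pkt: end of a message
DELIM = b"0001"   # delim-pkt: separates sections in protocol v2

def pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits of total length, then the payload."""
    return b"%04x" % (len(payload) + 4) + payload

def parse_pkt_lines(data: bytes):
    """Yield (kind, payload) pairs; kind is 'data', 'flush', 'delim' or 'response-end'."""
    special = {0: "flush", 1: "delim", 2: "response-end"}
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length in special:
            yield special[length], b""
            pos += 4
        else:
            yield "data", data[pos + 4:pos + length]
            pos += length
```

For example, pkt_line(b"command=ls-refs\n") gives b"0014command=ls-refs\n".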

clone: same strategy as above regarding understanding the protocol. I found it useful to add no-progress to my query (so technically I implemented clone -q) to reduce clutter in the response. As mentioned above, make sure not to include ofs-delta. Then, except for fetch, which is new, the rest is just combining all the previous steps together:

  • create directory, cd to it and run git init
  • ls-remote
  • fetch packfile
  • unpack-objects
  • checkout-empty