Git Objects

I find it difficult to understand or visualize what the various Git commands actually do without first understanding the underlying Git data structures.

There are three Git objects of interest: blob, tree, and commit. They are kept in the .git/objects folder. We do not need to know exactly how they are stored, but I find it immensely useful to know what these objects are.

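Before reading on, make sure you clearly understand what a hash is. There is no need to know how to compute the SHA-1 hash used by Git (as of 2021), but you do need to understand its key property: it is a one-way function whose output is, for all practical purposes, unique to the content it was computed from. The same content always gives the same hash, any change to the content gives a different hash, and the content cannot be recovered from the hash.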
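To make these properties concrete, here is a small sketch using Python's standard hashlib module (sha1_hex is just an illustrative helper name, not part of Git). It shows that identical content gives an identical hash, that changing a single character gives a different one, and that a SHA-1 hash is always 40 hexadecimal characters long.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()

a1 = sha1_hex(b"ABC")
a2 = sha1_hex(b"ABC")
b = sha1_hex(b"ABD")  # only the last character differs

print(a1 == a2)  # True: same content, same hash
print(a1 == b)   # False: different content, different hash
print(len(a1))   # 40 hex characters (160 bits)
```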

Git file blob

A Git file blob is an object containing the contents of a file, identified by the hash of that content. The file name is not part of the blob; names are recorded in trees. This is best explained with a real example.

Consider a plain text file containing the three characters "ABC". (This is a "simple" file because it contains only ASCII characters, so it is exactly three bytes long, with no extra bytes from a multi-byte encoding or a byte order mark.) The Git hash of this file is the SHA-1 of the string "blob 3\x00ABC". The "3" is the length of the file content in bytes, and \x00 (borrowed from JavaScript string syntax) is a single byte with value 0. Enter the above into any online SHA-1 generator and you will get the hash: 48b83b862ebc57bd3f7c34ed47262f4b402935af.

(Note: You will need a SHA-1 generator that can handle an input value of binary 0. The easiest way might be to use one that accepts hexadecimal input. The hexadecimal encoding of the string "blob 3\x00ABC" is 62 6c 6f 62 20 33 00 41 42 43.)
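Instead of an online generator, the same recipe can be sketched in a few lines of Python with the standard hashlib module. git_blob_hash is a hypothetical helper name, and the sketch assumes a repository that uses SHA-1 (Git's default). It also decodes the hexadecimal string from the note above to confirm it is the same header-plus-content bytes.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Compute the Git object id of a blob (SHA-1 repositories)."""
    # Header: "blob", a space, the content length in bytes (decimal), then a zero byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(bytes.fromhex("62 6c 6f 62 20 33 00 41 42 43"))  # b'blob 3\x00ABC'
print(git_blob_hash(b"ABC"))  # compare with the hash quoted above
```

Because the length counts bytes rather than characters, a non-ASCII file changes the header: git_blob_hash("é".encode("utf-8")) hashes "blob 2\x00" followed by the two UTF-8 bytes of é.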

To prove that the above is true, create a text file in your Git project containing only "ABC": exactly three characters, with no trailing newline (some editors add one automatically). Save it and run git add to put it into the Staging Area. Then run git ls-files -s. You will see the exact same SHA-1 hash, 48b83b862ebc57bd3f7c34ed47262f4b402935af! (git hash-object followed by the file name prints the same hash without staging the file.)

To verify the contents of a file blob in your repo using its hash, try: git cat-file blob 48b8 (the first four characters of the hash will do unless they are ambiguous). You will see the content of the file with that blob hash.
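For the curious, here is a rough sketch of what such a lookup involves for a "loose" object: Git stores it as a zlib-compressed copy of "blob 3\x00ABC" at .git/objects/48/b83b... (the first two hex characters name the sub-folder). The helper names below are hypothetical, a temporary folder stands in for a real .git folder, and objects that Git has packed into .git/objects/pack are not handled here, whereas git cat-file handles both.

```python
import hashlib
import os
import tempfile
import zlib

def loose_object_path(git_dir: str, oid: str) -> str:
    # Loose objects live in objects/<first 2 hex chars>/<remaining 38 hex chars>.
    return os.path.join(git_dir, "objects", oid[:2], oid[2:])

def write_loose_blob(git_dir: str, content: bytes) -> str:
    """Store content as a zlib-compressed loose blob and return its id."""
    data = b"blob " + str(len(content)).encode("ascii") + b"\x00" + content
    oid = hashlib.sha1(data).hexdigest()
    path = loose_object_path(git_dir, oid)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(data))
    return oid

def read_loose_blob(git_dir: str, oid: str) -> bytes:
    """Decompress a loose blob, check it against its hash, and return the file content."""
    with open(loose_object_path(git_dir, oid), "rb") as f:
        data = zlib.decompress(f.read())
    if hashlib.sha1(data).hexdigest() != oid:
        raise ValueError("object content does not match its hash")
    header, _, content = data.partition(b"\x00")
    kind, size = header.split(b" ")
    if kind != b"blob" or int(size) != len(content):
        raise ValueError("not a well-formed blob")
    return content

with tempfile.TemporaryDirectory() as git_dir:  # stands in for a real .git folder
    oid = write_loose_blob(git_dir, b"ABC")
    stored_file_exists = os.path.isfile(loose_object_path(git_dir, oid))
    roundtrip = read_loose_blob(git_dir, oid)
print(roundtrip)  # b'ABC'
```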

Git tree

A Git tree is a listing of the names of all the files (blobs) and sub-directories (trees) in one specified directory.

100644 blob c0b9ad1d30878eecd4ce49e1db40ac59f9668536    README.md
100644 blob 4111d50ada6cc03ec6079f226c23efa3142c9c94    file1.txt
040000 tree 25b690689b298649c027af668c051282a96eed6c    src

The above shows a tree representing a folder containing two files, README.md and file1.txt, and one sub-directory, src. The hash of each blob or tree entry is also shown. The first field is the file mode, similar to Unix file permissions: 100644 is a regular (non-executable) file and 040000 is a sub-directory.

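The content used to compute the Git hash of a tree is the header "tree <length>\x00" followed by one entry per item, sorted by name (a sub-directory sorts as if its name ended with /). Each entry is the file mode without leading zeroes (so a sub-directory is 40000), a space, the name, a binary 0, and then the item's hash as 20 raw bytes rather than 40 hexadecimal characters. The blob/tree type shown in the listing is not stored; Git infers it from the mode. More important than the exact formula is this: a tree and its Git hash can be used to verify the authenticity of that tree, and because every entry includes the hash of its blob or sub-tree, that check covers every file and folder beneath it.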
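The tree format described above can be sketched in Python as follows. git_tree_hash is a hypothetical helper that takes (mode, name, hash) tuples in the order git ls-tree prints them (minus the type column); the sketch assumes SHA-1 and UTF-8 file names. It is applied here to the listing shown earlier, with src given its directory mode 040000.

```python
import hashlib

def git_tree_hash(entries):
    """entries: (mode, name, hex_id) tuples, e.g. ("100644", "README.md", "c0b9...")."""
    # The stored mode has no leading zero: "040000" becomes "40000".
    normalized = [(mode.lstrip("0"), name, oid) for mode, name, oid in entries]

    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing a sub-directory as if it ended with "/".
        return name.encode("utf-8") + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, oid in sorted(normalized, key=sort_key):
        # "<mode> <name>", a zero byte, then the hash as 20 raw bytes.
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\x00" + bytes.fromhex(oid)
    header = b"tree " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

listing = [
    ("100644", "README.md", "c0b9ad1d30878eecd4ce49e1db40ac59f9668536"),
    ("100644", "file1.txt", "4111d50ada6cc03ec6079f226c23efa3142c9c94"),
    ("040000", "src", "25b690689b298649c027af668c051282a96eed6c"),
]
print(git_tree_hash(listing))
```

Sorting inside the function means the same set of entries always produces the same tree hash, whatever order they are supplied in.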

Git commit

A commit object contains at least four items: the hash of a tree object (the top-level directory of the snapshot), the author and a timestamp, the committer and a timestamp, and the commit message. It may also list parent commits: the very first commit has no parent, an ordinary commit has one, and a commit that merges two branches has more than one. A sample commit object is:

tree 78b8abc3aff2efd45c411a5c209d7902d286494a
parent 56cd166cb91576e7d8698abfcabf04e868b841d6
author Author Name <author@gmail.com> 1628669777 +0000
committer Committer Name <committerg@gmail.com> 1628669777 +0000

Initial commit

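The id of a commit is the hash of the commit information shown in the block of text above. It follows the same pattern as a blob: Git computes the SHA-1 of "commit <length>\x00" followed by that text. What is important is that any change to the commit (its tree, parents, author, timestamps or message) produces a different hash, so given a commit's details together with its Git hash, we can verify whether the commit is genuine. Because the commit includes the hashes of its tree and its parents, that check also covers every file and every earlier commit it refers to.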
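A minimal Python sketch of that check, assuming SHA-1 and that the text is byte-for-byte what git cat-file commit prints (including the final newline after the message; a single missing newline gives a different hash). git_commit_hash and verify_commit are hypothetical helper names, and the sample text is the commit shown above.

```python
import hashlib

def git_commit_hash(commit_text: str) -> str:
    """Hash raw commit text the same way as a blob, but with a "commit" header."""
    body = commit_text.encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def verify_commit(commit_text: str, claimed_id: str) -> bool:
    """True only if the commit details really hash to the claimed id."""
    return git_commit_hash(commit_text) == claimed_id

sample = (
    "tree 78b8abc3aff2efd45c411a5c209d7902d286494a\n"
    "parent 56cd166cb91576e7d8698abfcabf04e868b841d6\n"
    "author Author Name <author@gmail.com> 1628669777 +0000\n"
    "committer Committer Name <committerg@gmail.com> 1628669777 +0000\n"
    "\n"
    "Initial commit\n"
)
claimed = git_commit_hash(sample)
tampered = sample.replace("1628669777 +0000\n\n", "1628669778 +0000\n\n")  # committer time +1s
print(verify_commit(sample, claimed))    # True
print(verify_commit(tampered, claimed))  # False
```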

To view the details of a commit in your repo, use git cat-file commit abcd, where abcd is the first four characters of the commit hash (use more if four are ambiguous).
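The text printed by git cat-file commit is simple enough to pull apart by hand: header lines of the form "key value", a blank line, then the message. The sketch below is a hypothetical parser (parse_commit is not a Git API) that collects parents into a list, since a commit can have none, one, or several. It assumes no unusual headers; continuation lines of multi-line headers such as a GPG signature are simply skipped.

```python
def parse_commit(commit_text: str) -> dict:
    """Split raw commit text into its header fields and its message."""
    headers, _, message = commit_text.partition("\n\n")
    fields = {"parent": []}
    for line in headers.splitlines():
        if line.startswith(" "):
            continue  # continuation of a multi-line header (ignored in this sketch)
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)  # zero, one, or several parents
        else:
            fields[key] = value
    fields["message"] = message
    return fields

sample = (
    "tree 78b8abc3aff2efd45c411a5c209d7902d286494a\n"
    "parent 56cd166cb91576e7d8698abfcabf04e868b841d6\n"
    "author Author Name <author@gmail.com> 1628669777 +0000\n"
    "committer Committer Name <committerg@gmail.com> 1628669777 +0000\n"
    "\n"
    "Initial commit\n"
)
info = parse_commit(sample)
print(info["tree"])     # 78b8abc3aff2efd45c411a5c209d7902d286494a
print(info["parent"])   # ['56cd166cb91576e7d8698abfcabf04e868b841d6']
# The author value ends with a Unix timestamp (seconds) and a time-zone offset.
name_email, seconds, offset = info["author"].rsplit(" ", 2)
print(name_email)       # Author Name <author@gmail.com>
```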

Git Object Operations

With a clear understanding of the Git objects, it becomes easy to visualize what the various Git operations do.

git add - creates a blob object for each added file and records it in the Staging Area (the index). No tree objects are created yet, even for new files in newly created folders. Each file is listed under its full path, with sub-directories separated by forward slashes (/).
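A toy model can make this concrete. In the hypothetical sketch below, a dict keyed by full path stands in for the Staging Area and another dict stands in for the object store; stage is an illustrative name, not Git's code. The real index also records stat data and stage numbers, which are left out here. Note that only blobs are created, however deeply nested the path is.

```python
import hashlib

def blob_hash(content: bytes) -> str:
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

def stage(index: dict, objects: dict, path: str, content: bytes) -> None:
    """Hypothetical model of `git add` for one file: store a blob, list it by full path."""
    oid = blob_hash(content)
    objects[oid] = ("blob", content)  # the only kind of object created at this point
    index[path] = ("100644", oid)     # flat list: no tree objects for src/ or src/util/

index, objects = {}, {}
stage(index, objects, "README.md", b"# Demo\n")
stage(index, objects, "src/util/helper.txt", b"ABC")
for path, (mode, oid) in sorted(index.items()):
    print(mode, oid, path)  # same columns as git ls-files -s, minus the stage number
```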

git commit - creates a commit object, together with any new or updated tree objects needed to house all the affected files in the commit.
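Continuing the toy model, the sketch below turns the flat, path-keyed index into trees, one per folder and built from the deepest folder upwards, and then writes a commit that points at the root tree. All names are hypothetical: write_tree only echoes the real git write-tree command, the fixed timestamp and identical author/committer are simplifications, and blob contents are omitted because the index only needs their hashes. The second commit shows that unchanged folders produce the same tree hashes, so only genuinely new or updated trees are added.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> str:
    """Git object id: SHA-1 of "<kind> <size>", a zero byte, then the body."""
    header = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def write_tree(index: dict, objects: dict, prefix: str = "") -> str:
    """Build the tree for the folder `prefix` (recursing into sub-folders); return its id."""
    entries = {}
    for path, (mode, oid) in index.items():
        if path.startswith(prefix):
            name, sep, _ = path[len(prefix):].partition("/")
            entries[name] = ("40000", None) if sep else (mode, oid)

    def sort_key(name):
        # Sub-directories sort as if their name ended with "/".
        return name.encode() + (b"/" if entries[name][0] == "40000" else b"")

    body = b""
    for name in sorted(entries, key=sort_key):
        mode, oid = entries[name]
        if mode == "40000":
            oid = write_tree(index, objects, prefix + name + "/")  # sub-tree first
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(oid)
    tree_id = hash_object("tree", body)
    objects[tree_id] = ("tree", body)
    return tree_id

def commit(index, objects, parents, who, message, when="1628669777 +0000"):
    """Write the trees, then a commit pointing at the root tree and its parents."""
    lines = ["tree " + write_tree(index, objects)]
    lines += ["parent " + p for p in parents]
    lines += [f"author {who} {when}", f"committer {who} {when}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    commit_id = hash_object("commit", body)
    objects[commit_id] = ("commit", body)
    return commit_id

empty_blob = hash_object("blob", b"")
index = {  # what git add left in the Staging Area: full path -> (mode, blob id)
    "README.md": ("100644", empty_blob),
    "src/main.txt": ("100644", empty_blob),
}
objects = {}
first = commit(index, objects, [], "Author Name <author@gmail.com>", "Initial commit")
print(sorted(kind for kind, _ in objects.values()))  # ['commit', 'tree', 'tree']
second = commit(index, objects, [first], "Author Name <author@gmail.com>", "No file changes")
print(len(objects))  # 4: both trees are reused, only the new commit is added
```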
