Since its birth in 2005, git has become massively popular, especially in the open source world, and many of us use it at our jobs as well. It is a great VCS tool with many advantages, but being easy to learn is not one of them, which can be frustrating given how often we use it. In my opinion, the only way to get comfortable with git, and maybe even start loving it, is to learn how it works internally.

The reason I think so was perfectly summarized by Edward Thomson in his lecture Deep Dive Into Git:

"The Git commands are just a leaky abstraction over the data storage."

This is why, no matter how many git commands or tips 'n' tricks you memorize or store in your git cheatsheet, without understanding how git works under the hood you will remain confused by its strange ways: the internals will every once in a while leak through the abstraction layer that git's (frontend) commands give you.

So in this series, Understanding Git, we will cover git's internals (we will not go into git's source code, don't worry), and the first thing on that list is git's heart and soul: the data model.

To start, we will initialise an empty git repository in our project directory:

    git init

Git will inform us that it has created a .git directory in our project's directory, so let's take a quick peek at what it looks like:

    $ tree .git/
    .git/
    ├── HEAD
    ├── config
    ├── description
    ├── hooks
    │   ├── applypatch-msg.sample
    │   ├── commit-msg.sample
    │   ├── post-update.sample
    │   ├── pre-applypatch.sample
    │   ├── pre-commit.sample
    │   ├── pre-push.sample
    │   ├── pre-rebase.sample
    │   ├── pre-receive.sample
    │   ├── prepare-commit-msg.sample
    │   └── update.sample
    ├── info
    │   └── exclude
    ├── objects
    │   ├── info
    │   └── pack
    └── refs
        ├── heads
        └── tags

    8 directories, 14 files

Some of these files and directories may sound familiar to you (particularly HEAD), but for now we will focus on the .git/objects directory, which is empty right now. We will change that in a moment.

Let's create an index.php file

    touch index.php

and give it some content:

    <?php
    echo "Hello World";

Then a README.md file

    touch README.md

with some content too:

    # Description
    This is my hello world project

Now let's stage and commit them:

    git add .
    git commit -m "Initial Commit"

OK, nothing special here: adding and committing, we've all "been there, done that". If we take another look at our .git directory, we can see that the .git/objects directory now has some subdirectories and files:

    ├── objects
    │   ├── 5d
    │   │   └── 92c127156d3d86b70ae41c73973434bf4bf341
    │   ├── a6
    │   │   └── dbf05551541dc86b7a49212b62cfe1e9bb14f2
    │   ├── cf
    │   │   └── 59e02c3d2a2413e2da9e535d3c116af1077906
    │   ├── f8
    │   │   └── 9e64bdfcc08a8b371ee76a74775cfe096655ce
    │   ├── info
    │   └── pack

(Note: the directories and files will have different names on your computer.)

We will get back to .git/objects, but for now notice that every directory name is two characters long. Git generates a 40-character SHA-1 checksum for every object; the first two characters of that checksum are used as the directory name and the other 38 as the file (object) name.

The first kind of objects that git creates when we commit some file(s) are blob objects, in our case two of them, one for each file we committed (strictly speaking, git already writes the blobs when we stage the files with git add):

[Figure: Blob objects associated with our index.php and README.md files]

They contain snapshots of our files (the content of our files at the time of the commit), and each one is identified by its checksum.

The next kind of objects git creates are tree objects. In our case there is only one, and it contains a list of all files in our project, each with a pointer to its blob object (this is how git associates your files with their blob objects):

[Figure: Tree object pointing to blob objects]

And finally git creates a commit object that has a pointer to its tree object (along with some other information):

[Figure: Commit object points to its tree object]

If we look back at our .git/objects directory, things should look clearer now.
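
The naming rule described above can be tried out in a few lines of Python. This is only a minimal sketch: the function names blob_object_id and loose_object_path are made up for illustration, and the sample content is hypothetical rather than taken from the repository in this post. It relies on the fact that git hashes a blob as a short header ("blob", the size in bytes, a NUL byte) followed by the file's raw content.

```python
import hashlib


def blob_object_id(content: bytes) -> str:
    """Return the SHA-1 id git would assign to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Split an id into the <2 chars>/<38 chars> layout under .git/objects."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    # Hypothetical file content, used only to illustrate the idea.
    oid = blob_object_id(b"hello\n")
    print(oid)
    print(loose_object_path(oid))
```

If git is installed, running git hash-object on a file with the same bytes should print the same id, which is a handy way to check the sketch against the real thing.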

Here is .git/objects again:

    ├── objects
    │   ├── 5d
    │   │   └── 92c127156d3d86b70ae41c73973434bf4bf341
    │   ├── a6
    │   │   └── dbf05551541dc86b7a49212b62cfe1e9bb14f2
    │   ├── cf
    │   │   └── 59e02c3d2a2413e2da9e535d3c116af1077906
    │   ├── f8
    │   │   └── 9e64bdfcc08a8b371ee76a74775cfe096655ce
    │   ├── info
    │   └── pack

With git log we can see our commit history:

    commit a6dbf05551541dc86b7a49212b62cfe1e9bb14f2
    Author: zspajich <zspajich@gmail.com>
    Date:   Tue Jan 23 13:31:43 2018 +0100

        Initial Commit

And using the naming convention we mentioned earlier, we can find our commit object in .git/objects:

    ├── objects
    │   ├── a6
    │   │   └── dbf05551541dc86b7a49212b62cfe1e9bb14f2

To look at its content we can't simply use the cat command, since these are not plain text files (git stores them zlib-compressed), but git has a cat-file command we can use:

    git cat-file commit a6dbf05551541dc86b7a49212b62cfe1e9bb14f2

to get the content of our commit object:

    tree f89e64bdfcc08a8b371ee76a74775cfe096655ce
    author zspajich <zspajich@gmail.com> 1516710703 +0100
    committer zspajich <zspajich@gmail.com> 1516710703 +0100

    Initial Commit

Here we see the pointer to our commit's tree object. To examine its content we use the git ls-tree command:

    git ls-tree f89e64bdfcc08a8b371ee76a74775cfe096655ce

and, as expected, it contains a list of our files with pointers to their blob objects:

    100644 blob cf59e02c3d2a2413e2da9e535d3c116af1077906    README.md
    100644 blob 5d92c127156d3d86b70ae41c73973434bf4bf341    index.php

We can look at the blob object representing (for example) index.php with the cat-file command:

    git cat-file blob 5d92c127156d3d86b70ae41c73973434bf4bf341

and we see that it contains the content of our index.php file:

    <?php
    echo "Hello World";

So that is what happens when we create and commit some files. Now we'll do another commit: this time, let's say we made some changes to our index.php file (added some code magic) and committed those changes:

[Figure: Git creates a new blob object for the file that has changed]

As we see, git has now created a new blob object with a new snapshot of index.php.

Since README.md hasn't changed, no new blob object is created for it; git will reuse the existing one instead (we'll see in a second how).

Now, when git creates the tree object, the blob pointer assigned to index.php is updated, and the blob pointer assigned to README.md simply stays the same as in the previous commit's tree.

[Figure: Pointer to index.php blob is updated and pointer to README.md blob stays the same]

And at the end, git creates a commit object with a pointer to its tree object and also a pointer to its parent commit object (every commit except the first one has at least one parent).

[Figure: Commit object points to its tree and also has a pointer to its parent commit object]

So now that we know how git handles adding and editing files, the only thing that remains is to see how it handles file deletion:

[Figure: Git deletes the entry for index.php in tree object]

It's very simple: git leaves the file's entry (the file name with the pointer to its blob object) out of the new commit's tree object. In this case we deleted index.php in our commit, so there is no longer an index.php entry in that commit's tree object (in other words, our commit's tree object no longer has a pointer to a blob object representing index.php).

There is just one more addition to the data model we presented: tree objects can be nested (they can point to other tree objects). You can think of it this way: every blob object represents a file and every tree object represents a directory, so if we have nested directories we will have nested tree objects.

Let's look at an example:

[Figure: Tree objects can point to other tree objects]

Here, our project would have one README.md file and one app directory with two files (app.php and app_dev.php).

Git uses blob objects to recreate the content of our files at any given point in time (commit) and tree objects to reproduce our project's folder structure.

So there you have it: git's data model.
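
To make the three scenarios above (add, edit, delete) a bit more tangible, here is a toy model of the data model in Python. It is a sketch under loud assumptions, not git's real on-disk format: ObjectStore, write_tree, write_commit and demo are hypothetical names, objects live in a plain dictionary, and the text layout of trees and commits is simplified. What it does preserve is the key idea that an object's id is derived from its content, so unchanged files and directories are reused automatically.

```python
import hashlib


class ObjectStore:
    """Toy in-memory stand-in for .git/objects (not git's real format)."""

    def __init__(self):
        self.objects = {}  # object id -> (kind, payload)

    def put(self, kind, payload):
        # The id depends only on kind + content, so storing the same
        # content twice yields the same id and a single stored entry.
        oid = hashlib.sha1(f"{kind}\0{payload}".encode("utf-8")).hexdigest()
        self.objects[oid] = (kind, payload)
        return oid


def write_tree(store, directory):
    """Store a directory given as {name: file content or nested dict}."""
    entries = []
    for name in sorted(directory):
        node = directory[name]
        if isinstance(node, dict):
            # A sub-directory becomes a nested tree object.
            entries.append(f"tree {write_tree(store, node)} {name}")
        else:
            # A file becomes a blob object.
            entries.append(f"blob {store.put('blob', node)} {name}")
    return store.put("tree", "\n".join(entries))


def write_commit(store, directory, message, parent=None):
    """Store a commit pointing to its tree and, optionally, its parent."""
    lines = [f"tree {write_tree(store, directory)}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += ["", message]
    return store.put("commit", "\n".join(lines))


def demo():
    """Replay add, edit and delete; return the object count after each."""
    store = ObjectStore()
    counts = []
    readme = "# Description\nThis is my hello world project\n"
    first = write_commit(
        store, {"README.md": readme, "index.php": "<?php\n"}, "Initial Commit"
    )
    counts.append(len(store.objects))  # 2 blobs + 1 tree + 1 commit
    second = write_commit(
        store,
        {"README.md": readme, "index.php": "<?php // code magic\n"},
        "Edit index.php",
        parent=first,
    )
    counts.append(len(store.objects))  # + 1 blob, 1 tree, 1 commit
    write_commit(store, {"README.md": readme}, "Delete index.php", parent=second)
    counts.append(len(store.objects))  # + 1 tree, 1 commit, no new blob
    return counts


if __name__ == "__main__":
    print(demo())
```

In this toy version the first commit produces four objects, just like the four entries we saw in .git/objects. The edit adds only one new blob because README.md hashes to the id it already had, and the delete adds no blob at all: the new tree simply has no entry for index.php.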

Git's data model is in fact a simple one, and in the next post we'll look at branching and how the data model makes branching very cheap and simple.

If you wish to dig deeper into git's data model, I would recommend this lecture from Scott Chacon and also going through the Git Internals chapter of his book Pro Git.