Saturday, 28 February 2015

Git Repository Corruption - Part 1

Git Repo Corruption Recovery

The biggest fear of any administrator is data corruption. As Git administrators we take regular backups, keep our systems highly available, and even replicate disks. In short, we try everything to keep data safe and secure.
Even so, a Git repository can still become corrupted, and in the real world you will have to deal with it.

Here are a few repo corruption scenarios and their solutions.

Note -> Back up everything before touching the repo. The instructions below modify the repository to fix some of these errors, so always take a backup before performing any action.
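As a minimal sketch of that backup step, the script below copies a bare repository aside and checks that the copy is still recognised by Git. The temporary directory, the repo.git stand-in, and the backup naming scheme are all assumptions made for the demo; on a real server you would point it at your own repository path, ideally while no pushes are in flight.

```shell
#!/bin/sh
# Sketch: copy a bare repo aside before attempting any repair.
# All paths are throwaway stand-ins created for this demo.
work=$(mktemp -d)
cd "$work" || exit 1

git init -q --bare repo.git            # stand-in for /repository/repo.git

backup="repo.git.backup-$(date +%Y%m%d-%H%M%S)"
cp -Rp repo.git "$backup"              # -R: recursive, -p: keep modes and timestamps

# Sanity check: the copy is still recognised as a bare repository.
is_bare=$(git -C "$backup" rev-parse --is-bare-repository)
echo "backup at $work/$backup (bare: $is_bare)"
```

A plain file copy is used here because it also preserves corrupt objects as they are, which is what you want if a repair attempt goes wrong and you need to start over.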

First, let's see how to identify what state your repo is in.

Log in to the Git server and change directory to the Git repo, for example:
cd /repository/repo.git
Run the following command to check the repository:
git fsck --no-dangling
This command checks the health of the repository.
If it doesn't report any errors, your repository is fine. If it does, read on for how to fix it.
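If you want to script this check, one possible approach is to rely on the exit status of git fsck, which is non-zero when errors are found. The check_repo function, the log file location, and the demo repository below are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: report repository health based on the exit status of git fsck.
# check_repo is a hypothetical helper; $1 is the repo path, $2 a log file.
check_repo() {
    repo=$1
    log=$2
    if git -C "$repo" fsck --no-dangling >"$log" 2>&1; then
        echo "healthy"
    else
        echo "needs repair"
    fi
}

# Demo on a throwaway repository with a single commit.
work=$(mktemp -d)
git init -q "$work/demo"
echo hello > "$work/demo/readme.txt"
git -C "$work/demo" add readme.txt
git -C "$work/demo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

status=$(check_repo "$work/demo" "$work/fsck.log")
echo "$status"
```

The full fsck output is kept in the log file so that the exact error text is available when you pick one of the fixes below.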

General fix for stale references

git reflog expire --stale-fix --expire-unreachable=now --all
The reflog records the history of changes to references (branch tips and HEAD).
This command prunes reflog entries that point to broken commits or are no longer reachable from the current branch tip.
It fixes the problem if it is related to the reflog.
Try this command first for most of the errors below before going further.

  • Object is missing or corrupt (loose object)

$ git fsck --no-dangling
Checking object directories: 100% (256/256), done.
fatal: loose object 6929a45368536cc02d6bd678900355384ab9ff77 (stored in
./objects/69/29a45368536cc02d6bd678900355384ab9ff77) is corrupt

The generic form of the error is:

fatal: loose object <something> (stored in ./objects/<something>) is corrupt
This is a single corrupt object, which can be recovered from any other copy of the repository. First find a donor repository (any local clone or fork) and check whether it has the object:

git cat-file -t 6929a45368536cc02d6bd678900355384ab9ff77

Note -> Use your own object ID in place of 6929a4.....77 above.

If the repository doesn't have the object in question, you would see the following:

fatal: Not a valid object name 6929a45368536cc02d6bd678900355384ab9ff77

But if it does contain the object, the above command will report what kind of object it is (tag/commit/tree/blob).

Next, export an individual object from the donor like this:

git cat-file commit 6929a45368536cc02d6bd678900355384ab9ff77 > /tmp/object-data-recover.dat
You may need to replace commit in the above command with whatever type of object it is, as reported by the previous command.

Copy this file to the Git server where you are fixing the issue. Assuming you copied it to /tmp, change directory to the repository location and run:

git hash-object -t commit -w /tmp/object-data-recover.dat
Again, you may need to swap commit for whatever the object type was.

After doing that, the object should be restored. Run sudo -u git git fsck --no-dangling again from the repository directory to make sure the corruption has cleared.

Repeat the process for any other missing objects.
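The whole recovery can be rehearsed safely on a throwaway pair of repositories. The sketch below builds a donor, makes a bare "server" copy, damages one loose object on purpose, and restores it with the cat-file / hash-object steps above. The directory names and the demo identity are assumptions. One extra step is included as a precaution: the corrupt file is moved aside before writing, because as far as I know git hash-object -w can skip the write when a file already exists at that object's path.

```shell
#!/bin/sh
# Sketch: damage one loose object on purpose, then restore it from a donor.
work=$(mktemp -d)
cd "$work" || exit 1

# Donor: a healthy repository with one commit (names are made up).
git init -q donor
echo hello > donor/readme.txt
git -C donor add readme.txt
git -C donor -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"
sha=$(git -C donor rev-parse HEAD)

# "Server" copy: a bare clone with its own object files.
git clone -q --bare --no-hardlinks donor server.git

# Simulate corruption by overwriting the loose object with garbage.
obj="server.git/objects/$(printf '%s' "$sha" | cut -c1-2)/$(printf '%s' "$sha" | cut -c3-)"
chmod u+w "$obj"
printf 'garbage' > "$obj"
git -C server.git fsck --no-dangling >/dev/null 2>&1
before=$?

# Recover: export from the donor, move the bad file aside, write it back.
type=$(git -C donor cat-file -t "$sha")
git -C donor cat-file "$type" "$sha" > "$work/object-data-recover.dat"
mv "$obj" "$work/corrupt-object.bak"
restored=$(git -C server.git hash-object -t "$type" -w "$work/object-data-recover.dat")

git -C server.git fsck --no-dangling >/dev/null 2>&1
after=$?
echo "fsck exit before: $before, after: $after"
echo "restored object: $restored"
```

The object ID printed by hash-object should match the ID that fsck complained about; if it does not, the exported data or the object type was wrong.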

  • Unborn Branch / no default reference

If you run fsck on your Git repository and get the following error:

$ git fsck --no-dangling
notice: HEAD points to an unborn branch [BRANCH]
Checking object directories: 100% (256/256), done.
notice: No default references

This means the HEAD file points to a branch that doesn't exist in the repository: there is no refs/heads/[BRANCH] file in the repository, nor an entry for refs/heads/[BRANCH] in the packed-refs file.

To resolve this, look at another copy of the repo to see which commit the branch should point to, then recreate the branch:

$ git update-ref refs/heads/[BRANCH] [SHA]

If the branch is not supposed to exist, update the HEAD reference to point to a valid branch instead:

$ git symbolic-ref HEAD refs/heads/[NEW_BRANCH]


There is one more possible reason for this error.

The repository may simply be empty, with nothing created so far (no branches, no code). This is why Git says there are no default references.

You will usually see this message for the master branch, like below:

notice: HEAD points to an unborn branch (master)

In this case there is no need to worry; just push some code and the message should vanish.
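To see the symbolic-ref fix in action without touching a real server, here is a small sketch. It creates a repository with one commit on master, points HEAD at a branch that does not exist to imitate the broken state, and then repoints HEAD at master. The repository names and the missing-branch name are made up for the demo.

```shell
#!/bin/sh
# Sketch: reproduce an unborn HEAD and repair it with git symbolic-ref.
work=$(mktemp -d)
cd "$work" || exit 1

# Source repository with one commit on master.
git init -q src
git -C src symbolic-ref HEAD refs/heads/master
echo hello > src/readme.txt
git -C src add readme.txt
git -C src -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"
git clone -q --bare src repo.git

# Imitate the problem: HEAD names a branch that does not exist.
git -C repo.git symbolic-ref HEAD refs/heads/missing-branch
if git -C repo.git rev-parse -q --verify HEAD >/dev/null 2>&1; then
    head_before="ok"
else
    head_before="unborn"
fi

# Fix: point HEAD at a branch that does exist.
git -C repo.git symbolic-ref HEAD refs/heads/master
head_after=$(git -C repo.git rev-parse --verify HEAD)
echo "before: $head_before, after: $head_after"
```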

  • Duplicate file entries

$ git fsck --no-dangling
Checking object directories: 100% (256/256), done.
error in tree 30546108324724aedfaa0eee2c07c13457a6cbab: contains duplicate file entries
Checking objects: 100% (33697/33697), done.

This error means that the tree with the given SHA contains identical file entries. This violates the Git object model and shouldn't happen.
It may have been introduced by user error or by a bug in a Git client.

If developers can work without any issue, and push, pull, and other operations don't throw errors, you can ignore it. It is possible that the tree is no longer referenced and you don't need it.

If there are problems working with the repository, you will have to remove the duplicate entry and rewrite history.
I cover the steps for removing a file/entry from the entire history in another blog post.

  • Ref/branch does not point to a valid object

$ git fsck --no-dangling
Checking object directories: 100% (256/256), done.
Checking objects: 100% (4924/4924), done.
error: refs/heads/master does not point to a valid object!

$ git reflog expire --stale-fix --expire-unreachable=now --all
error: refs/heads/master does not point to a valid object!
error: refs/heads/master does not point to a valid object!

The generic form of the error is "error: refs/heads/[BRANCH] does not point to a valid object!"

If you try to push changes to the repo, you may see the following error:

$ git push origin master
Counting objects: 168, done. 
Delta compression using up to 2 threads. 
Compressing objects: 100% (55/55), done. 
Writing objects: 100% (131/131), 24.29 KiB, done. 
Total 131 (delta 55), reused 106 (delta 32) 
remote: error: failed to lock refs/heads/master 
To https://<git_repo_URL>.git 
! [remote rejected] master -> master (failed to lock) 
error: failed to push some refs to '

Note -> I am referring to the master branch; your error could be for any other branch that exists in your repo.

A reference points to a commit object. If you see this error message, the commit object is either missing or corrupt.

Solution 1.
Go to your Git repo on the server and run the following:
git show-ref refs/heads/master
This will give you the SHA of the missing or corrupted object.
Once you have the SHA, follow the steps in the section "Object is missing or corrupt" to recover the object.

Solution 2.

If solution 1 does not help and you cannot get the SHA from show-ref, you can try to get it another way.

In that case you will need to look at the reference files manually. You should find them in the refs directory inside your repository directory. For example, for the repository "repo", you can normally see the SHA of the commit that refs/heads/master points to with this command:

cat /repository/repo.git/refs/heads/master
This will give you the SHA of the object.
Note, however, that this command will not work if the references have been packed. If it fails to show you the reference, use the following command instead:

grep refs/heads/master /repository/repo.git/packed-refs
Once you have the SHA from the above commands, follow the steps in the section "Object is missing or corrupt" to recover the object.
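The two lookups can be folded into one small helper that tries the loose reference file first and falls back to packed-refs. This is only a sketch: lookup_ref_sha is a hypothetical function name, the demo repository is a throwaway, and it assumes the default file-based reference storage where packed-refs lines have the form "<sha> <refname>".

```shell
#!/bin/sh
# Sketch: find the SHA a ref points to without asking Git to resolve it.
# lookup_ref_sha is a hypothetical helper: $1 = git directory, $2 = full ref name.
lookup_ref_sha() {
    repo=$1
    ref=$2
    if [ -f "$repo/$ref" ]; then
        cat "$repo/$ref"                       # loose reference file
    elif [ -f "$repo/packed-refs" ]; then
        # Match the ref name exactly in the second column.
        awk -v r="$ref" '$2 == r { print $1 }' "$repo/packed-refs"
    fi
}

# Demo on a throwaway repository.
work=$(mktemp -d)
cd "$work" || exit 1
git init -q demo
git -C demo symbolic-ref HEAD refs/heads/master
echo hello > demo/readme.txt
git -C demo add readme.txt
git -C demo -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"
expected=$(git -C demo rev-parse HEAD)

loose_sha=$(lookup_ref_sha demo/.git refs/heads/master)
git -C demo pack-refs --all                    # move the ref into packed-refs
packed_sha=$(lookup_ref_sha demo/.git refs/heads/master)
echo "loose:  $loose_sha"
echo "packed: $packed_sha"
```

Matching the whole ref name with awk avoids a pitfall of a plain grep, which would also match longer names such as refs/heads/master-old.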

Solution 3.

You may not be able to get the missing object from any backup or old copy of the repo, because the object was created after the backup was taken. In that case you will have to find the previous SHA value of the [BRANCH] reference. Here is how to get it.

Take the latest good copy of the repo to recover the SHA from. The repository needs to be in bare form, meaning you see entries like "info objects packed-refs refs" at the top level instead of the code.

If your repository is not in bare form, use the following command to get a bare repository:

git clone --bare <your backup local repo> <new location where you want to clone>

Ex -> git clone --bare /opt/repository/repo.git /tmp/repo.git

If your copy is a bare repository (or you created one as above), run the following to get the SHA:

cat repo.git/refs/heads/master
or

grep refs/heads/master repo.git/packed-refs
Either command will give you the SHA.

Verify that this SHA exists in the server repo by running the following command there:

git cat-file -t [SHA]

If the object is available, point the master branch to it on your server:

git update-ref refs/heads/master [SHA]
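Here is a rehearsal of solution 3 on throwaway repositories, as a sketch under a few assumptions: the backup was taken after the first commit, the newest commit object is the one that went missing on the server, and all directory names are made up. For brevity it reads the old SHA from the backup with git rev-parse instead of cat or grep, which works because the backup itself is healthy.

```shell
#!/bin/sh
# Sketch: roll a branch back to the last commit that a backup still knows about.
work=$(mktemp -d)
cd "$work" || exit 1

# make_commit is a hypothetical helper that adds one commit to src.
make_commit() {
    echo "$1" >> src/file.txt
    git -C src add file.txt
    git -C src -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}

git init -q src
git -C src symbolic-ref HEAD refs/heads/master
make_commit first
git clone -q --bare --no-hardlinks src backup.git   # backup taken here
make_commit second
git clone -q --bare --no-hardlinks src server.git   # server has both commits

# Imitate the damage: the newest commit object disappears on the server.
new=$(git -C src rev-parse HEAD)
rm -f "server.git/objects/$(printf '%s' "$new" | cut -c1-2)/$(printf '%s' "$new" | cut -c3-)"
git -C server.git fsck --no-dangling >/dev/null 2>&1
before=$?

# Previous value of the branch, taken from the backup.
old=$(git -C backup.git rev-parse --verify refs/heads/master)

# Only move the branch if the server really has that object.
if [ "$(git -C server.git cat-file -t "$old")" = "commit" ]; then
    git -C server.git update-ref refs/heads/master "$old"
fi
git -C server.git fsck --no-dangling >/dev/null 2>&1
after=$?
echo "fsck exit before: $before, after: $after"
```

Keep in mind that this moves the branch backwards, so whoever pushed the lost commit will need to push it again from their local clone.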