Phly, boy, phly
the weblog and site of Matthew Weier O'Phinney

Wednesday, March 24. 2010

GPG-signing Git Commits

We're working on migrating Zend Framework to Git. One issue we're trying to deal with is ensuring that commits come only from people who have signed our Contributor License Agreement (CLA).

One option presented to us was GPG-signing commit messages. Unfortunately, I was able to find little to no information on the 'net about how this might be done, so I started to experiment with some solutions.

The approach I chose uses git hooks: specifically, the commit-msg hook client-side and the pre-receive hook server-side.

Client-side commit-msg hook

The commit-msg hook receives a single argument, the path to the temporary file containing the commit message. This allows you to inspect or modify the message prior to completing the commit. As with the other hooks that run before a commit is recorded, a non-zero exit status aborts the commit.

My commit-msg hook looks like the following:


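#!/bin/sh
# commit-msg hook: clear-sign the commit message file passed as $1
printf 'GPG Signing message... '
PASSPHRASE=$(git config --get hooks.gpg.passphrase)
if [ -z "$PASSPHRASE" ]; then
    echo "no passphrase found! Set it with git config --add hooks.gpg.passphrase <passphrase>"
    exit 1
fi
# Note: GnuPG 2.x only honors --passphrase together with --batch
# (and, from 2.1 on, loopback pinentry); this was written against 1.4.
# Abort the commit if signing fails, rather than committing unsigned.
if ! gpg --clearsign --yes --passphrase "$PASSPHRASE" -o "$1.asc" "$1"; then
    echo "signing failed"
    exit 1
fi
mv "$1.asc" "$1"
echo "[DONE]"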
 

This hook requires that you first add your GPG key's passphrase to your local git configuration, which can be done as follows:


% git config --add hooks.gpg.passphrase "mySecret"
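Installing the hook itself is a matter of copying the script into the repository's hooks directory under the exact name commit-msg and marking it executable, since git does not run hooks that lack the executable bit. The following is a minimal sketch of that step; the install_hook function and the example paths are hypothetical, and it assumes a normal (non-bare) working copy, where hooks live in .git/hooks (a bare server-side repository keeps them in hooks/ at its top level instead).

```shell
#!/bin/sh
# Copy a hook script into a working copy and make it executable.
#   $1 = path to the hook script
#   $2 = root of the working copy
#   $3 = hook name git expects, e.g. commit-msg
install_hook() {
    hooks_dir="$2/.git/hooks"
    mkdir -p "$hooks_dir" || return 1
    cp "$1" "$hooks_dir/$3" || return 1
    # git will not run a hook that is not executable
    chmod +x "$hooks_dir/$3"
}

# Example (placeholder paths):
#   install_hook ./commit-msg "$HOME/projects/zf" commit-msg
```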
 

Once this hook is in place, all commit messages are then clear-signed, leading to commit logs that look like the following:


commit f921f0defb18f8a5218d5c3346693dbb4179920e
Author: Matthew Weier O'Phinney <somebody@example.com>
Date:   Tue Mar 23 17:18:35 2010 -0400

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1
   
    how now, brown cow
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.9 (GNU/Linux)
   
    iEYEARECAAYFAkupMCsACgkQtUV5aSPtKdqERQCeN5taRATpB4/XJZiP9Vs5FVNY
    PcoAn0OZbIIcn7nC01yxp9tY7HbxVVFu
    =C/Ju
    -----END PGP SIGNATURE-----
 

Server-side pre-receive hook

The pre-receive hook is a lot less straightforward. This hook receives input via STDIN, one line per ref being updated. Each line consists of three items, separated by a single space:

[previous commit's sha1] [new commit's sha1] [ref name]

Typically, only the new sha1 is of much use to us. Internally, git is actually keeping track of the new commit, even though it has not technically been accepted into the repository. This allows us to use tools such as git show to get information on the commit and act on that information.
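To make the input format concrete, here is a small shell sketch that reads lines in that shape and prints the new sha1 of each one. The names new_commits and ZERO are hypothetical. It also skips lines whose new sha1 is all zeros, which is how git reports a ref being deleted, since there is no commit to inspect in that case.

```shell
#!/bin/sh
# The value git sends as the "new" sha1 when a ref is being deleted
ZERO=0000000000000000000000000000000000000000

# Read "<old sha1> <new sha1> <ref name>" lines from stdin and print
# the new sha1 for every ref that is being created or updated.
new_commits() {
    while read -r old new ref; do
        if [ "$new" = "$ZERO" ]; then
            # Ref deletion: nothing to check
            continue
        fi
        echo "$new"
    done
}

# In a pre-receive hook, stdin is already wired up by git:
#   new_commits | while read -r sha; do ...inspect "$sha"...; done
```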

What I needed to do was inspect the commit message for a GPG-signed message; if none was found, reject the commit outright, but if one was present, validate it against my keyring, and abort if the signed message is invalid.

I originally started by using git show --pretty="format:%b" [sha1]. However, I discovered that git does something... odd... to commit messages. The first paragraph of the message (everything up to the first blank line, conventionally 50 characters or so) is considered the commit's "subject" -- and any newlines found in the subject are silently stripped when it is formatted. This meant that I was getting, for my purposes, a truncated message that would never validate (as the GPG signature header was getting stripped); even including the subject in the format did not work, since the newlines within it were missing. The only way I found to get the full commit message was to use git show --pretty=raw [sha1]. This, however, also gives me the commit headers and the diff -- which means I have to parse the response.
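The parsing itself is mechanical: drop the header lines up to the first blank line, stop at the first line that starts with "diff ", and strip the four-space indentation from everything in between. As a rough illustration of that idea only, assuming the raw output has the layout described above, it can be sketched with sed (the raw_message name is hypothetical):

```shell
#!/bin/sh
# Extract the commit message from `git show --pretty=raw <sha1>` output
# read on stdin.
raw_message() {
    # 1,/^$/d    drop the headers, up to and including the first blank line
    # /^diff /q  stop (without printing) when the diff begins
    # s/^    //  remove the four-space indent git adds to message lines
    sed -n -e '1,/^$/d' -e '/^diff /q' -e 's/^    //' -e 'p'
}

# Usage, for a commit id held in $new:
#   git show --pretty=raw "$new" | raw_message > message.asc
```

The check for the diff runs before the indentation is stripped, so an indented message line that happens to begin with the word "diff" is not mistaken for the start of the patch.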

What follows is a PHP implementation that grabs the full output and redirects it to a temporary file, parses that file for the commit message, and then acts on it.


#!/usr/bin/php
<?php
echo "Checking for GPG signature... ";
$fh     = fopen('php://stdin', 'r');
$tmpdir = sys_get_temp_dir();
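while (($line = fgets($fh)) !== false) {
    // Each line is "<old sha1> <new sha1> <ref>"; skip blank lines
    $line = trim($line);
    if ('' === $line) {
        continue;
    }
    list($old, $new, $ref) = explode(' ', $line);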

    // Create a tmp file with the commit log
    $logTmp   = tempnam($tmpdir, 'LOG_');
    shell_exec('git show --pretty=raw ' . escapeshellarg($new) . ' > ' . escapeshellarg($logTmp));

    $msgTmp   = tempnam($tmpdir, 'MESSAGE_');

    // Scan the commit log for a commit message
    $log = fopen($logTmp, 'r');
    $msg = fopen($msgTmp, 'a');
    $signatureDetected = false;
    while (($line = fgets($log)) !== false) {
        if (preg_match('/^(commit(ter)?|tree|parent|author)\s/', $line)) {
            // Skip the commit log headers
            continue;
        }
        if (preg_match('/^diff\s/', $line)) {
            // Stop scanning when we reach the diff
            break;
        }
        if (preg_match('/^\s+-+BEGIN [A-Z]+ SIGNED MESSAGE/', $line)) {
            // We have a signed message, so start appending it
            // to a separate tmp file
            $signatureDetected = true;
            $line = preg_replace('/^\s+/', '', $line);
            fwrite($msg, $line);
            continue;
        }
        if ($signatureDetected) {
            // If we have detected a signed message, continue appending lines to
            // it. Commit message lines are indented, so strip indentation.
            $line = preg_replace('/^\s+/', '', $line);
            if ('' === $line) {
                $line = "\n";
            }
            fwrite($msg, $line);
        }
    }
    fclose($log);
    fclose($msg);

    if (!$signatureDetected) {
        // No signed message detected; report and abort
        unlink($logTmp);
        unlink($msgTmp);
        echo "no GPG signature detected; commit aborted\n";
        exit(1);
    }

    $verification = shell_exec('gpg --verify ' . escapeshellarg($msgTmp) . ' 2>&1');
    if (!preg_match('/Good signature/s', $verification)) {
        // Failed to verify signed message; report and abort
        unlink($logTmp);
        unlink($msgTmp);
        echo "invalid GPG signature; commit aborted\n";
        exit(1);
    }

    unlink($logTmp);
    unlink($msgTmp);
}
echo "verified!\n";
exit(0);
 

There are likely more elegant ways to accomplish this, including solutions in other languages. However, it works quite well.
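As one possible alternative, here is a minimal shell sketch of the same check. It relies on git cat-file commit, which prints the commit headers, a blank line, and then the message without indentation or a diff, so far less parsing is needed. The function names commit_message and verify_commit are hypothetical, and the sketch trusts gpg's exit status (zero only when the signature verifies) rather than matching its output text. Like the PHP version, it only looks at the commit it is given.

```shell
#!/bin/sh
# Print only the message part of `git cat-file commit` output (stdin):
# everything after the first blank line.
commit_message() {
    sed '1,/^$/d'
}

# Return 0 if the message of commit $1 is clear-signed and gpg accepts
# the signature against the current keyring; otherwise report and return 1.
verify_commit() {
    msg=$(mktemp) || return 1
    git cat-file commit "$1" | commit_message > "$msg"
    if ! grep -q '^-----BEGIN PGP SIGNED MESSAGE-----' "$msg"; then
        rm -f "$msg"
        echo "no GPG signature detected in $1"
        return 1
    fi
    if gpg --verify "$msg" >/dev/null 2>&1; then
        rm -f "$msg"
        return 0
    fi
    rm -f "$msg"
    echo "invalid GPG signature in $1"
    return 1
}

# In a pre-receive hook the main loop would then be roughly:
#   while read -r old new ref; do
#       verify_commit "$new" || exit 1
#   done
```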

Conclusions

Git hooks are quite powerful, and delving into them has given me confidence that I can create some nice automation for the ZF git repository when we are ready to open it to the public.

That said, I don't know if we'll actually use commit signing such as this, as it has a few drawbacks:

  • The commit signing is not really cross-platform. This can likely be remedied, but it would require that people on different operating systems and using different tools (such as EGit, TortoiseGit, etc.) develop and provide signing mechanisms for the client side.
  • It introduces complexity for those developing patches. If developers begin without having the commit-msg hook in place, they then have to create a new branch and a squashed commit afterwards in order to ensure the final patches can go into the canonical repository.
  • The two reasons above kind of defeat the purpose of moving to a Distributed VCS in the first place -- which is to simplify development and make it more democratic.
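The squash workflow mentioned in the second point might look something like the sketch below, assuming the commit-msg hook has since been installed so that the final git commit gets signed. The function name squash_for_signing and the "-signed" branch suffix are hypothetical; the git commands are the standard ones for branching, squash-merging, and committing.

```shell
#!/bin/sh
# Re-create the work from an unsigned topic branch as one new commit.
#   $1 = topic branch containing the unsigned commits
#   $2 = branch the patch should be based on, e.g. master
squash_for_signing() {
    topic=$1
    base=$2
    # Start a fresh branch from the base
    git checkout -b "$topic-signed" "$base" || return 1
    # Stage the combined changes from the topic branch without committing
    git merge --squash "$topic" || return 1
    # Commit once; with the hook in place, this message gets signed
    git commit
}

# Example:
#   squash_for_signing bugfix master
```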

Regardless of whether or not we decide to use this technique, when researching the issue, I saw plenty of posts from people who wanted to implement commit signing but were not sure how to accomplish it. Perhaps this post will serve as a starting point for many.

Posted by Matthew Weier O'Phinney in Linux, PHP at 12:26 | Comments (15) | Trackbacks (0)
Defined tags for this entry: git, linux, php

Comments

That's exactly what I was looking for!
Thanks for sharing!
#1 Dennis Winter on 2010-03-24 12:40
It seems the commit messages are pretty ugly... However, the only other solution I can think of is pulling only from people with CLAs. The code itself can propagate between different repositories, but that's rarer. The development process now is: if you have commit access, commit to the Svn repository; otherwise, someone with commit access will commit for you. With Git it could be: commit to your public repository on github or wherever you want; ask someone with access to the official repo to pull from you; he checks that you are on the list, and executes the pull.
#2 Giorgio Sironi on 2010-03-24 15:10
I was looking for a more automated solution, so that contributors could simply drop in the hook, and then when they push the code to the canonical repository, they could then be notified if specific changesets came from unknown sources.

The more I look at it, though, the more I don't like it -- and I state a number of the reasons against it in the post.
#2.1 Matthew Weier O'Phinney on 2010-03-24 15:18
As this only really verifies the commit message, and not the commit itself, couldn't that be exploited by someone without credentials who copies an existing signed message?

Would it be possible to get the commit revision hash and sign that along with the message? Perhaps by a post-commit hook that signs the last commit once the hash is known?

I'm not keen on storing passphrases in plaintext configs, but I guess the commit hook could also work with interactive prompts for the passphrase.
#3 Andy Thompson on 2010-03-24 15:14
Or alternatively, a post-commit hook that automatically creates a tag using git's tag signing ability.
#3.1 Andy Thompson on 2010-03-24 15:17
I'm not terribly keen on getting a tag per commit, to be honest. ;-) This was one possibility somebody raised, and it was shot down for the same reason I state. Tags become meaningless if we do this.
#3.1.1 Matthew Weier O'Phinney on 2010-03-24 15:19
True, it could be spoofed - but so can an author's name and email (and that's way easier, and how most manual checks are done).

The signing could definitely be moved to post-commit so that the sha1 of the commit is known -- just requires a bit more work to implement (similar to what I did with the pre-receive hook). As I noted in the post -- I know this is imperfect. :-)

As for the passphrase issue -- I'm not quite sure how to get around it. My original solution didn't include it, and it wasn't until I'd had a few unsigned messages that I realized that my GPG session had expired, and that gpg was not prompting me for the passphrase. Potentially, you might be able to have the hook script prompt for the passphrase, and then pass the captured value to gpg, but I'd have to experiment to see if that actually works.
#3.2 Matthew Weier O'Phinney on 2010-03-24 15:24
Are you planning on using GitHub exclusively for the "blessed" repo? If so, you could add users who've signed the CLA to the collaborators of that repository, which should give them push access to the repository as well.

Or you could go old-school and set up a server and do ssh-auth, just like you have added users manually (I presume) to get SVN write access?

A more elaborate system could quite easily be set up with something like gitolite (http://progit.org/book/ch4-8.html, http://github.com/sitaramc/gitolite), which even allows access/deny rules for certain branches and even paths within the repository.

Not entirely sure what legal stuff you'd need covered regarding the CLA, but it seems you should be able to set up a system without much effort, just as you had with SVN (again, I have no clue how you've managed it until now..)
#4 David Reuss on 2010-03-24 15:41
We're going to be using gitosis on our own servers, which, as you point out, lets us push SSH keys for committers to the server.

That's not the problem. That stuff is easy. It's pull requests that introduce the issue we're trying to deal with.

Imagine this situation:

* User "Alice" clones the canonical repo so they can work on a bugfix. They branch locally, and then push their local branch to a branch on a public repository somewhere.
* User "Alice" does not have direct commit access to the canonical repository, so they contact a committer, "Bob". "Bob" adds a remote in his working copy pointing to Alice's remote; after review of the changes, Bob merges the branch to their development branch.
* Later, Bob pushes his development branch to the canonical repository.

The question that arises is: how do we know that Alice has signed a CLA? How does Bob know that Alice has signed a CLA?

Signed commits help answer this in that they can help verify identity (though, as noted in another comment, we'd have to do the commit signing slightly differently to ensure they aren't spoofed). If the author doesn't sign the commit, or signs with a key that's not in our keyring, we reject the commit.

That said, I don't like the approach, as it introduces new barriers to committing (users need to have the hook in place on any commit they do; it introduces another tool developers must be familiar with; etc.).

I'm looking now at some tools we might use to help committers validate identity prior to accepting pull requests, and also looking into using the "--signoff" flag as a rudimentary measure for establishing trust -- this would make the committer responsible for verifying the source, but also allow them to get verbal and/or written agreement from somebody to allow pushing changes upstream (which is sufficient in most cases). It also simplifies the process, as no new tools are necessary.
#4.1 Matthew Weier O'Phinney on 2010-03-24 15:58
Instead of having your own git repo, why not use github, which makes it easy for others to work?

I don't know much about the need for a CLA. Do other open source projects like Linux need developers to sign a CLA? Paperwork like this makes it difficult for someone to contribute to open source projects.
#5 Santhosh on 2010-03-25 04:47
The good thing with git is that a fork is identical to the original. They can have a privately hosted one, and a read-only GitHub project for GitHub users to fork.

Changes to any fork can then still be pulled directly to their private one.

No doubt they're already planning that.
#5.1 Andy Thompson on 2010-03-25 04:59
Precisely. I will be hosting a remote on github, and have my working repository point remotes at both github and the canonical host.

@Santhosh: hosting ourselves allows us to define our own hooks without needing to worry about the network latency of web hooks as used by github. Additionally, it makes it easier for us to provide tooling integration on our own site (such as a repository browser, integration with the issue tracker, etc.).
#5.1.1 Matthew Weier O'Phinney on 2010-03-25 06:52
The CLA is an important guarantee to our end users, and also helps protect the integrity of the project. In an ideal world, it would not be necessary, but as long as there are intellectual property laws on file, it makes sense. The "paperwork" involved is simply reading through the CLA, signing it, and sending it in -- which can be done by scanning it and emailing it, faxing, or via postal service.
#5.2 Matthew Weier O'Phinney on 2010-03-25 06:54
Great news, DVCS is much better, but why not give HG a shot? Thanks for the great effort!
#6 Mina R Waheeb on 2010-03-25 06:52
Mainly because not many in our community have expressed an interest in it, but we have had many, many requests to switch to git. The tooling support for git has matured greatly in the past year, and the developer community, and particularly the PHP community, has become quite familiar with git in that time frame as well.

We will be providing a read-only svn mirror of our repository, and I fully expect community contributors will likely do so with other DVCS systems as well.
#6.1 Matthew Weier O'Phinney on 2010-03-25 06:57

© 2004 - present, Matthew Weier O'Phinney
matthew-web <at> weierophinney.net