Serving ruby gems, the paranoid way
As I wrote in a previous blog post, there are good reasons to be paranoid with Ruby gems: they may have been hacked and “enhanced” with malicious code. It would be great if we could check every gem that we want to install, along with its dependencies. You may think “this is not practical at all”, and you are probably right. But still, I wanted to give this idea a try and learn about the challenges that people will face if they want to review their gems before installation.
Let’s consider a company whose business is about making web applications. The tech team is divided in two:
- a development team that writes the company software, leveraging Ruby gems
- a security team that focuses on security issues
The security team is in charge of reviewing all the gems needed to run the company applications. This policy could bring a lot of tension between the two teams, so I hope that members of both teams enjoy having coffee breaks together.
A check point
The security team wants to ensure that all the gem dependencies used by the company software are safe. So they set up a kind of check point: a process in which all gems needed by the development team are reviewed and the unsafe ones filtered out.
Since the company does not trust the
rubygems.org source anymore,
the development team is not allowed to download gems directly from there.
If they need to do so, they have to use a sandbox environment, such as a self-contained virtual machine.
Once they have been successfully reviewed, the gems will be available for both development and production environments. Over time, the development team will have access to more and more safe gems to work with.
The workflow to validate a new gem could look like this:
- the development team thinks it needs new gems that have not been reviewed yet
- the sandbox environment is used to experiment with those gems
- the exact names and versions of each gem to review are identified
- the gems and all their dependencies are reviewed by the security team
- if deemed safe, the gems are made available on the internal gem server
In other words, the security team acts as a middleman for rubygems.org.
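The check point boils down to an allowlist of exact gem names and versions. Here is a minimal sketch of that idea, assuming a hypothetical `approved` table and a hypothetical `needs_review` helper; neither comes from rubygems or bundler.

```ruby
require "rubygems"

# Returns the [name, version] pairs that are not on the allowlist yet.
# `approved` maps a gem name to the versions already deemed safe.
def needs_review(requested, approved)
  requested.reject do |name, version|
    approved.fetch(name, []).include?(Gem::Version.new(version))
  end
end

# Hypothetical allowlist maintained by the security team.
approved = { "json" => [Gem::Version.new("1.8.0")] }

# Gems the development team would like to use.
requested = [["json", "1.8.0"], ["sinatra", "1.4.3"]]

pending = needs_review(requested, approved)
pending.each { |name, version| puts "to review: #{name} #{version}" }
```

Matching on the exact version is deliberate: a review of one release says nothing about the next one.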
The development team should be able to do its job the normal way,
using bundler and the
gem command-line tool, with minimal annoyances.
The built-in gem server
To serve Ruby gems,
the security team first considers using the
gem server command.
It comes with
rubygems itself so there’s nothing special to install.
The gem server command serves all the gems installed on the machine,
so it’s very easy to add a new gem and its dependencies with gem install.
The gem server runs on
webrick and can only serve one client at a time.
It does no caching and is not compatible with Rack.
So it will not run under more powerful application servers such as puma.
Fortunately, most rubygems clients cache the data on their side.
Let’s grab a Debian-compatible server and try to run the gem server under its own user account:

    $ sudo adduser --home /var/lib/gem-server gem-server
    $ sudo su - gem-server
    $ export GEM_HOME=$HOME/gems
    $ export GEM_PATH=$HOME/gems
    $ gem fetch json -v 1.8.0
    # security team checks json 1.8.0 gem
    $ gem install json -v 1.8.0
    $ gem server
    Server started at http://0.0.0.0:8808

Note that I had to review and install a first gem before running the server, otherwise it would have complained about missing directories.
From here, many things could be improved:
- add an SSL-enabled proxy with nginx; it’s all about security, after all
- run the gem server from the user’s crontab, init scripts or a supervisor tool
- set up environment variables in the user’s profile
Even without these steps, the gem server works out of the box.
Each member of the development team must set up their rubygems client so that gems are only fetched from the company’s private gem server. This can be done using the gem sources command:

    $ gem sources --add http://checkpoint:8808
    http://checkpoint:8808 added to sources
    $ gem sources --remove http://rubygems.org/
    http://rubygems.org/ removed from sources
    $ gem sources --list
    http://checkpoint:8808

All these settings are stored in the
.gemrc configuration file,
making it very easy to share them with team mates.
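To give an idea of what ends up in that file, here is a small sketch that produces an equivalent YAML document from Ruby. It assumes the sources are stored under a `:sources` key, which is how my own .gemrc looks; the snippet only prints the result and does not touch any real configuration file.

```ruby
require "yaml"

# Settings equivalent to the `gem sources` commands above:
# the check point is the only remote source left.
settings = { sources: ["http://checkpoint:8808"] }

# Serialize to YAML, the format used by the .gemrc file.
gemrc = settings.to_yaml
puts gemrc
```

Sharing the setup with a team mate then amounts to handing over this small file.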
Everything seems OK, but remember that most rubygems clients maintain a cache of downloaded gems, and this cache may already contain plenty of code that we do not trust. So it’s best to move to a new setup.

    # store new gems into user's HOME directory
    $ export GEM_HOME=$HOME/gems
    # fetch new gems from there only
    $ export GEM_PATH=$HOME/gems
    # cleanup
    $ rm -rf $GEM_HOME
    $ mkdir $GEM_HOME

Now we are ready to go: no local gem, and the only remote gem is a trusted one.

    $ gem list --local

    *** LOCAL GEMS ***

    $ gem list --remote

    *** REMOTE GEMS ***

    json (1.8.0)

Switching to the user account running the gem server, we can check that our rubygem client has been talking to our internal gem server, as expected:

    $ gem server
    Server started at http://0.0.0.0:8808
    localhost - - [02/Sep/2013:09:51:40 CEST] "GET /latest_specs.4.8.gz HTTP/1.1" 200 75
    - -> /latest_specs.4.8.gz
    localhost - - [02/Sep/2013:09:58:11 CEST] "GET /latest_specs.4.8.gz HTTP/1.1" 200 75
    - -> /latest_specs.4.8.gz

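The latest_specs.4.8.gz file requested in the log is, as far as I understand it, a gzipped Marshal dump of [name, version, platform] tuples. The sketch below builds such an index in memory and reads it back, so no server is needed; the single json entry is only sample data.

```ruby
require "rubygems"
require "zlib"

# Server side: an array of [name, version, platform] tuples,
# marshalled and then gzipped.
tuples = [["json", Gem::Version.new("1.8.0"), "ruby"]]
payload = Zlib.gzip(Marshal.dump(tuples))

# Client side: gunzip and unmarshal. Marshal.load should only be
# used on data coming from a server you trust.
index = Marshal.load(Zlib.gunzip(payload))
index.each do |name, version, platform|
  puts "#{name} (#{version}) #{platform}"
end
```

This is also a reminder that the index itself is executable-ish data: one more reason to serve it from a machine the security team controls.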
Add trusted gems
We are now done with the setup.
Now, the development team wants to build a new web application using sinatra version 1.4.3.
The security team should fetch, unpack and review the gem.
If deemed safe, the gem can then be installed and shared through the gem server:

    # fetch the gem archive
    $ gem fetch sinatra -v 1.4.3
    Fetching: sinatra-1.4.3.gem (100%)
    Downloaded sinatra-1.4.3

    # extract it
    $ gem unpack sinatra-1.4.3.gem
    Unpacked gem: '/var/lib/gem-server/sinatra-1.4.3'

    # make extensive review
    $ vim sinatra-1.4.3/sinatra.gemspec sinatra-1.4.3/Rakefile

    # install to share with others... but wait!
    $ gem install sinatra -v 1.4.3^C

But something is wrong:
gem install will install sinatra along with its dependencies,
yet we have checked none of those!
Unfortunately, the gem fetch command does not fetch the dependencies,
so I have written a small rubygems plugin that adds a gem deep_fetch command.
gem deep_fetch will ignore the dependencies that are already in your cache.
With gem deep_fetch, the security team goes back to hard work.
They can ignore the packages that are already in the cache
as they have already been checked.
They just want to fetch and review the missing ones.

    # fetch the gem and its missing dependencies
    $ gem deep_fetch sinatra --version 1.4.3
    Fetching: sinatra-1.4.3.gem (100%)
    Downloaded sinatra-1.4.3
    Fetching: rack-1.5.2.gem (100%)
    Downloaded rack-1.5.2
    Fetching: rack-protection-1.5.0.gem (100%)
    Downloaded rack-protection-1.5.0

    # unpack everything
    $ gem unpack *gem
    Unpacked gem: '/home/fabien/tmp/rack-1.5.2'
    Unpacked gem: '/home/fabien/tmp/rack-protection-1.5.0'
    Unpacked gem: '/home/fabien/tmp/sinatra-1.4.3'

    # review everything
    $ vim */*gemspec

    # install everything
    $ gem install sinatra -v 1.4.3
    Fetching: rack-1.5.2.gem (100%)
    Fetching: tilt-1.4.1.gem (100%)
    Fetching: rack-protection-1.5.0.gem (100%)
    Successfully installed rack-1.5.2
    Successfully installed tilt-1.4.1
    Successfully installed rack-protection-1.5.0
    Successfully installed sinatra-1.4.3
    4 gems installed

    # share
    $ gem server
    Server started at http://0.0.0.0:8808

sinatra, rack, rack-protection and tilt are all small gems,
so the security team was able to review all of them within a reasonable time.
That would be different for more complex gems.
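The idea behind a deep fetch can be sketched as a walk over the dependency graph that skips whatever is already in the cache. This is not the plugin's actual code: the `missing_gems` helper and the `deps` table are hypothetical, and version constraints are ignored to keep the example short.

```ruby
require "set"

# Returns the gem and its transitive dependencies that are not in
# the cache yet, in the order they are discovered.
def missing_gems(name, deps, cached, seen = Set.new)
  return [] if seen.include?(name)
  seen << name
  missing = cached.include?(name) ? [] : [name]
  deps.fetch(name, []).each do |dep|
    missing.concat(missing_gems(dep, deps, cached, seen))
  end
  missing
end

# Illustrative dependency table: gem name => runtime dependencies.
deps = {
  "sinatra" => ["rack", "rack-protection", "tilt"],
  "rack-protection" => ["rack"],
  "rack" => [],
  "tilt" => []
}

# Pretend tilt has already been reviewed and cached.
cached = Set["tilt"]

to_review = missing_gems("sinatra", deps, cached)
puts to_review.join(", ")
```

The `seen` set keeps a gem that is required twice (rack here) from being listed twice, and protects against dependency cycles.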
Playing with bundler
So far, we have only played with the rubygems client;
however, the development team is more likely to use bundler.
They have updated the
Gemfile for this new sinatra-based web application they are working on:

    source "http://checkpoint:8808"
    gem "sinatra"

Bundler is very good at caching,
so to avoid cache effects every developer was asked to clean up his/her
.bundler directory beforehand.

    $ bundle config path ~/.bundler
    Settings for `path` in order of priority. The top value will be used
      Set for the current user (/home/fabien/.bundle/config): "/home/fabien/.bundler"
    $ rm -rf /home/fabien/.bundler

Now we’re clean. Let’s run
bundle twice to do some benchmarking.

    $ time bundle
    Fetching gem metadata from http://checkpoint:8808/.
    Fetching full source index from http://checkpoint:8808/
    Installing rack (1.5.2)
    Installing rack-protection (1.5.0)
    Installing tilt (1.4.1)
    Installing sinatra (1.4.3)
    Using bundler (1.1.5)
    Your bundle is complete! It was installed into /home/fabien/.bundler

    real 0m1.000s
    user 0m0.688s
    sys 0m0.084s

    $ time bundle
    ...
    real 0m0.481s
    user 0m0.448s
    sys 0m0.028s

The logs show some interesting things on the server side.

    $ gem server
    Server started at http://0.0.0.0:8808
    localhost - - [02/Sep/2013:11:00:48 CEST] "GET /api/v1/dependencies?gems=rack,rack-protection,tilt,sinatra HTTP/1.1" 404 289
    - -> /api/v1/dependencies?gems=rack,rack-protection,tilt,sinatra
    localhost - - [02/Sep/2013:11:00:48 CEST] "GET /specs.4.8.gz HTTP/1.1" 200 157
    - -> /specs.4.8.gz
    localhost - - [02/Sep/2013:11:00:48 CEST] "GET /prerelease_specs.4.8.gz HTTP/1.1" 404 293
    - -> /prerelease_specs.4.8.gz
    localhost - - [02/Sep/2013:11:00:48 CEST] "GET /quick/Marshal.4.8/rack-1.5.2.gemspec.rz HTTP/1.1" 200 554
    - -> /quick/Marshal.4.8/rack-1.5.2.gemspec.rz
    localhost - - [02/Sep/2013:11:00:48 CEST] "GET /quick/Marshal.4.8/rack-protection-1.5.0.gemspec.rz HTTP/1.1" 200 764
    - -> /quick/Marshal.4.8/rack-protection-1.5.0.gemspec.rz
    localhost - - [02/Sep/2013:11:00:48 CEST] "GET /quick/Marshal.4.8/sinatra-1.4.3.gemspec.rz HTTP/1.1" 200 493
    - -> /quick/Marshal.4.8/sinatra-1.4.3.gemspec.rz
    localhost - - [02/Sep/2013:11:00:48 CEST] "GET /quick/Marshal.4.8/tilt-1.4.1.gemspec.rz HTTP/1.1" 200 615
    - -> /quick/Marshal.4.8/tilt-1.4.1.gemspec.rz
    localhost - - [02/Sep/2013:11:00:49 CEST] "GET /gems/rack-1.5.2.gem HTTP/1.1" 200 216576
    - -> /gems/rack-1.5.2.gem
    localhost - - [02/Sep/2013:11:00:49 CEST] "GET /gems/rack-protection-1.5.0.gem HTTP/1.1" 200 15872
    - -> /gems/rack-protection-1.5.0.gem
    localhost - - [02/Sep/2013:11:00:49 CEST] "GET /gems/tilt-1.4.1.gem HTTP/1.1" 200 42496
    - -> /gems/tilt-1.4.1.gem
    localhost - - [02/Sep/2013:11:00:49 CEST] "GET /gems/sinatra-1.4.3.gem HTTP/1.1" 200 333312
    - -> /gems/sinatra-1.4.3.gem

Bundler first tries to query the dependency API but it is unsuccessful
since the feature is not available in the standard gem server.
As a consequence,
bundler falls back to retrieving the full index to resolve the gem dependencies on the client side.
By the way, the Rubygems 2 client also knows about this new dependency API, but Rubygems 1.8 does not.
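The fallback can be pictured as follows. This is only a sketch of the behaviour visible in the log, not bundler's actual code: `dependency_api_uri` and `resolution_strategy` are hypothetical helpers, and `fetch_status` is a stand-in for a real HTTP request.

```ruby
require "uri"

# Builds the dependency API URL seen in the server log.
def dependency_api_uri(source, gem_names)
  URI.join(source, "/api/v1/dependencies?gems=#{gem_names.join(',')}")
end

# Picks a resolution strategy from the HTTP status returned by the
# dependency API. `fetch_status` is any callable that takes a URI
# and returns a status code.
def resolution_strategy(source, gem_names, fetch_status)
  status = fetch_status.call(dependency_api_uri(source, gem_names))
  status == 200 ? :dependency_api : :full_index
end

names = %w[rack rack-protection tilt sinatra]
uri = dependency_api_uri("http://checkpoint:8808", names)
puts uri

# Stub that answers 404, like the built-in gem server in the log.
not_found = ->(_uri) { 404 }
strategy = resolution_strategy("http://checkpoint:8808", names, not_found)
puts strategy
```

With the stub answering 404, the sketch ends up on the full index path, which is what the log shows right after the failed API call.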
We also notice that the server was not queried on our second run.
That means that bundler is smart enough to cache the dependency resolution.
No network connection is required when nothing has changed in the bundle. Very nice.
gem server can work with bundler, but it will quickly hit its limits as the security team adds more gems to the trusted gems database.
Do you remember how slow bundler felt before version 1.1? You got it.
Better gem serving with geminabox
geminabox makes it very easy to serve your own gems. It can be installed as a gem and has two main features:
- a sinatra-based web application to host your gems
- a plugin to add a new command to the gem command-line tool
Once geminabox is installed, Rubygems is enhanced with a new
gem inabox command.
It takes *.gem arguments and behaves like the
gem push command
(that publishes to the official rubygems.org repository).
The geminabox gem server is more efficient than gem server
because it implements the dependency API.
It is also compatible with Rack so it’s possible to run it using a modern web server
(which can serve a lot faster than webrick).
Rack compatibility makes it very easy to add SSL protection and HTTP authentication using middleware.
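To illustrate what such a middleware does, here is a minimal sketch of HTTP Basic authentication written against the plain Rack calling convention (an object that responds to call(env) and returns status, headers and body). The `with_basic_auth` helper and the credentials are made up for the example; in a real setup I would rather rely on the Rack::Auth::Basic middleware that ships with Rack.

```ruby
# Wraps a Rack-style app and rejects requests that lack the
# expected HTTP Basic credentials.
def with_basic_auth(app, username, password)
  expected = "Basic " + ["#{username}:#{password}"].pack("m0")
  lambda do |env|
    # Real code should use a constant-time comparison here.
    if env["HTTP_AUTHORIZATION"] == expected
      app.call(env)
    else
      [401, { "www-authenticate" => 'Basic realm="gems"' }, ["Unauthorized\n"]]
    end
  end
end

# Stand-in for the gem server application.
app = ->(_env) { [200, { "content-type" => "text/plain" }, ["gems\n"]] }
protected_app = with_basic_auth(app, "dev", "secret")

denied = protected_app.call({})
header = "Basic " + ["dev:secret"].pack("m0")
granted = protected_app.call({ "HTTP_AUTHORIZATION" => header })

puts denied.first
puts granted.first
```

Basic authentication sends the credentials in a reversible encoding, so it only makes sense behind SSL.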
The security team is running the
geminabox server under a dedicated user, using puma as application server:

    $ sudo adduser --home /var/lib/geminabox geminabox
    $ su geminabox
    $ mkdir /var/lib/geminabox/data
    $ puma --port 8808 config.ru
    Puma starting in single mode...
    * Version 2.5.1, codename: Astronaut Shoelaces
    * Min threads: 0, max threads: 16
    * Environment: development
    * Listening on tcp://0.0.0.0:8808
    Use Ctrl-C to stop

Here is a basic Rack config for the latest stable version:

    require "rubygems"
    require "rubygems/user_interaction"
    require "geminabox"

    Geminabox.data = "/var/lib/geminabox/data"
    run Geminabox

Using geminabox rather than the standard
gem server won’t break anything on the client side. However, it may feel faster with bundler and rubygems 2 clients
due to its support of the bundler dependency API.
The performance gain is noticeable, even if it is sometimes difficult to measure on small gem sets. Starting from an empty bundler cache, using geminabox on the server side decreases our installation time from 1000 ms to just under 800 ms.

    $ time bundle
    Fetching gem metadata from http://checkpoint:8808/..
    ...
    real 0m0.773s
    user 0m0.684s
    sys 0m0.064s

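For the record, here is the arithmetic behind that comparison, using the two wall-clock times reported by `time bundle` in this post (1.000 s with gem server, 0.773 s with geminabox). These are single runs on a tiny gem set, so take the result as a rough indication only.

```ruby
# Wall-clock times reported by `time bundle`, in seconds.
gem_server_time = 1.000
geminabox_time = 0.773

# Absolute and relative gain.
saved_ms = ((gem_server_time - geminabox_time) * 1000).round
saved_percent = ((gem_server_time - geminabox_time) / gem_server_time * 100).round(1)

puts "saved #{saved_ms} ms (#{saved_percent}%)"
```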
The security team now has to publish the approved gems using
gem inabox followed by the package filename.
Publishing does not pull in dependencies automatically,
so they could really use some kind of “deep fetch”,
like the gem deep_fetch command:

    $ gem deep_fetch sinatra --version 1.4.3
    ...
    # unpack and review new gems
    $ gem inabox --host http://checkpoint:8808/ *gem
    ...
    Gem tilt-1.4.1.gem received and indexed.

Geminabox also provides an administration web interface, so that the security team can unpublish the gems they don’t need or trust anymore.
The development team also gains a server to publish their own private gems. After all, this is what geminabox has been designed for.
How about a proxy?
I’ve experimented with
gem server and
geminabox to implement our check point.
Along the way, it gave me a better understanding of
the relationship between a gem server and its clients (i.e. rubygems and bundler)
and was a good reminder of the dependency API introduced with bundler 1.1.
Using similar techniques, it’s also possible to set up a proxy for rubygems.org, and even work off-line.
The gem mirror is a rubygems plugin that aims to do so.
But so far there is no open source project to set up an intelligent proxy in front of rubygems.org
that could anticipate the upcoming needs of the clients.
geminabox may evolve to become such a cache, as mentioned in a recent forum discussion, but this is just a guess.
We’re still missing something as smart as