Yum repository snapshots made easy with pakrat

A while ago I wrote a hack bash script that would use curl to download RPM artifacts from a remote source. I used it to create my own local YUM repositories in an interesting way.

Normally a YUM repository is just two things: some place that contains a packages repository, and some other place that contains the YUM metadata associated with it. That's fine if the client pointing at the repository is a single, individual unit. It gets a little wonky though, when you point say, 100 servers at the same repo. You never know at what point in time the repo will be updated by its upstream maintainer. You might also have some mission-critical systems that you'd want to park on a known-good snapshot of a YUM repository until you've had time to test changes coming in from upstream.

How do you take a snapshot of a YUM repository at any point in time? At first it sounds easy. "I'll just use reposync and save the resulting dir.". However, the data set will grow rapidly because you are storing all packages each time you make a new snapshot. "Well then, I'll just use reposync in a common directory and make symlinks back from some versioned directory to the common package location." You're getting closer. However, the resulting YUM metadata will not accurately represent what the repository looked like when the snapshot was taken - it will instead represent one huge repository containing everything since you started mirroring the YUM repository.

I spent a few weekends coming up with a generic solution to this problem. pakrat is a tool for mirroring and optionally versioning YUM repositories. The trick to it is using the YUM metadata as the versioning. It will store all downloaded packages for a particular repository in its own common packages directory, and during each run it will create a new set of YUM metadata for only the packages which were present in the repository at the time the snapshot was taken.

I've recently released pakrat under an MIT license. Pakrat is available on PyPI, so you can check it out using easy_install pakrat.


Pakrat

A command-line tool to mirror YUM repositories with versioning

What does it do?

  • You invoke pakrat and pass it some repository baseurl's or the path to some YUM configurations
  • Pakrat mirrors the YUM repositories, and optionally arranges the data in a versioned manner.

Installation

Pakrat is available in PyPI as pakrat. That means you can install it with easy_install:

# easy_install pakrat

NOTE Installation from PyPI should work on any Linux. However, since Pakrat depends on YUM and Createrepo, which are not available in PyPI, these dependencies will not be detected as missing. The easiest install path is to install on some kind of RHEL like so:

# yum -y install createrepo
# easy_install pakrat

How to use it

Pakrat is mainly a command-line driven tool. The simplest possible example would involve mirroring a YUM repository in a very basic way:

$ pakrat --name centos --url http://mirror.centos.org/centos/6/os/x86_64
$ tree -d centos
centos/
+-- Packages
+-- repodata

A slightly more complex example would be to version the same repository. To do this, you must pass in a version number. An easy example is to mirror a repository daily.

$ pakrat \
    --repoversion $(date +%Y-%m-%d) \
    --name centos \
    --url http://mirror.centos.org/centos/6/os/x86_64
$ tree -d centos
centos/
+-- 2013-07-29
|   +-- Packages -> ../Packages
|   +-- repodata
+-- latest -> 2013-07-29
+-- Packages

If you were to configure the above to command to run on a daily schedule, eventually you would see something like:

$ tree -d centos
centos/
+-- 2013-07-29
|   +-- Packages -> ../Packages
|   +-- repodata
+-- 2013-07-30
|   +-- Packages -> ../Packages
|   +-- repodata
+-- 2013-07-31
|   +-- Packages -> ../Packages
|   +-- repodata
+-- latest -> 2013-07-31
+-- Packages

Pakrat is also capable of handling multiple YUM repositories in the same mirror run. If multiple repositories are specified, each repository will get its own download thread. This is handy if you are syncing from a mirror that is not particularly quick. The other repositories do not need to wait on it to finish.

$ pakrat \
    --repoversion $(date +%Y-%m-%d) \
    --name centos \
    --url http://mirror.centos.org/centos/6/os/x86_64 \
    --name epel \
    --url http://dl.fedoraproject.org/pub/epel/6/x86_64
$ tree -d centos epel
centos/
+-- 2013-07-29
|   +-- Packages -> ../Packages
|   +-- repodata
+-- latest -> 2013-07-29
+-- Packages
epel/
+-- 2013-07-29
|   +-- Packages -> ../Packages
|   +-- repodata
+-- latest -> 2013-07-29
+-- Packages

Configuration can also be passed in from YUM configuration files. See the CLI --help for details.

Building an RPM

Pakrat can be easily packaged into an RPM.

  1. Download a release and name the tarball pakrat.tar.gz:
curl -o pakrat.tar.gz -L https://github.com/ryanuber/pakrat/archive/master.tar.gz
  1. Build it into an RPM:
rpmbuild -tb pakrat.tar.gz

What's missing

  • Better logging (currently console-only)

Thanks

Thanks to Keith Chambers for help with the ideas and useful input on CLI design.