If you are running a PostgreSQL database server, you will need to do backups at some point. If you already do, you are probably using the pg_dump and pg_restore combo. These commands work well for small to medium sized databases, but not for big ones. Also, you cannot run pg_dump continuously, so you can lose any data written between two dumps if something bad happens. Fortunately, PostgreSQL uses a technique called Write-Ahead Logging (WAL) to ensure the integrity of the data. The simple explanation is: for every data change, a log entry has to be written and flushed to permanent storage first. The WAL is therefore a continuous log of changes, which can be used to restore the database to a consistent state. (If you disable fsync, you lose the guarantee that your database can be recovered to its state before a crash.) So how do we use the WAL to do continuous backups? PostgreSQL lets us specify a custom command to run for every completed WAL segment (archive). While this is great, it is still quite complex to set up continuous archiving by hand.
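A hand-rolled setup usually starts with an archive_command that simply copies each finished segment to a safe location, similar to the example in the PostgreSQL documentation (the /mnt/server/archivedir path below is just a placeholder), and then leaves compression, uploads and cleanup entirely up to you:
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'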
Enter WAL-E
WAL-E is a program designed to perform continuous archiving of PostgreSQL WAL files and base backups.
WAL-E is written in Python and supports most of the major cloud providers. So without further ado, let's jump right into the process of setting it up. Please note that the examples in this article were tested on Ubuntu 14.04, Python 2.7, WAL-E 9.2 and PostgreSQL 9.4.
Setting up WAL-E
Before installing the Python package, we need to install some system dependencies:
apt-get install python-dev python-virtualenv libevent-dev pv lzop daemontools
It is recommended to install WAL-E into its own virtualenv in order to prevent dependency version collisions:
virtualenv -p python2.7 /opt/wal-e
Now we just need to install WAL-E using pip, which is located in the virtualenv's bin directory:
/opt/wal-e/bin/pip install wal-e
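As a quick sanity check that the installation worked, you can ask the freshly installed executable for its usage text:
/opt/wal-e/bin/wal-e --help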
In order to set up WAL-E we need to set some environment variables to configure the backup process. Head over to your Rackspace cloud admin and create a new container (or use the API). Rackspace Cloud Files is powered behind the scenes by OpenStack Swift. Specify the container name, region and endpoint type; these correspond to the environment variables WALE_SWIFT_PREFIX, SWIFT_REGION and SWIFT_ENDPOINT_TYPE respectively. To avoid bandwidth charges, choose the same region as the cloud resource that will access these files. Also use the internalURL endpoint type if possible, because access through a publicURL usually incurs traffic charges. To find your SWIFT_TENANT, just go to the account settings and look for the Account Number field. The last two variables, SWIFT_USER and SWIFT_PASSWORD, are the login credentials of your account.
WALE_SWIFT_PREFIX="swift://my_container_name" # don't forget the swift:// prefix
SWIFT_AUTHURL="https://identity.api.rackspacecloud.com/v2.0/" # just copy this
SWIFT_ENDPOINT_TYPE="internalURL" # change to publicURL if required
SWIFT_REGION="IAD" # setup your region here
SWIFT_TENANT="my_user_id" # this is your numeric account id in rackspace
SWIFT_USER="my_user_name" # this is your login username in rackspace
SWIFT_PASSWORD="my_user_password" # this is your login password in rackspace
I recommend setting up envdir for managing these environment variables. The first step is to create a directory that will hold our envdir variable files:
mkdir -p /etc/wal-e.d/env
Now, for each variable, create a file named after the variable and put its value on the first line of the file. The directory structure should look similar to this:
/etc/wal-e.d/
└── env
├── SWIFT_AUTHURL
├── SWIFT_ENDPOINT_TYPE
├── SWIFT_PASSWORD
├── SWIFT_REGION
├── SWIFT_TENANT
├── SWIFT_USER
└── WALE_SWIFT_PREFIX
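One way to create these files is straight from the shell. The values below are placeholders, and since the archive command will later run as the postgres user, the files also need to be readable by that user:
cd /etc/wal-e.d/env
echo "swift://my_container_name" > WALE_SWIFT_PREFIX
echo "https://identity.api.rackspacecloud.com/v2.0/" > SWIFT_AUTHURL
echo "internalURL" > SWIFT_ENDPOINT_TYPE
echo "IAD" > SWIFT_REGION
echo "my_user_id" > SWIFT_TENANT
echo "my_user_name" > SWIFT_USER
echo "my_user_password" > SWIFT_PASSWORD
chown -R root:postgres /etc/wal-e.d
chmod 750 /etc/wal-e.d /etc/wal-e.d/env
chmod 640 /etc/wal-e.d/env/*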
To test that envdir is picking up our environment variables, you can use the env command (you should see a list of all available environment variables):
/usr/bin/envdir /etc/wal-e.d/env env
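If you only want to see the WAL-E related variables, you can filter the output, for example:
/usr/bin/envdir /etc/wal-e.d/env env | grep -E 'SWIFT|WALE'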
Testing connection to Cloud Files
After configuring our WAL-E setup, we can test that it works by running the backup-list command. On success, this command outputs the list of uploaded base backups (which will be empty at this point). If anything fails here, just stop and go over the setup steps again.
/usr/bin/envdir /etc/wal-e.d/env /opt/wal-e/bin/wal-e backup-list
Setting up CRON jobs for periodic backups
Now that we can communicate with Rackspace Cloud Files, we need to set up some periodic jobs. The first command, backup-push, is responsible for uploading base backups periodically (every Monday in our example). The second command cleans up the old base backups and WAL logs. In the example we are using the retain subcommand to keep the latest 5 base backups and their corresponding WAL logs. Note that we also need to pass the --confirm option in order to actually delete the files (the command runs in dry-run mode by default).
# create a base backup every Monday
0 0 * * 1 /usr/bin/envdir /etc/wal-e.d/env /opt/wal-e/bin/wal-e backup-push /var/lib/postgresql/9.4/main
# cleanup old backups every day (keep the last 5)
0 0 * * * /usr/bin/envdir /etc/wal-e.d/env /opt/wal-e/bin/wal-e delete --confirm retain 5
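Both jobs should run as a user that can read the data directory and connect to the cluster, which typically means the postgres user, so a simple option is to put them into that user's crontab:
crontab -u postgres -e
You may also want to push the first base backup manually instead of waiting for the next Monday:
sudo -u postgres /usr/bin/envdir /etc/wal-e.d/env /opt/wal-e/bin/wal-e backup-push /var/lib/postgresql/9.4/main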
Updating PostgreSQL config
The last missing bit is the actual PostgreSQL configuration. The %p in the archive command is replaced by the path of the completed WAL segment file every time the command is called.
# enable WAL archiving
archive_mode = on
# local shell command to execute to archive a completed WAL file segment
archive_command = '/usr/bin/envdir /etc/wal-e.d/env /opt/wal-e/bin/wal-e wal-push %p'
# when this parameter is greater than zero, the server will switch to a new WAL segment file
# whenever this many seconds have elapsed since the last segment file switch
archive_timeout = 60
# determines how much information is written to the WAL
wal_level = hot_standby
The following wal_level values can be set:
- minimal: writes only the information needed to recover from a crash or immediate shutdown
- archive: adds the logging required for WAL archiving
- hot_standby: further adds the information required to run read-only queries on a standby server
- logical: adds the information necessary to support logical decoding
Archiving itself only needs the archive level; hot_standby is used here so the same archives can later feed a read-only standby as well.
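Both archive_mode and wal_level only take effect after a full server restart; on Ubuntu 14.04 that typically means:
service postgresql restart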
After restarting the server, head over to your postgresql.log file and look for the archiving log entries. If everything is OK, you should see lines similar to these:
wal_e.main INFO MSG: starting WAL-E
DETAIL: The subcommand is "wal-push".
STRUCTURED: time=2016-05-16T12:33:36.413415-00 pid=17223
wal_e.worker.upload INFO MSG: begin archiving a file
DETAIL: Uploading "pg_xlog/00000001000005C80000002A" to "swift://my_container_name/wal_005/00000001000005C80000002A.lzo".
STRUCTURED: time=2016-05-16T12:33:36.469265-00 pid=17223 action=push-wal key=swift://my_container_name/wal_005/00000001000005C80000002A.lzo prefix= seg=00000001000005C80000002A state=begin
wal_e.worker.upload INFO MSG: completed archiving to a file
DETAIL: Archiving to "swift://my_container_name/wal_005/00000001000005C80000002A.lzo" complete at 3105.82KiB/s.
STRUCTURED: time=2016-05-16T12:33:37.525434-00 pid=17223 action=push-wal key=swift://my_container_name/wal_005/00000001000005C80000002A.lzo prefix= rate=3105.82 seg=00000001000005C80000002A state=complete
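Besides tailing the log, PostgreSQL 9.4 also exposes the pg_stat_archiver view, which is handy for checking that archiving is keeping up (failed_count should stay at zero). For example:
sudo -u postgres psql -c "SELECT archived_count, last_archived_wal, failed_count, last_failed_wal FROM pg_stat_archiver;"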
Be aware that if your archive_command keeps failing, PostgreSQL does not throw the unarchived WAL segments away; it keeps them in pg_xlog until archiving succeeds again, so that directory can grow and eventually fill up the disk (the wal_keep_segments parameter only controls how many extra segments are kept around for streaming replication). You also need to be sure that your CRON cleanup job is working, otherwise you can quickly run out of space in the container. Either way, you should set up some sort of monitoring to detect failures early and be able to react accordingly. You now have a working continuous WAL archiving setup, and it can easily be tweaked to support replication to another PostgreSQL server.
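As a minimal sketch of how a restore (or a new replica) would consume these archives, the target server pulls the base backup with backup-fetch and then points its restore_command back at WAL-E (paths are illustrative; LATEST simply means the most recent base backup):
# fetch the latest base backup into an empty data directory
/usr/bin/envdir /etc/wal-e.d/env /opt/wal-e/bin/wal-e backup-fetch /var/lib/postgresql/9.4/main LATEST
# recovery.conf on the target server then replays WAL from the archive
restore_command = '/usr/bin/envdir /etc/wal-e.d/env /opt/wal-e/bin/wal-e wal-fetch "%f" "%p"'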