QuickStart
This quickstart will help you setup PGHoard for simple use case:
“Remote” basebackup using pg_basebackup
WAL Archiving using the
pg_receivewal
toolLocal archiving to
/mnt/pghoard_backup/
The local archiving is chosen because it’s the easiest to demonstrate without external dependencies. For the object storage of your choice (S3, Azure, GCP…), refer to the appropriate section at Storage configuration.
Installation
Follow the instructions at Installation from your distribution package manager. Then, setup PostgreSQL following PostgreSQL Configuration.
Configuration
It is advised to use a replication slot to prevent WAL files from being recycled when we haven’t consumed them yet.
You can use pg_receivewal to create your replication slot:
pg_receivewal --create-slot -S pghoard_slot -U pghoard
Create a /var/lib/pghoard.json
file containing the following information, replacing
the user and password you chose at the previous step:
{
"backup_location": "/mnt/pghoard_backup/state/",
"backup_sites": {
"my_test_cluster": {
"nodes": [
{
"host": "127.0.0.1",
"password": "secret",
"user": "pghoard",
"slot": "pghoard_slot"
}
],
"object_storage": {
"storage_type": "local",
"directory": "/mnt/pghoard_backup/"
},
"pg_data_directory": "/var/lib/postgres/data/",
"pg_receivexlog_path": "/usr/bin/pg_receivewal",
"pg_basebackup_path": "/usr/bin/pg_basebackup",
"basebackup_interval_hours": 24,
"active_backup_mode": "basic"
}
}
}
Testing your first backup
Launching pghoard
Launch pghoard using:
pghoard pghoard.json
If everything went well, you should see something like this in the logs of pghoard:
2021-07-30 15:56:48,678 PGBaseBackup Thread-23 INFO Started: ['/usr/bin/pg_basebackup', '--format', 'tar', '--label', 'pghoard_base_backup', '--verbose', '--pgdata', '/mnt/pghoard_backup/state/my_test_cluster/basebackup_incoming/2021-07-30_13-56_0', '--wal-method=none', '--progress', '--dbname', "dbname='replication' host='127.0.0.1' replication='true' user='pghoard'"], running as PID: 3652881, basebackup_location: '/mnt/pghoard_backup/state/my_test_cluster/basebackup_incoming/2021-07-30_13-56_0/base.tar'
2021-07-30 15:56:48,805 PGBaseBackup Thread-23 INFO Ran: ['/usr/bin/pg_basebackup', '--format', 'tar', '--label', 'pghoard_base_backup', '--verbose', '--pgdata', '/mnt/pghoard_backup/state/my_test_cluster/basebackup_incoming/2021-07-30_13-56_0', '--wal-method=none', '--progress', '--dbname', "dbname='replication' host='127.0.0.1' replication='true' user='pghoard'"], took: 0.127s to run, returncode: 0
2021-07-30 15:56:48,922 Compressor Thread-3 INFO Compressed 33009152 byte open file '/mnt/pghoard_backup/state/my_test_cluster/basebackup_incoming/2021-07-30_13-56_0/base.tar' to 6797509 bytes (21%), took: 0.091s
2021-07-30 15:56:48,925 TransferAgent Thread-12 INFO Uploading file to object store: src='/mnt/pghoard_backup/state/my_test_cluster/basebackup/2021-07-30_13-56_0' dst='my_test_cluster/basebackup/2021-07-30_13-56_0'
2021-07-30 15:56:48,928 TransferAgent Thread-12 INFO Deleting file: '/mnt/pghoard_backup/state/my_test_cluster/basebackup/2021-07-30_13-56_0' since it has been uploaded
What this means is that pghoard performed the following sequence of actions:
it launched pg_basebackup to perform the first basebackup of your cluster, and stored it in a temporary location (
backup_location
from the config file)then it “uploaded” it. Since we chose a local storage for backup, it is just copied to the destination.
finally it removes the temporary files
This process would have been the same had you used a remote object storage like
S3
or Swift
.
You can check the contents of the final storage location:
❯ tree /mnt/pghoard_backup/my_test_cluster
/mnt/pghoard_backup/my_test_cluster
└── basebackup
├── 2021-07-30_13-56_0
└── 2021-07-30_13-56_0.metadata
Restoration
You can list your database basebackups by running:
❯ pghoard_restore list-basebackups --config pghoard.json -v
Available 'my_test_cluster' basebackups:
Basebackup Backup size Orig size Start time
---------------------------------------- ----------- ----------- --------------------
my_test_cluster/basebackup/2021-07-30_13-56_0 6 MB 31 MB 2021-07-30T13:56:48Z
metadata: {'backup-decision-time': '2021-07-30T13:56:48.673846+00:00', 'backup-reason': 'scheduled', 'start-wal-segment': '000000010000000000000081', 'pg-version': '130003', 'compression-algorithm': 'snappy', 'compression-level': '0', 'original-file-size': '33009152', 'host': 'myhost'}
If we’d want to restore to the latest point in time we could fetch the required basebackup by running:
pghoard_restore get-basebackup --config pghoard.json \
--target-dir <destination> --restore-to-primary
Basebackup complete.
You can start PostgreSQL by running pg_ctl -D foo start
On systemd based systems you can run systemctl start postgresql
On SYSV Init based systems you can run /etc/init.d/postgresql start
Note that the target-dir
needs to be either an empty or non-existent
directory in which case PGHoard will automatically create it.
After this we’d proceed to start both the PGHoard server process and the PostgreSQL server normally by running (on systemd based systems, assuming PostgreSQL 9.5 is used):
systemctl start pghoard
systemctl start postgresql-9.5
Which will make PostgreSQL start recovery process to the latest point in time. PGHoard must be running before you start up the PostgreSQL server. To see other possible restoration options please look at pghoard_restore.
Optional: Adding encryption
If you want to encrypt your backups, you need to generate a public / private RSA key pair.
The pghoard_create_keys
script is used for that:
pghoard_create_keys --site my_test_site --key-id 1
It will output a config snippet of the form:
{
"backup_sites": {
"my_test_site": {
"encryption_key_id": "1",
"encryption_keys": {
"1": {
"private": "-----BEGIN PRIVATE KEY-----<actual key>-----END PRIVATE KEY-----\n",
"public": "-----BEGIN PUBLIC KEY-----<actual key>-----END PUBLIC KEY-----\n"
}
}
}
}
}
If you want this server to perform both backup and restore, you will need to
copy both keys to your config file, under the backup_sites/my_test_site
section.
If you only need to perform backups, you can store only the public key, in which case the host running pghoard will not be able to decipher the encrypted backups.
Danger
Always keep a safe copy of your private key ! You WILL need it to access your backups