Tuesday, January 10, 2012

Postgresql - Recovery through archiving (PITR). Simple example

Ok, I have a brand new PostgreSQL 9.1 installation and I'll set up a simple recovery scenario.
My (brand new)database cluster path:
/var/lib/pgsql/pitr1

Step1. Before starting the server, enable archiving.
a. Define where the WAL logs will be archived.
For these purposes I'll use the directory /var/lib/pgsql/wal_archive

b.Configure the connection and archiving options (I changed these from the default values):

vi postgresql.conf
listen_addresses = '*'          # what IP address(es) to listen on;
port = 5433                     # I have the 5432 busy with another experiment
wal_level = archive             # minimal, archive, or hot_standby
                                # (change requires restart)
archive_mode = on               # allows archiving to be done
                                # (change requires restart)
archive_command = 'cp %p /var/lib/pgsql/wal_archive/%f'
max_wal_senders = 5             # max number of walsender processes
                                # (change requires restart)


The part worth explaining is this:
archive_command = 'cp %p /var/lib/pgsql/wal_archive/%f'
with this command we instruct postgresql to archive each wal segment (equivalent to a redo log in Oracle) it generates, represented with the variable %p (this variable will be replaced with its full path) to the directory /var/lib/pgsql/wal_archive. %f stands for the name of the wal segment and it will be replaced with the actual wal segment name behind the scenes too. PostgreSQL will handle for us those %p and %f variables.
The default value for the parameter max_wal_senders must be set to a positive integer value greater than cero for this to work.

c. Configure the connection settings. (I'll make the base backup through the pg_basebackup function, so authentication for replication is required.
vi pg_hba.conf
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     postgres                                trust
host    replication     postgres        127.0.0.1/32            trust
#host    replication     postgres        ::1/128                 trust



Step2. Start up the server, create a test database, make some ddl/dml operations and confirm that wal segments are being sent to the location specified.


Starting the server up:
postgres@linux-2561:~/pitr1> pg_ctl start -D .
postgres@linux-2561:~/pitr1> cd pg_log/
postgres@linux-2561:~/pitr1/pg_log> ls
postgresql-2012-01-10_114123.log
postgres@linux-2561:~/pitr1/pg_log> tail -f postgresql-2012-01-10_114123.log 
2012-01-10 11:41:23 CST   LOG:  database system was shut down at 2012-01-10 11:10:12 CST
2012-01-10 11:41:23 CST   LOG:  database system is ready to accept connections
2012-01-10 11:41:23 CST   LOG:  autovacuum launcher started
#so it started without error

Creating the database and filling it with dummy data:
postgres@linux-2561:~/pitr1/pg_log> createdb -p 5433 pgbenchdb
postgres@linux-2561:~/pitr1/pg_log> pgbench -i -s 10 -p 5433 pgbenchdb
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
creating tables...
1000000 tuples done.
set primary key...
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_branches_pkey" for table "pgbench_branches"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_tellers_pkey" for table "pgbench_tellers"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_accounts_pkey" for table "pgbench_accounts"
vacuum...done.

verifying that wal segments are being archived.
postgres@linux-2561:~/pitr1/pg_log> ls ../../wal_archive/
000000010000000000000001  000000010000000000000004  000000010000000000000007  00000001000000000000000A
000000010000000000000002  000000010000000000000005  000000010000000000000008
000000010000000000000003  000000010000000000000006  000000010000000000000009

make some more transactions that among other thing affect the pgbench_history table that keeps track of the time an insertion was made.
postgres@linux-2561:~/pitr1/pg_log> pgbench -s 10 -c 6 -j 2 -t 100 -p 5433 pgbenchdb 
Scale option ignored, using pgbench_branches table count = 10
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 6
number of threads: 2
number of transactions per client: 100
number of transactions actually processed: 600/600
tps = 187.111393 (including connections establishing)
tps = 187.944077 (excluding connections establishing)

Step3. Make the base backup .
In this case I'll perform the recovery on the same machine.

postgres@linux-2561:~/pitr1> cd 
postgres@linux-2561:~> ls
data_sr  pgbench_transaction.sql  pgpool_scripts_backup  pitr1  repmgr  unclassified  wal_archive
postgres@linux-2561:~> mkdir pitr2
postgres@linux-2561:~> pg_basebackup -p 5433 -D /var/lib/pgsql/pitr2/
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived

A file with backup extension was generated with the following contents:
START WAL LOCATION: 0/E000020 (file 00000001000000000000000E) <- the name of the starting wal file
STOP WAL LOCATION: 0/E000094 (file 00000001000000000000000E)
CHECKPOINT LOCATION: 0/E000020
BACKUP METHOD: streamed
START TIME: 2012-01-10 13:10:00 CST <- the time at which pg_start_backup was run
LABEL: pg_basebackup base backup <- label given in pg_start_backup (executed automatically by pg_basebackup)
STOP TIME: 2012-01-10 13:10:02 CST
a. Change this backup server port so I does not conflict with the original one.
postgres@linux-2561:~/pitr2> ls
backup_label  global   pg_hba.conf    pg_log        pg_notify  pg_stat_tmp  pg_tblspc    PG_VERSION  postgresql.conf
base          pg_clog  pg_ident.conf  pg_multixact  pg_serial  pg_subtrans  pg_twophase  pg_xlog     postmaster.opts
postgres@linux-2561:~/pitr2> vi postgresql.conf 
listen_addresses = '*'          # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost', '*' = all
                                        # (change requires restart)
port = 5434 
Step4. Create the recovery.conf file on the backup cluster The backup server will read this file on start up and it will "know" that it has to perform recovery (read wal segments from the specified location and apply them to data files) starting from the first wal segment generated during backup.
vi /var/lib/pgsql/pitr2/recovery.conf
recovery_command = 'cp /var/lib/pgsql/wal_archive/%f %p' #this is the only mandatory setting.
Before starting up the server, I'd like to generate more WAL segments on the master.
postgres@linux-2561:~/pitr2> pgbench -s 10 -c 6 -j 2 -t 100 -p 5433 pgbenchdb 
Scale option ignored, using pgbench_branches table count = 10
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 6
number of threads: 2
number of transactions per client: 100
number of transactions actually processed: 600/600
tps = 191.475816 (including connections establishing)
tps = 192.120436 (excluding connections establishing)
Take note of the last transaction timestamp.
pgbenchdb=# SELECT max(mtime) from pgbench_history;
            max             
----------------------------
 2012-01-10 13:54:47.366648
(1 row)
Now, there's the possibility that the last transactions on the principal server aren't still stored in the last wal segment, so we force a log switch (that executes with a previous checkpoint). Also, you can force a segment switch manually with pg_switch_xlog, if you want to ensure that a just-finished transaction is archived as soon as possible http://www.postgresql.org/docs/8.2/static/functions-admin.html
postgres@linux-2561:~/pitr2> psql -p 5433 -c "select pg_switch_xlog();"
 pg_switch_xlog 
----------------
 0/F879168
(1 row)
One thing I need to review is this info in the manual.
you should at least save the contents of the cluster’s pg_xlog subdirectory, as it might contain logs which were not archived before the system went down.
Step5. Having everything set up. Let's start up the backup server. Expected result: to have an exact copy of the master server being the last transaction timestamp: 2012-01-10 13:54:47.366648
postgres@linux-2561:~/pitr2> pg_ctl start -D .
server starting
postgres@linux-2561:~/pitr2> 2012-01-10 21:21:04 GMT   FATAL:  data directory "/var/lib/pgsql/pitr2" has group or world access
2012-01-10 21:21:04 GMT   DETAIL:  Permissions should be u=rwx (0700).
Yikes!!!!! Ok then.
postgres@linux-2561:~> chmod 700 -R pitr2
postgres@linux-2561:~> pg_ctl start -D pitr2/
server starting
postgres@linux-2561:~> tail -f pitr2/pg_log/postgresql-2012-01-10_1
postgresql-2012-01-10_114123.log  postgresql-2012-01-10_130842.log  postgresql-2012-01-10_152456.log
postgres@linux-2561:~> tail -f pitr2/pg_log/postgresql-2012-01-10_152456.log 
2012-01-10 15:24:56 CST   LOG:  could not open file "pg_xlog/000000010000000000000010" (log file 0, segment 16): No such file or directory
2012-01-10 15:24:56 CST   LOG:  redo done at 0/F87914C
2012-01-10 15:24:56 CST   LOG:  last completed transaction was at log time 2012-01-10 13:55:13.965712-06
2012-01-10 15:24:56 CST   LOG:  restored log file "00000001000000000000000F" from archive
cp: cannot stat `/var/lib/pgsql/wal_archive/00000002.history': No such file or directory
2012-01-10 15:24:56 CST   LOG:  selected new timeline ID: 2
cp: cannot stat `/var/lib/pgsql/wal_archive/00000001.history': No such file or directory
2012-01-10 15:24:56 CST   LOG:  archive recovery complete
2012-01-10 15:24:59 CST   LOG:  autovacuum launcher started
2012-01-10 15:24:59 CST   LOG:  database system is ready to accept connections
Regarding to the .history files not found, the docs say: You may also see an error message at the start of recovery for a file named something like 00000001.history. This is also normal and does not indicate a problem in simple recovery situations.
Now, to make some generic tests for consistency. Data in the primary server:
postgres@linux-2561:~> psql -p 5433 pgbenchdb
psql (9.1.1)
Type "help" for help.

pgbenchdb=# \dt+
                          List of relations
 Schema |       Name       | Type  |  Owner   |  Size  | Description 
--------+------------------+-------+----------+--------+-------------
 public | pgbench_accounts | table | postgres | 124 MB | 
 public | pgbench_branches | table | postgres | 40 kB  | 
 public | pgbench_history  | table | postgres | 88 kB  | 
 public | pgbench_tellers  | table | postgres | 40 kB  | 
(4 rows)

pgbenchdb=# select max(mtime) from pgbench_history;
            max             
----------------------------
 2012-01-10 13:54:47.366648
(1 row)
Data in the backup server:
postgres@linux-2561:~> psql -p 5434 pgbenchdb
psql (9.1.1)
Type "help" for help.

pgbenchdb=# \dt+
                          List of relations
 Schema |       Name       | Type  |  Owner   |  Size  | Description 
--------+------------------+-------+----------+--------+-------------
 public | pgbench_accounts | table | postgres | 124 MB | 
 public | pgbench_branches | table | postgres | 40 kB  | 
 public | pgbench_history  | table | postgres | 88 kB  | 
 public | pgbench_tellers  | table | postgres | 40 kB  | 
(4 rows)

pgbenchdb=# select max(mtime) from pgbench_history;
            max             
----------------------------
 2012-01-10 13:54:47.366648
(1 row)

Now my wal_archive directory looks like this:
postgres@linux-2561:~/wal_archive> ls
000000010000000000000001  000000010000000000000006  00000001000000000000000B  00000001000000000000000F
000000010000000000000002  000000010000000000000007  00000001000000000000000C  00000002000000000000000F
000000010000000000000003  000000010000000000000008  00000001000000000000000D  00000002.history
000000010000000000000004  000000010000000000000009  00000001000000000000000E
000000010000000000000005  00000001000000000000000A  00000001000000000000000E.00000020.backup
This recovery produced an 00000002.history file that contains this timeline:
1       00000001000000000000000F        no recovery target specified
So this is the last wal segment that was applied to data files in this recovery. After that wal segment a new series was created, namely:
00000002000000000000000F
So, it only incremented the left most hexadecimal number to create this new wal segment series.

1 comment: