Wednesday, January 11, 2012

PostgreSQL - PITR . Getting back lost data, due to "user error".

There are circumstances in which something bad happens in the database (like dropping a table or deleting important data), but if you've got a base backup and a series of consistent wal segments, there's nothing to worry about.

Case: Important data was deleted by a malicious or careless database user.
  
Assumptions:
  • A base backup created before the data was deleted.
  • wal archiving enabled  and working
  • log reports set with log_statement(mod)

Let's say the "malicious" user performs the following operations:


postgres@linux-2561:~> psql -p 5433 pgbenchdb
psql (9.1.1)
Type "help" for help.

pgbenchdb=# select count(*) from pgbench_history;
 count 
-------
 10000
(1 row)

pgbenchdb=# delete from pgbench_history where extract(second from mtime)::int % 7 = 0;
DELETE 1350
pgbenchdb=# delete from pgbench_history where extract(second from mtime)::int % 11 = 0;
DELETE 611
pgbenchdb=# delete from pgbench_history where extract(second from mtime)::int % 13 = 0;
DELETE 289
pgbenchdb=# delete from pgbench_history where extract(second from mtime)::int % 17 = 0;
DELETE 779
pgbenchdb=# select count(*) from pgbench_history;
 count 
-------
  6971
(1 row)


After some time we get a call, telling us that today, some data was deleted from one very important table called pgbench_history.

Problem 1. When was this data deleted?
If we've got the reporting logs set to an appropriate level, this is a rather simple task.

postgres=# select log_time,message from pglog where log_time::date = current_date and message like '%delete%' and message like '%history%' order by log_time asc;
          log_time          |                                        message                                         
----------------------------+----------------------------------------------------------------------------------------
 2012-01-11 17:30:26.378-06 | statement: delete from pgbench_history where extract(second from mtime)::int % 7 = 0;
 2012-01-11 17:30:34.854-06 | statement: delete from pgbench_history where extract(second from mtime)::int % 11 = 0;
 2012-01-11 17:30:41.01-06  | statement: delete from pgbench_history where extract(second from mtime)::int % 13 = 0;
 2012-01-11 17:30:45.457-06 | statement: delete from pgbench_history where extract(second from mtime)::int % 17 = 0;
(4 rows)


So, we roll forward (starting from the time that the base backup was taken) up to the time, just before any of those deletions took place, that is, before 2012-01-11 17:30:26.378-06.

To make sure all transactions are present in the archiving directory.
pgbenchdb=# select pg_switch_xlog();
 pg_switch_xlog 
----------------
 0/2A1A0A80
(1 row)

The only thing we have to do is to start our base backup with a recovery.conf file that specifies the "recovery_target_time" as '2012-01-11 17:30:26' for example.

Now, when you're not sure about up to what time to replay the wal files, I think there's no other means than by trial and error. In such cases you better make a copy of your base backup before replaying the wal files.

vi recovery.conf
restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
recovery_target_time = '2012-01-11 17:30:26'

Once the recovery finishes, querying the pgbench_history table:

postgres@linux-2561:~/pitr2/pg_log> psql -p 5436 pgbenchdb
psql (9.1.1)
Type "help" for help.

pgbenchdb=# select count(*) from pgbench_history;
 count 
-------
 10000
(1 row)

One thing that would be wonderful is to go back again in time. The thing here is that WAL replaying means that the changes to the database are reproduced and applied to the data files, then no going back is possible. But nothing could stop you from going further ahead ... I think.

Well you can actually go back by restoring your copy of the base backup and setting an earlier "recovery_target_time" =-) . On this terms it's obvious to infer that you cannot go further back from the time at which the base backup was created.

No comments:

Post a Comment