Friday, January 6, 2012

Postgresql - Streaming replication: synchronous replication

One cool new feature in postgresql 9.1 is the ability to set up synchronous streaming replication. What does that mean? That one transaction will be committed only when it has been written to wal records (being sent) and applied to the standby servers that are replicating data in a synchronous manner. This sounds like an extra overhead to dml operations, so I guess it's advisable for you to have an excellent communication pathway between the master and the synchronous standby servers. The main advantage of this set up is that you can guarantee to a great deal the reliability of data in case of failure, if one transaction was given the status of committed it will become available instantly in all the synchronous standby's as well, so you'll have virtually no data loss in case of contingencies.

Here are the steps to configure this kind of streaming replication:
http://vibhorkumar.wordpress.com/2011/10/20/asynchronoussynchronous-streaming-replication-in-postgresql-9-1/

The important part being:

To switch Asynchronous replication to Synchronous Replication,use following steps:

1. Change following parameter in postgresql.conf on Primary Server:
synchronous_standby_names = 'sync_replication'

2. Reload the above changes on primary using following command:
pg_ctl -D [primary data directory] reload

3. Change following parameter in recovery.conf file on Standby:
primary_conninfo = 'host=localhost port=5432 application_name=sync_replication'
4. Restart the standby using following command:
pg_ctl -D [standby directory] restart -m fast
Then ,the difference is that :
  • you specify the synchronous_standby_names in the master's configuration, for the standby servers that will replicate in a synchronous fashion.
  • reload the master's configuration
  • add the application_name option to the primary_conninfo string in the recovery.conf file in each synchronous standby.
  • restart the standby server.
Testing case:
+ Start a big transaction on the master server and kill one synchronous slave in the middle of it.
Expected result:
+ The transaction will  be rolled back. No more transactions will be allowed 'till the synchronous standby server becomes available and re-synchronizes.

Results:
+ The transaction is stuck!!!
This is what the master says about it:

in the log:
2012-01-06 17:09:58 CST [unknown] postgres LOG:  could not send data to client: Connection reset by peer

current activity in pg_stat_activity:
-[ RECORD 2 ]----+--------------------------------
datid            | 24576
datname          | pgbenchdb
procpid          | 13296
usesysid         | 10
usename          | postgres
application_name | psql
client_addr      | 
client_hostname  | 
client_port      | -1
backend_start    | 2012-01-06 17:09:50.726341-06
xact_start       | 2012-01-06 17:09:50.728063-06
query_start      | 2012-01-06 17:10:01.36678-06
waiting          | f
current_query    | commit;

the replication state in pg_stat_replication:

pgbenchdb=# select * from pg_stat_replication;
(No rows)

I was kind of expecting that.

There's no remedy for me other than to kill the transaction manually (there must me some sort of replication timeout parameter somewhere!).
But before that, am I able to execute dml operations through another session?
Yes!!!! Oops! (but this happens after I cancel it)


If I cancel manually the hung up transaction I get the following message:

pgbenchdb=# insert into pgbench_history (tid) values ( generate_series(1,100));
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
INSERT 0 100

postgres@debian1:~$ psql -p 5432 pgbenchdb < test_sync_replication.sql
BEGIN
INSERT 0 100000
INSERT 0 100000
INSERT 0 100000
INSERT 0 100000
INSERT 0 100000
INSERT 0 100000
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
COMMIT
Well, canceling a request is not something that applications usually are programmed to. But the issue that worries me is that it won't return an error message, it will freeze waiting for the synchronous standby to reply! Need to dive in the docs to figure out what the hell is going on. Despite the above results, I think it's important to pinpoint that the status of committed was not granted in neither the transactions.

1 comment:

  1. I'm facing the same problem. Did you find a any solution to above situation ?

    ReplyDelete