Wednesday, August 24, 2011

what's failback & failover / pgpool - failover mechanism explanation

failback & failover

I'm delving into more terminology regarding database replication and recovery.

Failover: the process of switching production to a backup facility (normally your recovery site).

Failback: the process of returning production to its original location.
http://publib.boulder.ibm.com/infocenter/dsichelp/ds8000ic/index.jsp?topic=%2Fcom.ibm.storage.ssic.help.doc%2Ff2c_pprcfailbackov_1v262p.html
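
To keep the two terms straight, here is a minimal toy sketch of the definitions above. The `Production` class and the site names are made up for illustration; they don't correspond to any real tool.

```python
class Production:
    """Toy model: tracks which site currently serves production."""

    def __init__(self, primary, recovery):
        self.primary = primary      # original location
        self.recovery = recovery    # backup facility / recovery site
        self.active = primary       # production starts at the original location

    def failover(self):
        """Switch production to the backup facility."""
        self.active = self.recovery
        return self.active

    def failback(self):
        """Return production to its original location."""
        self.active = self.primary
        return self.active


prod = Production(primary="site-a", recovery="site-b")
after_failover = prod.failover()   # production now runs on site-b
after_failback = prod.failback()   # production is back on site-a
```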

pgpool - failover mechanism explanation

I took this diagram from Gerd Koenig's tutorial, pgpool-II for beginners:



http://pgpool.projects.postgresql.org/contrib_docs/pgpool-II_for_beginners.pdf

Replication configuration

  1. Node 1 (master) and node 2 are working together, and suddenly node 2 breaks for some reason.

  2. pgpool, through its health_check, detects that node 2 is having issues and detaches it (degenerates it). In this particular example, the failover_command and failback_command parameters both just write a log file recording what is happening (which node is the new master and which was the old one). The parameter controlling this situation is fail_over_on_backend_error, set to true in this case, meaning that failover is triggered when writing to a backend communication socket fails. In this configuration the node is removed from the cluster.

  3. Calling pcp_recovery_node triggers our create_base_backup script (first stage), which transfers an initial backup of the master server to the slave that was previously offline.

  4. In the second stage of recovery, the most recent transactions are switched out to the database log so they can be replayed on the slave.

  5. Finally, a call to pgpool_remote_start expands the base backup, applies the last changes registered in the database log (which is being stored on the slave) and restarts the database instance in recovery mode so that the last WALs are replayed.
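
To check my own understanding of the five steps, here is a rough Python simulation. It is only a sketch under my own assumptions: `Node`, `Pool`, `recover` and `writes_during_backup` are invented names, not pgpool's API, and the real work is done by pgpool and the recovery scripts, not by code like this.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True
    attached: bool = True
    rows: list = field(default_factory=list)


class Pool:
    """Toy stand-in for pgpool in replication mode (not the real API)."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.log = []

    def master(self):
        # In this toy model the first attached node plays the master role.
        return next(n for n in self.nodes if n.attached)

    def write(self, row):
        # Replication mode: every attached node receives each write.
        for n in self.nodes:
            if n.attached:
                n.rows.append(row)

    def health_check(self):
        # Step 2: degenerate (detach) any node that fails the check.
        for n in self.nodes:
            if n.attached and not n.healthy:
                n.attached = False
                self.log.append(
                    f"failover: {n.name} detached, master is {self.master().name}"
                )

    def recover(self, node, writes_during_backup=()):
        master = self.master()
        base_backup = list(master.rows)            # step 3: first stage
        for row in writes_during_backup:           # writes arriving meanwhile
            self.write(row)
        wal_tail = master.rows[len(base_backup):]  # step 4: second stage
        node.rows = base_backup + wal_tail         # step 5: restore and replay
        node.healthy = True
        node.attached = True
        self.log.append(f"failback: {node.name} re-attached")


node1, node2 = Node("node1"), Node("node2")
pool = Pool([node1, node2])
pool.write("a")
node2.healthy = False          # step 1: node 2 breaks
pool.health_check()            # step 2: node 2 is removed from the cluster
pool.write("b")                # only node 1 receives this write
pool.recover(node2, writes_during_backup=["c"])   # steps 3 to 5
```

After the last call both nodes hold the same rows again, and `pool.log` contains one failover entry and one failback entry, which mirrors what the logging-only failover_command and failback_command do in this setup.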


Still, I need to figure out some more details.
