Friday, December 23, 2011

Postgresql - pgpool - analysis of the recovery, failover and failback operations

I realized I had misunderstood the concepts related to the administration of a pgpool cluster. I have summarize those concepts here.

Recovery ->
Commands (1st stage and 2nd stage) run by the master to incorporate a new slave node.
These scripts are run by the postgresql server (naturally, by the same user that started the instance...right? )
Thus the recovery procedure requires that the primary server is up and running.
The cluster variables such as %m %M are not available for this operation. As the parameters (three) are  passed by the pgpool_recovery function.
% pgpool_recovery datadir remote_host remote_datadir
datadir:        Database cluster path of a master node.
remote_host:    Hostname of a recovery target node.
remote_datadir: Database cluster path of a recovery target node.

What do the provided examples do?
recovery_1st_stage_command = '' -> Performs a base backup of the primary and sends it to the new (or in recovery) node
recovery_2nd_stage_command = '' -> does nothing

This is what happens behind the curtain:
For your reference, here are the steps taken in the recovery procedure.

Pgpool-II connects to primary server's template1 database as user = recovery_user, password = recovery_password.
Primary server executes pgpool_recovery function.
pgpool_recovery function executes recovery_1st_stage_command. Note that PostgreSQL executes functions with database cluster as the current directory. Thus recovery_1st_stage_command is executed in the database cluster directory.
Primary server executes pgpool_remote_start function. This function executes a script named "pgpool_remote_start" in the database cluster directory, and it executes pg_ctl command on the standby server to be recovered via ssh. pg_ctl will start postmaster in background. So we need to make sure that postmaster on the standby actually starts.
pgpool-II tries to connect to the standby PostgreSQL as user = recovery_user and password = recovery_password. The database to be connected is "postgres" if possible. Otherwise "template1" is used. pgpool-II retries for recovery_timeout seconds. If success, go to next step.
If failback_command is not empty, pgpool-II parent process executes the script.
After failback_command finishes, pgpool-II restart all child processes.

+ The unusual thing for me at this point is that this functions are executed by the postgresql server. I need to analyze how does it do it.

This is the signature for the pgpool functions:
template1=# \df
List of functions
Schema |         Name         | Result data type | Argument data types |  Type  
public | pgpool_recovery      | boolean          | text, text, text    | normal
public | pgpool_regclass      | oid              | cstring             | normal
public | pgpool_remote_start  | boolean          | text, text          | normal
public | pgpool_switch_xlog   | text             | text                | normal
public | pgpool_walrecrunning | boolean          |                     | normal
(5 rows)

+ pgpool_remote_start has two arguments:
DEST=$1 -> the slave server
DESTDIR=$2 -> the cluster (datadir) on the slave server

Then, this is why you can't specify parameters for these commands in the pgpool.conf file, as they're passed implicitly.

Now, where the hell is this script located? It must be present in the database cluster (datadir) where the master is running).

Corollary. my recovery_1st_stage_command would look like this:

# first stage recovery  
# $1 datadir
# $2 desthost
# $3 destdir

#as I'm using repmgr it's not necessary for me to know datadir(master) $1
#repmgr needs to know the master's ip
MASTERNODE=`/sbin/ifconfig eth0 | grep inet | awk '{print $2}' | sed 's/addr://'`

cmd1=`ssh postgres@$RECOVERY_NODE "repmgr -D $CLUSTER_PATH --force standby clone $MASTERNODE"`
echo $cmd1  

The location of the failback and failover scripts is not relative to the database cluster, the absolute path is needed, these are executed by the apache (or wwwrun ) user.

Failback ->
failback_command. command to run when a node is attached. If not empty, pgpool-II parent process executes the script.
After it finishes,child processes are restarted.

Failover ->
command to run when a node is detached
--an alert maybe ?
side effects.
1. pgpool kills all its child processes, which will in turn terminate all active sessions to pgpool
2. pgpool invokes the failover_command and waits for its completion
3. after this, pgpool starts new child processes and is ready again to accept connection from clients

command to run only after a master failover.Typically the command should be used to recover the slave from the new master by call the pcp_recovery_node command for example
1. If follow_master_command is not empty, when a master failover is completed:
a) pgpool degenerates all nodes except the new master
b) starts new child processes to be ready to accept connections from clients
c) after this, pgpool run the command set into the 'follow_master_command' for each degenerated nodes ( typically the command should be used to recover the slave from the new master by call the pcp_recovery_node command for example ).

Other concepts
What do we understand for node degeneration? disconnection from the cluster.

For the failover_command, failback_command and follow_master_command you can use these parameters in your scripts:

Special character Description
%d Backend ID of an attached node.
%h Hostname of an attached node.
%p Port number of an attached node.
%D Database cluster path of an attached node.
%M Old master node
%m New master node
%H Hostname of the new master node.
%P Old primary node ID.
%% '%' character

They're not available for the recovery commands.

In summary:
Recovery. applies when attaching a new slave node, or reattaching a previously detached slave node, the master is required to be up and running.
Failback. command that runs when a node is attached (or after a recovery)
Failover. when a node is detached (master or slave)

Possible failover strategy.
1. If the master is detached, promote the first slave node that is available.
2. Degenerate all the slave nodes except the new master.
3. Tell all the other nodes to now follow the new master.

No comments:

Post a Comment