Postgres-XC and High-Availability
To configure the Postgres-XC high availability feature, you should configure each component: GTM, GTM proxy, Coordinator, and Datanode. Each provides its own backup and failover mechanism. Before digging into these features in detail, let's see what we should do when a specific XC component crashes.
HA configuration for each component
The following gives an overview of how to configure the HA feature of each component. Integrating them into Pacemaker/Heartbeat is being done by another team; that information will be provided elsewhere.
Because GTM is the central component that provides key MVCC (Multi-Version Concurrency Control) and sequence information to all the other components, we must keep a backup that maintains the current transaction and sequence status, and fail over to it when GTM crashes. This is called GTM-Standby and is implemented as a part of GTM.
When GTM fails over to the standby, connection information (host and port) may change. Components connecting directly to GTM may have to be informed of this change.
Visit GTM Standby Configuration for more details.
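As a rough illustration, a GTM-Standby is an ordinary GTM started in standby mode and pointed at the active GTM. The sketch below assumes example hostnames (`gtm-master`), the default GTM port 6666, and an example data directory; check your installation's documentation for the exact parameter set.

```shell
# gtm.conf on the standby server (hostnames, port, and path are examples)
nodename = 'gtm_standby'
listen_addresses = '*'
port = 6666
startup = STANDBY          # run as GTM-Standby rather than the active GTM
active_host = 'gtm-master' # host where the currently active GTM runs
active_port = 6666
```

When the active GTM fails, the standby can be promoted (for example with `gtm_ctl promote -Z gtm -D <standby data directory>`), after which the other components need to be pointed at the new GTM.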
GTM Proxy is just a proxy that groups communication between GTM and Coordinators/Datanodes for performance. It does not maintain any persistent data, so you can simply restart GTM Proxy when it crashes. When GTM crashes, GTM Proxy can accept a command to reconnect to the new GTM without any transaction loss. This is another reason why you should use GTM Proxy.
Visit GTM-Proxy HA configuration for more details.
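The reconnection mentioned above is issued to a running GTM Proxy with `gtm_ctl reconnect`, passing the new GTM's address as options. The directory and hostname below are examples:

```shell
# Tell a running GTM Proxy to reconnect to the promoted GTM
# (data directory and hostname are examples for illustration)
gtm_ctl reconnect -Z gtm_proxy -D /var/lib/pgxc/gtm_proxy \
    -o "-s gtm-standby -t 6666"
```

Because Coordinators and Datanodes talk to the proxy rather than to GTM directly, they do not need to be reconfigured when GTM fails over.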
A Coordinator stores only the catalog data needed to handle SQL statements and does not store table rows. Therefore, you don't need a specific coordinator backup. Instead, applications should not connect to the crashed coordinator, and because DDL statements are propagated to every coordinator, you must drop the crashed coordinator from the cluster configuration. You can restore a coordinator by copying another coordinator's database while it is offline.
Of course, you can use a shared-disk system and fail the coordinator over to another server using the same data. Another option is a synchronous replication slave.
When a coordinator fails, you should tell the other coordinators to remove the failed coordinator from your Postgres-XC cluster with the DROP NODE statement. Also, to prevent the failed node from accepting any incoming transactions, be sure to kill it.
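For example, the removal could look like the following, run on each surviving coordinator. The node name `coord2` is an example; `pgxc_pool_reload()` refreshes the pooler's view of the cluster after the node list changes:

```sql
-- On each surviving coordinator (node name is an example)
DROP NODE coord2;
SELECT pgxc_pool_reload();  -- refresh pooled connection information
```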
A Datanode stores the actual rows of tables. Please note that transactions which do not need data on the crashed datanode can continue to run without any error. However, in the case of distributed tables there are no replicas of rows, so you need a backup of each datanode to handle a datanode crash. You can use a synchronous replication slave for this purpose, or a traditional shared-disk configuration.
Visit Datanode HA configuration for more details.
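Since each datanode is essentially a PostgreSQL server, the synchronous replication slave mentioned above can be set up with the usual streaming-replication parameters. A minimal sketch, assuming an example node name `datanode1_slave`, example host `dn1-master`, and example port 15432:

```shell
# postgresql.conf on the datanode master (values are examples)
wal_level = hot_standby
max_wal_senders = 2
synchronous_standby_names = 'datanode1_slave'

# recovery.conf on the datanode slave
standby_mode = 'on'
primary_conninfo = 'host=dn1-master port=15432 application_name=datanode1_slave'
```

With `synchronous_standby_names` set, commits on the master wait for the slave to confirm, so no committed row is lost when failing over to the slave.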