Koichi Suzuki
Sep.18, 2012


Clustering GTM has been requested by some community members for the following purposes:

  1. Achieve more scalability
  2. Distribution among different locations

GTM is the only central facility that maintains database integrity across the cluster, and it was considered unsuitable for clustering in a distributed fashion. This article proposes how to cluster GTM itself under a simple set of assumptions.


We assume the following about the applications above.

  1. Divide the whole Postgres-XC cluster into more than one sub-cluster, each consisting of a GTM, GTM-proxies, coordinators and datanodes, so that each sub-cluster can operate on its own.
  2. Guarantee that database integrity is maintained across such sub-clusters.
  3. Applications can generally (though not completely) be partitioned so that most transactions involve only the local coordinators and datanodes within a sub-cluster.

The last assumption is not critical, but it is important for providing a specific algorithm that maintains reasonable performance.

How to do it


We divide the Postgres-XC cluster into more than one sub-cluster. Each of them consists of a GTM and other Postgres-XC components.


Now we introduce a new global entity called “G-GTM” (global GTM) on top of the GTMs. G-GTM provides each GTM with a range of GXIDs. Each GTM requests a GXID range from G-GTM and provides GXID values to transactions within this range. When the range runs out, the GTM asks G-GTM for the next range to continue its operation.
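The range-allocation handshake above can be sketched as follows. This is a minimal illustration only; the class and method names (`GGtm`, `Gtm`, `next_range`, `next_gxid`) are hypothetical, and the real implementation would be in the GTM's C code with persistence and crash safety.

```python
# Sketch: G-GTM hands out disjoint GXID ranges; each GTM draws GXIDs from
# its current range and asks for a new range when it runs out.
import threading

class GGtm:
    """Hypothetical global GTM allocating non-overlapping GXID ranges."""
    def __init__(self, start=1, range_size=10000):
        self._next = start
        self._range_size = range_size
        self._lock = threading.Lock()

    def next_range(self):
        # Each call returns a fresh half-open range [lo, hi).
        with self._lock:
            lo = self._next
            self._next += self._range_size
            return lo, self._next

class Gtm:
    """Hypothetical per-sub-cluster GTM serving GXIDs from its range."""
    def __init__(self, g_gtm):
        self._g_gtm = g_gtm
        self._lo, self._hi = g_gtm.next_range()

    def next_gxid(self):
        if self._lo >= self._hi:
            # Range exhausted: request the next range from G-GTM.
            self._lo, self._hi = self._g_gtm.next_range()
        gxid = self._lo
        self._lo += 1
        return gxid
```

Note that, with this scheme, GXIDs issued by different sub-clusters never collide, but a sub-cluster's GXID sequence is no longer contiguous once another GTM has taken the intervening range.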

Each GTM also maintains transaction status and snapshots within its sub-cluster, except for the occasional external transactions described below.


Each transaction should be aware of which sub-cluster it should access. Within the local sub-cluster, operation is the same as in the current implementation. If a transaction needs to read or write data outside its sub-cluster, it should do the following:

  1. Tell the GTM of the other sub-cluster that a new transaction is beginning. In this case, the GTM in the other sub-cluster is given the GXID value from the transaction and maintains it as running.
  2. The transaction obtains snapshots from both the local sub-cluster and the external sub-cluster, and combines them to determine visibility. At the remote sub-cluster, it does not have to worry about the snapshot of the local sub-cluster. If any other transactions in the local sub-cluster involve this external sub-cluster, their GXIDs will have been reported to the external sub-cluster's GTM and will be included in the snapshot from that external sub-cluster.
  3. When the transaction finishes, it also reports this to the external sub-cluster's GTM.
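The combination rule in step 2 can be sketched as follows, under the assumption (not stated in detail above) that a snapshot can be reduced to its set of in-progress GXIDs. The function name and signature are illustrative only, and handling of the xmax boundary is deliberately elided, since disjoint range allocation makes a single xmax value meaningless across sub-clusters.

```python
# Sketch of step 2: a row version is invisible if its creating transaction
# is still listed as running in EITHER snapshot.  Because external
# transactions report their GXIDs to the remote GTM, foreign in-progress
# GXIDs appear in the remote snapshot's running set as well.

def is_visible(gxid, local_running, external_running):
    """local_running / external_running: sets of in-progress GXIDs taken
    from the local and external GTM snapshots (illustrative names)."""
    return gxid not in local_running and gxid not in external_running
```

For example, a transaction with GXID 7 still running in the external sub-cluster is invisible even if the local snapshot knows nothing about it.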

In this way, we can maintain database integrity among sub-clusters. Because most transactions are local to their sub-cluster, it is only occasionally that a transaction involves data in external sub-clusters, so this will not impose a serious performance penalty.

GTM should be extended so that it can receive a range of GXIDs and renew it. The coordinator should be extended too, so that it recognizes which components are within the local sub-cluster and which are not, and connects to the external sub-cluster's GTM to begin/end transactions. Of course, in this case, implicit 2PC must be used.

Because each sub-cluster's transactions are controlled by its own GTM, there could be slight differences in when a commit becomes visible from sub-cluster to sub-cluster. Further analysis may be needed of this influence on applications.

Other things to do

The following capabilities are needed to implement this.

  1. Component catalog
    We need to know about each node inside and outside the sub-cluster. There are several possible options for implementing this, which can be left to further study and design.
  2. Sub-cluster catalog
    We need sub-cluster information to determine which sub-cluster each transaction should access. This information should contain the following:
    1. The GTM to be involved
    2. The set of nodes, that is, coordinators and datanodes.
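One possible shape for the sub-cluster catalog is sketched below. All field names, host names and ports here are invented for illustration; they are not an actual Postgres-XC catalog definition.

```python
# Sketch: a sub-cluster catalog mapping each sub-cluster to its GTM and
# its set of coordinators and datanodes (all names/ports hypothetical).
subcluster_catalog = {
    "subcluster_a": {
        "gtm": ("gtm-a.example.com", 6666),
        "coordinators": [("coord-a1", 5432), ("coord-a2", 5432)],
        "datanodes": [("dn-a1", 15432), ("dn-a2", 15432)],
    },
    "subcluster_b": {
        "gtm": ("gtm-b.example.com", 6666),
        "coordinators": [("coord-b1", 5432)],
        "datanodes": [("dn-b1", 15432)],
    },
}

def gtm_for(subcluster):
    # Look up which GTM a transaction must contact for a given sub-cluster.
    return subcluster_catalog[subcluster]["gtm"]
```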

We need further study on whether GTM information for the other sub-clusters should be provided by the coordinator, gtm_proxy or gtm.

  1. Large gaps in GXIDs in many places.
    At present, the PostgreSQL base implementation does not allow big gaps in XIDs in many places, including starting a slave. We need to fix this. CLOG maintenance is another case.
  2. Recent xmin calculation
    Because an external transaction can come at any time and its GXID can be smaller than the local recent xmin, the recent xmin cannot be computed locally. GTM should receive recent xmin values from other GTMs, possibly through G-GTM. This information should not be updated synchronously but periodically, to reduce communication among GTMs. The details of this are also left to further study and design.
  3. GXID wraparound
    After GXIDs wrap around locally, an external transaction's GXID may not have wrapped around yet. We should be able to handle this corner case (this also applies to the single sub-cluster case, though).

Distribution Among Sub-clusters and DDL

You will notice that there is no restriction on distributing and replicating tables among sub-clusters. You should be careful in designing transactions and tables so that most transactions are local to a sub-cluster, to maintain performance. DDL may need some extension to handle sub-clusters.

Remaining work

I believe this can be used as a basis for XC sub-clustering. I hope other community members get involved in this and provide more detailed study/design.


A conceptual diagram is shown below.