Last modified: Wed Jun 6 17:29:52 JST 2007

Welcome to pgpool -II page

(last update: 2007/02/12)
[Japanese page]

What is pgpool-II?

pgpool-II is a middleware that works between PostgreSQL servers and a PostgreSQL database client. It provides following features.

Limiting Excess Connections

There is a limit on the maximum number of concurrent connections with PostgreSQL, and connections are rejected after so many connections. Setting the maximum number of connections, however, increases resource consumption and affect system performance. pgpool-II also has a limit on the maximum number of connections, but extra connections will be queued instead of returning an error immediately.

Connection Pooling

pgpool-II saves connections to the PostgreSQL servers, and reuse them whenever a new connection with the same properties (i.e. username, database, protocol version) comes in. It reduces connection overhead, and improves system's overall throughput.

Replication

pgpool-II can manage multiple PostgreSQL servers. Using the replication function enables creating a realtime backup on 2 or more physical disks, so that the service can continue without stopping servers in case of a disk failure.

Load Balance

If a database is replicated, executing a SELECT query on any server will return the same result. pgpool-II takes an advantage of the replication feature to reduce the load on each PostgreSQL server by distributing SELECT queries among multiple servers, improving system's overall throughput. At the best, performance improves proportionally to the number of PostgreSQL servers. Load balance works the best in a situation where there are a lot of users executing many queries at the same time.

Parallel Query

Using the parallel query function, data can be divided among the multiple servers, so that a query can be executed on all the servers parallely to reduce the overall execution time. Parallel query works the best with searching large-scale data.

pgpool-II talks PostgreSQL's backend and frontend protocol, and relays a connection between them. Therefore, a database application (frontend) thinks that pgpool-II is the actual PostgreSQL server, and the server (backend) sees pgpool-II as one of clients. Because pgpool-II is transparent to both the server and the client, an existing database application can be used with pgpool-II almost without a change to its sources.

Supported Platforms

pgpool-II works on Linux, Solaris, FreeBSD, and most of the UNIX-like architectures. Windows is not supported. Supported PostgreSQL server's versions are 6.4 and higher. To use the parallel query, however, 7.4 and higher must be used.

pgpool-II Installation

Installing pgpool-II requires gcc 2.9 or higher, and GNU make. Also, pgpool-II links libpq library, so it must be installed on a machine used to build pgpool-II.

configure

After extracting the source tar ball, execute the configure script.

./configure

There are options that you can set if you want to set values other than the default.

--prefix=path
pgpool-II binaries and docs will be installed to this directory. Default value is /usr/local
--with-pgsql=path
Top directory where PostgreSQL's client libraries are installed. Default values is obtained by pg_config

make

make
make install

will install pgpool-II. (If you use FreeBSD, replace make with gmake)

Configuring pgpool-II

Configuration files for pgpool-II are /usr/local/etc/pgpool.conf and /usr/local/etc/pcp.conf by default. There are several operation modes in pgpool-II. Each mode has associated functions which can be enabled or disabled, and specific configuration parameters to control their behaviors.

Function/Mode	raw Mode	Connection Pool Mode	Replication Mode	Master/Slave Mode	Parallel Query Mode
Connection Pool	X	O	O	O	O
Replication	X	X	O	X	X
Load Balance	X	X	O	O	X
Degeneration	X	X	O	O	X
Failover	O	O	X	X	X
Parallel Query	X	X	X	X	O
Required # of Servers	1 or higher	1 or higher	2 or higher	2 or higher	2 or higher
System DB required?	no	no	no	no	yes

Configuring `pcp.conf`

pgpool-II provides the controll interface where an administrator can collect pgpool-II status, and terminate pgpool-II processes via network. The pcp.conf is the user/password file for authentication with the interface. All modes require the pcp.conf file to be set. After installing pgpool-II, $prefix/etc/pcp.conf.sample is created. Change the name of the file to pcp.conf and add your username and the password.

cp $prefix/etc/pcp.conf.sample $prefix/etc/pcp.conf

An empty line or a line starting with "#" is treated as a comment and will be ignored. The username and the password must be written on one line using the following format:

username:[password encrypted in md5]

[password encrypted in md5] can be produced with the $prefix/bin/pg_md5 command.

./pg_md5 foo
acbd18db4cc2f85cedef654fccc4a4d8

The pcp.conf file must be readable by the user who executes pgpool-II.

Configuring `pgpool.conf`

As described already, each operation mode has specific configuration parameters in pgpool.conf. After instaling pgpool-II, $prefix/etc/pgpool.conf.sample is created. Change the name of the file to pgpool.conf and edit the contents.

cp $prefix/etc/pgpool.conf.sample $prefix/etc/pgpool.conf

An empty line or a line starting with "#" is treated as a comment and will be ignored.

raw Mode

In the raw mode, clients simply connect to the PostgreSQL servers via pgpool-II. This mode is useful for simply limiting excess connections to the servers, or enabling failover with multiple servers.

listen_addresses

Specifies the address in hostname or IP address, which will be accepted by pgpool-II via TCP/IP network. '*' accepts all incoming connections. '' disables TCP/IP connections. Default is 'localhost'. Connections via UNIX domain socket are always accepted.

port

The port number where pgpool-II accepts connections. Default is 9999.

socket_dir

The directory path of the UNIX domain socket accepting connections for pgpool-II. Default is '/tmp'. Be aware that the socket might be deleted by cron. We recommend to set this value to '/var/run' or such directory.

pcp_port

The port number where PCP process accepts connections. Default is 9898.

pcp_socket_dir

The directory path of the UNIX domain socket accepting connections for PCP process. Default is '/tmp'. Be aware that the socket might be deleted by cron. We recommend to set this value to '/var/run' or such directory.

backend_socket_dir

The directory path of the PostgreSQL server's UNIX domain socket, which is used by pgpool-II to communicate with the server. Default is '/tmp'.

pcp_timeout

PCP connection timeout value in seconds. If a client doesn't respond within the set seconds, PCP closes the connection with the client. Default is 10 seconds. 0 means no timeout.

num_init_children

The number of preforked pgpool-II server processes. Default is 32. Please note that cancelling a query creates another connection to the backend; thus, a query cannot be cancelled if the connections are full. If you want to ensure that queries can be cancelled, set this values to twice the expected connections.

child_life_time

A pgpool-II child process' life time in seconds. When a child is idle for so many seconds, it is terminated and the new child will be created. This parameter is a measure to prevent memory leaks and other unexpected errors. Default values is 300 (5 minutes). 0 diables this function. Note that processes that have not accepted any connections are not applicable for this.

child_max_connections

A pgpool-II child process will be terminated after so many connections from clients. This parameter is useful on the server if it is too busy that child_life_time and connection_life_time are not effective.

logdir

The directory path of the logs created by pgpool-II. Currently, only the pgpool.pid file is created. Default is '/tmp'.

print_timestamp

Add timestamps to the logs when set to true. Default is true.

connection_cache

Caches connections when set to true. Default is true.

health_check_timeout

pgpool-II periodically tries to connect to the backends to detect any errors on the servers or networks. This error check procedure is called "health check". If an error is detected, pgpool-II tries to perform failover or degeneration. This parameter is to prevent the health check to wait for a long time in a case like network cable has been disconnected. The timeout values is in seconds. Default values is 20. 0 disables timeout (waits until TCP/IP timeout). The health check requires one (1) extra connection to each backend, so max_connections in the postgresql.conf needs to be incremented as needed.

health_check_period

This parameter specifies the interval between the health checks in seconds. Default is 0, which means health check is disabled.

health_check_user

The user name to perform health check. This user must exist in all the PostgreSQL backends.

ignore_leading_white_space

pgpool-II ignores white spaces at the beginning of SQL queries while in the load balance mode. It is useful for using APIs like DBI/DBD:Pg which adds white spaces against the user's will.

log_statement

Produces SQL log messages when true. This is similar to the log_statement option in PostgreSQL. It produces logs even if the debug option was not passed to pgpool-II at the startup.

log_hostname

If true, ps command status will show the client's hostname instead of an IP address. Also, if log_connections is enabled, hostname will be logged.

log_connections

If true, all incoming connections will be printed to the log.

enable_pool_hba

If true, use pool_hba.conf for client authentication. See Setting up pool_hba.conf for client authentication.

backend_hostname

Specifies the host name of the PostgreSQL backend. The empty string ('') means pgpool-II uses UNIX domain socket. Multiple backends can be specified by adding a number at the end of the parameter name (e.g.backend_hostname0). This number is referred to as "DB node ID", and it starts from 0. The backend which was given the DB node ID of 0 will be called "Master DB". When multiple backends are defined, the service can be continued even if the Master DB is down (not true in some modes). In this case, the youngest DB node ID alive will be the new Master DB.

If you plan to use only one PostgreSQL server, specify it by backend_hostname0.

backend_port

Specifies the port number of the backends. Multiple backends can be specified by adding a number at the end of the parameter name (e.g. backend_port0). If you plan to use only one PostgreSQL server, specify it by backend_port0.

backend_weight

Specifies the load balance ratio for the backends. Multiple backends can be specified by adding a number at the end of the parameter name (e.g. backend_weight). If you plan to use only one PostgreSQL server, specify it by backend_weight0. In the raw mode, set to 1.

Failover in the raw Mode

Failover can be performed in the raw mode if multiple servers are defined. pgpool-II usually accesses the backend specified by backend_hostname0 during the normal operation. If the backend_hostname0 fails for some reason, pgpool-II tries to access the backend specified by backend_hostname1. If that fails, pgpool-II tries the backend_hostname2, 3 and so on.

Connection Pool Mode

In the connection pool mode, all functions in raw mode and the connection pool function can be used. To enable this mode, set configuration parameters in the raw mode and the below.

max_pool

The maximum number of cached connections in the pgpool-II child processes. pgpool-II reuses the cached connection if an incoming connection is connecting to the same database by the same username. If not, pgpool-II creates a new connection to the backend. If the number of cached connections exceeds max_pool, the oldest connection will be discarded, and uses that slot for the new connection. Default value is 4. Please be aware that the number of connections from pgpool-II processes to the backend will be num_init_children * max_pool.

connection_life_time

Cached connections expiration time in seconds. The expired cached connection will be disconnected. Default is 0, which means the cached connections will not be disconnected.

reset_query_list

Specifies the SQL commands sent to the backend when exitting a session to reset the connection. Multiple commands can be specified by delimitting each by ";". Default is the following, but can be changed to suite your system.

      reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'

Commands differ in each PostgreSQL versions, so be careful especially in PostgreSQL 7.3 or prior. "ABORT" will not be executed outside the transaction block by PostgreSQL 7.4 or above.

Failover in the Connection Pool Mode

Failover in the connection pool mode is the same as in the raw mode.

Replication Mode

This mode enables data replication between the backends. The configuration parameters below must be set in addtion to everything above.

replication_mode

Setting true enables replication mode. Default is false.

replication_strict

When set to true, pgpool-II waits for the Master DB for all queries to be processed before moving on to the next backend. This option eliminates the risk of deadlocks, but the performance may degrade due to the lack of a parallelism. Default value is true.

replication_timeout

Specifies the timeout value in milliseconds, for detecting deadlocks while replication_strict is false. Default is 5000 (5 seconds), and 0 means no timeout.

load_balance_mode

When set to true, SELECT queries will be distributed to each backend for load balance. Default is false.

replication_stop_on_mismatch

When set to true, pgpool-II degenerates the backends and keeps the service only with the Master DB if data mismatch occurs. If false, pgpool-II just terminates the query. Default is false.

replicate_select

When set to true, pgpool-II replicate SELECTs. If false, pgpool-II only sends them to Master DB. Default is false.

insert_lock

Replicating a table with SERIAL data type, the SERIAL column value may differ between the backends. This problem is avoidable by locking the table explicitly (although, transactions' parallelism will be lost). To achieve this, however, the following change must be made:

      INSERT INTO ...

      BEGIN;
      LOCK TABLE ...
      INSERT INTO ...
      COMMIT;

Setting insert_lock to true, pgpool-II automatically adds the above queries each time INSERT is executed (if already in transaction, simply adds LOCK TABLE ....). The downside for this is that pgpool-II does not know which table has SERIAL columns, so all the INSERT statements will acquire table locks. To avoid this problem, there are two options:

set insert_lock to true, and add /*NO INSERT LOCK*/ at the beginning of an INSERT statement which you don't want to acquire the table lock.
set insert_lock to false, and add /*INSERT LOCK*/ at the beginning of an INSERT statement which you want to acquire the table lock.

Default value is false. If insert_lock is enabled, the regression test for PostgreSQL 8.0 will fail in transactions, privileges, rules and alter_table. The reasons for this is that pgpool-II tries to LOCK the VIEW for the rule test, and others will produce the following error message.

      ! ERROR: current transaction is aborted, commands ignored until
      end of transaction block

For example, the transactions test tries to INSERT into a table which does not exist, and pgpool-II causes PostgreSQL to acquire the lock before that. The transaction will be aborted, and the following INSERT statement produces the above error message.

condition for load balance

For the query to be load balanced, all the requirements below must be met:

PostgreSQL version 7.4 or later
the query must not be in the explicitly declared transaction (i.e. not in a BEGIN ~ END block)
it's not SELECT nextval or SELECT setval
it's not SELECT INTO
it's not SELECT FOR UPDATE
start with "SELECT" (ignore_leading_white_space = true will ignore leading white space)

Failover in the Replication Mode

pgpool-II degenerates a dead backend and continues the service. The service can be continued if there is at least one backend alive.

Master/Slave Mode

This mode is for using pgpool-II with another master/slave replication software (like Slony-I) making it do the actual data replication. DB nodes' information must be set as in the same manner as the replication mode. In addtion to that, set master_slave_mode and load_balance_mode to true. pgpool-II will send queries that need to be replicated to the Master DB, and others will be load balanced if possible.

In the master/slave mode, replication_mode must be set to false, and master_slave_mode to true.

Parallel Mode

This mode enables parallel execution of queries. The replication function or the load balance function can not be used while in this mode.

Configuring the System DB

To use the parallel mode, the System DB must be configured properly. The System DB maintains rules, in the format of the database table, for deciding an appropriate backend to send partitioned data. The System DB does not need to be placed on the same host as pgpool-II. The System DB's configuration is done in the pgpool.conf.

system_db_hostname: The hostname where the System DB is placed. Specifying the empty string ('') means the System DB is at the same host as the pgpool-II, and will be connected via UNIX domain socket.
system_db_port: The port number for the System DB
system_dbname: The partitioning rules and other information will be defined in the database specified here. Default value is 'pgpool'.
system_db_schema: The partitioning rules and other information will be defined in the schema specified here. Default value is 'pgpool_catalog'.
system_db_user: The username to connect to the System DB.
system_db_password: The password for the System DB. If no password is set, set the empty string ('').

Initial Configuration of the System DB

First, create the database and schema specified in the pgpool.conf file. An sample script can be found in $prefix/share/system_db.sql. If you have specified a different database name or schema, change them accordingly.

psql -f $prefix/share/system_db.sql pgpool

Registering a Partitioning Rule

The rules for data partitioning must be registered to pgpool_catalog.dist_def table.

CREATE TABLE pgpool_catalog.dist_def(
dbname TEXT,                                              -- database name
schema_name TEXT,                                         -- schema name
table_name TEXT,                                          -- table name
col_name TEXT NOT NULL CHECK (col_name = ANY (col_list)), -- partitioning key column name
col_list TEXT[] NOT NULL,                                 -- names of table attributes
type_list TEXT[] NOT NULL,                                -- types of table attributes
dist_def_func TEXT NOT NULL,                              -- name of the partitioning rule function
PRIMARY KEY (dbname,schema_name,table_name)
);

Here is the example for partitioning pgbench tables.

INSERT INTO pgpool_catalog.dist_def VALUES (
'pgpool',
'public',
'accounts',
'aid',
ARRAY['aid','bid','abalance','filler'],
ARRAY['integer','integer','integer','character(84)'],
'pgpool_catalog.dist_def_accounts'
);

INSERT INTO pgpool_catalog.dist_def VALUES (
'pgpool',
'public',
'branches',
'bid',
ARRAY['bid','bbalance','filler'],
ARRAY['integer','integer','character(84)'],
'pgpool_catalog.dist_def_branches'
);

INSERT INTO pgpool_catalog.dist_def VALUES (
'pgpool',
'public',
'tellers',
'tid',
ARRAY['tid','bid','tbalance','filler'],
ARRAY['integer','integer','integer','character(84)'],
'pgpool_catalog.dist_def_tellers'
);

The partitioning rule function (namely, pgpool_catalog.dist_def_accounts, pgpool_catalog.dist_def_branches, pgpool_catalog.dist_def_tellers) takes a value for the partitioning key column, and returns the corresponding DB node ID. Note that ID must start from 0. Below is the example of each function for pgbench.

CREATE OR REPLACE FUNCTION pgpool_catalog.dist_def_accounts (val ANYELEMENT) RETURNS INTEGER AS '
SELECT CASE WHEN $1 >= 1 and $1 <= 30000 THEN 0
WHEN $1 > 30000 and $1 <= 60000 THEN 1
ELSE 2
END' LANGUAGE SQL;

CREATE OR REPLACE FUNCTION pgpool_catalog.dist_def_branches (val ANYELEMENT) RETURNS INTEGER AS '
SELECT 0
' LANGUAGE SQL;

CREATE OR REPLACE FUNCTION pgpool_catalog.dist_def_tellers (val ANYELEMENT) RETURNS INTEGER AS '
SELECT CASE WHEN $1 >= 1 and $1 <= 3 THEN 0
WHEN $1 > 3 and $1 <= 6 THEN 1
ELSE 2
END' LANGUAGE SQL;

Setting up pool_hba.conf for client authentication (HBA)

Just like pg_hba.conf with PostgreSQL, pgpool supports a similar client authentication function using a configuration file called "pool_hba.conf".

When you install pgpool, pool_hba.conf.sample will be installed in "/usr/local/etc", which is the default directory for configuration files. Copy pool_hba.conf.sample as pool_hba.conf and edit it if necessary. By default, pool_hba authentication is enabled. See "6. Setting up pgpool.conf" for more detail.

The format of pool_hba.conf file follows very closely to pg_hba.conf.

    local      DATABASE  USER  METHOD  [OPTION]
    host       DATABASE  USER  CIDR-ADDRESS  METHOD  [OPTION]

See "pool_hba.conf.sample" for details of each field.

Here are the limitations of pool_hba.

"hostssl" connection type is not supported

"hostssl" is invalid since pgpool currently does not support SSL connections.

"samegroup" for DATABASE field is not supported

Since pgpool does not know anything about users in the backend server, database name is simply checked against the entries in the DATABASE field of pool_hba.conf.

group names following "+" for USER field is not supported

This is the same reason as in the "samegroup" described above. A user name is simply checked against the entries in the USER field of pool_hba.conf.

IPv6 for IP address/mask is not supported

pgpool currently does not support IPv6.

Only "trust", "reject" and "pam" for METHOD field are supported

Again, this is the same reason as in the "samegroup" described above. pgpool does not hold user/password information.

Note that everything described in this section is about a client authen- tication between a client and pgpool; a client still have to go through an authentication process with PostgreSQL. As far as pool_hba is concerned, it does not matter if a user name and/or database name given by a client (i.e. psql -U testuser testdb) really exist in the backend. pool_hba only cares if a match in the pool_hba.conf is found or not.

PAM authenticaion is supported using user information on the host where pgpool is executed. To enable PAM support in pgpool, specify "--with-pam" option to configure:

    configure --with-pam

To enable PAM authentication, you need to create a service-configuration file for pgpool in the system's PAM configuration directory (which is usually at "/etc/pam.d"). A sample service-configuration file is installed as "share/pgpool.pam" under the install directory.

Starting/Stopping pgpool-II

All the backends and the System DB (if necessary) must be started before starting pgpool-II.

pgpool [-c][-f config_file][-a hba_file][-F pcp_config_file][-n][-d]

-c	deletes query cache
-f config_file	specifies pgpool.conf
-a hba_file	specifies pool_hba.conf
-F pcp_config_file	specifies pcp.conf
-n	no daemon mode (terminal is not detached)
-d	debug mode

There are two ways to stop pgpool-II. One is via PCP command (described later) or pgpool-II command. Below is an example of the pgpool-II command.

pgpool [-f config_file][-F pcp_config_file] [-m {s[mart]|f[ast]|i[mmediate]}] stop

`-m s[mart]`	waits for clients to disconnect, and shutdown (default)
`-m f[ast]`	does not wait for clients; shutdown immediately
`-m i[mmediate]`	the same as `'-m f'`

Restrictions

Authentication / Access Controls

In the replication mode or master/slave mode, trust, clear text password, pam methods are supported.
In all the other modes, trust, clear text password, crypt, md5, pam methods are supported.
pgpool-II does not support pg_hba.conf-like access controls. If the TCP/IP connection is enabled, pgpool-II accepts all the connections from any host. If needed, use iptables and such to control access from other hosts. (PostgreSQL server accepting pgpool-II connections can use pg_hba.conf, of course).

Functions, etc. in the Replication Mode

There is no guarantee that the data, which returns different values each time even though the same query was executed (e.g. random number, transaction ID, OID, SERIAL, sequence, CURRENT_TIMESTAMP), will be replicated correctly on multiple backends.

Tables created by CREATE TEMP TABLE will not be deleted after exitting a session. It is because of the connetion pooling which, from the backend's point of view, seems that the session is still alive. To avoid this, you must explicitly drop the temporally tables by DROP TABLE, or use CREATE TEMP TABLE ... ON COMMIT DROP inside the transaction block.

Queries

Here are the queries which cannot be processed by pgpool-II

INSERT (for parallel mode)

You cannot use DEFAULT to the partitioning key column. For example, if the column x in the table t was the partitioning key column,

INSERT INTO t(x) VALUES (DEFAULT);

is invalid. Also, functions cannot be used as the value either.

INSERT INTO t(x) VALUES (func());

Constant values must be used to INSERT into the partitioning key. SELECT INTO and INSERT INTO ... SELECT are also not supported.

UPDATE (for parallel mode)

Data consistency between the backends may be lost if the partitioning key column values are updated. pgpool-II does not re-partition the updated data.

A transaction cannot be rolled back if a query has caused an error on some backends due to the constraint violation.

If a function is called in the WHERE clause, that query might not be executed correctly.

e.g.)  UPDATE branches set bid = 100 where bid = (select max(bid) from beances);

SELECT ... FOR UPDATE (for parallel mode)

If a function is called in the WHERE clause, that query might not be executed correctly.

e.g.)  SELECT * FROM  branches where bid = (select max(bid) from beances) FOR UPDATE;

COPY (for parallel mode)

COPY BINARY is not supported. Copying from files are also not supported. Only COPY FROM STDIN and COPY TO STDOUT is supported.

ALTER/CREATE TABLE (for parallel mode)

To update the partitioning rule, pgpool-II must be restarted in order to read them from the System DB.

Transaction (for parallel mode)

SELECT statements executed inside a transaction block will be executed in the separate transaction. Here is an example:

BEGIN;
INSERT INTO t(a) VALUES (1);
SELECT * FROM t ORDER BY a; <-- INSERT above is not visible from this SELECT statement
END;

Views / Rules (for parallel mode)

The same definition will be created on all the backends for views and rules.

SELECT * FROM a, b where a.i = b.i

JOINs like above will be executed within one backend, and then merged with the results returned by each backend. Views and Rules that joins across the nodes cannot be created.

Functions / Triggers (for parallel mode)

The same definition will be created on all the backends for functions. Joins across the nodes, and data on the other nodes cannot be manipulated inside the functions.

Extended Query Protocol (for parallel mode)

The extended query protocol used by JDBC drivers, etc. is not supported. The simple query protocols must be used.

Multi-byte Characters (for all modes)

pgpool-II does not translate between different multi-byte characters. The encoding for the client, backend and System DB must be the same.

Deadlocks (for parallel mode)

Deadlocks across the backends cannot be detected. For example:

(tellers table is partitioned using the following rule)
  tid <= 10  --> node 0
  tid >= 10  --> node 1

A) BEGIN;
B) BEGIN;
A) SELECT * FROM tellers WHERE tid = 11 FOR UPDATE;
B) SELECT * FROM tellers WHERE tid = 1 FOR UPDATE;
A) SELECT * FROM tellers WHERE tid = 1 FOR UPDATE;
B) SELECT * FROM tellers WHERE tid = 11 FOR UPDATE;

In the case above, a single node cannot detect the deadlock, so pgpool-II will continually wait for the response. This phenomenon occurs with other queries that acquire row level locks. To avoid this problem, set replication_timeout.

Also, if a deadlock occurs in one node, transaction states in each node will not be consistent. Therefore, pgpool-II terminates the process if a deadlock is detected.

pool_read_kind: kind does not match between master(84) slot[1] (69)

Schemas (for parallel mode)

Objects in a schema other than public must be fully referenced like:

schema.object

pgpool-II cannot resolve the correct schema when the path was set as follows:

set search_path = xxx

and the schema name was omitted in a query.

System DB

Partitioning Rules

Only one (1) partitioning key column can be defined in one partitioning rule. Conditions like 'x or y' are not supported.

Environment Requirements

libpq

libpq is linked while building pgpool-II. libpq version must be 3.0. Building pgpool-II with libpq version 2.0 will fail. Also, the System DB must be PostgreSQL 7.4 or later.

Query Cache

Currently, query cache must be deleted manually. pgpool-II does not invalidate old query cache automatically when the data is updated.

Compatibility with pgpool

References

PCP Command Reference

PCP Command List

PCP commands are UNIX commands which manipulate pgpool-II via network.

* pcp_node_count        - retrives the number of nodes
* pcp_node_info         - retrives the node information
* pcp_proc_count        - retrives the process list
* pcp_proc_info         - retrives the process information
* pcp_systemdb_info     - retrives the System DB information
* pcp_detach_node       - detaches a node from pgpool-II
* pcp_attach_node       - attaches a node to pgpool-II
* pcp_stop_pgpool       - stops pgpool-II

Common Command-line Arguments

There are five arguments common to all of the PCP commands. They are information about pgpool-II and for authentication. Extra arguments may be needed for some commands.

e.g.)  $ pcp_node_count 10 localhost 9898 postgres hogehoge

First argument    - timeout value in seconds. PCP disconnects if
                    pgpool-II does not respond in so many seconds.
Second argument   - pgpool-II's hostname
Third argument    - pgpool-II's port number for PCP server
Fourth argument   - PCP username
Fifth argument    - PCP password

PCP usernames and passwords must use ones in the pcp.conf in $prefix/etc directory. -F option can be used when starting pgpool-II if pcp.conf is placed somewhere else. The password does not need to be in md5 format when passing it to the PCP commands.

PCP Commands

All PCP commands display the results to the standard output.

pcp_node_count

Format:
pcp_node_count  _timeout_  _host_  _port_  _userid_  _passwd_

Displays the number of total nodes defined in pgpool.conf. It does not distinguish nodes status between attached/detached. ALL nodes are counted.

pcp_node_info

Format:
pcp_node_info  _timeout_  _host_  _port_  _userid_  _passwd_  _nodeid_

Displays the information on the given node ID. The output example is as follows:

$ pcp_node_info 10 localhost 9898 postgres hogehoge 0
host1 5432 1 1073741823.500000

The result is in the following order: 1. hostname 2. port number 3. status 4. load balance weight Status is represented by a digit from [0 to 3]. 0 - This state is only used during the initialization. PCP will never display it. 1 - Node is up. No connections yet. 2 - Node is up. Connections are pooled. 3 - Node is down. The load balance weight is displayed in normalized format.

Specifying an invalid node ID will result in error with exit status 12, and BackendError will be displayed.

pcp_proc_count

Format:
pcp_proc_count  _timeout_  _host_  _port_  _userid_  _passwd_

Displays the list of pgpool-II child process IDs. If there is more than one process, IDs will be delimitted by a white space.

pcp_proc_info

Format:
pcp_proc_info  _timeout_  _host_  _port_  _userid_  _passwd_  _processid_

Displays the information on the given pgpool-II child process ID. The output example is as follows:

$ pcp_proc_info 10 localhost 9898 postgres hogehoge 3815
postgres_db postgres 1150769932 1150767351 3 0 1

The result is in the following order: 1. connected database name 2. connected username 3. process start-up timestamp 4. connection created timestamp 5. protocol major version 6. protocol minor version 7. connection-reuse counter If there is no connection to the backends, nothing will be displayed. If there are multiple connections, one connection's information will be this displayed on each line for multiple times. Timestamps are displayed in EPOCH format.

Specifying an invalid node ID will result in error with exit status 12, and BackendError will be displayed.

pcp_systemdb_info

Format:
pcp_systemdb_info  _timeout_  _host_  _port_  _userid_  _passwd_

Displays the System DB information. The output example is as follows:

$ pcp_systemdb_info 10 localhost 9898 postgres hogehoge
localhost 5432 yamaguti '' pgpool_catalog pgpool 3
yamaguti public accounts aid 4 aid bid abalance filler integer integer integer character(84) dist_def_accounts
yamaguti public branches bid 3 bid bbalance filler integer integer character(84) dist_def_branches
yamaguti public tellers bid 4 tid bid tbalance filler integer integer integer character(84) dist_def_tellers

First, the System DB information will be displayed on the first line. The result is in the following order: 1. hostname 2. port number 3. username 4. password. '' for no password. 5. schema name 6. database name 7. number of partioning rules defined Second, partioning rules will be displayed on the following lines. If there are multiple definitions, one definition will be displayed on each line for multiple times. The result is in the following order: 1. targeted partitioning database name 2. targeted partitioning schema name 3. targeted partitioning table name 4. partitioning key column name 5. number of columns in the targeted table 6. column names (displayed as many as said in 5.) 7. column types (displayed as many as said in 5.) 8. partitioning rule function name If the System DB is not defined (i.e. not in pgpool-II mode, and query cache is disabled), it results in error with exit status 12, and BackendError will be displayed.

pcp_detach_node

Format:
pcp_detach_node  _timeout_  _host_  _port_  _userid_  _passwd_  _nodeid_

Detaches the given node from pgpool-II.

pcp_attach_node

Format:
pcp_attach_node  _timeout_  _host_  _port_  _userid_  _passwd_  _nodeid_

Attaches the given node to pgpool-II.

pcp_stop_pgpool

Format:
pcp_stop_pgpool  _timeout_  _host_  _port_  _userid_  _passwd_  _mode_

Terminate pgpool-II process with the given shutdown mode. The availabe modes are as follows: s - smart mode f - fast mode i - immediate mode If pgpool-II process does not exist, it results in error with exit status 8, and ConnectionError will be displayed. * Currently, there is no difference between the fast and immediate mode. pgpool-II terminates all the processes wheter there are clients connected to the backends.

Exit Status

PCP command exits with status 0 when everything goes well. If any error had occured, it will exit with the following error status.

UNKNOWNERR      1      Unknown Error (shouldn't occur)
EOFERR          2      EOF Error
NOMEMERR        3      Memory shortage
READERR         4      Error while reading from the server
WRITEERR        5      Error while writing to the server
TIMEOUTERR      6      Timeout
INVALERR        7      Argument(s) to the PCP command was invalid
CONNERR         8      Server connection error
NOCONNERR       9      No connection exists
SOCKERR         10     Socket error
HOSTERR         11     Hostname resolution error
BACKENDERR      12     PCP process error on the server (specifying an invalid ID, etc.)
AUTHERR         13     Authorization failure

What is pgpool-II?

Supported Platforms

pgpool-II Installation

Configuring pgpool-II

Configuring pcp.conf

Configuring pgpool.conf

raw Mode

Failover in the raw Mode

Connection Pool Mode

Failover in the Connection Pool Mode

Replication Mode

condition for load balance

Failover in the Replication Mode

Master/Slave Mode

Parallel Mode

Configuring the System DB

Initial Configuration of the System DB

Registering a Partitioning Rule

Setting up pool_hba.conf for client authentication (HBA)

Starting/Stopping pgpool-II

Restrictions

Authentication / Access Controls

Functions, etc. in the Replication Mode

Queries

INSERT (for parallel mode)

UPDATE (for parallel mode)

SELECT ... FOR UPDATE (for parallel mode)

COPY (for parallel mode)

ALTER/CREATE TABLE (for parallel mode)

Transaction (for parallel mode)

Views / Rules (for parallel mode)

Functions / Triggers (for parallel mode)

Extended Query Protocol (for parallel mode)

Multi-byte Characters (for all modes)

Deadlocks (for parallel mode)

Schemas (for parallel mode)

System DB

Partitioning Rules

Environment Requirements

libpq

Query Cache

Compatibility with pgpool

References

PCP Command Reference

PCP Command List

Common Command-line Arguments

PCP Commands

pcp_node_count

pcp_node_info

pcp_proc_count

pcp_proc_info

pcp_systemdb_info

pcp_detach_node

pcp_attach_node

pcp_stop_pgpool

Exit Status

Configuring `pcp.conf`

Configuring `pgpool.conf`