What is pgpool-II?
pgpool-II is a middleware that works between PostgreSQL servers
and a PostgreSQL database client. It provides following features.
- Limiting Excess Connections
There is a limit on the maximum number of concurrent
connections with PostgreSQL, and connections are rejected after so
many connections. Setting the maximum number of connections, however,
increases resource consumption and affect system
performance. pgpool-II also has a limit on the maximum number of
connections, but extra connections will be queued instead of returning
an error immediately.
- Connection Pooling
pgpool-II saves connections to the PostgreSQL servers, and reuse
them whenever a new connection with the same properties
(i.e. username, database, protocol version) comes in. It reduces
connection overhead, and improves system's overall throughput.
- Replication
pgpool-II can manage multiple PostgreSQL servers. Using the
replication function enables creating a realtime backup on 2 or
more physical disks, so that the service can continue without stopping
servers in case of a disk failure.
- Load Balance
If a database is replicated, executing a SELECT query on any
server will return the same result. pgpool-II takes an advantage of
the replication feature to reduce the load on each PostgreSQL server
by distributing SELECT queries among multiple servers, improving
system's overall throughput. At the best, performance improves
proportionally to the number of PostgreSQL servers. Load balance works
the best in a situation where there are a lot of users executing many
queries at the same time.
- Parallel Query
Using the parallel query function, data can be divided
among the multiple servers, so that a query can be executed on all the
servers parallely to reduce the overall execution time. Parallel query
works the best with searching large-scale data.
pgpool-II talks PostgreSQL's backend and frontend protocol, and
relays a connection between them. Therefore, a database application
(frontend) thinks that pgpool-II is the actual PostgreSQL server, and
the server (backend) sees pgpool-II as one of clients. Because
pgpool-II is transparent to both the server and the client, an
existing database application can be used with pgpool-II almost
without a change to its sources.
Supported Platforms
pgpool-II works on Linux, Solaris, FreeBSD, and most of the
UNIX-like architectures. Windows is not supported. Supported
PostgreSQL server's versions are 6.4 and higher. To use the parallel
query, however, 7.4 and higher must be used.
pgpool-II Installation
Installing pgpool-II requires gcc 2.9 or higher, and GNU
make. Also, pgpool-II links libpq library, so it must be installed on
a machine used to build pgpool-II.
- configure
-
After extracting the source tar ball, execute the configure script.
./configure
There are options that you can set if you want to set values other
than the default.
--prefix=path
pgpool-II binaries and docs will be installed to this
directory. Default value is /usr/local
--with-pgsql=path
Top directory where PostgreSQL's client libraries are
installed. Default values is obtained by pg_config
- make
-
make
make install
will install pgpool-II. (If you use FreeBSD, replace make with gmake)
Configuring pgpool-II
Configuration files for pgpool-II are
/usr/local/etc/pgpool.conf and
/usr/local/etc/pcp.conf by default. There are several
operation modes in pgpool-II. Each mode has associated functions which
can be enabled or disabled, and specific configuration parameters to
control their behaviors.
Function/Mode |
raw Mode |
Connection Pool Mode |
Replication Mode |
Master/Slave Mode |
Parallel Query Mode |
Connection Pool |
X |
O |
O |
O |
O |
Replication |
X |
X |
O |
X |
X |
Load Balance |
X |
X |
O |
O |
X |
Degeneration |
X |
X |
O |
O |
X |
Failover |
O |
O |
X |
X |
X |
Parallel Query |
X |
X |
X |
X |
O |
Required # of Servers |
1 or higher |
1 or higher |
2 or higher |
2 or higher |
2 or higher |
System DB required? |
no |
no |
no |
no |
yes |
Configuring pcp.conf
pgpool-II provides the controll interface where an administrator
can collect pgpool-II status, and terminate pgpool-II processes via
network. The pcp.conf is the user/password file for
authentication with the interface. All modes require the
pcp.conf file to be set. After installing pgpool-II,
$prefix/etc/pcp.conf.sample is created. Change the name
of the file to pcp.conf and add your username and the
password.
cp $prefix/etc/pcp.conf.sample $prefix/etc/pcp.conf
An empty line or a line starting with "# " is treated as a
comment and will be ignored. The username and the password must be
written on one line using the following format:
username:[password encrypted in md5]
[password encrypted in md5] can be produced with the
$prefix/bin/pg_md5 command.
./pg_md5 foo
acbd18db4cc2f85cedef654fccc4a4d8
The pcp.conf file must be readable by the user who
executes pgpool-II.
Configuring pgpool.conf
As described already, each operation mode has specific
configuration parameters in pgpool.conf . After instaling
pgpool-II, $prefix/etc/pgpool.conf.sample is
created. Change the name of the file to pgpool.conf and
edit the contents.
cp $prefix/etc/pgpool.conf.sample $prefix/etc/pgpool.conf
An empty line or a line starting with "#" is treated as a
comment and will be ignored.
raw Mode
In the raw mode, clients simply connect to the PostgreSQL servers
via pgpool-II. This mode is useful for simply limiting excess
connections to the servers, or enabling failover with multiple
servers.
- listen_addresses
-
Specifies the address in hostname or IP address, which will
be accepted by pgpool-II via TCP/IP network. '*' accepts
all incoming connections. '' disables TCP/IP
connections. Default is 'localhost' . Connections via UNIX
domain socket are always accepted.
- port
-
The port number where pgpool-II accepts connections. Default
is 9999.
- socket_dir
-
The directory path of the UNIX domain socket accepting
connections for pgpool-II. Default is '/tmp' . Be
aware that the socket might be deleted by cron. We recommend to
set this value to '/var/run' or such directory.
- pcp_port
-
The port number where PCP process accepts
connections. Default is 9898.
- pcp_socket_dir
-
The directory path of the UNIX domain socket accepting
connections for PCP process. Default is '/tmp' . Be
aware that the socket might be deleted by cron. We recommend to
set this value to '/var/run' or such directory.
- backend_socket_dir
-
The directory path of the PostgreSQL server's UNIX domain
socket, which is used by pgpool-II to communicate with the
server. Default is '/tmp' .
- pcp_timeout
- p
PCP connection timeout value in seconds. If a client doesn't
respond within the set seconds, PCP closes the connection with
the client. Default is 10 seconds. 0 means no timeout.
- num_init_children
-
The number of preforked pgpool-II server processes. Default
is 32. Please note that cancelling a query creates another
connection to the backend; thus, a query cannot be cancelled if
the connections are full. If you want to ensure that queries can
be cancelled, set this values to twice the expected connections.
- child_life_time
-
A pgpool-II child process' life time in seconds. When a child
is idle for so many seconds, it is terminated and the new child
will be created. This parameter is a measure to prevent memory
leaks and other unexpected errors. Default values is 300 (5
minutes). 0 diables this function. Note that processes that have
not accepted any connections are not applicable for this.
- child_max_connections
-
A pgpool-II child process will be terminated after so many
connections from clients. This parameter is useful on the server
if it is too busy that child_life_time and connection_life_time
are not effective.
- logdir
-
The directory path of the logs created by
pgpool-II. Currently, only the pgpool.pid file is
created. Default is '/tmp' .
- print_timestamp
-
Add timestamps to the logs when set to true. Default is
true.
- connection_cache
-
Caches connections when set to true. Default is
true.
- health_check_timeout
-
pgpool-II periodically tries to connect to the backends to
detect any errors on the servers or networks. This error check
procedure is called "health check". If an error is detected,
pgpool-II tries to perform failover or degeneration.
This parameter is to prevent the health check to wait for a long
time in a case like network cable has been disconnected. The
timeout values is in seconds. Default values is 20. 0 disables
timeout (waits until TCP/IP timeout).
The health check requires one (1) extra connection to each
backend, so max_connections in the
postgresql.conf needs to be incremented as
needed.
- health_check_period
-
This parameter specifies the interval between the health
checks in seconds. Default is 0, which means health check is
disabled.
- health_check_user
-
The user name to perform health check. This user must exist
in all the PostgreSQL backends.
- ignore_leading_white_space
-
pgpool-II ignores white spaces at the beginning of SQL
queries while in the load balance mode. It is useful for using
APIs like DBI/DBD:Pg which adds white spaces against the user's
will.
- log_statement
-
Produces SQL log messages when true. This is similar to the
log_statement option in PostgreSQL. It produces logs even if the
debug option was not passed to pgpool-II at the startup.
- log_hostname
-
If true, ps command status will show the client's hostname instead
of an IP address. Also, if log_connections is enabled, hostname will
be logged.
- log_connections
-
If true, all incoming connections will be printed to the log.
- enable_pool_hba
-
If true, use pool_hba.conf for client authentication. See
Setting up pool_hba.conf for client authentication.
- backend_hostname
-
Specifies the host name of the PostgreSQL backend. The empty
string ('' ) means pgpool-II uses UNIX domain
socket.
Multiple backends can be specified by adding a number at the end
of the parameter name (e.g.backend_hostname0 ). This
number is referred to as "DB node ID", and it starts from 0. The
backend which was given the DB node ID of 0 will be called
"Master DB". When multiple backends are defined, the service can
be continued even if the Master DB is down (not true in some
modes). In this case, the youngest DB node ID alive will be the
new Master DB.
If you plan to use only one PostgreSQL server, specify it by
backend_hostname0 .
- backend_port
-
Specifies the port number of the backends. Multiple backends
can be specified by adding a number at the end of the parameter
name (e.g. backend_port0 ). If you plan to use only
one PostgreSQL server, specify it by
backend_port0 .
- backend_weight
-
Specifies the load balance ratio for the backends. Multiple
backends can be specified by adding a number at the end of the
parameter name (e.g. backend_weight ). If you plan
to use only one PostgreSQL server, specify it by
backend_weight0 . In the raw mode, set to 1.
Failover in the raw Mode
Failover can be performed in the raw mode if multiple servers are
defined. pgpool-II usually accesses the backend specified by
backend_hostname0 during the normal operation. If the
backend_hostname0 fails for some reason, pgpool-II tries to access the
backend specified by backend_hostname1. If that fails, pgpool-II tries
the backend_hostname2, 3 and so on.
Connection Pool Mode
In the connection pool mode, all functions in raw mode and the
connection pool function can be used. To enable this mode, set
configuration parameters in the raw mode and the below.
- max_pool
-
The maximum number of cached connections in the pgpool-II
child processes. pgpool-II reuses the cached connection if an
incoming connection is connecting to the same database by the
same username. If not, pgpool-II creates a new connection to the
backend. If the number of cached connections exceeds max_pool,
the oldest connection will be discarded, and uses that slot for
the new connection.
Default value is 4. Please be aware that the number of
connections from pgpool-II processes to the backend will be
num_init_children * max_pool .
- connection_life_time
-
Cached connections expiration time in seconds. The expired
cached connection will be disconnected. Default is 0, which
means the cached connections will not be disconnected.
- reset_query_list
-
Specifies the SQL commands sent to the backend when exitting
a session to reset the connection. Multiple commands can be
specified by delimitting each by ";". Default is
the following, but can be changed to suite your system.
reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
Commands differ in each PostgreSQL versions, so be careful
especially in PostgreSQL 7.3 or prior. "ABORT" will not be
executed outside the transaction block by PostgreSQL 7.4 or
above.
Failover in the Connection Pool Mode
Failover in the connection pool mode is the same as in the raw mode.
Replication Mode
This mode enables data replication between the backends. The
configuration parameters below must be set in addtion to everything above.
- replication_mode
-
Setting true enables replication mode. Default
is false.
- replication_strict
-
When set to true, pgpool-II waits for the Master
DB for all queries to be processed before moving on to the next
backend. This option eliminates the risk of deadlocks, but the
performance may degrade due to the lack of a
parallelism. Default value is true.
- replication_timeout
-
Specifies the timeout value in milliseconds, for detecting
deadlocks while replication_strict is
false. Default is 5000 (5 seconds), and 0 means no
timeout.
- load_balance_mode
-
When set to true, SELECT queries will be
distributed to each backend for load balance. Default is
false.
- replication_stop_on_mismatch
-
When set to true, pgpool-II degenerates the
backends and keeps the service only with the Master DB if data
mismatch occurs. If false, pgpool-II just
terminates the query. Default is false.
- replicate_select
-
When set to true, pgpool-II replicate SELECTs. If false,
pgpool-II only sends them to Master DB. Default is false.
- insert_lock
-
Replicating a table with SERIAL data type, the SERIAL column
value may differ between the backends. This problem is avoidable by
locking the table explicitly (although, transactions' parallelism will
be lost). To achieve this, however, the following change must be made:
INSERT INTO ...
to
BEGIN;
LOCK TABLE ...
INSERT INTO ...
COMMIT;
Setting insert_lock to true, pgpool-II
automatically adds the above queries each time INSERT is
executed (if already in transaction, simply adds LOCK TABLE
....).
The downside for this is that pgpool-II does not know which
table has SERIAL columns, so all the INSERT statements will
acquire table locks.
To avoid this problem, there are two options:
- set
insert_lock to true, and add /*NO
INSERT LOCK*/ at the beginning of an INSERT
statement which you don't want to acquire the table
lock.
- set
insert_lock to false, and add
/*INSERT LOCK*/ at the beginning of an INSERT
statement which you want to acquire the table lock.
Default value is false. If insert_lock is enabled,
the regression test for PostgreSQL 8.0 will fail in transactions,
privileges, rules and alter_table. The reasons for this is that
pgpool-II tries to LOCK the VIEW for the rule test, and others will
produce the following error message.
! ERROR: current transaction is aborted, commands ignored until
end of transaction block
For example, the transactions test tries to INSERT into a table
which does not exist, and pgpool-II causes PostgreSQL to acquire the
lock before that. The transaction will be aborted, and the following
INSERT statement produces the above error message.
condition for load balance
For the query to be load balanced, all the requirements below
must be met:
- PostgreSQL version 7.4 or later
- the query must not be in the explicitly declared
transaction (i.e. not in a BEGIN ~ END block)
- it's not SELECT nextval or SELECT setval
- it's not SELECT INTO
- it's not SELECT FOR UPDATE
- start with "SELECT" (ignore_leading_white_space = true will
ignore leading white space)
Failover in the Replication Mode
pgpool-II degenerates a dead backend and continues the
service. The service can be continued if there is at least one backend
alive.
Master/Slave Mode
This mode is for using pgpool-II with another master/slave
replication software (like Slony-I) making it do the actual data
replication. DB nodes' information must be set as in the same manner
as the replication mode. In addtion to that, set
master_slave_mode and load_balance_mode to
true. pgpool-II will send queries that need to be replicated to the
Master DB, and others will be load balanced if possible.
In the master/slave mode, replication_mode must be set
to false, and master_slave_mode to true.
Parallel Mode
This mode enables parallel execution of queries. The replication
function or the load balance function can not be used while in this
mode.
Configuring the System DB
To use the parallel mode, the System DB must be configured
properly. The System DB maintains rules, in the format of the database
table, for deciding an appropriate backend to send partitioned
data. The System DB does not need to be placed on the same host as
pgpool-II. The System DB's configuration is done in the
pgpool.conf .
- system_db_hostname
-
The hostname where the System DB is placed. Specifying the
empty string ('') means the System DB is at the same host as the
pgpool-II, and will be connected via UNIX domain socket.
- system_db_port
-
The port number for the System DB
- system_dbname
-
The partitioning rules and other information will be defined
in the database specified here. Default value is
'pgpool' .
- system_db_schema
-
The partitioning rules and other information will be defined
in the schema specified here. Default value is
'pgpool_catalog' .
- system_db_user
-
The username to connect to the System DB.
- system_db_password
-
The password for the System DB. If no password is set,
set the empty string ('').
Initial Configuration of the System DB
First, create the database and schema specified in the
pgpool.conf file. An sample script can be found in
$prefix/share/system_db.sql . If you have specified a
different database name or schema, change them accordingly.
psql -f $prefix/share/system_db.sql pgpool
Registering a Partitioning Rule
The rules for data partitioning must be registered to
pgpool_catalog.dist_def table.
CREATE TABLE pgpool_catalog.dist_def(
dbname TEXT, -- database name
schema_name TEXT, -- schema name
table_name TEXT, -- table name
col_name TEXT NOT NULL CHECK (col_name = ANY (col_list)), -- partitioning key column name
col_list TEXT[] NOT NULL, -- names of table attributes
type_list TEXT[] NOT NULL, -- types of table attributes
dist_def_func TEXT NOT NULL, -- name of the partitioning rule function
PRIMARY KEY (dbname,schema_name,table_name)
);
Here is the example for partitioning pgbench tables.
INSERT INTO pgpool_catalog.dist_def VALUES (
'pgpool',
'public',
'accounts',
'aid',
ARRAY['aid','bid','abalance','filler'],
ARRAY['integer','integer','integer','character(84)'],
'pgpool_catalog.dist_def_accounts'
);
INSERT INTO pgpool_catalog.dist_def VALUES (
'pgpool',
'public',
'branches',
'bid',
ARRAY['bid','bbalance','filler'],
ARRAY['integer','integer','character(84)'],
'pgpool_catalog.dist_def_branches'
);
INSERT INTO pgpool_catalog.dist_def VALUES (
'pgpool',
'public',
'tellers',
'tid',
ARRAY['tid','bid','tbalance','filler'],
ARRAY['integer','integer','integer','character(84)'],
'pgpool_catalog.dist_def_tellers'
);
The partitioning rule function (namely,
pgpool_catalog.dist_def_accounts, pgpool_catalog.dist_def_branches,
pgpool_catalog.dist_def_tellers) takes a value for the partitioning
key column, and returns the corresponding DB node ID. Note that ID
must start from 0. Below is the example of each function for pgbench.
CREATE OR REPLACE FUNCTION pgpool_catalog.dist_def_accounts (val ANYELEMENT) RETURNS INTEGER AS '
SELECT CASE WHEN $1 >= 1 and $1 <= 30000 THEN 0
WHEN $1 > 30000 and $1 <= 60000 THEN 1
ELSE 2
END' LANGUAGE SQL;
CREATE OR REPLACE FUNCTION pgpool_catalog.dist_def_branches (val ANYELEMENT) RETURNS INTEGER AS '
SELECT 0
' LANGUAGE SQL;
CREATE OR REPLACE FUNCTION pgpool_catalog.dist_def_tellers (val ANYELEMENT) RETURNS INTEGER AS '
SELECT CASE WHEN $1 >= 1 and $1 <= 3 THEN 0
WHEN $1 > 3 and $1 <= 6 THEN 1
ELSE 2
END' LANGUAGE SQL;
Setting up pool_hba.conf for client authentication (HBA)
Just like pg_hba.conf with PostgreSQL, pgpool supports a similar
client authentication function using a configuration file called
"pool_hba.conf".
When you install pgpool, pool_hba.conf.sample will be installed in
"/usr/local/etc", which is the default directory for configuration
files. Copy pool_hba.conf.sample as pool_hba.conf and edit it if necessary.
By default, pool_hba authentication is enabled. See "6. Setting up
pgpool.conf" for more detail.
The format of pool_hba.conf file follows very closely to pg_hba.conf.
local DATABASE USER METHOD [OPTION]
host DATABASE USER CIDR-ADDRESS METHOD [OPTION]
See "pool_hba.conf.sample" for details of each field.
Here are the limitations of pool_hba.
- "hostssl" connection type is not supported
"hostssl" is invalid since pgpool currently does not support SSL
connections.
- "samegroup" for DATABASE field is not supported
Since pgpool does not know anything about users in the backend server,
database name is simply checked against the entries in the DATABASE
field of pool_hba.conf.
- group names following "+" for USER field is not supported
This is the same reason as in the "samegroup" described above. A
user name is simply checked against the entries in the USER field
of pool_hba.conf.
- IPv6 for IP address/mask is not supported
pgpool currently does not support IPv6.
- Only "trust", "reject" and "pam" for METHOD field are supported
Again, this is the same reason as in the "samegroup" described above.
pgpool does not hold user/password information.
Note that everything described in this section is about a client authen-
tication between a client and pgpool; a client still have to go through
an authentication process with PostgreSQL. As far as pool_hba is concerned,
it does not matter if a user name and/or database name given by a client
(i.e. psql -U testuser testdb) really exist in the backend. pool_hba only
cares if a match in the pool_hba.conf is found or not.
PAM authenticaion is supported using user information on the host where
pgpool is executed. To enable PAM support in pgpool, specify "--with-pam"
option to configure:
configure --with-pam
To enable PAM authentication, you need to create a
service-configuration file for pgpool in the system's PAM
configuration directory (which is usually at "/etc/pam.d"). A sample
service-configuration file is installed as "share/pgpool.pam" under
the install directory.
Starting/Stopping pgpool-II
All the backends and the System DB (if necessary) must be started
before starting pgpool-II.
pgpool [-c][-f config_file][-a hba_file][-F pcp_config_file][-n][-d]
-c | deletes query cache |
-f config_file | specifies pgpool.conf |
-a hba_file | specifies pool_hba.conf |
-F pcp_config_file | specifies pcp.conf |
-n | no daemon mode (terminal is not detached) |
-d | debug mode |
There are two ways to stop pgpool-II. One is via PCP command
(described later) or pgpool-II command. Below is an example of the
pgpool-II command.
pgpool [-f config_file][-F pcp_config_file] [-m {s[mart]|f[ast]|i[mmediate]}] stop
-m s[mart] | waits for clients to
disconnect, and shutdown (default) |
-m f[ast] | does not wait for clients;
shutdown immediately |
-m i[mmediate] | the same as '-m
f' |
Restrictions
Authentication / Access Controls
- In the replication mode or master/slave mode, trust, clear text
password, pam methods are supported.
- In all the other modes, trust, clear text password, crypt, md5,
pam methods are supported.
- pgpool-II does not support pg_hba.conf-like access controls. If
the TCP/IP connection is enabled, pgpool-II accepts all the
connections from any host. If needed, use iptables and such to
control access from other hosts. (PostgreSQL server accepting
pgpool-II connections can use pg_hba.conf, of course).
Functions, etc. in the Replication Mode
There is no guarantee that the data, which returns different values
each time even though the same query was executed (e.g. random number,
transaction ID, OID, SERIAL, sequence, CURRENT_TIMESTAMP), will be
replicated correctly on multiple backends.
Tables created by CREATE TEMP TABLE will not be
deleted after exitting a session. It is because of the connetion
pooling which, from the backend's point of view, seems that the
session is still alive. To avoid this, you must explicitly drop the
temporally tables by DROP TABLE , or use CREATE TEMP
TABLE ... ON COMMIT DROP inside the transaction block.
Queries
Here are the queries which cannot be processed by pgpool-II
INSERT (for parallel mode) You cannot use DEFAULT to the
partitioning key column. For example, if the column x in the table t
was the partitioning key column,
INSERT INTO t(x) VALUES (DEFAULT);
is invalid. Also, functions cannot be used as the value either.
INSERT INTO t(x) VALUES (func());
Constant values must be used to INSERT into the partitioning
key. SELECT INTO and INSERT INTO ... SELECT
are also not supported.
UPDATE (for parallel mode)
Data consistency between the backends may be lost if the
partitioning key column values are updated. pgpool-II does not
re-partition the updated data.
A transaction cannot be rolled back if a query has caused an error
on some backends due to the constraint violation.
If a function is called in the WHERE clause, that
query might not be executed correctly.
e.g.) UPDATE branches set bid = 100 where bid = (select max(bid) from beances);
SELECT ... FOR UPDATE (for parallel mode)
If a function is called in the WHERE clause, that
query might not be executed correctly.
e.g.) SELECT * FROM branches where bid = (select max(bid) from beances) FOR UPDATE;
COPY (for parallel mode)
COPY BINARY is not supported. Copying from files are
also not supported. Only COPY FROM STDIN and COPY
TO STDOUT is supported.
ALTER/CREATE TABLE (for parallel mode)
To update the partitioning rule, pgpool-II must be restarted in
order to read them from the System DB.
Transaction (for parallel mode)
SELECT statements executed inside a transaction block
will be executed in the separate transaction. Here is an example:
BEGIN;
INSERT INTO t(a) VALUES (1);
SELECT * FROM t ORDER BY a; <-- INSERT above is not visible from this SELECT statement
END;
Views / Rules (for parallel mode)
The same definition will be created on all the backends for views and rules.
SELECT * FROM a, b where a.i = b.i
JOIN s like above will be executed within one backend, and
then merged with the results returned by each backend. Views and Rules
that joins across the nodes cannot be created.
Functions / Triggers (for parallel mode)
The same definition will be created on all the backends for
functions. Joins across the nodes, and data on the other nodes cannot
be manipulated inside the functions.
Extended Query Protocol (for parallel mode)
The extended query protocol used by JDBC drivers, etc. is not
supported. The simple query protocols must be used.
Multi-byte Characters (for all modes)
pgpool-II does not translate between different multi-byte
characters. The encoding for the client, backend and System DB must be
the same.
Deadlocks (for parallel mode)
Deadlocks across the backends cannot be detected. For example:
(tellers table is partitioned using the following rule)
tid <= 10 --> node 0
tid >= 10 --> node 1
A) BEGIN;
B) BEGIN;
A) SELECT * FROM tellers WHERE tid = 11 FOR UPDATE;
B) SELECT * FROM tellers WHERE tid = 1 FOR UPDATE;
A) SELECT * FROM tellers WHERE tid = 1 FOR UPDATE;
B) SELECT * FROM tellers WHERE tid = 11 FOR UPDATE;
In the case above, a single node cannot detect the deadlock, so
pgpool-II will continually wait for the response. This phenomenon
occurs with other queries that acquire row level locks. To avoid this
problem, set replication_timeout .
Also, if a deadlock occurs in one node, transaction states in each
node will not be consistent. Therefore, pgpool-II terminates the
process if a deadlock is detected.
pool_read_kind: kind does not match between master(84) slot[1] (69)
Schemas (for parallel mode)
Objects in a schema other than public must be fully referenced like:
schema.object
pgpool-II cannot resolve the correct schema when the path was set as
follows:
set search_path = xxx
and the schema name was omitted in a query.
System DB
Partitioning Rules
Only one (1) partitioning key column can be defined in one
partitioning rule. Conditions like 'x or y' are not supported.
Environment Requirements
libpq
libpq is linked while building pgpool-II. libpq
version must be 3.0. Building pgpool-II with libpq version 2.0 will
fail. Also, the System DB must be PostgreSQL 7.4 or later.
Query Cache
Currently, query cache must be deleted manually. pgpool-II does not
invalidate old query cache automatically when the data is updated.
Compatibility with pgpool
References
PCP Command Reference
PCP Command List
PCP commands are UNIX commands which manipulate pgpool-II via network.
* pcp_node_count - retrives the number of nodes
* pcp_node_info - retrives the node information
* pcp_proc_count - retrives the process list
* pcp_proc_info - retrives the process information
* pcp_systemdb_info - retrives the System DB information
* pcp_detach_node - detaches a node from pgpool-II
* pcp_attach_node - attaches a node to pgpool-II
* pcp_stop_pgpool - stops pgpool-II
Common Command-line Arguments
There are five arguments common to all of the PCP commands. They
are information about pgpool-II and for authentication. Extra
arguments may be needed for some commands.
e.g.) $ pcp_node_count 10 localhost 9898 postgres hogehoge
First argument - timeout value in seconds. PCP disconnects if
pgpool-II does not respond in so many seconds.
Second argument - pgpool-II's hostname
Third argument - pgpool-II's port number for PCP server
Fourth argument - PCP username
Fifth argument - PCP password
PCP usernames and passwords must use ones in the
pcp.conf in $prefix/etc
directory. -F option can be used when starting pgpool-II
if pcp.conf is placed somewhere else. The password does
not need to be in md5 format when passing it to the PCP commands.
PCP Commands
All PCP commands display the results to the standard output.
pcp_node_count
Format:
pcp_node_count _timeout_ _host_ _port_ _userid_ _passwd_
Displays the number of total nodes defined in pgpool.conf . It does
not distinguish nodes status between attached/detached. ALL nodes are counted.
pcp_node_info
Format:
pcp_node_info _timeout_ _host_ _port_ _userid_ _passwd_ _nodeid_
Displays the information on the given node ID. The output example is
as follows:
$ pcp_node_info 10 localhost 9898 postgres hogehoge 0
host1 5432 1 1073741823.500000
The result is in the following order:
1. hostname
2. port number
3. status
4. load balance weight
Status is represented by a digit from [0 to 3].
0 - This state is only used during the initialization. PCP will never display it.
1 - Node is up. No connections yet.
2 - Node is up. Connections are pooled.
3 - Node is down.
The load balance weight is displayed in normalized format.
Specifying an invalid node ID will result in error with exit
status 12, and BackendError will be displayed.
pcp_proc_count
Format:
pcp_proc_count _timeout_ _host_ _port_ _userid_ _passwd_
Displays the list of pgpool-II child process IDs. If there is more than
one process, IDs will be delimitted by a white space.
pcp_proc_info
Format:
pcp_proc_info _timeout_ _host_ _port_ _userid_ _passwd_ _processid_
Displays the information on the given pgpool-II child process ID. The
output example is as follows:
$ pcp_proc_info 10 localhost 9898 postgres hogehoge 3815
postgres_db postgres 1150769932 1150767351 3 0 1
The result is in the following order:
1. connected database name
2. connected username
3. process start-up timestamp
4. connection created timestamp
5. protocol major version
6. protocol minor version
7. connection-reuse counter
If there is no connection to the backends, nothing will be
displayed. If there are multiple connections, one connection's
information will be this displayed on each line for multiple
times. Timestamps are displayed in EPOCH format.
Specifying an invalid node ID will result in error with exit
status 12, and BackendError will be displayed.
pcp_systemdb_info
Format:
pcp_systemdb_info _timeout_ _host_ _port_ _userid_ _passwd_
Displays the System DB information. The output example is as follows:
$ pcp_systemdb_info 10 localhost 9898 postgres hogehoge
localhost 5432 yamaguti '' pgpool_catalog pgpool 3
yamaguti public accounts aid 4 aid bid abalance filler integer integer integer character(84) dist_def_accounts
yamaguti public branches bid 3 bid bbalance filler integer integer character(84) dist_def_branches
yamaguti public tellers bid 4 tid bid tbalance filler integer integer integer character(84) dist_def_tellers
First, the System DB information will be displayed on the first
line. The result is in the following order:
1. hostname
2. port number
3. username
4. password. '' for no password.
5. schema name
6. database name
7. number of partioning rules defined
Second, partioning rules will be displayed on the following lines. If
there are multiple definitions, one definition will be displayed on
each line for multiple times. The result is in the following order:
1. targeted partitioning database name
2. targeted partitioning schema name
3. targeted partitioning table name
4. partitioning key column name
5. number of columns in the targeted table
6. column names (displayed as many as said in 5.)
7. column types (displayed as many as said in 5.)
8. partitioning rule function name
If the System DB is not defined (i.e. not in pgpool-II mode, and query
cache is disabled), it results in error with exit status 12, and
BackendError will be displayed.
pcp_detach_node
Format:
pcp_detach_node _timeout_ _host_ _port_ _userid_ _passwd_ _nodeid_
Detaches the given node from pgpool-II.
pcp_attach_node
Format:
pcp_attach_node _timeout_ _host_ _port_ _userid_ _passwd_ _nodeid_
Attaches the given node to pgpool-II.
pcp_stop_pgpool
Format:
pcp_stop_pgpool _timeout_ _host_ _port_ _userid_ _passwd_ _mode_
Terminate pgpool-II process with the given shutdown mode. The availabe
modes are as follows:
s - smart mode
f - fast mode
i - immediate mode
If pgpool-II process does not exist, it results in error with exit
status 8, and ConnectionError will be displayed.
* Currently, there is no difference between the fast and immediate
mode. pgpool-II terminates all the processes wheter there are clients
connected to the backends.
Exit Status
PCP command exits with status 0 when everything goes well. If any
error had occured, it will exit with the following error status.
UNKNOWNERR 1 Unknown Error (shouldn't occur)
EOFERR 2 EOF Error
NOMEMERR 3 Memory shortage
READERR 4 Error while reading from the server
WRITEERR 5 Error while writing to the server
TIMEOUTERR 6 Timeout
INVALERR 7 Argument(s) to the PCP command was invalid
CONNERR 8 Server connection error
NOCONNERR 9 No connection exists
SOCKERR 10 Socket error
HOSTERR 11 Hostname resolution error
BACKENDERR 12 PCP process error on the server (specifying an invalid ID, etc.)
AUTHERR 13 Authorization failure
|