$Id: mysql_drv.txt,v 1.8 2006/05/13 01:12:58 jonz Exp $ mysql_drv: MySQL storage driver for DSPAM Copyright (c) 2002-2006 Jonathan A. Zdziarski http://www.nuclearelephant.com/projects/dspam/ LICENSE This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. ABOUT mysql_drv is a MySQL storage driver for DSPAM v3.0 and above. This driver enables DSPAM to read and write all token, signature, and statistics data from a MySQL database. The advantages of using a SQL backend are obvious: - Centralized data storage - Structured queries for information - No need for context locking mechanisms - Replication and other MySQL features It is STRONGLY RECOMMENDED that you run DSPAM with MySQL v4.1 or greater, as some critical issues were addressed including: - Support for full unsigned 2^^64 integer support This will allow DSPAM to store its token keys as numeric (8 bytes) instead of character (20 bytes). On older versions of MySQL, DSPAM is forced to use character fields. Using the mysql_objects-4.1.sql script will result in a database about 30-35% smaller and more optimized. - Support for subqueries This allows purge-4.1.sql to conditionally purge based on the user's training mode, which allows a cleaner and more granular purge. - Insert Speed 4.1 Supports the use of ON DUPLICATE KEY which allows us to perform all token inserts with a single query. As a result, MySQL 4.1 is about 20% faster. 0. CHECK FOR 4.1 BUG Some versions of 4.1 apparently have a bug with the way bigint's are handled. If you have this bug, you'll need to either upgrade to a better version of MySQL or use the 4.0 (speed, space) object scripts for DSPAM instead of the 4.1 scripts. To check for this bug, log into MySQL and issue these statements: create table dspam_test (token bigint(20)); insert into dspam_test values('3232933621568372736'),('5174632569486238062'),('5860192774298853742'),('8787532532089676142'),('8029914733239132526'); select token from dspam_test where token in('3232933621568372736','8787532532089676142','8029914733239132526','5174632569486238062','5860192774298853742'); Broken versions of MySQL will return only one row, whereas a working version will return all five rows. 1. CONFIGURING DSPAM To configure DSPAM to use mysql_drv, use the following arguments while running DSPAM's configure: --with-storage-driver=mysql_drv Tells DSPAM to use the mysql_drv driver --with-mysql-libraries=/path/to/libs Tells DSPAM where to find the MySQL client libraries. They are usually located in /usr/local/mysql/lib --with-mysql-includes=/path/to/libs Tells DSPAM where to find the MySQL headers. They are usually located in /usr/local/mysql/include --enable-virtual-users Tells DSPAM to create virtual user ids for each dspam user. Use this if your users don't really exist on the system (e.g. via getpwuid) or if you're doing something weird like sharing uids. After configure has successfully finished, build and install DSPAM using the instructions from DSPAM's readme. 2. CREATING MYSQL OBJECTS Before mysql_drv will function, you must run one of the mysql_objects.sql files (located in src/tools.mysql_drv) to create the table objects required by the driver. Depending on whether you are looking to optimize for speed or disk space, you should run the appropriately named script. See your mysql documentation for further information. This script assumes that you have already created a database for DSPAM objects and a user with full access to SELECT, INSERT, UPDATE, and DELETE. If you plan on enabling virtual users (something you'll need to do if the users don't actually exist on your system), also run virtual_users.sql. NOTE: If you plan on having more than 32,768 users on the system, you'll want to change 'smallint' to 'int' for all uid fields. You may also wish to add this line to the mysqld portion of /etc/my.cnf: set-variable = max_allowed_packet=8192000 This will extend the query size from 1MB to 4MB; some larger queries may cause MySQL to drop connection without this set. 3. UPDATING DSPAM.CONF mysql_drv needs to know how to connect to your MySQL database. You will need to specify this information in dspam.conf. This file already contains some example data: MySQLServer /var/lib/mysql/mysql.sock MySQLPort MySQLUser dspam MySQLPass changeme MySQLDb dspam MySQLCompress true Or if you'd like to connect via TCP, use the IP address and port of the MySQL server. DSPAM checks to see if the first character of the HOSTNAME field is a slash, and if so will treat it like a socket file. 4. NIGHTLY PURGE If you would like to purge the many stale tokens DSPAM will have lying around the database, you should run one of the provided purge scripts nightly. The dspam_clean tool can be configured to perform more granular cleansing, such as that of uninteresting data. See DSPAM's README for more information about the dspam_clean tool. If you are supporting TOE or TUM-mode users on your system, you will want to do one of the following: 1. Preferences Extensions + MySQL 4.1 or Greater If you are running MySQL 4.1 or greater, it is recommended you activate preferences extension support and recompile. You may then use purge-4.1.sql instead of purge.sql, which will skip certain types of purges for users with TOE/TUM specified in their preferences. NOTE: You should add a preference for any global users on your system, so that their data is purged as if TOE-based (since global user data is rarely updated). You can do this using: dspam_admin add pref [username] trainingMode TOE 2. Global TOE Support or MySQL < 4.1 If you are running an earlier version of MySQL or if you are using TOE globally (and not in individual user preferences), then you should use purge.sql, but remove the following lines: delete from dspam_token_data where @a-to_days(last_hit) > 90; This will prevent the purging of stale tokens, which could cause serious data loss in TOE databases (because tokens are never touched unless an error has occured). All other purges should be safe even for TOE-mode users. If you will NOT be supporting TOE users on your system, you may simply run the purge.sql script nightly, as-is. 5. TUNING If you have a busy server, and find a lot of table locks, you may consider making a few tweaks to the MySQL configuration. Alternatively, you may consider innodb which performs row-level locking, but this is rarely necessary. Increase key_buffer_size. If you have the memory, try at least 256M or 512M. Increase table_cache. Try a higher value (some systems go as high as 1024). This is related to the max_connections option and allows many file descriptors to be shared among threads Increase myisam_sort_buffer_size to a higher value, depending on your memory. Ideally, a few hundred MB would be great, but at least 64M would be an improvement. 6. REPAIRING If your database gets corrupt, you'll need to repair it. This could take a long time, and so it may make sense to keep a hot backup somewhere. You can run a command like this to repair the database: mysqlcheck --all-databases --fast --auto-repair ERRORS Any SQL errors will be reported to LOGDIR/sql.errors as well as the standard syslog facilities (although the query will be truncated). QUESTIONS Please contact the dspam-dev mailing list with any questions or constructive feedback. Jonathan A. Zdziarski