1. MySQL replication

Introduction

This document explains how to setup MySQL replication between master and slaves.

References: See also http://dev.mysql.com/doc/refman/5.0/en/replication-howto.html

Since FreeNAC Version 3.0.1, the MySQL database is configured to run in a so called 'multiple-master' scenario, meaning that each server is both a master and a slave in MySQL terminology. So each server queries updates from others (a slave), and makes any updates which were made to its dataset available to other servers (master).

Therefore a replication must be setup in each direction, for each server. Lets assume we have two servers vmps1 (our 'main' or primary server) and vmps2.
It is possible to have more than two servers (using the mysql relay_log), but this has not been tested or documented in FreeNACA yet.

The procedure is basically as follows:

First get vmps1 (the main server) running, with actual data.

A) configure vmps1 to share its data, copy an initial dataset to vmps2, configure vmps2 to retrieve updates via replication

B) configure vmps2 to share its updates, and vmps1 to retrieve these via replication

Replace the following in the examples below:

SERVER2.DOMAIN       the FQDN of your slave
repl Replication username
REPL_PASSWD Replication password
opennac Name of your database (this was 'inventory' prior to NAC v2.2).

A. Initial vmps1 --> vmps2 replication

Initialisation

"vmps2" is a MySQL slave, and "vmps1" is a MySQL master.

Allow vmps2 the right to get replication updates from vmps1.
Note: it is important the master name corresponds to the DNS name in the GRANT statement below, otherwise use its IP address. Check /mysqldata/mysqld.log for errors.

GRANT SELECT, PROCESS, FILE, SUPER, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl'@'vmps2' IDENTIFIED BY 'REPL_PASSWD';
SHOW MASTER STATUS;

Purge unneeded logs on the master:

PURGE MASTER LOGS TO 'SERVER-bin.NUMBER' 

[the exact name comes from the File field in the 'show master status' above]

Copy initial data-set

0) On the slave, vmps2

stop slave;

1) On the master, vmps1: Lock the tables, note log position, restart

mysql> FLUSH TABLES WITH READ LOCK;
vmps1:$ cd /mysqldata; tar cvf opennac.tar opennac
mysql> SHOW MASTER STATUS;
+------------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------------+----------+--------------+------------------+
| vmps1-bin.000027 | 12717436 | | |

==> take note of the position

mysql> UNLOCK TABLES;

2) Slave vmps2:

Stop mysql

/etc/init.d/mysql stop

Copy DB tar file from master & extract:

   cd /mysqldata && mv opennac opennac.$$
scp vmps1:/mysql/opennac.tar .
tar xvf opennac.tar
chmod 770 opennac; chmod g+s opennac; chown -R mysql:mysql opennac;

Configure slave: start daemon with slave off

    /usr/sbin/mysqld --skip-slave-start --log-warnings &

Start replication

Start mysql client (on vmps2):

mysql> reset slave;

CHANGE MASTER: replace XXXX, YYYY, ZZZZ and 'FILE_NAME' with the values from the 'show master' above:

mysql> CHANGE MASTER TO MASTER_HOST='vmps1', MASTER_USER='repl', MASTER_PASSWORD='YYYY', MASTER_LOG_FILE='FILE_NAME', MASTER_LOG_POS=ZZZ;

Start replication:

    START SLAVE;
show slave status \G;

Check the log position with that on the master:

    show master status;

Empty the vmpsauth table, which is the only local table:

DELETE FROM opennac.vmpsauth;

Also check the slave mysql log (or syslog) for errors.

If all looks fine, stop the slave:

    /etc/init.d/mysql stop     
Check with 'ps' to make sure mysql is dead, other use 'kill' with the PID of the mysqlprocess.
Then start mysql normally
   /etc/init.d/mysql start

If vmps is configured already, restart that too. If this is a first time installation, wait.

   /etc/init.d/vmps restart;
/etc/init.d/postconnect restart;
tail -f /var/log/messages | grep vmpsd_external

B. Initial vmps1 <-- vmps2 replication

Initialisation

"vmps1" is a MySQL slave, and "vmps2" is a MySQL master.
Note: it is important the master name corresponds to the DNS name in the GRANT statement below, otherwise use its IP address. Check /mysqldata/mysqld.log for errors.

On the vmps2 mysql prompt:

GRANT SELECT, PROCESS, FILE, SUPER, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl'@'vmps1' IDENTIFIED BY 'REPL_PASSWD';

SHOW MASTER STATUS;

Examining logged SQL queries/updates on vmps2

Now vmps2 is already humming along (due to the procedure in section A.) with a copy of vmps1's data, and is retrieving vmps1 update via replication. Since that there may have been updated to vmps2 though.

To see what logs vmps2 has, the name of the current log and position:

show binary logs;
show master status;

Now lets look at the updates in the current log:

show binlog events limit 20; 

This will show the most recent 100 SQL statements that are pending, allowing you to verify that they make sense.

Enable replication client on vmps1

Start the mysql client and tell the replication to start at the initial position of the log on vmps2 (see also the output from the show master status on vmps2)

mysql> reset slave;

mysql> CHANGE MASTER TO MASTER_HOST='vmps2', MASTER_USER='repl', MASTER_PASSWORD='REPL_PASSWD', MASTER_LOG_FILE='vmps2-bin.000001', MASTER_LOG_POS=1;

Start replication:

start slave;
show slave status \G;

Verify that the master log position is correct, Slave_IO_Running: Yes and Slave_SQL_Running: Yes. Last_Error should be empty.

Check the log position with that on vmps2:

    show master status;

Check the mysql log (/mysqldata/mysql.log or syslog) for errors.

Double check replication: on vmps2, insert some data

insert into naclog set what='test2';
select * from naclog order by id desc limit 10;

on vmps2, see if it appears as expected:

select * from naclog order by id desc limit 10;

The id of the inserted row should have an increment offset of 2.

Notes: Fixing a replication problem

It has happened to us that replication stops due to an invalid query.
Replication is OK on a slave if

   show slave status \G;

reports that the master log position is correct, Slave_IO_Running: Yes and Slave_SQL_Running: Yes. Last_Error should be empty.

For example, lets say Slave_SQL_Running was 'No'. To see why examine the Last_Error entry which may list the SQL entry causing the problem and then the mysql log (/mysqldata/mysql.log or syslog).

Lets assume that you understand the SQL statement, decide its not a big problem and just want to ignore that statement. So we fix it, by stopping the
slave and skipping the SQL Query causing the problem:

  stop slave;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
start slave;
show slave status \G;

It now skips to the next error, for example:

  Slave_SQL_Running: No
Last_Error: Error 'Unknown table 'opennac.v_1'' on query. Default database: 'opennac'. Query: 'DROP VIEW v_1'

Pending log events can also be examined:

show binlog events limit 100; 
show warnings;

To get through these difficult queries, it may be necessary to repeat the above.

More reading:
http://dev.mysql.com/doc/refman/5.0/en/set-global-sql-slave-skip-counter...
http://dev.mysql.com/doc/refman/5.1/en/replication-options.html

4. Monitoring replication

Activate monitor_mysql_slave - call it from cron on all servers (since all servers are slaves), e.g. every 5 minutes during office hours:

*/5  7-18 * * 1-5 /opt/nac/bin/monitor_mysql_slave