An introduction to replication
Recently, while having
the knots pounded out of my body during a particularly painful shiatsu lesson,
I reflected on what put them there in the first place. Yes, ‘the database’ was
once more to blame. A busy database I work with saw one of its tables jump from
3GB to 7GB overnight, as we imported archive data. As expected, this had some
performance impact. Unfortunately, I had not expected quite the knock, and it
turned out that this single database server could no longer handle the load. No
matter how much more I tried to optimize
the queries, tweak
the variables or bump
up the query cache, it was not enough. The machine could not take any more
memory, and a hardware upgrade would do little good (at least with the kind of
budget I have to play with). However, MySQL does not claim to be
enterprise-ready for nothing, and Yahoo and other high-volume users of MySQL
certainly do not run on one database server. There are a number of techniques
to handle high volumes, one of which I will introduce this month – MySQL
replication (I will look at others in future articles).
Replication allows you to
take one database, make an exact copy of it on another server, and set one of
them (the slave) to take all its updates from the other (the master). The slave
reads the master’s binary
logs, which store all statements that change a database, and repeats these
on its database, keeping the two in exact sync. Since a replicating database
simply repeats statements, the databases are not necessarily exactly in sync,
and advanced users can take advantage of this. That is a topic for another
article however, and we will look at simple replication this month – getting
one database to be an exact copy of another one.
What replication is not
-
Replication is
not a backup policy. A mistyped DELETE statement will be replicated on the
slave too, and you could end up with two, perfectly synchronized, empty
databases. Replication can help protect against hardware failure though. -
Replication is
not an answer to all performance problems. Although updates on the slave are
more optimized than if you ran the updates normally, if you use MyISAM tables,
table-locking will still occur, and databases under high-load could still
struggle. -
Replication is
not a guarantee that the slave will be in sync with the master at any one point
in time. Even assuming the connection is always up, a busy slave may not yet
have caught up with the master, so you can’t simply interchange SELECT queries
across master and slave servers.
How to start replicating – the master
server
-
Grant the
slave permission to replicate with the REPLICATION SLAVE privilege, for example
as follows: GRANT REPLICATION SLAVE ON
*.* TO slave_user IDENTIFIED BY ‘slave_password’ -
If the master
is not using the binary update log, add the following lines to the my.cnf
or my.ini configuration file, and restart the server:log-bin server-id=1
By convention, the master is usually server-id 1, and any
slaves from 2 onwards, though you can change this if you wish. If the master is
already using the binary update log, either take note of the offset at the
moment of the backup (the next step), or use the RESET MASTER statement
to clear all binary logs and immediately begin the backup. You may want to make
a copy of the binary logs before doing this, in case you need to use the binary
logs to restore from backup. -
Make a backup
of the database. You will use this to start the slave server. Note the comments
about the binary log above. You can also skip this step if you use the LOAD
DATA FROM MASTER statement, but see the comments about locking the master
below first.