...making Linux just a little more fun!
By Mark Nielsen
This article is to help you setup MySQL Master/Slave/Cluster combinations on one computer. Why would that be useful? Well, for starters, there isn't an abundance of Master/Slave/Cluster documentation. The Cluster technology is really new. Second, MySQL is a very very very hot database. There aren't too many good MySQL DBAs out there. If you want to secure your job in the future, learn MySQL. The Master/Slave and Cluster technologies are a must. If you don't know anything about the Master/Slave or Clusters, you can pretty much forget getting hired. In every MySQL interview (I had like 10 companies interviewing me in a six week period and half of them used MySQL for something) the Master/Slave questions popped up and topics of the Clustering technology came up as well. Third, it is a pain to setup Master/Slave or Cluster environments by hand. If my script works on one of your computers, copy over the software, copy over the config files, make slight changes to the config files, and you can get a real multi-computer environment setup in minutes.
The second purpose of this article is to show you how to use Python to write a simple application (compiling and installing MySQL). I tried to include a lot of stuff in the Python code which is common to most Python scripts/modules. Also, I am thinking about turning the Python code into a package in the Vaults of Parnassus - just for kicks, and so I can say that I have some sort of Python code published.
So, why did I use the programming language Python to execute all the commands? I love Python, it is object oriented (from the ground up), has good exception handling, has been compilable for a long time, it is hard to write ugly Python code, it is easy to understand other people's code, there is only one way to do things (usually), it's easy to manipulate, and it is a general tool from web programming to Unix scripting to database programming to GUI programming to mathematical programming (my favorite). There are so many reasons to love Python if you are a true object-oriented programmer. The other reason why I try to use Python for all personal projects is to convince people where I work (or will work) that Python rocks and should be used wherever appropriate. The sad thing is, most managers don't want to use Python because of the lack of people who can write good Python code, so it is my duty to convince them otherwise. Update: I got hired at Google, which uses a lot of Python so now I am a happy camper!
Why compile MySQL? Why not just use the RPMs? I don't think clustering
is built into some of the RPMs. Also, I always like to compile and install
software myself. If you can't compile it, odds are, there might be other
problems. If you do decide to use RPMs, and you decide to use an RPM
server, please use YUM and do not use
commercial RPM servers. All the commercial RPM servers I have seen (just
one, you know who) are way overpriced and are geared towards the executive
staff or people who want to buy support so that they can blame someone if
something goes wrong. The sales staff are really good at making you believe
you can jump to the moon, when really, the software they present only
manages the installation of RPMs. Any 1st year programmer could write a web
interface to do the same thing. By using YUM (it is easy, simple, and free
of restrictions), you empower yourself and you will help bring down the
costs of the the overpriced commercial RPM services. If the commercial RPM
services weren't so overpriced (they should be like $25 per computer max)
and didn't have so many stupid restrictions (like installing
thebind
RPM requires a more expensive license), I wouldn't
mind, but they are getting away with doing so little, it is ridiculous.
Politics and the suits have gotten in the way of good technology at some of
these companies.
One more thing, I bought a 2-gig RAM 64 bit AMD CPU with Serial ATA hard drives. I bought this hardware configuration for the simple purpose of using a 64 bit operating system. Why? Because many companies are using 64 bit AMD cpus for their databases and other things. Just at work the other day, I mentioned I got a MySQL cluster to work on my home computer, so I as asked to setup a test cluster at work. See how valuable it is to stay ahead of the market?
The nice thing about the 64 bit AMD CPU is that is will also support the 32bit operating systems. Thus, I was able to take the safe route by installing a 32 bit Linux OS before I risked using the 64 bit versions (which had a lot of problems in the past). Plus, don't kill me for saying it, but I still wanted to play some Windows games that weren't available for Linux yet. I am close to the point where I never need to use Windows again. If gaming companies would always make Linux versions of their products, I would never have use Windows for anything. That would really be nice.
What are MySQL Master/Slaves? The Master MySQL server lets people read and write data to it. It copies all of its data to the MySQL slave computers. Thus, you end up with many database servers with the same data. This can be useful for load-balancing your webservers against many database servers or performing a failover in case the Master dies (you shutoff the Master and make one of the Slaves the new Master). Normally, the Slave computers are read-only. Since most websites have a 4-to-1 ratio of reads to writes, having many read-only MySQL slaves can be very useful. How it is useful? Your website can handle more data and use more database connections spread over multiple servers. All the writes still have to be done to the Master, but the website can choose which Slave it wants to read data from. Separating the write connections from the read connections can speed up your website a lot.
What is MySQL Clustering? MySQL Clustering lets you put multiple computers together which are all the same and have all the same data. You can connect to any computer in the cluster and perform read/write operations which are immediately available to the other mirrors. The main different between Clustering and Master/Slaves is that each computer in a Cluster can be written to while in a Master/Slave configuration you are only suppose to write to the Master. Also, in Clustering, when you write data to one machine, the computer will not respond back with an "OK" status until the data has been copied to all the other computers. With the Master/Slave configuration, the data is not necessarily copied to the Slaves immediately. Effectively, you can view the cluster as one single entity/database. The nice thing is, if one computer is the cluster crashes, it doesn't affect the other computers or the data in the cluster. This behavior doesn't exist in the Master/Slave replication. If the Master goes down, the replication stops.
Clustering is a little more complicated, but it has huge advantages and will most likely be used a lot by many companies. It is fairly new technology, so in my opinion, it is good time to start testing it. For small websites which love to be on the bleeding edge of technology and you don't mind being risky, go ahead and use MySQL Clustering. I am a little cautious about using it for heavy performance database needs, but I am anxious to at least try! You should be warned that there are a lot of limitations with MySQL Clustering. For example, your database can only be as big as the free RAM you have. Also, I have had a lot of problems getting more than just a two storage node cluster working.
The MySQL.py module currently only has one class called "Installer" which is mostly finished. It has other classes I am working on, but nothing worth talking about yet. I included a lot of stuff in the module like:
Here is sample script to use my MySQL module. Note, if you already ran this script and got mysql working, and you want to reconfigure it, then comment out "M_Obj.Execute_Compile_Script()" to reconfigure mysql. I tried to comment each below in the script.
#!/usr/bin/python ### Just a bunch of standard modules I load. import urllib2, urllib, string, re, os, struct import base64, string, gzip, sys, time, commands import getopt ### The module I created. import MySQL ## Initialize our installation object. M_Obj = MySQL.Installer() #-------------------------------------------------------- ### Build and install the mysql binaries. These binaries will be common ### with all the mysql services. M_Obj.Write_Compile_Script() ### This creates out bash compile script. M_Obj.Execute_Compile_Script() ### This executes our bash script. ### This creates one set of binaries for ### all mysql instances. M_Obj.Stop_Instances() ### In case there are any running. M_Obj.Setup_Instances() ### Setup and initialize the databases. M_Obj.Start_Instances() ### Start the databases.
python Compile_MySQL.py
This will download and install MySQL. It will make a log file called
"/tmp/mysql_install.log". If it doesn't download MySQL, download
mysql-4.1.7.tar.gz
manually from mysql.com and save it in the directory
"/usr/local/src/mysql_compile".
NOTE: If you want to specify another database server, then execute this, substituting the appropriate URL:
python Compile_MySQL.py -d http://www.signal42.com/mirrors/mysql/Downloads/MySQL-5.0/mysql-5.0.2-alpha.tar.gz
Here are a bunch of scripts in the home directory where mysql got installed.
Script | Usage | Purpose |
---|---|---|
bash_aliases | source bash_aliases | This setups aliases to connect to each service. The aliases connect to the master service, slave services, ndb_mgm, and each cluster mysqld service. |
Start | ./Start | Starts the master, slaves, and cluster. |
Stop | ./Stop | Stops the master, slaves, and cluster. |
The scripts listed below are created in the "scripts" directory where mysql got installed. If you did a "source bash_aliases", these commands will be in the bash path. You can execute these scripts as many times as you like.
Script | Purpose |
---|---|
Master_Status.bash | Gets the status of the master service. |
Slave_Status.bash | Gets the status of the slave services. |
Master_Slave_Status.bash | Gets the status of the master and slaves. |
Master_Slave_Test.bash | Inserts data and then the master and slaves perform the same sql query which should show the same results. |
Cluster_Status.bash | Gets the status of the cluster |
Cluster_Test.bash | Inserts data into the cluster. It shuts down each storage node and sees if each mysqld server can still access the data. |
Try to learn what the scripts do step by step. If you can fully understand what those scripts are doing, then you know at least the basics about how to manage the mysql services.
Follow these steps:
cd /usr/local/mysql-cluster ### Change to the mysql directory. source bash_aliases ### Load some aliases into the bash shell. ./Start ### Start mysql master, slaves, cluster. Cluster_Status.bash ### Get the status of the cluster Master_Status.bash ### Get the status of the master service. Slave_Status.bash ### Get the status of the slave service. Master_Slave_Test.bash ### Test the master/slave services. Cluster_Test.bash ### Test the cluster.
MySQL is coming along nicely. With all the new features being put into MySQL, it can compete against some of the larger database companies. It is important to learn MySQL because new job opportunities are emerging and MySQL will be a hot item over the next few years (before the masses learn it). The Python module and the script I wrote make it really easy to get a fairly complicated set of MySQL instances installed on your computer. To use it in production, just copy over the binaries and the config files and make a few changes to the config files and you are done! Once you see how Master/Slave/Cluster configurations work, it really becomes simple to understand how it all works.
In the last few months, I believe the MySQL Professional Certification paid off. Most employers don't want just a DBA anymore. PostgreSQL and MySQL make database servers easy to use for most people. No longer do you have to worship the DBA from hell because he/she knows all the little tricks. All the docs and examples are online and you can fool around with the source code yourself. This is going to put pressure on DBAs to be more than just DBAs. I believe the MySQL jobs I applied for really liked my programming and Sys Admin skills. I know for a fact that one company used my MySQL Professional Certification as leverage to interview me because they could claim "he is certified". They really wanted a programmer, but because of politics, they couldn't directly get a programmer, so my certification let me slip in. Cool, huh? A lot of certifications really don't tell you whether someone is good or not. However, the LPI and MySQL Certifications have two things going for them. First, they are actually reasonable certifications. Most certifications are meant to just make money for the company, but LPI and MySQL don't seem to be like that; they really want you to be qualified. Second, managers and HR like certifications because it protects them. I know most techies hate certifications, but because of politics, you really need to get certified to fight against the machine. LPI and MySQL Certifications are fairly inexpensive, unlike a lot of the other stupid certifications, so it really isn't that bad.
Mark Nielsen was enjoying his work at cnet.com as a MySQL DBA, but
is moving to Google as a MySQL DBA.
During his spare time, he uses Python heavily for mathematical and web
projects.