BackupsThis article is somewhat long. Interestingly, it does not actually cover my entire talk, as there is much to talk about besides the mechanics of each backup option. I wonder what I'd need to do to make this into a white paper or an article? The backup presentation was finished last night. I may decide to go back and put some extra stuff in there, but that would be syntax and code and stuff. The logic is all in there, and the notes have been printed. I will post the slides (in .pdf and .swf (flash, the file is very small that way) formats) after the talk on Monday, as I may yet revise them. I am very excited about one slide in particular, and I'll share it here. It's really a slide that I end with, but I feel as though it's a great starting point as well as a summary point. I haven't seen this information encapsulated this way before, so here goes:
The table is set up so the "No" answers are 'bad' and the "Yes" answers are 'good'. The last column contains the # of "No"s in the row. Right off the bat, we see why mysqldump is probably the most used backup tool -- it has the fewest "No"s. "No locking" -- A backup routine should interfere as little as possible with the actual database. If you lock a row or table, even for just reading, you're blocking others from writing it. Transactional engines like InnoDB and BDB offer transactions to help get around this, but many databases have a mixture of transactional and non-transactional storage engines. "DDL" -- Backups should be able to completely restore a database. If you do not have the Data Definition Language (ie, CREATE TABLE statements), you will not be able to completely restore a database. "Snapshot" -- To use a backup for point-in-time recovery or to build a new replication slave, a snapshot is needed. It is possible to do a point-in-time recovery without a snapshot, but it involves checking the binary logs for possible duplicate statements. "Remote" -- Does the backup need to be run on the OS where the server lies? Due to security in companies or the use of an ISP, it is not always possible for the DBA to have the level of access needed to run programs on the database server. "Free" -- Self-explanatory. Note that the backup options that are free come packaged with MySQL, with the exception of OS-level copy, which is in all OS systems (ignoring embedded systems for now). "All Engines" -- Does the backup option deal with all storage engines? Some tools or commands are engine-specific. "All Tables" -- Can the tool easily backup all the tables without external looping code? "Text file" -- A text file backup is advantageous for a two reasons: Corruption can be easily detected, partial backups (ie, of one table) can be done, and SQL can be standardized for migration. Text files are usually larger than data files, but compression of text is excellent. "Recover Corruption" -- A "Yes" here means the backup option can be used to recover from corruption. A brief discussion about the options: SELECT . . . INTO OUTFILE is not used often, because it does not copy DDL. Code is needed to loop through tables, and there will be table-level locking for MyISAM tables. If the SELECT statement is put into a transaction, there will not be a problem for BDB and InnoDB tables. In MySQL 5.0 it is possible to copy the information about a table from the INFORMATION_SCHEMA database by using SELECT . . . It's my opinion that mysqldump is the most widely used backup tool. It has the fewest "No"s of all of them, and the features it does not have can be worked around. Note that mysqldump does do DDL by default, but it's easy to get an incomplete DDL statement. If you turn off --opt, and don't put --create-options, you will end up with CREATE TABLE statements that are standard SQL. This may sound good, but remember that storage engines are MySQL specific, so your CREATE TABLE statements will end up using the default storage engine, which may not be what you want upon restoration. mysqldump can do a snapshot if the option to lock all the tables is used. However, it is usually not feasible to lock every database on the server while the backup is running, because a backup can take minutes. Replication is also widely used, but it requires another instance. And to reduce single points of failure, that means more hardware needs to be bought. It may be difficult to justify hardware simply to do nothing unless a backup is needed. One of the features of replication is also a detriment. The standard way to set up a replication slave is to take a snapshot of the master, import it to the slave, and then use binary logs from there. Another way is to use LOAD DATA FROM MASTER, but that only works for MyISAM tables. It is possible to alter tables and data on the slave before or during replication, without replication failing. This is a handy feature for having a write-only master with InnoDB tables and a read-only slave with MyISAM tables (for example, with fulltext searching). However, this also means that the DDL backup is not necessarily to be trusted. That mistrust can be extended to the data for the same reasons. Replication may not be a reliable backup if DML is being run on the slave server. As well, if the master becomes corrupt, it is likely that corruption will spread to the replication server. The two most widely-used storage engines, MyISAM and InnoDB, are OS independent (explanation in the MySQL manual for MyISAM and InnoDB). This means that they can simply be copied on the OS-level. The biggest gotcha is that in order to get a snapshot, the entire database has to be locked. And unlike mysqldump, there is no option to an OS-level copy that will lock the tables. Using this is bulkier than mysqldump , and it cannot be used for all storage engines as mysqldump can. Two solutions have been devised to combat those problems -- mysqlhotcopy and InnoDB Hot Backup. mysqlhotcopy is for MyISAM tables while InnoDB Hot Backup is for InnoDB tables. The other difference is that the InnoDB Hot Backup is not free. However, InnoDB Hot Backup does offer a free perl script that will use InnoDB Hot Backup to take a snapshot of the InnoDB tables, and then use MySQL commands to get a snapshot of the other engine types. Most people will choose a combination of backup methods -- many use both replication and mysqldump, or replication and mysqlhotcopy and InnoDB Hot Backup. There is no clear-cut answer, and mostly it depends on your environment and what the backup will be used for. Ideally there would be one tool that performed all of the functions listed in the chart. The chart is not complete, either -- just a listing of my opinion of the most common desired features. There are other desired features not listed, for example replication is an "immediate hot backup" whereas all the other tools require manually importing or starting the MySQL server with new data files, etc. I am definitely interested in what folks have to say on the subject. |
Follow me on:SearchNavigation |
It interesting and a great
It interesting and a great resource but two issue was there
* single-transaction also avoids locking issues. As you said, it only works for InnoDB tables. If you want your MyISAM tables to be consistent, you have to use –lock-tables. However, –lock-tables and –single-transaction are mutually exclusive.
* Article does not cover " What to Back UP,and When" .
Petter Smith
http://www.hdrconline.com
Re: "the article only
Re: "the article only mentioned tools to bad up, not strategies", wrt incremental backup. The tools you give will not work for incremental backups, because they by their nature backup an entire database or slabs of the database that are not necessarily the parts you want to back up incrementally, i.e., the parts you have changed.
So the "tools" and "strategies" cannot be separated like you appear to suggest.
You can use the bin-logs, as you say, if you have access to them. But I bet many people used shared web hosting do not have access to them, since it would take extra work to provide them securely across the network. (Not that much work, but enough to make it unpopular).
So to do incremental backup, I have to time-stamp every record in every table and use mysqldump with a where clause to select only those records changed since the last backup.
[...] MySQL Backups Tagged
[...] MySQL Backups Tagged as: database howto [...]
Hi Jan, One thing this
Hi Jan,
One thing this article does not cover is what to back up, and when. The talk goes into that (it actually goes into the "who" as well). Binary logs will accomplish what incremental backups do.
I've found that you want to automate everything. "I'll just update it when I change the static data" is a good idea, and certainly at my company our list of zip codes in the US rarely changes. However, if we automate the process, say to back up the static data once a month, then we don't have to remember to do it when we do change the data. (in our case, we release in new countries and have new postal code data)
That method actually makes backing up less of a load on your database, because you're not backing up the static data constantly. However, the article only mentioned tools to use to back up, not strategies. When you do back stuff up, you will want to use one of the tools in the article.
One thing I didn't see
One thing I didn't see mentioned was incremental backup. Those who's data follows the typical "90/10" rule -- 90% reads, 10% updates -- don't want or need to backup the entire database each time. In my case in particular, I have lots of heavy TEXT and BLOB fields that are essentially constant -- why should I back them up more than once?
Or am I missing something...
For anyone with high load,
For anyone with high load, large data size and uptime requirements, replication plus one of the other options from a slave would be the way to go. Any other option on the table adds lots of disk load, hurting performance for end users. Not really something you want for the duration of a 400GB backup job while you're trying to serve 4,000 questions per second. Ends up being a denial of service attack on your own server. :)
Mirroring at the filesystem level with something like DRBD is also a useful option for some situations, and shares that no high load benefit.
The corruption I encounter is usually that of the tables, not inconsistency of the data within them. Corrupt data files or tablespaces due to disk errors and such. Replication is fine to recover from this.
I'd set up replication first with a binary copy while the master is down. Avoids any doubt at all about whether it's consistent and since the performance impact of a backup while the system is serving content is terrible at high sizes and load, you're effectively down anyway. Might as well free the disks and get it done fast. After the first, I'd do binary copies from one slave to another, always trying to keep at least three computers with the data on them.
Human error corrupting the data logically at the application level is best countered with lots of backups and binary logging. To get lots of backups you need the backups to be cost-free for the end users. Replication delivers that.
Backup from slaves is the approach we use at Wikipedia.
An even better solution is
An even better solution is to get away from MySQL's "almost-a-database" mentality, and switch to PostgreSQL. Recent versions of PostgreSQL can even use naively rsync'ed copies of the underlying data files and they still "do the right thing" on restart, thanks to proper Write-Ahead-Logging.
Another nice solution for
Another nice solution for backups is using a Logical Volume Manager that supports filesystem snapshots. By using LVM you just need to run a "FLUSH TABLES WITH READ LOCK", perform the snapshot operation (usually done in a second) and then do "UNLOCK TABLES". You can now backup the binary database files in a consistent state from the snapshot volume using regular file copy operations or any backup software. without interrupting the database server. See my "MySQL Administration: Backup and Security Strategies on Linux" presentation for more info on that: http://www.lenzg.org/mysql/
You are correct, although
You are correct, although I'll add that you also have to use the --all-databases option for that to work. (this is actually in the talk)
--single-transaction also avoids locking issues. As you said, it only works for InnoDB tables. If you want your MyISAM tables to be consistent, you have to use --lock-tables. However, --lock-tables and --single-transaction are mutually exclusive.
If you ONLY have InnoDB tables, then yes, that will work. I'm assuming a complicated setup, as that's what most folks have -- searching is much more efficient with MyISAM, and FULLTEXT searching can only be done on MyISAM tables.
(likewise, if you ONLY have MyISAM tables, you can use mysqlhotcopy)
About InnoDB
About InnoDB backups,
passing --single-transaction to mysqldump followed by flush logs will perform a snapshot backup on InnoDB tables?
Thanks and sorry my English.