Tuesday, April 5, 2011

How to fix MySql Replication Error 1236

We have some websites that use two database servers in master-master replication. Recently one of the servers died and had to be resurrected. (They are cloud based and SoftLayer doesn't seem to have their cloud-based offering thing nailed down, yet. Every week one of them will go down and be unresponsive until tech support does some magic.)

After one of the servers was brought back up the other server would not connect. It "Slave_IO_Running: No" and "Seconds_Behind_Master: null" which means it was not playing nicely.

First, I went to the MySql log files, which for this server is found at /var/log/mysql.log and looked at the last few messages here by running "tail /var/log/mysql.log" from the command prompt. This revealed the error number (server_errno=1236). It also had the following info:
Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log

Just before this entry in the mysql.log it indicates the file and position that it is trying to read. So, with this data in hand, I headed over to the master server and took a look at the bin logs. They are located in /var/lib/mysql/. Here I took a look at the file in question using the mysqlbinlog utility. I used the following command to check out the bin log. Obviously, you'll have to replace the position and file name with the position and file indicated in your mysql.log.
mysqlbinlog --offset=128694 mysql-bin.000013

And, this is what I saw here, among other things:
Warning: this binlog was not closed properly. Most probably mysqld crashed writing it.

Well, that explains things! When the server crashed the bin log was not closed properly. This is easy to fix. Going back to the slave server, I stopped the slave, reset the bin log position and started the server.
STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_POS = 0;
CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000014';
START SLAVE;

I simply pointed the slave to the start of the next bin log. It started right up with no problem.

Update 5/21/13 - An anonymous commentor made a good point about potentially losing data using the above technique. If you are in a situation like I am (master-master replication with 100's of gigs of non-critical data) this is really the only way of getting back up without significant down time. But if you are in a master-slave configuration with a manageable data set or its critical you slave doesn't miss any data, you should probably dump the master database and re-create the slave database to make sure you didn't miss anything.

13 comments:

Anonymous said...

Hey there

Thanks so much for this AWESOME article! You saved my life! :-)

javipolo said...

nice!

Thank you so much!

Anonymous said...

nice!

Thank you so much!

Anupam Vishwakarma said...

Thank you Firelit,
This is very nice and good concept,I thought of doing the same when found the difference in slave status and master status but not having the knowledge of performing this,
This doc cleared concepts.

Brad Triem said...

Thank you!!

Anonymous said...

Excellent!
Thank you very much!

Saludos desde Argentina!

Shiv Chouarsia said...

Excellent perfect!!

aramman said...

great post , thanks a lot

edulanka said...

Done

Its works

Thanks

Got fatal error 1236 FROM MASTER
WHEN reading DATA FROM BINARY LOG:
'binlog truncated in the middle of event; consider out of disk space on master;
the first event 'mysql-bin.000792' at 608112413,
the last event read from './mysql-bin.000792' at 608112413,
the last byte read from './mysql-bin.000792' at 608112640.'

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000793', MASTER_LOG_POS = 0;

Anonymous said...

Dunno if it is a possible problem, but when setting MASTER_LOG_POS = 0 before changing MASTER_LOG_FILE, the actual position seems to be adjusted in relation to the old log file. Then when the log file is changed afterwards, that adjusted position is used and not the initially set position.

Anonymous said...

Thank you! It worked for me.

Anonymous said...

As a customer it works but as a DBA I must say that this wouldnt work as it should, you should see in the binlog whats wrong

Anonymous said...

edulanka, thank you! You made my day...

Error 1236 can have various reasons, but when the reason is 'binlog truncated in the middle of event', solution provided by you works perfectly.