One of the favorite errors of many MySQL users is “MySQL server has gone away”. It gives minimal information for debugging, is very annoying, and can be caused by a number of things. The most common are:
- Internal MySQL timeouts (configurable; the current values can be listed with “SHOW VARIABLES …”)
- MySQL max_allowed_packet limit
- Networking layer
- Probably some other mysterious reasons.
In this post I will cover the networking layer, because I think it is hard to diagnose and very common. Of course, before jumping to it you should check the MySQL logs and timeouts to make sure they are not the root cause. In my case all MySQL server timeouts were fine, but the application was still failing after running very large tasks. Using strace and tcpdump I found the reason:
- The app opened a connection to MySQL.
- Some slow code then ran, and during that time the MySQL connection was not in use.
- After 1 hour (the code was still running) the remote side closed the connection.
- When the app finally finished and tried to reuse the connection, it failed because the connection was already closed.
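The sequence above can also be guarded against on the application side. Below is a minimal Python sketch, not the code of the app described here: the `IdleAwareConnection` class, the `connect` factory and the `max_idle` parameter are all hypothetical names introduced for illustration. It assumes that reopening the connection is acceptable whenever it has been idle for longer than some threshold chosen below the balancer's idle timeout.

```python
import time
from typing import Any, Callable


class IdleAwareConnection:
    """Reopen a connection that has sat idle for `max_idle` seconds or more."""

    def __init__(self, connect: Callable[[], Any], max_idle: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect    # hypothetical factory returning a fresh connection
        self._max_idle = max_idle  # assumption: set below the balancer's idle timeout
        self._clock = clock
        self._conn = connect()
        self._last_used = clock()

    def get(self) -> Any:
        """Return the current connection, replacing it first if it idled too long."""
        now = self._clock()
        if now - self._last_used >= self._max_idle:
            # The remote side may already have closed the old connection,
            # so discard it and open a new one instead of reusing it.
            close = getattr(self._conn, "close", None)
            if callable(close):
                try:
                    close()
                except Exception:
                    pass  # the old socket may already be dead
            self._conn = self._connect()
        self._last_used = now
        return self._conn
```

With a 3600-second balancer timeout, something like `IdleAwareConnection(connect, max_idle=3000)` would hand out a fresh connection after the slow task instead of the dead one. This only hides the symptom in the app; it does not keep the original connection alive.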
Of course the app is also guilty (slow code, poor connection handling), but my task was to find a quick workaround. The first question was why the connection was closed after 1 hour. The reason was the Amazon ELB (load balancer) which was in use. It has an idle timeout, which can be configured up to 3600 seconds. This situation is not uncommon: many routers and balancers have idle connection timeouts. So I decided to find out why no keepalive packets were being sent, and found that it was a PHP bug, which is fixed in the recent version (5.6). After updating PHP to that version I was able to confirm that keepalive is working. You can check this with netstat -to or with tcpdump. This is what the keepalive packets look like:
04:46:55.034560 IP 10.0.21.105.39853 > 10.0.52.203.mysql: Flags [.], ack 7907432, win 3086, options [nop,nop,TS val 3156463616 ecr 488286410], length 0
04:46:55.035543 IP 10.0.52.203.mysql > 10.0.21.105.39853: Flags [.], ack 9089476, win 136, options [nop,nop,TS val 488361560 ecr 3156163057], length 0
So now Linux sends keepalive packets regularly, which prevents the ELB from closing the idle MySQL connection.
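For readers who want to see what "keepalive enabled" means at the socket level, here is a minimal Python sketch. It is not what PHP or the MySQL client does internally. The helper name `enable_keepalive` and the timer values are assumptions for illustration. The idle value is deliberately lower than a 3600-second balancer timeout, since the Linux default delay before the first probe is 7200 seconds.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 600,
                     interval: int = 60, probes: int = 5) -> None:
    """Turn on TCP keepalive and, where the platform allows, tune its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These per-socket timer options exist on Linux; skip any the platform lacks.
    for name, value in (("TCP_KEEPIDLE", idle),      # seconds idle before the first probe
                        ("TCP_KEEPINTVL", interval),  # seconds between probes
                        ("TCP_KEEPCNT", probes)):     # failed probes before giving up
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

A connection opened on a socket prepared this way should show a keepalive timer in the netstat -to output, the same check as described above.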