Monthly Archives: September 2016

‘MySQL server gone away’ error, keepalive and PHP

One of the favorite errors of many MySQL users is “MySQL server has gone away”. It provides minimal data for debugging, is very annoying, and can be caused by a number of things. The most common are:

  • Internal MySQL timeouts (configurable; they can be checked using “show variables …”)
  • MySQL max_allowed_packet limit
  • Networking layer
  • Probably some other mysterious reasons.

In this post I will cover the networking layer, because I think it is hard to diagnose and very common. Of course, before jumping to it you should check the MySQL logs and timeouts to make sure they are not the root cause. In my case all MySQL server timeouts were fine, but the application was still failing after running very large tasks. Using strace and tcpdump I found the reason.

  1. The app opened a connection to MySQL.
  2. Some slow code then ran; during this time the MySQL connection was not in use.
  3. After 1 hour (the code was still running) the remote side closed the connection.
  4. When the app finally finished and tried to reuse the connection, it failed because the connection was already closed.
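
The sequence above is what an application-side fix would have to handle. Below is a minimal sketch in Python (not the PHP code from this story) of a wrapper that reconnects once when the idle connection turns out to be dead. The names StaleConnectionError, ReconnectingClient and the connect factory are hypothetical and stand in for whatever your MySQL driver provides.

```python
class StaleConnectionError(Exception):
    """Hypothetical error: the peer closed the connection while it was idle."""


class ReconnectingClient:
    """Wraps a connection factory and reconnects once if the link went stale."""

    def __init__(self, connect):
        # connect is a hypothetical zero-argument factory returning an object
        # with an execute(query) method.
        self._connect = connect
        self._conn = None

    def execute(self, query):
        if self._conn is None:
            self._conn = self._connect()
        try:
            return self._conn.execute(query)
        except StaleConnectionError:
            # The connection was dropped while the slow code was running:
            # open a fresh one and retry the query exactly once.
            self._conn = self._connect()
            return self._conn.execute(query)
```

A blind retry like this is only safe for statements that can be repeated without side effects, and it does not restore transaction or session state, so treat it as an illustration rather than a drop-in fix.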

Of course the app is also to blame (slow code, poor connection handling), but my task was to find a quick workaround. The first question was why the connection was closed after 1 hour. The reason was the Amazon ELB (load balancer) in use: it has an idle timeout, which can be configured up to 3600 seconds. This situation is not uncommon; many routers and balancers have idle connection timeouts.

So I decided to find out why no keepalive packets were being sent, and found that it was a PHP bug, which is fixed in a recent version (5.6). After updating PHP to that version I was able to confirm that keepalive is working. You can check this using netstat -to or tcpdump. This is what the keepalive packets look like:

04:46:55.034560 IP 10.0.21.105.39853 > 10.0.52.203.mysql: Flags [.], ack 7907432, win 3086, options [nop,nop,TS val 3156463616 ecr 488286410], length 0
04:46:55.035543 IP 10.0.52.203.mysql > 10.0.21.105.39853: Flags [.], ack 9089476, win 136, options [nop,nop,TS val 488361560 ecr 3156163057], length 0

So now Linux sends keepalive packets regularly, and this does not allow the ELB to close the idle MySQL connection.
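
For clients where you control the socket yourself, the same effect can be requested explicitly. The following is a minimal sketch in Python, not what PHP does internally. The function name enable_keepalive and the timer values are assumptions, chosen only so that the first probe goes out well before a 3600 second idle timeout.

```python
import socket


def enable_keepalive(sock, idle=600, interval=60, probes=5):
    """Turn on TCP keepalive and tune its timers where the platform allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are not defined on every platform,
    # so each one is applied only if the socket module exposes it.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

The idle value is the one that matters here: it has to be shorter than the idle timeout of whatever sits between the client and MySQL, otherwise the connection is dropped before the first probe is sent.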


Creating list of the folders on big NFS mount

Today I found that on one of my NFS shares (140+ GB of small files) some directory permissions were set incorrectly. I decided to fix all of them using find /nfsmount -type d|xargs chmod 755. However, after starting the find command I found that it was painfully slow and would take an enormous amount of time. strace revealed the reason: find issues a newfstatat call for every file in the directory, which takes a lot of time on NFS. Below is an example from the strace output:

newfstatat(7, "s00594.jpg", {st_mode=S_IFREG|0660, st_size=152035, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "68394s4.jpg", {st_mode=S_IFREG|0660, st_size=221090, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "fg120lgb.jpg", {st_mode=S_IFREG|0660, st_size=12910, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "86240_1_low.jpg", {st_mode=S_IFREG|0660, st_size=14544, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "am1051-30cm.jpg", {st_mode=S_IFREG|0660, st_size=10091, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "46939-9.jpg", {st_mode=S_IFREG|0660, st_size=121149, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "22293(11).jpg", {st_mode=S_IFREG|0660, st_size=139348, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "51643(1).jpg", {st_mode=S_IFREG|0660, st_size=163897, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "53868t(2).jpg", {st_mode=S_IFREG|0660, st_size=17221, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "40419_t.jpg", {st_mode=S_IFREG|0660, st_size=4247, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "36116t.jpg", {st_mode=S_IFREG|0660, st_size=17580, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "89239_1_high(6).jpg", {st_mode=S_IFREG|0660, st_size=88781, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "61392-1(1).jpg", {st_mode=S_IFREG|0660, st_size=67362, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "98872_1_low(1).jpg", {st_mode=S_IFREG|0660, st_size=26867, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "31269.jpg", {st_mode=S_IFREG|0660, st_size=60999, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "s00437t.jpg", {st_mode=S_IFREG|0660, st_size=7505, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "30813-sub7(2)(4).jpg", {st_mode=S_IFREG|0660, st_size=155762, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "118b-090_t.jpg", {st_mode=S_IFREG|0660, st_size=4019, ...}, AT_SYMLINK_NOFOLLOW) = 0

The interesting thing is that I have no idea why find does that. There is a getdents call which provides all the required information for the entire directory without needing to access every file. I found that this syscall works well on NFS, so I created my own tool to list this mount quickly. With it I got the directory listing in less than 5 minutes! My tool is available on GitHub; however, it is still interesting why the GNU find tool itself is so suboptimal.
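
To illustrate the idea (this is not the tool mentioned above), here is a minimal sketch in Python. os.scandir is built on readdir, which on Linux is implemented with getdents, and it exposes the entry type returned with each directory entry. The function name list_directories is made up for this example.

```python
import os


def list_directories(root):
    """Yield root and every directory below it without a stat per file."""
    stack = [root]
    while stack:
        current = stack.pop()
        yield current
        with os.scandir(current) as entries:
            for entry in entries:
                # is_dir() uses the type reported with the directory entry
                # and only falls back to a stat call when that type is unknown.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
```

Whether this avoids the per-file calls depends on the filesystem actually reporting entry types. If the NFS server returns an unknown type, the fallback stat happens anyway, so it is worth confirming with strace on your own mount.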
