‘MySQL server gone away’ error, keepalive and PHP

One of the best-known errors among MySQL users is “MySQL server has gone away”. It provides minimal data for debugging, is very annoying, and can be caused by a number of things. The most common causes are:

  • Internal MySQL timeouts (configurable; the values can be found using “get variable …”)
  • MySQL max_allowed_packet limit
  • Networking layer
  • Probably some other mysterious reasons.

In this post I will cover the networking layer, because I think it is both hard to diagnose and very common. Of course, before jumping to it you should check the MySQL logs and timeouts to make sure that they are not the root cause. In my case all MySQL server timeouts were fine, but the application was still failing after running very large tasks. Using strace and tcpdump I found the reason:

  1. The app opened a connection to MySQL.
  2. Some slow code then ran, and during this time the MySQL connection was not in use.
  3. After 1 hour (the code was still running) the remote side closed the connection.
  4. When the app finally finished and tried to reuse the connection, it failed because the connection was already closed.

Of course the app is also guilty (slow code, no proper connection handling), but my task was to find a quick workaround. The first question was why the connection was closed after 1 hour. The reason was the Amazon ELB (load balancer) in use: it has an idle timeout, which can be configured up to 3600 seconds. Such a situation is not uncommon; many routers and balancers have idle connection timeouts. So I decided to find out why no keepalive packets were being sent, and found that it was a PHP bug, which is fixed in the recent version (5.6). After updating PHP to that version I was able to confirm that keepalive is working. You can check this using netstat -to or tcpdump. This is what keepalive packets look like:

04:46:55.034560 IP 10.0.21.105.39853 > 10.0.52.203.mysql: Flags [.], ack 7907432, win 3086, options [nop,nop,TS val 3156463616 ecr 488286410], length 0
04:46:55.035543 IP 10.0.52.203.mysql > 10.0.21.105.39853: Flags [.], ack 9089476, win 136, options [nop,nop,TS val 488361560 ecr 3156163057], length 0

So now Linux sends keepalive packets regularly, and this prevents the ELB from closing the idle MySQL connection.
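
For illustration, here is a minimal Python sketch of what enabling keepalive on a client socket looks like at the socket level. It is not the PHP fix itself; the function name and the timing values are assumptions chosen for the example. Keep in mind that the Linux default idle time before the first probe (net.ipv4.tcp_keepalive_time) is 7200 seconds, so the idle value has to be set below the balancer's idle timeout for keepalive to help.

```python
import socket


def enable_keepalive(sock, idle=600, interval=60, probes=5):
    """Turn on TCP keepalive for sock; the timing values are illustrative."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timing options are platform-specific, so guard each one.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # seconds of idle time before the first keepalive probe is sent
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # seconds between unanswered probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # unanswered probes before the connection is considered dead
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

With idle=600 the first probe goes out after ten minutes of silence, well inside a 3600-second idle timeout.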

Creating list of the folders on big NFS mount

Today I found that on one of my NFS shares (140+ GB of small files) some directory permissions were set incorrectly. I decided to fix all of them using find /nfsmount -type d|xargs chmod 755. However, after starting the find command I found that it was painfully slow and would take an enormous amount of time. strace revealed the reason: find issues a newfstatat call for every file in the directory, which takes a lot of time on NFS. Below is an example from the strace output:

newfstatat(7, "s00594.jpg", {st_mode=S_IFREG|0660, st_size=152035, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "68394s4.jpg", {st_mode=S_IFREG|0660, st_size=221090, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "fg120lgb.jpg", {st_mode=S_IFREG|0660, st_size=12910, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "86240_1_low.jpg", {st_mode=S_IFREG|0660, st_size=14544, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "am1051-30cm.jpg", {st_mode=S_IFREG|0660, st_size=10091, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "46939-9.jpg", {st_mode=S_IFREG|0660, st_size=121149, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "22293(11).jpg", {st_mode=S_IFREG|0660, st_size=139348, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "51643(1).jpg", {st_mode=S_IFREG|0660, st_size=163897, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "53868t(2).jpg", {st_mode=S_IFREG|0660, st_size=17221, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "40419_t.jpg", {st_mode=S_IFREG|0660, st_size=4247, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "36116t.jpg", {st_mode=S_IFREG|0660, st_size=17580, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "89239_1_high(6).jpg", {st_mode=S_IFREG|0660, st_size=88781, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "61392-1(1).jpg", {st_mode=S_IFREG|0660, st_size=67362, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "98872_1_low(1).jpg", {st_mode=S_IFREG|0660, st_size=26867, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "31269.jpg", {st_mode=S_IFREG|0660, st_size=60999, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "s00437t.jpg", {st_mode=S_IFREG|0660, st_size=7505, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "30813-sub7(2)(4).jpg", {st_mode=S_IFREG|0660, st_size=155762, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, "118b-090_t.jpg", {st_mode=S_IFREG|0660, st_size=4019, ...}, AT_SYMLINK_NOFOLLOW) = 0

The interesting thing is that I have no idea why find does that. There is a getdents call which provides all the required information for the entire directory without the need to access every file. I found that this syscall works well on NFS, so I created my own tool to list this mount quickly. With it I got the directory listing in less than 5 minutes! My tool is available on GitHub; however, it is curious that the GNU find tool itself is so suboptimal here.
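
The same idea can be sketched in Python. This is not the tool mentioned above, just an illustration with a made-up function name: os.scandir exposes the entry type returned with the directory listing, so a per-file stat call is normally not needed. On filesystems that do not report the entry type, is_dir() silently falls back to a stat call, so the gain is not guaranteed everywhere.

```python
import os


def list_directories(root):
    """Yield every directory path below root, using directory-entry types."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # is_dir() uses the type stored in the directory entry when
                # the filesystem reports it, and only stats the file otherwise.
                if entry.is_dir(follow_symlinks=False):
                    yield entry.path
                    stack.append(entry.path)
```

The resulting paths can then be passed to os.chmod(path, 0o755) one by one, which touches only the directories.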

ICMP Watchdog in the Ubiquiti Networks devices

About watchdog

I am using wireless devices from Ubiquiti Networks. Usually everything works fine, but in the rare case of a software or hardware bug it would be great to restart the device automatically. AirOS provides this functionality: it is called “ping watchdog” and is located on the “services” tab of the web interface. However, there is not a lot of documentation about how it works, so I decided to research it. [Screenshot: watchdog interface with default values]

Under the hood

Ubnt AirOS is an OpenWRT-based OS with ssh enabled, so we can ssh to the device to find out how this watchdog works. If the ping watchdog is enabled in the web interface, you should see something like this in the process list:

/bin/pwdog -d 300 -p 300 -c 3 -m 300 -e /bin/support /tmp/emerg /etc/persistent/emerg.supp emerg 0; reboot -f 192.168.1.1

This “pwdog” service is a custom busybox applet based on the busybox ping implementation, modified to implement the watchdog functionality. I was able to find its source code on GitHub.

Here is a detailed description of the pwdog service logic:

  1. On system start it waits -d seconds (300 by default) to allow the hardware and software to initialize. I would not recommend reducing this value, or there is a chance that the device will never finish starting. In the web interface it is the “Startup Delay:” value.
  2. After the initial delay it sends an ICMP ping to the specified host (the last parameter) and waits -p seconds (300 by default, “Ping Interval:” in the web interface). Then this step is repeated.
  3. If there is no reply -c times (3 by default), pwdog runs the command specified in the -e argument (/bin/support /tmp/emerg /etc/persistent/emerg.supp emerg 0; reboot), or just reboots if none is specified. In this example the watchdog also saves support information. In the web interface you can modify this value using the “Failure Count To Reboot.:” parameter.
  4. There is also the -m parameter, which defines a low-memory threshold. It is enabled by default and is not configurable via the web interface.

Below I test how it works from the command line, with modified parameters:

XM.v5.6.6# /bin/pwdog -d 1 -p 3 -c 3 -m 300 -e /usr/bin/echo -v 192.168.1.1
pwdog[993]: pwdog: do_now=0, initial_sleep=1, timeout=3, retry_count=3, low_mem=300 exec=`/usr/bin/echo`
pwdog[993]: PING Watchdog is checking 192.168.1.1 (192.168.1.1).
pwdog[993]: Missed 1 ping replies in a row.
pwdog[993]: Missed 2 ping replies in a row.
pwdog[993]: Missed 3 ping replies in a row.
pwdog[993]: 4 ping replies missed. Executing `/usr/bin/echo`.
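
The counting logic described in the list can be modelled in a few lines of Python. This is only a sketch of the behaviour as described, not the applet's code: the function names are made up, and it assumes that the miss counter resets on any reply (as the “in a row” wording in the log suggests). Note that the log above reports the action only on the 4th missed reply with -c 3, so the real applet may fire one check later than this model.

```python
from typing import Iterable, Optional


def checks_until_action(replies: Iterable[bool], retry_count: int = 3) -> Optional[int]:
    """Return the 1-based number of the ping check at which the modelled
    watchdog runs its action, or None if it never fires."""
    missed = 0
    for check_number, got_reply in enumerate(replies, start=1):
        if got_reply:
            missed = 0  # assumption: any reply resets the counter
            continue
        missed += 1
        if missed >= retry_count:
            return check_number
    return None


def detection_delay(ping_interval: int = 300, retry_count: int = 3) -> int:
    """Seconds of continuous link failure before the modelled action runs."""
    return ping_interval * retry_count
```

With the defaults, detection_delay() gives 900 seconds, i.e. 15 minutes of link failure before the reboot command is started.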

Conclusion

The ICMP watchdog in AirOS is not a very smart service, and the default configuration does not look optimal to me: it is enough to miss only 3 ICMP packets to start the reboot process, yet it fires only after 15 minutes (300 s × 3) of link failure. So I would recommend increasing the failure count and decreasing the ping interval. I am also thinking about porting apinger to this device, because it provides much more advanced ICMP check functionality.

Q&D perl script to check balance on the mujkaktus.cz GSM prepaid

I found that one of the best options for GSM trackers in CZ is the mujkaktus card. It bills by traffic used, at very reasonable rates. It is also anonymous and requires no contract. To check the balance on the card automatically I created a small Perl script, which I decided to share. I am planning to integrate it with Nagios to alert when the balance is low.

#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI::Escape qw(uri_escape);

# username/password
my $username = 'example@example.com';
my $password = 'example';

binmode STDOUT, ":utf8"; # avoid "Wide character" warnings on output
$ENV{'PERL_LWP_SSL_VERIFY_HOSTNAME'} = 0; # disable SSL hostname verification (insecure)

# Create the login request
my $req = HTTP::Request->new(POST => 'https://www.mujkaktus.cz/.gang/login');
$req->content_type('application/x-www-form-urlencoded');
$req->header('Cookie' => 'COOKIE_SUPPORT=true'); # required
# URL-encode the credentials so characters like '@' or '&' do not break the form body
$req->content('username='.uri_escape($username).'&password='.uri_escape($password).'&submit=P%C5%99ihl%C3%A1sit');
my $ua = LWP::UserAgent->new;
$ua->cookie_jar( {} ); # keep the session cookie in an in-memory jar
my $response = $ua->request($req);
# status code should be 302 (Found) if the password is correct
die "Unexpected http response, check login/password\n" if ($response->code != 302);

# we will not follow the redirect, but fetch the "moje-sluzby" page
$req = HTTP::Request->new(GET => 'https://www.mujkaktus.cz/moje-sluzby');
$response = $ua->request($req);
if ($response->decoded_content =~ m/stav kreditu<\/h3><div class=\"box-format\"><div id=\"[^"]+\"><p><span class="text-1">([0-9,]+)/i) {
	print "Balance: $1 CZK\n";
}
else {
	die "Unable to fetch balance data\n";
}

P.S. I found that there are in fact two web interfaces: an old, legacy one, which I am using, and a new AJAX-based one. To get the old web interface you should disable JavaScript in your browser. Hopefully they will keep this legacy interface for some time, as the AJAX-based one is much more complicated and would require more effort to extract the data.

Using TK103A GPS tracker with traccar server

TK-103A tracker

Some time ago I decided to install a GPS tracker in my car to get information about my routes, car location, etc. After some quick research I found the “Mini TK103A” tracker on eBay, which costs about $30.

The device looks solid and can be configured by SMS commands. The most important are “begin123456” (initialization), “admin123456” (adds numbers to the trusted list) and “adminip” (GPRS settings). The full command list is provided in the documentation.

“USB” port

The tracker does have a micro-USB socket; however, it is not real USB, just a serial port wired to a micro-USB connector. I was able to get information from it using a USB-to-serial TTL converter. It sends a lot of debug information at 115200 baud, 8N1. The debugging information is useful when you are configuring and testing the tracker.

04-26 15-63-40  EINT PWR CONNECT
04-26 15-63-40  motion_close
01-15 00-00-00  SENDDATA:0
01-15 00-00-00  NO SERVICE
01-15 00-00-00  T-card not ready!
01-15 00-00-00  FILE2222:
                         01-15 00-00-00:

01-15 00-00-00  password1=:123456
01-15 001-15 00-00-00  CENTER NUMER1:+420123123123
01-15 00-00-00  CENTER NUMER2:
01-15 00-00-00  CENTER NUMER3:
01-15 00-00-00  CENTER NUMER4:
01-15 00-00-00  CENTER NUMER5:
01-15 00-00-00  heartbeat time:3
01-15 00-00-00  SENDDATA:0
01-15 00-00-00  send Packet time:15
01-15 00-00-00  sms_gprs=1
01-15 00-00-00  time_zone:2,8,0
01-15 00-00-00  voice_temp:1
01-15 00-00-00  shave alarm:0,35
01-15 00-00-00  ACC:0
01-15 00-00-00  speed alarm:0,120
01-15 00-00-00  speed alarm time:5
01-15 00-00-00  s alarm time:5
01-15 00-00-00  move alarm=0
01-15 00-00-00  JT=0
01-15 00-00-00  JT TIME=3
01-15 00-00-00  TRACE :2
01-15 00-00-00  lang=1
01-15 00-00-00  APN=1
01-15 00-00-00  [garbled non-ASCII label]:1
01-15 00-00-00  powr=1
01-15 00-00-00  weilan:0
01-15 00-00-00  num:255
01-15 00-00-00  loud_spe=1
01-15 00-00-04  NO SERVICE
01-15 00-00-04  NO SERVICE
01-15 00-00-04  NO SERVICE
01-15 00-00-04  NO SERVICE
01-15 00-00-04  NO SERVICE
01-15 00-00-05  NO SERVICE
01-15 00-00-07  NETWORK NORMAL
01-15 00-00-07  NETWORK NORMAL
01-15 00-00-10  T-card not ready!
01-15 00-00-10  FILE2222:
                         01-15 00-00-10:

01-15 00-00-12   IMEI:352887072123123
01-15 00-00-12   IP/PORT:1.2.3.4/9000
01-15 00-00-12   VER:MAUI.10A.W11.08.MP.V25 2015/09/11 12:38
01-15 00-00-12   ---------------------------------------------------
01-15 00-00-12   SIM CARD------------------OK!
01-15 00-00-12   GSM Signal----------------OK!
01-15 00-00-12   SOCKET----------------NG
01-15 00-00-12   G-Senser------------------OK!
01-15 00-00-12   GPS Location----------NG
01-15 00-00-12   PWR EINT--------------NG
01-15 00-00-12   ACC EINT--------------NG
01-15 00-00-12   SOS EINT--------------NG
01-15 00-00-12   BATTER/Vin-----------4.11/11.97
01-15 00-00-12   ---------------------------------------------------
01-15 00-00-12   GPS Location:86,Satellite:2-----------
01-15 00-00-12  num:255
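
When testing several configurations it can be handy to pull the self-test results out of such a capture automatically. The following Python sketch assumes the line layout seen in the log above (timestamp, check name, a run of dashes, then OK! or NG); the function name and the regular expression are made up for this example, and other firmware versions may print the block differently.

```python
import re

# "<MM-DD> <HH-MM-SS>  <name>-----<OK!|NG>", as in the self-test block above
STATUS_LINE = re.compile(r"^\d\d-\d\d \d\d-\d\d-\d\d\s+(.+?)-{5,}(OK!|NG)\s*$")


def parse_self_test(lines):
    """Return {check name: True/False} for the OK!/NG lines of the log."""
    results = {}
    for line in lines:
        match = STATUS_LINE.match(line)
        if match:
            # True for "OK!", False for "NG"; other lines are ignored
            results[match.group(1).strip()] = match.group(2) == "OK!"
    return results
```

Lines that carry values instead of a status, such as the battery voltage, are deliberately skipped.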

I also found some references saying that this port can be used to reflash the tracker; however, I never tried that.

Sending data to the server

After configuring the GPRS host/port you can enable GPRS mode, in which all data is sent to the remote server. I found that the open source Traccar software supports such devices and provides web and Android interfaces. This specific tracker turned out to use the GT06 binary protocol. Traccar supports it out of the box; you just have to choose the correct port on the server and the client. Traccar also supports logging data to an external database (MySQL, PostgreSQL, etc.), so it should be easy to integrate it with anything you need.
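
To give an idea of what a GT06 message looks like on the wire, here is a rough Python sketch that extracts the IMEI from a login frame. It assumes the commonly documented GT06 framing (start bytes 0x78 0x78, a length byte, a protocol number, the payload, a 2-byte serial number, a 2-byte checksum, and a 0x0D 0x0A terminator) and login messages with protocol number 0x01 carrying the IMEI as 8 bytes of packed BCD. This was not verified against this particular tracker, the function name is made up, and the checksum is not validated.

```python
def parse_gt06_login(frame: bytes) -> str:
    """Extract the IMEI from a GT06-style login frame (checksum not verified)."""
    if len(frame) < 18 or frame[:2] != b"\x78\x78" or frame[-2:] != b"\r\n":
        raise ValueError("not a GT06 frame")
    length, protocol = frame[2], frame[3]
    # The length byte excludes the start bytes, itself and the terminator.
    if length != len(frame) - 5:
        raise ValueError("length byte does not match frame size")
    if protocol != 0x01:
        raise ValueError("not a login message")
    # Terminal ID: 8 bytes of packed BCD = 16 digits, the first one is padding.
    return frame[4:12].hex()[-15:]
```

For real use Traccar already does all of this; the sketch is only meant to show how little there is between the tracker and anyone capturing the traffic.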

Some security considerations

All data from the tracker to the monitoring system is sent unencrypted and can easily be decoded in transit if the traffic is captured. This device also allows adding some “security alarm” features, including ignition and oil pump control. I personally feel that this is very dangerous and should not be used at all. I think such features are a good example of the coming InternetOfShit 🙂

Let’s Encrypt is now in public beta :)

Starting today, the Let’s Encrypt service is in public beta 🙂 This means you can create certificates trusted by most browsers right now. So it is a perfect time to create certificates for all your TLS services and get rid of self-signed certificates. It will also help with the switch to the new HTTP/2 protocol.

There are only a few things I am missing: a non-Python client without a ton of half-implemented features, and a similar solution for S/MIME. The first should be easy to implement; it is just a matter of time, as the protocol is open and easy to understand.

Upgrading TP-Link Archer C7 AC1750 to use with OpenWRT

Why OpenWRT?

One of my home access points is a TP-Link Archer C7. I purchased it to get all the benefits of the 5 GHz 802.11ac standard for the laptop and the 2.4 GHz band for older devices. However, it never worked well for me:

  • In the 5 GHz band Apple devices were very unstable.
  • Sometimes I had to reboot the router because of WiFi stability issues. After a reboot it worked until the next issue. There are no debug options or logs in the native firmware.
  • The device was spamming the network with STP packets and some other data, with no way to disable this.
  • After upgrading to new firmware versions I had to reconfigure it completely, and in fact the difference between the regularly updated versions was minimal.
  • The native firmware is configurable only via the web interface, and probably includes backdoors 🙂

So I decided to reflash it with OpenWRT and found that I am the “happy” owner of a TP-Link Archer C7 v1, with the AR1A (v1) variant of the QCA9880 chip, which is not supported by the open source ath10k driver. So there is no way to use 5 GHz with OpenWRT at all. The only good thing is that the 5 GHz chip is not soldered onto the board but sits in a PCIe mini card socket. So I decided to replace it.

Router upgrade

  • I was able to find a Compex WLE900VX Atheros QCA9880 card on eBay. It supports 802.11ac at 1.3 Gbps with 3×3 MIMO on 5 GHz and is supported by the ath10k driver.
  • Before replacing the WiFi card you should install OpenWRT, or the device won't boot at all. I used OpenWRT CC 15.05 for the Archer C7 V1.X; the upgrade was done via the web interface.
  • After OpenWRT is up and running, turn off the device and replace the WiFi card. Be careful with the pigtails; it is very easy to damage them.
  • OpenWRT recognized the card without any additional packages and it is now working well. You may also want to use the alternate firmware from Candela Technologies; there are some reports that it works better than the one from the vendor.

Limitations

  • Hardware NAT is not supported. I am not using NAT on it, so I don't really care. At speeds up to 300 Mbit/s it probably does not matter.
  • The device has only 8 MB of flash. It is enough for an OpenWRT installation (including LuCI). There are also 2 USB 2.0 ports, so it is easy to extend the storage if needed.

Results

So far everything works great. It is too early to say whether the stability issues are gone, but at least I am now able to do full debugging and tuning if needed. I am planning to benchmark the router later.

Let’s Encrypt!

Some time ago I signed up for the Let’s Encrypt beta program. Let’s Encrypt is a free, automated, and open certificate authority (CA), run for the public’s benefit.
Yesterday I finally got an email telling me that my requested domains (smartmontools.org/www.smartmontools.org) had been whitelisted. So I decided to give it a try.

Installation

To use the Let’s Encrypt service you have to install an ACME protocol client. ACME (Automatic Certificate Management Environment) is a protocol that automates all operations with PKI certificates. The current implementation is written in Python and is available in the GitHub repository. You can find a lot of information about its usage in the online manual. I tried to run it on CentOS 6.7, and the installation failed because of the old (2.6) Python version. However, after some research I was able to find a pull request with a patch for 2.6 support. Hopefully it will go into mainline at some point, because py26 is still widely used. After this I was able to complete the installation with letsencrypt-auto.

Usage

Let’s Encrypt requires you to verify that you own the requested domain, as most other CAs do. However, with ACME this can be done 100% automatically. There are different ways to do this; initially I tried the --standalone option. With it the letsencrypt client starts a standalone web server for the authentication. However, if you already have a web server on port 80, you have to stop it while the client is running. It worked for me, but it requires a short downtime, so I decided to look at other options. In the end I found the webroot authenticator, which just creates some files in the web root and automatically removes them afterwards. To automate the process I created the configuration file /etc/letsencrypt/cli.ini:

# This is an example of the kind of things you can do in a configuration file.
# All flags used by the client can be configured here. Run Let's Encrypt with
# "--help" to learn more about the available options.

# Use a 2048 or 4096 bit RSA key
rsa-key-size = 2048

# Use production server
server =  https://acme-v01.api.letsencrypt.org/directory

# Uncomment and update to register with the specified e-mail address
email = nospam@example.com

# Uncomment to use a text interface instead of ncurses
text = True

# Uncomment to use the standalone authenticator on port 443
# authenticator = standalone
# standalone-supported-challenges = dvsni

# Uncomment to use the webroot authenticator. Replace webroot-path with the
# path to the public_html / webroot folder being served by your web server.
authenticator = webroot
webroot-path = /var/www/html/smartmontools/static

# automatically agree with license
agree-dev-preview = True

# renew the certificate if it already exists
renew-by-default = True

I also had to make sure that nginx can serve the required files to the remote side, so I added these lines to my nginx site configuration:

    location /.well-known/acme-challenge/ {
        alias /var/www/html/smartmontools/static/.well-known/acme-challenge/;
    }

To use the certificates in nginx I added the paths to the new certificate and key to the configuration:

    ssl_certificate /etc/letsencrypt/live/smartmontools.org/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/smartmontools.org/privkey.pem;

Now to renew my certificates I just need to run the

./letsencrypt-auto -d smartmontools.org -d www.smartmontools.org certonly

command and it will do the job. Also, don't forget to reload the nginx service if the certificates are already configured.

Notes

Let’s Encrypt certificates expire after 90 days, so it is recommended to renew them every 60 days. It is also highly recommended to set up a Nagios check that sends an alert if the remaining time before expiration is less than one week. In the future I would also like to try an ACME client on an OpenWRT box, but hopefully there will be a more suitable alternative for embedded hardware. Finally, I would recommend testing your web server SSL configuration with the SSL Server Test from SSL Labs.
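
The expiration check itself is simple. Below is a minimal Python sketch of the decision logic such a Nagios check could use; it is an illustration, not an existing plugin. The function names and the 2-day critical threshold are assumptions (only the one-week warning comes from the recommendation above), while the return values follow the usual Nagios plugin exit codes.

```python
import ssl

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes


def days_left(not_after, now):
    """Days between now (Unix seconds) and a certificate's notAfter string,
    e.g. "Jan  5 09:34:43 2018 GMT"."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


def expiry_status(not_after, now, warn_days=7, crit_days=2):
    """Map the remaining lifetime to a status; crit_days is an assumed value."""
    remaining = days_left(not_after, now)
    if remaining < crit_days:
        return CRITICAL
    if remaining < warn_days:
        return WARNING
    return OK
```

On a live TLS connection the notAfter string is available from the dictionary returned by ssl.SSLSocket.getpeercert(), and now would normally be time.time().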

Smartmontools daily builds

Sometimes I need to audit servers where smartmontools is very old, not installed at all (with broken repositories), or not working for some reason. That is one of the reasons why http://builds.smartmontools.org was created. You can download the latest SVN builds for the following systems:

  • Darwin (OSX) package, Mach-O universal binary with 2 architectures: i386+x86_64
  • Win32 installer (32 and 64 bit)
  • Linux: i686 and x86_64, static and dynamic
  • Source code

The service is currently in “experimental” status; please report any issues with it here or on https://smartmontools.org.

LXC on OpenWRT/Turris presentation

Slides from my presentation @ Turris Evening by cz.nic about LXC in OpenWRT/Turris. Video will follow soon, if you are interested.

 

Update: video from the presentation:
