ZPA ZE312 counter RF output descriptopn

Some time ago i been able to read data from the ZPA ZE312 power meter, which seems to be popular in CZ. Data now automatically imported to Arduino every 5 minutes, i am planning to make a blog post about it. For now – just a description of the values which this device returns:

 'C.1.0(XXXXXXXXX)' # Meter serial number
 '0.0.0(XXXXXXXXXXXX)' # customer number = barcode
 '0.3.0(10000*imp/kWh)' # Active energy meter constant [imp/kWh]
 'F.F(000000)' # Error code
 '1.8.0(0010332.677*kWh)' # Energy a in total: a = |+a|+|-a|
 '1.8.1(0005757.422*kWh)' # Positive active energy (A+) in tariff T1 [kWh]
 '1.8.2(0004575.255*kWh)' # Positive active energy (A+) in tariff T2 [kWh]
## Registers of active energy per phases
 '21.8.0(0008995.585*kWh)' # Positive active energy (A+) in phase L1 total [kWh]
 '41.8.0(0001014.588*kWh)' # Positive active energy (A+) in phase L2 total [kWh]
 '61.8.0(0000322.503*kWh)' # Positive active energy (A+) in phase L3 total [kWh]
 '2.8.0(0000000.000*kWh)' # Negative active energy (A+) total [kWh]
 '22.8.0(0000000.000*kWh)' # Negative active energy (A-) in phase L1 total [kWh]
 '42.8.0(0000000.000*kWh)' # Negative active energy (A-) in phase L2 total [kWh]
 '62.8.0(0000000.000*kWh)' # Negative active energy (A-) in phase L3 total [kWh]
 'C.8.1(0207212145)' # Operating period of tariff register t1. Format RRMMDDhhmm, RR-year, MM-month, DD-date, hh-hours, mm-min.
 'C.8.2(0103181249)' # Operating period of tariff register t2. Format RRMMDDhhmm, RR-year, MM-month, DD-date, hh-hours, mm-min.
 'C.8.0(0305140414)' # Operating period in total +a. Format RRMMDDhhmm, RR-year, MM-month, DD-date, hh-hours, mm-min.
 'C.82.0(0000000000)' # Operating period in total -a. Format RRMMDDhhmm, RR-year, MM-month, DD-date, hh-hours, mm-min.
 'C.7.1(00000001)' # the number of power failures in phase L1
 'C.7.2(00000001)' # the number of power failures in phase L2
 'C.7.3(00000002)' # the number of power failures in phase L3
 '0.2.1(ver.02,100830,B526)' # SW version
 'C.2.1(1112221302)' # Date and time of the last parameterization. Format RRMMDDhhmm, RR-year, MM-month, DD-date, hh-hours, mm-min.
 'C.2.9(1112221302)' # Date and time of the last read-out. Format RRMMDDhhmm, RR-year, MM-month, DD-date, hh-hours, mm-min.
 'C.3.9(0000000000)' # number of trials of attacking by magnetic field (scaler, counter)
Tagged , ,

Home Automation – WIFI connected FAN in the bathroom with humidity sensor

Why do i need that?

To avoid mold in the house it needs to be properly ventilated. Bathrooms are in the danger zone, because of the high humidity and temperature. Some time ago i bough a ventilator which automatically turns on/off based on the humidity, but it was never working good. It has variable resistors to tune it + some complex analog electronic and in fact was never working as intended and in about 1 year controller completely died. So, i had to find a replacement and decided that this is a perfect task for the HA. First part of the task was to define the requirements:

  • Ventilator must be able to turn on when some humidity value is reached (e.g. 80%) and turn off on low threshold (e.g. 60%)
  • It should be easy to change this thresholds. I would expect that its possible for them to be different at winter/summer seasons, may be not.
  • It would be great to monitor Temperature/Humidity in the bathroom with the same device and report it to the DomoticZ server.
  • Additionally it should be possible to start ventilator by pressing a button (e.g. to test if it works or for manual action)
  • And to have a lot of fun it should be built on cheap and open source components.

Choosing the right platform

Initially i had an idea (and prototype) on the Arduino Nano + Relay board + DHT21 sensor + 5V power supply + some connectivity board (wifi or bluetooth). However, i found that there will be a lot of connected components + wifi boards for the Arduino are very limited and sometime expensive, a lot of custom code needs to be written, etc. So i decided to try ESP8266 platform based devices.
One of the known devices based on this chipset is Sonoff TH10/TH16 (only relay is different, 10A and 16A):
sonoff TH10

I found controller on the Ali Express, because on the official web site it was out of stock.

What is inside?

  1. ESP8266 module which provides SoC with wifi connectivity. Serial port (3.3v, be careful!) soldered on the board, but no pins connected (but easy to add)
  2. Internal power supply circuit to power 3.3V SoC from the 220V power line
  3. Mini audio jack socket to connect different sensors. DHT21 and few others are available from the vendor. I ordered mine with DHT21 which is Humidity + Temperature (and is much better compared to DHT11)
  4. Relay (10 or 16A) connected to the SoC GPIO and manageable from it.

Flashing Sonoff TH16

Native firmare works using a cloud and is integrated with the vendor services. This was not something i would like to have, so i decided to re-flash it. There are many options for the ESP8266 (Arduino sketches, NodeMCU, firmware from vendor, etc). After quick research i decided to use an ESP Easy firmware based on such features:

  • It is free and opensource
  • Comes with a handy web interface and support of many different sensors, so very easy to start with.
  • Supports DomoticZ, MTTQ and other connectivity options.
  • Allows limited scripting (Rules engine) to implement device-side logic

I decided to use version 2.0 (ESP Easy Mega) which is still under development, because release was lacking some of the functionality i need. It is very easy to flash firmware to the device – all you have to do is to solder 4 pins in front of the board, which are VCC, RX, TX, GND. To flash it you will need 3.3V TTL Serial converted with a 3.3v output to power the board.

⚠WARNING⚠!!! never try to flash the device when it is connected to the 220V powerline. There is no proper ground isolation and most likely it will kill your device, serial converter and PC. You should disconnect device from the 220v and power it from the TTL converted instead (or any other 3.3v source)

To flash the firmware i been using esptool.py tool and such command:

esptool.py -b 115200 --port /dev/tty.usbserial-A900ZKU9 write_flash 0x0000 ESPEasy_v2.0.0-dev8_normal_1024.bin

Device needs to be programmed in the flash mode, which is entered by powering on device with the button pressed.

Of course serial port name will depend on your OS and controller. After flash you should reboot the device and it will automatically create an SSID ESP_0 with a WPA password configesp. On address there is a web interface to setup connection to the home WIFI network.

EasyESP initial configuration

After Sonoff device is connected to the wifi you can access it from the browser. First think i would recommend to do is to assign blue led to the WIFI status indication. This could be done in the Hardware section, by setting wifi status led to GPIO-13 ESP Easy - status led. Next thing is to define devices available on the board: button (GPIO-0), and DHT21 sensor (GPIO-14). Additionally i defined dummy device, which will be later used in the rules engine:
ESPEasy - devices

To report values to the DomoticZ controller you have to define it in the controller section:
Controller. Additionally you will have to create virtual devices (using virtual switch) in the DomoticZ for the Temperature + Humidity sensor and the switch. ID in devices section should match one from DomoticZ. Also you may want to setup syslogd ip to see logs from the device when not using serial. This is important, because in my case DHT sensor was not working properly when powered from TTL converter (properly not enough current) but its not possible to use serial when 220V is connected.

EasyESP rules

To avoid any dependency on DomoticZ controller i decided to put this simple logic in the device. There is a Rules engine (it needs to be enabled in the Advanced menu) which allows to define some simple device-side rules. It is very limited, but for my task it is enough. Some of the features i been using:

  • Virtual device as variable holder and event loop provider. Every virtual device allow to store up to 4 values and creates own event loop (loop time is set by “Delay” value).
  • Timers, which allows to call user-defined callback on specified timeout (and set a new one if needed)
  • Events from the device (button, temperature sensors, etc). Events can be used in the rules.
  • System startup event to define actions on a startups.
  • Simple (no nested) if/else controls.
  • Many functions which can be called from the code, e.g. to manager GPIO, do HTTP calls, etc.

Below is a ruleset i created. It defines threshold to turn on/off, reports FAN status to the DomoticZ, supports button which turns on ventilator for a 3 minutes even if conditions are not met:

On System#Boot do
  TaskValueSet,3,1,75.00 // temperature on dummy sensor
  TaskValueSet,3,2,70.00 // humidity on dummy sensor
  event FanOff

// main loop
On OnOffValues#HumidityOn do
  If [Temp#Humidity]>[OnOffValues#HumidityOn]
    event FanOn
  If [Temp#Humidity]<[OnOffValues#HumidityOff]
    event FanOffIfNotTimer // workaround for no nestled if

On FanOffIfNotTimer do
  If  [OnOffValues#TimerActive]=0
    event FanOff

// timer button is pressed
On Button#Switch=0.00 do
  If [OnOffValues#FanOn]=0 // do nothing if FAN is already on
    event FanOn
    timerSet,1,180 // turn off in a 3 minutes

// turn off FAN after timer
On Rules#Timer=1 do
   If [Temp#Humidity]<[OnOffValues#HumidityOn] // do not turn off if Humidity is High
     event FanOff

// command to turn on fan
On FanOn do
  TaskValueSet,3,3,1 // FAN is on

// command to turn off fan
On FanOff do
  If [OnOffValues#FanOn]=1
    TaskValueSet,3,3,0 // FAN is off

Additionally i can turn on FAN from the DomoticZ using HTTP call to the http://bath.local/tools?cmd=event+Button%23Switch URL. I used this as “On Action” for the “Push On” button switch. Of course, nice looking visualization is available:
Temperature/Humidity graph


Everything works as it should, however some of the things i would like to improve in the feature:

  • Solution is not secured properly. I am planning to use own SSID for the HA devices to solve this problem, ESP Easy do have password protection, but it is to secure only web access, API calls are still not secured.
  • Thresholds can be easily changed remotely using TaskValueSet command call
  • Should threshold depend on the temperature or it is overkill? How long will DHT21 survive in such wet environment ? Subject to test… Anyway, its cheap.
  • May be MQTT is better compared to DomoticZ HTTP API? Any comments on that?

As always, comments and suggestions are welcome 🙂

Tagged , , , ,

Acmetool utility port for the FreeBSD

I am actively using Lets Encrypt certificates for my private and business projects. Initially i been using official Python client to obtain them, and it was all kind of possible problems. At some point i migrated to the Acmetool client which works perfectly and allows me to maintain hundreds of the certificates with a minimal efforts. This is feature list from the web site:

✅ Zero-downtime autorenewal
✅ Supports any webserver
✅ Fully automatable
✅ Single-file dependency-free binary
✅ Idempotent
✅ Fast setup

Only problem for me was lack of the FreeBSD port – yes, you can grab FreeBSD binaries from the author web site, install them somewhere to the /opt/, but its not a Jedi Path. So i had to create FreeBSD port, which after few months of aging was finally accepted to the tree. There are small changes in the port compared to the author build – paths are FreeBSD-style and builds should be repeatable (i hope, at least). As usual – feel free to send me PR-s, bugreports and suggestions.

Lets Encrypt!

Tagged , , ,

Reading DHT11 temperature sensor on Raspberry Pi under FreeBSD

Connecting sensor to the RPi

DHT-11 is a very cheap temperature/humidity sensor which is commonly used in the IoT devices. It is not very accurate, so for the accurate measurement i would recommend to use DHT21 instead. Anyway, i had DHT-11 in my tool box, so decided to start with it. DHT-11 using very simple 1 wire protocol – host is turning on chip by sending 18ms low signal to the data output and then reading 40 bytes of data. Details about the protocol could be found in the specification. To read data from the chip it should be connected to the power (5v) and gpio pin. I used pin 2 as VCC, 6 as GND and 11 as GPIO (it is GPIO17, see pinout):

FreeBSD support

There is no support for this device out of the box. I found some sample code on the github, see lex/freebsd-gpio-dht11 repository. This code was a good starting point, but soon i found 2 issues with it:

  1. Results are very unreliable, probably due to gpio decoding algorithm.
  2. Checksum is not validated, so sometime values are bogus.

Initially i was thinking to fix this myself, but later found kernel module for this purpose, 1 wire over gpio. This module contains DHT11 kernel driver (gpio_sw) which implements DHT-11 protocol in the kernel space and exporting /dev/sw0 for the userland. Driver compiles on FreeBSD11/ARM without any changes. Use make install to install the driver.

Putting all the things together

  • To specify GPIO pin with a sensor put hint.gpio_sw.0.pin=17 into /boot/loader.conf.
  • I found that gpio_sw.ko needs to be loaded after kernel initialization, or device is not created. So i have added /sbin/kldload gpio_sw.ko to the /etc/rc.local file.
  • To make /dev/sw0 available for the non-root users you should add devfs_system_ruleset="localrules" to the /etc/rc.conf and add into /etc/devfs.rules this section:
add path 'sw0' mode 0644
  • Sample program (test.c) shows non human-readable data, use something like printf("h:%d.%d %%, t:%d.%d C\n",Buf[0],Buf[1],Buf[2],Buf[3]); if you want to get humidity and temperature, e.g.
root@rpi-b:/home/freebsd/gpio_sw # ./test
h:22.0 %, t:12.0 C

That is it, after reboot you should have working /dev/sw0 device.

Solving problems with LUA

Final goal was to add this sensor to the domoticz software. It is using LUA scripting to extend it functionality, e.g. to obtain data from non-supported or non standard devices. So, i decided to read /dev/sw0 from the LUA. I wrote simple test script:

file = io.open ("/dev/sw0", "rb")
out = file:read (5)

However script was always returning nil and i had to use truss tool to understand the problem. This is part of the trace:

open("/dev/sw0",O_RDONLY,0666)             = 3 (0x3)
fstat(3,{ mode=crw-r--r-- ,inode=96,size=0,blksize=4096 }) = 0 (0x0)
ioctl(3,TIOCGETA,0xbfbfe28c)             = 0 (0x0)
read(3,0x2065b000,4096)              ERR#22 'Invalid argument'

As you could see – LUA trying to read 4096 bytes, despite the fact that we specified 4. And driver checks this and returns the error. To fix this i did a small patch to avoid error and make LUA happy:

-- gpio_sw/gpio_sw.c    2014-05-12 11:26:51.000000000 +0000
+++ gpio_sw.mod/gpio_sw.c   2017-01-14 14:10:33.736813000 +0000
@@ -273,9 +273,10 @@
   duprintf("read - start, uio_resid=%i\n", uio->uio_resid);
   struct gpio_sw_softc *sc = cdev->si_drv1 ;
-  if ( uio->uio_resid >= sc->BufSize) return EINVAL ;
+  if ( uio->uio_resid >= sc->BufSize) sc->Len=sc->BufSize-1 ; // to work with buffered IO
+  else sc->Len=uio->uio_resid;
   if ( ! sc->GpioStatus) return ENXIO ;
-  sc->Len = uio->uio_resid ;
   int i = 0 ;
   while ( i < sc->Len)

After modification script works as it should:

[root@rpi-b /home/freebsd]# lua52 test.lua

Now chip is connected to the domoticz and reporting actual temperature/humidity.

Tagged , , ,

Domoticz on RPi 1 using FreeBSD/ARM

FreeBSD, Rasberry and Domoticz

I decided to create a Home Automation controller from my old Raspberry 1 device. In the past i used Linux on it but now it is supported by FreeBSD/ARM, so i decided to give it a try.

Installing FreeBSD

This is an easiest part. RPi 1 is fully supported, so i just downloaded FreeBSD11 image and put it on the SD card using dd tool. That`s it, FreeBSD boots normally (no installation process required) and is ready to use. I only added ntpd/sshd to the autostart process.

Installing domoticz

Domoticz is an open source home automation system, written on C++. There is a FreeBSD port for it, so first thing i tried to do was to install binary package using pkg tool. Unfortunately there is no binary package, so i had to install ports. To save some time before starting domoticz compilation i would recommend to install all build requirements using pkg manager, compiling them on RPi 1 would take a lot of time. Some tips on getting domoticz compiled on RPi 1:

  • Clang/C++ is using a lot of memory, so swap is required. I added swap space on the connected USB storage, with 512Mb swap file.
  • Compilation process will take a lot of hours. Probably cross-compilation would be much faster, but then it wont be a normal port build. Anyway, this process is a good crash test for the hardware.
  • I found that build failed on a 2 files: hardware/I2C.cpp and hardware/PiFace.cpp. This is the reason why there is no binary package. To fix the build – replace __arm__ with __linuxarm__. This will fix the build process, issue is reported to the upstream and FreeBSD maintainer.

Using domoticz

I did not found anything platform-specific for the domoticz on the FreeBSD. After installation i only imported database from the OSX installation and copied my LUA scripts for the custom sensors + installed required LUA modules from the ports. Despite the ugly build performance domoticz works very fast and using just about 25Mb of RAM. Currently I2C support in the hardware/I2C.cpp is implemented only for Linux/ARM, i am planning to fix that in a few weeks (i do not have any supported i2c devices now). FreeBSD/ARM itself works very stable, i did not had any problems with it.

Tagged , ,

PPTP on OSX Sierra

After upgrading to OSX Sierra (10.12.2) my PPTP connections just disappeared. Also it is not possible to create new PPTP connections anymore. Yes, i know that PPTP protocol is old, could be insecure, but i still need it to connect to some legacy systems. And, after all, i don`t like when someone telling me what should i do 🙂

After quick research it was found that actually only GUI part is removed, but PPTP is still in the Darwin internals. You can use it using pppd tool from the command line (using sudo). Configuration should be placed into /etc/ppp/peers/ directory, e.g. /etc/ppp/peers/pptpvpn. Here is a sample config, which works for me:

        plugin PPTP.ppp
#        logfile /tmp/ppp.log
        remoteaddress gwvpn.example.com
        redialcount 1
        redialtimer 5
        idle 1800
        mru 1368
        mtu 1368
        novj 0:0
        user userid
        password userpw
        # used in ip-up script
        ipparam gwvpn 

To connect to the VPN use pppd call pptpvpn command. Use man pppd for the possible options in the config. Hopefully it will still be there some time 🙂

Tagged , , ,

Check how your web site looks on IPv6 only connection

Today i found great resource for the IPv6 adopters. NAT64 Check website checks and compares website look and feel in NAT64, IPv4-only and IPv6-only modes. This way i been able to find that on my of the my web sites IPv6 was not configured correctly (browser was automatically falling back to v4, so i did not found that myself). Another interesting issue is some external scripts hosted on public CDN-s. Some of them do not have IPv6, e.g. very popular maxcdn.bootstrapcdn.com.

Service works by downloading all resources in 3 different network environments. Also it is trying to compare rendered image of the web site to find, for example, broken geo based lookup or other issues, like incorrect virtual host configuration which will cause different rendering result.

Thats how it looks if everything is fine:


So, thank you Jan Žorž, and lets make more websites IPv6 ready 🙂

Tagged ,

‘MySQL server gone away’ error, keepalive and PHP

One of the favorites errors for the many MySQL users is “mysql server gone aways”. It provides minimal data to debug the issue, is very annoying and could be caused by number of reasons. Most popular are:

  • Internal MySQL timeouts (configurable, could be found using “get variable …’
  • MySQL max_allowed_packet limit
  • Networking layer
  • Probably some other mysterious reasons.

In this post i would cover networking layer, because i think it is hard to diagnose and very common. Of course, before jumping to it you should check mysql logs and timeouts to make sure that they are not a root cause. In my case all MySQL server timeouts were good, but application was still failing after running very large tasks. Using strace and tcpdump i found the reason.

  1. App was opening connection to MySQL.
  2. After this some slow code was running, during this time MySQL connection was not in use.
  3. In 1 hr (code was still running) remote side was closing connection.
  4. When app was finally done and trying to reuse connection – it was failing because connection was already closed.

Of course app is also guilty – slow code, no good connection handling, but my task was to find some quick workaround. First question was why connection was closed in 1Hr. Reason was Amazon ELB (load balancer) which was in use. It has idle timeout, which could be configured up to 3600 seconds.Such situation is not uncommon – many routers or balancers do have some idle connection timeouts. So i decided to find why there were no keepalive packets sent. And found that it was a bug of the PHP, which is fixed in the recent version (5.6). After updating php to the recent version i been able to find that keepalive is working. You can do this using netstat -to tool or using tcpdump. Thats how keepalive packet looks like:

04:46:55.034560 IP &gt; Flags [.], ack 7907432, win 3086, options [nop,nop,TS val 3156463616 ecr 488286410], length 0
04:46:55.035543 IP &gt; Flags [.], ack 9089476, win 136, options [nop,nop,TS val 488361560 ecr 3156163057], length 0

So now Linux will send keepalive packets regularly and this not allow ELB to close idle mysql connection.


Creating list of the folders on big NFS mount

Today i found that on one of my NFS shared (140+ Gb of small files) some directory permissions are set in a wrong way. I decided to fix all of them using find /nfsmount -type d|xargs chmod 755. However after running find command i found that it painfully slow and would take enormous amount of time. After using strace reason was found: it is using newfstatat call for the every file in the directory. On NFS it would take a lot of time. Below is an example from the strace:

newfstatat(7, &quot;s00594.jpg&quot;, {st_mode=S_IFREG|0660, st_size=152035, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;68394s4.jpg&quot;, {st_mode=S_IFREG|0660, st_size=221090, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;fg120lgb.jpg&quot;, {st_mode=S_IFREG|0660, st_size=12910, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;86240_1_low.jpg&quot;, {st_mode=S_IFREG|0660, st_size=14544, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;am1051-30cm.jpg&quot;, {st_mode=S_IFREG|0660, st_size=10091, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;46939-9.jpg&quot;, {st_mode=S_IFREG|0660, st_size=121149, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;22293(11).jpg&quot;, {st_mode=S_IFREG|0660, st_size=139348, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;51643(1).jpg&quot;, {st_mode=S_IFREG|0660, st_size=163897, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;53868t(2).jpg&quot;, {st_mode=S_IFREG|0660, st_size=17221, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;40419_t.jpg&quot;, {st_mode=S_IFREG|0660, st_size=4247, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;36116t.jpg&quot;, {st_mode=S_IFREG|0660, st_size=17580, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;89239_1_high(6).jpg&quot;, {st_mode=S_IFREG|0660, st_size=88781, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;61392-1(1).jpg&quot;, {st_mode=S_IFREG|0660, st_size=67362, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;98872_1_low(1).jpg&quot;, {st_mode=S_IFREG|0660, st_size=26867, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;31269.jpg&quot;, {st_mode=S_IFREG|0660, st_size=60999, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;s00437t.jpg&quot;, {st_mode=S_IFREG|0660, st_size=7505, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;30813-sub7(2)(4).jpg&quot;, {st_mode=S_IFREG|0660, st_size=155762, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(7, &quot;118b-090_t.jpg&quot;, {st_mode=S_IFREG|0660, st_size=4019, ...}, AT_SYMLINK_NOFOLLOW) = 0

Interesting thing that i have no idea why find is doing that. There is a getdents call which provides all required information for the entire directory without need to access every file. I found that this syscall works well on NFS, so i created my own tool to list this mount quickly. After it was done i got directory listing in less than 5 minutes! My tool is provided on the github, however it is interesting why GNU find tool itself is so non-optimal.

Tagged ,

ICMP Watchdog in the Ubiquiti Networks devices

About watchdog

I am using wireless devices from the Ubiquiti Networks. Usually everything works fine, but in rare cases of software/hardware bug it would be great to automatically restart device when needed. AirOS provides this functionality, it is called “ping watchdog” and is located in the web interface, “services” tab. However there is no a lot of documentation about how it works, so i decided to research this. Screenshot of the watchdog interface with default values provided below: Screen Shot 2016-07-18 at 08.38.52.

Under the hood

Ubnt AirOS is OpenWRT based OS with ssh enabled, so we can ssh to the device to find how this watchdog works. If ping watchdog is enabled in the web interface you should see something like this in the process list:

/bin/pwdog -d 300 -p 300 -c 3 -m 300 -e /bin/support /tmp/emerg /etc/persistent/emerg.supp emerg 0; reboot -f

This “pwdog” service is a custom busybox applet which is based on busybox ping implementation with modifications to implement watchdog functionality. I been able to find it source code on the github.

So there is detailed description of the pwdog service logic:

  1. On system start it waits -d seconds (300 by default), to allow initialization of the hardware and software. I would not recommend to reduce this value, or you will have a chance that device will never start. In the web interface it is “Startup Delay:” value.
  2. After initial delay it will send ICMP ping to the specified host (last parameter) and will wait -p seconds (300 by default, “Ping Interval:” in the web interface). After this step 2 will be repeated.
  3. If there is no reply -c times (by default – 3) pwdog will run command specified in the -e argument (/bin/support /tmp/emerg /etc/persistent/emerg.supp emerg 0; reboot) or just reboot if it is not specified. In this example watchdog also saves support information. In the web interface you can modify this value using “Failure Count To Reboot.:” parameter.
  4. There is also -m parameter which defines low memory threshold. It is enabled by default and is not configurable via web interface.

Below i tested how it works in the command line, with modified parameters:

XM.v5.6.6# /bin/pwdog -d 1 -p 3 -c 3 -m 300 -e /usr/bin/echo -v
pwdog[993]: pwdog: do_now=0, initial_sleep=1, timeout=3, retry_count=3, low_mem=300 exec=`/usr/bin/echo`
pwdog[993]: PING Watchdog is checking (
pwdog[993]: Missed 1 ping replies in a row.
pwdog[993]: Missed 2 ping replies in a row.
pwdog[993]: Missed 3 ping replies in a row.
pwdog[993]: 4 ping replies missed. Executing `/usr/bin/echo`.


ICMP watchdog in AirOS is not a very smart service and default configuration does not look optimal for me – in fact its enough to miss only 3 ICMP packets to start reboot process. Also it will fire only after 15 (300*3) minutes of the link failure. So i would probably recommend to increase number of counts and decrease ping interval. Also i am thinking about porting apinger to this device, because it provides much more advanced icmp check functionality.

Tagged , , ,