All posts from Philipp Heckel

Lossless MySQL semi-sync replication and automated failover

MySQL is a really mature technology. It’s been around for a quarter of a century and it’s one of the most popular DBMSs in the world. As an engineer, one therefore expects basic features such as replication and failover to be fleshed out, stable and ideally even easy to set up. And while MySQL comes with replication functionality out of the box, automated failover and topology management are not part of its feature set. On top of that, it turns out to be rather difficult not to shoot yourself in the foot when configuring replication. This is a blog post about setting up lossless MySQL replication with automated failover, i.e. ensuring that not a single transaction is lost during a failover, and that failovers happen entirely without human intervention.

Reliably rebooting Ubuntu using watchdogs

Rebooting Ubuntu is hard. I don’t really know why, but in my twelve years as an Ubuntu user, I’ve encountered countless “stuck at reboot” scenarios. Somehow, typing reboot always comes with that extra special feeling of uncertainty and the thrill of danger. This post describes the short story of how we managed to make Ubuntu machines reliably reboot using Linux watchdogs.

Providing remote access to Datto devices via SSH tunnels

Our backup devices are typically physically located inside the LAN of our end users. Under normal circumstances that means that they are behind a NAT and are not reachable from the public Internet without a VPN or other tunneling mechanisms. For our customers, Managed Service Providers (MSPs), being able to reach their Datto devices only through direct physical access would be a major inconvenience. In this post, we talk about how we implemented "Remote Web", a feature that lets customers remotely access the device, even when it is behind a NAT.

How we upgrade the software and operating system of thousands of appliances every two weeks

In this post, we describe how we moved from Debian-based deployments in our fleet of more than 80,000 devices to image-based upgrades. We show the nitty-gritty details of how we use Grub and loop devices to boot from image to image seamlessly, every two weeks.