Our backup devices are typically located inside the LAN of our end users. Under normal circumstances, that means they sit behind a NAT and are not reachable from the public Internet without a VPN or another tunneling mechanism. For our customers, Managed Service Providers (MSPs), being able to reach their Datto devices only through direct physical access would be a major inconvenience. In this post, we talk about how we implemented "Remote Web", a feature that lets customers access a device remotely, even when it is behind a NAT.
Datto backs up data, a lot of it. At the time of writing, Datto has over 500 PB of data stored on ZFS. This count includes both the backup appliances that are sent to customer sites and the cloud storage servers used for secondary and tertiary backup of those appliances. At this scale, drive swaps are a daily occurrence and data corruption is inevitable. How we handle corruption when it happens determines whether we truly lose data or successfully restore from a secondary backup. In this post, we show how we intentionally cause corruption in our testing environments to ensure we're building software that handles these scenarios properly.
In this post, we describe how we moved our fleet of more than 80,000 devices from Debian-based deployments to image-based upgrades. We show the nitty-gritty details of how we use GRUB and loop devices to boot from image to image seamlessly, every two weeks.