"What do you want?" A question I had to find the answer to in order to preserve my sanity and my career.
We take data protection seriously at Datto, which is why we’ve been increasingly using mutual TLS authentication to secure communications between components in our application stack. Our use of HashiCorp Vault has accelerated the adoption of this security pattern, as Vault makes it easy to deploy and manage multiple CAs. Recently, we saw an increase in TLS-related errors for one of our mutually authenticated application endpoints. In this article, I’ll walk you through how we debugged and resolved this problem. I’ll also take you on a deep dive into reproducing the issue, and hopefully teach you some fun OpenSSL commands along the way.
Recently, the Datto SaaS Protection SRE team was challenged to add authentication to an open source web application that didn’t have a strong authentication story of its own. We knew that we didn’t want to write an entire authentication layer just for this one application, as the return on that time investment would be rather low. Instead, we looked for a solution that would be easy to implement, easy to automate, and easy to understand months down the road, long after the shine had worn off and it had become just another application we managed.
We’ve all had a hard drive fail on us, and often it’s as sudden as booting your machine and realizing you can’t access a bunch of your files. It’s not a fun experience. It’s especially not fun when you have an entire data center full of drives that are all important to keeping your business running. What if we could predict when one of those drives would fail, and get ahead of it by preemptively replacing the hardware before the data is lost? This is where the history of predictive drive failure at Datto begins.
Upgrading thousands of servers is challenging and filled with uncertainty. This article describes how we leveraged Ansible to build automation that increases confidence in our upgrade process.
Rebooting Ubuntu is hard. I don’t really know why, but in my twelve years as an Ubuntu user, I’ve encountered countless “stuck at reboot” scenarios. Somehow, typing reboot always comes with that extra special feeling of uncertainty and the thrill of danger. This post describes the short story of how we managed to make Ubuntu machines reliably reboot using Linux watchdogs.
Learn how Datto manages the rollout of trusted root certificates to a fleet of hundreds of thousands of devices without causing a single failed backup!
Datto has a lot of great products in its arsenal, and that arsenal keeps growing as we acquire more businesses. Because of this, integration is very important, and a key part of integration is user experience. To get it right, we knew we had to streamline our user interfaces. As product designers, we are advocates for the user and have a proclivity for good aesthetics. So for us, this was a time not only to stretch our visual design muscles but also to improve usability across our products.
Tutorial on how to construct ROP chains from difficult ROP gadgets in ARM assembly.
Over the past two years, Datto has been working on how to collect consistent data from our entire fleet of hosts. In this post, we'll discuss how we leveraged OpenTSDB to collect nearly 1 million metrics a second across our infrastructure.