Day 21 - S3 Outage and Cloudbleed post-mortems are out; Revenant!

I am happy to report that the problem of Rhythmbox not recognising MP3 files was solved by apt-get install ubuntu-restricted-extras. I wonder what “extra” packages that package contains that need to be restricted. It definitely installed ffmpeg, which is a great CLI utility for cutting videos (quick example below). Anyways, that problem is solved. I am still holding off on installing gnome-shell and trying it out, because the mouse thing seems to have sorted itself out (?). It happened only once today.
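Since I brought up cutting videos, here is the kind of one-liner I mean. A minimal sketch with made-up file names; -ss and -to mark the start and end of the cut, and -c copy copies the streams as-is instead of re-encoding, so it finishes in seconds:

    # Cut from 00:01:30 to 00:02:45 without re-encoding.
    # input.mp4 and clip.mp4 are placeholder names.
    ffmpeg -i input.mp4 -ss 00:01:30 -to 00:02:45 -c copy clip.mp4

One thing to know: with -c copy the cut snaps to the nearest keyframe, so the start can be slightly off; drop -c copy if you need frame accuracy and don’t mind the re-encode.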

This great article about Amazon Data Centers is right on point! It comes in the wake of the big S3 outage a few days back. The post-mortem for that outage is out, and the cause was a mistake in passing an argument to an established playbook. That is incredibly basic and could literally happen to anyone. It’s like meaning to run rm -rf ../basic, typing TAB after rm -rf ../b, and pressing ENTER without checking whether basic actually got autocompleted. It’s that simple! It still happened, and that post-mortem is definitely worth checking out if you ever write a script with two really similar arguments, one of which might create some major problems.

This is also a great lesson in designing parameter names so that the info commands (like ls, ps ax) are small, intuitive and incredibly fast, while the destructive ones like rm and deluser are long and cumbersome (maybe not too much, but still something more than rm). I would definitely type trash instead of rm if it meant I would one day save myself from deleting an important directory; a little sketch of that idea follows. (This is like Minority Report!)
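To make that concrete, here is a minimal sketch of a trash function for ~/.bashrc. The name and the trash directory are my own invention, not an existing tool; it moves things aside instead of deleting them, and it makes you retype the name first, so it is deliberately slower than rm:

    # Hypothetical safety wrapper: "deleting" means moving into ~/.trash,
    # and you must retype the exact name before anything happens.
    trash() {
        local target="$1" answer
        read -r -p "Retype the name to trash '$target': " answer
        if [ "$answer" = "$target" ]; then
            mkdir -p ~/.trash && mv -v -- "$target" ~/.trash/
        else
            echo "Names do not match; nothing moved." >&2
        fi
    }

The nice part about moving instead of deleting is that the Minority Report moment becomes reversible: mv ~/.trash/basic . and the directory is back.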

Talking about post-mortems, let’s talk about Cloudbleed. The incident report came out very quickly, and it was a terrible leak. A parser that just kept reading past the end of its buffer, spilling memory, because the HTML wasn’t formatted properly? COME ON! Who is going to be able to protect against stuff like that? More importantly, the bug itself aside, let’s talk about the steps that CF took to deal with it.

An additional problem was that Google (and other search engines) had cached some of the leaked memory through their normal crawling and caching processes. We wanted to ensure that this memory was scrubbed from search engine caches before the public disclosure of the problem so that third-parties would not be able to go hunting for sensitive information.

For some reason, this insight into search-engine caching, and reducing the impact by directly talking to multiple search engines to get those caches purged, seemed like something that CF thought of on the fly and managed to nail! This is some great engineering stuff!

We also undertook other search expeditions looking for potentially leaked information on sites like Pastebin and did not find anything.

AH HELL! They are so thorough. Incredible detail!
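Okay, I said the bug itself aside, but the root cause deserves one more look. The report traces it to a generated parser whose end-of-buffer check used equality, so when broken HTML made the cursor jump past the end, the check never fired and the parser kept reading. Here is a toy reconstruction in bash, purely my own illustration; the real code was generated C, where “past the end” means reading adjacent memory rather than empty strings:

    # Toy illustration (not Cloudflare's actual code): the end-of-buffer
    # check uses equality, so a cursor that jumps PAST the end never stops.
    buf='ab<'                             # a broken tag right at the end
    len=${#buf}; i=0; steps=0
    while [ "$i" -ne "$len" ]; do         # BUG: should be [ "$i" -lt "$len" ]
        printf 'read index %d: "%s"\n' "$i" "${buf:i:1}"
        if [ "${buf:i:1}" = '<' ]; then i=$((i + 2)); else i=$((i + 1)); fi
        steps=$((steps + 1))
        [ "$steps" -ge 8 ] && break       # safety valve so the demo halts
    done

Change -ne to -lt and the loop stops at the end no matter how far the cursor jumps.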

P.S. Revenant is on the watchlist. It’s a 2.5-hour movie; that’s some commitment right there. Gear up, get started!

POST #21 is OVER