Day 21 - S3 Outage and Cloudbleed post mortems are out; Revenant!
10 Mar 2017 100daysofwriting · computers · movies · post-mortems · technology

I am happy to report that the problem of MP3 files not being recognised by Rhythmbox was solved after `apt-get install ubuntu-restricted-extras`. I wonder what “extra” packages that package contains that need to be restricted. It definitely installed `ffmpeg`, which is a great CLI utility for cutting videos. Anyway, that problem is solved. I am still holding off on installing gnome-shell and trying that out, because the mouse issue seems to have sorted itself out (?). It happened only once today.
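(Note to self, since I will forget: a typical `ffmpeg` cut looks roughly like this; `input.mp4`, `clip.mp4` and the timestamps are just placeholders, not a command I actually ran.)

```sh
# Copy out a 30-second clip starting at the 1-minute mark, without re-encoding.
ffmpeg -i input.mp4 -ss 00:01:00 -t 30 -c copy clip.mp4
```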
This great article about Amazon Data Centers is right on point! It comes in the wake of that great S3 outage a few days back. The post mortem for that outage is out, and the cause was a mistaken argument passed to an established playbook, which is incredibly basic and could literally happen to anyone. It’s like meaning to run `rm -rf ../basic`, typing TAB after `rm -rf ../b`, and pressing ENTER without checking whether `basic` was actually what got autocompleted.
It’s so simple! It still happened, and that post mortem is definitely worth checking out if you ever write a script that takes two conflicting, confusingly similar arguments, one of which can cause major problems. This is also a great lesson in designing command and parameter names so that the info commands (like `ls`, `ps ax`) stay short, intuitive and incredibly fast to type, while the destructive commands like `rm` and `deluser` are long and cumbersome (maybe not too much, but still something more than `rm`). I would happily type `trash` instead of `rm` if it meant that one day I’d save myself from deleting an important directory. (This is like Minority Report!)
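A `trash` command wouldn’t have to be anything fancy, either. A minimal sketch, assuming nothing more than a plain shell (the function name and the `~/.local/share/my-trash` directory are placeholders I just made up, not an existing tool):

```sh
# Move files into a holding directory instead of deleting them outright.
trash() {
    local dest="$HOME/.local/share/my-trash"
    mkdir -p "$dest"
    for f in "$@"; do
        # Timestamp suffix so trashing two files with the same name doesn't collide.
        mv -- "$f" "$dest/$(basename -- "$f").$(date +%s)"
    done
}
```

Then `trash important-dir` just parks things, and emptying `~/.local/share/my-trash` becomes the only genuinely destructive step.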
Talking about post-mortems, let’s talk about Cloudbleed. The incident report came in very soon, and it was a terrible leak. A bug that just kept reading memory it shouldn’t because the HTML wasn’t formatted properly? C’MON! Who is going to be able to protect against stuff like that? More importantly, the bug itself aside, let’s talk about the steps that CF took to deal with it.
> An additional problem was that Google (and other search engines) had cached some of the leaked memory through their normal crawling and caching processes. We wanted to ensure that this memory was scrubbed from search engine caches before the public disclosure of the problem so that third-parties would not be able to go hunting for sensitive information.
For some reason, this insight into search engine caching and reducing the impact by directly talking to multiple search engines and getting these caches purged seemed like something that CF thought of on the fly and managed to nail! This is some great engineering stuff!
> We also undertook other search expeditions looking for potentially leaked information on sites like Pastebin and did not find anything.
AH HELL! They are so thorough. Incredible detail!
P.S. Revenant is on the watchlist. It’s a 2.5-hour movie; that’s some commitment right there. Gear up, get started!
POST #21 is OVER