If you are a regular listener to the podcast or follow e-commerce, you know that “the season” is important for us. It's a recurring theme on the podcast every year. You're probably also aware that the uptime and responsiveness of our app and website are crucial. And you may have noticed that enabling our software engineers to perform at their peak matters a lot to us. Enabling teams and engineers is how we build a great place to engineer.
And sometimes things just go sour. A perfect storm occurs, and it is definitely not a tailwind…
As our CTO would say, “never waste a good crisis”. We have to learn from what happened, so let's explore one of those incidents. We go back to the start of the 2019 season. Just before the start of the Friday Afternoon Drinks, a huge incident began in our Android app. It caused downtime in other areas of the platform as well. And, much like when investigating a plane crash, there wasn't just one thing that was off. Instead, a series of unlikely events happened in a short span of time. Let's dive in.
What the episode covers
- Why is learning from failures an important topic to share?
- Some context: what part of the landscape are we talking about in this episode?
- What was your perspective? What were you doing, and what happened?
- Taking a few steps back: what was the incident management process, and how did we fix the issue step by step?
- When the dust settled: what did we learn? What did we improve?
- Julius van Dis, full-stack engineer at Flock. He was responsible for the app, especially its direct backend. Projects he has worked on include making the app and service landscape multilingual, migrating to and integrating a new gateway, creating a basket API, and improving app updates.
Peter Paul van de Beek