About the book

The Ultimate Guide to RAID Log
by Kim Essendrup

This book will introduce you to RAID logs and help you learn how to use them so your projects can immediately benefit.

© Copyright 2022 Kim Essendrup

Excerpts from the Ultimate Guide to RAID Log may not be copied, reproduced or distributed without the author’s permission.

Lessons Learned review of Issues

Ideally, we want to learn from our experiences. This is where Issue logs can provide long-term value to your organization. Every issue is an opportunity to learn and to improve your planning, your leadership and your organization.

You can take lessons learned from your issues as you manage them by creating a separate column to identify the lessons to be learned from each issue. Alternatively (or in addition), you can perform a formal issue review at the end of the project, analyzing all the issues your project encountered and identifying what can be improved for next time. 


Case Study Conclusion: Surviving the Fishbowl

Meanwhile, back in our fishbowl, my customer's executive stakeholders were waiting for an answer. “What is going on?! Why is our call center down?” I remember looking over at my lead engineer, who was typing furiously on his laptop trying to diagnose the problem, afraid to look up in case he’d be asked a question he couldn’t answer.

I took a deep breath and looked my stakeholders in the eyes. “The system is down and no inbound calls are being received. Ongoing calls have not been disconnected but no new calls can come in. We recommend performing a ‘failover’ to the backup system. We think this will open up the system to take new calls, but it will terminate current, ongoing calls. Do we have your permission to do so?”

Note that by asking for their OK to perform the failover, we were already giving our stakeholders some degree of power over a situation that made them feel angry and powerless.

Our customer’s team talked briefly among themselves, asked me a couple of questions, then gave the OK. My engineer triggered the failover and nobody in the room breathed as we watched the system dashboard, waiting to see if calls would start going through again as the backup system came online. Fortunately they did, and we all sighed in relief. Only for a moment. Then the shouting resumed.

They demanded to know why the system went down and what we were going to do to fix it. At that point, we had no idea what caused it, no idea how to fix it, and no idea if it was going to happen again in 5 minutes. 

My response? “We don’t know the root cause or if it will happen again. We are going to restart the primary system in case we need to fail-back again, and we are escalating to our product team and executive management. We don’t expect to have a root cause in 30 minutes, but I will give you an update at that time, in this room.” 

This satisfied our customer enough to let us get to work. Thirty minutes later, we gave them an update on what we were doing, our next steps, and the fact that we didn’t have a root cause identified yet. We met again every 30 minutes that day until we had a potential fix identified and a plan to implement it.

Every action item, every 30-minute update and every finding we had was documented in the Issue log. This way, our customer could reference it as they communicated with their own stakeholders in the organization. This also gave our internal team, both on-site and on the remote product support team, a common, accurate and up-to-date understanding of where we were, what we had done and what the next steps were.

Forty-eight very long and sleepless hours later, we had implemented one hotfix which failed spectacularly, had another emergency failover, and finally implemented a successful fix. Although those were 48 very difficult hours that we would have been happy to avoid, the fact that we actively managed the issue and communicated so well gave our customer confidence that we were going to resolve it. Looking back, it was clear that our successful management of this issue ended up building a stronger long-term relationship with the customer. And I still feel that a key part of that was our use of an Issue Log to track and communicate that issue.
