Incident management
Incident management is the process used by DevOps and IT Operations teams to respond to an unplanned event or service interruption and restore the service to its operational state.
An incident is resolved when the affected service resumes functioning in its intended state. Resolution covers only the tasks required to mitigate impact and restore functionality.
Incidents can vary widely in severity, from an entire global web service crashing to a small number of users experiencing intermittent errors.
The need for scale-ups
Most of our clients are in the process of formalizing incident management. When the IT department consists of at most 10 people, incident management tends to be an informal process.
But when the organization grows - and usually, this happens very fast - the need for more structure grows:
- Business users expect to be informed about disruptions. 
- The impact of incidents and downtime increases. 
- OKRs may be defined based on service uptime or number of outages. To improve, you need to measure… and to measure, logging is required. 
- (IT) audits require incident logs to be kept.
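To make the link between logging and measurement concrete, here is a minimal sketch of how an uptime figure and an outage count could be derived from an incident log. The `Incident` record, its field names and the sample data are assumptions made for illustration; they do not come from any particular ticketing system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    # Hypothetical log record; field names are illustrative assumptions.
    service: str
    started: datetime
    resolved: datetime  # moment the service was back in its intended state


def uptime_percent(incidents, period_start, period_end):
    """Percentage of the period without a logged outage.

    Simplification: overlapping incidents are not merged, so overlapping
    downtime would be counted twice.
    """
    period = (period_end - period_start).total_seconds()
    downtime = sum(
        (min(i.resolved, period_end) - max(i.started, period_start)).total_seconds()
        for i in incidents
        if i.started < period_end and i.resolved > period_start
    )
    return 100.0 * (1 - downtime / period)


# Made-up sample data: two outages in a 30-day reporting period.
log = [
    Incident("web", datetime(2024, 1, 3, 10, 0), datetime(2024, 1, 3, 10, 45)),
    Incident("web", datetime(2024, 1, 17, 2, 0), datetime(2024, 1, 17, 3, 15)),
]
start, end = datetime(2024, 1, 1), datetime(2024, 1, 31)
print(len(log), "outages,", round(uptime_percent(log, start, end), 3), "% uptime")
```

Without start and resolution timestamps in the log, neither number can be calculated, which is why logging comes first.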
Benefits of having incident management include:
- Faster Escalation When Needed, and Faster Time to Resolution - With a well-defined and well-used incident management process, application support becomes a natural part of your organizational culture. Incidents get resolved faster, more consistently, and in a way that reflects best practice.
- Encourages Training - Following the “I would rather someone else get up in the middle of the night and fix it” principle, an incident management process encourages cross-training both within the Dev team, and between teams. This has the side benefit of encouraging operational documentation and configuration management to be kept up-to-date, while emphasizing the importance of readability of code, and commenting. 
- Provides a Pathway of Growth for Junior Staff - We often forget to look back on where we came from in our rush to look forward. Teams also benefit from greater diversity of thought and opinion. An incident management process can encourage this by exposing every level of the escalation path to the application. Resolving incidents helps inform more junior members of the team. 
- Creates Better Overall Process - Paired with continuous integration and continuous delivery techniques, an incident management process supports more frequent and faster deployments. Over time, this tends to reduce the volume and frequency of incidents.
- Generates Quantitative Feedback - Each incident is tracked and can be analyzed. This also helps in assessing the risk factors in operating the application. It can inform the application roadmap and spur conversations on high-value, low-effort enhancements that can be implemented.
- Develops Internal Tools - Once a team reaches a certain size, differentiation of duties will take place. Tools to operate the application that were previously niceties now become imperative for the organization to sustain its growth. An incident management process can bring to light not only this need, but also where to start when creating these tools. 
How to set up incident management?
Obviously, you don’t want to re-invent the wheel. As an example, the ITIL Incident Management process is displayed on this page. Our suggestion would be to adapt this process based on your requirements.
Start by aggregating incidents from (for example) your vulnerability disclosure program and your uptime monitoring, and integrate these sources with (for example) your ticketing system.
Make sure to classify tickets as “incidents” and to prioritize appropriately.
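As an illustration of classifying and prioritizing, the sketch below tags a ticket as an incident and derives a priority from impact and urgency. Combining impact and urgency into a priority is a common ITIL-style approach, but the three levels, the P1 to P5 scale and the `triage` helper are assumptions made for this example; adapt them to your own requirements.

```python
from enum import IntEnum


class Level(IntEnum):
    # Illustrative three-step scale, used for both impact and urgency.
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def priority(impact: Level, urgency: Level) -> str:
    """Map impact and urgency to P1 (most urgent) .. P5 (hypothetical scheme)."""
    return f"P{impact + urgency - 1}"


def triage(ticket: dict) -> dict:
    """Return a copy of the ticket, tagged as an incident with a priority."""
    return {
        **ticket,
        "type": "incident",
        "priority": priority(ticket["impact"], ticket["urgency"]),
    }


# Made-up ticket, e.g. created from an uptime-monitoring alert.
ticket = {"title": "Checkout page down", "impact": Level.HIGH, "urgency": Level.HIGH}
print(triage(ticket)["priority"])
```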
The next steps are rather obvious: investigation, escalation and resolving the incident. Finally, the incident is closed.
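The steps above can be sketched as a small state machine that only allows sensible transitions. The state names and the allowed transitions are assumptions for illustration, not a definitive rendering of the ITIL process; in particular, reopening a resolved incident is a design choice you may or may not want.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    INVESTIGATING = "investigating"
    ESCALATED = "escalated"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Hypothetical transition table: which states may follow which.
ALLOWED = {
    State.NEW: {State.INVESTIGATING},
    State.INVESTIGATING: {State.ESCALATED, State.RESOLVED},
    State.ESCALATED: {State.INVESTIGATING, State.RESOLVED},
    State.RESOLVED: {State.CLOSED, State.INVESTIGATING},  # reopen if the fix fails
    State.CLOSED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the new state, or raise ValueError for a disallowed transition."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


state = State.NEW
for step in (State.INVESTIGATING, State.ESCALATED, State.RESOLVED, State.CLOSED):
    state = advance(state, step)
print(state.value)
```

Enforcing transitions in this way means an incident cannot be closed without first being investigated and resolved, which keeps the incident log usable for audits and reporting.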
Further reading
Setting up your incident management process? We’d be happy to help: as the auditors of many of the Top 100 scale-ups in the Netherlands, we know what you’re dealing with! Don’t hesitate to contact us.
If you prefer to do your own research first, we can recommend the following articles:
