4 metrics for measuring DevOps success

March 18, 2015

We are all aiming to do things faster, cheaper, and with as few bumps along the way as possible. Rely on these four success metrics to help you achieve those goals during your DevOps improvement project.

Are you and your team taking the plunge into improved DevOps practices? Not sure where to start or how to know whether your improvements are working? Don't worry: this is a common problem, and metrics will be your friend!

In many cases, organizations change their application lifecycle management tools and processes based on recommendations, without ever measuring their current performance or how the changes affect it. It's important to commit to continuous measurement and to monitoring your organizational performance.

How do I know something is wrong?

If you have decided to improve your DevOps processes, you have likely encountered one of the following issues:

Missed deadlines: Projects are not getting to production fast enough and are running much longer than expected
Site is always down: The quality of the server configurations or the application is suspect
Unhappy customers: Applications are not working the way the client expects
Long waits for small changes or fixes: Deployment complexity or “red tape” in the production deployment process means users wait a long time for simple operational fixes
Changes cost too much: Every time the business wants a change, the cost to develop and deliver the change is beyond what would be reasonable

If any of these apply, you are probably a candidate for improving your overall DevOps processes. Before making a change, you need to determine the problem areas for your particular organization and then begin measuring the current capabilities of the team. You will not be able to report effectively on improvements to the process without a baseline!

What can I measure?

It is best to combine several metrics so that you can give your team an overall "score" of where they are right now. There are many metrics to choose from.

I believe a combination of the following four metrics will give you a balanced appraisal of your team’s capacity to deploy to production, provide operational fixes, develop new features, and deliver a return on investment to the business.

#1. Mean time to recovery/repair (MTTR)

This measures how long it takes from when an incident is reported to when it is resolved. In general, you want this number to trend downwards. It indicates both the responsiveness of your team and its capability to resolve issues and deploy solutions. Of note: you should probably weight the data in this metric by severity or priority to prevent low-severity and low-priority issues from skewing it. These types of issues are rarely resolved. You could also simply filter out anything that is not considered a ‘high priority’ incident.

Teams optimizing for this metric will ensure that there is a short and rapid path through all environments to get a fix through testing and into production.
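As a rough illustration of the filtering approach described above, here is a minimal Python sketch. The incident records, the priority labels, and the function name mean_time_to_recovery are all hypothetical; in practice the data would come from whatever incident tracker your team uses.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (priority, reported_at, resolved_at).
# Only resolved incidents are included.
incidents = [
    ("high", datetime(2015, 3, 2, 9, 0), datetime(2015, 3, 2, 11, 0)),   # 2 hours
    ("high", datetime(2015, 3, 5, 14, 0), datetime(2015, 3, 5, 18, 0)),  # 4 hours
    ("low", datetime(2015, 3, 1, 8, 0), datetime(2015, 3, 15, 8, 0)),    # 14 days
]


def mean_time_to_recovery(records, priorities=("high",)):
    """Average reported-to-resolved time for incidents of the given priorities."""
    durations = [
        resolved - reported
        for priority, reported, resolved in records
        if priority in priorities
    ]
    if not durations:
        return None  # nothing to measure yet
    return sum(durations, timedelta()) / len(durations)


mttr_high = mean_time_to_recovery(incidents)
mttr_all = mean_time_to_recovery(incidents, priorities=("high", "low"))

print(mttr_high)  # 3:00:00
print(mttr_all)   # 4 days, 18:00:00
```

In this made-up data set, a single slow low-priority incident moves the average from three hours to more than four days, which is the kind of skew that filtering or weighting is meant to avoid.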

#2. Lead time

While the MTTR metric will help you monitor the team’s ability to react to customer support issues, the lead time metric will allow you to measure the time from start of development to deployment to production. You will want this metric to be as small as possible to highlight the team’s agility.

Teams optimizing for this metric will attempt to tackle smaller chunks of work and optimize the integration of the testing process to shorten overall time to deployment.
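Lead time as described above can be sketched in a few lines of Python. The work items and dates below are invented for illustration, and reporting the median rather than the mean is my own assumption, chosen so that one unusually long item does not dominate the number.

```python
from datetime import date
from statistics import median

# Hypothetical work items: (name, development_started, deployed_to_production).
work_items = [
    ("login-fix", date(2015, 2, 2), date(2015, 2, 6)),        # 4 days
    ("report-export", date(2015, 2, 9), date(2015, 2, 23)),   # 14 days
    ("search-tuning", date(2015, 2, 16), date(2015, 2, 24)),  # 8 days
]


def lead_times_in_days(items):
    """Days from start of development to production deployment, per item."""
    return [(deployed - started).days for _, started, deployed in items]


lead_times = lead_times_in_days(work_items)
median_lead_time = median(lead_times)

print(lead_times)        # [4, 14, 8]
print(median_lead_time)  # 8
```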

#3. Percentage of successful deployments

Others might track this as the percentage of failed deployments and try to reduce the number of failures. However, I enjoy having positive metrics! Maybe that makes me a glass-half-full kind of guy.

Either way, the goal is to make sure that regardless of how often you deploy, what gets deployed doesn't cause a failure. A successful deployment is not just about avoiding an outage, but also about maintaining positive reactions amongst your customer base.

Teams optimizing for this metric tend to push smaller, less risky changes more often.
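A minimal sketch of the calculation, in Python, might look like the following. The deployment log is hypothetical, and so is the definition of "succeeded"; each team has to decide for itself what counts (for example, no outage, no rollback, and no spike in customer complaints).

```python
# Hypothetical deployment log: (deployment_id, succeeded).
deployments = [
    ("d-101", True),
    ("d-102", True),
    ("d-103", False),  # e.g. rolled back after an outage
    ("d-104", True),
    ("d-105", True),
    ("d-106", True),
    ("d-107", True),
    ("d-108", False),  # e.g. triggered customer complaints
    ("d-109", True),
    ("d-110", True),
]


def success_percentage(log):
    """Percentage of deployments in the log that were marked successful."""
    if not log:
        return None  # no deployments, so nothing to report
    successes = sum(1 for _, succeeded in log if succeeded)
    return 100.0 * successes / len(log)


success_rate = success_percentage(deployments)
print(success_rate)  # 80.0
```

The failure percentage that other teams track is simply 100 minus this value, so both views come from the same data.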

#4. Projects completed per quarter

Increasing the number of projects completed and launched to production per quarter allows for a faster return on the investment made in a project. This is a great metric for reporting the team’s ability to execute to management. It can also be used to provide confidence that an investment in the team will generate results.

This metric leads teams to try to optimize for smaller batches to get more projects done in a shorter amount of time. Teams will also optimize towards delivery processes that support multiple projects at once to avoid the classic “project queue” in testing environments.
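Counting launches per quarter is straightforward once you have the production launch dates. The Python sketch below uses invented dates and assumes calendar quarters; a business that reports on fiscal quarters would need to adjust the hypothetical quarter_of helper.

```python
from collections import Counter
from datetime import date

# Hypothetical production launch dates, one per completed project.
launches = [
    date(2014, 12, 15),
    date(2015, 1, 20),
    date(2015, 2, 11),
    date(2015, 3, 30),
    date(2015, 4, 2),
]


def quarter_of(day):
    """Label a date with its calendar quarter, e.g. '2015-Q1'."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


projects_per_quarter = Counter(quarter_of(day) for day in launches)

for quarter in sorted(projects_per_quarter):
    print(quarter, projects_per_quarter[quarter])
# 2014-Q4 1
# 2015-Q1 3
# 2015-Q2 1
```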
