> Defining what good looks like and identifying appropriate metrics means that you’ll know whether the service is solving the problem it’s meant to solve.
>
> Collecting the right performance data means you’ll be alerted to potential problems with your service. And when you make a change to the service, you’ll be able to tell whether it had the effect you expected.
>
> GDS Service Manual
Just as digital service teams measure the performance of their products for the reasons given in the above quote from the GDS Service Manual, it’s also important for technology departments and development teams to decide what good technology looks like, so that we can ensure the choices and practices we’re implementing are having a positive outcome for our users. When it comes to technology:
- Teams may work in silos and rarely share knowledge, which can increase the time taken to onboard new team members and lead to a loss of knowledge transfer
- Products might not adopt best practices and common standards from the rest of your organisation, or the wider technology community
- Development and ops tooling, as well as your pipelines to production, may be inefficient or not automated
- Code can be buggy, inconsistent, hard to debug, have slow tests or no tests
Wouldn’t it be great if we could try to prevent these scenarios as early as possible? Once we start to measure, we can see important trends in the data: immediate shifts in those metrics for short-term changes, but also a longer-term picture overall. Over the long term, we’d like to see an upward trend where quality is improving. In the short term, if we see a downward trend, we can monitor it, but also proactively take action to remedy it and set it back on course.
A technology department-wide score
We work on multiple products for different customers, so we need a centralised place where we can track these trends using metrics. This lets us compare product deliveries, identify areas where learnings can be shared between teams, and as a result pull up lower scorers.
With each team reporting on their individual metrics as discussed below, we can calculate an overall average for that product delivery team. Combining these product delivery scores into an average then gives us a technology department-wide score.
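As a sketch, the averaging described above might look like this in Python. The team names and scores are hypothetical, and the scheme assumes each metric has already been normalised to a 0–10 scale:

```python
from statistics import mean

def team_score(metric_scores):
    """Average one team's individual metric scores.
    Assumes each metric is already normalised to a 0-10 scale."""
    return mean(metric_scores)

def department_score(teams):
    """Average the per-team scores into a department-wide score."""
    return mean(team_score(scores) for scores in teams.values())

# Hypothetical teams and scores (e.g. NPS, support rating, p15n, each out of 10)
teams = {
    "alpha": [8, 7, 9],
    "beta": [6, 5, 7],
}
print(round(department_score(teams), 2))  # 7.0
```

Averaging per team first (rather than pooling every score) means a small team’s results aren’t drowned out by a large one.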
All organisations with technology departments and/or any type of development team can benefit from recording a department-wide score, and by making it transparent, everyone can help and work towards constantly improving it.
What and how should we measure?
There are countless different metrics you could score your deliveries on, but we’ve focused on a few that are critical to delivering successful outcomes for our users, and that would help identify issues that could be easily remedied if caught early on, maximising impact and improvement.
Technology NPS (Net Promoter Score)
Why is it important? Teams should feel confident that the technology choices and practices selected will lead to the best possible outcomes. Early on, this can help identify issues that could be resolved by a course correction; over a longer period, it can uncover a degradation in the quality of the product and its underlying technology.
What it means: Collect a score of between 0 and 10 from each member of the team asking “How likely is it that you would recommend the technology choices and practices selected for the product you are working on to a friend or colleague?”
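For reporting, the raw 0–10 responses are usually condensed into a single figure using the standard NPS calculation (promoters score 9–10, detractors 0–6); a minimal sketch:

```python
def nps(scores):
    """Standard Net Promoter Score: % promoters (9-10) minus % detractors (0-6).
    Scores of 7-8 are passive and only affect the denominator."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

print(nps([10, 9, 8, 7, 6, 3]))  # 2 promoters, 2 detractors out of 6 -> 0.0
```

The result ranges from −100 (everyone is a detractor) to +100 (everyone is a promoter).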
Team capability support rating
Why is it important? Individual team members should feel supported in delivering a product by implementing the technology choices and following the selected practices; this support can take the form of coaching, mentoring or pairing. Unhappiness can lead to poor retention, or a poorer quality of technology.
What it means: Collect a score of between 0 and 10 from each team member asking “How supported do you feel by the team in the ability to implement the technology choices and practices being used for the product?”
Productionisation (p15n) score
Why is it important? The p15n checklist is a collection of best practices learned over the years that help with consistency, supportability, maintainability and extensibility. There should be a good reason why any recommendations that are applicable to a product aren’t being followed.
What it means: Teams should be baking in p15n from the start of a delivery, focusing on items relevant to the stage of their delivery, and this should always be improving over time. A low score would help to identify a risk to the future and longevity of the service. The score against the checklist can be reported on weekly.
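One simple way to turn the checklist into a reportable score is the percentage of applicable items completed. A sketch, with hypothetical checklist items:

```python
def p15n_score(checklist):
    """Score = completed items / applicable items, as a percentage.
    `checklist` maps item name -> True (done), False (not done), None (N/A)."""
    applicable = {k: v for k, v in checklist.items() if v is not None}
    if not applicable:
        return 100.0
    return 100 * sum(applicable.values()) / len(applicable)

# Hypothetical checklist items for an early-stage delivery
checklist = {
    "health check endpoint": True,
    "structured logging": True,
    "zero-downtime deploys": False,
    "multi-region failover": None,  # not applicable at this stage
}
print(round(p15n_score(checklist), 1))  # 2 of 3 applicable items done -> 66.7
```

Marking items as not-applicable (rather than failed) keeps the score fair across deliveries at different stages.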
Test code coverage
Why is it important? This is often a controversial topic. However, coverage should never be treated as a score of the quality of the tests written for an application; that is impossible to measure objectively and comes down to experience in writing good tests. It’s better to reverse the view on test coverage and consider which production code has no coverage whatsoever, and is therefore at high risk of breaking without being detected. Read it as “20% of this code is not covered,” as opposed to “80% of the code is covered.”
What it means: Teams should be collecting code coverage metrics in their CI/CD pipelines when tests are run, and ideally a minimum should be set to reduce risks, but also to ensure if there is already high test code coverage, that it doesn’t start to slip.
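A minimum could be enforced with a simple gate in the pipeline. This sketch assumes a hypothetical 80% threshold, and deliberately reports the uncovered share, as suggested above:

```python
def coverage_gate(covered_lines, total_lines, minimum_pct=80.0):
    """Return True if coverage meets the agreed minimum; report the
    UNCOVERED percentage to keep the risk visible."""
    pct = 100 * covered_lines / total_lines
    print(f"{100 - pct:.1f}% of this code is not covered")
    return pct >= minimum_pct

# e.g. in a CI step, exit non-zero when the gate fails
assert coverage_gate(850, 1000)      # 15.0% uncovered -> passes
assert not coverage_gate(700, 1000)  # 30.0% uncovered -> fails
```

Most coverage tools offer an equivalent built-in threshold option, so a hand-rolled gate like this is only needed when stitching tools together.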
Deployment frequency (deploys per day)
Why is it important? If deployment frequency is low, that could be a sign that there is not a clear path to production. This is one aspect of the Made Tech way. A reduction in deployment frequency can identify issues with the quality of a delivery, such as a poor CD pipeline, too much WIP, backlog features not being owned and pushed through to production, organisational blockers, or test failures. All of these could eventually lead to unhappiness for both the team and users.
What it means: Once a deployment has occurred to one of the various environments, for example edge, staging or production, increase the count of deployments against that specific environment.
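A minimal sketch of such a per-environment counter (the class, method and environment names are illustrative):

```python
from collections import Counter
from datetime import date

class DeployCounter:
    """Count deployments per environment per day (hypothetical helper)."""
    def __init__(self):
        self.counts = Counter()

    def record(self, environment, day=None):
        """Increment the count for one deployment to an environment."""
        self.counts[(environment, day or date.today())] += 1

    def per_day(self, environment, day):
        return self.counts[(environment, day)]

deploys = DeployCounter()
for env in ["edge", "staging", "production"]:
    deploys.record(env, date(2021, 6, 1))
deploys.record("production", date(2021, 6, 1))
print(deploys.per_day("production", date(2021, 6, 1)))  # 2
```

In practice the same numbers can usually be pulled straight from your CD tool’s API rather than tracked by hand.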
Critical incidents (incidents per week)
Why is it important? Incidents can pull the team away from delivering on their current sprint, slowing the pace of delivery. They are also potentially a sign that the quality of code and tests is slipping, which risks dissatisfaction among users.
What it means: Teams need to take ownership of handling these incidents when they are reported and once resolved write an incident report. The number of these incidents can be reported on during fortnightly meetings with the wider organisation.
Mean time to repair (MTTR)
Why is it important? When an issue is detected, you want it resolved as soon as possible. Every second the issue is present, users are likely to be affected by it, resulting in a loss of confidence in the product, service and organisation. MTTR can also surface problems with team redundancy around resolving issues, CD pipelines, and deployment stages.
What it means: Teams could implement simple uptime monitoring around critical user journeys, as well as more involved smoke tests. The MTTR can be tracked either via the uptime monitoring tool, or between when a smoke test started failing, and once it begins passing again after a fix has been deployed.
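Given pairs of timestamps for when a smoke test started failing and when it passed again after a fix, MTTR is just the mean of the repair durations; a sketch with made-up incident times:

```python
from datetime import datetime, timedelta

def mttr(incidents):
    """Mean time to repair, from (failure_detected, fix_deployed) pairs,
    e.g. when a smoke test started failing and when it passed again."""
    repairs = [fixed - detected for detected, fixed in incidents]
    return sum(repairs, timedelta()) / len(repairs)

# Hypothetical incidents: one took 30 minutes to repair, one took 90
incidents = [
    (datetime(2021, 6, 1, 9, 0), datetime(2021, 6, 1, 9, 30)),
    (datetime(2021, 6, 2, 14, 0), datetime(2021, 6, 2, 15, 30)),
]
print(mttr(incidents))  # 1:00:00
```

A median can be worth tracking alongside the mean, since one long outage can skew the average.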
Pipeline cycle time
Why is it important? This provides a good indication of how quickly changes can be deployed to production once they have been committed. It can help raise alarms over a slow CD pipeline, or a slow test suite, either of which can affect your feedback cycle and MTTR.
What it means: The team could track this using the commit hash within the repository: once a commit has been merged into the integration branch, start and stop a timer at each stage of the CI/CD pipeline. After the commit has been successfully deployed to production, calculate the total elapsed time in either seconds or minutes.
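A sketch of that calculation, assuming you can record a timestamp for each pipeline stage keyed by commit hash (the hash, stage names and times here are all hypothetical):

```python
from datetime import datetime

# Hypothetical stage timestamps for one commit, keyed by commit hash.
# The clock starts at merge into the integration branch and stops once
# the commit is live in production.
pipeline_events = {
    "a1b2c3d": {
        "merged": datetime(2021, 6, 1, 10, 0, 0),
        "built": datetime(2021, 6, 1, 10, 4, 0),
        "tested": datetime(2021, 6, 1, 10, 12, 0),
        "deployed_to_production": datetime(2021, 6, 1, 10, 20, 0),
    }
}

def cycle_time_minutes(events):
    """Total minutes from merge to production for one commit."""
    return (events["deployed_to_production"] - events["merged"]).total_seconds() / 60

print(cycle_time_minutes(pipeline_events["a1b2c3d"]))  # 20.0
```

Keeping the per-stage timestamps (rather than just the total) also shows which stage is the bottleneck when cycle time creeps up.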
Age of technology
Why is it important? It ensures that technical debt doesn’t accrue, and keeps update paths easier, which in turn keeps things more secure and makes applications more likely to be supportable and easily maintainable in the future. It’s also a good way to ensure that some time in sprints is set aside for a regular health check on age.
What it means: Ideally, you want to aim for a technology age of around zero. One potential way to track this is to record the version numbers of the languages and packages you use, then compare them against the most recent versions based on major and minor version numbers. For example, if you’re using Ruby 2.5 and the latest version is 2.7, that contributes 0.2 to your technology age. Still using Rails 4.2? Oh dear, you’ve just got older and are now 1.2 years!
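The Ruby example above can be expressed with a small helper. This is only an illustration of the scheme described, where each major version behind contributes 1.0 and each minor version behind contributes 0.1:

```python
def version_age(current, latest):
    """Rough age of one dependency: 1.0 per major version behind,
    0.1 per minor version behind. Illustrative only; real version
    histories (and differing minor numbering across majors) need
    more care than this simple arithmetic."""
    cur_major, cur_minor = current
    new_major, new_minor = latest
    return (new_major - cur_major) + 0.1 * (new_minor - cur_minor)

def technology_age(dependencies):
    """Sum the ages of each tracked language/package."""
    return sum(version_age(cur, latest) for cur, latest in dependencies.values())

# Ruby 2.5 when the latest is 2.7 contributes 0.2 to the technology age
print(round(version_age((2, 5), (2, 7)), 1))  # 0.2
```

Semantic-version-aware tooling would give a more faithful picture, but even this crude sum makes the trend visible sprint to sprint.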