For my current assignment, I’m working in the area of IT Risk. I agree it isn’t the most exciting area compared to building the next state-of-the-art app or website, but for a client where a disruption of IT service leads to items on major news websites, there is a great drive to become as stable as possible and limit IT Risk as much as you can. Of course, I cannot go into too many details about the metrics and how things are going, but in this blog post, I want to share some things I’ve observed and give some takeaways about my role in this part of the program: Dashboarding.
It all started early last year, when the management team, composed of the CIOs of the various IT domains, decided there was a need for a program to give insight into the IT Risk status of all of the approximately 2,000 applications and, where that status fell short, to bring it to the desired level. The program would measure the level of 'OK-ness' in a number of areas based on particular metrics, with the management team setting an end-of-year target for every metric.
As you can imagine, not every application is as important as the next. Some support business-critical processes. The applications were therefore divided into three categories based on their risk score. The most critical applications needed to score an 'OK' on every metric as soon as possible, while lower-risk applications were allowed to not (yet) score an 'OK' on every metric. The targets, however, were set tight. The CIOs committed to them, and then the work began!
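The measurement logic described above can be sketched in a few lines. This is a minimal, hypothetical example - the application names, metric names, and target percentages are all made up for illustration, not taken from the actual program:

```python
# Hypothetical sketch: measuring per-metric compliance against end-of-year targets.
# All names and thresholds below are illustrative.

applications = [
    {"name": "app-payments", "risk": "high",   "metrics": {"patching": True,  "backup": True}},
    {"name": "app-intranet", "risk": "medium", "metrics": {"patching": False, "backup": True}},
    {"name": "app-archive",  "risk": "low",    "metrics": {"patching": False, "backup": False}},
]

# Targets: fraction of applications that must score 'OK' per metric.
targets = {"patching": 0.9, "backup": 0.95}

def compliance(apps, metric):
    """Fraction of applications scoring 'OK' on the given metric."""
    ok = sum(1 for a in apps if a["metrics"][metric])
    return ok / len(apps)

for metric, target in targets.items():
    pct = compliance(applications, metric)
    status = "on target" if pct >= target else "below target"
    print(f"{metric}: {pct:.0%} OK ({status}, target {target:.0%})")
```

In practice the three risk categories would each get their own target, but the idea stays the same: one agreed definition of 'OK', measured the same way every time.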
Now that I've provided some high-level background on the program, here are the things I’ve learned, observed and experienced:
Keep to the - Generally Accepted - Truth!
One of the metrics had to do with the use of out-of-support software on the servers linked to the applications in the Configuration Management Database (CMDB). I’ve had quite a few discussions with people who disagreed with servers being linked to their application because:
- The server already had a newer version of the software package installed
- The server should have been turned off and decommissioned
- The server wasn’t part of their application at all
Of course, it would have been possible to create some sort of exception list for changes already in progress, but that would have been wrong. The CMDB is considered the single source of truth: if a server is connected to an application, the server belongs to that application. If that is not correct, the CMDB should be corrected. That way, you improve data quality not only for this program, but for every other system that uses the CMDB.
You really don’t want to end up in discussion after discussion about definitions and how things are measured. That is wasted energy and distracts from the goal: getting more in control. Try not to change definitions too often, and when they do have to change, allow for implementation time.
Reporting Is Good, Giving Insight Is Even Better
The dashboard primarily reports the percentage of applications in control. The next question everybody asks is: 'What do I still have to do?' We received really positive feedback on a simple overview listing the applications with a valid/not-valid indicator per metric. It provided a different view of the same data, and for the more action-focused managers it served as a great view to hang on the wall of an Obeya room.
Try to automate the generation of such lists as much as possible (or give people the ability to refresh them themselves). You just know that when you build an overview a senior manager finds really useful, you'll be asked for updated data on a regular basis. Automation really helps in delivering consistent quality. Manual work is fine, but it is error prone - no one wants to send an email full of corrections an hour later.
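An automated version of that overview doesn't have to be fancy. Here is a minimal sketch, with made-up application and metric names, that renders one row per application and a valid/not-valid mark per metric:

```python
# Hypothetical sketch of the per-application overview described above.
# Application names, metrics, and results are illustrative.

results = {
    "app-payments": {"patching": True, "backup": True},
    "app-intranet": {"patching": False, "backup": True},
}
metrics = ["patching", "backup"]

def overview(results, metrics):
    """Render a plain-text table with a valid/not-valid indicator per metric."""
    lines = ["application   " + "  ".join(f"{m:>10}" for m in metrics)]
    for app, scores in sorted(results.items()):
        marks = "  ".join(f"{'valid' if scores[m] else 'NOT valid':>10}" for m in metrics)
        lines.append(f"{app:<13} {marks}")
    return "\n".join(lines)

print(overview(results, metrics))
```

Once something like this runs on a schedule against the source data, the senior manager's wall chart refreshes itself and every copy shows the same numbers.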
Show Refresh Dates
It is just unnerving when one person is reviewing numbers and sending out actions while someone else is looking at the same overview - but with different numbers - and starts questioning person number one. Why? List number one had been forwarded three times, while the second was a fresh list containing the most recent numbers. Or the numbers were gleaned from a PowerPoint presentation built a week earlier. When people get numbers, they should be able to confirm the date the numbers were measured. The same goes for the source data. For one metric, the process of scanning and processing the results into a report can take a while (over a week). So, when I report a number of overdue actions based on that particular scan, I really want to let the other person know when the scan was run. That saves a lot of explaining about why the dashboard still reports an overdue item when someone took all of the necessary actions only two days ago. Do I want to shorten the process of scanning and reporting? Of course! But one step at a time - it is better to show numbers early, with some lag, and change the process in parallel, than to change a complete business process first and only be able to show numbers late.
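Stamping a report with both dates is cheap to do. A minimal sketch, with hypothetical dates and a made-up helper name:

```python
from datetime import date

# Hypothetical sketch: stamp a report with the dashboard refresh date AND the
# date the underlying scan ran, so readers can reconcile "stale" numbers.

def report_header(refresh_date: date, scan_date: date) -> str:
    lag = (refresh_date - scan_date).days
    return (f"Dashboard refreshed: {refresh_date.isoformat()}\n"
            f"Source scan ran:     {scan_date.isoformat()} ({lag} days ago)")

print(report_header(date(2016, 3, 10), date(2016, 3, 2)))
```

With the scan date visible, "the dashboard still shows my server as overdue" turns into "ah, the scan ran before I fixed it" without a single explanatory email.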
Prepare for Busy Times Around Deadlines
Just like on any other project, when a deadline is just a few days away, everybody becomes busier than ever trying to make the target or stay (above) it. Prepare for emails and calls asking whether you can change data right away, based on an approval from the IT Risk auditing department, or send a screenshot of the updated dashboard so it can be shared in a progress meeting (which 'starts in 5 minutes'). People also get more creative when the normal plan isn’t working - as always, the last steps are the hardest - so watch that guidelines aren’t stretched too thin.