https://hmrcdigital.blog.gov.uk/2019/06/18/why-we-held-a-hack-day/

Why we held a Hack Day

Two casually dressed males sitting at computers working

Hello. I’m Ben Conrad, and I work on the Multi-channel Digital Tax Platform (MDTP), which we usually just call ‘the platform’. We provide, build and maintain the infrastructure on which all of HMRC’s public-facing digital services sit. We recently held our first Hack Day, and this post explains why we did it and what happened.

Busy Teams

If you have ever used the HMRC Mobile App or logged on to check your Personal Tax Account, then you’ve used services that run on our platform. All our infrastructure is defined in code and runs on a large-scale cloud provider.

We have several teams who all look after different aspects of the platform. For example, one of our teams works on the pipelines we have developed to ensure that the people writing the digital services can deploy changes to their code rapidly and successfully. Another team manages the collection of the billions of logs and metrics that are generated each day and makes sure they are available to be queried, graphed, analysed and used to send alerts if something goes awry.

Our engineers are highly skilled, experienced proponents and practitioners of DevOps – bridging the divide between traditional IT development and operations teams, and being responsible for the resilience of the platform. They are also very busy people, who work hard to ensure that we provide the best possible service for our users. They constantly have ideas about how services can be changed and improved. The team discusses these ideas and then conducts a spike to understand their value to HMRC.

Ideas that could get lost

Sometimes an idea isn’t suitable for a spike, usually because we’re too busy to spend time on something of uncertain value that might not come to anything. However, there is often something in those ideas, or in the exercise of testing them, that is really valuable.

Hack Day

That is where the Hack Day comes in. A Hack Day is a day set aside to ‘hack’ things, in the traditional sense of hacking. In the preceding weeks, we asked engineers to put forward ideas, and then gave them three minutes to pitch each idea to their peers. The engineers then chose which idea to work on for the day, and small teams coalesced around them.

On the actual day, each team worked as fast as they could to develop the idea and build something that could be demoed for everyone else to see.

Here are a few of my favourite ideas that came from the Hack Day:

What calls what?

We have more than 800 microservices running on the platform. Any given microservice is likely to be dependent on a number of others, and may well itself be a dependency for others. By using configuration files and logs as a data source, the team graphed those dependencies to allow easy visualisation of these connections.

A cluster diagram of microservices
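
To give a flavour of the idea, here is a minimal Python sketch of turning configuration data into a ‘what calls what’ graph. It is not the team’s code. The `depends_on` key, the service names and both function names are assumptions made up for illustration, and a real version would also need to parse the actual configuration files and logs.

```python
def build_dependency_graph(configs):
    """Map every service to the set of services it calls.

    `configs` is assumed to look like {"service": {"depends_on": [...]}};
    that shape is hypothetical, not the platform's real config format.
    """
    calls = {}
    for service, config in configs.items():
        calls.setdefault(service, set())
        for dependency in config.get("depends_on", []):
            calls[service].add(dependency)
            # Make sure services that only appear as dependencies are listed too.
            calls.setdefault(dependency, set())
    return calls


def reverse_graph(calls):
    """Invert the graph: map every service to the set of services that call it."""
    called_by = {service: set() for service in calls}
    for service, dependencies in calls.items():
        for dependency in dependencies:
            called_by[dependency].add(service)
    return called_by


# Hypothetical example data.
example_configs = {
    "tax-account-frontend": {"depends_on": ["auth", "tax-account"]},
    "tax-account": {"depends_on": ["auth"]},
}

calls = build_dependency_graph(example_configs)
called_by = reverse_graph(calls)
```

Keeping both directions makes it easy to answer the two questions in the heading: what a service depends on, and what depends on it.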

Visualise the traffic

Similar to the concept above, we wanted to create a visual map of components on the platform and then, using this skeleton, plot the traffic flows in real time. Rather than only highlighting the traffic between microservices, this approach would also show where errors are being generated.

A list of all the components on the Multi-channel Digital Tax Platform
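
As a rough illustration of the data behind such a map, the Python sketch below counts requests and errors for each connection between components. The record fields (`source`, `target`, `status`), the function name and the choice to treat HTTP 5xx responses as errors are all assumptions for the example, not a description of what the team built.

```python
from collections import Counter


def summarise_traffic(records):
    """Count requests and errors for each (source, target) connection.

    Each record is assumed to be a dict with hypothetical keys
    "source", "target" and "status" (an HTTP status code).
    """
    requests = Counter()
    errors = Counter()
    for record in records:
        edge = (record["source"], record["target"])
        requests[edge] += 1
        # Assumption: any 5xx status counts as an error on this connection.
        if record["status"] >= 500:
            errors[edge] += 1
    return requests, errors


# Hypothetical example records.
example_records = [
    {"source": "frontend", "target": "auth", "status": 200},
    {"source": "frontend", "target": "auth", "status": 503},
    {"source": "frontend", "target": "tax-account", "status": 200},
]

requests, errors = summarise_traffic(example_records)
```

A live view would recompute these counts over a short sliding window and colour each connection on the map by its error count.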

A chaos generator

A couple of times a year we run a Chaos Day, where we charge a team with trying to break the platform. So far we have only ever done this in non-production environments. On a Chaos Day we try quite hard to come up with unexpected and creative ways in which things could fail. On this Hack Day, one team tried to come up with a chaos generator, which would bring a degree of randomness. To do this, they used a games console emulator, and connected it to a Kubernetes cluster. The code was then written so that when a life was lost in the game, a random pod on the Kubernetes cluster would fail (and hopefully automatically recover).
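
The core of that idea can be sketched in a few lines of Python. This is an illustration only: `on_life_lost`, the pod names and the injected `delete_pod` callable are hypothetical, and the sketch deliberately avoids talking to a real cluster. In practice `delete_pod` might wrap something like `kubectl delete pod <name>`, and the function would be triggered by whatever event the emulator exposes.

```python
import random


def on_life_lost(pods, delete_pod, rng=random):
    """Pick one pod at random and delete it.

    `pods` is a list of pod names and `delete_pod` is any callable that
    removes a pod by name; both are supplied by the caller. Returns the
    name of the chosen pod, or None if there was nothing to delete.
    """
    if not pods:
        return None
    victim = rng.choice(pods)
    delete_pod(victim)
    return victim


# Hypothetical usage: record deletions in a list instead of touching a cluster.
deleted = []
example_pods = ["web-1", "web-2", "worker-1"]
victim = on_life_lost(example_pods, deleted.append, rng=random.Random(0))
```

Passing in the random number generator and the delete function keeps the sketch easy to test, which matters for a tool whose whole job is to break things.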

We’ll hack again, don’t know where, don’t know when

The teams learnt a huge amount on this Hack Day and it is something that we will definitely repeat, with a few improvements to the format. Several of the ideas that were investigated during the day showed a lot of promise, and are likely to make it on to the platform in some guise later this year.

1 comment

  1. Comment by Tim Simpson posted on

    Great work guys. Look forward to hearing more about your development approach and experiences, especially when you start doing your chaos engineering in the live production environment.