The hub of all knowledge


Can you imagine storing and managing over 10 million filing cabinets’ worth of customer data? Then working out how to keep that data relevant, correct and usable? It’s a bit of a conundrum, but since 2014 we’ve been working on a solution called the Enterprise Data Hub (EDH), and we’ve just reached a major milestone in its development.

To give you a bit of background, at the moment our customer data is spread across 11 separate data warehouses, and some information is only available to parts of HMRC, so cross-referencing can be tricky.

EDH will bring all our customer data into one usable place. It’ll let us take advantage of technological advances to store and analyse customer data using freely available open source tools and commodity hardware. It will save money and give us new ways of interrogating data. We’ll be able to work smarter and quicker, and ultimately increase our tax revenues.

Our technical solution is Apache Hadoop, an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. Instead of relying on expensive, proprietary hardware and disparate systems to store and process data, Hadoop distributes the parallel processing of huge amounts of data across relatively inexpensive, industry-standard servers that both store and process the data, and it can scale to accommodate very large data volumes. It can handle all types of data, structured and unstructured, including log files, pictures, audio files, communications records and email.
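For readers who like to see the idea in code, here is a minimal sketch of the map, shuffle and reduce pattern that Hadoop's MapReduce model is built on. It is not Hadoop's API and not our production code: it runs in a single Python process, and the record fields (`region`, `amount`) and function names are made up purely for illustration. On a real cluster, each partition would sit on a different server and the map step would run on all of them in parallel.

```python
from collections import defaultdict


def map_phase(partition):
    """Emit (key, value) pairs for every record in one partition of the data."""
    for record in partition:
        # 'region' and 'amount' are hypothetical field names for this sketch.
        yield record["region"], record["amount"]


def shuffle(mapped_pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key into a single result (here, a sum)."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(partitions):
    """Run the three phases in sequence; a cluster would map partitions in parallel."""
    mapped = []
    for partition in partitions:
        mapped.extend(map_phase(partition))
    return reduce_phase(shuffle(mapped))


# Two partitions standing in for data held on two separate servers.
example_partitions = [
    [{"region": "north", "amount": 10}, {"region": "south", "amount": 5}],
    [{"region": "north", "amount": 20}],
]
totals = run_job(example_partitions)
```

Because each partition is mapped independently, adding more servers lets the same job cover more data without changing the logic.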

To protect our customers’ data we’ve incorporated a world-leading technique called tokenisation. Tokenisation is a reversible method for replacing sensitive data with non-sensitive ‘tokens’. It provides security benefits similar to those of encryption, but retains the vital usability of data for our business processes. This gives us unprecedented capability to securely manage customer information from a single point of control, and to tightly control access to detokenised data, giving greater protection to vulnerable or at-risk customer groups. This is a world first, and we know that a number of banks are very interested in what HMRC are doing, and want to use a similar solution themselves.
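To make the idea concrete, here is a minimal sketch of reversible tokenisation using a simple lookup "vault". It is an illustration under our own assumptions, not a description of the EDH implementation: the `TokenVault` class, the `TKN-` prefix and the `authorised` flag are all hypothetical, and a real system would keep the vault in hardened, access-controlled storage rather than in memory.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault, for illustration only."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenise(self, value: str) -> str:
        """Replace a sensitive value with a random token that reveals nothing about it."""
        # Reuse the existing token so the same value always maps to the same
        # token, which keeps tokenised records joinable across data sets.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "TKN-" + secrets.token_hex(8)
        while token in self._token_to_value:  # regenerate on the rare collision
            token = "TKN-" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenise(self, token: str, authorised: bool = False) -> str:
        """Return the original value, but only for an authorised caller."""
        if not authorised:
            raise PermissionError("detokenisation requires authorisation")
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenise("AB123456C")  # a made-up identifier, not real data
original = vault.detokenise(token, authorised=True)
```

Unlike an encrypted value, the token has no mathematical relationship to the original, so analysts can count, match and join on tokens while the real values stay behind the single point of control.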


So where are we at with all this and what’s the major milestone?

Well, we’ve just reached the point where we can securely upload customer data from anywhere within the department. That may not sound like much, but because of the complexity of how our data was stored before, and our determination to develop a secure approach, it’s a massive step forward.

This means we can move on to the important tasks of migrating data and services, and use EDH for our key transformation projects. We can start bringing data tools together across HMRC and identify future opportunities to analyse and use our customer data in smarter ways. We’ll be able to combine data and customer analysis to improve services and customer experience.

And on a practical level we’ll start making significant savings as we decommission each of the data warehouses.

Transforming our analytical capability will be a game changer for HMRC’s digital ambition and for achieving our revenue generation objective. There’s also potential for cross-government transparency, with UK economic benefits and shared use of cloud storage.

The possibilities are endless.

Nigel Green

Nigel Green is HMRC Director of IT Delivery. Nigel spoke about EDH at the recent Technology Network Leaders event. You can find out more on the GDS Government Technology blog.
