Hackney was keen to encourage collaboration among its teams, to share information and to raise awareness of the data available.
We helped to build a data lake, sorting captured data into four tiers: landing, raw, refined and trusted. On top of this we added a computation layer with Apache Spark, using AWS Glue's serverless Spark capability.
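The four-tier flow can be pictured as a pipeline of small transformations, where data becomes progressively cleaner as it is promoted. This is a minimal sketch only: the tier names come from the project, but the example records and the validation rules (requiring a numeric `id`, trimming whitespace) are invented for illustration.

```python
import csv
import io

# The four tiers used in the platform, from least to most processed.
TIERS = ["landing", "raw", "refined", "trusted"]

def land(raw_bytes: bytes) -> str:
    """Landing: store captured data exactly as received."""
    return raw_bytes.decode("utf-8")

def to_raw(landed: str) -> list:
    """Raw: parse into records without altering any values."""
    return list(csv.DictReader(io.StringIO(landed)))

def to_refined(records: list) -> list:
    """Refined: normalise fields and drop obviously broken rows."""
    refined = []
    for rec in records:
        if not rec.get("id"):
            continue  # a row with no identifier is unusable downstream
        refined.append({k: (v or "").strip() for k, v in rec.items()})
    return refined

def to_trusted(records: list) -> list:
    """Trusted: keep only rows that pass a final quality check."""
    return [r for r in records if r["id"].isdigit()]

# Invented sample payload, promoted through all four tiers.
payload = b"id,street\n1, Mare Street \n,Broadway Market\nx,Kingsland Rd\n2,Morning Lane"
trusted = to_trusted(to_refined(to_raw(land(payload))))
```

In the real platform each tier is a storage zone (e.g. S3 prefixes) and the transformations run as Spark jobs, but the promotion logic follows the same shape: each step narrows the data down to what the next tier can rely on.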
We tried to avoid doing anything overly complicated or bespoke, so that Hackney's engineers wouldn't have to spend five days a week maintaining the platform once we'd stepped away.
Hackney, like any local authority, is responsible for delivering a wide range of services and therefore uses a wide range of applications to collect its data. This meant there were many sources from which to pull data into the new platform, including databases, APIs, CSV files, Google Sheets and various other formats.
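One common way to keep such a mix of sources manageable is to register each kind of source behind a single ingestion interface. This is a hypothetical sketch, not the project's actual code: the source kinds and loader functions are invented, and the API loader is stubbed so the example stays self-contained.

```python
from typing import Callable, Dict, Iterator

# Registry mapping a source kind ("csv", "api", ...) to a loader function.
LOADERS: Dict[str, Callable[[str], Iterator[dict]]] = {}

def register(kind: str):
    """Decorator that adds a loader to the registry under the given kind."""
    def wrap(fn):
        LOADERS[kind] = fn
        return fn
    return wrap

@register("csv")
def load_csv(location: str) -> Iterator[dict]:
    import csv
    with open(location, newline="") as f:
        yield from csv.DictReader(f)

@register("api")
def load_api(location: str) -> Iterator[dict]:
    # A real loader would page through a REST endpoint; stubbed here
    # so the sketch runs without network access.
    yield {"source": location, "status": "stubbed"}

def ingest(kind: str, location: str) -> list:
    """Pull all records from one source into the landing tier."""
    return list(LOADERS[kind](location))
```

Adding a new source type (a Google Sheet, another database) then means writing one loader function rather than a new pipeline, which keeps the platform simple for the in-house team to extend.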
The major challenge was the sheer volume, with some services holding huge amounts of data. For example, the parking system alone holds millions of rows of data and is updated daily.
We also did a lot of work with Hackney to consolidate their housing repair data, which was then passed for additional processing and analysis. This dataset was then used by Hackney’s Link Work team so they could reach out to support residents at higher risk.
Hackney wanted the new data platform to be reusable across the public sector, helping other authorities use their data to deliver excellent public services, so we built it on AWS infrastructure and open source technologies.