JIM STAMP: Hi everyone. Welcome to our little chat between Lisa and myself this morning. We’re going to wait a minute or two for people to join. I think we’ve got a few registered. We’re recording this as well so that people can watch this back. We’ll wait a minute or two. It’s always interesting to look at the attendee list as it builds, see who you recognise.
We’re planning to just talk through some lessons learned from building the data platform at Hackney, this morning. Feel free to post questions in the Q & A. Anything that comes up as we talk, please ask, post in there. We’ve got plenty of time at the end of the session to go through questions. We’re more than happy to answer any questions that you have. Lisa and I can both confidently talk for a long time on all of the topics that we’re planning to talk through this morning.
LISA STIDLE: And you get us together, and that’s even worse!
JIM STAMP: It’s horrendous, isn’t it! We’re famous for just talking and talking. Our half hour-long sessions usually stretch to at least an hour if not more. We’ve reserved a good chunk of time at the end of this session to go through your questions. So please feel free to post those. Any questions that come up if you’re watching this after the event, there is contact information at the end of the presentation. You can talk to us on LinkedIn or go to websites. There is contact information for any questions so please feel free to contact us.
I think we have slowed down on participants joining, so if you’re happy, Lisa, we’ll make a start. Cool.
Welcome. We’re here to talk through the building of the data platform at Hackney. How Made Tech was involved, and what Lisa has learnt through the fairly interesting journey that we took to get there.
Quick introductions. My name is Jim Stamp, I am the Head of Data here at Made Tech. I generally look after all of our data projects from, I guess, an architectural standpoint, but also making sure that everyone that we have in the data capability has the skillset that Lisa and our other customers require. That’s generally what I do but also outreach, doing architectural discussions, joining workshops etcetera. Lisa, if you wanted to do a quick introduction to yourself?
LISA STIDLE: Sure. Hi everyone. I’m Lisa Stidle, and I am the Data and Insight Manager at Hackney Council. Just to give you a bit of background about the Data and Insight team, we are a centralised team within ICT at Hackney. We have responsibility for data analytics, data engineering, GIS and master data, including looking after our LLPG which is our address database.
Why we are here talking about the data platform is really that the origins of it were from our cyber-attack. We experienced a serious cyber-attack in October 2020 which I’ll talk a little bit more about. That kind of sparked off this whole project for building a new data platform in Hackney.
Our vision for the data platform is to use that opportunity of the cyber-attack to build back better for the future. To deliver a secure, scalable, reusable cloud-based data infrastructure that brings together the Council’s key data assets. That will enable us to democratise data across the council, use technology to drive deeper insight from our data and ultimately, improve the lives of the residents of Hackney.
JIM STAMP: Right, thank you. Can we have the next slide please? Right, quick agenda. We are probably going to flip between just sharing the screen and having Lisa and myself talking to one another. We’ll go through this agenda, and as we move through each point, we’ll flash up the screen and then take it down again.
The plan this morning is to talk about the origin of the project – Lisa has touched on that slightly there but we will go into a bit more detail. What we’ve learned as we’ve built it, both from the Hackney side and the Made Tech side. Then what’s next for Lisa’s team and where they see it going, and a few surprise questions from Lisa, I’m sure, as we go along, as well. Then on to questions at the end.
So yes, the origins of the project. If I remember rightly the original problem statement was more about recovering the data. The bid was originally around recovering the data after the cyber-attack. Did you want to talk a little bit about what that meant, and why you had to recover the data?
LISA STIDLE: Sure. As I mentioned, we had a serious cyber-attack in October 2020. Actually, before that, I want to jump to what our data infrastructure looked like beforehand, to give you an idea. Then what the impact was of the cyber-attack.
At Hackney, we were fairly advanced in terms of local government. We had a really mature data warehouse, lots of people around the Council using business intelligence tools. We were experimenting with machine learning models. One of the use cases was looking at identifying illegal houses of multiple occupation. That information was actually being used in the field to knock on the right doors.
During the beginning of the Covid-19 pandemic, we brought together around 30 different internal and external data sets to try and identify our most vulnerable residents, so that the Council could proactively make contact with them. We got a lot of interest from other local authorities; a lot of councils have done similar things but I think we were fairly quick in being able to pull that information together because of all of the groundwork that we had done before.
I think as a team and as a council, we were in a really good place with our data before the cyber-attack. Then it happened, and most of our data and our IT systems that were creating that data were not available, which really had a devastating impact on the services that we were able to provide and the work that we do, as well.
What that meant for us was that we really had to build our data infrastructure from the ground up again. Most of our infrastructure previously was using on-premise servers that Hackney maintained. We knew generally we didn’t want to go back to that. We wanted to migrate everything into the cloud and again, use that crisis as an opportunity. We knew that we couldn’t just put back what we had in terms of our data infrastructure. We had to do something differently. It also meant that we have a fairly disparate and at times chaotic data landscape, where we’ve got recovered sets of data, we’ve got interim data that has been created whilst systems were unavailable. Lots of Google Sheets, lots of Google Forms being used to run services – needs must. Then we’ve got new sets of data because we are also using opportunities to not bring back applications that we didn’t think were meeting our needs in the first place.
Universal Housing is the prime example. That was an application that was used by the Housing team before the cyber-attack. It was going to have to be phased out anyway because it was going out of support. So, Hackney has been using that opportunity to build our own applications. That means we have entirely different data sources than what we had before the cyber-attack.
It also meant that the work of the Data and Insight team looked completely different. We had to go into crisis mode. Most of the work in the aftermath of the attack was around helping people to try and collect information in a way that was fairly structured and could be usable in the future. So – a lot of helping people to design forms and understanding what they had done to their Google Sheets and things like that. So, taking us away from ‘Let’s see if we can use a machine learning model to predict this’ and going back to ‘How do you design a Google Form?’. The work of the team drastically changed and the priorities of the Council drastically changed as well. Everyone was just focused on recovery. Let’s get back to business as usual. Not really thinking, ‘What can we do in the future? How can we use this opportunity to leapfrog and get back to a better place, not just back to where we were?’. All of that was really driving where we were coming from in setting up the project in the first place.
JIM STAMP: It was interesting from a Made Tech point of view because we were working with you before and after the cyber-attack, as part of the cloud migration. It was interesting just seeing the focus of the organisation flip. It was a binary flip onto a completely different focus. For myself, joining Made Tech around that time and then joining the data project, just seeing the panic, it felt really interesting to see you all not really know how to do what you were hoping to do. Just trying to map.
Why did you choose to go with the data lake? As I said, I think our response to ‘How do we recover data?’ was along the lines of; actually, we don’t want to recover your data, we want to help you build something that you can use to recover your data, but also do other things with. I was just interested in why. Was there a reason why you went with that versus – I don’t know what the other responses were – but a bit about why you chose that?
LISA STIDLE: Sure. I don’t think at the outset we were specifically looking to implement a data lake but it actually met the needs of what we were trying to do. We were trying to migrate everything to the cloud and use new cloud-based tools. We were also looking to use open-source tools and move away from doing all of our ETL in our BI tool, which is Qlik. So that in future, if we wanted to make a different decision on our BI tool, we could lift and shift that work elsewhere.
We knew we wanted to be able to serve that data easily within different tools, be that Qlik, be that Google Data Studio. That was difficult in the previous ways that we were doing things. We knew that we wanted something better than we had before.
I mentioned the machine learning work that someone on our team was doing around identifying illegal HMOs. We didn’t have an environment where we could productionise that model and actually put it into use. We knew that we wanted something that would enable us to do that. A data lake ticked all of those boxes. But it wasn’t just about the lake either. It was also around the ways of working that would encourage. Trying to address some of the issues – perennial issues for any data team, really. People around the Council don’t know what data is there. There’s no metadata. I don’t know what this data means. I don’t know if I can share my data. It’s really difficult for me to create a data extract to share with this other person.
Also, standardising the skills and the tools that Analysts around the Council have. So, I would say that other proposals generally were trying to take us back to where we were. In saying, ‘We’ll get you off spreadsheets’. We already know how to get ourselves off spreadsheets. We want to know how we move forward from where we were already, as a very mature data team.
JIM STAMP: Interesting. Right, shall we move on to the next bit? I think we’ll get stuck if we don’t move on. So, yes, moving on to what have the team learned. That’s the bit I would like to focus on the most. I guess personally, what have you learned? And what do you think your team have learned? Then again, I guess we could extend again to what has Hackney learned through doing this project?
LISA STIDLE: Yes. I think this is a really big question. I’ll just touch in broad strokes, but you can probe me if you want more information. I think this is true for me personally and for the team. We’ve learned a huge amount in terms of subject matter expertise. Me in particular, I don’t consider myself a very technical person. When we were starting out, I had to Google, ‘What is the data platform?’.
Because a lot of people come from a Data Analyst background, not from a particularly technical background but from more of a research and statistics driven background. That’s my background. So, I’ve learned a huge amount about the different tools that we use. Working in AWS, all of that before was such a mystery to me, when I would hear all these different service names and had no idea what these things do.
Working with different data sources. We’ve got a huge variety of different data sources that we are trying to access with our data platform. That’s been a real learning curve about what’s easy, what’s really difficult, what do we really need to press our suppliers on to make sure we get that access to data?
Then there’s also just different coding languages that I’m not doing in particular, but the team are definitely learning. Some of the team were already using Python but now we are having to learn how to use PySpark, and think about how we process things in a different way. Terraform – a completely new language and concept.
That also brings me on to the ways of working, which are completely different. As Data Analysts, we have had to become a lot more like Developers, and learn from the best practice there.
JIM STAMP: I have to say, your team have effectively moved from being Data Analysts to being Data Engineers. It’s one of those projects where you see such a change in a set of people. There’s a couple I won’t mention by name but there was a day I remember, it must have been about a year ago, when someone said, ‘I really enjoy doing unit testing and quality testing’.
LISA STIDLE: Yes.
JIM STAMP: For an Analyst, personally, I’ve never heard an Analyst enjoy doing testing. It was such a moment of me just going, this is a team that is maturing in their data literacy and their data capability. It’s just so much fun to see that happening.
LISA STIDLE: Yes, absolutely. We’ve learned a huge amount about the importance of documentation, which is something we always said we wanted to do but we hadn’t been very good at doing. Version control, peer reviewing each other’s code, why it’s important to have a staging and production environment. All of those things we kind of knew from our other colleagues within ICT but had never really adopted those ways of working.
Finally, personally, I think I’ve just grown a lot as a product owner and leader. At the very beginning of the project, there were a lot of requests coming into the team that I didn’t really feel confident in pushing back on. Whereas now I look at that and I would handle that completely differently. I definitely would.
JIM STAMP: It was fun at the start.
LISA STIDLE: Yes.
JIM STAMP: I guess that leads on nicely to what would you have done differently?
LISA STIDLE: Yes. That’s one. The first is that the initial use cases that we chose were the wrong use cases. I don’t completely know what would have been the right ones, but I know why the ones we chose were particularly difficult.
We had two different use cases. One was our strategic use case which was an effort to get back to that vulnerability data set that we had put together before Covid. That was just too big in scale so we decided to focus on repairs. Actually, we were trying to work with a service and convince them about why they needed this and how it would be useful. So, we had to do both the Why and the What and the How. I think we really should have tried to look harder for a service that had a use case ready and was ready to use. That would have given us a lot more momentum.
JIM STAMP: Yes.
LISA STIDLE: The other use case was what we called our tactical use case, which was working with Data Analysts in Parking to redevelop their data warehouse. I think it was a bit too early to bring in Data Analysts from outside of the team to start using the platform in the way that they wanted to be using it.
The team itself, we are flying the plane as we are building it, but then bringing Parking into it, meant we also had passengers who were sitting in First Class, wondering where their glass of champagne was. It’s been a challenge. That’s one main thing.
The other is around discovery. We didn’t actually do a proper discovery at the beginning of the project. Because of where we were as an organisation, because of the urgency that we felt – we need to recover this data, we need to make it accessible – we just launched straight in, and we didn’t do proper user research. Really, it wouldn’t have made an impact on the project to spend two weeks doing a discovery.
JIM STAMP: No.
LISA STIDLE: I think that would have really benefitted us.
JIM STAMP: I think that’s my biggest regret, not pushing harder for that. I think that when we started, completely understandably, there was a learn-by-doing mentality applied to the project. That was an obvious response to where we were as a team with the data that needed to be recovered and how quickly. The pain was being felt immediately and we had to fix things. I think by just launching straight into the first thing we identified as a group as – this is a doable thing, we should try this – yes, I think we should have pushed harder for a discovery. I should have pushed you harder for a discovery. It’s the normal way of doing it.
If we had spent even a week just looking at the options, talking to a few teams, finding a service owner that had a proper problem that we could help fix, rather than having to do that internal sales pitch on – we can help you with a problem that you don’t really know that you’ve got as a problem – that was hard, that was really hard.
LISA STIDLE: Yes. I don’t know for sure if we would have found one. As an organisation we were just so mentally focused on getting back to business as usual, recovery, that coming to someone and saying, ‘Would you like to use your data a little bit differently’, or ‘Would you like to change your operating model?’, that just wasn’t a very appealing prospect for my colleagues.
JIM STAMP: Yes, I think you’re right. There was an aspect of shell-shock in the organisation when we first started. It might have been hard. But I think sitting back and at least saying this is the type of shape of project that we are looking for, and maybe having two or three ideas that we could have looked at and chosen the best one.
Repairs ended up being a good project but it was a lot of work. There was a lot of work to get it up and running and to get it to a point where people could start using it.
I think it burnt the team out quite a lot, didn’t it?
LISA STIDLE: Yes. Another thing – again, I don’t know how we would have done differently but maybe someone else could do differently if they have a different organisational context to what they are doing – is to really think about the make up of the team.
I think when we started out, I thought people in Data and Insight might become the data platform engineers. Not Data Engineers, but the Platform Engineers who would maintain and build the platform because they might just enjoy that work. But it became clear that we want to be the superusers, we want the data platform, we love the data platform but we don’t want to build it and maintain it. That is a different skillset and a different person.
I’m sure like many other organisations, we have difficulty in recruiting Developers and Software Engineers.
JIM STAMP: Never mind Data Engineers.
LISA STIDLE: Yes, never mind Data Engineers. We didn’t have the right Hackney people on the project to really get that knowledge transfer. I think Data and Insight have learned a huge amount. We now have a lead Hackney Developer on the project. If he had been on from the start, it would have looked a lot different, I think.
Data Engineering as well, we have a Data Engineer with us from Made Tech right now, a really experienced Data Engineer. The Data and Insight team is really enjoying working with him. We just think, ‘Oh my gosh, if we had met him at the beginning, that would have been brilliant.’ We had already been looking for that person and to bring those skills on, but they are really difficult people to find.
If you can get those people, definitely make sure you’ve got internal Software Engineers, some really experienced Data Engineers on your team.
JIM STAMP: Yes, absolutely. I completely agree. I think having a team made up almost completely of Data Analysts meant that we had an additional challenge. The learning curve was phenomenally steep, frankly. That probably changed what we could build so significantly, at the start because it was a case of – we’ve got to teach you Python, we’ve got to teach you PySpark, we’ve got to teach you the concept of a data lake, we’ve got to do infrastructure as code – because we wanted to build it properly so that if anything went wrong again in the future, you could just press the Restore Everything button.
Frankly, members of our team also needed to get up to speed on the Made Tech side. The capability didn’t exist then, to be fair. We didn’t have a data capability then. Now, we have professional Data Engineers that are now working with our Software Engineers on the same project, to cross skill and upskill. That’s definitely something we learned the hard way, that data as a profession is significantly different to software.
I guess I knew that, as probably the only professional Data Engineer on the team. I just hoped that we would be able to scale better. We got there and in fact, several of the team are now professional, official Data Engineers within Made Tech. They’ve converted over and their skillsets have grown. So, it feels like we all learned a lot from that stage of the project.
LISA STIDLE: Yes, absolutely. From your perspective, is there anything else that you would do differently aside from discovery and the team?
JIM STAMP: User research, I think you had it right, I think that user research should have been a strand that we kept going the whole time. I think we – and you and I have had this conversation a few times, I’m not saying anything new to you here – I think for everyone else, focusing on fixing problems that people have, rather than fixing problems that you think people have, is so important for a platform to maintain stickiness.
I think we fell into the trap as a group of becoming quite internal-looking. We didn’t really look at what the problems were that we were trying to solve. I think that the discovery would have set us off on the right footing to then carry on doing that user research as we went along. Actually, working out what the problems were that people were facing.
I think the tools that we selected were good enough. With a bit more user research I think we would have selected some different tools. I think we more recently have started looking at the tools that people need to use rather than we think they need to use. PySpark I still think was a good choice for some users, but it clearly wasn’t the right thing for others.
I think the SQL solutions we chose perhaps were deficient as well. The low code thing doesn’t really happen, it isn’t really a thing, it doesn’t allow people to do low code. It doesn’t really work. There are some tools around but it is tough to do.
LISA STIDLE: Yes, I certainly tried one out and it just didn’t work very well. I think the concept of it was great but it just didn’t deliver what it needed to.
JIM STAMP: I keep being told it’s much better now. We won’t name it but yes, it’s much better.
LISA STIDLE: They always say that. Or it will be better in three months, don’t worry!
On technological choices, when you are making them, how do you allow for some flexibility because things change so much, just within the services that we’re using? Again, it might be much better now. How do you future-proof a design?
JIM STAMP: For me, that evolving architecture is a well-known topic within software. Software frameworks and tools change all the time and data is no different. It’s just software with a very niche way of working. For me, the use of PySpark as an underpinning service means that you have an open-source solution. You can take your code and you can move it to any of the other cloud providers. You can go fully open source and drop all of the cloud provision stuff, host your own data platform in the cloud if you wanted to. You’d be mad to do it. It’s hard work.
Having that underpinning open-source mentality is so important to be future-proof. We’ve got that with your platform, I think it’s there with PySpark. I think we may have gone too far down the AWS route. I’m not saying they are bad tools, but I think for the team that you have, using Glue in the way that we did was perhaps not the best way of doing it. I’ll be honest with you though, I’m not sure what we would have used as an alternative.
The steer that we were given at the start from your architectural colleagues was, ‘We’d like to keep everything in AWS as much as we can.’ Which limited what we can do. The idea of bringing in an open-source scheduler or orchestration layer just seemed too much for you and your team at the start.
I think now we could probably bring in something that was a bit more grown up than Glue and you’d be able to cope. Back then there was no way that if we had brought in Airflow or any of the others, your minds would have just melted. It wouldn’t have worked.
I don’t think we really could have done anything differently from a technology point of view but I think you are future-proofed. You could replace Glue tomorrow with something a bit more grown up and still continue. All of your code would still work. You might need to change a few lines in each script. The core code would still run, you could migrate over relatively easily. PySpark isn’t going to go away, it feels like it’s there for a good few years yet, so I think you’re safe for now.
LISA STIDLE: The choice to use AWS is another thing that I don’t know if we would have done it differently but I would have liked to have considered it more, to understand the implications. I think we probably would have done the same thing because as you were saying, all of our infrastructure colleagues were telling us please do this in AWS. We have a whole cloud engineering team now that didn’t exist before the cyber-attack. That’s meant we have actually got a whole team looking after the core infrastructure, and we don’t have to do that ourselves. Which I’ve seen in other local authorities, where they are trying to do this without an actual cloud team there in place to help them. So, they are doing it all.
If we had chosen something different then we wouldn’t have support from our cloud engineering team —
JIM STAMP: Exactly.
LISA STIDLE: — which has been really crucial. But it would have been good to understand what else is out there and what some of the pros and cons are, before we firmed up that decision.
JIM STAMP: I think there was an aspect at the start of the project where I felt that you and the team had so much to learn, that I made some decisions without your input. To just get the ground and the foundations and everything prepared so that you had somewhere to learn.
I think if we had just stopped and evaluated everything at the start, then we would have ended up just doing that analysis paralysis thing for ages because you were learning the Why and the What, as you said, at the same time as trying to build something. We would have just got stuck.
So, I think it was good to just go – let’s just do something – and give you a space to start growing but not do anything that meant that we couldn’t undo. If you decided to replace Glue you can. There’s nothing stopping you from doing that. So hopefully we have designed it in such a way that now you are more comfortable questioning and assessing those different tools, you can replace them.
LISA STIDLE: Yes. I think that demonstrates the maturity of the team now from where we started. We do have that confidence now that we are thinking, is this the right tool for us? I’m not sure, actually. Whereas before, we wouldn’t have known. We wouldn’t know how to make that decision. So yes, I think that just demonstrates how much we’ve learned.
JIM STAMP: It did feel very strange, coming from the Made Tech point of view, making decisions for your customers without actually involving them in the decision. It felt very alien to how we normally work but it felt like something that I had to do at the start of the project because there was just no way we could do it. So yes, I apologise for not involving you in those foundational choices. Hopefully it’s ok.
LISA STIDLE: Yes. That brings me to another question. We’re using pretty standard AWS services but it does feel like we’ve had to do a lot ourselves from scratch. All of the Terraform modules that we’re developing and things. Do you think there is scope for an open-source blueprint for people to just implement an AWS data platform more easily?
JIM STAMP: Yes, there is. I think that those modules that we’ve created with you, it would be lovely if you open-sourced it. That would be great.
LISA STIDLE: Well, our repo is public now.
JIM STAMP: Is it public now?
LISA STIDLE: Yes, it’s public.
JIM STAMP: Excellent, there we go. So, it’s all there for everyone to use. So yes, I think there is. We have got some ideas; we’ve got some thoughts about how we could start building a local government open and freemium premium model data platform. It feels like as an organisation, Made Tech is keen on giving stuff away. We want to write it once and leverage that each time. But there are strong possibilities for products. So, trying to draw a boundary between having a really nice open-source mentality, but also being able to charge for stuff so that we can make it better, as well.
It’s a delicate balance, so we are thinking about what we could do in that space. But absolutely, there is definitely scope for building some of these blueprints as open-source patterns. Having now done it probably three or four times on different data platforms, and doing the same thing over and over again, and as we work with more local governments, I’m sure we will be doing almost exactly the same stuff again.
So building a local government ontology that we can share openly and building some data sets based on that ontology as we learn more and as we work with more. Obviously, we’d love to involve you. Building a community so that we can start doing that would be fascinating. Building some kind of Slack group so that you and I can start putting what we’ve learned along with what we’re learning from other councils, and building that up.
First an ontology, then some shared data models and then collaborative analytical modules, machine learning modules, visualisation modules that we can all get involved with would be so much fun. Just transformative for local government.
LISA STIDLE: Yes, absolutely. We are all looking for opportunities to collaborate, but I think sometimes the technology gets in the way of that.
We are a member of the London Office for Technology and Innovation. As part of that I am a member of a number of different data networks. We are always looking for what data projects we can collaborate on. Actually, it’s the things at the beginning and the end really, that you can collaborate on. The How is so different depending on what you are doing.
I think we are slightly odd as an organisation or as a council, at least, that we are using Google and we are using AWS. Most councils are in the Microsoft universe.
JIM STAMP: That shouldn’t stop collaboration because if you are all using, for example, Spark, then it’s the same. It doesn’t matter which platform you are in; they are all the same. I think we can do something, I’m sure.
LISA STIDLE: But we are not all using Spark. I think that upskilling across the sector is really important. We are trying to do that within Hackney. We’ve got a Data Analyst community and as I was saying, one of the things that we want to try and address is that we’ve got Data Analysts spread all over the council doing things in different ways, using different tools, but actually they are trying to do the same things.
That is just a barrier to learning from one another, sharing each other’s work and shortcutting each other’s work. We want to do that within Hackney but with other local authorities, absolutely.
JIM STAMP: Yes, ok. Shall we move on?
LISA STIDLE: Sure.
JIM STAMP: Right, what’s next? What do you want to do as a team? What do you see as the next stage of the data platform, now that Made Tech is starting to do the Homer Simpson fade?
LISA STIDLE: Yes, so you’ve already started. We’ve already been dialling down the team. We have a skeleton crew on now, who are finishing towards the end of August. That’s a little bit nerve wracking.
JIM STAMP: The stabilisers are definitely off now, right?
LISA STIDLE: Yes. I do really feel that it is time for us to fly the nest and try and do this ourselves.
In terms of the road map, and how we continue to develop the data platform, I think that’s a difficult question to answer right now because we are doing a lot of work on reviewing and developing our road map.
One of the things I’ve learned through this process that I’m not particularly strong at is understanding that middle horizon of what the future looks like. I was good at knowing what we need to do in five weeks, what we want to get to in five years. But what are we doing in five months? That was really difficult throughout the entire project.
We’ve worked with different delivery managers to try and develop our road map in a different way. We’ve had some coaching around it and it has continued to be a real challenge. I’ve brought in an interim Product Manager who comes from a product background, to do this. She’s done a lot of user research, and it’s flipping a lot of our assumptions on their head. She’s just picked things up so quickly. It’s amazing when someone actually comes in with a proper discipline to do the thing you’ve been trying to do, how easy it seems to come to them.
Two weeks ago before she started sharing some of her findings with me, I would have said one of the key gaps we hadn’t addressed so far in the project was around how we facilitate different service areas to share data with each other. Because at the very start of the project we were looking at access control and how we do that. It was very complicated, and we decided actually, we don’t really need to do that right now. All we need is to give Housing access to their housing data, Parking access to their parking data and so forth.
We’ve kind of sectioned things off in the data platform in terms of their permissions to view that data. I thought the biggest gap is being able to share that data across those different services.
That is a thing I think we want people to need, but they don’t necessarily need it right now. I want people to be sharing their data with each other. I want them to need to know something from another service. But actually, is that what they are saying they need right now? It isn’t. It hasn’t come up in any of the data from Jess, our Product Manager. I had her come to a lean coffee session we hosted. She suggested a topic to talk about – what is a data set that you need to access in the next three months that you don’t currently have access to? No one voted to talk about that. No one. I was really shocked.
That is the difference that someone from a true product background comes with, rather than someone with a subject matter expertise. Where I’m trying to create the version that I want to see, that I want to be true. She has actually got to grips with what is the true situation. I think that work she is doing is going to majorly change the way we prioritise.
In terms of what is most important right now, it is just a lot of stability. We’ve been making a lot of changes in the platform as Made Tech get ready to roll off. That has been a bit of a bumpy ride for our users. I think we are all looking forward to a period of stability where we can really capitalise on what’s already in there, and not build out as much new stuff. To not try and bring new data sets in at such a quick pace.
The other context of this is, we’re also undergoing a restructure as a team. Anyone on this call in local government will know what that means. We have been waiting on this for 18 months since we started this project. I’m sure at our kick-off meeting we said, ‘Oh, by the way, we’re going to have this restructure.’ We’re only getting the actual consultation document which tells us what the team looks like, next week.
We’re losing you but also, I cannot replace the team with the permanent team that we need in place quite yet. Out of necessity, we are going to need to slow down a little bit, but I think that works really well with all the work that Jess is doing. When we start to gear back up, we will be confident that we will be building the right things in the right order.
JIM STAMP: Yes, brilliant. It’s interesting. When I think back to how you described the way that you used Qlik before the cyber-attack, and if I then think about that in the context of analysts not saying that they would like access to data sets, they seem like very different worlds. Because it sounded like, in Qlik, everyone was just mashing and merging everything together in a relatively unconstrained way. Whereas now, we’ve put everyone in these secured silos, intentionally, with options to create anonymised versions for sharing. It’s all there, it’s all ready to go as and when people ask for it.
For no one to put their hand up and say they want access to something, do you think that is because there is a literacy problem or do you think that is a lack of core datasets problem? Do you think there are data sets that if you had Council Tax, for example, that level of data set – because there are issues getting some of those data sets in, right?
LISA STIDLE: Yes.
JIM STAMP: I just wanted to say, I don’t think we’ve had any questions on the Q & A yet. Just checking – no, I don’t think we have. If anyone has got any questions please ask. We’ve got a bit of time left, we’ve got no questions, I’m sure Lisa and I can carry on talking for ten minutes no problem, but if you’ve got some questions that would be great.
Back to the point – so, is it because they don’t see a need to join the data or is it that the data that they want is not there, so they are not thinking in that way yet?
LISA STIDLE: I think there are a number of things driving this. Going back to your point about how we were using Qlik, yes, it’s true that we were joining a lot of data sets in there but we as Data and Insight were doing that. They weren’t doing that. They would see the output of that and not really recognise all of the work that had gone in.
One of the main barriers is people not understanding or knowing what else is out there. I don’t know what I don’t know. That is one of the core problems that we are trying to address with a data catalogue to help people see what we have in the platform.
You might have someone from public health, for example, they need to do a needs assessment on children’s social care and they go to someone in social care and say, ‘I need some data.’ The person in social care says, ‘Ok, tell me exactly what you want.’ ‘I don’t know, I don’t know anything about social care, I work in public health.’
So, it’s that lack of awareness of what actually is out there. Also, I think it is still where we are mentally as an organisation. We still are in a place of – I need to get back to business as usual. Which is what has still come out a lot in the user research, that people just want their dashboards back. It’s not even just about getting access to data sets. It’s – ‘Yes, I would like to do something innovative with data, later. I just want my dashboards back right now.’
In terms of the Data Analysts, it’s also about, ‘I just need to build my service’s dashboard. I just need to know what’s going on in my service. Eventually, I would like to join that up with some data from other services, to more holistically understand what’s going on with our residents, what they need.’ But they are not in a place where they think, ‘That’s what I need to do right now.’ It’s what they want to do in the future.
JIM STAMP: Do you think you can supercharge that by giving some amazing examples? Do you think if you as a team went, ‘Look, we’ve joined these together, we’ve managed to use a classifier of some description, some ML magic.’, and just show them what is feasible, do you think that would create the appetite within the business to then get the Analysts to have a need to do it?
LISA STIDLE: That’s what I’m hoping will happen. I don’t know for sure if it will happen. I think that’s a main role of Data and Insight as a central team with fairly advanced Analysts, that we need to show people what is possible to get their ideas going.
I think that has started to happen with our vulnerable residents work. People did really see the value of it. We need to do something like that again. What’s next for the data platform is one thing, but Data and Insight as a team as well. We’ve spent the past 18 months so focused on the data platform, actually, we’ve really neglected the Insight part of the team. We’re not the Data team, we are the Data and Insight team. How we help people to ask us the right questions and get meaning from their data, rather than just making sure that the data is there, is now where we need to be as a team.
JIM STAMP: Yes. We’ve got a couple of questions now. David Cooper, ‘Do you have an internal data sharing agreement?’. This is a good one. Go on Lisa, this is definitely yours.
LISA STIDLE: Yes. We don’t have specific data sharing agreements. There might be some within different services, but at least in the way that our team works, when we use data that has been collected for a different purpose, we do a privacy impact assessment. That needs to be signed off by the owners of that data.
For example, if I wanted to join Council Tax data and Housing data, to figure out who was illegally subletting their properties, I would need to do a privacy impact assessment. I would have to say why I want to do this, what the legal basis for doing this is. That would have to be signed off by the owners of that data. Because as a team, we don’t own any data. We use data but we don’t own it. So that question of can we use this data always has to go back to the information asset owners.
JIM STAMP: Yes. We went back and forwards on this one quite a few times, didn’t we? How we should do this and what it should look like.
LISA STIDLE: Yes.
JIM STAMP: Other organisations that we work with definitely do, and there are quite strict processes to go through to get access to data sets that you shouldn’t normally have access to. I think within a local authority that is slightly different to a national agency, for example. I still think there is an aspect of – the data catalogue, for example, saying what the data should be used for, what it shouldn’t be used for. What aspects of it can be used for analysis, that kind of thing. We discussed a few times how we would automate those agreements and permission sets. It is a difficult one.
LISA STIDLE: It is. I think a lot of people, when we first started talking about the data platform, thought, ‘Oh great, GDPR isn’t an issue anymore.’ No, no. This means that we can more easily press a button and give you that data, but we still have to ask the question of why you should have that data in the first place. We still need to be using data responsibly. Once we get the answer, we should just be able to do that much more quickly and effectively. We still have to be –
JIM STAMP: Within main frameworks so that you don’t need to jump around. Hopefully that has answered your question, David.
We’ve got, ‘Spark versus AWS debate is interesting. How much appetite do you think there would be from other local authorities to go open source by default? To build something from scratch, or are they going more towards the off-the-shelf data solutions, just because it’s easy or because it is easier?’
I don’t think it is Spark versus AWS. I think we are running Spark within AWS. AWS has Spark built within it. EMR runs Spark. You can write Spark natively; you don’t have to use Glue. You can put Spark within Glue, you can move that dial as far down as you want.
As I said earlier, I wouldn’t advise anyone to build their own Spark cluster. I have tried it, it is hard. You will need experts to be able to maintain it.
LISA STIDLE: We will never have those people in local government. Never.
JIM STAMP: Don’t even think about that. Always use as much as you can from your cloud provider, but be aware that if you start using the tools that they provide that aren’t based on open-source technology, it will be very difficult for you to move later.
There are some low-code solutions, as we discussed, and the AWS ones are better than they were when we tested them, Lisa. I will defend them. They are better than they were.
I would still worry about using them. I still think it locks you in really strongly into that platform, which makes it hard to migrate. I guess people don’t migrate cloud providers too often. If that’s not a concern you have, go for it. Just use the tools, do the easy thing because frankly, there isn’t enough time in the world to build everything from scratch.
We use Spark because some experts have written it. I’m never going to write my own cluster software, the cluster compute software. Well, maybe one day. But Spark is there for a reason and the same thing applies to anything that AWS or insert-cloud-provider-here provides. There are other ways you can do it. There is lots of stuff that you can use. You can use Databricks across any of the cloud providers, for example. Other cloud providers exist.
There’s different ways you can do it. I would say it isn’t Spark versus AWS, it is a spectrum.
LISA STIDLE: Yes. Just in terms of the appetite from local government, I think looking at similar projects across other councils, most of them are going down the Azure route and do want something that is just a little bit more off-the-shelf. I think also in Hackney, we are really lucky in terms of our approach to being open-source as an organisation.
In Hackney we just take that as read, that that is a thing that we should do. As Jim said, we don’t want to lock ourselves into really expensive contracts with particular providers. We’ve seen time and time again that just means they lose incentive to meet our needs and then we are stuck.
Also, we feel it is really important to open source the things that we are doing, to help other local authorities. We don’t want all local authorities to be using their resources to reinvent the wheel time and time again.
JIM STAMP: Absolutely.
LISA STIDLE: I have been struck by some questions that I got at another conference I spoke at. They were saying, ‘Why did you decide to make your playbook public on the internet?’. Well, it wasn’t really even a consideration. Why would I not do that? Which reminds me actually, to post a link to our playbook.
JIM STAMP: While you are doing that, I’m going to cover the next question, which I think is related. ‘AWS might be easier to explain to a local authority commercial and procurement colleague.’
The steer that we had at the start of the project was that we already have an account with AWS for our cloud infrastructure, let’s just stay with that. That’s why we wanted to stay within the bounds, plus we have a team supporting it. So, it made sense from a commercial point of view. You already have AWS, let’s stick with it. It just means that that one number gets bigger, rather than having multiple numbers to track, so that made sense.
The open-source bit, it’s CapEx versus OpEx, isn’t it? You have to employ people to maintain open-source, versus you have to pay licences for non-open-source. So again, that’s part of the decision as to where you move that dial to, along that spectrum.
‘Sounds like a great collaboration. For those who just want to be more effective in providing or managing data, what is the first thing you would focus on? Good project managers, building and maintaining a data catalogue, GDPR training?’
I think probably a data catalogue. I think if you’ve got data and you want to change the way that people use it, if you could do one thing, it is making people aware of what the data is, how it can be used and what it contains, and how other people are using it.
So, if you can find a good data catalogue which relatively automatically will go through and build a view for people to gain understanding of your data, I would say a data catalogue.
I think GDPR training is important, but you should all be doing that anyway. If you’re not doing GDPR training then you probably need to start doing GDPR training.
Good Product Managers are always necessary, having been a Product Manager in the past, and also in a relationship with a Product Manager. I’d get shot for not saying that. Yes, a good Product Manager is always important. I think we missed that. I think both Lisa and I missed that at the start of the project. I think that’s one thing that I will definitely push for on every single data platform project from now on, that we need that user focus in the team, as well as the technological and analytical focus.
LISA STIDLE: Yes. I would say of those three options, a good Product Manager would be the top of my list. Then you know what the others should be!
JIM STAMP: The right thing to do, yes.
LISA STIDLE: Yes, because I might say, yes, absolutely data catalogue is the best thing, or yes, these access controls, because people need to start sharing their data. But actually, when you talk to people about what they need today, that might look different.
JIM STAMP: Ok. Do we want to just flick back? I think we had some contact details on the slide. We’ve posted the LinkedIn, but we are doing a couple of events later in the year, so if people are interested to come and speak to us in person, we are doing Building a Smarter State and Think Data for Government later in the year. Feel free to come and speak to us.
We’ve got our data services page up there, LinkedIn details there, so please feel free to contact either Lisa or myself. Lisa’s given her email address.
LISA STIDLE: Yes. I also want to do a plug. I’ve mentioned we’re going through a restructure; we’re going to have to build a permanent team for this data platform. So, if you’re watching this and thinking oh my gosh, I’d love to work in Hackney and find out more about this data platform, then please do get in touch with me.
JIM STAMP: Any other plugs you want to make? Are you in Big Data London this year?
LISA STIDLE: Yes, I will be.
JIM STAMP: I’ll be seeing you there then. So yes, feel free to come up. We don’t have a stand; we’re not sponsoring it but there will be a sizeable number of Made Tech employees there and I think some other colleagues and customers are probably going to end up coming as well. We should all get together and create a big WhatsApp group again.
LISA STIDLE: Yes. I think a lot of people from Hackney will attend as well.
JIM STAMP: Excellent, it will be nice to meet people in person at last.
If there are no other questions, I think we are running out of time. Thank you very much for your time today, Lisa. It’s been a pleasure as usual to have a chat with you.
Did you want to say anything to finish up? Was there anything that you wanted to say?
LISA STIDLE: Just thank you, everyone. I hope this was helpful. If there is anything that we can do to try and make a similar project in your local authority or your department easier, please do get in touch. We are really keen to help shortcut each other’s learning. Like I said, we don’t want to keep reinventing the wheel in local government. Our resources are already stretched and are going to be even more stretched. We need to help each other as much as we can.
JIM STAMP: Absolutely, yes. Always here for a chat. Contact either, both, all of us. We’re here. Thank you very much for your time.
LISA STIDLE: Yes, thanks everyone, bye!