July 27, 2023
There are compelling reasons to promote an open approach to public sector data: unlocking economic growth, making public services better and more efficient, improving research and benefiting society as a whole. Increasing the speed, awareness and transparency of data sharing is key to enabling these efficiencies and innovations.
Since 2012, government departments have been following an ‘open by default’ policy. Similarly, the National Data Strategy seeks to encourage the public sector to see the “economic, social and environmental” opportunities of data, rather than treating it as something to be protective of. The recent Open Data Camp ‘unconference’ in Wolverhampton saw a diverse group – including members of the Valtech team – come together to explore ways of propagating open data techniques and delivering on these strategies and commitments.
Open Data Camp is run as an unconference across a whole weekend. For anyone unfamiliar with an unconference format, there is no pre-set agenda. Instead, everyone attending the event is invited to pitch ideas for sessions. There was no shortage of ideas, and topics varied greatly. The tricky part was deciding which sessions to attend, as there were up to five running simultaneously. Valtech sponsored the event, and we hosted sessions on data bias and overcoming commercial barriers.
This article focuses on six of the sessions held over the course of the two-day event.
How important are trust and motivation when it comes to data sharing?
Data and trust are often discussed as separate subjects, but trust in data and in the systems that hold it is crucial. Perspectives from local government suggest that people are more willing to share their data when they believe they will gain from it. Consequently, councils can face resistance when seeking access to GP records, while social care organisations don’t typically face the same challenges.
There was some discussion of the potential for using branding or badges to make it clear that a particular data-sharing initiative is trustworthy – supporting a focus on user benefits with some kind of trust scheme. People are more willing to share data when they have a specific desire or need. It’s also essential that what data is being shared, how and why is understandable by a general audience. No jargon and a commitment to language that can be widely understood are vital.
Perhaps a mechanism akin to Trustpilot would be an effective model for establishing and communicating trust in the open data space, with the potential for user feedback and ratings. Similarly, questions were raised about the use of the Cyber Essentials scheme and the intended recipients of its badges (consumers, end-users and data publishers).
The group believed that trust in data sharing is built through badges or accreditations that assure us all that participants are committed to responsible data-sharing practices. This would give other organisations, users, citizens and data publishers some protection against misuse. Of course, different audiences value different aspects of trust, including data quality, timeliness, licensing, personally identifiable information (PII), and data-sharing agreements. Not only that, for many, trust is based on adherence to values and principles and not solely on meeting technical standards.
How can we guard against data bias?
The first Valtech-led session focused on data bias. We ran a webinar on this topic last year, bringing together award-winning author Caroline Criado Perez with a cross-government panel to explore the issues and what we can do to overcome them.
While most people are aware of data bias, many are unaware of its prevalence and seriousness. Car safety testing is a good example: for decades, crash test dummies haven’t reflected women’s bodies. Women and men have differently shaped hip bones, and seat belts are designed to restrain us around the hips. A 2019 study from the University of Virginia concluded that a woman wearing a seatbelt is 73% more likely to be seriously injured or killed in a crash than a man wearing one.
The discussion considered two challenges. If you’re collecting data, how can you assure yourself that what you are collecting is representative? And if you’re using open data, how can you be confident it doesn’t contain bias?
The group generally agreed that it is difficult to justify collecting demographic data under GDPR, yet such data can be vitally important for assessing whether a service is used consistently by different groups. If a service is being underutilised by some groups, we need to be able to identify that, understand what might be getting in the way and do something about it, thereby ensuring services are equally accessible to all. One participant challenged this by suggesting that the real problem is a failure to fully articulate the requirements and make the need to collect the data explicit and understood. DfT’s e-scooter trials are a good example, where the privacy statement sets out the purpose for collecting the data and cites section 149 of the Equality Act 2010.
When we aren’t in control of collecting the data, machine learning introduces a risk that any bias in the data could be amplified. The group agreed that processes should be transparent, and algorithms must be periodically tested to establish whether bias exists. We also need a mechanism for providing feedback to data and service owners, something that isn’t always considered. For example, research has shown that voice assistants (like Alexa, Google Home, and Siri) demonstrate gender and racial bias: they consistently recognise and understand white, male American voices more effectively. But how could we provide feedback to their developers?
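What might such a periodic bias test look like in practice? A minimal sketch is below, using entirely made-up evaluation results (the group names and the 5% tolerance threshold are illustrative assumptions, not from any real system): it compares a model’s accuracy across demographic groups and flags when the gap between the best- and worst-served groups exceeds a tolerance.

```python
# A minimal sketch of a periodic bias check. The data below is hypothetical;
# in practice you would evaluate your model against a labelled test set that
# records each participant's demographic group.
from collections import defaultdict

def accuracy_by_group(results):
    """results: list of (group, was_correct) pairs from one evaluation run."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, ok in results:
        totals[group] += 1
        if ok:
            correct[group] += 1
    return {g: correct[g] / totals[g] for g in totals}

def flag_disparity(accuracies, tolerance=0.05):
    """Return (flagged, gap): flagged is True when the accuracy gap between
    the best- and worst-served groups exceeds the tolerance."""
    gap = max(accuracies.values()) - min(accuracies.values())
    return gap > tolerance, gap

# Hypothetical results: group_a recognised 95/100 times, group_b 80/100
results = ([("group_a", True)] * 95 + [("group_a", False)] * 5
           + [("group_b", True)] * 80 + [("group_b", False)] * 20)

accuracies = accuracy_by_group(results)
flagged, gap = flag_disparity(accuracies)
print(accuracies, flagged, gap)
```

Run on a schedule, a check like this gives service owners a concrete signal that a model is drifting towards bias, which is one way the feedback loop the group called for could start to take shape.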
Are some user groups hard to reach, or are we not trying hard enough?
The ‘Data and Representation’ workshop was attended by several participants from the data bias session. Some of the themes continued. The group considered how and where we research to secure diversity of participation. We talked about “hard to reach” groups of users, a term that was rightfully challenged. Often such groups aren’t “hard to reach”, they just haven’t been presented with a compelling reason to engage in research.
The discussion also touched on the idea of community leaders representing a position on behalf of their community. But it’s difficult to know how representative their views are: individuals within a community may hold strongly opposing views on some or all topics. We concluded by coming back to the earlier theme of trust; without it, people aren’t going to share their data.
How accurate does data need to be?
A session later in the afternoon explored accuracy. The discussion centred on the question of how precise data needs to be. Well, it depends a great deal on the use case.
Take NaPTAN, the national dataset that uniquely identifies all public transport access points in England, Scotland and Wales. If the data for a particular bus stop is somewhat inaccurate, it won’t have much day-to-day impact on the bus driver: they’ll use the bus stop sign to determine where to stop, so the data is sufficiently accurate for that purpose. However, what about a connected autonomous vehicle? Even a small inaccuracy could result in the bus stopping in an unsafe place, and a data outlier risks the bus leaving the road altogether in the misplaced belief that the bus stop is, in fact, in a nearby river! Here the need for accuracy is clearly elevated.
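The “it depends on the use case” point can be made concrete with a small sketch. Assuming we have a stop’s recorded coordinates and an independently observed position (the coordinates and tolerances below are invented for illustration, not taken from NaPTAN), the same data point can pass one use case’s tolerance and fail another’s:

```python
# A minimal sketch of a location-accuracy check. Coordinates and tolerances
# are hypothetical: a human driver might tolerate tens of metres of error,
# while an autonomous vehicle might need single-metre accuracy.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_tolerance(recorded, observed, tolerance_m):
    """True if the recorded position is close enough for this use case."""
    return haversine_m(*recorded, *observed) <= tolerance_m

# Hypothetical bus stop: recorded position vs a survey reading ~22 m away
recorded = (52.5862, -2.1288)
observed = (52.5864, -2.1288)

print(within_tolerance(recorded, observed, tolerance_m=50))  # fine for a driver
print(within_tolerance(recorded, observed, tolerance_m=5))   # not for an AV
```

The same ~22-metre discrepancy is acceptable under one tolerance and a failure under the other, which is the session’s point: accuracy requirements are a property of the use case, not of the dataset alone.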
How do we overcome commercial concerns about sharing data?
The second Valtech-led session focused on overcoming commercial barriers to open data. Companies sometimes cite commercial concerns about open data and are therefore reluctant to share it, or refuse outright. These concerns can be genuine, where releasing data could put a business at risk. But equally, the short-term concerns of a few organisations can hold back the wider benefits that data sharing would deliver to a much broader group of users and stakeholders.
We discussed how best to approach this problem, something we have experience with from our work with OZEV on EV Chargepoint data (which we shared with the group). To find the right balance, we should seek to understand commercial sensitivities (through consultation or other research), consider reducing or restricting the data set and even mandate a specific level of data sharing.
OZEV’s consultation determined that both static and dynamic data should be made available (some chargepoint operators objected to the latter) and the OCPI standard was adopted. You can read more on pages 21-23 of the Government Response to the 2021 Consultation on the Consumer Experience at Public Chargepoints. The draft legislation recently published as The Public Charge Point Regulations 2023 refers explicitly to ‘open public charge point data’ and details the need for operators to collect and share timely data about the use of the system.
Returning to the earlier discussions on data bias, we were pleased to see that the Government Response considers the need to address the absence of accessibility data. Inaccessible charging can be a significant barrier to entry for disabled and older drivers, as a report by the Research Institute for Disabled Consumers explains.
How can we help more people use existing data sets?
The public sector creates a considerable amount of data, but that doesn’t necessarily increase the use and re-use of data by other teams, services, departments and organisations. To drive the use of open data, we must consider how we can help others discover, understand and use our data. Creating data products – preparing, presenting and promoting data sets as if they were products – helps to accelerate data reuse and sharing.
We discussed starting by creating easily sharable, open data sets and considering how they might be more widely used. It was recognised that data sharing between departments can be challenging, requiring clear communication about what data is available and what is needed; it was suggested that the data product approach can help bridge different worldviews and experiences. By engaging users and involving domain experts, we can help change attitudes to investing time and resources in making data accessible and creating an open data culture – even making it an obligation in our organisations.
Participants shared experiences of how understanding the product lifecycle, adoption curves, user segmentation, and MVP (Minimum Viable Product) thinking can help. A product view can also be helpful because it abstracts the complexity of data so that a wider audience can understand it, including any nuances.
Digital and data teams need to learn from each other on the technical front, iterating core interfaces and building empathy. Collaborating to establish standards and architecture and to maintain data infrastructure is crucial.
After two days packed with discussion and debate, it was clear that Open Data Camp continues to make an essential contribution to the use of data in the public sector. To read about more of the outcomes and discussions, go to https://www.odcamp.uk. If you’d like to talk about how your organisation can support – and benefit from – open data, please get in touch with Valtech.
Open Data Camp 8 was held at the University of Wolverhampton’s Springfield Campus, an architecturally stunning yet practical location for the event.