Adam Stoner

Security and effectiveness of a digital census

March 21st is census day in England and Wales and an important milestone because the 2021 census is the first mostly-digital census ever conducted here.

A digital census has obvious benefits, namely that statistics can be gleaned immediately from the available data. What interests me is not the results of the census but the data security and privacy implications that a digital census inherently has and whether, considering we already share so much of ourselves anyway, a census is fit for purpose.

ONE: Conducting a digital census and storing sensitive information

Concern around data privacy and government surveillance has increased in recent years. The revelations of Edward Snowden, Christopher Wylie and other whistleblowers have all come to light since the last census, and the knowledge that the data we provide companies is being used to profile and sell us is hardly secret. In July 2019, the Information Commissioner’s Office conducted a survey that revealed that the public has a ‘low level of confidence in companies and organisations storing and using personal information’, mostly thanks to concerns about data theft, data misuse, and that data being sold. In October 2019, the Open Data Institute and YouGov found that fewer than a third of citizens trust central government or local authorities with their data. More 25- to 34-year-olds trust credit card companies than trust our elected leaders.

I have completed (and you should complete) the census, especially as failure to do so is punishable with a £1,000 fine. Despite such a heavy financial penalty, I suspect we may see record non-compliance this decade thanks to this distrust, social media conspiracy theories, and protest. And the reality is that a lot of this distrust is actually well placed: data breaches happen all the time and their scope and impact are growing as we share more data with more outfits.

All of this has a surprisingly positive impact (unless you work for the census office). People are becoming more reluctant to share their personal information. The number of people willing to share their home address fell from 41% to 31% from 2018 to 2019 and only 54% of respondents to one survey said they were willing to share their email address.

Data from the census is consumed in two key ways. The first is instant and is available to statisticians as soon as you begin submitting information; the second features a time-delay of 100 years.

  1. Anonymised, aggregated statistics such as population and demographics. Your individual data point is featured here but you are not individually identifiable.
  2. Personally identifiable information. Information specific to you available for public consumption after 100 years, including your address, religion, sexuality and more.
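To make the first of those two uses concrete, here is a minimal sketch of how individual returns could be rolled up into counts with the identifying fields thrown away. Everything in it is my own illustration: the field names, the ten-year age bands, and the sample records are invented, and it is not a description of how the census office actually processes returns.

```python
from collections import Counter

# Hypothetical individual returns; every name, address and value is invented.
returns = [
    {"name": "Person A", "address": "1 Example Road", "age": 29, "tenure": "renter"},
    {"name": "Person B", "address": "2 Example Road", "age": 24, "tenure": "renter"},
    {"name": "Person C", "address": "3 Example Road", "age": 41, "tenure": "owner"},
]


def age_band(age):
    """Place an exact age into a ten-year band, e.g. 29 -> '20-29'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def aggregate(records):
    """Count people per (age band, tenure); name and address are never copied over."""
    return Counter((age_band(r["age"]), r["tenure"]) for r in records)


counts = aggregate(returns)
for (band, tenure), total in sorted(counts.items()):
    print(band, tenure, total)
```

Each person still contributes to a total, but the output carries no names or addresses. Note that very small counts can still point to an individual, which is why aggregation alone is only a starting point for anonymisation.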

Where any data is concerned, you’ve got to trust:

  1. The security of the person who is submitting data; that their computer or telephone is free from anything that may leak or compromise their data
  2. The security of the staff working with the data (and there are as many as 30,000 of them)
  3. The security of staff equipment; their computers, telephones, and data storage techniques
  4. The security of company infrastructure the data is stored on centrally; servers and networks
  5. The security of any third-party companies it’s shared with; the people who own the servers and their staff and all of their equipment and infrastructure

The Census 2021 website says that ‘everyone working on the census signs the Census Confidentiality Undertaking’ and that ‘[i]t’s a crime for them to unlawfully share personal census information’ but the law didn’t prevent the release of 3.2 billion records from data breaches in the first two months of 2021, so why it would be a deterrent here I do not know.

What I’m alluding to is that sooner or later, 2021 census information may suffer a data breach and end up somewhere it shouldn’t. There are simply too many possible attack vectors.
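A rough sketch of why a long chain of things to trust worries me: if each link has even a small chance of failing, the chance that at least one fails grows with every link added. The 2% per-link figure below is invented purely for illustration, and treating the links as independent is a simplifying assumption of mine, not a measured fact.

```python
import math


def chance_of_any_breach(per_link_probabilities):
    """Probability that at least one link fails, assuming the links fail independently."""
    all_hold = math.prod(1 - p for p in per_link_probabilities)
    return 1 - all_hold


# The five links from the list above, each given an invented 2% chance of failure.
links = [0.02] * 5
risk = chance_of_any_breach(links)
print(f"Chance of at least one failure: {risk:.1%}")
```

The exact numbers don't matter. The point is the shape of the calculation: every extra person, device, or third party multiplies in another chance for something to go wrong.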

Let’s not underestimate how valuable a weapon this information actually is. Ex-Cambridge Analytica employee Brittany Kaiser described their data modelling as a ‘weapons-grade communication technique’ and claimed that because it was so dangerous, it was export-controlled. Information warfare is the new norm and raw data on the entire population of a country could be a very alluring dataset for a foreign power or a shiny trophy for a black-hat hacker. Someone associated with an online hacker group claimed to have laid their hands on 2011 census data almost immediately thanks to the ‘security-illiterate UK government’ and posted what appeared to be an entire dataset for public viewing on Pastebin before it was taken down.

Of course, I am assuming here that a bad actor is the cause of all data breaches, which isn’t entirely true. Since the last census, the government themselves have managed to simply lose or misplace at least the following:

Is it any surprise two-thirds of people distrust authorities with their data when their track record of keeping it safe is so abysmal? Those conducting the census would dispute this, stating on their website that they have a ‘security regime that follows government standards’. Judging from that track record, those standards aren’t great.

All this said, digital censuses are undoubtedly more robust and arguably more immutable than paper ones: the 1931 census returns were completely destroyed in a fire in Middlesex, where they were being stored, which is a terrible shame. Could we one day see a blockchain census?

TWO: Is a census fit for purpose?

The question isn’t whether censuses themselves are good or bad – I actually think they’re very valuable historic tools, which is why you should definitely fill yours out – but whether the methodology is correct and whether they’re necessary to reflect on current times.

The census asks several questions but they fall into three categories:

  1. What and where you are: Your address, your biological birth sex, your age
  2. Who you identify as: Your gender, your sexuality, your religious beliefs
  3. How you live: Homeowner or renter, how many cars you own

Due to World War II, the 1941 census wasn’t taken but the National Registration Act 1939 established a National Register ‘for the issue of identity cards’ and took a population count on 29 September 1939. Forty million people were registered in some 7,000 transcript books providing a viable census substitute, recording nearly the same information.

A census is remarkably useful, representing changing behaviours and outlooks in solid statistics, but I’d also argue it’s nobody’s business what sexuality you are, what God you might want to believe in, or what relationship you have with the people in your household. The rest – where you live, how old you are, and whether you own a car or rent a home – is already available from HMRC, the DVLA, and more.

Photographer Noah Kalina reflected on this idea, stating that a photograph is worth more many years after it’s taken, and I think that sentiment is applicable here too. A census, or something like the Mass Observation diary project, is potentially our best way of measuring the past but we have many better ways of measuring the present. As a matter of fact, censuses are so useless at measuring ‘right now’ that people are already calling for a second ‘emergency census’ in 2026 given the impact coronavirus and the UK’s exit from the European Union have had on our lifestyles.

Again (and because I really don’t want to be sued) you definitely should fill out the census, but arguably it’s the people who don’t that unwittingly reveal the most about society.

18 Mar 2021
