Time for the data policy space to mature

In today’s blog post, I repost an ungated version of my Business Day article with Pietman Roos in which we discuss the importance of data ethics in the age of big data and AI.

There are still legal vacuums, which over time could give rise to unfair practices
by Daan Steenkamp and Pietman Roos

Data is a distinct resource that has its own ethical considerations for exploration, processing and distribution. The terrain is mostly “ethics”, as opposed to law, since the data landscape is in many ways still unregulated and akin to the Wild West. While privacy rights have been entrenched amid the rush to tap into this “new oil” and exploit it as a resource, there are still large legal vacuums, which over time could give rise to unfair practices. In all the excitement of the data economy, we should be asking what type of industry we are creating.

Like data, crude oil requires immense resources to transform from a raw product into a useful end product. The regulatory framework for oil exploration has developed to include drilling rights, environmental permits and processing and distribution safety checks. These are all sensible safeguards to protect the environment, consumers and smaller operators against the domination of large companies. If “data is the new oil”, it follows that it should get a similar regulatory treatment to ensure that there is no legal excuse not to act in an ethical manner.

Why regulate more, though? It is not as if SA business does not have enough red tape. If anything, far more can be done to pare down the administrative burden imposed on large and small business alike.

Though it is in some respects quite unlike any other industry, the data industry is governed by the same economic principles, including scale benefits and the threat of monopoly. Smaller operators can easily be wiped out by large companies that entrench their monopolies in a palpably unfair way, such as by infringing on the smaller operators' copyright. An example is that of data structures or metadata, a form of intellectual property in the data industry that is just as crucial and valuable as branding or other forms of intangible assets.

The practice of data scraping, for example, raises concerns relating to consent, fair use and copyright. Metadata, or data that describes data, falls in something of a grey area in copyright law, though data structures that map out how different inputs should relate to each other can be protected. Metadata such as the time and location of a phone call, as opposed to the content of the call itself, should also be protected, since such information could be used in unfair ways or could represent the intellectual property of the company that structured a data set. Scraping of public and private data by large companies is increasingly prevalent but often occurs without checks of website terms of service or the consent of data owners.
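Part of the check that scrapers often skip can be automated. The Python sketch below is a minimal illustration, assuming a site publishes its crawling preferences in a robots.txt file. It uses the standard library's `urllib.robotparser`. The robots.txt content, the `example-bot` user agent, the example.com URLs and the `may_fetch` helper are all hypothetical. A robots.txt file is not the same thing as a site's terms of service and does not establish the consent of data owners, so this covers only one narrow, machine-readable signal.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real scraper would download this
# from the target site before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/public/page.html", "/private/data.csv"):
        url = "https://example.com" + path
        print(path, may_fetch(ROBOTS_TXT, "example-bot", url))
```

A scraper that respects the result would simply skip any URL for which `may_fetch` returns `False`. Checking terms of service and obtaining consent would still have to be handled separately.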

Data businesses can start protecting their metadata by registering patents, but that is relatively expensive and diverts resources from other development priorities. Patents also do not stop scraping, and owners might have to resort to even more costly litigation, where established large entities can outlitigate smaller start-ups. This is even worse if the offending party is a foreign entity. As scraping for training artificial intelligence models expands beyond text, regulations and technical safeguards need to evolve. One policy response could be to create a specialised court to hear matters on copyright and support start-ups with a commission system to investigate and prosecute copyright infringements.

The analogy of data as crude oil falls short in two important respects, however. First, maintaining trust is paramount. Second, without ethical data use there can be no long-term success for firms that use large amounts of data.

It is true that corporate social responsibility has become a key consideration for companies, but maintaining trust and adhering to ethical standards in data use are critical if companies are to harness the potential of data.

Trust is an intangible asset. Without it, data will not be shared or relationships maintained. Breaches of trust risk reputational damage, regulatory penalties, or legal relief. Businesses already understand this with respect to physical assets and other intangible assets such as brand value. But data assets add another dimension. Transparency and mechanisms for stewarding stakeholder trust must be embedded in processes, to minimise the risk of misunderstandings and avoid data misuse. People who develop crypto technologies understand this. As firms begin to monetise data and build data products, CEOs must appreciate how crucial maintaining trust is.

The promise of big data lies in unearthing real-time insights into consumer needs, economic and financial developments and the opportunities and risks that arise, and in supporting up-to-date, evidence-based policy development. But this comes with new ethical risks relating to consent, legal compliance and property rights.

The good news is that there are tools and technologies that promote ethical data collection, support development and maintenance of trust and avoid legal or reputational losses. There has been an explosion of data anonymisation and automated data compliance tools that promote transparency and ethical data use and reuse within national legal frameworks.
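To give a flavour of what such tooling does under the hood, the Python sketch below shows one simple building block: replacing direct identifiers with keyed hashes before a record is shared. It is an illustration under stated assumptions, not a description of any particular product. The field names, the sample record, the key and the helper functions `pseudonymise` and `strip_direct_identifiers` are hypothetical. The technique is pseudonymisation rather than full anonymisation, since the remaining fields could still allow re-identification.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC).

    The same identifier and key always give the same token, so records
    can still be linked, but the token cannot be reversed without the key.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def strip_direct_identifiers(record: dict, id_fields: set, secret_key: bytes) -> dict:
    """Return a copy of record with the named fields pseudonymised."""
    return {
        field: pseudonymise(str(value), secret_key) if field in id_fields else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    # Hypothetical call record and key, for illustration only.
    key = b"replace-with-a-securely-stored-key"
    call = {"caller": "+27 82 000 0000", "tower": "Cape Town CBD", "minutes": 12}
    print(strip_direct_identifiers(call, {"caller"}, key))
```

In this sketch the caller's number is replaced by a 64-character token while the other fields pass through unchanged. Whether that is enough for a given data set depends on the applicable legal framework and on how identifying the remaining fields are.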

One example is the Statistical Data and Metadata Exchange (SDMX) standard, which is used by Eurostat and the IMF and is being phased in globally for statistical agencies, banks and government entities. Domestic institutions such as Stats SA are also planning to introduce it in coming years. SDMX provides a global standard for the exchange of statistical data and metadata. This ensures that members of the data ecosystem can seamlessly exchange and comprehend data across diverse data sets, while baking best-practice data governance and auditing capabilities into the way databases and data pipelines are set up and managed.

At the end of the 2007 film There Will Be Blood, the oil baron and arch-villain Daniel Plainview crows over how he bested a smaller rival through drainage: sucking oil from a neighbouring oilfield over which he had no legal title. These subterranean machinations, unseen but with very real consequences, are like what is happening in the data industry. The data policy space needs to mature, quickly, to avert entrenching unfair and unethical practices.

• Dr Steenkamp is CEO at Codera Analytics and a research associate with the economics department at Stellenbosch University. Roos is an associate with Codera.
