Economics of data (Part 1): What is data?
Phuah Eng Chye (18 July 2020)
As some put it, personal data will be the new “oil” – a valuable resource of the 21st century. It will emerge as a new asset class touching all aspects of society. At its core, personal data represents a post-industrial opportunity. It has unprecedented complexity, velocity and global reach…Indeed, rethinking the central importance of the individual is fundamental to the transformational nature of this opportunity because that will spur solutions and insights. As personal data increasingly becomes a critical source of innovation and value, business boundaries are being redrawn…Far from certain, however, is how much value will ultimately be created, and who will gain from it. The underlying regulatory, business and technological issues are highly complex, interdependent and ever changing…Fundamental questions about privacy, property, global governance, human rights – essentially around who should benefit from the products and services built upon personal data – are major uncertainties shaping the opportunity. Yet, we can’t just hit the “pause button” and let these issues sort themselves out. Building the legal, cultural, technological and economic infrastructure to enable the development of a balanced personal data ecosystem is vitally important to improving the state of the world. World Economic Forum (January 2011) “Personal data: The emergence of a new asset class”.
Data is increasingly regarded as the most valuable resource in the information society. Different analogies have been used to describe its economic role. Each analogy highlights different features of data, and they can be categorised as follows.
Data as oil[1]
Data as oil is the most popular analogy but possibly also the least consequential. It was popularised by a World Economic Forum/Bain report highlighting the central role of data in the new millennium. There are similarities. Like oil in the industrial economy, data is a basic resource for an information-driven economy. Like oil, it needs to be discovered, extracted and processed. The most apt insight is that data is replacing oil as the fuel of an information economy, particularly in powering AI engines. But the comparison is misleading in other respects. Oil is physical and therefore scarce. Information is intangible[2] (and non-rivalrous) and therefore abundant. Privacy advocates regard the "data as oil" analogy as a Trojan horse argument used to convince policymakers to “shy away from potential privacy legislation out of fear that putting checks on access to data” [3] will lead to a competitive disadvantage.
Data as property or rights
Data as property is perhaps the most powerful analogy because of the commercial and social implications of ownership. Maria Savona notes “data collection, treatment and analytics represent intangible investments and have only more recently started representing the lion’s share of accumulated collective intelligence on which companies rely…The literature on intangibles considers the spending on data analytics, now increasingly Machine Learning (ML) and Artificial Intelligence (AI), as intangible investments”.
While the concept sounds simple, it is difficult to operationalise legally enforceable rights to protect data ownership. Valentina Pavel argues it is problematic to create “a new law that attaches property rights to data” because data “can be copied and transmitted at almost zero cost. Therefore, there are limited barriers that can be exercised with property rights and exclusivity”. For example, “once you sell data, it’s a one-time transaction that can’t be simply reversed. Moreover, once you sell, the company can do whatever it wants with the data…If you transfer your property rights over data to others, there is no real way to assure that the data won’t be abused”.
She points out there is no “easy consensus on what exactly I can own. Is it my bank and credit statement, my smart meter reading, my GPS coordinates? How about my picture with my friends and family? If they are in the picture, do they own it too? What happens with genetic data? It contains information about my family, so if I reveal it, will my family also have property rights over it? What about my future children and grandchildren, too? Data about me is also data about other people. And what happens to the data about me that is generated without my knowledge?”
Valentina Pavel notes “designing a system of data property rights would require a classification and inventory of all possible data types that can be owned, along with their state (e.g., data in transit, data in storage). Questions would include: What data do we assign property rights to? Is it data that is collected, or analysed, or aggregated, or data that is being profiled? Would data in transit be owned as well, or only data that is already stored somewhere? Could the same data have multiple owners?”
It might therefore be easier to express ownership as a form of rights. Maria Savona suggests data can be considered as an “intellectual property or more simply a licensable asset owned by the individual who generates it, as part of a system that recognises and protects intellectual property rights and asks platforms owners to pay a licence for use”. She sees this as a “possibly more inclusive way to tackle the issue of redistributing data value” derived from data generated by workers and consumers. She argues “to the extent that individual data is used (collected, aggregated, and analysed) by the firm to increase its intangible assets, it should be treated as use of an intellectual asset and remunerated through the payment of a licence fee. This would change the nature of the contract between individual and company: rather than being paid a wage within a labour contract, the individual would be paid for the use concession of a licensable asset, her authorship’s right…By considering it as a licensable asset, data generators could choose to be paid a license use fee when data analytics are used for private purposes and feed into profits (e.g. marketing analytics)”.
Maria Savona argues that the advantages of the intellectual property approach are that “it could (i) Reduce the infrastructural burden of administering a digital tax or changing digital ownership; (ii) Ensure dismissed workers do not lose their rights on data ownership once they are out of the labour contract; (iii) Reduce the likelihood that certain workers miss being paid a wage against the use of their data; (iv) Ensure that firms keep paying an IPR to consumers who have completed/exhausted their consumption transactions, but who have provided data that continues to contribute to the intangible assets of the firm; (v) This way we do not necessarily tax innovative firms, but redistribute profits directly”.
But questions arise as to whether raw data can be protected by copyright. Valentina Pavel explains “intellectual property rights may at first seem akin to data ownership, but there is a fundamental difference. For example, copyright law protects the original expression of an idea, not the idea or information itself…In the case of EU database law, the protection applies to the creation of the database, not the data entries themselves…On a more fundamental level, property rights are alienable, which means you can essentially transfer them from one person to the next. Human rights such as the right to privacy and data protection are inalienable. If you transfer them, they lose any meaning. What’s the point of freedom if you renounce it? And what is more: if you sell data but want to keep some basic rights about the uses of that data, you are actually thinking about a rights-based approach, not a property one”.
The Australian Government Productivity Commission Inquiry Report points out that under Australian law, “no one owns data, and this is generally the case overseas too, although copyright and various other laws can ascribe various rights to parties”. The concept of data ownership is “nebulous. If a consumer cannot trade with their data, then it is hardly accurate to contend there is data ownership”. The Commission explained that “thinking about data as personal property creates messy overlaps with copyright law”. Where there are multiple owners, “the difficulty in resolving competing claims could render data unusable in practice”.
The Commission argues that “thinking about individual’s information in the context of consumer rights[4] avoids many of these problems. And, a case can be made that the concept of your data always being your data suggests a more inalienable right than one of ownership. Rights may be balanced against other competing interests, but they cannot be contracted away or sold with no further recourse for the individual in the event of data misuse or emerging new opportunities for beneficial data use”. The Commission believes “data rights give a more enduring and workable outcome for individuals”.
Hal Varian suggests that “instead of focusing on data ownership – a concept appropriate for private goods – we really should think about data access. Data is rarely sold in the same way private goods are sold, rather it is licensed for specific uses”. For example, rather than ask “who should own autonomous vehicle data”, it is better to ask “who should have access to autonomous vehicle data and what can they do with it?” Hal Varian highlights “many parties can simultaneously access autonomous vehicle data. In fact, from the viewpoint of safety it seems very likely that multiple parties should be allowed to access autonomous vehicle data”. “There could easily be several data collection points in a car: the engine, the navigation system, mobile phones in rider’s pockets, and so on. Requiring exclusivity without a good reason for doing so would unnecessarily limit what can be done with the data”.
Hence, the analogy of data as property faces considerable challenges. Information effects[5] such as convergence, bundling and ambiguity also complicate the task of identifying and tracing data ownership. In addition, an increasing amount of data is generated not directly by individuals but by machines. How would the ownership of such data be determined?
Data as labour
If data as property presents the owner perspective, then data as labour presents the worker perspective. Imanol Arrieta Ibarra, Leonard Goff, Diego Jiménez Hernández, Jaron Lanier and E. Glen Weyl explain that “in the digital economy, user data is typically treated as capital created by corporations observing willing individuals. This neglects users’ role in creating data, reducing incentives for users, distributing the gains from the data economy unequally and stoking fears of automation. Instead treating data (at least partially) as labor could help resolve these issues and restore a functioning market for user contributions”. Their arguments are thus rooted in a political economy of data that seeks to recognise the contributions of data workers and to raise their share of economic value.
They argue that when data is viewed as capital, it becomes a by-product of consumption, and the payoffs are channelled to firms “to encourage entrepreneurship and innovation”. When data is viewed as labour, data become “user possessions that should primarily benefit their owners” and the payoffs are channelled “to individual users to encourage increased quality and quantity of data”. Data as labour thus views “data work as a new source of digital dignity” and “sees the need for large-scale institutions to check the ability of data platforms to exploit monopsony power over data providers and ensure a fair and vibrant market for data labor”.
In a similar vein, Jaron Lanier and Glen Weyl pitched the concept of “a true market economy coupled with a diverse, open society online” where “people will be paid for their data and will pay for services that require data from others”. In their framing, “data dignity” “translates the concept of human dignity that was central to defeating the totalitarianisms of the twentieth century to our contemporary context in which our data needs to be protected from new concentrations of power”.
Maria Savona notes that redistributing value within the data as labour framework faces significant implementation challenges. “First, a substantial overhaul would be needed to adequately inform consumers – and, indeed, workers – of the unequal terms of the barter they are currently offered before feeding their data into big tech platforms. Second, and no less important, a general purpose tracking infrastructure does not yet exist and would have to be imposed by public regulation…Also, this approach would need a radical rethinking of the labour markets configuration, including mitigating the potential for counterproductive exploitation and misuse of the incentive to generate unnecessary mass of data within labour contracts”. Hence, data as labour is tied to the “debates around the changing nature of work, particularly on what the intrinsic and extrinsic incentives to work have become in the digital economy. Remunerating data generators could either undermine or enhance the social rewards attached to the sense of belonging to online communities, of being empowered and contributing to social value”.
Maria Savona points out “redistribution of data value would need to start as soon as the individual data are collected in some aggregate form for further use. Traceability is therefore the single, most crucial implementation challenge”. She suggests “arguably, the GDPR and related fundamental data protection regulations are a milestone to build upon to implement traceability. Any record of individual consent to use of their data as a result of GDPR compliance is potentially a traceability starting point that allows implementing all the forms of data value redistribution…This can become a General Data Tracking Regulation that would allow corporate actor to be accountable for any profitable use of individual data”.
Data is non-physical and non-rivalrous
Oil, property and labour are imperfect analogies because they view data from an industrial perspective, as physical equivalents that can be owned or employed. In contrast, data is non-physical and non-rivalrous. Charles I. Jones and Christopher Tonetti point out that “the economics of data raises many important questions”. “Markets for data provide financial incentives that promote broader use, but if selling data increases the rate of creative destruction, firms may hoard data in ways that are socially inefficient”. They also note “the nonrivalry of data may create strong pressures to increase scale” and that “data may serve as a barrier to entry. A natural concern about the limited sales allocations is that as a firm accumulates data, this may make it harder for other firms to enter”.
They argue that while government policies that sharply limit the use of consumer data by firms may generate privacy gains, “it may potentially have an even larger cost because of the inefficiency that arises from a nonrival input not being used at the appropriate scale”. Similarly, outlawing data sales “entirely may be particularly harmful. Instead, our analysis indicates that giving data property rights to consumers can lead to allocations that are close to optimal. Consumers balance their concerns for privacy against the economic gains that come from selling data to all interested parties”. Charles I. Jones and Christopher Tonetti point out that “because data is infinitely usable, there are large social gains to allocations in which the same data is used by multiple firms simultaneously”. They add that institutional differences could matter: “state-owned enterprises could be encouraged to share data with each other. Or, in an industry context with trade, this difference could lead to firms (e.g., in China) having a distinct productivity advantage in data-intensive products”.
Regulating data as a public good
The disruption of industries reflects the changing role of data in business organisation. For example, data collection and use were ancillary in the traditional taxi business. In contrast, data is the core of the ridesharing business model. The massive amounts of data on customers, drivers, cars, routes, fares and traffic conditions are highly valuable. But who should own it? Should the data be exclusively owned by the ridesharing companies given their massive investments? Do customers and drivers have ownership claims because of their role in creating the data? Or should private claims be overridden because the data is derived from the use of public space and can therefore be considered a public good?
Thus, the changing role of data in business triggers regulatory questions on data ownership not covered by traditional industry regulation. New regulations are needed to clarify ownership and use rights, including which data ridesharing firms should be obliged to disclose (to drivers, customers and competitors) or submit to regulators, and the conditions governing its use. In other words, new rules are required to address the tension between private ownership claims and the public good nature of data. Governments need to balance the trade-offs between harnessing self-interest by endorsing private ownership and harnessing community collaboration by regulating data as a public good.
Conclusions
Overall, it is difficult to shape perspectives on data through simple analogies because of its multi-dimensional nature. Data is also a national asset that affects a country’s security, competitiveness, efficiency, innovativeness and growth. Whichever analogy is used, there is little doubt that data is emerging as an important factor of production whose wide-ranging effects are reshaping economies as they become information-based.
References
Australian Government Productivity Commission Inquiry Report (31 March 2017) “Data availability and use inquiry report”. http://www.pc.gov.au/inquiries/completed/data-access/report/data-access.pdf
Charles I. Jones, Christopher Tonetti (September 2019) “Nonrivalry and the economics of data”. NBER. https://www.nber.org/papers/w26260
Hal Varian (July 2018) “Artificial intelligence, economics, and industrial organization”. NBER. http://www.nber.org/papers/w24839.pdf
Imanol Arrieta Ibarra, Leonard Goff, Diego Jiménez Hernández, Jaron Lanier, E. Glen Weyl (27 December 2017) “Should we treat data as labor? Moving beyond free”. American Economic Association Papers & Proceedings. https://ssrn.com/abstract=3093683
Jaron Lanier, Glen Weyl (26 September 2018) “A blueprint for a better digital society”. https://hbr.org/2018/09/a-blueprint-for-a-better-digital-society
Justin Sherman, Samm Sacks (13 June 2019) “The myth of China’s big A.I. advantage”. Slate. https://slate.com/technology/2019/06/data-not-new-oil-kai-fu-lee-china-artificial-intelligence.html
Maria Savona (2019) “The value of data: Towards a framework to redistribute it”. University of Sussex Business School. SPRU working paper. https://www.sussex.ac.uk/webteam/gateway/file.php?name=2019-21-swps-savona.pdf&site=25
Phuah Eng Chye (2015) Policy paradigms for the anorexic and financialised economy: Managing the transition to an information society.
Valentina Pavel (17 July 2019) “Our data future”. Privacy International. https://privacyinternational.org/long-read/3088/our-data-future
World Economic Forum (January 2011) “Personal data: The emergence of a new asset class”. In collaboration with Bain & Company, Inc. http://www3.weforum.org/docs/WEF_ITTC_PersonalDataNewAsset_Report_2011.pdf
[1] The WEF report highlighted the comment by Meglena Kuneva in March 2009: “Personal data is the new oil of the Internet and the new currency of the digital world.”
[2] See Policy paradigms for the anorexic and financialised economy: Managing the transition to an information society.
[3] See Justin Sherman and Samm Sacks.
[4] As is the case in jurisdictions such as the United States, the United Kingdom, the European Union, New Zealand and Canada.
[5] See Policy paradigms for the anorexic and financialised economy: Managing the transition to an information society.