The Leap Blog: February 2019

Friday, February 22, 2019

Data localisation in India: Questioning the means and ends

Data localisation has become a recurring topic in Indian public policy debates. This has been prompted by moves such as the RBI directive in April 2018 mandating local storage of all payments-related data; the proposals in the draft Personal Data Protection Bill, 2018; and localisation proposals in other sectors such as e-commerce and health. Calls for localising data are increasingly tied together with the narrative of "data colonisation", with localisation being seen as an antidote to control of global data sets by large multi-national corporations. At the same time, there are broader concerns about the growing relevance of the digital economy, its diverse socio-economic impacts and the limited ability of states to effectively regulate this space. In the absence of a global compact on issues such as privacy, cyber security, surveillance, and cross border data flows, data localisation is being seen as a tool, although a contested one, to exert national control over the digital ecosystem.

Much of the conversation around localisation has been centered around economic arguments, in terms of its compliance costs, impacts on the industry and overall economic growth. In a recent paper on this subject, we try to broaden this debate by classifying the arguments around localisation into three perspectives -- the civil liberties perspective, with a focus on expression and privacy rights; the government functions perspective, focusing on data access by state agencies for regulatory and law enforcement purposes; and the economic perspective referred to above.

Following an exploration of these different perspectives, we note that the overall costs of across-the-board data localisation norms are likely to outweigh their expected benefits. Yet, there may indeed be circumstances where a narrowly-tailored localisation requirement might be justified. Therefore, rather than implementing far reaching, but poorly thought out, solutions mandating data localisation, the current focus should be on building a transparent process for weighing the trade-offs of data localisation in different contexts. At the same time we must be equally cautious of sweeping "free flow of data" provisions in international trade agreements, which may amount to giving up the ability to adopt specific measures as and when a need is identified.

What is data localisation?

The term data localisation generally refers to requirements for the physical storage of data within a country's national boundaries although it is sometimes used more broadly to include any sort of restrictions on cross border data flows (Chander and Le, 2015). Ferracane (2017) categorises such restrictions into two broad heads -- strict and conditional. The former category includes requirements of local storage or processing of data or, in stricter cases, a complete ban on transferring the data abroad. In case of conditional restrictions, the transfer of the data is made subject to the satisfaction of certain conditions, such as obtaining the consent of the user before transferring the data.

In the paper, we use the term in its commonly understood sense, implying the mandatory requirements of local data storage. This could be in the form of exclusive retention norms, which mandate that the data should be retained only on domestic servers, or the slightly less stringent version of data mirroring that compels at least one copy of the data to be stored locally.

Despite the attention it has garnered in recent times, the data localisation debate is not something new. As per a study conducted by the European Centre for International Political Economy, over 80 different localisation measures were introduced in the 64 countries studied by them in the last 50 odd years (Ferracane, Lee-Makiyama & de Marel, 2018). Links can however be drawn between the surge in data localisation measures in the last ten to fifteen years and the rise of the data-driven economy with accompanying social, economic and political consequences.

While some countries, like Russia, China, Vietnam and Indonesia, have opted for relatively broad based localisation requirements, most others tend to apply differential standards based on the nature of the data and the sector to which it pertains. To take a few examples, sectoral localisation norms are found in Australia (health data), France (data relating to judicial proceedings) and Germany (telecommunications metadata and tax accounting data) (Cory, 2017). It is also common to find localisation requirements for government and public sector data. India has also adopted localisation norms for certain specific types of data, such as public records as well as data held by telecommunications providers. Pursuant to the RBI's directive in 2018, all payments sector data is also required to be stored "only in India". The paper criticises the manner in which this decision was brought into effect: without sufficient articulation of the objectives, with inadequate justification for choosing exclusive localisation as the most appropriate response, and in the absence of any public consultation.

In the sections that follow we outline the implications of localisation measures from the civil liberties, government functions and economic perspectives.

Civil liberties perspective

While localisation may affect a number of rights, including those relating to business, property and association, the primary rights affected are those of privacy and freedom of speech and expression.

Privacy and security of data: The increasing privacy awareness in India, particularly after the Supreme Court's judgment in the Puttaswamy case and the Cambridge Analytica-Facebook incident, is often used as a peg to demand the localisation of personal data. This is also reflected in the Personal Data Protection Bill, 2018, which mandates the creation of local copies of all personal data (subject to certain exceptions). The Bill also requires exclusive domestic processing of certain categories of data, which are to be notified in the future. Given the increasing volumes of user data being generated and captured in the digital ecosystem and the possible harms that may occur from unauthorised uses of personal data, there is little doubt about the need for having appropriate legal and technical frameworks for data protection. It is, however, questionable whether merely locating data within the territory of India would actually make it any safer or less likely to be misused, particularly in the absence of a modern and well-functioning data protection law.

There are three sets of issues to be considered in this context:

Architectural issues: The first set of questions relates to whether localisation would lead to greater centralisation or decentralisation of data, and which of these would be preferable from a security and data protection perspective. Some argue that forced localisation would cause providers to spread their resources over a large number of locations, with reduced security at each level (Cohen et al., 2017). It is also argued that domestic enterprises may lack access to the necessary infrastructure and technical or human capacity to implement strong data security measures (compared to bigger, globally competitive entities based in jurisdictions of their choice) (Chander & Le, 2015). Kuner (2015) however points to the "jackpot problem" -- that hackers often target large global players, precisely because of their size and the quantity of user information they store. In addition, there is also the question of how data localisation requirements will be monitored and enforced and what this may mean from a civil liberties perspective.

Adequacy of current laws: While privacy has been recognised as a fundamental right, the institutional framework for enforcing this right still remains inadequate. Indian law also continues to grant wide powers of interference with privacy rights to the Government. Notably, the Government has broad powers to call for information under the Criminal Procedure Code and other surveillance related laws. In the absence of broader legal or regulatory reform, it is therefore questionable whether localisation will actually enhance privacy and security of personal data of Indians. It is worth noting in this respect that instead of insisting on mandatory localisation, alternative and less intrusive measures could also be considered to ensure the safety of data irrespective of its location. The European GDPR, for instance, utilises measures such as binding contractual rules and adequacy decisions to ensure that data is protected irrespective of its location.

Given the requirements for an interference with the fundamental right to privacy, as articulated by the Supreme Court in the Puttaswamy case, the onus would be on the state to demonstrate the proportionality and necessity of any localisation measures. This would involve demonstrating that no alternative, less intrusive means are available to reach the same end, which will be hard to justify in case of sweeping localisation measures.

Domestic and foreign surveillance: In the absence of adequate checks and balances in the law, localisation can enable more intrusive information gathering by local intelligence and law enforcement agencies (LEAs). While acknowledging this concern, it is sometimes argued that localisation would also limit the ability of foreign intelligence agencies to spy on Indian data. However, the sufficiency of this reason as a ground for localisation can be contested on three fronts. First, as noted above, localisation would make it easier for local agencies to carry out surveillance, both through legal as well as extra-legal means. Increased surveillance by domestic agencies would constitute a greater immediate threat to citizens compared to surveillance by foreign agencies. Second, legal developments such as the passage of the CLOUD Act in the United States (US) authorise US agencies to access data stored abroad by US companies. Finally, given what we know about the pervasive and sophisticated nature of intelligence tactics used by several agencies, localisation may not actually stop them from accessing local data. To fully safeguard domestic data against any such interference will require a level of isolation from the Internet, which is not desirable or even possible in a modern democratic setup.

It therefore appears that localisation may increase domestic surveillance while the benefits with respect to foreign surveillance remain unclear. However, in this context one also has to keep in mind that while a citizen may have some protections against surveillance conducted domestically, this would be much harder in case of surveillance by foreign actors.

Ultimately, the degree of protection afforded to data depends on the effectiveness of the data protection regime and the technical measures being implemented. India is currently lacking on both parameters. Without such frameworks in place, using privacy or security of data or the possibility of a data breach as an explanation to mandate localisation appears far-fetched or, at best, premature. In general, the interests of Indian users would be better served by making sure that the relevant data is adequately protected, irrespective of its location, by putting in place a comprehensive law covering issues of data retention, access by regulators, courts and LEAs and safer mechanisms for cross border data transfers.

Freedom of speech and expression: As far as the effects of localisation on expression rights are concerned, one must keep in mind that an essential characteristic of the Internet is the ability to send and receive information freely across borders. This global access enables the Internet's generativity -- the capacity to enable unforeseen innovation -- which could be harmed by broad localisation norms. While merely locating data in a country does not in itself make it vulnerable to censorship (or surveillance), data would certainly be more vulnerable if the country the data was located in had laws that gave the state greater powers of restricting access to content, or if it lacked the capacity or will to ensure proper oversight of its executive agencies.

The Indian state has been increasingly resorting to broad based censorship measures in the digital space. Examples of this include the proposal of requiring Internet intermediaries to undertake proactive monitoring of content on their platforms (Bailey, Parsheera and Rahman, 2019); and the increasing number and duration of Internet shutdowns. These instances indicate that localisation may provide yet another tool for the state to carry out censorship more easily and effectively.

Localisation could also mean that smaller entities or those that do not consider India to be a significant enough market to justify the financial and transactional costs of localisation could pull out their services. This is known to have happened in the European context post the enactment of the GDPR, which resulted in some online multiplayer games and foreign news websites becoming inaccessible to European users. It is also worth remembering that censorship of localised content could make it inaccessible all over the world (not just domestically).

One of the arguments put forth by the Justice Srikrishna Committee in support of localisation is that it would reduce the vulnerabilities that India may face in case of any breach in undersea cables and resulting disconnection from the Internet. We believe that the benefits to speech rights, which may result in that (low probability) circumstance, are offset by the real threat of increased censorship and denial of access to media and services on an ongoing basis.

Government functions perspective

It is the duty of a state to ensure that individuals are adequately protected and have an effective remedy for breach of their rights. This requires state agencies to have appropriate tools for the investigation and take down of illegal content, in accordance with the procedure established by law. Equally, regulatory entities also have genuine requirements to access data in connection with the discharge of their functions. It has however been noted that jurisdictional and other barriers often make it difficult for domestic agencies to gain legitimate access to the required data. The absence of broader international agreements on cross-border data sharing and complexity or delays in the processes under mutual legal assistance treaties (MLATs) further complicate this problem.

On the face of it, it therefore appears that localisation would aid law enforcement and other domestic institutions to implement local laws more effectively. It would also not be a stretch to argue that companies are far more likely to respond to requests from local authorities in circumstances where these agencies are in a position to take punitive action against physical infrastructure or personnel. However, a closer examination may lead one to question whether localisation will indeed help enforce laws or secure regulatory access on account of the factors listed below.

Location not the only determinant: Location is not the only determinant of lawful access by LEAs and regulatory agencies. A significant share of data flows is encrypted. In fact, regulatory entities such as the RBI themselves mandate encryption of certain forms of data. This implies that even if the data is stored locally, authorities have to go through the process of making lawful decryption requests before the data is accessible to them in a usable form. The Apple-FBI situation, where the company refused to decrypt data for the FBI, illustrates the kind of barriers that may be faced in this process.

Need for proportionate measures: Localisation may not always be the proportionate or least intrusive measure to ensure regulatory access or compliance with local laws. For instance, the entities captured by the RBI directive for the payment sector already have various reporting and access requirements by virtue of being licensed service providers. These could be made more stringent without a need to localise the data. Similarly, tax laws are also evolving to account for cross-jurisdictional activities -- for example, through the "equalisation" levy adopted in 2016 and the more recent development on taxation based on "significant economic presence" irrespective of having a place of business in India.

Legitimacy of requests: The apparent unwillingness on the part of global intermediaries (such as Google and Facebook) to comply with government requests for information does not necessarily imply a recalcitrance on the part of these businesses to comply. It may for instance indicate vague or improper requests being made by the Government and its agencies. That said, anecdotal evidence, whether in the form of the Snowden revelations or otherwise, does reflect a significant mismatch in the information sharing by large Internet intermediaries with Governments in their home countries compared to countries such as India.

To the extent that legitimate Government access remains a problem, the existence of less intrusive measures that could achieve the same ends needs to be explored. For instance, the Telecom Regulatory Authority of India had noted in its cloud computing recommendations that India should try and sign more MLATs and more holistic ones, which could also include mechanisms for electronic processing of requests. The private sector could also be encouraged to adopt electronic reporting mechanisms for government requests. For instance, reports suggest that Apple is already working on such a platform. At the same time, more work needs to be done on identifying the specific problems being faced by law enforcement or other agencies in accessing any specific types of data; the responsible stakeholders and targeted interventions that may be adopted. Where necessary, such measures may include limited localisation. This may, for instance, mean requiring specific categories of providers to keep a copy of the data within the country, if it can be clearly demonstrated that immediate and on-demand access to specific types of data is necessary for the discharge of specific state functions and the same cannot be achieved through other less intrusive means.

Economic perspective

The third set of arguments one sees in the context of localisation relate to issues of the Internet economy and costs of localisation measures. We consider three issues in this context: first, the macro and micro economic costs of localisation; second, the effects on the local economy in terms of inviting reciprocal measures, boosting competition in the sector or aiding local manufacturing or AI industries; and finally, we examine how localisation related measures are increasingly becoming a part of the international trade discourse.

Costs and impacts on the economy: One of the main arguments against mandatory localisation stems from the cost that it is likely to impose on businesses and consequently, consumers and the economy as a whole. Widespread localisation norms will mean that businesses and other users -- both domestic and foreign -- will no longer have the flexibility to choose the most cost-effective or task-specific location to store their data. In addition to reducing the benefits made possible through economies of scale, companies will also need to duplicate infrastructure in multiple jurisdictions. The global nature of the Internet has also enabled numerous cross-jurisdictional services, platforms and functions - ranging from high-end cloud based services to detection of fraud in credit card systems using cross-jurisdictional data. These costs / efficiency losses will ultimately be passed onto consumers in the form of higher costs of service or reduced functionality.

While several sources refer to the cost implications of localisation, there are only a handful of studies that actually attempt to quantify the potential economic gains or losses. The most oft quoted (though not uncontested) study on this subject has been released by the European Centre for International Political Economy. This predicts a reduction of the Indian GDP by almost a percentage point should broad localisation measures be introduced (Bauer et al, 2014). The authors note that any gains stemming from data localisation are too small to outweigh losses in terms of welfare and output in the general economy. Another study points to how forced data localisation laws would require companies to pay 30-60 percent more for their computing needs (Leviathan, 2015). Two newer studies however demonstrate that while restrictions on cross border data flows inhibit trade and services, policies targeting the uses of data, which include measures ranging from data retention requirements to government access and data breach notification norms, have a much larger negative impact on productivity (Ferracane and van der Marel, 2018; Ferracane, van der Marel & Kren, 2018).

Looking at the digital ecosystem in India, it appears that the costs imposed by broad localisation measures would be non-trivial, given the underdeveloped state of India's data center infrastructure. A part of this is due to the costs involved in building large data centres, the absence of proper downstream infrastructure such as uninterrupted power supply as well as weather conditions in India which necessitate greater expenditure on cooling. Notably, a Gartner study in 2015 found that India held just about 1.2 percent of the world's data center infrastructure and 5.23 percent in the Asia-Pacific region (IAMAI, 2016). Taking into account factors like energy cost, international bandwidth, ease of doing business and taxation provisions, the Cushman and Wakefield (2016) Data Center Risk Index placed India at thirty-sixth position, with a score of 47.84 (out of a highest score of 100). Essentially, present conditions make it uneconomical and inefficient to host large quantities of data in India. The now abandoned draft report of the e-Commerce Task Force (2018) also acknowledged this fact. It highlighted the need for capacity development in terms of infrastructure for data centres, improvements in power supply and tax benefits before mandating full data localisation.

Effects on the domestic industry: While it is often claimed that localisation could provide a boost to local manufacturing and employment, this is contestable on the grounds that most equipment in data centres is imported and in any event, not much employment is generated by data centers. For instance, a $1 billion data center built by Apple in North Carolina, United States in 2011, created only 50 full-time jobs and another 250 support jobs in areas such as security and maintenance (Cory, 2017).

It is also unclear how localisation measures could act to aid competition (or reduce the reach of large dominant players) in the digital economy. Bigger companies are generally better placed to respond to and meet regulatory requirements. It is therefore possible that while the bigger companies can easily afford to set up data centers in India, smaller firms (whether Indian or foreign) may face relatively higher entry barriers on account of increased costs, thereby hampering competition in some sectors.

Another powerful narrative that has emerged in recent times is about the need for domestic mechanisms for the creation, sharing and use of data for the development of artificial intelligence (AI). To quote from the Justice Srikrishna Committee's report, "The growth of AI is heavily dependent on harnessing data, which underscores the relevance of policies that would ensure the processing of data within the country using local infrastructure built for that purpose". In this respect, it is not very clear how merely locating data in India will make it accessible for beneficial research, in the public or private domains.

One must also be aware of the possibility of retaliatory measures and the effects this could have on the vital information technology (IT) and related sectors. The IT sector contributed about 7.9 percent of India's GDP in the year 2017-18 (MeITY, 2018) and a sizeable part of export of India's IT services sector comes from the outsourcing / business process management industry (IBEF, 2018). Therefore, India's role in furthering a global push towards increased data localisation needs to be considered carefully, taking into account the likely consequences of reciprocal localisation measures by other countries.

Free trade agreements (FTAs): The growing importance of global e-commerce has placed data localisation debates at the heart of many international trade discussions. The US, in particular, has been at the forefront of pushing for the removal of various kinds of restraints on cross-border trade carried out through electronic means. Despite attempts by the US and other countries like Canada and Japan, the e-commerce conversation at the World Trade Organization (WTO) level is limited to discussions and has not achieved a rule-making mandate (Macleod, 2015). Many developing countries, including India, have resisted a broadening of this mandate. Many researchers also oppose this sort of "mission creep" at the WTO on the ground that it would require developing countries to sign away their right to strategically regulate the digital market and data flows (Gurumurthy and Chami, 2017).

While the global e-commerce discussions under the WTO have not managed to progress, provisions relating to cross-border trade and localisation of data have found their way into other multilateral arrangements. Prominent among these are the recently signed Comprehensive and Progressive Agreement for Trans-Pacific Partnership and the recently signed US-Mexico-Canada Free Trade Agreement. These arrangements contain fairly strong measures to support the free flows of data across borders.

The position adopted by those who seek to include data flow related issues in trade agreements appears to be based on the notion that personal data must be treated as any other commodity. Accordingly, free flows of data must be the de facto position unless justified by overwhelming public policy concerns. What constitutes a legitimate public policy concern would be adjudicated at the international level, under the WTO framework (Hill, 2017). However, this approach has been challenged on three grounds. The first is a rights-based argument that sees personal data as being essential to a person's autonomy and identity, and therefore more than a tradeable commodity. Second, is the fact that commercial exploitation and trade in commodities of various kinds are in any case subject to various kinds of regulation or taxes. Third, is the concern that the use of WTO mechanisms for handling data flows would reduce democratic control over data (Hill, 2017).

Ultimately, irrespective of whether one considers trade negotiations to be an appropriate location to discuss cross-border data flows, a global resolution of the issue appears unlikely unless the rights based, economic and strategic concerns of developing nations are duly accounted for. The concern, however, remains that despite the absence of an agreement at the WTO level, widely worded data flow restrictions have already found their way into a number of bilateral and multilateral trade agreements.

Conclusions

The paper examines the key arguments that are generally used to make a case for data localisation under three heads. First, there is the claim that local hosting of data will enhance its privacy and security by ensuring that an adequate level of protection is given to the data. Second, it is argued that lack of government access to data (due to it being stored in another jurisdiction) impedes the law enforcement and regulatory functions of the state, which can be addressed through localisation. Third, there is the narrative on the economic benefits that will accrue to the domestic industry in terms of creating local data infrastructure, employment, and contributions to the AI ecosystem.

Following an assessment of each of these perspectives we find that the costs of introducing broad and sweeping data localisation norms are likely to outweigh their benefits, from a rights-based perspective as well as an economic one. India's approach to this question must also be informed by strategic thinking on whether a closed data economy or an open one would be more conducive to meeting its long-term social and economic goals.

However, this is not to suggest that data localisation can never qualify as a justified measure. There may indeed be circumstances where local storage (and even processing) of the data can be justified, particularly on certain normative grounds. In order to identify such instances and arrive at a narrowly tailored response, the policymaking process should ensure that any measures are adopted only pursuant to a well-defined and transparent evaluation process. The steps in this process would include (i) articulation of the specific problem(s) that are sought to be addressed; (ii) identification of the range of measures that could be used to combat the problem and assessment of the expected costs and benefits of each intervention; and (iii) evaluation of whether localisation is the least restrictive means to address the problem, with a graded approach of considering the least intrusive form of localisation before proceeding to stricter requirements. Importantly, this entire process should be carried out in an open and transparent manner allowing stakeholders the opportunity to question and strengthen the analysis.

References

Bauer et al, 2014: Matthias Bauer, Hosuk Lee-Makiyama, Erik van der Marel and Bert Verschelde, The costs of data localisation: Friendly Fire on Economic Recovery, ECIPE Occasional Paper, No. 3/2014.

Cory, 2017: Nigel Cory, Cross Border Data Flows: Where Are the Barriers and What Do They Cost?, Information Technology and Innovation Foundation, May 2017.

Chander & Le, 2015: A Chander and UP Le, Data nationalism, Emory Law Journal, 64(3).

Ferracane, 2017: MF Ferracane, Restrictions on cross-border data flows: a taxonomy, European Centre for International Political Economy.

Ferracane, Lee-Makiyama & de Marel, 2018: MF Ferracane, H Lee-Makiyama and E van der Marel, Digital trade restrictiveness index, European Centre for International Political Economy.

Goldsmith and Wu, 2006: J Goldsmith and Tim Wu, Who controls the internet: Illusions of a borderless world, Oxford University Press.

Bauer et al, 2016: M Bauer, MF Ferracane, E van der Marel & B Verschelde, Tracing the economic impact of regulations on the free flow of data and data localisation, Centre for International Governance Innovation and Chatham House.

Gurumurthy and Chami, 2017: A Gurumurthy & AVN Chami, The grand myth of cross border data flows in trade deals, IT for Change.

Leviathan, 2015: Leviathan, Quantifying the costs of forced localisation, Leviathan Security Group.

IAMAI, 2016: Internet and Mobile Association of India, Make in India: Conducive policy and regulatory environment to incentivise data center infrastructure.

Ferracane & van der Marel, 2018: MF Ferracane & E van der Marel, Do data policy restrictions inhibit trade in services?, European Centre for International Political Economy.

Ferracane, van der Marel & Kren, 2018: MF Ferracane, E van der Marel & J Kren, Do data policy restrictions impact the productivity performance of firms and industries?, European Centre for International Political Economy.

IBEF, 2018: India Brand Equity Foundation, IT & ITeS Industry in India.

Macleod, 2015: J Macleod, E-commerce and the WTO: A developmental agenda, GEG Africa.

Hill, 2017: R Hill, Second contribution to the June-September 2017 Open Consultation of the ITU CWG-internet, why should data flow freely?, Association for Proper Internet Governance.

MeITY, 2018: Software and services sector, Ministry of Electronics and Information Technology.

Srikrishna Committee, 2018: Report of the Committee of Experts under the Chairmanship of Justice BN Srikrishna, A free and fair digital economy: Protecting privacy, empowering Indians.

Ecommerce task force, 2018: Electronic commerce in India: Draft national policy framework (non-official version), Medianama.

Cushman and Wakefield, 2016: Data center risk index, Cushman and Wakefield.

The authors are technology policy researchers at the National Institute of Public Finance & Policy. They thank Ajay Shah for valuable discussions.

Announcements

About DAKSH

DAKSH is a Bengaluru based civil society organisation that is working on judicial reforms. DAKSH works at the intersection of data science, public policy and operations research. Our primary focus is the study of the problem of pendency of cases in the Indian legal system, with the aim of suggesting sustainable solutions based on quantitative research and empirical legal methods. We are actively involved in creating sustainable solutions to improve judicial efficiency, process, administration and management.

Position

Technology Lead at Bangalore

Responsibilities

Leading DAKSH's technology initiatives.
Data collation from multiple sources, data parsing and channelling.
Making data available to the public including database management, UX, dashboards.
Customising, maintaining and enhancing DAKSH's CourtLog tool based on project needs, bug fixing, enhancing UX.
Providing analytic and strategic support to other programmes at DAKSH.
Engaging, coordinating and monitoring the work of external vendors.

Experience

At least four years of experience in technology roles.
Good interpersonal skills – ability to work collaboratively.
Deep passion to make an impact in the field of judicial reforms and familiarity with key technology-led interventions that are making a positive impact in this field.

Skills Required:

JavaScript (very strong)
Elastic Search (very strong)
PostgreSQL (strong)
AngularJS (moderate)

Preferred:

NativeScript (strong)

Contact us

Interested candidates may please send their resumes to surya@dakshindia.org

Tuesday, February 19, 2019

Disclosures in privacy policies: Does 'notice and consent' work?

by Rishab Bailey, Smriti Parsheera, Faiza Rahman and Renuka Sane.

In a recent paper, Disclosures in privacy policies: Does notice and consent work?, we evaluate the quality of the privacy policies of five popular online services in India -- Google, Flipkart, Paytm, WhatsApp and Uber. Our goal is to ask whether the present notice and consent regime is broken because of the way in which privacy policies are designed.

We analyse the identified privacy policies from the perspective of access -- how easy are they to find, how easy are they to read -- and on issues of substantive content -- how well do they conform to well recognised principles of a model data protection law? In doing so, we evaluate whether the policies have specific, unambiguous and clear provisions that lend themselves to easy comprehension. It is pertinent to highlight that the versions of the privacy policies accessed for this study were dated March 2018, i.e. before the European General Data Protection Regulation (GDPR) came into force.

We try to evaluate how much users typically understand of what they are signing up for, and whether this can inform us on whether consent is an effective tool to enable individual control over personal data in the online environment. We conduct surveys in five universities in and around New Delhi, and randomly assign one of the five privacy policies to students in the classroom along with a questionnaire. The questions are classified into three categories -- 'easy', 'intermediate', and 'difficult'. The easy questions have a simple and direct answer in the policy. The intermediate questions require a closer reading of the policy, making it slightly harder to figure out the correct response. The difficult questions require careful reading and some inference. We evaluate the level of understanding of the policies based on the number of questions answered correctly.

Setting the context: Why is this question important?

The 'notice and consent' framework has been the basis for much of the thinking in modern data protection and privacy laws. It relies on the ability of providers to collect and process personal data conditional on providing adequate information to, and obtaining the consent of, the data subject. Its intuitive appeal lies in the normative value of individual autonomy that is the cornerstone of modern liberal democracies. Seeking consent is said to ensure an individual's autonomy and control over her personal information, enabling 'privacy self-management' (Solove, 2013).

There is, however, a growing concern around the inability of this model to provide individuals with meaningful control over their data in light of evolving technologies and data practices (Mathan, 2017). Concerns in this regard arise due to numerous reasons, including the fact that most people do not read the policies, do not opt out or change the default privacy settings (CPRC, 2018; ISOC, 2012), are not able to understand the policies, face consent fatigue and are therefore unable to make rational choices about the costs and benefits of consenting to the collection, use, and disclosure of their personal data (McDonald, 2008; Solove, 2013). Many privacy harms flow from an aggregation of pieces of data over a period of time through interconnected databases of different entities, or from the use of complex machine learning algorithms to make automated decisions. It is, therefore, unrealistic to expect people to assess the impact of permitting the downstream use and transfer of their data (Solove, 2013). Moreover, privacy policies are often binary in nature where people either have to fully opt-in or completely opt-out of using the services (Cate, 2013).

A large body of literature has therefore evolved that demonstrates that consent is broken, and yet accepts the necessity of finding ways to make the notice and consent regime work better. These points about the evolving nature of consent have also been acknowledged in policy and legal debates. For example, in Europe, the recently enforced GDPR has continued (and in fact attempted to strengthen) the consent model implemented under the Data Protection Directive of 1995, while also setting out several duties of data controllers. In August 2017, the Supreme Court of India recognised the fundamental right to privacy (Puttaswamy, 2017). Around the same time the Government of India constituted a committee under the chairpersonship of Justice B. N. Srikrishna (Srikrishna Committee) to draft a data protection law. The Srikrishna Committee's report and the draft Personal Data Protection Bill, 2018 submitted by it to the Government have affirmed the central role of an effective notice and consent regime, making consent one of the grounds for processing of data.

As per the Srikrishna Committee's recommendations, for consent to be valid, it should be 'free, informed, specific, clear and capable of being withdrawn'. In case of 'sensitive personal data', the draft Personal Data Protection Bill, 2018 proposes a higher standard of 'explicit consent' with additional requirements on what would amount to informed, clear and specific consent with respect to such data. Given the critical role of consent in the draft law, it becomes important to question whether, and how, consent based frameworks can be made to work better.

Results: Accessibility

We first analyse accessibility of the selected privacy policies (Google, Flipkart, Paytm, WhatsApp and Uber) based on a series of measures including how embedded a policy is within a particular website, the length of the privacy policies, and the languages they are made available in. We find that the policies can generally be accessed through 1-3 clicks (from the main web page). However, the links to the privacy policies are usually positioned at the bottom of the main web page, and in relatively small font size. This does not lend itself to easy discoverability, particularly as links to the privacy policies are usually not highlighted.

As far as the length of the policies is concerned, the privacy policies of the Indian companies we studied are significantly shorter than those of the multinational companies ('MNCs') we studied. This is largely because the MNCs' policies touch upon a greater number of issues and explain rights and obligations in more detail. Some of this may be because the MNCs' policies follow obligations under data protection laws of foreign countries that contain more onerous requirements than India's Information Technology Act, 2000 (and the rules under it).

Interestingly, Google is the only company amongst those studied that provides a copy of its privacy policy in languages other than English. Despite some of the other websites being made available in Indian languages (for instance, Uber's website can be accessed in Hindi), their privacy policies continue to be accessible only in English. This is clearly a problem in a country where English speakers make up only roughly 10-15 percent of the population.

Results: Readability

While measuring readability is not an exact science, tools such as the Flesch-Kincaid reading ease and grade level tests have been used for decades to analyse metrics such as word and sentence length and their impact on readability. It should be noted that these tests do not actually analyse the meaning of the words used, whether they could have multiple or ambiguous meanings, whether they are commonly used, etc. It is therefore possible for an incomprehensible text made up of short sentences and short (but rarely used, complex or ambiguous) words to have a high readability score. Having said this, the scores do provide a useful comparative metric to evaluate the readability of the privacy policies.

Applying the Flesch-Kincaid test to each of the privacy policies under study, we find that the policies are rated as either 'very difficult' (Uber, Google, Paytm) or 'difficult' (Flipkart, WhatsApp). The reading ease score of the policies ranged from 16.44 (Uber) to 41.03 (Flipkart) -- a higher score indicates better readability. To put these scores in context, Reader's Digest has a readability score of about 65; the Harry Potter books are in the 80s; and the Harvard Law Review is in the low 30s (Lively, 2015; Flesch, 1979).
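To make the mechanics of the score concrete, here is a minimal sketch of the standard Flesch reading ease formula, 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word), together with Flesch's usual difficulty bands. It is an illustration only and not the tool used in the paper: the function names are hypothetical, and the syllable counter is a crude vowel-group heuristic we assume for simplicity, so scores from this sketch will differ somewhat from those of established readability tools.

```python
import re

# Flesch's conventional bands: (lower bound of the score, label).
BANDS = [
    (90.0, "very easy"),
    (80.0, "easy"),
    (70.0, "fairly easy"),
    (60.0, "standard"),
    (50.0, "fairly difficult"),
    (30.0, "difficult"),
]


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch reading ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def difficulty_band(score: float) -> str:
    """Map a reading ease score to Flesch's difficulty label."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "very difficult"


# The lowest and highest scores reported above fall into different bands.
print(difficulty_band(16.44))  # Uber's score
print(difficulty_band(41.03))  # Flipkart's score
```

Note that the formula only sees sentence length and syllable counts, which is exactly why a text built from short but obscure words can still score well.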

The results therefore indicate that all the privacy policies under study are complicated documents and require a firm grasp of English and reasonably advanced comprehension abilities to be understood. Given that the target audience for many of these online services ranges from adolescents upwards, it appears that the privacy policies will prima facie be too complicated for many users to comprehend.

Results: Visual presentation

Another way in which reading a privacy policy can be made easier, both in terms of readability and comprehension, is through the use of highlights, marginal notes and by properly segregating and identifying overarching topics. We find some evidence of this in the studied policies.

Uber's privacy policy is divided into multiple sections with each sub-heading in bold font. The policy also contains marginal notes that summarise each section, thereby making the policy easier to understand at a glance. Notably, Uber also provides an easy-to-read summary of their privacy policies in a separate "overview" page. Google's privacy policy also contains segregated sections, and a table of contents which permits easy access to different portions of the policy. Interestingly, the policy also frequently uses layered information or pop-ups where additional information is presented pertaining to certain terms and activities when a user moves the cursor over highlighted words. While WhatsApp also provides segregated sections, it does not generally provide additional information in a layered manner or highlight particularly important information (though, certain highlighted terms do allow click-throughs, for instance "Facebook family of companies" and "cookies").

The two Indian companies -- Flipkart and Paytm -- do not provide layered information or any further click-throughs in their privacy policies. Flipkart demarcates sections using a bold font (in the same font size as the rest of the document), while Paytm utilises a larger font size, in bold, for section headings. The effects of some of these presentation strategies, like click-throughs and pop-ups, are however not reflected in our survey results as the survey was conducted using printed copies of the privacy policies.

Results: Terminology

We focus next on the kind of terminology used in the privacy policies. Our focus here remains on the text of the policies, without getting into the manner in which the policies may be implemented in actual practice. We note that the use of legal and technical terminology in a privacy policy can lead to a decrease in comprehensibility for the user. Unless a word is specifically defined, a user may not be aware of its true import, particularly if it is technical in nature.
For instance, WhatsApp's privacy policy says:

we do not retain your messages in the ordinary course of providing our Services to you

The policy does not define what the phrase "ordinary course" implies or explain what the exceptions are. A user may, on a thorough reading of the policy, come to understand that an exception may apply to situations where, for instance, law enforcement is involved. However, there is no clarity on this. Similarly, the use of words and phrases such as "third party", "affiliate", "profiling", etc., may also lead to confusion in the minds of users given the absence of any specific definitions.

Connected to the problem of lack of adequate information within a privacy policy, is the issue of whether the information being provided is trustworthy and reliable. While it is outside the scope of the present paper to examine the issue of trust in online services, it must be kept in mind that online businesses frequently appear to treat user privacy rights with less than due respect (not least due to the lack of bargaining power and information asymmetry between the parties).

Results: Substantive analysis

For the substantive analysis of the policies, we analyse the policies based on how they conform to well recognised principles of a model data protection law -- i.e. whether they detail the methods and manner of collection of data; the permitted uses of data; information sharing practices with a third party, including with affiliated entities, and law enforcement; whether users are informed of data breaches; whether users are given rights pertaining to access, deletion and export of data; and whether users can seek clarifications or information about the uses of their data or the privacy policy itself. We also evaluate whether these policies have specific, unambiguous and clear provisions that lend themselves to easy comprehension.

The table below provides a snapshot of whether policies have specific provisions on the ten issues identified as a basis for analysis of policies (Y indicating that the issue is addressed in the policy, N indicating it is not, and NS indicating the issue is not specified.)

Our analysis indicates that many parts of the policies are poorly drafted, often containing language that seems intended to insulate the company from liability rather than genuinely informing the user. In several cases, the policies do not include rights that would be considered essential in a modern privacy framework (for instance clauses covering data breach notification, or data retention periods). Sometimes, the policies also seem to assume that the user has knowledge of legal terms and is up-to-date with statutory and other regulatory requirements in their jurisdiction (for instance, the policies studied frequently use terms such as 'to the extent permitted by law', 'as permitted by law', etc.).

Overall, we find that privacy policies are fairly widely drafted to permit service providers broad powers to collect and process information in pursuance of their business interests. Users currently have little to no leeway in amending the contracts entered into by them and must usually sign up for the entire contract if they wish to access the service (though certain services such as Google and WhatsApp do include some granularity in their privacy policies).

Results: Survey

Survey respondents do not obtain very high scores on the privacy policy quiz. The average score of the sample (155 students) is about 5.3 out of 10, i.e. on average respondents were able to correctly answer about 5 of the 10 questions. The policy-wise scores varied between 4.6 (WhatsApp) and 5.9 (Uber).

Respondents fared the worst on policies that had the most unspecified terms, and on policies that were long. They also seemed unable to understand terms such as 'third-party', 'affiliate' and 'business-partner', that are often used in the context of data sharing arrangements.

Not surprisingly, we find that a greater percentage of respondents got the easier questions (as classified by us) correct. For example, almost 76% of the respondents got the correct answer to Q1 on collection of data; about 68% got the correct answer to Q5 on data sharing with the Government, as this information was explicitly provided in most of the policies. The more difficult questions, classified based on factors such as the use of complex legal terms or ambiguity about specific provisions, saw poorer results.

We believe that the complexity of the language and inadequacy of specific details in the policies are reflected in the low understanding of respondents. What is interesting about the responses to the survey is that when provisions are clearly drafted, or when users can be expected to find the answers in the policy, they are more likely to evaluate the questions correctly. However, when terms whose meaning is not precisely defined are used (such as 'third-party' and 'affiliate', for example), respondents make more mistakes. This suggests that in an environment where respondents actually do read the policy, and when the policy is unambiguously drafted, respondents are able to make better sense of what is being offered to them. Better design and drafting of privacy policies is therefore a prerequisite for notice and consent to work better.

Conclusion

While surveys of a similar nature have been conducted in other jurisdictions, we are not aware of any similar study (to understand how users interact with privacy policies) involving Indian participants. The peculiarities of the Indian context throw up new challenges of diversity in language, literacy, modes of Internet access and other variations among the over 500 million Internet users in India. All of these factors will play a role in determining the appropriate design of disclosures and consent frameworks for Indian users.

Our study makes a modest start in that direction by asking how well educated, English-speaking users fare in terms of understanding privacy policies. Making the same privacy policies accessible to the larger set of Indian users, many of whom are first time adopters of technology, is undoubtedly going to be a much larger challenge. The study therefore raises further questions on what drives understanding of privacy policies -- whether factors such as age, education, intelligence quotient, comfort with English, urbanisation and familiarity with Internet-based services all play a role in how an individual evaluates what is on offer. It also raises questions on how privacy policies should be designed so that users are able to understand them better.

Ultimately, the goal of privacy policies should be to make it possible for individuals to evaluate trade-offs between privacy and service, and make choices that suit their preferences, which might themselves change over time. Finding ways to make the notice and consent framework more meaningful is an essential part of this process.

References

Solove, 2013: Daniel Solove, Privacy self-management and the consent dilemma, 126 Harvard Law Review 1880 (2013).

Mathan, 2017: Rahul Mathan, Beyond consent: A new paradigm for data protection, Takshashila Discussion Document 2017-03, 2017.

CPRC, 2018: Consumer Policy Research Centre, Australian consumers soft targets in big data economy, 2018.

Flesch, 1979: Rudolf Flesch, How to write plain English: A book for lawyers and consumers, 1979.

ISOC, 2012: Internet Society, Global Internet user survey, 2012.

McDonald, 2008: A McDonald and LF Cranor, The cost of reading privacy policies, I/S: A journal of law and policy for the information society, 4(3), 543-568, 2008.

Puttaswamy, 2017: Justice K.S. Puttaswamy v. Union of India, WP (Civil) No. 494 of 2012, Supreme Court of India.

Cate, 2013: F Cate and V Mayer-Schonberger, Notice and consent in a world of big data, International Data Privacy Law, 3, No. 2, 67-73, 2013.

Lively, 2015: Gerald Lively, Readability, Book Notes Plus, April, 2015.

The authors are researchers at National Institute of Public Finance and Policy. They would like to thank Omidyar Network for supporting this research.

Friday, February 15, 2019

Announcements

NASSCOM has openings for two senior technology policy positions.

Job Description

Lead policy initiatives focused on industry verticals - IT services, Business Process Management, Data Processing, Communication, FinTech, Banking, E-Commerce, Health and Mobility.
Work on technology policy issues arising out of concerns around data protection, privacy, reskilling, liability, taxation, competition, net neutrality, ethics etc.
Represent NASSCOM with government, regulators and members at a senior level. Assist in developing appropriate policies.
Mentor and lead team members. Develop policy submissions and research papers.
Work with members, think tanks, research firms, consulting and law firms, multi-stakeholder to build robust policy views.
Roughly 50-60% of the work will be based on individual contribution and rest based on co-ordination and management.

Knowledge, Skills, Qualifications and Experience

Essential
- Minimum of ten years of relevant work experience.
- Ability to question status quo, think out of the box and an open mind to contrary view points.
- Experience of working with the Government and Regulators at a senior level.
- Excellent writing and speaking skills in English.
- Published relevant research papers, blogs and opinion articles in reputed publications.
- Demonstrated ability to work in multidisciplinary teams.
- Self-motivated.
- Ethical mindset.

Desirable
- Knowledge of public economics and law.
- Working experience with reputed (a) policy think tanks, (b) Member of Parliament, (c) law firms and (d) media publications.
- Masters of laws, post graduate qualifications in public policy or economics.

Position is Noida based. Email hr[at]nasscom[dot]in with position - policy in subject and a max 400 words write-up with CV.

Friday, February 08, 2019

Author: Faiza Rahman

Faiza Rahman is a researcher at the National Institute of Public Finance and Policy.

On this blog:

Backdoors to Encryption: Analysing an Intermediary's Duty to Provide 'Technical Assistance', 10 May 2021
Response to the Consultation Whitepaper on 'Strategy for National Open Digital Ecosystems (NODEs)', 12 July 2020
Constitutionalism During a Crisis: The Case of Aarogya Setu, 25 May 2020
Comments on the draft Personal Data Protection Bill, 2019: Part II, 10 April 2020
Comments on the draft Personal Data Protection Bill, 2019, 3 April 2020
Disclosures in privacy policies: Does 'notice and consent' work?, 19 February 2019
Response to the Draft Personal Data Protection Bill, 2018, 20 October 2018
Placing surveillance reforms in the data protection debate, 6 August 2018
India's communication surveillance through the Puttaswamy lens, 18 May 2018
Towards a data protection framework for India, 22 February 2018
An analysis of Puttaswamy: the Supreme Court's privacy verdict, 20 September 2017
