The web operates the way it does, largely, because there is a lot of money to be made in the digitally-connected ecosystem. Without the revenues brought in by DoubleClick, as an example, Google would likely be reluctant to provide its free services that are intended to bring you into Google’s ad-serving environment. Two questions need to be asked, however: (a) do DoubleClick and related ad delivery systems collect personal information; and (b) if the answer to (a) is “yes”, might such collection constitute a privacy infringement?
In the course of this post, I begin by outlining what constitutes personal information and then proceed to outline DoubleClick’s method of collecting personal information. After providing these outlines, I argue that online advertising systems do collect personal information and that the definitions that Google offers for what constitutes ‘personal information’ are arguably out of line with Canadian sensibilities of what is ‘personal information’. As a result, I’ll conclude by asserting that privacy violations may in fact be occurring, with the argument largely emerging from Nissenbaum’s work on contextual integrity. Before proceeding, however, I’ll note that I’m not a lawyer, nor am I a law student: what follows is born from a critical reading of information about digital services and writings from philosophers, political scientists, technologists and privacy commissioners.
Central to the claim that personal information is collected and used in ways that individuals do not anticipate or desire is identifying what the term ‘personal information’ refers to. We also have to consider the degree of surveillance that individuals should expect to experience in ‘public’ environments. These environments should be taken to include not just city parks and commercial stores, but also websites that require no password or registration. The Information and Privacy Commissioner of Ontario (IPC), in their lengthy analysis of privacy and video surveillance in mass transit systems, argues that:
While the expectation of privacy in public spaces may be lower than in private spaces, it is not entirely eliminated. People do have a right to expect the following: that their personal information will only be collected for legitimate, limited, and specific purposes; that the collection of their personal information will be limited to the minimum necessary for the specified purposes; and that their personal information will only be used and disclosed for the specified purposes.
Moreover, there are socially-construed norms that ought to be recognized; humans are cognitively designed for the analogue world – this is one of the reasons why long-term storage of digitally externalized memory is so problematic – and as such analogue norms ought to carry over into the digital domain. In carrying over such norms, and considering the nature of information sharing in the analogue world, we need to focus on “not simply restricting the flow of information but ensuring that it flows appropriately,”  with “appropriately” a reference to making sure that data flows accord with contextualized social norms. In the social world there are finely calibrated sets of norms that govern the flow of personal information in distinct social contexts; such norms ought to be drawn into the digital instead of being repudiated in favour of an entirely novel set of norms that (negatively) rebalance power-relationships away from information/data holders in favour of data-aggregation bodies.
In thinking of a shopping environment that I physically walk into, precious little information is collected about me until I am involved in making a purchase (barring the introduction of high-tech analysis systems, such as behavioural recognition software tied to cameras, RFID scanners monitoring when shopping items are picked up, etc). Even where I am involved in multiple interactions with sales staff these are discrete, and are neither collated in a format that can be processed by a computer nor indexed to my name or a unique identifier. I am likely quickly forgotten about – retention of ‘collected’ data is incredibly limited, limited to the contextual needs of the sales situation. This is obviously different in a digital environment, where each page that is opened is logged, time on pages noted, entry and egress points monitored. The data collected in the digital environment is only ‘forgotten’ after a period of time set by the website administrator. This induces a novel notion of ‘appropriate’ data durations; Daniel Solove in his book The Digital Person speaks to the dangers built into the data-driven dossiers that are compiled about individuals, and Viktor Mayer-Schönberger speaks to the risks of not forgetting in the digital era. In both authors’ texts, a resonating theme is that ‘appropriate’ data retention periods in a digital environment are radically, and often unnecessarily, different from their analogue precursors.
In the former sales situation, we have an instance of personal information (perhaps they ask your name) being retained for a limited period of time and for a legitimate, specific purpose (e.g. developing a relationship with an individual for sales purposes whilst in the store). It is subsequently disposed of following the conclusion of the transaction in most cases. This is directly at odds with the data collected in the server-based situation, where the smallest action is often recorded and used for extensive marketing and enhanced surveillance purposes.
Thus far, all that I have asserted is that more information is collected in a digital, and specifically sales, environment online than offline and that socially-construed norms from analogue contexts ought to be drawn to digital situations. As part of this, there is a recognition that ‘public’ online actions should not expect total privacy, but a lowered expectation of privacy does not equal an absolute dearth, or absence, of privacy. Let’s now move to outline what might constitute personally identifiable information.
In the interests of maintaining consistency (and avoiding accusations of ‘cherry-picking’ from different government reports) we can turn to the definition of personal information in section 2(1) of Ontario’s Freedom of Information and Protection of Privacy Act, which is prominently drawn on in the IPC’s analysis of the legitimacy of video surveillance. Personal information refers to recorded information about an identifiable individual and includes:
- information relating to the race, national or ethnic origin, colour, religion, age, sex, sexual orientation or marital or family status of the individual;
- information relating to the education or the medical, psychiatric, psychological, criminal or employment history of the individual or information relating to financial transactions in which the individual has been involved;
- any identifying number, symbol or other particular assigned to the individual (emphasis added);
- the address, telephone number, fingerprints or blood type of the individual;
- the personal opinions or views of the individual except if they relate to another individual;
- correspondence sent to an institution by the individual that is implicitly or explicitly of a private or confidential nature, and replies to that correspondence that would reveal the contents of the original correspondence (emphasis added);
- the views or opinions of another individual about the individual, and;
- the individual’s name if it appears with other personal information relating to the individual or where the disclosure of the name would reveal other personal information about the individual.
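The two items marked ‘emphasis added’ above (an identifying number, symbol or other particular assigned to the individual, and correspondence of a private or confidential nature) are the domains within which personal information is most likely to be collected and used over the course of online transactions.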
DoubleClick, an advertising company that is owned by Google, operates by depositing small persistent cookies (which contain unique alphanumeric codes) on individuals’ computers. Persistent cookies remain on your computer after you close your browser (in contrast to session cookies, which are deleted when the browser is closed) and are used to track users as they move about the web and to deliver targeted ads, with the additional assistance of web beacons. Such beacons are usually 1×1 pixels in size and designed to blend into a website so that the end user never notices their existence. This cookie-beacon-advertising system is intended to be invisible to the consumer.
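To make the mechanics concrete, here is a minimal Python sketch of how a persistent cookie and a beacon can work together. It is a toy model only: the `AdServer` class, the `serve_beacon` method, the cookie name `id` and the example page names are all my own inventions for illustration, and nothing here is drawn from DoubleClick's actual systems.

```python
import secrets
from collections import defaultdict


class AdServer:
    """Toy third-party ad server (hypothetical; not DoubleClick's real system)."""

    def __init__(self):
        # Maps each cookie ID to the list of pages where the beacon was loaded.
        self.log = defaultdict(list)

    def serve_beacon(self, cookies, page_url):
        """Handle a request for the 1x1 beacon embedded in page_url.

        `cookies` is the browser's cookie jar for the ad server's domain.
        Returns the cookie jar the browser should store afterwards.
        """
        uid = cookies.get("id")
        if uid is None:
            # First contact: mint a unique alphanumeric code for this browser.
            uid = secrets.token_hex(8)
        # Associate the unique code with the page the user is viewing.
        self.log[uid].append(page_url)
        return {"id": uid}


server = AdServer()
jar = {}  # a persistent jar: it survives between page loads and browser restarts
pages = ["news.example/politics", "shop.example/shoes", "health.example/asthma"]
for page in pages:
    jar = server.serve_beacon(jar, page)

uid = jar["id"]
print(uid, server.log[uid])
```

Because the jar persists, all three unrelated sites end up filed under a single identifier on the ad server's side. With a session cookie, the jar would be emptied when the browser closed and the next visit would start a fresh, unlinked record.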
DoubleClick asserts that personal information includes (but is not limited to) “name, address, telephone number, email address, social security number, bank account number or credit card number.” Sensitive information – information the company does not share or collect – “categorically includes but is not limited to data related to an individual’s health or medical condition, sexual behavior or orientation, or detailed personal finances, information that appears to relate to children under the age of 13 at the time of data collection; and PII otherwise protected under federal or state law (for example, cable subscriber information or video rental records).” It’s key to note that, in this latter reference, the information ‘protected under federal or state law’ is defined solely by reference to American privacy laws.
If you will permit a brief detour, the federal Office of the Privacy Commissioner acknowledges that personal information is collected with RFID systems when locational data is associated with an RFID chip’s unique identifier. I argue that the unique number associated with the RFID chip and the unique alphanumeric string associated with a cookie are analogous, and further that movement across the web constitutes digital “movement” paralleling the physical movements that can be detected by proximity RFID readers. Just as a sales company tracking information in their ‘private’ space would be seen as collecting personal information, so should any web company that tracks users as they move through the company website.
On this basis, the association of where a customer moves online with a unique number constitutes the collection of personal information; DoubleClick’s assertion that they are simply collecting “non-personally identifiable information” in their daily business activities doesn’t hold water. Moreover, from the perspective of the IPC’s understanding of personal information, an identifying number or symbolic representation is being correlated with a distinct computer moving across the ‘net, and it is entirely possible that “private correspondence” is being identified insofar as what are (from a social-norms perspective) private data surfing actions are surveilled and included in a DoubleClick database for the purposes of advertising.
There are two immediate responses that come to mind after asserting that DoubleClick is collecting personal information. On the one hand, someone opposing this position might argue that no person is identified; what is identified is instead a computer represented by an IP address (or perhaps a host of computers sitting behind a NAT-enabled router). Alternatively, one might maintain that because online advertising systems are widespread, the use of DoubleClick’s (and other advertising agencies’) services is thereby legitimate. I’ll address these in turn.
To the first, the Office of the Privacy Commissioner of Canada has asserted that where personal information is associated with an IP address, the address itself constitutes personal information. To put it another way, the IP address on its own isn’t ‘personally identifiable information’, but when it is mashed together with other data sources it can be seen as personal information and thus as deserving protection. Individuals who have opted out of receiving a DoubleClick cookie are still targeted with ads, in part based on their IP address as well as the name of the site they are visiting, the specific web page they are visiting, key values, operating system type, windows version, user’s local time, and the ‘non-personally identifiable’ information sent by the website itself. This involves collecting information and associating it with an IP address, and this information in a unique configuration may constitute the collection of personal information, regardless of DoubleClick’s assertions otherwise.
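A rough Python sketch shows why a ‘unique configuration’ of individually bland attributes matters. The `fingerprint` function, its parameter names and the sample values below are hypothetical choices of mine, loosely modelled on the attributes listed above; I am not claiming that DoubleClick combines data in this way, only illustrating that such a combination is trivial to compute.

```python
import hashlib


def fingerprint(ip, os_type, browser_version, utc_offset_minutes):
    """Combine request attributes into one stable pseudo-identifier.

    Hypothetical illustration: no single field names a person, but the
    combination can single out one machine without any cookie being set.
    """
    raw = "|".join([ip, os_type, browser_version, str(utc_offset_minutes)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


# Two machines behind the same NAT router share one public IP address
# (203.0.113.7 is a reserved documentation address).
machine_a = fingerprint("203.0.113.7", "Windows", "7.0", -300)
machine_b = fingerprint("203.0.113.7", "Mac OS X", "3.5", -300)

# The same machine returning later produces the same value again.
machine_a_later = fingerprint("203.0.113.7", "Windows", "7.0", -300)

print(machine_a == machine_a_later)  # same machine, same identifier
print(machine_a == machine_b)        # shared IP, different identifier
```

The point for the NAT objection is that a shared IP address does not guarantee anonymity: once other attributes are attached to it, machines behind the same router can still be told apart and recognized on return visits.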
To the second, social norms dictate that certain modes of surveillance are to be genuinely ‘expected’ but such norms do not mean that individuals entirely abandon their rights to privacy as soon as they leave their home(s). What is critical in determining whether a surveillance practice is being performed in such a way that it constitutes a privacy violation is whether or not “its use in a particular context, in a particular way is not overly unusual. One cannot generalize from the observation that certain installation in certain contexts are commonplace, accepted, and supported to the conclusion that all installations irrespective of contexts will not violate expectations of privacy.” To put this in context, it can be read to say that simply because the installation and use of DoubleClick cookies is permissible under American privacy norms (assuming that this is even the case), it cannot subsequently be asserted that these cookies are also permissible under Canadian or European privacy norms. Given that Canada and Europe have significantly stronger privacy protections than the United States – and if we suppose that the laws that are established are reflective of the dominant socio-political norms of a citizenry – it seems key that a normative analysis be conducted to determine whether, after translating their analogue surveillance norms to a digital domain, Canadians and Europeans would see the deposition of tracking cookies on their computers as normatively permissible.
So what is the takeaway from all of this? First, it is not enough to simply read how third-party groups that you may be working with (such as DoubleClick for advertising purposes) define and collect personal information. You have to go at least one step further to ensure that these definitions and collections are in accordance with the laws and social norms of your nation. Second, when it comes to managing flows of information, end-users are interested in ensuring that their information flows in accordance with their social norms, norms that emerge from their analogue experiences and lives. Discounting such norms and demanding that individuals (for some reason…) simply adopt ‘digital norms’ misses the fact that humans remain analogue creatures, with analogue modes of dealing with the world as a key part of their social and biological characteristics. The failure of a web environment to recognize both of these points – that you need to check that collections of data are legally permissible and that they accord with social norms – and to act on them threatens to undermine user trust, and even Google recognizes that in the absence of their end-users’ trust their business models would collapse.
 Nissenbaum, Privacy in Context: Technology, Policy, and the Integrity of Social Life, p. 2.
 Nissenbaum, Privacy in Context: Technology, Policy, and the Integrity of Social Life, p. 235.