I worry that increasingly far-reaching and burdensome copyright laws, when combined with the analysis techniques of Deep Packet Inspection (DPI), will lead to a pervasive chilling of speech. This has real consequences both for the creation and development of contemporary culture, which depends on mixing the past into new creations (with ‘the past’ increasingly copyrighted), and for the opportunity to use rich media environments such as the Internet to create and distribute political statements. Copyright isn’t just an issue for musicians and artists; it’s an issue for anyone who engages, or wants to engage, in digital self-expression in media-creative ways.
Given that my earlier post about this relationship between DPI and freedom of expression may have seemed overly paranoid, I thought that I should substantiate it a bit by turning to a DPI vendor’s white paper on copyright. In one of its most recent white papers, ipoque discusses “Copyright Protection in the Internet”. One of the great things about this white paper is how the author(s) have divided their analysis: they identify different methods of limiting or stopping infringement theoretically (i.e. can a technology do this?), provide a ‘reality check’ (i.e. can this practically be implemented without gross rights violations or technical nightmares?), and end each analysis with a conclusion that sums up ipoque’s official position on the method in question. I want to focus on detecting infringing files, rather than on preventing the transfer of those files, on the basis that it is the former that really depends on DPI to be effective.
1. Fingerprinting

This technique relies on capturing parts of a file to generate a unique representation of the file in question. Fingerprinting has the advantage of identifying infringing material even after it has been modified. ipoque notes that the trouble with this mode of analysis is that it is computationally expensive, and thus cannot be implemented in real-time network environments: files must be captured for analysis, which carries associated data-retention and privacy issues. Moreover, this mode of examination cannot penetrate encrypted file archives or data packets. If ISPs adopt fingerprinting, more infringing material will simply be encrypted, rendering this mode of analysis (effectively) useless.
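To make the idea concrete, here is a toy sketch of content fingerprinting. The chunking scheme, hash function, and example data are my own illustrative assumptions, not ipoque's implementation; real systems use far more robust perceptual or acoustic features. The point is only that partial signatures can still match a file after parts of it have been altered:

```python
import hashlib

def fingerprint(data: bytes, chunk_size: int = 64) -> set:
    """Toy fingerprint: hash fixed-size chunks of the file's contents."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }

def similarity(fp_a: set, fp_b: set) -> float:
    """Jaccard similarity between two fingerprints (0.0 to 1.0)."""
    if not (fp_a or fp_b):
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Pretend media file: ten distinct 64-byte chunks.
original = b"".join(bytes([i]) * 64 for i in range(10))
# The same file with its final chunk altered (a partial edit).
modified = original[:-64] + bytes([99]) * 64
# Entirely different content.
unrelated = b"".join(bytes([100 + i]) * 64 for i in range(10))

fp = fingerprint(original)
print(similarity(fp, fingerprint(modified)))   # high: 9 of 11 chunk hashes shared
print(similarity(fp, fingerprint(unrelated)))  # 0.0: no chunk hashes shared
```

Even this crude version shows why the technique is computationally costly at line rate: every observed file must be buffered, chunked, and hashed before any comparison can happen, and none of it works on encrypted payloads.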
In terms of mashups, the worry is that if fingerprinting is successful then an ‘infringer’ who is developing or sharing a mashup can be quickly identified and shut down. The questions then become ‘what constitutes fair use?’ and ‘how do I know if what I am doing is fair use?’ In countries such as Canada, which lack an American-style fair use defence, things get even trickier. Importantly, fingerprinting every packet assumes that each file is potentially carrying copyrighted data (and is thus ‘guilty’), as opposed to requiring some motivation or evidence before examining packets and matching them against existing ‘fingerprints’.
This being said, ipoque is incredibly reluctant to suggest widespread adoption of this mode of analysis. While not noted in ipoque’s white paper, if fingerprinting were used sparingly then it could easily be seen as a digital equivalent of analogue policing tools: with an appropriate judicial warrant, data packets could be collected under a wiretap order and subsequently analyzed against digital fingerprints. Whether a society is willing to allow law enforcement access to this kind of heuristic analysis is a question for civil society, and a debate that we desperately need to have.
2. File Hash-Based Identification and Blacklisting
When a piece of software or video is released onto torrent sites, it is often provided in a series of different formats. Each of these differently formatted files has a unique hash identifier (e.g. playme.avi and playme.mpg would play the same content in a different file type), and the ‘format shift’ can lead to multiple hash codes being associated with the same content (ipoque sees the common ratio between a title and its copies as 1:3-6). Traffic managers can maintain at least one million hash entries and selectively block/allow file transfers. These managers, and their analysis of hash identifiers, are effectively deployed against unencrypted public file-sharing environments. At the moment, DPI appliances allow for this kind of analysis and blocking, but ipoque maintains that whole countries or larger regions would need to participate in a common anti-infringement strategy for hash-identification to effectively stop or limit infringement. Details on relatively low costs (~2-3 Euro/user/year) are provided in the white paper. This would situate ISPs as guardians of content and, while ipoque doesn’t mention this, it seems as though this would necessarily call into question their status as common carriers. Further, were ISPs to adopt hash-based identification systems, then it would seem as though encryption does protect consumers’ privacy by masking data content from the ISPs that would be deploying DPI-enabled surveillance dragnets. This would contrast with their stated position in the 2008/2009 Internet Survey, which I previously talked about.
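The mechanics of hash-based blacklisting are simple, which is part of its appeal to network operators. The sketch below is my own illustration (the file contents and choice of hash are assumptions, not ipoque's design); what it shows is the O(1) lookup that makes million-entry blacklists cheap, and how a mere format shift defeats the match, which is why ipoque counts several hashes per title:

```python
import hashlib

# Hypothetical blacklist of known-infringing file hashes. ipoque reports
# that traffic managers can hold at least a million such entries.
blacklist: set = set()

def file_hash(data: bytes) -> str:
    """Hash of the full file contents (SHA-1 is common in P2P contexts)."""
    return hashlib.sha1(data).hexdigest()

def register_infringing(data: bytes) -> None:
    blacklist.add(file_hash(data))

def should_block(data: bytes) -> bool:
    """Constant-time membership test per observed transfer."""
    return file_hash(data) in blacklist

movie_avi = b"fake avi container bytes"
movie_mpg = b"fake mpg container bytes"  # same title, different format

register_infringing(movie_avi)
print(should_block(movie_avi))  # True
print(should_block(movie_mpg))  # False: a format shift yields a new hash,
                                # so each title needs multiple blacklist entries
```

The same machinery also fails entirely once the payload is encrypted, since the appliance can no longer see the bytes it would hash, which is where the black- and white-listing of entire sites comes in.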
The Politics of Hash-Based Identification
The DPI company does note the following:
The implementation of such a system is new territory for most countries and would certainly trigger fierce debates involving the content industry, privacy and data protectionists, and consumer protection groups. In several countries there are ongoing discussions about this countermeasure. (ipoque 2009: 5)
This nicely recognizes the debates that should be/could be/are occurring as these modes of content analysis are deployed, while also revealing that the copyright struggle is being framed as a ‘war’ (through use of the term ‘countermeasure’) in the discussion of hash-based identification. This mode of heuristic analysis would outfox most non-encrypted infringing behaviour, though as soon as users move to encrypted traffic it is likely that black- and white-lists would need to be deployed. Turning to such lists would begin a normalization of ‘legitimate’ and ‘illegitimate’ file repositories and sharing sites, reinforcing the practices of those ISPs already engaged in normalizing their consumers’ habits through DPI-regulated data transfer policies.
Such normalization practices have severe repercussions for online freedom and liberty of action and movement. Moreover, implicit in ipoque’s suggestion that effective hash-monitoring requires national or regional policies is that there would need to be a widespread harmonization of copyright policy. If contemporary American-style copyright law becomes the model that other nations are required to harmonize with, then nothing will enter the public commons for a very long time, and thus political speech and cultural growth could be forced underground.
Active versus Passive Monitoring
There is also the question of ‘how’ data traffic surveillance occurs. Active monitoring would see a copyright holder use a P2P program to connect to infringing peers, copy their IP addresses, and subsequently associate those addresses with end-users. Passive monitoring, on the other hand, “inspects the complete Internet traffic, ignoring all uninteresting traffic and looking only for exchanges of copyrighted titles” (ipoque 2009: 6). ipoque recognizes that this would cause “severe privacy and data protection concerns as it has, potentially, access to all data, including e-mails, web traffic, etc. The two methods – active and passive monitoring – are totally disparate technologies” (ibid., emphasis added). Active monitoring (such as what the RIAA attempts) has received incredibly negative attention, and ipoque argues that passive monitoring “is politically unfeasible in most countries” (ipoque 2008: 7). One is left wondering whether passive monitoring would remain politically unfeasible should the shadowy Anti-Counterfeiting Trade Agreement (ACTA) be formally accepted by participating governments.
In both the case of behavioural targeting in the UK and the analysis of data traffic by Canadian ISPs, passive monitoring is being used. The worries that privacy advocates identify and vocalize arise precisely because DPI is primarily being used for such passive action. Passive monitoring is, in effect, a dragnet surveillance apparatus. Active monitoring, by contrast, at least appears to be a more targeted mode of surveillance, though its ethics and accuracy are questionable.
I wonder: what would happen if the DPI debates began to revolve around ipoque’s technical separation of active and passive surveillance? Would the public be as swayed by privacy advocates’ arguments if ISPs and other gatekeepers shifted from passive to active monitoring?