I recently heard back from Detica about CView and wanted to share the information that Detica has provided. CView is the copyright detection Deep Packet Inspection (DPI) appliance that Virgin Media will be trialling, and is intended to measure the volume of copyright infringing files that cross Virgin’s network. This index will let Virgin determine whether the content deals they sign with content producers have a noticeable impact on the amount of infringing P2P traffic on their network. Where such deals reduce infringements, we might expect Virgin to invest resources in agreements with content producers; where such agreements have no impact, Virgin’s monies will likely be spent on alternate capital investments. I’ll note up front that I’ve sent some followup questions to seek additional clarity where the answers I received were somewhat hazy; such haziness appears to have stemmed from a miscommunication, and is likely attributable to a particular question that was poorly phrased. I’ll also state that I’m not willing to release the name of the person I’m speaking with at Detica, as I don’t think that their name is needed for public consumption, and releasing it would be an inappropriate disclosure of personal information.
The key question that is lurking in my own mind – if not that of others interested in the CView product – is whether or not the appliance can associate inspected data flows with individuals. In essence, I’m curious about whether or not CView has the ability to collect ‘personally identifiable information’ as outlined by the Privacy Commissioner of Canada in her recent findings on Bell’s use of DPI. In her findings, the Commissioner argues that because Bell customers’ subscriber IDs and IP addresses are temporarily collated, Bell does collect personal information.
In the case of Bell, this didn’t mean that they had to stop the collection, but that they had to adjust their privacy policies to reflect this collection (though it should be noted that any such association and collection will happen, with or without a DPI appliance, because Bell always associates a subscriber ID with dynamically assigned IP addresses).
Now, this means that my examination of the CView system and consideration of privacy is different from that of those approaching the system from the stance of the Regulation of Investigatory Powers Act (RIPA), which might be seen as putting me off-side of some privacy advocates. I don’t necessarily have an issue with that, and in fact think that strong, well meaning discussion amongst the privacy community can be quite healthy – different levels of analysis and approaches are called for when facing particularly novel technological systems, and expecting a lockstep approach to these technologies and their accompanying politics is somewhat absurd. For my purposes (if not those of RIPA), I’ll simplify things and identify a privacy infringement as entailing:
- A collection, processing, storage, or analysis of data that is associated with an individual, or a very specific set of individuals;
- A case where whatever is collected, processed, stored, or analyzed is done so to influence the individual, or specific set of individuals, in a particular and reasonably direct manner;
- An instance of data anonymization where there is the strong likelihood that such anonymization is either intentionally compromised or unlikely to be effective.
In terms of the CView system, let’s first address the concern of anonymization. Specifically, we have to ask how stringent the anonymization system actually is. When I asked Detica about this process, they informed me that because the CView device is intended to produce a Copyright Infringement Index (aka the ‘Piracy Index’) by evaluating the overall filesharing on a network, identity information isn’t required for this objective. IP addresses are anonymized at the source/DPI device using a pseudo-random replacement algorithm, which also entails ignoring the external IP addresses. The key generation system is managed automatically by the device (and thus an ISP can’t muck around with the system), and keys are periodically cycled and redistributed. The keys are never made available outside of the device, and once a set of keys for a given time period is discarded it cannot be recovered – the process is irreversible. On this basis, we can argue that no subscriber ID is associated with the pseudo-random replacement values, there is no way to associate a subscriber ID with a pseudo-random value after the fact, and as such the anonymization system should serve its purpose. Of course, there is a concern that there is no such thing as a truly robust anonymization process – as noted by Paul Ohm – but I think that a more technical analysis of data logs would be required to figure out whether or not we could make the case that Detica’s system is a failure. At the very least, they appear to be making a real effort to keep data sets anonymous and to do what they can to prevent privacy infringing behaviour.
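To make the general idea concrete, here is a minimal sketch of one way a keyed, rotating pseudonymization scheme could work. To be clear, this is my own illustration and not Detica’s algorithm, which hasn’t been disclosed: the use of HMAC-SHA256, the `RotatingPseudonymizer` class, and the `SUBSCRIBER_NET` address range are all assumptions made for the sake of the example.

```python
import hashlib
import hmac
import ipaddress
import secrets

# Hypothetical subscriber address range, for illustration only.
SUBSCRIBER_NET = ipaddress.ip_network("10.0.0.0/8")


class RotatingPseudonymizer:
    """Illustrative only: replaces subscriber IPs with keyed pseudonyms."""

    def __init__(self) -> None:
        # The key is generated internally and never exposed by any method.
        self._key = secrets.token_bytes(32)

    def rotate(self) -> None:
        # Overwrite the old key; pseudonyms made under it can no longer
        # be reproduced or linked to pseudonyms made under the new key.
        self._key = secrets.token_bytes(32)

    def pseudonym(self, ip: str) -> str:
        mac = hmac.new(self._key, ip.encode("ascii"), hashlib.sha256)
        return mac.hexdigest()[:16]

    def anonymize_flow(self, src: str, dst: str):
        # Keep only a pseudonym for the subscriber-side address and
        # ignore the external address entirely.
        for ip in (src, dst):
            if ipaddress.ip_address(ip) in SUBSCRIBER_NET:
                return self.pseudonym(ip)
        return None
```

Within a single key period the same address always maps to the same pseudonym, so flows can still be counted consistently; once `rotate()` is called, the old mapping is gone. Whether a real deployment resists re-identification would still depend on what else ends up in the logs, which is the sort of concern Ohm raises.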
One of the questions I posed to Detica, which was related to CView identifying copyright infringing files, went as follows:
“…what method is used to identify content. Is Detica using a file hash-based identification process or fingerprinting system? I ask because broadly identifying protocol alone would render any analysis of P2P data traffic as inherently infringing somewhat problematic, given that P2P is also used for legitimate file transfers.”
The member of the company I wrote to admitted that they couldn’t go into the specifics of how the system performed identifications for commercial reasons – this is normal when dealing with what are effectively corporate secrets – and thus couldn’t speak to whether their system uses fingerprinting or hash-based analysis. They did say, however, that the system is conservative, insofar as it makes its assessments based on the assumption that transfers are legitimate unless there are reasonable grounds for determining otherwise. As I read/translate this statement, it says to me that rather than classifying all P2P traffic as infringing, the system only flags content as infringing when it can be matched against its index of infringing files. Whether this entails fingerprinting (where only a fragment of a file is identified as infringing, as in a mashup that includes a second or two of a song, instead of the whole file, as in a .mp3 file of Madonna’s ‘Like a Virgin’), however, is unknown.
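As a rough sketch of what a ‘conservative’ classifier might look like under my reading, consider the following. It assumes whole-file SHA-256 hashes matched against a known index, which is purely my guess at one possible design; Detica declined to say how identification actually works, and the function names and labels here are hypothetical.

```python
import hashlib


def classify(payload: bytes, known_infringing: set) -> str:
    """Presume a transfer is legitimate unless its hash is in the index."""
    digest = hashlib.sha256(payload).hexdigest()
    if digest in known_infringing:
        return "infringing"
    return "presumed_legitimate"


def infringement_index(payloads, known_infringing: set) -> float:
    """Fraction of observed transfers that matched the index."""
    payloads = list(payloads)
    if not payloads:
        return 0.0
    flagged = sum(
        1 for p in payloads if classify(p, known_infringing) == "infringing"
    )
    return flagged / len(payloads)
```

Note that a whole-file hash like this would miss the mashup case entirely, since changing even a single byte produces a different digest; catching fragments would require some form of fingerprinting.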
Detica’s responses maintain that their CView system is deployed in a passive mode (which is expected), and I’ve asked for clarification about whether or not it rests inline or offline – whether the appliances will perform traffic analysis in real time inline with the flow of data passing through the ISP’s network, or in a delayed fashion that sees the data traffic ‘offloaded’ out of the ISP’s network. I expect that it is a passive, inline appliance, but we’ll see. The company does maintain that “there is no persistence of any analysed content – Detica CView(tm) is a measurement system so could not be used as an evidence collection mechanism.” This means that the DPI appliance cannot be used, as designed, to identify individuals trading infringing material online, and thus cannot be effectively used to enforce any three-strikes law.
Ultimately, given that CView is engaging in network-level intelligence, without correlating IP addresses with a unique signature or code, let alone a subscriber ID, I’m not certain that this system is necessarily ‘privacy infringing’ as it’s presently configured and deployed. Does this mean that it can be used to subsequently insist on deeper penetration and analysis of who is trafficking following the establishment of a ‘piracy index’? Quite possibly – the political ramifications of having quantifiable network intelligence are vast. One of the reasons why DPI appliances in general are so interesting is how they are wrapped up in the politics of net neutrality, privacy, and copyright. Given that they sit at the crossroads of so many digital issues, perhaps we need to develop a framework for engaging with these devices, as follows:
- What does the technology do, today? Does this constitute a privacy (or, preferably, constitutional rights grounded) infringement?
- What can the technology do, tomorrow? In light of what it can do, how should we advocate for strong protections to prevent our concerns from arising, and channel the technology towards ‘good’ outcomes?
- Ask the question ‘what needs to be put in place to ensure that the ‘good’ outcomes of tomorrow triumph over the possible ‘bad’ ones?’ and provide resources to achieve the good and avoid the bad.
This is a simple schema (and one that actually deserves a deeper analysis), but it parallels what I’ve come to adopt over the past few months. It is critical that we analytically distinguish between present realities and future possibilities, as well as between the issues of network neutrality, copyright, and privacy (among others), to develop sufficiently nuanced and complicated understandings of, and resolutions to, the insertion of DPI appliances in ISP infrastructures. DPI is unlikely to go away; the aim now has to be to identify and proclaim ‘good’ uses of the technology and work to prevent the ‘bad’ uses from becoming prominent telecommunication practices.