Aggregating Information About CView

Over the past little while there has been considerable attention focused on Virgin Media’s decision to trial Detica’s CView copyright monitoring system. This system uses Deep Packet Inspection (DPI) technology to identify data protocols and likely files that are being transferred in order to generate a Copyright Infringement Index (i.e. a ‘Piracy Index’). As outlined by Detica, CView will let ISPs work with content creators to determine whether ISPs providing content through their portals lead to reductions in ‘infringing’ transfers of content through P2P file sharing.

The story about Detica’s involvement really broke with Chris Williams’ piece over at the Register entitled, “Virgin Media to trial filesharing monitoring system.” In the piece, he recognized that the trial will encompass roughly 40% of Virgin’s customers, that the aim is to measure overall levels of filesharing rather than identify individual customers, and that it will (at least initially) focus on music. After I read the piece, I sent some questions off to Detica and posted them (“Virgin to Use DPI to ID Copyright Infringement”) based on my reading of Williams’ piece and Detica’s consultation paper, and shortly thereafter followed up with Detica’s responses and thoughts on CView and privacy infringements (“Update to Virgin Media and Copyright DPI”). Between the posting of my questions and the response from Detica, Richard Clayton had a meeting with representatives from Detica and posted the information they released to him over at Light Blue Touchpaper in a posting “What does Detica Detect?” The Register was also able to get face time with people working at Detica, leading Williams to produce his second piece “Spook firm readies Virgin Media filesharing probes.”

In the rest of this post, I want to pull together the information that has come to light so that we can get a better picture of what is known about CView. As such, this is very much a summary rather than an analytic post; hopefully I’ll have time to delve into the information more critically in the near future.

How does CView integrate with the ISP network?

In the interview with Williams, Dan Klein of Detica acknowledged that the CView appliances are expensive enough that ISPs are unlikely to purchase very many of them. As Clayton describes it, the system

starts by using fibre taps to pick off traffic from an appropriate part of the ISP network. They use a fibre tap rather than “port mirroring” to make it easier for the ISP to be sure that they won’t disrupt any traffic. The links that they monitor need not be carrying all of the ISP’s traffic — they merely hope that it will be a statistically significant sample.

The raw traffic is then sent to the CView box, which can handle multiple 10Gbit links. The first stage of processing is in hardware (FPGAs), then software takes over. The “external” endpoint identity is discarded and the “internal” identity is encrypted using a key that is not made available outside the box (ie: the intent is to make the customer “anonymous” but to be able to link different activity from the same source). (“What does Detica detect?“)

Put in other words, the Detica CView system performs a passive, offline (as opposed to inline) analysis: the traffic is copied off the ISP network via the fibre tap, so customers should experience little or no impact on their speeds.

What does CView detect?

The Register, in its December 7, 2009 article, revealed that eDonkey, Gnutella, and BitTorrent are the protocols that CView will inspect. At the moment, the appliance is geared to look for music files, but the original Register piece raises the question of whether it will remain limited to music files in the future.

How does CView perform detections?

Once traffic has been split off to the CView appliance, the appliance examines it to determine whether data is being carried over one of the three aforementioned file sharing protocols. Even when data traffic is encrypted, it is often possible to identify the protocol in use based on the cleartext data that precedes the encrypted flow. It should be noted that the most recent Internet Evolution test of DPI provided results confirming this capacity to detect encrypted P2P flows. Detections are performed passively, out-of-line from the ISP’s traffic. When data traffic is identified as being P2P, a record is generated with the following information:

  • the encrypted (and thus anonymised) customer identity
  • the type of P2P protocol
  • the content identifier value
  • the file size
  • a timestamp

Where the P2P flow is encrypted, a record is still generated but no data can be entered into its fields. In addition to this information, the CView appliance will generate an ‘acoustic fingerprint’ from the file – this is, perhaps, the ‘content identifier value’ that Clayton notes? – and then pass this information along to a separate statistics box that will identify whether the P2P file is copyright infringing.
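To make the detection step a bit more concrete, here is a minimal Python sketch of how a flow might be classified from its cleartext opening bytes and turned into a record with the five fields listed above. None of this comes from Detica: the names `classify_protocol`, `FlowRecord`, and `make_record` are hypothetical, and the signatures are just the well-known handshake prefixes of each protocol (real DPI signatures are far more elaborate, and the single-byte eDonkey marker in particular would produce false positives on its own).

```python
from dataclasses import dataclass
from typing import Optional

# Well-known cleartext prefixes, used here purely for illustration.
BITTORRENT_HANDSHAKE = b"\x13BitTorrent protocol"  # length byte 19 + protocol name
GNUTELLA_HANDSHAKE = b"GNUTELLA CONNECT/"          # start of a Gnutella 0.6 handshake
EDONKEY_MARKER = b"\xe3"                           # eDonkey protocol marker byte


def classify_protocol(payload: bytes) -> Optional[str]:
    """Guess the P2P protocol from the first bytes of a flow."""
    if payload.startswith(BITTORRENT_HANDSHAKE):
        return "bittorrent"
    if payload.startswith(GNUTELLA_HANDSHAKE):
        return "gnutella"
    if payload.startswith(EDONKEY_MARKER):
        return "edonkey"
    return None


@dataclass
class FlowRecord:
    """Hypothetical record mirroring the five fields Clayton lists."""
    customer_pseudonym: Optional[str] = None
    protocol: Optional[str] = None
    content_id: Optional[str] = None
    file_size: Optional[int] = None
    timestamp: Optional[float] = None


def make_record(payload: bytes, pseudonym: str, content_id: str,
                file_size: int, timestamp: float,
                encrypted: bool = False) -> Optional[FlowRecord]:
    """Build a record for a P2P flow; return None for non-P2P traffic."""
    if encrypted:
        # Per the description above: a record exists, but its fields stay empty.
        return FlowRecord()
    protocol = classify_protocol(payload)
    if protocol is None:
        return None
    return FlowRecord(pseudonym, protocol, content_id, file_size, timestamp)
```

Note that the record carries only a pseudonym for the customer; the external endpoint never appears in it, which matches the claim that the external identity is discarded.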

What about anonymity?

Of course there are worries that a system like CView could be used to rapidly identify the copyright infringers that are operating on a particular ISP’s network. Given the information provided by Detica, the company certainly is trying to secure the anonymity and identificatory privacy of ISP customers. Specifically,

IP addresses are anonymized at the source/DPI device using a pseudo-random replacement algorithm, which also entails ignoring the external IP addresses. The key generation system is managed automatically by the device (and thus an ISP can’t muck around with the system), and keys are periodically cycled and redistributed. The keys are never made available outside of the device, and once a set of keys for a given time period are discarded they cannot be recovered – the process is irreversible. On this basis, we can argue that no subscriber ID is associated with the randomized replacement algorithm, there is no way to associate a subscriber ID with the pseudo-random number after the fact, and as such the anonymization system should serve its purpose. (“Update to Virgin Media and Copyright DPI“)
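For readers who want a feel for how this kind of scheme behaves, here is a minimal Python sketch. Detica has not disclosed its actual algorithm, so I am assuming a keyed hash (HMAC-SHA256) as a stand-in for the ‘pseudo-random replacement algorithm’; the class name `RotatingPseudonymizer` and its methods are hypothetical.

```python
import hashlib
import hmac
import secrets


class RotatingPseudonymizer:
    """Replace internal IP addresses with keyed pseudonyms.

    Assumption: HMAC-SHA256 stands in for the undisclosed algorithm.
    The key lives only inside this object, mimicking a key that never
    leaves the appliance.
    """

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def pseudonym(self, internal_ip: str) -> str:
        """Return a stable pseudonym for this IP under the current key."""
        digest = hmac.new(self._key, internal_ip.encode("ascii"), hashlib.sha256)
        return digest.hexdigest()[:16]

    def rotate(self) -> None:
        """Discard the current key and generate a fresh one.

        Once the old key is gone, pseudonyms issued under it can no longer
        be recomputed from an IP address, so the mapping is not recoverable.
        """
        self._key = secrets.token_bytes(32)
```

Within a single key period the same customer always maps to the same pseudonym, which is what lets the system link different activity from the same source. After `rotate()` the same address maps to an unrelated value, so records from different periods cannot be joined.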

Richard Clayton maintains, as I do, that CView is employing DPI in a manner that addresses individual privacy and data protection concerns, though he is mindful of the possible RIPA issues. Specifically, he writes:

I should also address (especially given the huge fuss over Phorm) the rather important question as to whether the system is lawful to operate? Please note that IANAL, but I’ve studied their writings in this area a fair bit…

The design as explained above seems to address issues of privacy and data protection (amalgamating statistics and discarding identifiers is a sound technique for jumping these hurdles). But there is then the vexed question of illegal interception. The system does “wire-tapping”, that’s obvious, but the criminal offence is called “interception” and that is carefully defined within the Regulation of Investigatory Powers Act 2000. I expect that Detica would wish to argue that there is no interception because no content is seen by any humans… however, spitting out the file identifier might in itself be sufficient to infringe. It may take some case law before anyone can say for sure. (“What does Detica detect?“)

It will be interesting to see whether or not CView faces the same calibre of public outrage as Phorm did; Phorm ran up against Alexander Hanff (amongst others), but Hanff has recently noted that he cannot work as a privacy advocate ‘full time’ this round as he did against Phorm. Admittedly, I see Phorm as engaging in different activities than Detica, but the emphasis often placed against Phorm (as I read things) was DPI first, and behavioural advertising second. That might, admittedly, be a coloured reading on my own part. Regardless, I’m sure that Detica’s PR staff is breathing some small sigh of relief that they won’t be dealing with Hanff full-time.

What is the utility of CView to ISPs?

Detica is promoting CView as a way for ISPs to establish an ‘index’ of copyright infringements. As noted by the Register,

Perhaps most importantly, at least at first, CView will measure how the overall level of copyright infringement via peer-to-peer networks responds to Lord Mandelson’s letter-writing campaign. If the Digital Economy Bill is passed in what remains of this Parliament, those observed by rights holder groups sharing copyright material could start receiving statutory warnings in the post from their ISP as soon as April.

A year later a system of “technical measures” – bandwidth restrictions, blocked protocols and disconnections for the most persistent – imposed on ISPs by Ofcom, is likely to follow. If successful in trial, CView will allow Virgin Media to monitor how its customers respond to the regime, although it will not be involved in identifying infringers. (“Spook firm readies Virgin Media filesharing probes“)

Richard Clayton notes that a small number of files might be ‘incorrectly’ identified as infringing, but such misidentifications are likely few enough to be inconsequential. The worry, of course, is that minor tweaks might turn CView into a “first-class monitoring system” that can be used to identify and target individual users. While an injunction would be required for this, the pushes for such injunctions in the EU mean that this is something that must be kept in mind, even though UK media conglomerates have not previously sought such injunctions.

What isn’t (totally) clear to me

I think that we’re developing a pretty good understanding of what the Detica system entails, as well as its characteristics ‘out of the box’. I’m still unclear about how the ‘audio fingerprints’ are taken – I assume (applying Occam’s razor to what has been released to Richard) that hash-based, rather than fingerprint-based, methods of analysis are being performed, but I can’t be totally certain. (Note: fingerprinting can be used to detect infringement where only a fragment of a file is identified as infringing, as in a mashup that includes a second or two of a song, whereas a hash-based analysis would only examine the totality of the file, as in a .mp3 file of Madonna’s ‘Like a Virgin’.)
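The difference between the two approaches is easy to show with a toy Python example. To be clear about the assumptions: the byte strings below are made-up stand-ins for audio files, `file_hash`, `window_hashes`, and `shares_fragment` are hypothetical helpers, and comparing overlapping byte windows is only a crude proxy for fingerprinting. Real acoustic fingerprints are computed from the decoded audio and survive re-encoding, which this sketch does not.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Hash-based identification: one digest over the whole file."""
    return hashlib.sha1(data).hexdigest()


def window_hashes(data: bytes, size: int = 8) -> set:
    """Toy stand-in for a fingerprint: digests of every overlapping window."""
    return {
        hashlib.sha1(data[i:i + size]).hexdigest()
        for i in range(len(data) - size + 1)
    }


def shares_fragment(candidate: bytes, reference: bytes, size: int = 8) -> bool:
    """True if any window of the candidate also occurs in the reference."""
    return not window_hashes(candidate, size).isdisjoint(
        window_hashes(reference, size)
    )


song = bytes(range(64))                        # stand-in for a complete track
mashup = b"intro-" + song[10:30] + b"-outro"   # a fragment inside another file

known_hashes = {file_hash(song)}

print(file_hash(song) in known_hashes)     # exact copy of the whole file
print(file_hash(mashup) in known_hashes)   # mashup containing a fragment
print(shares_fragment(mashup, song))       # fragment-level comparison
```

The whole-file hash recognizes only the exact copy, while the window comparison still flags the mashup. If CView really is hash-based, then the mashup case in the note above would go undetected.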

The other, fairly major, element that isn’t clear is just how easy it is to ‘tweak’ the CView system as Richard suggests is possible. If we’re talking about a firmware update, that’s a fairly low cost for potentially very major changes to the device’s functionality, and it would run counter to the assurances provided by Detica to Richard, Williams, and myself that the system is designed to provide anonymity. On the other hand, if it would take a hardware modification, then the infrastructure, manpower and capital expenditure costs to ‘upgrade’ the device might dampen the drive for ISPs to implement a genuinely granular user-identification system. Function creep with these devices, of course, is a real worry – it would be great for Detica to clarify how these ‘tweaks’ are technically possible as part of their ongoing efforts to be transparent.