Public Databases and Massive Aggregation of Data

This is just a really quick thought that I wanted to toss out.

I perceive a problem associated with the digitization of public records: such digitization allows business interests to gather aggregate data on large collections of people while retaining identifiable characteristics. This gives them a phenomenal capacity to sort people. At the same time, we might ask, “Is there anything we can, or really want to, do about this?”

Paradigm Shift

I hear this a lot – ‘Chris, you have to understand that things are different now. The paradigm is shifting towards transparency, and there’s nothing wrong with that, and you’re being a pain in the ass suggesting that there is anything wrong with transparency. Do you have something to hide, or something like that?’ This particular line bothers the hell out of me, because I shouldn’t have to expose myself without giving my consent, especially when I previously enjoyed a greater degree of privacy as a consequence of obscurity and/or the costs involved with copying, sorting, and analyzing analogue records. I fail to see why I have to give up past nascent rights and expectations just because we can mine data more effectively (hell, ‘mining data’ would have been a meaningless phrase around the time that I was born…). More efficient is not the same as superior, better, or (necessarily) wanted.

Solution One: Creative Commons

I (generally) don’t mind people reading what I’ve written, or about various facets of my life. Were I in court for some reason, a part of the justice system really does entail other people being able to read court records so that they can identify with the law as it was dispensed by and for the people (this is one of the areas where Hegel certainly puts an explanation of the legal system far more eloquently than Kant ever did, though both argue this point along dramatically different avenues). Perhaps some version of the Creative Commons could be developed so that designated uses can automatically search public databases, whereas other uses (such as corporate interests in some cases) would be restricted in how much information they could collect per day or hold in aggregate. A robots.txt-style text file, of the kind that web spiders already consult, paired with legislation requiring businesses to abide by such files, might be one way of dealing with this.
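To make the idea a little more concrete, here is a rough Python sketch of what a policy file in the spirit of robots.txt, plus a per-day quota check, might look like. Everything in it is an assumption made for illustration: the directive names (`Use-class`, `Daily-limit`), the use classes, the numbers, and the `AccessLedger` helper are all hypothetical and don't correspond to any existing standard or licence.

```python
from collections import defaultdict
from datetime import date

# Hypothetical policy text. The directive names and values are invented
# for this sketch; no existing standard defines them.
POLICY_TEXT = """\
# who may pull records, and how many per day
Use-class: research
Daily-limit: unlimited

Use-class: commercial
Daily-limit: 100
"""


def parse_policy(text):
    """Return {use_class: daily_limit}; a limit of None means 'no cap'."""
    limits = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip().lower()
        if key == "use-class":
            current = value
        elif key == "daily-limit" and current is not None:
            limits[current] = None if value == "unlimited" else int(value)
    return limits


class AccessLedger:
    """Counts records served per requester per day against the policy."""

    def __init__(self, limits):
        self.limits = limits
        self.counts = defaultdict(int)  # (requester, day) -> records served

    def allow(self, requester, use_class, day=None):
        day = day or date.today()
        if use_class not in self.limits:
            return False  # undeclared use classes are refused by default
        limit = self.limits[use_class]
        key = (requester, day)
        if limit is not None and self.counts[key] >= limit:
            return False  # daily quota for this requester is used up
        self.counts[key] += 1
        return True
```

A server would call `AccessLedger(parse_policy(POLICY_TEXT)).allow(requester, use_class)` before handing over each record. Note that a check like this only works if requesters honestly declare their use class, which is exactly where the legislation would have to come in.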

Solution Two: Limited Access Points

This won’t win me friends with advocates of ‘openness’, so get ready. Hell, I don’t know that *I* like this idea, and think that it sacrifices a bit much on the altar of the past. Be that as it may …

What if, to access public databases, you had to have an IP address that located you within a particular geographic range? Say you had to be within 50 km of the hosting location, or of the location you presume it should be hosted at, to get full access (i.e. if you are accessing information that the Ontario government holds, you need to be within 50 km of the parliament, even though the databases might actually be housed in Yellowknife). Perhaps, instead of this location-based access, documents should have to be manually saved somehow, with the method used for displaying and saving documents intentionally randomized to prevent mass-saving and aggregation. In essence, why not implement some kind of technology that either correlates geographic location with the ease or difficulty of accessing documents, or uses quasi-DRM solutions (that felt dirty to suggest…) to limit the easy aggregation of public records?
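The location-based version could be sketched roughly as follows in Python. This is only an illustration under some loud assumptions: the step that turns an IP address into coordinates is left out entirely (it would need a geolocation service, and such lookups are imprecise and can be sidestepped with proxies), the reference coordinates are approximate, and the function names and the 'full'/'limited' labels are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
ACCESS_RADIUS_KM = 50.0    # the 50 km range suggested above

# Approximate coordinates of the Ontario legislature in Toronto
# (assumed for illustration only).
REFERENCE_POINT = (43.6623, -79.3916)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def access_level(requester_location, reference=REFERENCE_POINT,
                 radius_km=ACCESS_RADIUS_KM):
    """Return 'full' inside the radius, 'limited' otherwise.

    requester_location is a (lat, lon) pair that some IP geolocation
    lookup (not shown here) is assumed to have produced, or None if
    the lookup failed.
    """
    if requester_location is None:
        return "limited"  # unknown location gets the restricted tier
    if haversine_km(requester_location, reference) <= radius_km:
        return "full"
    return "limited"
```

What matters for the rule is the distance to the reference point, not to wherever the servers physically sit, so a requester near the legislature would get `'full'` even if the database itself lived in Yellowknife.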