The other day I created a document in Google Docs entitled "Things that make me go 'Hmm...'". Today, I decided to turn it into a blog, focused on the ideas and thoughts that come to me during the workday.
Today's topic: data cleaning. The world we live in is increasingly run by data. The internet is becoming the global government, steering us all towards certain behaviors whether we realize it or not. We're attached to the damn web more than we're not, and the internet itself is becoming more and more integrated into other technologies, creating an even larger web that is beginning to encompass all of life.
If amino acids are the building blocks of life, data are the building blocks of the internet. Companies collect data on customers, other companies exist solely to analyze that data, and others still create hardware to store it all. We live in a research-rich world which, you'd think, would have collected enough data by now to accurately model the real, living, breathing world around us. But it does not.
I got a little carried away with that. My original point in writing was that we, as a people, as beings who have taken it upon ourselves to design and craft the world around us, need to become better, more accurate, and more diligent when storing information.
At the two jobs I've held so far, I have been (and my employers have been) continually discouraged by the general disconnect between data and the world it is supposed to represent, and also between multiple sources of the same data. So much money (read: "time and effort") is spent cleaning this data to make it useful for analysis and for actually making progress in any of the many fields dirty data now affects.
The disconnect creates the need for manual work rather than automation. In this data world, a great many people the world over are employed in menial data-cleaning jobs. The dot-com industry (which is beginning to encompass all industries as well as creating new ones) employs domestic workers for some of this work, but also outsources much of it to places like India, where labor is cheaper.
The point of all this is that our increasing dependence on data and our reliance on those who collect it have led us to a position where we're (a) forcing people into boring jobs and (b) thwarting our own ability to innovate as a human race. If life is about the pursuit of happiness, it doesn't look like we're heading in the right direction.
Thursday, May 1, 2008