“Data scientists solve complex data problems through employing deep expertise in some scientific discipline. It is generally expected that data scientists are able to work with various elements of mathematics, statistics and computer science, although expertise in these subjects are not required. However, a data scientist is most likely to be an expert in only one or two of these disciplines and proficient in another two or three. Therefore data science is practiced as a team, where the membership of the team have a variety of expertise.” – Wikipedia Reference
The “information explosion” started around 1941. (see reference) Believe it or not, the phrase “data scientist” was coined in 1960 by Peter Naur (Wikipedia Peter Naur) as an interchangeable noun with “computer scientist”. However, the term wasn’t tied directly to statistics until 1997 when, in his inaugural lecture for the H. C. Carver Collegiate Professorship in Statistics at the University of Michigan, Professor C. F. Jeff Wu argued that statistics should be renamed data science. The name of his lecture was: “Statistics = Data Science?”
It was a very modest beginning to something which has exploded into an entirely new profession today.
In 2006 I was 10 years into my career in data management, manipulation, and adoration. I also had a liberal arts education in political science – clearly unrelated to data science. I had no idea what data science was, but I knew I loved data.
In that year a colleague of mine and I sat down with a quad-processor server, MS SQL (some antiquated version), and a terabyte of data getting moved around every night, and tried to figure out what to do with it. To our knowledge there was no name for what we were doing. It was really more a pain in the proverbial … well, you know what I mean … for the company we were consulting with. They knew there was value in that data, but at that point there was no popular concept of what that value was. At least not in our little corner of the world.
The data came from click events on a few major websites – a LOT of click events – one terabyte a day’s worth. My colleague and I were making great strides in ensuring the four processors were working overtime all night every night (one thread per processor, running in parallel). One day I was discussing the project with a non-technical relative of mine when she stopped me and said in a very startled tone, “You mean you track everything I click on? Creepy!”
Yep, I realized at that moment we were headed right for the last chapter of a George Orwell novel.
Don’t get me wrong. I still love data. I still “practice” data science as my profession and hopefully I get better at it every day (I should be getting better after almost 20 years in this profession). However, that realization was really creepy. We were tracking every click event our customers created on those websites. What else was being tracked in other organizations?
Today we know that everything we touch, every comment we make, every photo we post somewhere is tracked, analyzed and reported on to some organization. I think some of the creepiness has been washed out of the concept simply because we’ve come to accept this “Big Brother” technology as an inevitable part of progress.
The unasked question remains, however: will anything “creep us out” going forward in this scientific practice?
I can think of a few things if I give it enough energy, but I try not to because, like most people in this profession, I love the story data tells. It is not unlike the feeling of getting into a new car every day and smelling that “new car smell”. It’s something intangible that you cannot define for someone who hasn’t experienced it firsthand. Every day I “get in that new car and inhale”. It is as beautiful a high today as it was in 2006 when I couldn’t sleep at night wondering what story our click event data would eventually reveal.