This is another non-technical post – actually, it’s even a personal one. If you’ve been to this blog lately, you may have noticed something, or rather the absence of something: no new posts for nearly six months …
I’ve been busy catching up, reading up, immersing myself in things I’ve been interested in for a long time – but which I never imagined could have a real relation to, let alone become part of, my professional life.
But recently, things changed. I’ll now be trying to do “for real” what would have seemed just a dream a year ago: data science, machine learning, applied statistics (Bayesian statistics, preferably).
Doing Data Science … why?
Well, while this may look like quite a change of interests to a reader of this blog, it really is not. I’ve been interested in statistics, probability and “data mining” (as it was called at the time) since long before I even wound up in IT. Actually, I have a diploma in psychology, and I’ve never studied computer science (which of course I’ve often regretted, having missed so many fascinating things).
Sure, at that time, in machine learning, much of the interesting stuff was there, too. Neural nets were there, of course. But that was before the age of big data and the boost distributed computing brought to machine learning, before craftsman-like “data mining” became sexy “data science”…
Those were the olden days, when statistics (in psychology, at least) was (1) ANOVA, (2) ANOVA, and (3) … you name it. Whereas today, students (if they are lucky) might be learning statistics from a book like Richard McElreath’s “Statistical Rethinking” (http://xcelab.net/rm/statistical-rethinking/).
That was before the advent of deep learning, which fundamentally changed not just what seems possible but also the way it is approached. Take natural language processing, for example (just check out the materials for Stanford’s Deep Learning for Natural Language Processing course for a great introduction).
While I’m at it … where some people see it as “machine learning versus statistics”, or “machine learning instead of statistics”, for me there’s no antagonism there. Perhaps that’s because of my bio. For me, some of the books I admire most – especially the very well-written, very accessible ISLR (An Introduction to Statistical Learning) and its big brother, The Elements of Statistical Learning – are the perfect synthesis.
Returning to the original topic – I’ve even wondered whether I should start a new blog on machine learning and data science, to avoid people asking the “why data science” question above. But then, your bio is something you can never undo; all you can do is change the narrative, try to make the narrative work. The narrative works fine for me, and I hope I’ve made it plausible to the outside world, too 😉 .
(BTW I’m lucky with the blog title I chose, a few years ago – no need to change that (see https://en.wikipedia.org/wiki/Markov_chain))
And probably, it doesn’t hurt for a data scientist to know how to get data from databases, how to manipulate it in various programming languages, and quite a bit about the IT architectures behind it.
OK, that was the justification. The other question now is …
Doing Data Science … how?
Well, luckily, I’m not isolated at all with these interests at Trivadis. We’ve already had a strong focus on big data and streaming analytics for quite some time (just see the blog of my colleague Guido, who is an internationally renowned expert on these topics), but now there’s also a highly motivated group of data scientists ready to turn data into insight 🙂 .
If you’re reading this you might be a potential customer, so I can’t finish without a sales pitch:
It’s not about the tools. It’s not about the programming languages you use (though some make it easier than others, and I decidedly like the friendly and inspiring open source Python and R ecosystems). It’s about discovering patterns, detecting underlying structure, uncovering the unknown. About finding out what you didn’t (necessarily) hypothesize before. And most importantly: about assessing whether what you found is valid and will generalize, to the future, to different conditions. Whether what you found is “real”. There’s a lot more to it than looking at extrapolated forecast curves.
Before I end the sales pitch, let me say that in addition to our consulting services we also offer courses on getting started with Data Science, using either R or Python (see Data Science with Python and Advanced Analytics with R). The two courses combine perfectly, as they work with different data sets and build up their own narratives.
OK, I think that’s it for narratives. Upcoming posts will be technical again, only this time “technical” will mostly mean: on machine learning and data science.