Haskell, R, and HaskellR: Combining the best of two worlds (talk at UseR! 2017)

Earlier today, I presented at UseR! 2017 about HaskellR: a great piece of software, developed by Tweag I/O, that lets you use R seamlessly from Haskell.

It was my first UseR!, it was a great experience, and if I had the time, I’d write a separate blog post about it, as there were things that did not quite align with my prior expectations… Food for thought, but not the topic of this post. (Mainly, that post would be about how the academic talks compared to the non-academic ones.)

So, why HaskellR? If you allow me one personal note… For the ex-psychologist, ex-software-developer, ex-database administrator, now “in over my head” data scientist and machine learning/deep learning person that I am (see this post for that story), there has always been one fixed point of interest (an ideal, you could say), and that is the elegance of functional programming. It all started with SICP, which I first read as a (Java) programmer and recently read again (in part) while preparing R 4 hackers, a talk focused in large part on the functional programming features of R.

As a database administrator, unless you’re very lucky, it’s hard to integrate a functional programming language into your work. How about deep learning and/or data science?
For deep learning, there’s Chris Olah’s wonderful blog post linking deep networks to functional programs, but the reality (of widely used frameworks) looks different: TensorFlow, Keras, PyTorch… it’s mostly Python over there, and whatever the niceties of Python (readability, list comprehensions…), writing Python certainly does not feel like writing FP code at all (much less so than writing R!).

So in practice, the connections between data science/machine learning/deep learning and functional programming are scarce. If you look for connections, you will quickly stumble upon the Tweag I/O guys’ work: they’ve not just created HaskellR, they’ve also made Haskell run on Spark, thus enabling Haskell applications to use Spark’s MLlib for large-scale machine learning.

What, then, is HaskellR? It’s a way to seamlessly mix R code and Haskell code, with full interoperability in both directions. You can do that in source files, of course, but you can also quickly play around in the interpreter, appropriately called H (no, I was not thinking of its addictive potential here ;-)), and even use Jupyter notebooks with HaskellR! In fact, that’s what I did in the demos.

If you’re interested in the technicalities of the implementation, you’ll find them documented in great detail on the HaskellR website (and, in even more detail, in their IFL 2014 paper), but otherwise I suggest you take a look at the demos from my talk: first, there’s a notebook showing how to use HaskellR, how to get values from Haskell to R and vice versa, and then there’s the trading app scenario notebook. Suppose you have a trading app written in Haskell – it’s gotta be lightning fast and as bug-free as possible, right?
But how about nice visualizations, time series diagnostics, all kinds of sophisticated statistical and machine learning algorithms… Chances are, someone’s implemented that algorithm in R already! Let’s take ARIMA – one line of code with auto.arima() from Rob J. Hyndman’s forecast package! Visualization? ggplot2, of course! And last but not least, there’s an easy way to do deep learning with R’s keras package (interfacing to Python Keras).
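
To give a flavour of the R side of that scenario, here’s a minimal sketch of my own (not the notebook code), using the built-in AirPassengers series as a stand-in for actual trading data:

library(forecast)   # Rob J. Hyndman's forecast package
library(ggplot2)
library(keras)      # R interface to Python Keras

# automatic ARIMA model selection in one line ...
fit <- auto.arima(AirPassengers)

# ... and a ggplot2-based plot of a two-year forecast in another
autoplot(forecast(fit, h = 24))

# and a tiny Keras network, defined entirely from R
model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(10)) %>%
  layer_dense(units = 1)

In the notebooks, R snippets like these are embedded on the Haskell side via HaskellR’s quasiquotes.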

Besides the notebooks, you might also want to check out the slides, especially if you’re an R user who hasn’t had much contact with Haskell. Ever wondered why the pipe looks the way it looks, or what the partial and compose functions are doing?
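
As a tiny teaser – a minimal sketch of my own, not the slides themselves: the pipe is nothing more than left-to-right function application, which is exactly what makes it feel so close to function composition.

library(magrittr)

# these two expressions are equivalent: the pipe just unnests the calls
round(exp(sqrt(16)), 2)
16 %>% sqrt() %>% exp() %>% round(2)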

Last but not least, a thousand thanks to the guys over at Tweag I/O, who’ve been incredibly helpful in getting the whole setup to run (the best way to get it up and running on Fedora is using Nix, which I didn’t have any prior experience with… just as a second level of parentheses, I think I’d like to know more about Nix – the package manager and the OS – now, too ;-)). This is really the great thing about open source: the cool stuff people build and how helpful they are! So thanks again, guys – I hope to be doing things “at the interface” of ML/DL and FP more often in the future!

Update:
The talk was recorded, and can be viewed here.

Time series prediction – with deep learning

More and more often, and in more and more different areas, deep learning is making its appearance in the world around us.
Many small and medium businesses, however, will probably still think – Deep Learning, that’s for Google, Facebook & co., for the guys with big data and even bigger computing power (barely resisting the temptation to write “yuge power” here).

Partly this may be true – certainly when it comes to running through immense permutations of hyperparameter settings. The question, however, is whether we can’t obtain good results at more modest scale, too – in areas where traditional methods of data science / machine learning prevail. Prevail as of today, that is.

One such area is time series prediction, with ARIMA & co. at the top of the leaderboard. Can deep learning be a serious competitor here? In what cases? Why? Exploring this is like starting out on an unknown road, fascinated by the magical things that may await us 😉
In any case, I’ve started walking down the road (not running!), in a rather take-your-time-and-explore-the-surroundings way. That means there’s much still to come, and it’s really just a beginning.

Here, anyway, is the travel report – the presentation slides, I mean: best viewed on RPubs, as RMarkdown on github, or downloadable as pdf.
Enjoy!

Deep Learning in Action (the less mathy version, this time)

On Tuesday at Hochschule München, Fakultät für Informatik und Mathematik, I again gave a guest lecture on Deep Learning (RPubs, github, pdf). This time, it was more about applications than about matrices, more about general understanding than about architecture, and just in general about getting a feel for what deep learning is used for and why. (Deep reinforcement learning also made a short appearance in there. Reinforcement learning certainly is another topic to post and/or present about, another time…)

I’ve used a lot of different sources, so I’ve put them all at the end, to make the presentation more readable. (Not only have I used lots of different sources, I’ve also used a few sources a lot. In deep learning, I find myself citing the same sources over and over – be it for the concise explanations, the great visualizations, or the inspiring ideas. Mainly thinking of Chris Olah’s and Andrej Karpathy’s blogs here, of the Deep Learning book, and of several Stanford lecture notes.)

One thing that always gets lost when you publish a presentation is the demos. In this case, I had three demos:

The first two are great sites that allow you to demonstrate the very basics of neural networks directly in the browser: When do you need hidden layers? What role does the form of the dataset play? In what cases can adding a single neuron make a difference between failing at, or successfully solving, a task?
The third demo is just – I think – totally fun: Would you have known that you can play around with your own convolution kernels, just like that, in GIMP? 😉

R 4 hackers

Yesterday at Trivadis Tech Event, I talked about R for Hackers. It was the first session slot on Sunday morning, it was a crazy, nerdy topic, and yet there were, like, 30 people attending! An emphatic thank you to everyone who came!

R, a crazy, nerdy topic – why that, you’ll be asking? What’s so nerdy about using R?
Well, it was about R. But it was neither an introduction (“how to get things done quickly with R”), nor was it even about data science. True, you do get things done super efficiently with R, and true, R is great for data science – but this time, it really was about R as a language!

Because as a language, too, R is cool. In contrast to most object-oriented languages, it (at least in its most widely used OO system, S3) uses generic-function OO, not message-passing OO (OK, I don’t know if this is cool, but it’s really instructive to see how little you need to implement an OO system!).
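
To see just how little that is, here’s a minimal sketch (my own toy example, not the talk’s code): a class is nothing but an attribute, and methods are ordinary functions that a generic dispatches to via UseMethod().

# an "object" is just a vector carrying a class attribute
temps <- structure(c(21.5, 23.0, 19.8), class = "temperature")

# a generic dispatches on the class of its first argument ...
describe <- function(x, ...) UseMethod("describe")

# ... to plain functions named <generic>.<class>
describe.temperature <- function(x, ...) cat("temperatures with mean", mean(x), "\n")
describe.default     <- function(x, ...) print(summary(x))

describe(temps)   # dispatches to describe.temperature(); no message passing anywhere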

What definitely is cool, though, is that R is, to quite some extent, a functional programming language! Even using base R, you can write in a functional style, and then there’s Hadley Wickham’s purrr, which implements things like function composition and partial application.
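
To make that concrete, here’s a minimal sketch (again my own, not the talk’s code) of partial application and composition with purrr:

library(purrr)

# partial application: pre-fill an argument of an existing function
mean_no_na <- partial(mean, na.rm = TRUE)
mean_no_na(c(1, 2, NA, 4))    # 2.333333

# function composition: compose(f, g)(x) is f(g(x))
root_of_abs <- compose(sqrt, abs)
root_of_abs(-16)              # 4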

Finally, the talk goes into base object internals – closures, builtins, specials… and it ends with a promise … 😉
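
If those terms sound mysterious, here’s a quick taste (a minimal sketch of my own, not the slides’ code):

# three kinds of functions under the hood
typeof(mean)    # "closure" - an R-level function with formals, body and environment
typeof(sum)     # "builtin" - implemented in C, arguments evaluated eagerly
typeof(`if`)    # "special" - implemented in C, arguments passed unevaluated

# a closure captures its enclosing environment ...
make_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1
    i
  }
}
count <- make_counter()
count()   # 1
count()   # 2

# ... and a promise means an argument is only evaluated when it's actually used
f <- function(x, y) x
f(42, stop("never evaluated"))   # returns 42; the error never happens
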
So, here’s the talk: rpubs, pdf, github. Enjoy!

Deep Learning in Action

On Wednesday at Hochschule München, Fakultät für Informatik und Mathematik, I presented about Deep Learning (nbviewer, github, pdf).

Mainly concepts (what’s “deep” in Deep Learning, backpropagation, how to optimize …) and architectures (Multi-Layer Perceptron, Convolutional Neural Network, Recurrent Neural Network), but also demos and code examples (mainly using TensorFlow).

It was/is a lot of material to cover in 90 minutes, and conceptual understanding / developing intuition was the main point. Of course, there is great online material to make use of, and you’ll see my preferences in the cited sources ;-).

Next year, having covered the basics, I hope to develop use cases and practical applications showing the applicability of Deep Learning even in non-Google-sized (or Facebook-, Baidu-, Apple-sized…) environments.
Stay tuned!

Analyse de sentiments de critiques cinématographiques – version française

This is just an ultra short post saying that last Tuesday, I had the honor of presenting my “Sentiment Analysis of Movie Reviews” talk at Swiss Data Forum – in French 😉 Thanks again guys for having me, and for your patience 🙂

So here’s a link to the French version of the talk – all the magic of word2vec and doc2vec, in French :-) Enjoy!

Sentiment Analysis of Movie Reviews (3): doc2vec

This is the last – for now – installment of my mini-series on sentiment analysis of the Stanford collection of IMDB reviews.
So far, we’ve had a look at classical bag-of-words models and word vectors (word2vec).
We saw that, of the classifiers used, logistic regression performed best, be it in combination with bag-of-words or word2vec.
We also saw that while the word2vec model did in fact capture semantic dimensions, it was less successful for classification than bag-of-words, and we attributed that to the averaging of word vectors we had to perform to obtain input features at the review (not word) level.
So the question now is: How would distributed representations perform if we did not have to throw away information by averaging word vectors?
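
(Just to recap what that averaging looked like: for a review of $N$ words $w_1, \dots, w_N$ with word vectors $v_{w_i}$, the feature vector passed to the classifier was simply

$$v_{\text{review}} = \frac{1}{N} \sum_{i=1}^{N} v_{w_i},$$

which discards word order and blurs the individual word vectors together.)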

Document vectors: doc2vec

Shortly after word2vec, Le and Mikolov developed paragraph (document) vector models.
The basic models are

  • Distributed Memory Model of Paragraph Vectors (PV-DM) and
  • Distributed Bag of Words (PV-DBOW)

In PV-DM, in addition to the word vectors, there is a paragraph vector that keeps track of the whole document:

Fig.1: Distributed Memory Model of Paragraph Vectors (PV-DM) (from: Distributed Representations of Sentences and Documents)

With distributed bag-of-words (PV-DBOW), there aren’t even any word vectors; there’s just a paragraph vector, trained to predict words sampled from the document:

Fig.2: Distributed Bag of Words (PV-DBOW) (from: Distributed Representations of Sentences and Documents)
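
For reference, here’s the gist of the two objectives, sketched from the paper with simplified notation: for a document $d$ with words $w_1, \dots, w_T$, PV-DM maximizes the average log probability of a word given its surrounding words plus the paragraph vector, while PV-DBOW predicts words from the paragraph vector alone,

$$\text{PV-DM:} \;\; \frac{1}{T} \sum_{t=k}^{T-k} \log p(w_t \mid w_{t-k}, \dots, w_{t+k}, d) \qquad \text{PV-DBOW:} \;\; \frac{1}{T} \sum_{t} \log p(w_t \mid d),$$

with the probabilities given by a softmax over the vocabulary (in practice, hierarchical softmax or negative sampling).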

Like word2vec, doc2vec in Python is provided by the gensim library. Please see the gensim doc2vec tutorial for example usage and configuration.

doc2vec: performance on sentiment analysis task

I’ve trained three models, with parameter settings as in the above-mentioned doc2vec tutorial: two distributed memory models (with word and paragraph vectors averaged or concatenated, respectively), and one distributed bag-of-words model. Here, without further ado, are the results. I’m only reporting results for logistic regression since, again, this was the best-performing classifier:

model (accuracy, logistic regression)              test vectors inferred    test vectors from model
Distributed memory, vectors averaged (dm/m)                 0.81                     0.87
Distributed memory, vectors concatenated (dm/c)             0.80                     0.82
Distributed bag of words (dbow)                             0.90                     0.90

Hoorah! We’ve finally beaten bag-of-words … but only by a tiny little 0.1 percent, and we won’t even ask if that’s significant 😉
What should we conclude from that? In my opinion, there’s no reason to be sarcastic here (even if you might have thought I’d made it sound like that ;-)). With doc2vec, we’ve (at least) reached bag-of-words performance for classification, plus we now have semantic dimensions at our disposal. Speaking of which – let’s check what doc2vec thinks is similar to awesome/awful. Will the results be equivalent to those obtained with word2vec?
These are the words found most similar to awesome (note: the model we’re asking here isn’t the one that performed best with logistic regression (PV-DBOW), as distributed bag-of-words doesn’t train word vectors – this is instead obtained from the best-performing distributed memory model, dm/m):

model.most_similar('awesome', topn=10)

[(u'incredible', 0.9011116027832031),
(u'excellent', 0.8860622644424438),
(u'outstanding', 0.8797732591629028),
(u'exceptional', 0.8539372682571411),
(u'awful', 0.8104138970375061),
(u'astounding', 0.7750493884086609),
(u'alright', 0.7587056159973145),
(u'astonishing', 0.7556235790252686),
(u'extraordinary', 0.743841290473938)]

So, what we see is very similar to the output of word2vec – right down to the inclusion of awful. Same for what’s judged similar to awful:

model.most_similar('awful', topn=10)

[(u'abysmal', 0.8371909856796265),
(u'appalling', 0.8327066898345947),
(u'atrocious', 0.8309577703475952),
(u'horrible', 0.8192445039749146),
(u'terrible', 0.8124841451644897),
(u'awesome', 0.8104138970375061),
(u'dreadful', 0.8072893023490906),
(u'horrendous', 0.7981990575790405),
(u'amazing', 0.7926105260848999), 
(u'incredible', 0.7852109670639038)]

To sum up – for now – we’ve explored how three models – bag-of-words, word2vec, and doc2vec – perform on sentiment analysis of IMDB movie reviews, in combination with different classifiers, the most successful of which was logistic regression. Very similar performance (around a 10% error rate) was reached by bag-of-words and doc2vec.
From this you may of course conclude that, as of today, there’s no reason not to stick with the straightforward bag-of-words approach. But you could also view this differently. Word2vec appeared in 2013 and was already followed by doc2vec in 2014. Now it’s 2016, and things have happened in the meantime – and are happening right now. It’s a fascinating field, and even if – in sentiment analysis – we don’t see impressive results yet, impressive results are quite likely to appear sooner or later. I’m curious what we’re going to see!