This weekend was the 2nd matchday of the German Bundesliga, and of course I predicted another nine games. Unfortunately, my predictions have not been very successful so far: last weekend I predicted two of nine winners right, this weekend three of nine. Maybe I have to tune my algorithm better or do some more data work.

I use a Support Vector Machine (SVM) for my predictions. I simply compared three algorithms: SVM, Naïve Bayes and k-nearest neighbor (knn), with seasons 2000/01 to 2010/11 as training data and season 2011/12 as test data. I took all available data into the comparison: teams, matchday, season weight, playing home or away, wins/ties/losses, and the percentage of all possible points and wins. In the training data these last values were taken across the whole season (as a kind of general performance), whereas in the test data they were calculated matchday by matchday, because I have to handle the data this way in the current season, too.
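The matchday-by-matchday calculation can be sketched in a few lines of Python. This is only an illustration, not the code behind this post: the function name `running_points_share`, the result codes `W`/`T`/`L` and the sample results are assumptions made for the example. Points are counted as three for a win and one for a tie.

```python
def running_points_share(results):
    """For each matchday, return the share of all possible points a team
    had collected BEFORE that matchday (None before the first game)."""
    points = {"W": 3, "T": 1, "L": 0}  # three points per win, one per tie
    shares = []
    earned = 0
    for played, result in enumerate(results):
        # Only games that are already finished may go into the feature.
        shares.append(earned / (3 * played) if played else None)
        earned += points[result]
    return shares


# Made-up results of one team over four matchdays.
example = running_points_share(["W", "T", "L", "W"])
print(example)
```

The point of the sketch is that the value for a matchday never includes the result of that matchday itself, which is the situation I am in when predicting an ongoing season.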

An overall comparison of the three algorithms shows that the SVM predicts best:

| Algorithm | Games predicted right (of 306) | Share |
|---|---|---|
| Support Vector Machine (SVM) | 150 | 49.0% |
| Naïve Bayes | 130 | 42.5% |
| k-nearest neighbor (knn) | 90 | 29.4% |
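The percentages follow directly from the counts. A small Python check, where the names `TOTAL_GAMES`, `correct` and `accuracy` are chosen just for this example:

```python
TOTAL_GAMES = 306  # 34 matchdays with 9 games each

# Correctly predicted games per algorithm, taken from the table.
correct = {"SVM": 150, "Naive Bayes": 130, "knn": 90}

# Share of correct predictions in percent, rounded to one decimal place.
accuracy = {name: round(100 * hits / TOTAL_GAMES, 1)
            for name, hits in correct.items()}
print(accuracy)  # {'SVM': 49.0, 'Naive Bayes': 42.5, 'knn': 29.4}
```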

But there are significant differences when comparing how well wins, ties, and losses are predicted:

Comparison Bar Plot

The graphic shows how many wins, ties, and losses each algorithm predicted correctly. Keep in mind that I am predicting the games from the home team’s view, which is why losses are shown separately. Of the 306 games that were predicted, the home teams actually won 139 and lost 88; 79 games ended in a draw.
The SVM works very well in predicting wins, but is really weak at predicting ties. It predicts more losses wrongly than correctly, but compared to the other two algorithms, it still works best. The other two are better at predicting ties, and Naïve Bayes is also quite good at predicting wins. A look at the performance over the whole season makes it clear that the SVM really is superior:
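Counting correct predictions per outcome, as in the bar plot, could look roughly like this in Python. The helper `correct_by_outcome` and the toy lists are made up for the example; they are not my real predictions. The last lines use the real totals from above to show a simple reference point: always tipping a home win would get 139 of 306 games right.

```python
from collections import Counter


def correct_by_outcome(actual, predicted):
    """Count correct predictions separately for each actual outcome."""
    hits = Counter()
    for a, p in zip(actual, predicted):
        if a == p:
            hits[a] += 1
    return hits


# Toy data from the home team's view: W = win, T = tie, L = loss.
actual = ["W", "W", "T", "L", "W", "T"]
predicted = ["W", "W", "W", "L", "T", "W"]
hits = correct_by_outcome(actual, predicted)
print(hits["W"], hits["T"], hits["L"])  # 2 0 1

# Reference point from the real season totals: always predict a home win.
home_wins, total_games = 139, 306
baseline = round(100 * home_wins / total_games, 1)
print(baseline)  # 45.4
```

So the SVM’s 49% is only a few points above what a constant "home win" tip would reach, which fits the picture that ties and losses are the weak spot.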

Comparison Time Plot

The mean indicates the share of games predicted correctly per matchday. All three algorithms go up and down steadily (thin lines), but the smoothed mean (thick lines) shows that SVM and Naïve Bayes improve during the first half of the season. In the 2nd half, the share of correct predictions drops for Naïve Bayes, whereas SVM keeps predicting at a level of about 0.5. And knn? It does not perform very well; even worse, its share of correct predictions decreases more or less throughout the whole season. As a result, I will be working with SVM. The rate of correct predictions is still too low, so I will do some more data work within the next week.
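For the time plot, the thin and thick lines can be approximated as follows. This is a sketch under assumptions: the smoothing here is a plain centred moving average, which is not necessarily the smoother used in the plot, and the function names, window size and hit counts are invented for the example.

```python
def hit_rate_per_matchday(hits_by_matchday, games_per_matchday=9):
    """Share of correctly predicted games for each matchday (thin line)."""
    return [hits / games_per_matchday for hits in hits_by_matchday]


def moving_average(values, window=5):
    """Centred moving average (thick line); the window shrinks at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up number of correct predictions on seven matchdays.
hits_by_matchday = [2, 3, 5, 4, 6, 3, 5]
rates = hit_rate_per_matchday(hits_by_matchday)
smooth = moving_average(rates, window=3)
print([round(value, 2) for value in smooth])
```

A wider window gives a calmer curve but reacts more slowly to real changes during the season.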