A new data mining contest is available here. The domain is medical, and there are two tasks. First, we need to predict whether a given patient will be transferred to another hospital. The second task is to predict whether the patient will die (the medical domain definitely lacks fun). For each task, we give every patient a score and rank the patients from the most probable to the least. The dataset poses many challenges. In this post, I propose my personal ideas for handling these challenges.
Read more…
Predicting the number of sales representatives needed at a particular time in a particular store is harder than expected. If you instrument the whole process, you can measure the activity of your representatives (number of customers, average transaction time, activity rate, …). We could then predict the number of representatives required. We know the cost of scheduling too many of them, but what is the cost of having too few? How do we value a missed opportunity, a customer's dissatisfaction with the quality of service, or the behaviour of an overstressed employee?
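To make the trade-off concrete, here is a minimal sketch of how the two costs could be weighed against each other. Everything in it is an assumption for illustration: the function names, the demand samples and above all the two cost figures are hypothetical, and the cost of a missing representative is precisely the number that is hard to estimate in practice.

```python
def staffing_cost(staff, demand, cost_per_rep, cost_per_missed):
    """Cost of scheduling `staff` representatives when `demand` were needed.

    cost_per_rep: hypothetical cost of one idle representative.
    cost_per_missed: hypothetical cost of one missing representative
    (missed sales, unhappy customers, stressed employees).
    """
    idle = max(staff - demand, 0)
    short = max(demand - staff, 0)
    return idle * cost_per_rep + short * cost_per_missed


def best_staffing(demand_samples, cost_per_rep, cost_per_missed):
    """Return the staffing level with the lowest average cost over past demand."""
    candidates = range(min(demand_samples), max(demand_samples) + 1)

    def average_cost(staff):
        total = sum(
            staffing_cost(staff, demand, cost_per_rep, cost_per_missed)
            for demand in demand_samples
        )
        return total / len(demand_samples)

    return min(candidates, key=average_cost)


# Made-up observations: representatives actually needed in five past time slots.
demand_samples = [2, 3, 3, 4, 6]

# Same cost on both sides, then a shortage assumed three times as costly.
print(best_staffing(demand_samples, cost_per_rep=1.0, cost_per_missed=1.0))
print(best_staffing(demand_samples, cost_per_rep=1.0, cost_per_missed=3.0))
```

With these made-up numbers the recommended staffing rises once a shortage is assumed to cost more than an idle representative, which is why the value given to a missed opportunity matters so much.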
Read more…

Programming Collective Intelligence is a great book. It covers most of the common data mining algorithms and presents many applications for them. It covers clustering (k-means, hierarchical), supervised classification (k-nearest neighbours, Naïve Bayes, decision trees, SVM), data analysis (non-negative matrix factorization) and optimisation (hill climbing, simulated annealing and genetic algorithms), and ends with genetic programming. Along the way, it presents applications such as spam detection, pricing, recommendation, … If you want to get started in data mining, this is a very good way to do it.
Read more…