Archive

Archive for August, 2009

INFORMS Data Mining Contest Part 1

August 16th, 2009

A new data mining contest is available here. The domain is medical, and there are two tasks. The first is to predict whether a given patient will be transferred to another hospital. The second is to predict whether the patient will die (the medical domain definitely lacks fun). For each task, we assign each patient a score, ordering patients from the most likely to the least. The dataset poses many challenges. In this post, I share my personal ideas for handling them.

Read more…

The cost of reducing costs

August 7th, 2009

Predicting the number of sales representatives needed at a particular time in a particular store is harder than expected. If you instrument the whole process, you can measure the activity of your representatives (number of customers, average transaction time, activity rate, …). We could then predict the number of representatives required. We know the cost of scheduling too many of them, but what is the cost of having too few? How do we value a missed opportunity, a customer's dissatisfaction with the quality of service, or the behaviour of an overstressed employee?

Read more…

Categories: Thoughts

Book review: Programming Collective Intelligence

August 1st, 2009

Programming Collective Intelligence is a great book. It covers most of the mainstream data mining algorithms and presents many applications for them. It covers clustering (k-means, hierarchical), supervised classification (k-nearest neighbours, Naïve Bayes, decision trees, SVM), data analysis (non-negative matrix factorization), optimisation (hill climbing, simulated annealing and genetic algorithms) and ends with genetic programming. Along the way, it presents applications such as spam detection, pricing, recommendation, … If you want to get started in data mining, this is a very good place to begin.

Read more…