Archive

Posts Tagged ‘data mining’

About evaluation

December 6th, 2009

When deploying a model, one very important thing is to monitor its results. Does it work as you expected? I’m not talking about pre-production tests but about following the life of your model once it is in use. I use two kinds of reports for that: preventive reports and corrective reports. As you might expect, the first kind is created just after the prediction and the second after the consequence of the prediction is known.
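To make the preventive/corrective split concrete, here is a minimal sketch for a binary classifier that outputs scores between 0 and 1. The excerpt does not say what goes into each report, so the contents are assumptions: the preventive report only compares the new predictions with a reference mean score taken from validation (`reference_mean` and `tolerance` are hypothetical parameters), while the corrective report computes accuracy, precision and recall once the real outcomes are available.

```python
from statistics import mean

def preventive_report(scores, reference_mean, tolerance=0.1, threshold=0.5):
    # Built right after prediction: outcomes are still unknown, so we can only
    # check that the predictions look like what we saw during validation.
    current_mean = mean(scores)
    return {
        "n": len(scores),
        "mean_score": current_mean,
        "positive_rate": sum(s >= threshold for s in scores) / len(scores),
        "drift_alert": abs(current_mean - reference_mean) > tolerance,
    }

def corrective_report(scores, outcomes, threshold=0.5):
    # Built once the consequences are known: compare predictions with reality.
    preds = [int(s >= threshold) for s in scores]
    pairs = list(zip(preds, outcomes))
    tp = sum(p == 1 and y == 1 for p, y in pairs)
    fp = sum(p == 1 and y == 0 for p, y in pairs)
    fn = sum(p == 0 and y == 1 for p, y in pairs)
    correct = sum(p == y for p, y in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical scores from one batch, and the outcomes observed later
scores = [0.9, 0.2, 0.7, 0.4]
print(preventive_report(scores, reference_mean=0.35))
print(corrective_report(scores, outcomes=[1, 0, 0, 1]))
```

The preventive report can raise an alarm (here, a shift in the mean score) days or weeks before the corrective report is able to confirm or refute it.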

Read more…

Categories: Thoughts

Machine learning vs simulation

October 29th, 2009

Lately, I have been thinking about the difference between machine learning and simulation (for prediction). Machine learning uses historical inputs and outputs to predict subsequent outputs. Simulation, on the other hand, assumes you already have the knowledge, i.e. the underlying model, so you don’t need historical data to learn it. Sometimes both methods can be used to find something out; sometimes only one is available. After thinking about it, I find that the distinction between them is thinner than I thought.
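A toy sketch of the contrast, under assumptions of my own: the "real" system is a hypothetical process `y = 2x + 1` plus Gaussian noise. The simulation side is given that model and simply runs it many times; the machine learning side never sees the model, only 500 hypothetical historical `(x, y)` pairs, and fits an ordinary least-squares line to them.

```python
import random

def true_process(x, rng):
    # Hypothetical "real" system: y = 2x + 1 plus Gaussian noise (sd 0.5)
    return 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)

def simulate(x, n_runs=10_000, seed=0):
    # Simulation: the underlying model is assumed known, so we run it many
    # times and average the outcomes; no historical data is needed.
    rng = random.Random(seed)
    return sum(true_process(x, rng) for _ in range(n_runs)) / n_runs

def learn_and_predict(history, x):
    # Machine learning: only past (input, output) pairs are available,
    # so we fit an ordinary least-squares line y = a*x + b to them.
    n = len(history)
    mean_x = sum(hx for hx, _ in history) / n
    mean_y = sum(hy for _, hy in history) / n
    cov = sum((hx - mean_x) * (hy - mean_y) for hx, hy in history)
    var = sum((hx - mean_x) ** 2 for hx, _ in history)
    a = cov / var
    b = mean_y - a * mean_x
    return a * x + b

# Hypothetical historical data produced by the same system
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(500)]
history = [(x, true_process(x, rng)) for x in xs]

sim_pred = simulate(4.0)
ml_pred = learn_and_predict(history, 4.0)
print(f"simulation: {sim_pred:.3f}, machine learning: {ml_pred:.3f}")
```

Both predictions land close to 9, the expected value at `x = 4`. With enough history, learning recovers roughly the model that simulation takes as given, which is one way to see why the boundary between the two feels thin.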

Read more…

Categories: Thoughts

INFORMS Data Mining Contest Part 1

August 16th, 2009

A new data mining contest is available. The functional domain is medical; more precisely, there are two tasks. First, we need to predict whether a given patient will be transferred to another hospital. The second task is to predict whether the patient will die (the medical domain definitely lacks fun). For each task, we must score the patients so that they are ranked from the most probable to the least probable. The dataset raises many challenges. In this post, I propose my personal ideas for handling them.
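Because each task asks for a ranking rather than hard yes/no answers, a ranking metric is the natural way to check a submission locally. The excerpt does not name the contest's metric, so using the area under the ROC curve (AUC) here is an assumption, and the patient ids, scores and labels below are hypothetical. AUC is computed directly from its definition: the share of (positive, negative) patient pairs in which the positive patient is ranked higher.

```python
def rank_patients(scores):
    # scores: patient id -> predicted probability (e.g. of transfer or death).
    # Returns patient ids ordered from most to least probable.
    return sorted(scores, key=scores.get, reverse=True)

def auc(scores, labels):
    # Share of (positive, negative) pairs where the positive patient gets the
    # higher score; ties count as half. 1.0 is a perfect ranking, 0.5 is random.
    pos = [scores[p] for p in scores if labels[p] == 1]
    neg = [scores[p] for p in scores if labels[p] == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores and outcomes for four patients
scores = {"p1": 0.9, "p2": 0.2, "p3": 0.7, "p4": 0.4}
labels = {"p1": 1, "p2": 0, "p3": 0, "p4": 1}
print(rank_patients(scores))  # ['p1', 'p3', 'p4', 'p2']
print(auc(scores, labels))    # 0.75
```

The pairwise loop is quadratic and only meant to show the definition; on a full patient table, a sort-based computation or a library implementation would be preferable.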

Read more…

Book review: Programming Collective Intelligence

August 1st, 2009


Programming Collective Intelligence is a great book. It covers most of the classic data mining algorithms and shows how to apply them: clustering (k-means, hierarchical), supervised classification (k-nearest neighbours, Naïve Bayes, decision trees, SVM), data analysis (non-negative matrix factorization) and optimisation (hill climbing, simulated annealing and genetic algorithms), and it ends with genetic programming. Along the way, it presents applications such as spam detection, pricing and recommendation. If you want to get started in data mining, this book is a very good way in.

Read more…

How to: What to do when your model fails?

Sometimes (well, most of the time), your favorite data mining methods and the most obvious attributes are not good enough. What should you do then? A common reaction is to try every other model your software provides and/or add every attribute you can think of, whatever its relation to your problem. In this post, I will try to lay out a kind of “how to” for this case.

Step 1: What is my model?

If your model is a neural network, it’s quite hard to get any insight into how it works by looking at its weights or activation functions. How could you improve something you don’t understand?
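By contrast with a neural network’s weights, some models expose their logic directly. As a minimal illustration (my own choice, not necessarily what the full post recommends), the sketch below fits a one-rule decision stump: it tries every "attribute > threshold" rule and its negation, and keeps the one with the fewest training errors. The attributes `age` and `visits` and the four rows are hypothetical.

```python
def fit_stump(rows, labels, attributes):
    # Exhaustively tries "attribute > threshold" rules (and their negation)
    # and keeps the first one with the fewest training errors.
    best = None
    for attr in attributes:
        for threshold in sorted({row[attr] for row in rows}):
            for positive_above in (True, False):
                errors = sum(
                    ((row[attr] > threshold) == positive_above) != bool(y)
                    for row, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, attr, threshold, positive_above)
    return best

# Hypothetical training data
rows = [
    {"age": 30, "visits": 2},
    {"age": 45, "visits": 5},
    {"age": 60, "visits": 1},
    {"age": 72, "visits": 4},
]
labels = [0, 0, 1, 1]

errors, attr, threshold, positive_above = fit_stump(rows, labels, ["age", "visits"])
direction = ">" if positive_above else "<="
print(f"predict 1 when {attr} {direction} {threshold} ({errors} training errors)")
# predict 1 when age > 45 (0 training errors)
```

The learned model is a sentence you can read, question and challenge, which is exactly what the weights of a neural network do not offer.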

Read more…

Categories: How To