
Programming Collective Intelligence is a great book. It covers most of the common data mining algorithms and presents many applications for them: clustering (k-means, hierarchical), supervised classification (k-nearest neighbours, Naïve Bayes, decision trees, SVM), data analysis (non-negative matrix factorization), optimisation (hill climbing, simulated annealing and genetic algorithms), and it ends with genetic programming. Along the way, it presents applications such as spam detection, pricing, recommendation, … If you want to get started in data mining, this is a very good way to do it.

Actionable Web Analytics: Using Data to Make Smart Business Decisions is a marketing book in the vein of Competing on Analytics: a lot of sentences to say simple things, and here there are even some copy-and-pasted passages. Nevertheless, this kind of book is sometimes interesting, and this one is, to some extent.

Now it’s time to create some clusters from our Twitter data. In this post, we focus only on biographical tags and use the good old k-means algorithm to find significant clusters. At least, we hope so.
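To give a rough idea of what "k-means on biographical tags" involves, here is a minimal sketch, not the code used in the post. It assumes each user is reduced to a set of bio tags, encodes those sets as binary vectors over the shared tag vocabulary, and clusters them with plain Euclidean k-means. The users, tags, function names (`to_vectors`, `kmeans`) and the choice of k are hypothetical.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def to_vectors(bio_tags: Dict[str, Set[str]]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Turn each user's set of bio tags into a binary vector over the shared vocabulary."""
    users = sorted(bio_tags)
    vocab = sorted(set().union(*bio_tags.values()))
    vectors = [[1.0 if tag in bio_tags[u] else 0.0 for tag in vocab] for u in users]
    return users, vocab, vectors


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors: List[List[float]], k: int, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Plain k-means: returns the cluster index assigned to each vector."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input vectors picked at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every vector to its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: distance(v, centroids[c])) for v in vectors]
        if new_assignment == assignment:
            break  # no vector changed cluster: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical bio tags for four users.
bios = {
    "alice": {"python", "datamining", "ml"},
    "bob": {"python", "ml", "statistics"},
    "carol": {"coffee", "travel", "photography"},
    "dave": {"travel", "photography", "music"},
}
users, vocab, vectors = to_vectors(bios)
labels = kmeans(vectors, k=2, seed=1)
for user, label in zip(users, labels):
    print(user, label)
```

The cluster numbers themselves are arbitrary; what matters is which users end up together. On real Twitter bios the vocabulary is much larger and sparser, so a cosine distance or a tf-idf weighting would likely be worth trying instead of raw binary vectors.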
The last book I read was Collective Intelligence in Action by Satnam Alag (Manning). It covers data mining from a Web 2.0 perspective. Data is generated by users in many forms (ratings, tags, blogs, web pages, …), and such data is not well defined: a user can create a new tag like gloupy without telling you what it means. There are also some text mining issues: how do you understand the meaning of a sentence?
The book is divided into three parts. The first (half of the book) describes the data and, more specifically, how to collect it (web crawling, blog trackers). The second part is about exploiting the data, i.e. data mining (clustering and prediction); there is also a chapter on converting text into tokens. The last part covers example applications: building an intelligent search engine or a recommendation engine (with an interesting discussion of the Amazon, Google News and Netflix solutions).
Being based on Java code, the book relies on libraries such as Nutch for web crawling, Lucene for text handling and Weka for data mining. I think there is too much Java code in the book: it gets boring and you easily end up skipping pages. For instance, the book implements k-means three times, with hand-written code, with Weka and with JDM (a Java data mining API). Seeing the same thing three times seems rather pointless.
Nevertheless, I found this book very interesting and a very good introduction to web mining, an area I know little about.