Big data benchmark: Impala vs HAWQ vs Hive

September 4th, 2013

With the recent release of Pivotal HD, I wanted to check the current state of Hadoop SQL engines. SQL integration is growing in the Hadoop landscape, and that is a good thing for productivity and integration. Despite being old and bloated, SQL still has no competition when it comes to data manipulation. I downloaded the latest single-node VMs from Cloudera and Pivotal and ran some tests. The results were quite interesting.


Categories: News, Thoughts

Book Review: Assessing and Improving Prediction and Classification

In the age of big data, predictive analytics is all around us. You have probably heard of the Netflix Prize, which offered US$1,000,000 to anyone who could beat Netflix's recommendation engine by 10%. The winning algorithm was based on a combination of many classifiers. Assessing and Improving Prediction and Classification essentially explains how to build an even better algorithm. It specializes in assessing classifiers and in combining many classifiers (or training many versions of one).
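Majority voting is one of the simplest ways to combine several classifiers into one. The minimal sketch below assumes each classifier is a plain Python callable mapping a feature vector to a label; the function name `majority_vote` and the three threshold rules are hypothetical toy examples, not the method used by the Netflix Prize winners or the specific techniques described in the book.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical interface: a classifier maps a feature vector to a label.
Classifier = Callable[[Sequence[float]], Hashable]


def majority_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> Hashable:
    """Return the label predicted by the most classifiers.

    Ties are broken in favour of the label that was predicted first,
    because Counter.most_common keeps first-encountered order for equal counts.
    """
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy ensemble: three threshold rules on the first feature (illustrative only).
clf_a: Classifier = lambda x: "spam" if x[0] > 0.5 else "ham"
clf_b: Classifier = lambda x: "spam" if x[0] > 0.7 else "ham"
clf_c: Classifier = lambda x: "spam" if x[0] > 0.3 else "ham"

ensemble = [clf_a, clf_b, clf_c]
```

With an input of `(0.6,)`, two of the three rules vote "spam", so the ensemble outputs "spam"; with `(0.4,)`, two vote "ham". More elaborate schemes weight each vote by the classifier's measured accuracy.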


Categories: Book review

Hadoop landscape review 2013

April 25th, 2013

I’ve spent some time lately digging into the Hadoop ecosystem, through both a product survey and some hands-on work. Here are some remarks about the state of Hadoop in April 2013. I’ve played with Greenplum HD 1.2 and CDH4.2 and read a lot about Hadoop and peripheral products.


Categories: Tools

Hadoop is dead thanks to EMC, long live Hadoop

February 25th, 2013


Today EMC announced the launch of Pivotal HD, their new version of Greenplum HD. Most of the underlying details are not known, and neither is the pricing, but it’s really a game changer. Technologically, it’s just Greenplum DB on top of HDFS. In fact, we can say it’s just Greenplum DB. It features an awesome integration with HDFS, but that’s it. I believe Greenplum DB will be faster than Pivotal HD on a traditional data warehouse workload. Nothing new here, then. For Hadoop, however, it’s a new day.


Categories: News

Price deal analysis

February 16th, 2013


Price deals are a common way to boost sales, and Steam has turned them into an art form. There is always a lot of stuff to buy at -50% or even -75%. But is it really that simple? Should you just cut prices to increase profitability? How can you really assess the impact of such a price deal campaign? Well, it’s not that easy, and you should rely on analytics to really understand what is happening.
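A back-of-the-envelope calculation shows why a price cut is not automatically profitable. Under a deliberately simplified model assumed here (constant unit cost, no cannibalisation of later full-price sales, no other effects of the promotion), keeping total profit unchanged after a discount d requires unit sales to grow by the factor (p - c) / (p(1 - d) - c), where p is the list price and c the unit cost. The function name and the example prices below are hypothetical.

```python
def breakeven_volume_multiplier(price: float, unit_cost: float, discount: float) -> float:
    """Factor by which unit sales must grow for a discount to leave total profit unchanged.

    Simplified model (an assumption, not a full promotion analysis):
    constant unit cost and no effect on future full-price sales.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    margin_before = price - unit_cost
    margin_after = price * (1 - discount) - unit_cost
    if margin_before <= 0:
        raise ValueError("list price must exceed unit cost")
    if margin_after <= 0:
        # Every discounted sale loses money, so no volume increase can break even.
        raise ValueError("discounted price does not cover unit cost")
    return margin_before / margin_after
```

For a digital good with near-zero unit cost, a 50% discount needs twice the volume and a 75% discount four times the volume just to break even. With a hypothetical unit cost of 8 on a list price of 20, the same 50% discount needs six times the volume, which is why the raw sales uplift alone says little about profitability.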
