
Posts Tagged ‘data manipulation’

Data Manipulation Part 2 : ETL

September 2nd, 2011

My last post discussed SQL queries. Sometimes, however, data comes from different databases. In such cases, it is no longer possible to use SQL. ETL tools (ETL stands for Extract, Transform, Load) are designed to make data transformations easy. So far I have used three tools: Talend, SAP Business Objects Data Integrator, and Kettle. I will review them and share one or two tips I’ve learned from using ETL tools.


Data Manipulation Part 1 : SQL

Data manipulation is a big part of the data mining process. Some authors claim it can take 80% of a data mining project, and I can only agree. If the data comes from a data warehouse, the work can go a lot faster. If you have to dig into (and understand) operational systems or add external data, the work takes even more time. It is therefore of the greatest importance to be efficient at data manipulation. Currently I do this task in one of two ways, depending on the situation: big SQL queries or ETL.
