Getting Started With Greenplum For Big Data Analytics


I have gone through the book “Getting Started With Greenplum For Big Data Analytics” from PacktPub.com (http://bit.ly/HYOwrW). This is a fabulous book for Big Data enthusiasts and for anyone who wants to integrate a data warehouse with Big Data, using Greenplum as the warehouse and an external ETL tool such as Informatica (Power Exchange) to load data into it. The practical approach of using the various Greenplum loading utilities (gpload with insert, update, and merge modes, as well as INSERT and COPY) is clearly explained. A complete overview of the Unified Analytics Platform and the physical architecture of Greenplum is also clearly discussed. I would recommend this book to anyone keen to understand Big Data analytics and to integrate the data warehouse with Big Data.
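
To give a flavour of the loading utilities the book covers, here is a minimal sketch (not from the book) of bulk-loading a CSV file into a Greenplum table over the COPY path with Python and psycopg2; the connection settings, table name, and file path are assumptions made purely for illustration.

    # A minimal sketch (assumptions, not from the book): bulk-load a CSV file
    # into Greenplum through the COPY path using Python and psycopg2.
    import psycopg2

    conn = psycopg2.connect(host="mdw", port=5432, dbname="analytics",
                            user="gpadmin", password="changeme")

    with conn, conn.cursor() as cur:
        # Hypothetical staging table matching the CSV layout.
        cur.execute("""
            CREATE TABLE sales_staging (
                sale_id bigint,
                region  text,
                amount  numeric
            ) DISTRIBUTED BY (sale_id);
        """)
        # COPY ... FROM STDIN streams the file through the master;
        # gpload/gpfdist parallelise the load across the segments instead.
        with open("sales.csv") as f:
            cur.copy_expert("COPY sales_staging FROM STDIN WITH CSV HEADER", f)

    conn.close()

For the insert, update, and merge modes mentioned above, gpload wraps the same idea in a YAML control file and uses gpfdist so that the segments pull the data in parallel.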

By the end of this book, the reader will be able to understand:
  • What is Big Data?
  • What are Hadoop and its components: Hive, Pig, and Sqoop?
  • How to query data stored in the HDFS file system and load it into Greenplum (data communication between Hadoop and Greenplum)? See the sketch after this list.
  • What is Chorus, and how is it used to integrate multidimensional data visualization from Tableau? Chorus can grab data from HDFS as well as from the Greenplum database to build dashboards in Tableau.
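
For the Hadoop-to-Greenplum communication mentioned in the list above, one common pattern in Greenplum 4.x is a readable external table over the gphdfs protocol. Below is a minimal sketch; the NameNode address, HDFS path, table layout, and connection details are all assumptions and do not come from the book.

    # A minimal sketch (assumptions throughout): expose an HDFS file to
    # Greenplum as a readable external table, then materialise it.
    import psycopg2

    conn = psycopg2.connect(host="mdw", dbname="analytics", user="gpadmin")

    with conn, conn.cursor() as cur:
        # Segments read the HDFS file in parallel through the gphdfs protocol.
        cur.execute("""
            CREATE EXTERNAL TABLE ext_weblogs (
                log_time timestamp,
                url      text,
                visitors int
            )
            LOCATION ('gphdfs://namenode-host:8020/data/weblogs.csv')
            FORMAT 'TEXT' (DELIMITER ',');
        """)
        # Copy the HDFS data into a regular, hash-distributed Greenplum table.
        cur.execute("""
            CREATE TABLE weblogs AS
            SELECT * FROM ext_weblogs
            DISTRIBUTED BY (url);
        """)

    conn.close()

Writable external tables work the same way in the opposite direction, pushing Greenplum data back out to HDFS.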
Once you have gone through this book, you will certainly have gained expertise in the areas below.
Greenplum Database Management System:
  • How to start and stop a Greenplum database instance? See the sketch after this list.
  • How to monitor the workload on the database using the Greenplum Command Center GUI?
  • Performance monitoring.
  • Parallel data loading.
  • Query monitoring in Greenplum.
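
As a small sketch of the start/stop and status bullets above, assuming the standard gpstart, gpstop, and gpstate utilities are on the gpadmin user's PATH (flags can vary slightly between versions):

    # A minimal sketch: drive the standard Greenplum utilities from Python.
    # Assumes they are available on the gpadmin user's PATH.
    import subprocess

    def run(cmd):
        """Run a Greenplum utility, echo its output, and fail loudly on error."""
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
        result.check_returncode()

    run(["gpstart", "-a"])   # start the cluster without prompting
    run(["gpstate", "-s"])   # detailed master and segment status
    run(["gpstop", "-a"])    # stop the cluster without prompting

Beyond the Command Center GUI, running queries can also be watched from SQL through the pg_stat_activity system view.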
Data Computing Appliance:
  • Compute, storage, database, and network architecture for the overall Greenplum setup.
  • Hardware and system configuration for the database on multiple nodes.
Optimizing and Querying the Greenplum Database:
  • Various functions used in queries, and how to get the query execution plan using the EXPLAIN and EXPLAIN ANALYZE commands.
  • Parallel data flow using the Dynamic Pipeline in Greenplum.
  • Greenplum table distribution and partitioning (column-oriented storage, hash distribution, and random distribution). See the sketch after this list.
  • Pushdown optimization using ODBC.
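
The sketch below is an illustration rather than an example taken from the book: it creates a column-oriented, hash-distributed, range-partitioned table and then asks Greenplum for the actual execution plan with EXPLAIN ANALYZE. The table layout and connection details are assumptions.

    # A minimal sketch: column-oriented storage, hash distribution,
    # monthly range partitions, and an EXPLAIN ANALYZE plan.
    import psycopg2

    conn = psycopg2.connect(host="mdw", dbname="analytics", user="gpadmin")

    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE sales (
                sale_id     bigint,
                customer_id int,
                sale_date   date,
                amount      numeric
            )
            WITH (appendonly=true, orientation=column)
            DISTRIBUTED BY (customer_id)
            PARTITION BY RANGE (sale_date)
                (START (date '2013-01-01') INCLUSIVE
                 END   (date '2014-01-01') EXCLUSIVE
                 EVERY (INTERVAL '1 month'));
        """)
        # EXPLAIN ANALYZE executes the query and reports the real plan,
        # including the motion (redistribution) steps between segments.
        cur.execute("EXPLAIN ANALYZE SELECT customer_id, sum(amount) "
                    "FROM sales GROUP BY customer_id;")
        for (line,) in cur.fetchall():
            print(line)

    conn.close()

In the printed plan, the distribution key determines how much data has to move between segments; choosing a key that matches the common join and group-by columns keeps the motion steps small.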
Weka: Waikato Environment for Knowledge Analysis
  • How to use Weka for knowledge analysis, data mining, and machine learning.
  • How it is used for data processing, regression, clustering, classification, and data visualization.
MADlib:
  • A Magnetic, Agile, and Deep library of scalable, parallel, advanced in-database functions.
  • How its in-database functions are helpful.
  • How to do in-database analytics using MADlib. See the sketch below.
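
Here is a minimal sketch of in-database analytics with MADlib's linear regression, assuming MADlib is installed in the madlib schema; the table and column names are invented for illustration.

    # A minimal sketch: train a linear regression model inside Greenplum
    # with MADlib, so no data leaves the database. Names are assumptions.
    import psycopg2

    conn = psycopg2.connect(host="mdw", dbname="analytics", user="gpadmin")

    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT madlib.linregr_train(
                'houses',                          -- source table
                'houses_linregr',                  -- output model table
                'price',                           -- dependent variable
                'ARRAY[1, size_sqft, num_rooms]'   -- intercept + features
            );
        """)
        # Inspect the fitted coefficients and the goodness of fit.
        cur.execute("SELECT coef, r2 FROM houses_linregr;")
        print(cur.fetchone())

    conn.close()

Because the computation runs where the data lives, the same pattern extends to MADlib's other algorithms (logistic regression, k-means, and so on) without pulling data out of the warehouse.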
R Programming:
  • What is R programming?
  • How is it used for statistical data analysis and exploration?
  • How are slicing/dicing, data modeling, and data visualization possible using R?
  • The use of R in predictive analytics.
Text Analysis:
  • What is text analysis?
  • What are the main challenges in text analytics?
  • What techniques are involved in text analytics?
Predictive Analytics:
  • What is predictive analytics, how can it be achieved, and where is it used?
