Greenplum With Informatica


Currently the Greenplum 4.2 version is supported with Informatica 9.1 Hot Fix 6 and Informatica 9.5 Hot Fix 2. There are two ways to connect to Greenplum from Informatica
  • First one is through ODBC drivers available in informatica ODBC setup. For mass loading performance wise ODBC is slow for loading the data of around 5GB , GP load utility will take 8 minutes and ODBC will take 22 minutes.
  • Second option would be to connect Natively by purchasing the license for Greenplum pwx and Gpload utility for mass loading.
- Install the gpload utility on Informatica server which would take hardly 50 MB of Space.
- Install the Greenplum pwx on Informatica server it doesn't hold much memory as two files will be kept in /server/bin directory which will be used for registery.

GP load utility installation :

Create a directory called GPLOADER
/NAS/Informatica/GPLOADER/
Place the gpload software dump  greenplum-loaders-4.2.2.0-build-5-RHEL5-x86_64.zip in above directory.

Unzip the greenplum-loaders-4.2.2.0-build-5-RHEL5-x86_64.zip

-rwxr-xr-x 1 informat informat 14484741 Jun 15  2012 greenplum-loaders-4.2.2.0-build-5-RHEL5-x86_64.bin
-rw-r--r-- 1 informat informat  1277122 Sep  6 23:40 gpsd_tbwgr_20130904.sql.gz
-rw-r--r-- 1 informat informat 14324949 Sep  6 23:41 greenplum-loaders-4.2.2.0-build-5-RHEL5-x86_64.zip

run the below binary file
./greenplum-loaders-4.2.2.0-build-5-RHEL5-x86_64.bin

Give the installation path as /NAS/Informatica/GPLOADER/
Which will intstall the gpload utility by creating the below files and directories in the below path

/NAS/Informatica/GPLOADER/

ls -ltr

drwxr-xr-x 2 informat informat   1024 Jun 15  2012 lib
drwxr-xr-x 3 informat informat     96 Jun 15  2012 ext
drwxr-xr-x 3 informat informat     96 Jun 15  2012 bin
-r--r--r-- 1    informat informat    186602 Jun 15  2012 LICENSE.thirdparty
-r--r--r-- 1     informat informat    50349 Jun 15  2012 GPDB-LICENSE.txt
-r--r--r-- 1     informat informat   1081 Sep  9 10:30 greenplum_loaders_path.sh

Take the backup of the .profile file and edit the .profile file and add the below entries.

#GPLOAD
GPHOME_LOADERS=/NAS/Informatica/GPLOADER/
export GPHOME_LOADERS
PYTHONPATH=$GPHOME_LOADERS/bin/ext
export PYTHONPATH

PATH=$GPHOME_LOADERS/bin:$GPHOME_LOADERS/ext/python/bin:$PATH

LD_LIBRARY_PATH =$GPHOME_LOADERS/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

Run the .profile file.

. .profile
 
To Install the Greenplum PWX

Create a directory /NAS/Informatica/PWXGP3012
Place the Greenplum3012.zip file in the above directory.
unzip Greenplum3012.zip

Go to the directory 3.0.1.2 created by system
cd /NAS/Informatica/PWXGP3012/3.0.1.2/

You will find the below files select the file based on your Operating system

ls -ltr 

-rw-r--r-- 1 informat informat  103770523 Apr 22 21:57 3012_Greenplum_for_PWX_Installer_aix-ppc64.zip
-rw-r--r-- 1 informat informat  46936988 Apr 22 21:57 3012_Greenplum_for_PWX_Installer_linux-x64.zip
-rw-r--r-- 1 informat informat  82487341 Apr 22 21:58 3012_Greenplum_for_PWX_Installer_solaris-sp64.zip
-rw-r--r-- 1 informat informat  37353912 Apr 22 21:58 3012_Greenplum_for_PWX_Installer_winem-64t.zip

unzip 3012_Greenplum_for_PWX_Installer_linux-x64.zip

$ ls -ltr
total 264213

drwxr-xr-x 2 informat informat      1024 Dec 17  2012 properties
-rw-r--r-- 1    informat informat       107 Dec 17  2012 install.sh
drwxr-xr-x 4 informat informat        96 Dec 17  2012 Server
drwxr-xr-x 2 informat informat      1024 Dec 17  2012 Messages
-rw-r--r-- 1    informat informat 103770523 Apr 22 21:57 3012_Greenplum_for_PWX_Installer_aix-ppc64.zip
-rw-r--r-- 1    informat informat  46936988 Apr 22 21:57 3012_Greenplum_for_PWX_Installer_linux-x64.zip
-rw-r--r-- 1    informat informat  82487341 Apr 22 21:58 3012_Greenplum_for_PWX_Installer_solaris-sp64.zip
-rw-r--r-- 1   informat informat  37353912 Apr 22 21:58 3012_Greenplum_for_PWX_Installer_winem-64t.zip
drwxr-xr-x 3   informat informat        96 Apr 23 10:26 source

unzip greenplum-pwx-3.0.1.0-build-3-RHEL5-x86_64.bin

In the middle it will ask for the Informatica Installation directory, Please provide the $INF_HOME absolute path.
  • Once the installation completes please confirm whether below files have successfully  
  1. $INFA_HOME/server/bin/pmgpwriter.so
  2. $INFA_HOME/server/bin/Plugin/GreenplumConnector.xml 
  • Once you verify the above files next is to register the greenplum plugin to repository.
  1. Connect to the repository using the pmrep connect command 
  2.  Register the plug in using the  pmrep registerplugin
 pmrep -r Reposiory_name -d domain_name -n user_name -x password

 pmrep registerplugin –I $INFA_HOME/server/bin/Plugin/GreenplumConnector.xml -e –N

 Go to Workflow Manager Relational Connection search for Greenplum Connection and create the new connection.










20 comments:

  1. Is that codepage relevant to the connection? I'm using this plugin and running into file encoding errors with the GPFDIST. Have you seen a way to ensure that the file sent to GPLOAD is encoded with UTF8?

    ReplyDelete
  2. Certainly the comments will be answered for authorized email ids ...request you to please comment with your email id

    ReplyDelete
  3. Hi !
    I am trying to make this configuration but using Windows Server 2008 and Windows 7 x64 environment for Informatica (not Linux) while Greenplum database is on CentoOS.
    Can I get help about how doing it please ?

    ReplyDelete
  4. Install "PowerExchange for Greenplum For Windows" on Informatica Windows server and import the tables. I will send you the detailed document to your email.

    ReplyDelete
  5. Thank you ! I will be waiting for the documents :)

    ReplyDelete
  6. I installed "PowerExchange for Greenplum For Windows" on Informatica Windows server and imported the tables but I couldn't connect the workflow later to load data to Greenplum database. I couldn't get this :
    http://4.bp.blogspot.com/-9HwV8BPRhdU/UjHBKXlN65I/AAAAAAAAAIk/how3qJxBk9I/s1600/A.png

    ReplyDelete
  7. What is the error your getting and I have sent you the document pls check it.

    ReplyDelete
  8. Thank you for the document, I saw the email and I will get back to you. The last error that I am getting after installing GP Load utility and PWX is :
    31012 [error] the gphome_loaders environment variable is not set
    Thanks,

    ReplyDelete
  9. Once you install the GP load utility add the below environment variable.
    GPHOME_LOADERS=/NAS/Informatica/GPLOADER/.
    Please follow the below steps for adding the ENV variable in windows machine.

    1 ) From the Desktop, right-click My Computer and click Properties.
    2) Click Advanced System Settings link in the left column.
    3) In the System Properties window click the Environment Variables button.

    ReplyDelete
  10. Hi Shivakumar !
    I am still having the same problem !
    31012 [ERROR] The GPHOME_LOADERS environment variable is not set.

    ReplyDelete
  11. This comment has been removed by the author.

    ReplyDelete
  12. I have these errors occurring during the workflow execution from down to top :

    1/GPWRT_31012 : [ERROR] The GPHOME_LOADERS environment variable is not set.
    2/Error initializing DTM for session [s_map_gp_test].
    3/Plug-in #431050's method PPluginDriver::init() failed.
    4/Error occurred during [initializing] writer plug-in #431050.

    ReplyDelete
  13. Hi Latif

    Can you confirm did you install the GP Load utility ?

    GPHOME_LOADERS Set to the installation directory of the Greenplum loaders. For example:
    C:\ProgramFiles\Greenplum\greenplum-loaders--build-\

    Many Thanks

    Shivakumar

    ReplyDelete
  14. Hi Shivakumar !
    I installed the GP Load utility under C:\Informatica\GPLOAD\ and in the GPHOME_LOADERS it's set like that path.

    Best Regards,

    Latif

    ReplyDelete
  15. Hi Shivakumar,

    Is PWX useful to load data from a Greenplum source or it's same to use a PostgreSQL ODBC connector ?

    Thanks.

    ReplyDelete
  16. PWX needs to be used either it’s source or target as Greenplum, Power Exchange helps to integrate with Greenplum server (Most of the time Power Exchange is a message oriented middle ware), Whereas GOLOAD helps to load the data to the tables based on INSERT, UPSERT etc.

    ReplyDelete
  17. Hi Shiva,

    we have GP comnectors and working fine, but recently we implemented CITRIX environment to connect informatica clients. But already existed jobs with GP connections are working fine, but users are trying to create new connections from Desinger tool to pull the tables. But they are unable to find the GP related drivers(while creating relation connection from Designer). i am thinking that we have to install GP client on citrix server right?
    or is there any other steps to see the GP related drivers on desinger tool?

    Thanks & Regards
    Bala

    ReplyDelete
  18. Schema definition in connection is not overriding when schema is not public it always take public as schema and if schema is different its throws a error how to avoid that.

    ReplyDelete
  19. Hi Shiva,

    I am not able to see Greenplum connection from the session-source connection. However I can see as target connection when I select greenplum writer. Why am I not able to see in the source connections?

    Sudheer Balleda.

    ReplyDelete

Featured Post

Cloud : A Trendsetter Technology