Informatica BDE - ETL on Hadoop Infrastructure
Jagpreet Kaur
Business Technology Manager at ZS | Passionate about #healthcareconsulting #pharmaconsulting #businessanalytics #customerengagement
BDE, or Informatica Big Data Edition, is a widely used Informatica product that works as an ETL tool in a Hadoop environment alongside traditional relational database tools. Talend, Pentaho, and other similar tools can also integrate with Hadoop systems, but Informatica remains one of the leading ETL tool vendors. BDE is available from Informatica version 9.6 onwards and, backed by its parent company, has become one of the leading tools in this space.
With the rise of Hadoop and the huge datasets that feed the underlying AI and ML workloads, storing petabytes of data and building ETL tools that can leverage Hadoop became the need of the hour. Working directly with Hadoop requires substantial coding knowledge: you have to build MapReduce jobs that split the input dataset into independent smaller chunks, which the map tasks then process in a completely parallel manner.
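To give a sense of what that hand-coding involves, below is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API. It is purely illustrative and not specific to Informatica; the input and output paths are supplied as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Each map task receives one split of the input and emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // The reduce task sums the counts for each word across all map outputs.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Every join, filter, and aggregation in a pipeline would need similar boilerplate, which is exactly the effort that higher-level tools aim to remove.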
Hadoop tools such as Hive made it easier to write SQL-style queries over data stored in Hadoop. As a result, many organizations started using Hive as a data warehouse layer on top of Hadoop, and Informatica BDE takes care of extracting, transforming, and loading the data into it.
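As a rough illustration of why Hive lowered the barrier, here is a small sketch that runs a plain SQL-style aggregation against HiveServer2 over JDBC; the host, port, and sales_fact table are placeholder assumptions for this example, not anything defined by BDE.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
  public static void main(String[] args) throws Exception {
    // Register the Hive JDBC driver (harmless no-op on JDBC 4+ classpaths).
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // Placeholder HiveServer2 endpoint and database for this sketch.
    String url = "jdbc:hive2://localhost:10000/default";

    try (Connection conn = DriverManager.getConnection(url, "hive", "");
         Statement stmt = conn.createStatement()) {

      // Plain SQL; Hive compiles it into distributed jobs behind the scenes.
      ResultSet rs = stmt.executeQuery(
          "SELECT region, SUM(sales_amount) AS total_sales "
              + "FROM sales_fact GROUP BY region");

      while (rs.next()) {
        System.out.println(rs.getString("region") + "\t" + rs.getLong("total_sales"));
      }
    }
  }
}
```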
Informatica BDE can run in either of two modes: native mode and Hive mode. In native mode the mapping runs like a normal PowerCenter mapping, whereas in Hive mode you can push the entire mapping logic down to Hive and execute it on the Hadoop cluster, thereby exploiting the parallelism that Hadoop provides.
With Informatica BDE you can perform the following ETL tasks with very high performance and speed:
- Create connections to all the different sources and integrate data from them. In particular, it simplifies ingesting complex file formats such as XML, COBOL, Avro, and JSON (see the Avro sketch after this list).
- Extract, transform, and load data between traditional RDBMS and Hive sources and targets.
- Push the entire ETL logic to the Hadoop cluster and make use of the MapReduce framework, which makes building Hadoop jobs much easier.
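For comparison, here is a minimal sketch of what ingesting just one of those formats, Avro, looks like when hand-coded with the Avro generic Java API. The customers.avro file name is a placeholder assumption; the point of BDE is that such reads become configuration rather than code.

```java
import java.io.File;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

public class AvroIngestSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder input file for this sketch.
    File avroFile = new File("customers.avro");

    // The generic reader uses the schema embedded in the Avro file,
    // so no generated classes are required.
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
    try (DataFileReader<GenericRecord> fileReader =
             new DataFileReader<>(avroFile, datumReader)) {
      System.out.println("Schema: " + fileReader.getSchema());
      while (fileReader.hasNext()) {
        GenericRecord record = fileReader.next();
        System.out.println(record);
      }
    }
  }
}
```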
Informatica BDE uses the Informatica Developer interface to build mappings, deploy them, and create applications. Anyone who has used IDQ before will already be familiar with the Informatica Developer interface, and it also has a lot in common with the more widely known PowerCenter Designer.