Hortonworks Data Platform and Hortonworks DataFlow Integration Use Case Discussion: Athens, Greece


This week in Athens, Greece, I was able to discuss solutions for client use cases that integrate the Hortonworks HDP and HDF platforms. In addition to the use cases, I covered fundamentals of Hadoop administration, Spark, and NiFi. I want to share the specific use cases discussed and the solutions we worked through.

Use Cases

1. Retrieving log data from multiple source servers and ingesting it into Hive tables for data analysis.

2. Offloading EDW (enterprise data warehouse) data from a source database to Hive for data analysis.

3. Retrieving XML files from source servers and ingesting the data into Hive for data analysis.

Solutions

1a. We went over a solution that installs MiNiFi agents on the source servers to poll log data and forward it to a NiFi data flow. Within the data flow, a RouteOnContent processor routes the relevant logs to be put into HDFS for Hive.
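The RouteOnContent step above is configured in the NiFi UI rather than written as code, but the routing rule it applies can be sketched in plain Python. This is only an illustration of the idea; the route names and regexes are hypothetical, not from the actual flow.

```python
import re

# Hypothetical routing rules, mirroring what a NiFi RouteOnContent
# processor would be configured with: regex -> named relationship.
ROUTES = {
    "app_errors": re.compile(r"\bERROR\b"),
    "auth_events": re.compile(r"\b(login|logout)\b", re.IGNORECASE),
}

def route_log_line(line):
    """Return the name of the first matching route, or 'unmatched'."""
    for name, pattern in ROUTES.items():
        if pattern.search(line):
            return name
    return "unmatched"
```

In the real flow, each named relationship would feed a PutHDFS processor writing to a directory backing a Hive table, while unmatched lines could be dropped or parked for review.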

1b. We discussed having the applications on the servers publish log messages directly to a Kafka topic, with NiFi consuming the topic and loading the messages into HDFS.
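A minimal sketch of the publishing side, assuming the third-party kafka-python client. The JSON envelope, broker address, and topic name are all placeholders for illustration, not the schema we agreed on.

```python
import json
import socket
import time

def format_log_record(line):
    """Wrap a raw log line in a JSON envelope suitable for a Kafka topic.
    The field names here are illustrative, not a fixed schema."""
    return json.dumps({
        "host": socket.gethostname(),
        "ts": int(time.time()),
        "message": line,
    }).encode("utf-8")

if __name__ == "__main__":
    # Requires the kafka-python package and a reachable broker;
    # "broker:9092" and "server-logs" are placeholder values.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers="broker:9092")
    producer.send("server-logs", format_log_record("application started"))
    producer.flush()
```

On the consuming side, a NiFi ConsumeKafka processor feeding PutHDFS covers the load into HDFS without any custom code.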

2a. We reviewed using Apache Sqoop to execute a MapReduce job that loads the data into HDFS, then running a batch process to filter the data into Hive internally managed tables.
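The two-step flow above can be sketched as follows: assemble a `sqoop import` invocation, then run a HiveQL batch step that filters the staged data into a managed table. The JDBC URL, table names, paths, and filter condition are placeholders.

```python
# Step 1: a Sqoop import lands the source table in an HDFS staging directory.
def build_sqoop_import(jdbc_url, table, target_dir, num_mappers=4):
    """Assemble the argument list for a `sqoop import` invocation."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]

# Step 2 (HiveQL, shown as a string): expose the staged files through an
# external table, then filter rows into an internally managed table.
HIVE_FILTER_STEP = """
CREATE EXTERNAL TABLE staging_orders (...)  -- columns elided
LOCATION '/data/staging/orders';

INSERT INTO TABLE orders_managed
SELECT * FROM staging_orders
WHERE order_status = 'COMPLETE';
"""
```

The external/managed split is the usual pattern here: the external table is a cheap view over the raw Sqoop output, and only the filtered rows are copied into the managed table that analysts query.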

2b. Alternatively, use NiFi database controller services to connect to the source database, retrieve the data, and put it into HDFS.
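The NiFi version of this is an ExecuteSQL processor backed by a DBCPConnectionPool controller service, feeding PutHDFS. As a dependency-free sketch of the data movement, the snippet below pulls rows from a stand-in SQLite database and renders them as CSV; NiFi's ExecuteSQL would actually emit Avro, and the table and query are hypothetical.

```python
import csv
import io
import sqlite3

def extract_to_csv(conn, query):
    """Run a query against the source database and render the rows as CSV,
    roughly the content an ExecuteSQL -> PutHDFS flow would move.
    (NiFi would emit Avro; CSV keeps this sketch dependency-free.)"""
    cur = conn.execute(query)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur.fetchall())
    return buf.getvalue()

# In-memory SQLite database standing in for the real source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta")])
```

For incremental loads, NiFi's QueryDatabaseTable processor tracks a maximum-value column so each run only fetches new rows.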

3a. Convert the XML files to Avro format and then ingest the data into HDFS. Run a process to migrate the data to Hive tables in ORC format.
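The XML-to-Avro step boils down to flattening each XML record into the dicts an Avro writer (e.g. fastavro) expects before the files land in HDFS. A minimal sketch with the standard library; the element and field names are illustrative.

```python
import xml.etree.ElementTree as ET

def xml_to_records(xml_text):
    """Flatten <record> elements into dicts, the shape we would hand to
    an Avro writer before landing the files in HDFS. Element and field
    names here are illustrative, not a real schema."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in record}
        for record in root.iter("record")
    ]

SAMPLE = """
<records>
  <record><id>1</id><status>ok</status></record>
  <record><id>2</id><status>failed</status></record>
</records>
"""
```

Once the Avro files are in HDFS, the migration to ORC is typically a Hive `INSERT INTO ... SELECT` from an Avro-backed external table into a table declared `STORED AS ORC`.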

3b. We reviewed using NiFi processors to read the XML and load the data, as well as defining a schema and loading the data into HDFS using Spark.
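The schema-first idea in the Spark option is that records are cast and validated against a declared schema at read time (in Spark itself, that would be a StructType handed to an XML reader such as spark-xml) rather than inferred. A dependency-free sketch of that principle, with an entirely hypothetical schema:

```python
# A declared schema applied while parsing, in the spirit of reading XML
# in Spark under an explicit StructType before writing to HDFS.
# The field names and types are illustrative.
SCHEMA = {"id": int, "amount": float, "status": str}

def apply_schema(raw_record, schema=SCHEMA):
    """Cast each field to its declared type; a missing field raises a
    KeyError so bad records fail fast instead of landing silently."""
    return {field: cast(raw_record[field]) for field, cast in schema.items()}
```

Failing fast on schema violations at ingest time is usually preferable to discovering type mismatches later in Hive queries.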


I want to thank everyone for a wonderful experience. I really enjoyed my time in Greece and met some wonderful people!

Marios Kogias

Senior Customer Support Engineer at Azul Systems

6y

I enjoyed the training. The trainer was full of real world experiences to share.


More articles by Damien Edwards
