Transform Data Sources with AWS Glue: A Managed ETL Platform
AWS Glue: A data integration service.
AWS Glue can be used to enrich, cleanse, normalise, organise, validate, or format data for storage in a data lake, data warehouse, or database.
AWS Glue jobs act as the orchestrator of an ETL workflow. Jobs created in AWS Glue automate the scripts that carry out ETL tasks. These jobs can be scheduled, chained together, or made event-driven.
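Scheduling and chaining are expressed in Glue through triggers. The following is a minimal sketch, not a definitive setup: it only builds the request parameters that boto3's Glue `create_trigger` call accepts, so it runs without AWS credentials. The trigger names, job names, and cron expression are hypothetical and chosen for illustration.

```python
import json


def scheduled_trigger(name, job_name, cron):
    """Build parameters for a trigger that starts a job on a schedule."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": cron,  # Glue schedules use cron(...) expressions
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def chained_trigger(name, upstream_job, downstream_job):
    """Build parameters for a trigger that starts a job after another succeeds."""
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": upstream_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": downstream_job}],
        "StartOnCreation": True,
    }


# Hypothetical names: a nightly cleansing job followed by a warehouse load.
nightly = scheduled_trigger("nightly-clean", "clean-orders", "cron(0 2 * * ? *)")
then_load = chained_trigger("after-clean", "clean-orders", "load-warehouse")

print(json.dumps([nightly, then_load], indent=2))
```

With boto3 installed and credentials configured, each dictionary could be passed as `boto3.client("glue").create_trigger(**params)`. Keeping the parameters as plain data makes the schedule and the chain easy to review before anything is created in an account.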
Typical uses of AWS Glue include building event-driven ETL pipelines, serving as a data catalog for finding data across multiple data stores, monitoring ETL jobs without having to maintain code, exploring data, and building views.
Components of AWS Glue
Glue Data Catalog: Persistent metadata store that supports data exploration across different data stores, much like the Apache Hive metastore.
Glue Crawlers: Scanners that examine different types of data to classify it, extract its schema information, and store the metadata in the Data Catalog to guide ETL operations.
Glue DataBrew: Visual data preparation tool that makes data cleansing and normalisation easier for analysts.
Glue Studio: Graphical interface for creating, running, and monitoring ETL jobs in AWS Glue.
Glue Elastic Views: Builds materialised views that combine or replicate data across multiple data stores.
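To make the crawler's role concrete, here is a toy sketch of schema inference in plain Python. It is not Glue's classifier logic; the type names, the widening rules, and the sample records are assumptions made for illustration. It scans records, infers a type per field, and produces a column list of name and type pairs, which is the kind of metadata a catalog stores.

```python
def infer_type(value):
    """Map a Python value to a simple column type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(records):
    """Scan records and return a list of {"Name", "Type"} column entries."""
    schema = {}
    for record in records:
        for key, value in record.items():
            if value is None:
                continue  # a missing value tells us nothing about the type
            seen = infer_type(value)
            previous = schema.get(key)
            if previous is None or previous == seen:
                schema[key] = seen
            elif {previous, seen} == {"bigint", "double"}:
                schema[key] = "double"  # widen integers to floating point
            else:
                schema[key] = "string"  # fall back when types conflict
    return [{"Name": name, "Type": col_type} for name, col_type in schema.items()]


# Hypothetical sample records standing in for a scanned data source.
records = [
    {"id": 1, "amount": 10, "paid": True},
    {"id": 2, "amount": 12.5, "paid": False, "note": "late"},
]

columns = infer_schema(records)
for column in columns:
    print(column["Name"], column["Type"])
```

A real crawler also handles file formats, partitions, and nested structures; the sketch only shows why a stored schema lets later ETL steps know what each field contains without rescanning the data.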
Benefits of using AWS Glue: It suits distributed environments where parallel processing is needed to run large workloads faster than with sibling AWS services such as AWS Lambda, which take more effort to integrate with data sources. Glue also supports enterprise-level data integration and gives better visibility into data.
Limitations: AWS Glue supports only two languages, Scala and Python, for custom code, and it is designed primarily to integrate with platforms within the Amazon ecosystem.