6 Responsibilities of a Data Engineer
Introduction
Data engineering is a relatively new field, and as such, there is a huge variance in the actual job responsibilities across different companies. If you are a student, analyst, engineer, or new to the data space and
Are unclear about a data engineer's job responsibilities, or
Believe that the current state of a data engineer's job description is messy,
then this article is for you. In this post, we cover the 6 key responsibilities of a data engineer.
Responsibilities of a Data Engineer
1. Move data between systems
This is the core responsibility of a data engineer. It usually involves extracting data from source systems (databases, APIs, event streams), transforming it, and loading it into a destination such as a data warehouse or data lake.
Common tools/frameworks: Pandas, Spark, Dask, Flink, Beam, Debezium, Kafka, Docker, Kubernetes
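As a minimal sketch of what moving data between systems looks like, here is an extract-transform-load pipeline in plain Python, using an in-memory SQLite database as a stand-in for the destination warehouse (the table, field names, and sample records are hypothetical):

```python
import sqlite3

def extract(rows):
    """Extract: in practice this reads from an API, file, or source database."""
    return rows

def transform(rows):
    """Transform: normalize field names and types before loading."""
    return [
        {"user_id": int(r["id"]), "email": r["email"].strip().lower()}
        for r in rows
    ]

def load(rows, conn):
    """Load: write the transformed rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS users (user_id INTEGER, email TEXT)")
    conn.executemany(
        "INSERT INTO users (user_id, email) VALUES (:user_id, :email)", rows
    )
    conn.commit()

# Usage: move two records from a fake source into the SQLite "warehouse".
source = [
    {"id": "1", "email": " Alice@Example.com "},
    {"id": "2", "email": "bob@example.com"},
]
conn = sqlite3.connect(":memory:")
load(transform(extract(source)), conn)
```

Tools like Spark or Flink apply the same extract-transform-load shape at much larger scale and with distributed execution.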
2. Manage data warehouse
More often than not, most of the company's data lands in the data warehouse. The responsibilities of a data engineer in this context are designing the data models, ensuring data quality, and keeping the warehouse performant and cost-effective.
Common modeling techniques: Kimball modeling, Data Vault, Data Lake
Common frameworks: Great Expectations, dbt for data quality
Common warehouses: Snowflake, Redshift, BigQuery, ClickHouse, Postgres
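A minimal sketch of the kind of data quality checks that frameworks like Great Expectations or dbt formalize, assuming rows arrive as dictionaries (the `orders` data and check names are illustrative):

```python
def check_not_null(rows, column):
    """Fail if any row has a None value in the given column."""
    bad = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"check": f"{column} is not null", "passed": not bad, "failing_rows": bad}

def check_unique(rows, column):
    """Fail if the column contains duplicate values."""
    seen, dupes = set(), []
    for i, r in enumerate(rows):
        v = r.get(column)
        if v in seen:
            dupes.append(i)
        seen.add(v)
    return {"check": f"{column} is unique", "passed": not dupes, "failing_rows": dupes}

# Usage: run both checks against a small batch with known problems.
orders = [
    {"order_id": 1, "amount": 30.0},
    {"order_id": 2, "amount": None},   # null amount -> first check fails
    {"order_id": 2, "amount": 12.5},   # duplicate id -> second check fails
]
results = [check_not_null(orders, "amount"), check_unique(orders, "order_id")]
```

In practice such checks run automatically after each load, and failures block downstream consumers or alert the on-call engineer.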
3. Schedule, execute, and monitor data pipelines
Data engineers are also responsible for scheduling ETL pipelines, making sure they run without issues, and monitoring them.
Common frameworks: Airflow, dbt, Prefect, Dagster, AWS Glue, AWS Lambda, Streaming pipeline using Flink/Spark/Beam
Common databases: MySQL, Postgres, Elasticsearch, and data warehouses
Common storage systems: AWS S3, GCP Cloud Storage
Common monitoring systems: Datadog, New Relic
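The retry-and-alert behavior that orchestrators like Airflow or Dagster provide can be sketched in a few lines of plain Python (the `on_failure` hook here is a hypothetical stand-in for a real alerting integration such as Datadog):

```python
import time

def run_with_retries(task, max_retries=3, on_failure=None, delay_s=0):
    """Run a pipeline task, retrying on failure; fire on_failure when
    retries are exhausted. Orchestrators implement this logic for you."""
    for attempt in range(1, max_retries + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_retries:
                if on_failure:
                    on_failure(exc)  # e.g. page on-call via a monitoring system
                raise
            time.sleep(delay_s)  # back off before retrying

# Usage: a flaky extract task that succeeds on the third attempt.
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("source unavailable")
    return "extracted 1000 rows"

result = run_with_retries(flaky_extract, max_retries=3)
```

Real schedulers add much more on top: cron-style triggers, dependency graphs between tasks, and backfills.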
4. Serve data to the end-users
Once you have the data available in your data warehouse, it's time to serve it to the end-users. The end-users can be analysts, an application, external clients, etc. Depending on the end-user, you may have to set up dashboards, API endpoints, or recurring data dumps.
Common tools/languages: Looker, Tableau, Metabase, Superset, role-based permissions (for your system), Python/Scala/Java/Go for API endpoints, pipeline tools for client data dumps
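A minimal sketch of role-based permissions in front of served data, assuming a simple in-memory store (the roles, dataset names, and sample rows are hypothetical); real deployments rely on warehouse grants or a BI tool's access model:

```python
# Hypothetical role -> readable-dataset mapping.
PERMISSIONS = {
    "analyst": {"sales", "marketing"},
    "external_client": {"client_exports"},
}

def authorize(role, dataset):
    """Return True if the role may read the dataset; deny unknown roles."""
    return dataset in PERMISSIONS.get(role, set())

def serve(role, dataset, store):
    """Serve rows to an end-user after a permission check,
    e.g. behind an API endpoint."""
    if not authorize(role, dataset):
        raise PermissionError(f"{role} may not read {dataset}")
    return store[dataset]

# Usage: an analyst can read sales data; an external client cannot.
store = {"sales": [{"region": "EMEA", "revenue": 1200}], "client_exports": []}
rows = serve("analyst", "sales", store)
```

The same check sits behind very different delivery mechanisms: a dashboard query, a REST endpoint, or a scheduled export.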
5. Data strategy for the company
Data engineers are involved in coming up with the data strategy for the company. This involves gathering requirements, writing and reviewing design documents, and aligning stakeholders on how data is collected, stored, and used across the organization.
Common tools/frameworks: Confluence, Google Docs, RFC documents, brainstorming sessions, meetings
6. Deploy ML models to production
Data scientists and analysts develop sophisticated models of specific business processes. When it's time to deploy these models, data engineers are usually the ones who optimize them for use in a production environment.
Common frameworks: Seldon Core, AWS MLOps
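A minimal sketch of the kind of production hardening this involves: wrapping a trained model with input validation and a safe fallback so a bad request cannot crash the serving process. The class, field names, and toy model are hypothetical; tools like Seldon Core provide this at scale:

```python
class ProductionModel:
    """Wrap a trained model with the input validation and fallback
    behavior a production endpoint needs."""

    def __init__(self, model, feature_names, fallback=0.0):
        self.model = model              # any callable taking a feature list
        self.feature_names = feature_names
        self.fallback = fallback        # returned when the input is invalid

    def predict(self, payload):
        missing = [f for f in self.feature_names if f not in payload]
        if missing:
            # Degrade gracefully instead of raising inside the endpoint.
            return {"prediction": self.fallback,
                    "error": f"missing features: {missing}"}
        features = [payload[f] for f in self.feature_names]
        return {"prediction": self.model(features), "error": None}

# Usage with a toy "model": sum of features standing in for a real scorer.
wrapped = ProductionModel(model=sum, feature_names=["age", "tenure"], fallback=-1.0)
ok = wrapped.predict({"age": 30, "tenure": 2})
bad = wrapped.predict({"age": 30})
```

Beyond validation, production deployment typically adds model versioning, latency budgets, and monitoring of prediction drift.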