Coding

Vincent Rainardi

Data Architect & Data Engineer

Published: 29 September 2024

For many years, my job was to design data platforms: architecting the platform, designing the data model, writing the ETL specification, or designing the reports and dashboards. But in the last year, in addition to designing, I have also been coding.

I love coding. When I was doing my master’s in ML a few years ago, I enjoyed coding so much. It was in Python. I started my career as a developer, working in Microsoft Access, FoxPro, COBOL, C++, JavaScript, Java, C#.NET, Easytrieve, and of course SQL. I then did MS BI for a long time, coding in SSIS, Reporting Services and MDX for SSAS, as well as in SQL Server. Then I moved to Oracle, doing PL/SQL and IBM MQ, and to Teradata, doing BTEQ scripting as well as TCRM and Teradata SQL.

When I was in BI I did Cognos, Tableau, ProClarity, Power BI, Business Objects and Qlik. Oh, and Tibco Spotfire too. I picked up different BI tools quite easily, including their scripting languages. In Spotfire, for example, I used JavaScript and Python. But I also coded ETL, using Informatica PowerCenter, IBM DataStage, Oracle Data Integrator, Composite, Hadoop, Hive, SSIS, ADF, PySpark and now dbt.

When coding, I usually start by getting the functionality that I’m about to build clear in my head. For example, it could be about transforming data. I would find out in detail what the expected output is, what the inputs are, and what transformations are required. I usually open Notepad++ and write down in my own words: 1. What I am about to build. 2. The steps to build it. 3. The expected outcomes.

I visualise it until the process is really clear in my mind. As an example, say I was tasked with building a data transformation which builds client profiles. The output is many thousands of clients, complete with all their attributes such as name, date of birth, address, and many others. It would take a few steps to build it, including defining some reusable subroutines. I would visualise it until I was very clear what the key is, how the key should be constructed, and which attributes can be multi-valued. For example, a client can have multiple addresses, multiple mobile numbers, and multiple middle names, but only one date of birth, one title and one first name. I would also work out which multi-valued attributes go to which tables, and how those tables are linked together. And I would be clear about the data types too. For each column. In each table.
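As a rough illustration of the level of detail worth pinning down before coding, the client profile above could be sketched in Python like this. The class names, field names and the way the key is constructed are all assumptions made for the example, not an actual design.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ClientAddress:
    # Child record: one row per address, linked to the client by client_key.
    client_key: str
    address_line: str


@dataclass
class ClientMobile:
    # Child record: one row per mobile number, linked by client_key.
    client_key: str
    mobile_number: str


@dataclass
class Client:
    # Single-valued attributes sit on the client record itself.
    client_key: str
    title: str
    first_name: str
    date_of_birth: date
    # Multi-valued attributes are lists here (separate tables in a database).
    middle_names: list[str] = field(default_factory=list)
    addresses: list[ClientAddress] = field(default_factory=list)
    mobiles: list[ClientMobile] = field(default_factory=list)


def build_client_key(source_system: str, source_id: str) -> str:
    # Hypothetical key construction: source system plus source identifier.
    return f"{source_system.strip().upper()}|{source_id.strip()}"
```

Writing it down this way forces a decision on each point: what the key is, which attributes are single-valued, and which ones need their own linked table.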

It is also important to be clear about the strategy for how I will build it. For example, if there are several client types, will they be built using a single module with lots of branching inside it, or one module for each client type, with the outputs unioned at the end? I spend time thinking about the strategy, because it will affect future development. And it will affect testing too. For example, if it is a single module with lots of branching inside it, every time we add a client type testers will need to do regression tests on the client types that we have delivered previously.
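The "one module per client type, union at the end" option might look something like the sketch below. The client types, the input row layout and the function names are hypothetical, chosen only to show the shape of the approach.

```python
def build_individual_clients(rows: list[dict]) -> list[dict]:
    # Module for one client type: only individuals are handled here.
    return [
        {"client_key": f"IND-{row['id']}", "client_type": "individual", "name": row["full_name"]}
        for row in rows
        if row["type"] == "individual"
    ]


def build_corporate_clients(rows: list[dict]) -> list[dict]:
    # Module for another client type, with its own rules and source columns.
    return [
        {"client_key": f"CORP-{row['id']}", "client_type": "corporate", "name": row["company_name"]}
        for row in rows
        if row["type"] == "corporate"
    ]


def build_all_clients(rows: list[dict]) -> list[dict]:
    # Union of the per-type outputs. A new client type adds one function
    # and one more term here, leaving the existing modules untouched.
    return build_individual_clients(rows) + build_corporate_clients(rows)
```

With this layout, adding a third client type does not change the code for the first two, so the regression testing on them can be much lighter than with a single heavily branched module.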

And the way I work is I build it piece by piece, module by module. Whether I use Informatica PowerCenter, Azure Data Factory or Power BI, I build it piece by piece. And every time I complete a piece, I test it, so that I know that piece produces the right output. It is a lot easier to test one small piece of code than a big one. And I build it in a modular way, the exact opposite of monolithic. Not only is modular code easier to build, it is easier to test and easier to debug too.
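To make "complete a piece, then test it" concrete, here is a minimal sketch in Python: one small transformation and the checks that go with it. The function and its cleaning rule are assumptions made for the example.

```python
def normalise_mobile(raw: str) -> str:
    # Keep digits only, preserving a leading plus sign if there is one.
    cleaned = raw.strip()
    prefix = "+" if cleaned.startswith("+") else ""
    digits = "".join(ch for ch in cleaned if ch.isdigit())
    return prefix + digits


def test_normalise_mobile() -> None:
    # Each check covers one small, known input and its expected output.
    assert normalise_mobile(" +44 7700 900123 ") == "+447700900123"
    assert normalise_mobile("07700-900-123") == "07700900123"
    assert normalise_mobile("") == ""


test_normalise_mobile()
```

Because the piece is small, a failing check points straight at a handful of lines, which is what makes the piece-by-piece approach easy to debug.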

In terms of time percentage, I probably spend about 20% planning the details, 60% building it, 10% testing and 10% releasing it. But it varies. Some applications require more built-in unit tests, so it could be 20% planning, 50% building, 20% testing and 10% release process.

People have different styles of coding, and in this day and age we code within a team of people. We don’t code alone. So we must agree the coding style in advance, such as the naming convention (columns, variables, tables/views), indentation, the use of CTEs and subqueries, etc. If I see a piece of code breaking that convention, my head can become itchy. For example:

The convention for indentation is 4 spaces, and someone uses 2 spaces.
The convention is comma in the front, and someone puts the comma at the back.
The convention for column names is camel case with an underscore, and someone creates columns in all upper case without underscores.
The convention for CASE WHEN is that the WHENs are aligned vertically, and someone lays out the WHENs like a snake.

Those kinds of things can really make me uncomfortable. Call it OCD or whatever, but I have had it since the first day I coded, when I was 16. I need to have things in order, and in good structure. I don’t mind giving up my own habits and agreeing a team convention, for example:

I usually use lower case, but if the team standard is upper case, I don’t mind using upper case.
I usually put a comma in the back, but if the team standard is in the front, I’ll put them in the front.
I usually use 2 spaces for indentation, but if the team standard is 4 spaces, I’ll use 4 spaces.

But everyone in the team must follow the convention. Every one of us sacrifices our own habits in order to have a team convention, which is a good thing to do. But once we agree a convention, everyone in the team must adhere to it. If not, I have an issue with it. I have an issue seeing something which is out of order. If everything is aligned and one thing is not, it disturbs me.
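Some conventions, such as indentation and comma placement, can also be checked automatically rather than by eye. The following is only a minimal sketch in Python, assuming the SQL is available as text and the team standard is 4-space indentation with commas in the front. The function name is hypothetical, and in practice a team would more likely configure a dedicated SQL linter.

```python
def find_convention_breaks(sql: str, indent: int = 4) -> list[str]:
    # Report lines that break two conventions: indentation width and leading commas.
    problems = []
    for number, line in enumerate(sql.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines cannot break either convention
        leading_spaces = len(line) - len(line.lstrip(" "))
        if leading_spaces % indent != 0:
            problems.append(f"line {number}: indentation is not a multiple of {indent} spaces")
        if stripped.endswith(","):
            problems.append(f"line {number}: comma at the back, expected comma in the front")
    return problems
```

Calling `find_convention_breaks(sql_text)` returns a list of messages, one per break, and an empty list when the text follows both conventions.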

I get satisfaction from seeing that my code is working. The payback for our effort is when the output is as expected. That is the good outcome. But a lot of the time, before that is achieved, we must resolve many issues. The output is not as expected, and we have to debug the code to find out what’s wrong. That is also very enjoyable for me. Systematically debugging the code to find out which lines cause the issue is very satisfying. Why? Because I know I will be able to find the source of the issue and I know I will be able to put it right.

How come I have that kind of confidence? Because I’ve done it hundreds of times. I started coding when I was 16 and now I’m 53, so I’ve done it for 37 years. Yes, in my early years I doubted whether I would be able to solve the issues. But as the years went by and I managed to solve the problem every time, my confidence grew. These days I enjoy troubleshooting issues. They could be coding issues, design issues or infrastructure issues. I’ve done it many times and I have always managed to solve them. Every time. That is why I can enjoy solving new issues. Because I know I will be able to solve them.

The weird thing about coding is that, to be efficient (quick), we need to take time to do it properly. Plan it properly and write down how you are going to build it. Verify the approach. Discuss it with other people if necessary, to get feedback. Don’t rush into coding it. That’s the easy part. Making sure that you do it the right way is the hard part. If we rush into coding (skipping the planning, the design and the strategy altogether), the chances are that we’ll get into a mess. And when coding we need to build it systematically. We need to build it modularly, i.e. module by module, piece by piece, layer by layer. Do yourself a big favour: don’t build it as a monolith (one big thing). And you’re doing a big favour for whoever comes after you too, because if you build it modularly it is easier to understand and easier to troubleshoot.

Happy coding.

Sunder Annamraju

Finance Change & Regulatory Programmes Delivery, Financial Systems Expert, Chartered IT Professional

1 month ago

Thanks Vincent. I totally get it. Adhering to good practices is invaluable in understanding and debugging code! Commenting your code helps too. As you said, get the logic right and rest will follow in an orderly manner.
