Coding

For many years, my job was to design data platforms: architecting the platform, designing the data model, writing the ETL specification, or designing reports and dashboards. But in the last year, in addition to designing, I have also been coding.

I love coding. When I was doing my master’s in ML a few years ago, I enjoyed the coding so much. It was in Python. I started my career as a developer, working with Microsoft Access, FoxPro, COBOL, C++, JavaScript, Java, C#.NET, Easytrieve, and of course SQL. I then did MS BI for a long time, coding in SSIS, Reporting Services and MDX for SSAS, as well as in SQL Server. Then I moved to Oracle, doing PL/SQL and IBM MQ, and to Teradata, doing BTEQ scripting as well as TCRM and Teradata SQL.

When I was in BI I did Cognos, Tableau, ProClarity, Power BI, Business Objects and Qlik. Oh, and TIBCO Spotfire too. I picked up different BI tools quite easily, including their scripting languages. In Spotfire, for example, I used JavaScript and Python. But I also coded ETL, using Informatica PowerCenter, IBM DataStage, Oracle Data Integrator, Composite, Hadoop, Hive, SSIS, ADF, PySpark and now dbt.

When coding, I usually start by getting the functionality that I’m about to build clear in my head. For example, it could be about transforming data. I would find out in detail what the expected output is, what the inputs are, and what transformations are required. I usually open Notepad++ and write down in my own words: 1. What I am about to build. 2. The steps to build it. 3. The expected outcomes.

I visualise it until the process is really clear in my mind. As an example, say I was tasked with building a data transformation which builds client profiles. The output is many thousands of clients, complete with all their attributes such as name, date of birth, address, and many others. It would take a few steps to build it, including defining some reusable subroutines. I would visualise it until I was very clear about what the key is, how the key should be constructed, and which attributes can be multi-valued. For example, a client can have multiple addresses, multiple mobile numbers, and multiple middle names, but only one date of birth, one title and one first name. I would also be clear about which multi-valued attributes go to which tables, and how those tables are linked together. And about the data types too. For each column. In each table.
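To make the client profile example concrete, here is a minimal sketch in Python of how the single-valued and multi-valued attributes could be laid out. The class names, the field names and the key format (source system plus a zero-padded source id) are all assumptions for illustration, not a prescribed design.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Address:
    client_key: str      # links the address back to its client
    address_line: str
    postcode: str


@dataclass
class Client:
    client_key: str              # hypothetical key: source system + source id
    title: str                   # single-valued
    first_name: str              # single-valued
    date_of_birth: date          # single-valued
    middle_names: list[str] = field(default_factory=list)     # multi-valued
    mobile_numbers: list[str] = field(default_factory=list)   # multi-valued
    addresses: list[Address] = field(default_factory=list)    # multi-valued


def make_client_key(source_system: str, source_id: int) -> str:
    """Build the key the same way everywhere (the format is an assumption)."""
    return f"{source_system.upper()}-{source_id:08d}"


key = make_client_key("crm", 1234)
client = Client(key, "Ms", "Ana", date(1990, 5, 17))
client.mobile_numbers += ["07000 000001", "07000 000002"]
client.addresses.append(Address(key, "1 Example Street", "AB1 2CD"))
```

Writing the types down like this forces the decisions early: what the key looks like, which attributes are lists, and which child records carry the key back to the client.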

It is also important to be clear about the strategy for building it. For example, if there are several client types, will they be built using a single module with lots of branching inside it, or one module for each client type, with a union at the end? I spend time thinking about the strategy, because it will affect future development. And it will affect testing too. For example, if it is a single module with lots of branching inside it, every time we add a client type the testers will need to do regression tests on the client types that we have delivered previously.
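As a rough sketch of the second strategy (one module per client type, with a union at the end), the Python below uses two hypothetical client types, individual and corporate. The function names, field names and key prefixes are assumptions made up for this example.

```python
def build_individual_clients(rows):
    """Module for individual clients only."""
    return [
        {
            "client_key": f"IND-{r['id']}",
            "client_type": "individual",
            "name": f"{r['first_name']} {r['last_name']}",
        }
        for r in rows
    ]


def build_corporate_clients(rows):
    """Module for corporate clients only."""
    return [
        {
            "client_key": f"CORP-{r['id']}",
            "client_type": "corporate",
            "name": r["company_name"],
        }
        for r in rows
    ]


def build_client_profiles(individual_rows, corporate_rows):
    """Union the per-type outputs at the end."""
    return (
        build_individual_clients(individual_rows)
        + build_corporate_clients(corporate_rows)
    )


profiles = build_client_profiles(
    [{"id": 1, "first_name": "Ana", "last_name": "Lee"}],
    [{"id": 7, "company_name": "Example Ltd"}],
)
```

With this layout, adding a third client type means adding one new function and one more term in the union. The existing functions are not edited, which is what keeps the regression testing small.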

And the way I work is that I build it piece by piece, module by module. Whether I use Informatica PowerCenter, Azure Data Factory or Power BI, I build it piece by piece. And every time I complete a piece, I test it, so that I know that piece produces the right output. It is a lot easier to test one small piece of code than a big one. And I build it in a modular way, the exact opposite of monolithic. Not only is modular code easier to build, it is easier to test and easier to debug too.
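Here is what "one small piece, tested as soon as it is complete" might look like in Python. The function `split_full_name` and its splitting rule are hypothetical and deliberately simple; the point is only that a piece this small can be checked in a few lines.

```python
def split_full_name(full_name: str) -> dict:
    """Split 'First Middle... Last' into its parts (a deliberately simple rule)."""
    parts = full_name.split()
    if len(parts) < 2:
        raise ValueError(f"expected at least first and last name: {full_name!r}")
    return {
        "first_name": parts[0],
        "middle_names": parts[1:-1],   # may be empty, may hold several names
        "last_name": parts[-1],
    }


def test_split_full_name():
    # No middle name
    assert split_full_name("Ana Lee") == {
        "first_name": "Ana",
        "middle_names": [],
        "last_name": "Lee",
    }
    # Several middle names, with stray whitespace around the input
    assert split_full_name("  Ana Maria Rosa Lee ")["middle_names"] == ["Maria", "Rosa"]


# Run the test as soon as the piece is complete
test_split_full_name()
```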

In terms of time, I probably spend about 20% planning the details, 60% building it, 10% testing and 10% releasing it. But it varies. Some applications require more built-in unit tests, so it could be 20% planning, 50% building, 20% testing and 10% on the release process.

People have different styles of coding, and in this day and age we code within a team of people. We don’t code alone. So we must agree on the coding style in advance: naming conventions (columns, variables, tables/views), indentation, CTEs versus subqueries, etc. If I see a piece of code breaking that convention, my head can become itchy. For example:

  • the convention for indentation is 4 spaces, and someone put 2 spaces
  • the convention is comma in the front, and someone put the comma at the back
  • the convention for column names is camel case with an underscore, and someone created columns in all upper case without underscores
  • the convention for CASE WHEN is that the WHENs are aligned vertically, and someone put the WHENs like a snake

Those kinds of things can really make me uncomfortable. Call it OCD or whatever, but I have had it since the first day I coded, when I was 16. I need to have things in order, and in good structure. I don’t mind giving up my own habits and agreeing to a team convention, for example:

  • I usually use lower case, but if the team standard is upper case, I don’t mind using upper case.
  • I usually put the comma at the back, but if the team standard is at the front, I’ll put it at the front.
  • I usually use 2 spaces for indentation, but if the team standard is 4 spaces, I’ll use 4 spaces.

But everyone in the team must follow the convention. Every one of us sacrifices our own habits in order to have a team convention, which is a good thing to do. But once we agree on a convention, everyone in the team must adhere to it. If not, I have an issue with it. I have an issue seeing something which is out of order. If everything is aligned and one thing is not, it disturbs me.
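A convention is easier to keep when a small script can point out the lines that break it. The Python below is a minimal sketch, not a real linter. It assumes a team convention of 4-space indentation and commas at the front, and the function name `check_conventions` and the sample query are made up for this example.

```python
def check_conventions(sql: str, indent: int = 4) -> list[str]:
    """Return one message per line that breaks the (assumed) team convention."""
    problems = []
    for number, line in enumerate(sql.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        leading_spaces = len(line) - len(line.lstrip(" "))
        if leading_spaces % indent != 0:
            problems.append(
                f"line {number}: indentation is not a multiple of {indent} spaces"
            )
        if stripped.endswith(","):
            problems.append(
                f"line {number}: comma at the back, expected at the front"
            )
    return problems


sample = """select
    client_key
    , first_name,
  date_of_birth
from client"""

for problem in check_conventions(sample):
    print(problem)
```

For the sample above it reports line 3 (comma at the back) and line 4 (2-space indentation). It only looks at leading spaces and trailing commas, so it knows nothing about SQL itself.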

I get satisfaction from seeing that my code is working. The payback for our effort is when the output is as expected. That is the good outcome. But a lot of the time, before that is achieved, we must resolve many issues. The output is not as expected, and we have to debug the code to find out what’s wrong. That is also very enjoyable for me. Systematically debugging the code to find out which lines cause the issue is very satisfying. Why? Because I know I will be able to find the source of the issue and I know I will be able to put it right.

How come I have that kind of confidence? Because I’ve done it hundreds of times. I started coding when I was 16 and now I’m 53, so I’ve done it for 37 years. Yes, in my early years I doubted whether I would be able to solve the issues. But as the years went by and I managed to solve the problem every time, my confidence grew. These days I enjoy troubleshooting issues. They could be coding issues, design issues or infrastructure issues. I’ve done it many times and I have always managed to solve them. Every time. That is why I can enjoy solving new issues. Because I know I will be able to solve them.

The weird thing about coding is that, to be efficient (quick), we need to take time to do it properly. Plan it properly and write down how you are going to build it. Verify the approach. Discuss it with other people if necessary, to get feedback. Don’t rush into coding it. That’s the easy part. Making sure that you do it the right way is the hard part. If we rush into coding (skipping the planning, the design and the strategy altogether), the chances are that we’ll get into a mess. And when coding we need to build it systematically. We need to build it modularly, i.e. module by module, piece by piece, layer by layer. Do yourself a big favour: don’t build it as a monolith (one big thing). And you’re doing a big favour for whoever comes after you too, because if you build it modularly it is easier to understand and easier to troubleshoot.

Happy coding.

Sunder Annamraju

Finance Change & Regulatory Programmes Delivery, Financial Systems Expert, Chartered IT Professional

1 month ago

Thanks Vincent. I totally get it. Adhering to good practices is invaluable in understanding and debugging code! Commenting your code helps too. As you said, get the logic right and rest will follow in an orderly manner.

