Coding
For many years, my job was to design data platforms: architecting the platform, designing the data model, writing the ETL specification, or designing the reports and dashboards. But in the last year, in addition to designing, I have also been coding.
I love coding. When I was doing my master’s in ML a few years ago, I enjoyed coding so much. It was in Python. I started my career as a developer, working in Microsoft Access, FoxPro, COBOL, C++, JavaScript, Java, C#.NET, Easytrieve, and of course SQL. I then did MS BI for a long time, coding in SSIS, Reporting Services and MDX for SSAS, as well as in SQL Server. Then I moved to Oracle (PL/SQL), IBM MQ, and Teradata, where I did BTEQ scripting as well as TCRM and Teradata SQL.
When I was in BI I did Cognos, Tableau, ProClarity, Power BI, Business Objects and Qlik. Oh, and Tibco Spotfire too. I picked up different BI tools quite easily, including their scripting languages. In Spotfire, for example, I used JavaScript and Python. But I also coded ETL, using Informatica PowerCenter, IBM DataStage, Oracle Data Integrator, Composite, Hadoop, Hive, SSIS, ADF, PySpark and now dbt.
When coding, I usually first get the functionality that I’m about to build clear in my head. For example, it could be about transforming data. I would find out in detail what the expected output is, what the inputs are, and what transformations are required. I usually open Notepad++ and write down in my own words: 1. What I am about to build. 2. The steps to build it. 3. The expected outcomes.
I visualise it until the process is really clear in my mind. As an example, say I was tasked with building a data transformation which builds client profiles. The output is many thousands of clients, complete with all their attributes such as name, date of birth, address, and many others. It would take a few steps to build it, including defining some reusable subroutines. I would visualise it until I was very clear what the key is, how the key should be constructed, and which attributes can be multi-valued. For example, a client can have multiple addresses, multiple mobile numbers, and multiple middle names but only one date of birth, one title and one first name. I would also be clear about which multi-valued attributes go to which tables, and how those tables are linked together. And about the data types too: for each column, in each table.
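To make the client profile example concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the key made of a source system plus a source client id, and the table and column names, are all hypothetical and not taken from a real project.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ClientKey:
    # Hypothetical key design: source system plus the client id in that system.
    source_system: str
    source_client_id: str

    def as_string(self) -> str:
        return f"{self.source_system}|{self.source_client_id}"


@dataclass
class ClientProfile:
    key: ClientKey
    # Single-valued attributes: exactly one per client.
    title: str
    first_name: str
    date_of_birth: date
    # Multi-valued attributes: each one goes to its own child table.
    middle_names: list[str] = field(default_factory=list)
    addresses: list[str] = field(default_factory=list)
    mobile_numbers: list[str] = field(default_factory=list)


def child_rows(client_key: str, column: str, values: list[str]) -> list[dict]:
    """Reusable subroutine: one child row per value, linked by client_key."""
    return [
        {"client_key": client_key, "seq": seq, column: value}
        for seq, value in enumerate(values, start=1)
    ]


def to_tables(profile: ClientProfile) -> dict[str, list[dict]]:
    """Split one profile into a parent table row and its child table rows."""
    client_key = profile.key.as_string()
    return {
        "client": [{
            "client_key": client_key,
            "title": profile.title,
            "first_name": profile.first_name,
            "date_of_birth": profile.date_of_birth,
        }],
        "client_middle_name": child_rows(client_key, "middle_name", profile.middle_names),
        "client_address": child_rows(client_key, "address", profile.addresses),
        "client_mobile_number": child_rows(client_key, "mobile_number", profile.mobile_numbers),
    }
```

Writing the types down like this forces the decisions described above: what the key is, which attributes are single-valued, and which table each multi-valued attribute lands in.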
It is also important to be clear about the strategy for how I will build it. For example, if there are several client types, will they be built using a single module with lots of branching inside it, or one module for each client type, which we union at the end? I spend time thinking about the strategy, because it will affect future development. And it will affect testing too. For example, if it is a single module with lots of branching inside it, every time we add a client type the testers will need to do regression tests on the client types that we have delivered previously.
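The second strategy (one module per client type, unioned at the end) could be sketched like this in Python. The client types, field names and function names are hypothetical, chosen only to show the shape of the approach.

```python
def build_individual(rows):
    """Module for individual clients only."""
    return [
        {"client_type": "individual", "name": f"{r['first_name']} {r['last_name']}"}
        for r in rows
    ]


def build_company(rows):
    """Module for company clients only."""
    return [{"client_type": "company", "name": r["registered_name"]} for r in rows]


# Adding a new client type means adding one function and one entry here;
# the existing builders are not touched.
BUILDERS = {
    "individual": build_individual,
    "company": build_company,
}


def build_all(source):
    """Run each per-type builder on its own input and union the results."""
    profiles = []
    for client_type, builder in BUILDERS.items():
        profiles.extend(builder(source.get(client_type, [])))
    return profiles
```

Because each builder only sees its own client type, adding a third type should not change the output of the first two, which keeps the regression testing smaller than in the single-module design.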
And the way I work is I build it piece by piece, module by module. Whether I use Informatica PowerCenter, Azure Data Factory or Power BI, I build it piece by piece. And every time I complete a piece, I test it, so that I know that piece produces the right output. It is a lot easier to test one small piece of code than a big one. And I build it in a modular way, the exact opposite of monolithic. Not only is modular code easier to build, it is easier to test and easier to debug too.
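As a small sketch of building and testing piece by piece, here are three tiny pieces in Python, each with its own test. The functions and the input format (an ISO date string, a free-text name) are assumptions made up for the illustration.

```python
from datetime import date


def clean_name(raw: str) -> str:
    """Piece 1: trim, collapse repeated spaces, and title-case a name."""
    return " ".join(raw.split()).title()


def parse_dob(raw: str) -> date:
    """Piece 2: parse a date of birth given as YYYY-MM-DD."""
    return date.fromisoformat(raw.strip())


def build_row(raw: dict) -> dict:
    """Piece 3: combine the two pieces that have already been tested."""
    return {"name": clean_name(raw["name"]), "date_of_birth": parse_dob(raw["dob"])}


# One small test per piece, run as soon as that piece is finished.
def test_clean_name():
    assert clean_name("  ana   LEE ") == "Ana Lee"


def test_parse_dob():
    assert parse_dob(" 1990-01-02 ") == date(1990, 1, 2)


def test_build_row():
    row = build_row({"name": "ana lee", "dob": "1990-01-02"})
    assert row == {"name": "Ana Lee", "date_of_birth": date(1990, 1, 2)}


for test in (test_clean_name, test_parse_dob, test_build_row):
    test()
```

If the last test fails while the first two pass, the problem is in how the pieces are combined, not in the pieces themselves.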
In terms of time percentage, I probably spend about 20% planning the details, 60% building it, 10% testing and 10% releasing it. But it varies. Some applications require more unit tests built in, so it could be 20% planning, 50% building, 20% testing and 10% on the release process.
People have different styles of coding, and in this day and age we code within a team of people. We don’t code alone. So we must agree the coding style in advance, such as the naming convention (columns, variables, tables/views), indentation, CTEs versus subqueries, etc. If I see a piece of code breaking that convention, my head can become itchy. For example:
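A minimal, hypothetical illustration in Python: suppose the team has agreed that column names are written in snake_case. The convention, the column names and the helper function below are all assumptions for the sake of the example.

```python
import re

# Assumed team convention: lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def non_conforming(names):
    """Return the names that break the agreed snake_case convention."""
    return [name for name in names if not SNAKE_CASE.fullmatch(name)]


columns = ["client_id", "first_name", "DateOfBirth", "mobileNumber", "postal_code"]
print(non_conforming(columns))  # ['DateOfBirth', 'mobileNumber']
```

Three columns follow the convention and two do not, and it is the two odd ones that stand out.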
Those kinds of things can really make me uncomfortable. Call it OCD or whatever, but I have had it since the first day I coded, when I was 16. I need to have things in order, and in good structure. I don’t mind giving up my own habits and agreeing a team convention, for example:
But everyone in the team must follow the convention. Every one of us sacrifices their own habits in order to have a team convention, which is a good thing to do. But once we agree a convention, everyone in the team must adhere to it. If not, I have an issue with it. I have an issue seeing something which is out of order. If everything is aligned and one thing is not, it disturbs me.
I get satisfaction from seeing that my code is working. The payoff for our effort is when the output is as expected. That is the good outcome. But a lot of the time, before that is achieved, we must resolve many issues. The output is not as expected, and we have to debug the code to find out what’s wrong. That is also very enjoyable for me. Systematically debugging the code to find out which lines cause the issue is very satisfying. Why? Because I know I will be able to find the source of the issue and I know I will be able to put it right.
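One way to debug systematically is to check the data after every stage, so the first stage that breaks an expectation is found quickly. The sketch below is in Python, and the stages, checks and sample rows are hypothetical.

```python
def dedupe(rows):
    """Keep one row per id (the last one seen)."""
    return list({r["id"]: r for r in rows}.values())


def cast_amount(rows):
    """Convert amount to int; amounts that are not digits become None."""
    return [
        {**r, "amount": int(r["amount"]) if r["amount"].isdigit() else None}
        for r in rows
    ]


def ids_are_unique(rows):
    return len({r["id"] for r in rows}) == len(rows)


def amounts_are_set(rows):
    return all(r["amount"] is not None for r in rows)


def first_failing_stage(rows, stages):
    """Run the stages in order; return the name of the first stage whose check fails."""
    for name, transform, check in stages:
        rows = transform(rows)
        if not check(rows):
            return name
    return None


rows = [{"id": 1, "amount": "10"}, {"id": 2, "amount": "x"}, {"id": 2, "amount": "x"}]
stages = [
    ("dedupe", dedupe, ids_are_unique),
    ("cast_amount", cast_amount, amounts_are_set),
]
print(first_failing_stage(rows, stages))  # cast_amount
```

Here the deduplication stage passes its check, and the search narrows to the casting stage, where the non-numeric amount is the source of the issue.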
How come I have that kind of confidence? Because I’ve done it hundreds of times. I started coding when I was 16 and now I’m 53, so I’ve done it for 37 years. Yes, in my early years I doubted whether I would be able to solve the issues. But as the years went by and I managed to solve the problem every time, my confidence grew. These days I enjoy troubleshooting issues. It could be coding issues, design issues or infrastructure issues. I’ve done them multiple times and I have always managed to solve them. Every time. That is why I can enjoy solving new issues. Because I know I will be able to solve them.
The weird thing about coding is that, to be efficient (quick), we need to take time to do it properly. Plan it properly and write down how you are going to build it. Verify the approach. Discuss it with other people if necessary, to get feedback. Don’t rush into coding it. Coding is the easy part. Making sure that you do it the right way is the hard part. If we rush into coding (skipping the planning, the design and the strategy altogether), the chances are that we’ll get into a mess. And when coding we need to build it systematically. We need to build it modularly, i.e. module by module, piece by piece, layer by layer. Do yourself a big favour: don’t build it as a monolith (one big thing). And you’re doing a big favour for whoever comes after you too, because if you build it modularly it is easier to understand and easier to troubleshoot.
Happy coding.
Finance Change & Regulatory Programmes Delivery, Financial Systems Expert, Chartered IT Professional
1 month ago: Thanks Vincent. I totally get it. Adhering to good practices is invaluable in understanding and debugging code! Commenting your code helps too. As you said, get the logic right and rest will follow in an orderly manner.