Thoughts on software engineering

Some thoughts on the question "What's the end-to-end procedure of driving a project?"

Design phase

1. Motivation and customer benefits. Start from a narrative that reveals the pain points through qualitative descriptions. This justifies starting the work at all. PMs and SDMs will also use it to prioritize the project against many others, so it greatly affects how well the project gets funded and staffed.

Then restate the same thing from a technical/product-behavior perspective rather than a customer/user-facing one: functional requirements, performance requirements, and rejected requirements. Capture the key invariants and the input-output behavior.

Do not commit to any timeline at this stage.

External resource: Input from SDM, PM, and any other stakeholders.

2. Detailed design phase.

Background on the components we are going to touch. Give a simplified view of the existing architecture, characterized by a few invariants and if-then behavioral patterns. Who else depends on the component concurrently, and what do they expect of how it works? Ideally this is distilled into a state machine.

Workflow that breaks things down into interactions between components.

  • Break things down into:
      - Several key APIs with detailed specifications (sync/async, caller context, concurrency properties, what it throws, side effects of full/partial/repeated completion; if distributed, what the data consistency model is).
      - The flow of exercising the APIs (work on the happy paths first, then expand breadth-first into the failure and partial-completion scenarios).
  • Or capture it as transitions between states (I have found that explaining with a formal state machine model can be very useful in proving the soundness of the design). What are the possible states, and which inputs is each one subject or exposed to?
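As a minimal sketch of the state-machine style of design described above, the transitions can be written down as an explicit table and then checked mechanically. The states, event names, and helper functions below are hypothetical and only illustrate the idea; a real design would use the states of the actual component.

```python
from __future__ import annotations

from enum import Enum, auto


class State(Enum):
    PENDING = auto()
    RUNNING = auto()
    SUCCEEDED = auto()
    FAILED = auto()


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.PENDING, "start"): State.RUNNING,
    (State.RUNNING, "complete"): State.SUCCEEDED,
    (State.RUNNING, "fail"): State.FAILED,
    (State.FAILED, "retry"): State.RUNNING,
    # Repeated completion is modeled explicitly as a no-op.
    (State.SUCCEEDED, "complete"): State.SUCCEEDED,
}


def step(state: State, event: str) -> State:
    """Apply one event; reject anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"illegal transition: {event!r} in state {state.name}"
        ) from None


def reachable(start: State) -> set[State]:
    """Breadth-first search over the table to list every reachable state."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop(0)
        for (src, _event), dst in TRANSITIONS.items():
            if src is current and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen
```

With the table in hand, questions such as "can every state be reached?" or "what happens on a repeated completion?" become small checks, for example `reachable(State.PENDING) == set(State)`, instead of arguments in a review meeting.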

Proof of concept. Very important. Build a prototype showing the viability of the idea, preferably with some preliminary numbers. For big projects a full prototype might not be possible, so do the POC on smaller components.

Misc:

  • Key metrics for monitoring and logging.
  • User interface change.

External resources:

  • Experts on components you are not familiar with. Set up meetings or dive deep yourself to figure out how the existing things are modeled. Start from a combination of playing with tests, extensive code reading, old design docs, and pointers from the natives. Quickly validate your ideas to prove your understanding.
  • Design review approvers. Treat design reviews like code reviews and follow up frequently. Chase actively and book meetings aggressively.

At this stage, keep your SDM aware of any delay in closing the design. You are expected to get timely support from the external resources you need, and the SDM is the best resort to ensure this.

About timeline negotiation

Sometimes you need to give a timeline without fully closing the design. Whether to do so depends on how much risk and potentially wasted effort you are willing to take. If it goes beyond your safety threshold, do push back.

Regarding the timeline, in our team typically 1~2 people take on 95% of the implementation, while testing can be split evenly. Organizational commitments often require something to happen within a year or a quarter, and timely delivery becomes impossible if it is bottlenecked on those backbone people. Hence managers and engineers have to work together to choose among the following:

  • Formally trim the project scope so that the initial delivery fits on 1~2 people's plates.
  • Design the project better so that more people can work on it in parallel. (Adding manpower requires design-level awareness in the best case, and purely adds chaos to development later in the worst case.)
  • Most of the time, it ends with burning the midnight oil (how much depends on how bad the design is, and on how blindly you yield to a timeline proposed by others instead of negotiating and setting boundaries).

POC, implementation, and testing should take time in roughly a 2:1:3 ratio. As far as I'm concerned, cutting testing by half is not really an option; we only pretend it is until the cut-off date gets close and we can no longer pretend, at which point we are in the worse situation of changing things at the very last minute.

Take margins into account for supporting other projects (design discussions, code reviews, ops, customer issues) and for uncertainty about the design: an extra 3~4 weeks, depending on past observations. If such work was not planned and is putting the committed timeline at risk, tell them you cannot do it without a delay.
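To make the 2:1:3 ratio and the margin concrete, here is a small worked sketch. The function name, the 15-week total, and the choice to subtract the margin before splitting are assumptions made for illustration, not a rule from our team.

```python
def split_timeline(total_weeks, ratio=(2, 1, 3), buffer_weeks=3):
    """Split the weeks left after the margin across POC, implementation, testing.

    `ratio` follows the rough 2:1:3 rule of thumb; `buffer_weeks` is the margin
    reserved for reviews, ops, customer issues, and design uncertainty.
    """
    usable = total_weeks - buffer_weeks
    if usable <= 0:
        raise ValueError("the margin consumes the whole timeline")
    parts = sum(ratio)
    names = ("poc", "implementation", "testing")
    return {name: usable * share / parts for name, share in zip(names, ratio)}


# Hypothetical 15-week commitment with a 3-week margin.
plan = split_timeline(15)
print(plan)  # {'poc': 4.0, 'implementation': 2.0, 'testing': 6.0}
```

Under these assumptions only 2 of the 15 weeks are pure implementation, which is usually the number to bring to the timeline negotiation.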

3.1 Coding

While working on a super large code base like the one we have at Redshift, it is never surprising to have to touch code that is 3~5 years old and has no explicit owner maintaining it.

The first rule is always to leave the code in a cleaner state after you have worked on it.

It is vital, and common, that the first task is not coding the new features but refactoring the existing code to lay an extensible foundation. It could take 1~2 weeks to clean up, add more validations, and even add the missing test coverage.

The trade-off among extensibility, agility, performance

A project is backed by its business value, not by how neatly the code is written. Software engineering is more a problem of balancing code extensibility, agile feature delivery, and code performance. It is reasonable to first build something quick and dirty to get things working, then come back later during code freeze to unravel the tangled if-else branches, unnecessary copies, code duplication, and ugly workarounds. Sometimes we have to make a choice that is suboptimal from a pure coding perspective because of a manpower shortage, tight timelines, and calculated risks. The standards are elastic, but within boundaries. It's not about perfect coding, but about keeping things under control: tracking the technical debt and day-1 code bugs, and keeping them in the backlog with priorities. We don't fix every bug we spot, but we keep things under control.

Having said that, I still clean up the existing code and write tests for it before working on it. Most of the time agile delivery does not conflict with maintaining the code. Working on well-structured code not only gives you confidence in the product quality but also speeds things up.
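One way to write tests for existing code before touching it is a characterization test: record what the code does today and pin it, so that a later cleanup cannot silently change behavior. The sketch below is only an illustration; `legacy_format_size` is a made-up stand-in for old, unowned code, and the test cases were obtained by running it, not from any spec.

```python
import unittest


def legacy_format_size(num_bytes):
    """Stand-in for old, unowned code whose behavior we must preserve."""
    if num_bytes < 1024:
        return str(num_bytes) + "B"
    elif num_bytes < 1024 * 1024:
        return str(num_bytes // 1024) + "KB"
    else:
        return str(num_bytes // (1024 * 1024)) + "MB"


class LegacyFormatSizeTest(unittest.TestCase):
    def test_pins_current_behavior(self):
        # Inputs chosen around the branch boundaries of the existing code.
        cases = {
            0: "0B",
            1023: "1023B",
            1024: "1KB",
            1536: "1KB",  # truncates instead of rounding; pinned as-is
            1024 * 1024: "1MB",
        }
        for given, expected in cases.items():
            with self.subTest(given=given):
                self.assertEqual(legacy_format_size(given), expected)
```

Once such a test passes against the untouched code, the refactoring can proceed in small steps with the test re-run after each one.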

3.2 Anything beyond coding

Weekly follow-up. For a project starved of resources, count the days lost and keep the SDM up to date.

If a project is about to be explicitly deprioritized, call it out 3 weeks before we are going to lose it. It's paramount to review weekly with the SDMs.

Last but not least, document everything in a tracking Quip doc and in email threads of meeting summaries. No one will remember what was decided yesterday.

4. Testing

More to come about mocking and dependency injections.

Back to the Component design.

5. Misc

From an engineer's perspective, I feel there is growing value in knowing what lies beyond your own scope. Project delivery is by no means just: we are given a problem -> work on a design -> code -> deploy -> monitor. Things can get deprioritized, trashed, abandoned, or badly resourced, as something exciting in the first 2 months loses its business value afterwards. To gain insight into these uncertainties, and to be able to change the entire problem formulation, our thinking has to be deeply rooted in non-technical factors such as customer value, how the management team envisions goals at the organization level, and how other ongoing projects are changing the common infrastructure, among many other things. Working smart might mean not only being diligent on well-defined problems but also extending the scope of our concerns to see how the technical questions are strategically generated.

It's just a vague idea as of now, but I'm sure in the future I will get a better understanding.

To be continued...

Yehan(Davis) Zhang

A huge fan of data intensive App! Voyage starts from OLAP @ Redshift -> Datalake @ Onehouse.


I'm sure there are no new ideas. Just a summary for me to retrospect my work in the past.
