The curious case of Refactoring

The curious case of Refactoring

I'm going to do a refactoring today, so test engineer please do make a schedule to test my changes, and my goal is to increase performance in one of the endpoint I made.

Now there is a lot of things that wrong from above statement:

  1. You do not need to have a dedicated day to do a refactoring, you do it as you code.
  2. You do not need to bother your test engineer, because you do have unit test.
  3. Refactoring goal is NOT TO IMPROVE PERFORMANCE, IN SOME CASE IT CAN EVEN MAKE PERFORMANCE WORSE

Like many terms in software development "refactoring" is often used very loosely by practitioners, I my self follow more precise term, define by Martin Fowler in it's book Refactoring, which is:

"a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior"

The key word here is: "to make the code easy to understand, and cheaper to modify" so it never claim to make the performance of your code better. Michael Feathers in his book working effectively with legacy code even emphasize by saying that refactoring to achieve readability of code is a must, because code it's self is the single source documentation that will stay reliable, because in the world of lazy programmer when we change the code we often forget to change the documentation with it.

So when I code I always think, can other or even my future me understand my logic, and by thinking like this you do refactoring every single day, so you do not need to have a dedicated day to do this, and you don't even need to tell your colleague because what they wanted to know is the contract of your code, how to use it, what to call, what to passed, what is guarantee to return, and your manager only need to know that the task is done.

How I do Refactor Daily

First I'm going to say that I do not do TDD, why? because couple of time my experiment with it ended up badly, I don't know maybe I do it wrong, will try again some other times, but for now here is the complete flow of my way to code:

  1. Create the code do integrate test to see that the code work: this should be done in a couple of hours, it's really depend on how big the task you did, but then again I usually break the task in small pieces, to guarantee that I have a small functional unit done, and ready to test.
  2. Make the unit test: This unit test would become your guardian while refactoring, this thing would be an indicator that you don't change any behavior while refactoring.
  3. Actually do the refactoring: start simple by renaming some variable, or function or class/struct, there is a dedicated book of how to do it, I would recommend you to read, Martin Fowler Book: Refactoring: Improving the Design of Existing Code

Performance and Code Readability Trade Off

To understand why refactoring is never meant for performance, I will give you some example that I do encounter just recently, I change the code of course, but the problem is the same.

Let's say I'm making an API, by calling others API, get the data and populate additional data that are needed, let's said the data that need to be add is payment_id, so I made this function in golang:

func (g *OrderListUsecase) PopulatePaymentID(orders []OrderList) 
[]OrderList {

    var result []OrderList

    for _, order := range orders {
        ord := order
        ord.PaymentID = getPaymentIDByInvoice(order.Invoice)
        result = append(result, tempOrder)
    }

    return result
}

But then as it always happen thing changes, you need to add more data on API that you made, the goal is to categorize the status, let's said you have a hundred status, and you need to categorize it into 3 type only, failed, success, and in progress, so you made a constant map and change the code like this:

func (g *OrderListUsecase) PopulatePaymentID(orders []OrderList) 
[]OrderList {

    var result []OrderList

    for _, order := range orders {
        ord := order
        ord.PaymentID = getPaymentIDByInvoice(order.Invoice)
        ord.StatusCategory = mapStatusToCategory[order.status]
        result = append(result, tempOrder)
    }

    return result
}

Then another rule come up and you have to add additional data which is has_promo, so transaction that happen close to the end of the month would get a promo price, you change the code again to be like this:

func (g *OrderListUsecase) PopulatePaymentID(orders []OrderList) 
[]OrderList {

    var result []OrderList

    for _, order := range orders {
        ord := order
        ord.PaymentID = getPaymentIDByInvoice(tempOrder.Invoice)
        ord.StatusCategory = mapStatusToCategory[ord.status]
        if order.Date >= promoDate {
            ord.HasPromo = true
        } else {
            ord.HasPromo = false
        }
        result = append(result, tempOrder)
    }
    return result

}

This function break a couple of rule of clean code:

  1. The name of the function should indicate what the function do
  2. Function indentation should not go over 2.
  3. Function should be modular and only do one thing.

Now let's refactor it, and address all of the clean code issued that we list above:

The name of the function should indicate what the function do

The name PopulatePaymentID is misleading because the fact is the function did not do only that, so we change it to:

func (g *OrderListUsecase) PopulateAdditionalData(orders []OrderList) 
[]OrderList {

Done. this technique in refactoring world is called: Changed Function Declaration

Function indentation should not go over 2 & Function should be modular and only do one thing.

I'm going to tackle this issue together, Now I see inside the for loop it does a lot of thing(populate some value) together, what I did was separate and duplicate it.

    var resultPaymentID []OrderList
    for _, order := range orders {
        ord := order
        ord.PaymentID = getPaymentIDByInvoice(tempOrder.Invoice)
        resultPaymentID = append(result, tempOrder)
    }
    
    var resultCategory []OrderList
    for _, order := range resultPaymentID {
        ord := order
        ord.StatusCategory = mapStatusToCategory[ord.status]
        resultCategory = append(result, tempOrder)
    }
    
    var result []OrderList
    for _, order := range resultCategory {
        if order.Date >= promoDate {
            ord.HasPromo = true
        } else {
            ord.HasPromo = false
        }
        resultCategory = append(result, tempOrder)
    }

Now look at that stupid code, why would you do that, because I wanted to do a refactoring technique called: Extract Fucntion. first I need to separate the logic, so that I can extract that to another function, like this:

func (g *OrderListUsecase) PopulateAdditionalData(orders []OrderList) 
[]OrderList {
    var result []OrderList
    result = populatePaymentID(orders)
    result = populateCategory(result)
    result = populateHasPromo(result)
    return result
}


func populatePaymentID(orders []OrderList) []OrderList {
    var result []OrderList
    for _, order := range orders {
        ord := order
        ord.PaymentID = getPaymentIDByInvoice(tempOrder.Invoice)
        result = append(result, tempOrder)
    }
    return result
}


func populateCategory(orders []OrderList) []OrderList {
    var result []OrderList
    for _, order := range orders {
        ord := order
        ord.StatusCategory = mapStatusToCategory[ord.status]
        result = append(result, tempOrder)
    }
    return result
}


func populateHasPromo(orders []OrderList) []OrderList {
    var result []OrderList
    for _, order := range orders {
        if order.Date >= promoDate {
            ord.HasPromo = true
        } else {
            ord.HasPromo = false
        }
        result = append(result, tempOrder)
    }
}

And look at that code, it's pretty, now the logic is more cleared, every unit of function is doing exactly it promise to do, (Note: I purposely left some bad code design for you to figure it out, but for an example purpose I think this is enough)

Every programmer, even a beginner one would bark seeing the change I made, because of simple fact that the code that loop through the data once, now loop through the same data 3times more, if we do the Big Oh notation: my first logic run O(N), and the code after refactor did O(3N).

This is where the trade of between performance and readability happen, now which one do you choose? me personally I'll keep refactoring, a constant times N is really nothing, computer would still able to do it very fast, but when some issued regarding performance do occur when you do integrate test, revert back and change your code with the goal of performance.

To summarize:

"Do refactor everyday, at the end of your cycle of coding."

"Beware of the performance trade of refactoring."

Happy refactoring Guys, see you in the next articles.


Cheers,












要查看或添加评论,请登录

Dicky Kamarullah的更多文章

社区洞察

其他会员也浏览了