Sloppiness, Spaghetti Code, and Security Vulnerabilities , the  hallmarks of Today's IT Industry

Sloppiness, Spaghetti Code, and Security Vulnerabilities , the hallmarks of Today's IT Industry

For those who have weathered the repeated crises in the IT industry, it is pretty clear, though counterintuitive for outsiders, that the industry tends historically to degenerate quickly after each crisis. This degeneration manifests itself in various ways. Firstly, in terms of code quality, engineering practices often rely on the latest trendy framework, disregarding thorough investigation or consideration of alternatives. Secondly, in terms of security, the situation is perpetually worsening with systems becoming increasingly vulnerable and riddled with security flaws. Finally, regarding software engineering practices, spaghetti code and verbose solutions are rampant. These practices are further propagated by the maintenance of legacy code written in outdated languages such as Java and older versions of C++. These issues together paint a gloomy picture of the state of our industry.
Furthermore, the absurdity of the IT industry is magnified by a managerial level that often displays a worrying lack of proficiency in software engineering. This worsens the problem, forcing software engineers to specialize in certain trendy, fashionable frameworks and paradigms despite their evident flaws and security holes. To illustrate, I have presented clear and scientifically backed metrics demonstrating that for a specific project, a solution based on functional programming was two orders of magnitude more efficient in terms of cost, time, maintenance, security, and accuracy compared to the OOP equivalent. Yet, despite the compelling evidence, this solution was dismissed simply because it did not conform to the company's prevailing "cultural norms", whatever that is means, a term coined by marketing and HR departments, and one that seems to have a different, often contradictory definition depending on who you ask.


For the sake of illustration, I had an opportunity to dissect a script written in Python and assembly language, designed to detect S3 boot script vulnerabilities related to UEFI (Unified Extensible Firmware Interface) made by INTEL. The purpose of the routines is quite serious: they seek to expose potential holes that attackers could exploit, forcing the system into hibernation, and then executing malicious code upon restart.

While analyzing the script, I came across several areas for improvement. Let's walk through these together.

Original code made by INTEL:

[bits 32]


; save registers
push? ? eax
push? ? edx
push? ? esi
; These lines are pushing the current state of the registers (eax, edx, esi) onto the stack to preserve their values. This is a common practice in assembly to avoid data corruption during function execution.


call? ? _label
; The CALL instruction is pushing the next instruction's address (return address) onto the stack and then redirecting the execution to the specified label.


db? ? ? 0ffh
dd? ? ? 0 ; shellcode call counter
db? ? ? 0 ; BIOS_CNTL value
dd? ? ? 0 ; TSEGMB value
; Here, we're setting up some initial values for the shellcode call counter, BIOS_CNTL, and TSEGMB values.


_label:
; This is the start of the code block that was previously called.


; get data address
pop? ? ?esi
inc? ? ?esi
; This is retrieving the previously stored address from the stack and incrementing it by one. The address points to the memory space where the data is stored.


; increment call counter
inc? ? ?dword [esi]
; This increases the call counter by one. Each time the payload is called, this value is incremented.


; exit if current call isn't first
cmp? ? ?byte [esi], 1
jne? ? ?_end
; This compares the call counter with 1 and jumps to the _end label if the counter is not 1, meaning this isn't the first call.


; bus = 0, dev = 0x1f, func = 0, offset = 0xdc
mov? ? ?eax, 0x8000f8dc
mov? ? ?dx, 0xcf8
out? ? ?dx, eax
; These lines perform an I/O operation by moving the value 0x8000f8dc to the eax register, the value 0xcf8 to the dx register, and then outputting the value of eax to the I/O port dx.


; read BIOS_CNTL value
mov? ? ?dx, 0xcfc
in? ? ? al, dx
; This reads the BIOS_CNTL value from the specific hardware port.


; save BIOS_CNTL value
mov? ? ?byte [esi + 4], al
; This stores the BIOS_CNTL value for later use.


; bus = 0, dev = 0, func = 0, offset = 0xb8
mov? ? ?eax, 0x800000b8
mov? ? ?dx, 0xcf8
out? ? ?dx, eax
; This sequence again performs an I/O operation, this time with different values.


; read TSEGMB value
mov? ? ?dx, 0xcfc
in? ? ? eax, dx
; This reads the TSEGMB value from the specified hardware port.


; save TSEGMB value
mov? ? ?dword [esi + 5], eax
; This stores the TSEGMB value for later use.


; check if TSEGMB is locked
and? ? ?eax, 1
test? ? eax, eax
jnz? ? ?_end
; This checks if the TSEGMB is locked. If it is locked, it jumps to the _end label.


; bus = 0, dev = 0, func = 0, offset = 0xb8
mov? ? ?eax, 0x800000b8
mov? ? ?dx, 0xcf8
out? ? ?dx, eax
; Another I/O operation sequence.


; write and lock TSEGMB with dummy/incorrect value
mov? ? ?eax, 0xff000001
mov? ? ?dx, 0xcfc
out? ? ?dx

        

As you can see, despite being written in assembly and relatively brief, the code is primarily spaghetti-like, filled with boilerplate, and largely unmaintainable due to unnecessary hardcoded instances. Here's the surprise: this code originates directly from Intel, with its purpose being to scrutinize a central component of its own security scheme, as mentioned earlier. The repeated sequences of instructions tied to specific bus/device/function/offsets could feasibly be abstracted into a subroutine. This would result in a script that's shorter, more readable, and easier to maintain. So, let's attempt to clean up this tangle:

[bits 32]

; save registers
push? ? eax
push? ? edx
push? ? edi


call? ? _label


db? ? ? 0ffh
dd? ? ? 0 ; shellcode call counter
db? ? ? 0 ; BIOS_CNTL value
dd? ? ? 0 ; TSEGMB value


_label:


; get data address
pop? ? ?edi
inc? ? ?edi


; increment call counter
inc? ? ?dword [edi]


; exit if current call isn't first
cmp? ? ?dword [edi], 1
jne? ? ?_end


; bus = 0, dev = 0x1f, func = 0, offset = 0xdc
mov? ? ?eax, 0x8000f8dc
mov? ? ?dx, 0xcf8
out? ? ?dx, eax


; read BIOS_CNTL value
mov? ? ?dx, 0xcfc
in? ? ? al, dx


; save BIOS_CNTL value
; replaced: mov? ? ?byte [esi + 4], al
stosb


; bus = 0, dev = 0, func = 0, offset = 0xb8
mov? ? ?eax, 0x800000b8
mov? ? ?dx, 0xcf8
out? ? ?dx, eax


; read TSEGMB value
mov? ? ?dx, 0xcfc
in? ? ? eax, dx


; save TSEGMB value
; replaced: mov? ? ?dword [esi + 5], eax
stosd


; check if TSEGMB is locked
test? ? eax, eax
jnz? ? ?_end


; bus = 0, dev = 0, func = 0, offset = 0xb8
mov? ? ?eax, 0x800000b8
mov? ? ?dx, 0xcf8
out? ? ?dx, eax


; write and lock TSEGMB with dummy/incorrect value
mov? ? ?eax, 0xff000001
mov? ? ?dx, 0xcfc
out? ? ?dx, eax


_end:


; restore registers
pop? ? ?edi
pop? ? ?edx
pop? ? ?eax

        

what have we improved here?

Readability: we avoid the use of specific offsets with the ESI register, thus make it more readable as it simplifies the operations involving saving values to memory. It is easier to follow that data is being sequentially stored.

Maintainability: the stosb and stosd instructions are used to automatically increment the EDI register and sequentially write to memory. This replaces the previous mov instructions that were using fixed offsets with ESI. The use of stosb and stosd instead of mov with explicit memory addresses make the code easier to maintain. If you wanted to add new data to store, it could be done without having to manually calculate and update offsets.

But that's not all, folks. Now it comes the second stage, we can further improve the abstractions at the level of procedures and routines, which are lacking in that assembly code:

[bits 32]

; save registers
push? ? eax
push? ? edx
push? ? edi


; start of payload
call? ? start_payload


db? ? ? 0ffh
dd? ? ? 0 ; shellcode call counter
db? ? ? 0 ; BIOS_CNTL value
dd? ? ? 0 ; TSEGMB value


start_payload:


; pop return address into edi
pop? ? ?edi
inc? ? ?edi


; increment call counter and exit if not first call
inc? ? ?dword [edi]
cmp? ? ?dword [edi], 1
jne? ? ?end_payload


; perform operations on registers
perform_operations


; save BIOS_CNTL value and read TSEGMB value
stosb
perform_operations


; save TSEGMB value and check if locked
stosd
test? ? eax, eax
jnz? ? ?end_payload


; perform operations again
perform_operations


; lock TSEGMB with dummy value
mov? ? ?eax, 0xff000001
mov? ? ?dx, 0xcfc
out? ? ?dx, eax


end_payload:


; restore registers
pop? ? ?edi
pop? ? ?edx
pop? ? ?eax


; subroutine to abstract common operations
perform_operations proc
? ? mov? ? ?eax, 0x8000f8dc
? ? mov? ? ?dx, 0xcf8
? ? out? ? ?dx, eax
? ? mov? ? ?dx, 0xcfc
? ? in? ? ? al, dx
perform_operations endp

        

What we have done here is introduce Abstract Common Operations: If there are operations that are repeated multiple times in your code (like reading a certain register), you can create a subroutine (a 'proc' in assembly) that encapsulates that operation. This can make the code cleaner and easier to maintain because you only need to modify the code in one place if the operation changes. Additionally, we recommend Using Meaningful Labels: In assembly code, the use of meaningful labels instead of generic ones can greatly enhance readability. For instance, instead of '_label', you could use something like "_start_payload ". Lastly, we have replaced unnecessary, verbose code like:


and? ? ?eax, 1
test? ? eax, eax
jnz? ? ?_end        

with:

test? ? eax, eax
jnz? ? ?end_payloadx        


The spaghetti code saga from a company as significant as Intel doesn't end here. The sloppy assembly code you've seen in its first version is compiled and used inside Python code that is even worse than the assembly code itself. Here are some examples of that:

Unnecessary, sloppily spaghettified boilerplate code in the Intel Python caller:

def _at(data, off, size, fmt): return unpack(fmt, data[off : off + size])[0

def byte_at(data, off = 0): return _at(data, off, 1, 'B')
def word_at(data, off = 0): return _at(data, off, 2, 'H')
def dword_at(data, off = 0): return _at(data, off, 4, 'I')
def qword_at(data, off = 0): return _at(data, off, 8, 'Q')]        

Even in Python, a junior developer could come up with something better like this:

from struct import unpack


def make_at(fmt, size):
? ? def at(data, off=0):
? ? ? ? return unpack(fmt, data[off : off + size])[0]
? ? return at

byte_at = make_at('B', 1)
word_at = make_at('H', 2)
dword_at = make_at('I', 4)
qword_at = make_at('Q', 8)

        

As you see , in this code, make_at is a higher-order function that generates and returns another function (at). The generated at function unpacks binary data of a specific size and format at a given offset. The make_at function is then used to create the byte_at, word_at, dword_at, and qword_at functions, each with a different format and size. This approach encapsulates the call to unpack and the size and format parameters in the returned function, reducing repetition and making the code more flexible and reusable.

A mid-to-senior level engineer, on the other hand, could have come up with an even more improved version, like this one:

from struct import unpack


def make_at(fmt, size):
? ? def at(data, off=0):
? ? ? ? return unpack(fmt, data[off : off + size])[0]
? ? return at


function_params = {
? ? 'byte_at': ('B', 1),
? ? 'word_at': ('H', 2),
? ? 'dword_at': ('I', 4),
? ? 'qword_at': ('Q', 8)
}


globals().update({name: make_at(fmt, size) for name, (fmt, size) in function_params.items()})

        

Here, our middle level developer could imagine a more dynamic solution that generates the functions byte_at, word_at, dword_at, and qword_at and adds them to the global namespace. You can then use these functions as if they were defined normally.

While a senior engineer well-versed in functional programming could can come up with an even more elegant solution in Haskell by using a simple profunctor to avoid Python's global namespace:

{-# LANGUAGE ScopedTypeVariables #-
{-# LANGUAGE TypeApplications #-}
import Data.Profunctor
import Data.Word
import Data.Binary.Get
import Data.ByteString.Lazy as BS


newtype MyProfunctor b a = MyProfunctor { runProfunctor :: BS.ByteString -> a }


instance Profunctor MyProfunctor where
? dimap ab cd (MyProfunctor f) = MyProfunctor (cd . f . ab)


byteAt :: Int -> BS.ByteString -> Word8
byteAt = runGet . runProfunctor $ extractor @Word8


wordAt :: Int -> BS.ByteString -> Word16
wordAt = runGet . runProfunctor $ extractor @Word16


dwordAt :: Int -> BS.ByteString -> Word32
dwordAt = runGet . runProfunctor $ extractor @Word32


qwordAt :: Int -> BS.ByteString -> Word64
qwordAt = runGet . runProfunctor $ extractor @Word64


extractor :: forall a. (Binary a) => MyProfunctor Int (Get a)
extractor = dimap fromIntegral (const get) (MyProfunctor BS.drop)

}        

Here are other examples among dozens in the sloppy Intel Python Code:

bitval = lambda val, b: 0L if val & (1L << b) == 0 else 1L        

When it should be

bitval = lambda val, b: (val >> b) & 1        

and so on and so forth.

CONCLUSIONS:

Taking the case of Intel as an example, but the same applies to most of the IT companies, regardless their size. As one of the giants in the tech industry, it is disheartening to see code that is not up to the standard one would expect from such a company. Their approach, intended to rectify vulnerabilities, instead seems to introduce new potential problems due to its complexity and lack of readability of the code made for that. This lack of maintainability and the inherent risk of introducing new bugs is indicative of the broader problem facing the industry.

The hiring process across the industry is another area of concern. Too often, interviews focus on abstract algorithms and data structures, or on the syntax of specific languages, rather than on practical software engineering skills, problem-solving abilities, and an understanding of best practices. This does not provide a reliable measure of a candidate's potential to contribute effectively to a codebase or a team.

To improve the situation, it's crucial that companies prioritize high-quality, maintainable code. This includes investing in training for their developers, creating an explicit engineering methodology that supports good engineering practices, and ensuring that management understands and supports these goals. The hiring process should be designed to assess candidates' ability to apply these practices and to write good, clean code, rather than focusing solely on abstract problem-solving or specific language syntax.

要查看或添加评论,请登录

Jose Crespo的更多文章

社区洞察

其他会员也浏览了