Understanding operating systems through Linux.
Rick Ramirez
BISO Analyst @ Johnson & Johnson | Supply Chain Risk Management | Compliance Professional | ICS & OT Cybersecurity
The Linux operating system has become the de facto operating system for open source projects. It is prevalent in startups, commercial products, and industrial control systems. Its main benefit is flexibility, which allows computer scientists to install Linux on a wide range of computing systems, from microcomputers to supercomputers. Recently, Microsoft has also incorporated a Linux kernel into its Windows 10 operating system. Linux is ingrained throughout society, and its prevalence is expected only to increase.
The Linux Operating System
Introduction
The integration of computer systems is apparent throughout society. These devices are so deeply intertwined with the infrastructure of civilization that many people are entirely unaware of their presence. The critical resources for modern survival, such as water, gas, and electricity, are all managed by computer systems. In February of 2021, the citizens of Texas faced a severe winter storm. The storm resulted in power outages for hundreds of thousands of Texans and over seventy deaths. It revealed to Texans and the rest of the United States how critical these essential resources are to survival (Neuman & Romo, 2021). That crisis was the result of a natural disaster, but the computer systems that manage power and water are equally vulnerable to a cyber attack.
Because computer systems are such an integral part of modern living, most people should be aware of how they function, although many are not. Computer systems use an operating system to manage all of the hardware and peripheral devices that the system needs in order to work. The operating system is a specialized program designed to let human users communicate with the computer system's hardware devices. The operating system that many people are most familiar with is Microsoft's Windows, which is common on personal computers. A less obvious example is a smartphone operating system. iOS is one of the most common operating systems on smartphones and other mobile computing devices.
While Windows and iOS are popular for personal use, many commercial and industrial organizations rely on Linux. Linux is also the foundation for other well-known operating systems: Google's ChromeOS and Android are both built on top of Linux. Linux has found popularity in engineering and specifically in artificial intelligence (Vaughan-Nichols, 2020). Despite the omnipresence of Linux in society, many people do not understand how the operating system functions. Learning how process, memory, storage, I/O device, and security management work in a Linux environment is essential. By understanding these concepts, users can operate computer systems efficiently and securely and can innovate with them.
What is Linux?
Linux, as stated earlier, is commonly referred to as an operating system. Strictly speaking, however, Linux is not a complete operating system but the system's kernel. The kernel's job is to facilitate the interaction between software and the hardware of a computer system. The modularity inherent in Linux systems comes from building on this kernel, which allows technology professionals to tailor the operating system to the needs of the computing device.
Many organizations have made their own versions of the Linux operating system. In the Linux community, these operating systems are called distros, short for distributions. Common Linux distros are Ubuntu, Linux Mint, Red Hat Enterprise Linux, and Kali Linux. Ubuntu and Red Hat are used in enterprise operations such as email servers, file sharing, and other network resources. Linux Mint is a common desktop distro and is often used to revive older personal computers. Kali Linux is designed explicitly for security professionals and provides penetration testers with the tools to conduct penetration tests and vulnerability assessments. The different distros show how the single Linux kernel can be used to create customized software that meets an organization's or user's needs.
The History of Linux
Linux was initially designed specifically for the Intel 80386 microprocessor (Torvalds & Diamond, 2002). Its creator is the computer scientist Linus Torvalds, who released the first version in 1991 while a student at the University of Helsinki (Love, 2018). This first release did not have a graphical interface. Users interacted with the system through a command-line interface (CLI), which required them to pass commands with parameters in order to execute the required software. Full graphical desktop environments arrived later in the decade; GNOME, for example, was introduced in 1999 (Juell, 2021). Linux has grown in popularity ever since. As the world becomes more entangled with technology, the need for custom operating systems increases, and the modularity and customizability of Linux have led to its adoption by governments and large corporations (Juell, 2021). The same qualities explain why Linux is found on mobile phones, on IoT devices, and even in outer space. Linux is still young, having been released only about thirty years ago, yet its mass adoption has made it a foundation for future technologies.
Design Principles
Linux systems, as stated earlier, were initially designed to run on a specific PC architecture. Today, however, Linux runs on many different types of computing systems, from microcomputers to nearly all of the world's fastest supercomputers (Vaughan-Nichols, 2013). The success and broad usage of Linux raise several questions about its operation. How can one system be compatible with such varied types of computers? What advantages does Linux have over other operating systems? Exploring the design principles of Linux enables technology professionals to answer those questions and reveals the strengths of a Linux system compared with other operating systems.
The parts of a Linux system
A Linux system consists of three main components: the kernel, the system libraries, and the system utilities (Love, 2018). As noted earlier, the kernel is responsible for the interaction of hardware with application software. For software to interact with the kernel, the system requires a set of libraries. The system libraries provide predefined functions that applications use to request kernel services, so applications get the functionality they need without being exposed to the entire kernel. This separation benefits security because it limits access to kernel code, and it simplifies applications, which only need to call library functions rather than deal with kernel internals. The third component is the system utilities, which are specialized programs that perform individual management tasks. Some utilities run once, for example to initialize or configure part of the system, while others run permanently in the background; the latter are known as daemons (Silberschatz et al., 2014). These three components work together to ensure that the system runs properly.
Kernel modules
One of the critical benefits of the Linux kernel is that it supports kernel modules. Kernel modules are sections of kernel code that can be loaded and unloaded on demand, independently of the rest of the kernel (Silberschatz et al., 2014). These modules run in kernel mode and have full access to all computing resources. Kernel modules allow critical functionality to be added to a system: a module can implement a network protocol, a file system, or a device driver.
Kernel modules remove the need to build a new kernel when operational needs change or a new device driver is installed. When software developers create a new device driver, they can compile and test only the new code rather than recompiling the original kernel code. Once the driver is verified to work with the existing kernel, it can be distributed as a kernel module so that other users can load it on their systems.
As the number of modules compatible with the Linux kernel increases, so does the need to manage them. Module support in Linux has four components. The module-management system allows modules to be loaded into memory and to communicate with the rest of the kernel. The module loader and unloader are user-mode utilities that work with the module-management system to load a module into memory. The driver-registration system lets a module tell the rest of the kernel that a new driver is available. The final component is the conflict-resolution mechanism, which allows device drivers to reserve hardware resources and protects those resources from accidental concurrent use by another driver.
Process Management
Computer programs are pieces of code that perform a function when executed. Programmers write these programs in high-level languages such as Python or C++. A program's source code is the file that contains the human-readable form of the code. A program that is being executed is called a process. Processes carry out the commands and programs that the user requests. Before source code can run, a compiler or interpreter translates it from its human-readable form, such as Python syntax, into machine-readable instructions. The operating system then loads those instructions and runs them as a process, which carries out the functions laid out by the programmer. While running, a process uses computer resources, including memory, to hold and execute its code.
A process can also split its work internally by running several threads. A thread is a unit of execution within a process; multiple threads let different parts of the same program run concurrently, which makes the program more responsive and efficient. A word processor is a good example of processes and threads. When a user starts the word processor, a process executes its code and keeps the program running. Within that process, one thread captures the user's keystrokes while a second thread draws those keystrokes on the display. Splitting processes into threads allows for more functionality and efficiency in applications.
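The word-processor example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name run_editor and the use of a queue to hand keystrokes from one thread to the other are hypothetical choices, not how any real word processor is built.

```python
import queue
import threading

def run_editor(keystrokes):
    """Pass keystrokes from an input thread to a display thread."""
    pending = queue.Queue()   # thread-safe hand-off between the two threads
    displayed = []

    def capture():
        # Stands in for the thread that reads the keyboard.
        for key in keystrokes:
            pending.put(key)
        pending.put(None)     # sentinel: no more input

    def render():
        # Stands in for the thread that draws characters on the display.
        while True:
            key = pending.get()
            if key is None:
                break
            displayed.append(key)

    threads = [threading.Thread(target=capture), threading.Thread(target=render)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return "".join(displayed)

print(run_editor("hello"))  # prints "hello"
```

Both threads belong to the same process and share its memory, which is why the display thread can see what the capture thread produced.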
Linux process models
An operating system's role in process management is to ensure that processes are created and executed efficiently and run without fault. The method used to achieve this varies between operating systems. In Linux, two main system calls do the work. The first is fork(), which creates a new process by branching off an existing one. When fork() is invoked, the child process starts as a copy of the parent process. The second is exec(). When exec() is invoked, the operating system replaces the program running in the calling process with the specified program. Together, these two system calls are the foundation of how the Linux kernel manages processes.
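The fork-then-exec pattern can be tried from Python, whose os module exposes thin wrappers around these system calls on Linux and other UNIX-like systems. The sketch below is a minimal illustration; the helper name run_in_child is hypothetical, and the child simply runs a second Python interpreter that exits with status 7.

```python
import os
import sys

def run_in_child(argv):
    """Fork, replace the child with the program in argv, and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: still a copy of the parent until exec replaces its program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # exec failed; exit without running the parent's cleanup
    # Parent: wait for the child to finish and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

code = run_in_child([sys.executable, "-c", "import sys; sys.exit(7)"])
print(code)  # prints 7
```

The parent and child take different branches after fork() because the call returns 0 in the child and the child's process ID in the parent.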
Process Identity
Linux uses several attributes to organize and maintain processes. One major group of attributes is the process identity, which covers the items that identify a process and distinguish it from every other process. Each running process receives a unique process ID (PID). This ID allows the operating system to know which process is requesting access to which resources. The next attribute is the process's credentials. Each process has an associated user ID and one or more group IDs that determine its rights to access system resources and files, which prevents users without the appropriate privileges from running sensitive operations. Linux also records a process's personality. The personality can slightly modify the behavior of certain system calls so that the process remains compatible with software written for other UNIX variants. Finally, the process's namespace is its specific view of the file-system hierarchy (Silberschatz et al., 2014). These attributes are critical for managing the different processes on a computing system.
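A process can inspect part of its own identity. The following sketch uses standard functions from Python's os module, available on Linux and other UNIX-like systems, to read the process ID and credentials; the function name process_identity is only illustrative, and personality and namespace are not covered.

```python
import os

def process_identity():
    """Collect the identity attributes that are easy to read from user space."""
    return {
        "pid": os.getpid(),          # unique process ID
        "parent_pid": os.getppid(),  # ID of the process that created this one
        "uid": os.getuid(),          # credentials: user ID
        "gid": os.getgid(),          # credentials: primary group ID
        "groups": os.getgroups(),    # credentials: supplementary group IDs
    }

for name, value in process_identity().items():
    print(f"{name}: {value}")
```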
Process Environment and Process context
Every process, regardless of the operating system, requires a process environment and a process context. The process environment consists of two vectors. The first is the argument vector, which lists the command-line arguments used to invoke the running program and conventionally starts with the name of the program itself. The second is the environment vector, a list of key-value pairs that supply named environment variables and their values to the running process.
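Both vectors are visible from inside a program. As a small sketch, the code below starts a child Python process with an explicit argument vector and one extra environment variable, and the child prints what it received. The names run_child, CHILD_CODE, and GREETING are made up for this illustration.

```python
import os
import subprocess
import sys

# The child prints its arguments (minus the program slot) and one variable.
CHILD_CODE = "import os, sys; print(sys.argv[1:], os.environ['GREETING'])"

def run_child(args, extra_env):
    """Start a child process with the given arguments and extra environment."""
    env = {**os.environ, **extra_env}  # inherit the parent's environment, then add to it
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, *args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

print(run_child(["--draft", "notes.txt"], {"GREETING": "hello"}))
```

The child sees the arguments in sys.argv and the key-value pairs in os.environ, which correspond to the argument vector and the environment vector.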
Process context is the state of the running process at any point in time. It changes constantly and has six parts. The first is the scheduling context. The system's scheduler requires the information in the scheduling context to suspend and restart a process. "This information includes saved copies of all the process's registers" (Silberschatz et al., 2014). Accounting is the part of the process context that lets the operating system keep track of the resources a process is currently using and the resources it has consumed over its lifetime. The file table is an array of pointers to the kernel structures that represent the files the process has open. The file-system context holds information about the process's root directory, working directory, and namespace. The signal-handler table defines the action the process takes in response to each signal, which is how the system notifies it of external events. Finally, the virtual memory context describes the full contents of the process's private address space.
Scheduling
Scheduling is how an operating system coordinates when different processes and threads run so that they execute as efficiently as possible. A CPU has limited processing capacity and must be shared among all processes and threads. In older operating systems, a large process often created a bottleneck while it waited for a long computation or for user input, preventing other processes from using valuable computing resources. Scheduling combats this problem by taking a process off the CPU while it waits for input from the user or from other processes.
Process scheduling
Linux supports two types of process scheduling: time-sharing and real-time. A time-sharing algorithm allocates a specific amount of CPU time to each process. The amount of time given to a process is called a time slice, and the process may run only for the length of its slice before another process gets a turn. This ensures that all processes make progress and that no single process hoards the CPU. Time-sharing is the fairest way to schedule processes, although each process waits longer for its turn when many processes are in the queue. Real-time scheduling is implemented by two algorithms: first-come, first-served (FCFS) and round-robin. In both, each process has a priority, and the scheduler always runs the process with the highest priority; among processes of equal priority, it runs the one that has been waiting longest. The difference is that an FCFS process runs until it exits or blocks, whereas a round-robin process that uses up its time slice is moved to the end of the queue so that the next process of the same priority can run. These are the main methods that Linux uses to schedule processes so that they execute efficiently.
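The round-robin rule described above can be illustrated with a small simulation. This is a simplified sketch, not the kernel's scheduler: it assumes all processes have the same priority and arrive at the same time, and the process names and burst times are invented.

```python
from collections import deque

def round_robin(bursts, time_slice):
    """Return the order in which processes finish under round-robin scheduling.

    bursts maps a process name to the CPU time it still needs.
    """
    ready = deque(bursts.items())
    finished = []
    while ready:
        name, remaining = ready.popleft()
        if remaining > time_slice:
            # Slice used up: go to the back of the queue with the leftover work.
            ready.append((name, remaining - time_slice))
        else:
            finished.append(name)
    return finished

print(round_robin({"editor": 5, "compiler": 12, "shell": 2}, time_slice=4))
# prints ['shell', 'editor', 'compiler']
```

The short shell job finishes first even though it was last in the queue, because the two longer jobs each exhaust a slice and are sent to the back.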
Kernel Synchronization
The previous sections explained how the operating system schedules user processes. The kernel also has work of its own to run, and it schedules that work differently from user processes. A running program enters kernel mode when it makes a system call or triggers an exception such as a page fault. Kernel mode lets code run in a separate space with higher privileges than user processes have (Silberschatz et al., 2014). A device driver can also deliver a hardware interrupt that causes the CPU to run kernel code. Because of this, race conditions can occur when two paths of execution try to access the same kernel resource simultaneously. To prevent them, the kernel uses spinlocks and semaphores for locking. A spinlock guards a piece of kernel data for a short period so that only one processor at a time can change it.
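The idea of locking a shared resource can be shown with a user-space analogy. The sketch below uses Python's threading.Lock, which puts a waiting thread to sleep instead of spinning as a kernel spinlock does, so it illustrates only the mutual-exclusion principle; the function name and the counts are arbitrary.

```python
import threading

def count_with_lock(workers=4, increments=10_000):
    """Have several threads update one shared counter under a lock."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            with lock:  # only one thread at a time may update the counter
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(count_with_lock())  # prints 40000
```

Without the lock, two threads could read the same old value and each write back the same new one, losing an update; that is the race condition the lock prevents.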
Symmetric multiprocessing
The Linux 2.0 kernel was the first to support symmetric multiprocessing (SMP). SMP allows multiple threads to run in parallel on multiple processors. The initial implementation allowed only one processor at a time to execute kernel code. A later version of the kernel introduced a single kernel spinlock, sometimes called the Big Kernel Lock (BKL), which allowed processes running on different processors to be active in the kernel concurrently. Unfortunately, this method was not easily scalable and did not provide the granularity needed for efficient multiprocessing. Modern Linux kernels use finer-grained spinlocks, processor affinity, and load balancing to run efficiently. As noted earlier, locking ensures that only one processor at a time changes a shared resource, which prevents race conditions. The operating system uses processor affinity to assign a process to a single core or a group of cores. Pinning a high-priority process to a processor in this way helps ensure that it gets to run. Load balancing distributes the processes in the queue evenly across all processors, which avoids bottlenecks and improves efficiency.
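On Linux, a process can ask which processors it is allowed to run on. The sketch below uses os.sched_getaffinity, which Python provides only on platforms that support it, so the code falls back to counting CPUs elsewhere; the function name allowed_cpus is illustrative.

```python
import os

def allowed_cpus():
    """Return the set of CPU indices this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        return os.sched_getaffinity(0)      # 0 means the calling process
    return set(range(os.cpu_count() or 1))  # fallback where affinity is unavailable

print(sorted(allowed_cpus()))
```

The companion call os.sched_setaffinity(0, {0}) would restrict the process to CPU 0 on Linux, which is the user-space view of processor affinity.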
Memory Management
Memory management is the operating system function that controls primary memory. The operating system moves processes between main memory and the storage disk during execution. It accomplishes this by tracking the status of every memory location, marking each one as in use or available. The memory manager is also responsible for deciding how much memory is allocated to each process.
Memory management components in Linux
Linux divides physical memory into four zones: ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, and ZONE_HIGHMEM. The zones are architecture specific, because hardware differs in how much memory it can address. On some architectures, certain devices can reach only the first 16 MB of physical memory, while on others certain devices can reach only the first 4 GB. The range of physical addresses that a region of memory falls into determines its zone. On a 32-bit Intel x86 system, for example, physical memory below 16 MB falls into ZONE_DMA, memory between 16 MB and 896 MB falls into ZONE_NORMAL, and anything above 896 MB falls into ZONE_HIGHMEM.
The Linux kernel uses a page allocator to manage physical memory. Each zone has its own allocator, which is responsible for allocating and freeing the zone's physical pages. The allocator uses the buddy system to track available pages. In the buddy system, adjacent units of allocatable memory are paired, so every allocatable region has an adjacent partner, or buddy. When two paired regions are both free, they are combined into a larger region known as a buddy heap. That larger region also has a partner, and when both are free they are combined into an even larger region. Conversely, when a small request cannot be satisfied from an existing small free region, a larger free region is split into two partners to satisfy it.
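The splitting and merging can be illustrated with a toy allocator. This is a sketch of the general buddy technique under simplifying assumptions, not the kernel's code: memory is a range of 2**max_order pages identified by page index, and the class and method names are invented.

```python
class BuddyAllocator:
    """Toy buddy allocator over 2**max_order pages, addressed by page index."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free[k] holds the start indices of free blocks that are 2**k pages long.
        self.free = {k: set() for k in range(max_order + 1)}
        self.free[max_order].add(0)

    def alloc(self, order):
        """Allocate a block of 2**order pages; return its start index or None."""
        k = order
        while k <= self.max_order and not self.free[k]:
            k += 1  # look for the smallest larger block that is free
        if k > self.max_order:
            return None
        start = self.free[k].pop()
        while k > order:
            k -= 1
            # Split: keep the lower half, put the upper half (the buddy) back.
            self.free[k].add(start + (1 << k))
        return start

    def release(self, start, order):
        """Free a block and merge it with its buddy while the buddy is also free."""
        k = order
        while k < self.max_order:
            buddy = start ^ (1 << k)  # the partner block of the same size
            if buddy not in self.free[k]:
                break
            self.free[k].remove(buddy)
            start = min(start, buddy)
            k += 1
        self.free[k].add(start)

heap = BuddyAllocator(max_order=4)  # 16 pages in total
a = heap.alloc(0)                   # splits 16 -> 8 -> 4 -> 2 -> 1
b = heap.alloc(0)                   # takes the buddy of the first page
heap.release(a, 0)
heap.release(b, 0)                  # buddies merge all the way back up
print(heap.free[4])                 # prints {0}
```

After both single pages are released, every pair of buddies is free again, so the allocator ends with one free 16-page block, exactly as it started.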
The Linux kernel's memory allocations are made either statically, by drivers that reserve a contiguous region of memory at system boot time, or dynamically, by the page allocator (Silberschatz et al., 2014). Kernel functions do not have to use the basic page allocator directly, however. Many components of the Linux operating system need entire pages to be allocated on demand, but smaller blocks of memory are frequently needed as well. The kernel therefore provides an additional allocator for arbitrary-sized requests, where the size of a request is not known in advance and may be only a few bytes. This kmalloc() service, analogous to malloc() in C, allocates whole physical pages on demand and then splits them into smaller pieces (Silberschatz et al., 2014).
Slab allocation is another method Linux uses to allocate kernel memory. A slab is made up of one or more physically contiguous pages and is used to allocate memory for kernel data structures. A cache consists of one or more slabs (Silberschatz et al., 2014). Each kernel data structure has its own cache, and each cache is populated with objects that are instantiations of the kernel data structure the cache represents. The slab-allocation algorithm uses these caches to store kernel objects (Silberschatz et al., 2014). When a cache is created, a number of objects are allocated to it. The number of objects in the cache depends on the size of the associated slab.
Virtual Memory
The Linux virtual memory system is responsible for maintaining the address space accessible to each process. It creates pages of virtual memory on demand, loads those pages from disk, and swaps them back out to disk as required. The virtual memory manager maintains two views of a process's address space: as a set of separate regions and as a set of pages. The first, the logical view, describes the layout of the address space according to the instructions the virtual memory system has received. In this view, the address space consists of non-overlapping regions, each representing a continuous, page-aligned subset of the address space. Each region is described by a single virtual memory area structure that defines the region's properties, such as the process's read, write, and execute permissions in the region, along with information about any files associated with it (Love, 2018). To allow fast lookup of the region that corresponds to any virtual address, the regions of each address space are linked into a balanced binary tree. The kernel also maintains a second, physical view of each address space. This view is stored in the process's hardware page tables, whose entries identify the exact current location of each page of virtual memory. In the address-space description, each virtual memory area contains a field pointing to a table of functions that implement the key page-management operations for that area.
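On a Linux system, the region view of an address space can be inspected through the /proc/<pid>/maps file, where each line gives a region's address range, permissions, and backing file. The sketch below parses one such line; the sample line is illustrative, the function name parse_maps_line is hypothetical, and the final step runs only where /proc/self/maps exists.

```python
import os

def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps into a region description."""
    fields = line.split(maxsplit=5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],  # e.g. r-xp: read, no write, execute, private
        "path": fields[5].strip() if len(fields) > 5 else "",
    }

sample = "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon"
region = parse_maps_line(sample)
print(hex(region["start"]), region["perms"], region["path"])
# prints 0x400000 r-xp /usr/bin/dbus-daemon

if os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as maps:
        regions = [parse_maps_line(line) for line in maps]
    print(len(regions), "regions in this process")
```

Each parsed entry corresponds to one non-overlapping region of the kind described above, with its own permissions and optional associated file.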
File System and I/O
The Linux file system is modeled on the UNIX file-system architecture. In this model, a file does not have to be an object stored on disk or retrieved from a remote file server. A Linux file can be anything capable of handling the input or output of a stream of data. Device drivers, interprocess-communication channels, and network connections all appear to the user as files. The Linux kernel handles all of these file types by hiding the implementation details of each behind a software layer known as the virtual file system (VFS).
Virtual File system
The Linux VFS is designed around object-oriented principles. It has two parts: a set of definitions that specify what file-system objects are allowed to look like, and a layer of software that manipulates those objects. The VFS defines four main object types: inode, file, superblock, and dentry. An inode object represents an individual file, a file object represents an open file, a superblock object represents an entire file system, and a dentry object represents an individual directory entry.
The VFS defines a set of operations for each of these four object types. Every object of one of these types contains a pointer to a function table, which lists the addresses of the actual functions that implement the defined operations for that object. The VFS software layer performs an operation on a file-system object by calling the appropriate function from the object's function table, without needing to know in advance what kind of object it is dealing with (Love, 2018). Whether an inode represents a networked file, a disk file, a network socket, or a directory file is unimportant to the VFS.
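The function-table idea can be sketched in Python. The kernel implements it in C with structures of function pointers; here, purely as an analogy, each inode is a dictionary carrying an "ops" object, and every class and function name is hypothetical.

```python
class DiskFileOps:
    """Operations table for files stored on a local disk."""
    def read(self, inode):
        return f"read blocks of {inode['name']} from disk"

class SocketOps:
    """Operations table for network sockets."""
    def read(self, inode):
        return f"receive bytes from socket {inode['name']}"

def vfs_read(inode):
    """Generic layer: call read through the inode's own operations table."""
    return inode["ops"].read(inode)

report = {"name": "report.txt", "ops": DiskFileOps()}
conn = {"name": "tcp:443", "ops": SocketOps()}
print(vfs_read(report))  # prints: read blocks of report.txt from disk
print(vfs_read(conn))    # prints: receive bytes from socket tcp:443
```

The generic vfs_read function never checks what kind of object it was given; it only follows the pointer to that object's operations, which is the design choice the VFS relies on.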
Ext3 File System
Linux's standard on-disk file system has historically been ext3. The earliest Linux file system was designed to be compatible with Minix, but it was heavily constrained by a 14-character limit on file names and a maximum file-system size of 64 MB. Because of these constraints, the Minix file system was replaced by the extended file system, extfs. A later redesign to improve stability and scalability and to add a few missing features produced the second extended file system, ext2. Further development added journaling, and the result was named ext3. Ext3 was then extended with more advanced file-system features to become ext4 (Silberschatz et al., 2014). The ext3 file system brought journaling, online file-system growth, and indexing for larger directories, along with improved performance, and these advantages increased much of the system's functionality and usability.
Journaling
Journaling is a key feature of the ext3 file system, in which modifications to the file system are written sequentially to a journal. The records kept in the journal are transactions. A transaction is a set of operations that performs a specific task, and it is considered committed once it has been written to the journal (Silberschatz et al., 2014). The journal entries for the transaction are then replayed across the actual file-system structures, and as the changes are made a pointer is updated to show which actions have been completed and which are still unfinished. Once every step of a committed transaction has been applied, the transaction is removed from the journal. Because the journal is kept on disk, committed transactions survive a system crash. When the system recovers, any transactions still in the journal are completed from the point where they stopped.
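The commit, replay, and recovery steps can be modeled with a toy in-memory store. This sketch is only a simplified model of the idea, not the ext3 implementation: the class name, the key-value "file system", and the crash_after parameter used to mimic a crash are all assumptions made for illustration.

```python
class JournaledStore:
    """Toy key-value 'file system' with a write-ahead journal."""

    def __init__(self):
        self.data = {}
        self.journal = []  # committed transactions not yet fully applied

    def commit(self, operations):
        """Record a transaction; it counts as committed once it is in the journal."""
        self.journal.append({"ops": list(operations), "done": 0})

    def replay(self, crash_after=None):
        """Apply journaled operations; optionally stop early to mimic a crash."""
        applied = 0
        while self.journal:
            txn = self.journal[0]
            while txn["done"] < len(txn["ops"]):
                if crash_after is not None and applied == crash_after:
                    return False  # crash: the journal still holds the unfinished work
                key, value = txn["ops"][txn["done"]]
                self.data[key] = value
                txn["done"] += 1  # pointer to the next unfinished operation
                applied += 1
            self.journal.pop(0)   # transaction fully applied, remove it
        return True

store = JournaledStore()
store.commit([("a", 1), ("b", 2)])
store.replay(crash_after=1)  # only the first operation reaches the data
print(store.data)            # prints {'a': 1}
store.replay()               # recovery finishes the remaining operation
print(store.data)            # prints {'a': 1, 'b': 2}
```

Recovery resumes from the recorded pointer rather than starting over, so an operation that was already applied is not repeated.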
Input/Output
The I/O mechanism in Linux is very similar to that of any UNIX system. Device drivers appear to users as files, which allows users to access devices, and change their configuration, through the related files. The system administrator can protect these sensitive files with the built-in file permissions. Linux divides devices into three classes: block devices, character devices, and network devices. Block devices include hard disks, floppy disks, CD-ROMs, Blu-ray discs, and flash memory; they allow random access to completely independent, fixed-size blocks of data. Most other devices, such as mice and keyboards, are character devices. Unlike block devices, character devices do not need to support random access and are accessed sequentially (Silberschatz et al., 2014). Network devices are handled differently from block and character devices. Users cannot transfer data to a network device directly. Instead, they communicate indirectly by opening a connection to the kernel's networking subsystem.
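Because devices appear as files, an ordinary file-status query can tell block and character devices apart. The sketch below uses Python's os.stat and the stat module; the function name device_kind is illustrative, and the example assumes a UNIX-like system where /dev/null exists.

```python
import os
import stat

def device_kind(path):
    """Classify a path as a block device, a character device, or neither."""
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    return "not a device"

print(device_kind("/dev/null"))  # prints: character device
print(device_kind("."))          # prints: not a device
```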
Security
The Linux Security model
The Linux security model is built on the UNIX security model and is divided into two areas: authentication and access control. Authentication ensures that no one can use the system without first proving that they have the right to do so. In Linux, authentication is typically performed with a username and password. Traditionally, the login credentials are kept in a publicly readable password file. Passwords are salted and hashed: each password is combined with a random salt value and passed through a one-way function, and only the result is stored. Because the function cannot be reversed, the only way for an attacker to find a password is by trial and error. Unfortunately, this alone is not enough to keep attackers from cracking weak passwords. To address this and other authentication problems, UNIX vendors developed a newer security mechanism, the pluggable authentication modules (PAM) framework. PAM is based on a shared library that any system component needing to authenticate users can use. An implementation of this framework is available on Linux.
With PAM, authentication modules are loaded on demand as specified in a system-wide configuration file. If a new authentication mechanism is introduced later, it can be added to the configuration file, and all system components can use it right away.
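The salting and one-way hashing of passwords described above can be sketched with Python's standard library. This is an illustration of the principle only: it uses PBKDF2 from hashlib, which is an assumption of this example and not the scheme a Linux login actually applies, and the function names and round count are arbitrary.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, rounds=100_000):
    """Return (salt, digest) for a password using a salted one-way function."""
    salt = os.urandom(16) if salt is None else salt  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds)
    return salt, digest

def verify_password(password, salt, expected):
    """Re-hash the candidate with the stored salt and compare the results."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # prints True
print(verify_password("wrong guess", salt, stored))    # prints False
```

Only the salt and the digest are stored, so checking a login means hashing the candidate again, and an attacker who reads the stored values is left with trial and error.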
Access control provides a mechanism for checking whether a user has the right to access a particular object and for blocking access when necessary. To track which users hold which privileges, Linux uses numeric identifiers. A user identifier (UID) is a unique number that identifies a single user or a single set of access rights. A group identifier (GID) is an additional identifier used to identify rights that belong to more than one user. The system checks the user and group IDs to decide whether a user has the appropriate privileges.
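A simplified version of such a check is sketched below. It assumes the traditional UNIX permission bits for owner, group, and others, which this article does not otherwise cover, and it ignores mechanisms such as access control lists; the function name may_read and the example IDs are hypothetical.

```python
import stat

def may_read(file_uid, file_gid, mode, uid, gids):
    """Decide whether a user may read a file, using owner/group/other bits."""
    if uid == 0:
        return True                        # the superuser bypasses the check
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)   # owner read bit
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)   # group read bit
    return bool(mode & stat.S_IROTH)       # everyone else

# A file owned by UID 1000 and GID 100 with mode rw-r----- (0o640).
print(may_read(1000, 100, 0o640, uid=1001, gids={100}))  # prints True
print(may_read(1000, 100, 0o640, uid=1002, gids={200}))  # prints False
```

The first user is not the owner but belongs to the file's group, so the group read bit applies; the second user falls under the bits for others, which grant nothing.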
Conclusion
The Linux operating system is free and open source. The Linux kernel is an excellent piece of software for understanding how operating systems function. Seeing how the kernel manages the different aspects of a computing device makes clear why operating systems are needed. In a Linux system, all of the devices and configurations can be modified through files. Manipulating and editing those files allows users to learn about the operating system and customize it to fit their specific needs. Linux is steadily increasing in popularity, and understanding these concepts today will drive the innovation of tomorrow's technology.
References
Etherington, D. (2013, August 7). Android Nears 80% Market Share In Global Smartphone Shipments, As iOS And BlackBerry Share Slides, Per IDC. TechCrunch. https://techcrunch.com/2013/08/07/android-nears-80-market-share-in-global-smartphone-shipments-as-ios-and-blackberry-share-slides-per-idc/.
Juell, K. (2021, March 11). History of Linux. DigitalOcean. https://www.digitalocean.com/community/tutorials/brief-history-of-linux.
Love, R. (2018). Linux Kernel Development. Pearson Education India.
Neuman, S., & Romo, V. (2021, February 18). As Texans Recover Power, 'It's Life Or Death' For Many Bracing For More Frigid Temps. NPR. https://www.npr.org/sections/live-updates-winter-storms-2021/2021/02/18/968973671/its-life-and-death-texans-still-without-power-as-nation-faces-more-winter-storms.
Silberschatz, A., Galvin, P. B., & Gagne, G. (2014). Operating System Concepts. Wiley.
Torvalds, L., & Diamond, D. (2002). Just for fun: the story of an accidental revolutionary. Harper Collins.
Vaughan-Nichols, S. J. (2013, June 18). Linux continues to rule supercomputers. ZDNet. https://www.zdnet.com/article/linux-continues-to-rule-supercomputers/.
Vaughan-Nichols, S. J. (2020, May 6). Most popular operating systems of 2020: The more things change... ZDNet. https://www.zdnet.com/article/whats-2020s-most-popular-operating-systems/.