Recreate printf() in C
Paul Stuart
ServiceNow | CIS-CSA | CIS-ITSM | CIS-HRSD | ITIL-4 | Integrations | JavaScript
This article is a full breakdown of a project I undertook at the 42 Adelaide coding school: a rebuild of the printf() function from the standard input/output library of the C programming language. The purpose of the task is to learn two key things: firstly, how memory is handled at a low level, and secondly, how to handle and manipulate data types, which ones are useful in specific contexts, and why. I am also a firm believer that if you can teach it, then you know it. In my opinion it's the best way to learn, which is why I like to share.
To view the full source code, have a look at my 42-Adelaide repository on GitHub with printf() and other projects being added regularly: https://github.com/codingPaulStuart/42-Adelaide-Core
Below is the standard printf() function from the C library. To recreate it, the rebuild has to follow the same standards and rules so that it produces the same result.
#include <stdio.h>

int main()
{
    printf("Hello Print F, let's make this from scratch!");
    return 0;
}
Must have the following capabilities:
High Level View
The image above shows the entry point to the program and how the components are modularized. If no format specifier is passed into the printf() function, it simply outputs the string. The variations occur when a format specifier is given (u, d, p, etc.): the ft_type function acts as a switchboard, distributing the parameter to the correct function based on the specifier. See the table below for reference on the types of formatting that can be used in the printf() function.
To further illustrate how the functions and files interact, the image below is a sequence diagram for a simple use case, printing '12' to the standard output. From the first actor on the left, the parameter is passed through a number of functions. Notice there is a lot of recursive function calling. This is because the number often has to be broken down to a single digit so it can be printed one character at a time, with each character tracked by a counter, which is passed through all the functions as a pointer.
To look at the printf function in a simple way, if there is no format specifier, then simply output the characters. Otherwise, based on the specifier, call the relevant function to handle the different use cases. See table below for function breakdown:
Below is the entry point for printf, where you can see the conditional checks and the calls to other important utility functions in the program.
The ft_type function is the main switchboard checking the format specifier and calling the relevant function to handle that use case. Any printf usage with the %u, %p, %i etc will go through this switchboard.
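The entry point and switchboard described above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the signatures, the put_char and put_str helpers, and the decision to handle only %c, %s and %% here are all assumptions made to keep the example short.

```c
#include <stdarg.h>
#include <unistd.h>

/* Hypothetical helper: write one character to stdout and bump the counter. */
static void put_char(char c, int *count)
{
    if (write(1, &c, 1) == 1)
        (*count)++;
}

/* Hypothetical helper: write a string one character at a time. */
static void put_str(const char *s, int *count)
{
    if (!s)
        s = "(null)";
    while (*s)
        put_char(*s++, count);
}

/* Switchboard: pick a handler based on the specifier character.
   Signature assumed; only three specifiers are wired up in this sketch. */
static void ft_type(char spec, va_list *args, int *count)
{
    if (spec == 'c')
        put_char((char)va_arg(*args, int), count);
    else if (spec == 's')
        put_str(va_arg(*args, char *), count);
    else if (spec == '%')
        put_char('%', count);
    /* d, i, u, x, X and p would be dispatched here in the same way. */
}

/* Entry point: plain characters are printed directly, and a '%' hands
   the following character to the switchboard. Returns the character count. */
int ft_printf(const char *format, ...)
{
    va_list args;
    int     count;

    count = 0;
    va_start(args, format);
    while (*format)
    {
        if (*format == '%' && *(format + 1))
            ft_type(*++format, &args, &count);
        else
            put_char(*format, &count);
        format++;
    }
    va_end(args);
    return (count);
}
```

Calling ft_printf("hi %s%c %%\n", "yo", '!') in this sketch prints nine characters and returns 9, matching the way the real printf reports the number of characters written.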
Strings
This is one of the simpler cases, as it has no format specifier. A utility function, ft_putchar, which features extensively throughout the printf function, calls the write function (a POSIX call declared in <unistd.h>) to print a single character to the standard output (file descriptor 1), writing 1 byte.
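A character writer of this kind might look like the sketch below. The signature is an assumption: passing the counter by pointer is one way to keep a running total across every helper.

```c
#include <unistd.h>

/* Assumed signature: the shared counter is passed by pointer. */
void ft_putchar(char c, int *count)
{
    /* Arguments to write: file descriptor 1 (standard output),
       the address of the byte, and the number of bytes to write. */
    if (write(1, &c, 1) == 1)
        (*count)++;
}
```

The counter only moves when write reports that the byte actually went out, so a failed write does not inflate the total.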
Integers
The first thing to explain when looking at the number length function is what the size_t data type is. The size_t type is an unsigned integer type used to represent sizes, and it is guaranteed to be big enough to contain the size of the largest possible object on the system. It is used here because the length of a number cannot be negative, and it is the standard type for sizes and counts in C.
The number length function calculates the number of characters needed to represent the integer as a string. Using size_t for the result means the count can never be negative and is always wide enough for any length. Also notice that the division by 10 is what determines how many digits the number has: each iteration of the loop removes one decimal place from the number.
The ft_putnbr_fd function takes the number being printed and a file descriptor, here 1, which is the standard output. This function is part of another custom library built as a project at the 42 school. You can view the full libft library on my GitHub repository as well.
The ft_putnbr_fd function from the libft library also handles an edge case, represented by the validation check for the value -2147483648, which is the minimum value of a 32-bit signed int. This check in the first if statement is important because the absolute value of -2147483648 cannot be represented in a 32-bit signed integer, whose maximum is 2147483647, so the value is written out directly as a string.
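A digit-counting loop along these lines is sketched below. The name ft_numlen and the choice to count the minus sign as a character are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical name: counts the characters needed to print n in base 10. */
size_t ft_numlen(int n)
{
    size_t len;

    len = 0;
    if (n <= 0)
        len = 1;    /* room for the '-' sign, or for the single digit '0' */
    while (n != 0)
    {
        n /= 10;    /* drop one decimal place per iteration */
        len++;
    }
    return (len);
}
```

For example, 12 takes two iterations to reach 0 and so has length 2, while -7 needs one iteration plus one character for the sign.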
A continual theme that runs through all the printf functions I have remade involves the use of recursion, meaning the function will actually call itself. This is often needed because to output the characters they have to be done one by one, which means breaking them down continuously until there is only one character.
You can see this on two occasions in the function. In the second recursive call, towards the bottom, if the number is not negative and is 10 or greater, the function first calls itself with the number divided by 10, effectively printing all but the last digit. It then calculates the last digit by taking the remainder of the number divided by 10 (n % 10) and converting it to an ASCII character.
Remember, adding '0' to any single digit is really adding 48, which gives the ASCII value of that digit's character. For example, 7 + '0' is 55, the ASCII code for '7'.
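Putting the integer pieces together, a function in the style of ft_putnbr_fd can be sketched as below. It is written from the description in this section rather than copied from libft, and it assumes a 32-bit int.

```c
#include <unistd.h>

/* Sketch of a recursive number printer, assuming a 32-bit int. */
void ft_putnbr_fd(int n, int fd)
{
    char digit;

    if (n == -2147483648)
    {
        /* Edge case: -n would overflow, so write the text directly. */
        write(fd, "-2147483648", 11);
        return ;
    }
    if (n < 0)
    {
        write(fd, "-", 1);
        ft_putnbr_fd(-n, fd);       /* first recursive call */
        return ;
    }
    if (n >= 10)
        ft_putnbr_fd(n / 10, fd);   /* second call: all but the last digit */
    digit = (char)(n % 10 + '0');   /* remainder becomes an ASCII digit */
    write(fd, &digit, 1);
}
```

Because the recursive call runs before the write, the most significant digit is printed first, even though the remainders are produced from the least significant digit upwards.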
Pointers
Three functions are used for outputting pointers. We need to calculate the length of the string, print the '0x' prefix before the hex representation, and then output the hex representation of the pointer, again using recursion and the write function on the standard output.
Notice that this time the recursion divides the input by 16 and takes the remainder, because hexadecimal is base 16 (digits 0-9 and A-F).
Before exploring the functions, it's important to understand a few things about data types and why they are used here. Printing a pointer address requires a large unsigned integer type, capable of storing large positive values. This prevents overflow, where the data type is incapable of representing the value, causing truncation and other unwanted results. Pointers often require one of the widest integer types, such as uintptr_t or unsigned long long. The address is always treated as unsigned, since it is a non-negative value.
Printp prints the '0x' prefix followed by the hexadecimal representation of a pointer. If the value is not 0, putptr is used to output the hexadecimal digits of the pointer, and the counter is incremented by the length returned from the ptrlen function. Notice that when ft_putptr is invoked there is no need to type cast. This is because moving from a smaller integer type to a larger one is an implicit conversion. The compiler applies it automatically because the value is being promoted up the data hierarchy, not down.
Putptr prints the hexadecimal representation of a pointer. It recursively calls itself, dividing by 16 (base-16 hex) each time, until the value is under 16 and can be output as a single hex digit. The data type uintptr_t is designed specifically for storing a data pointer as an integer, and it is used to hold memory addresses. Where it is provided, uintptr_t is guaranteed to be large enough to hold a pointer.
The print length function calculates the length of the string that represents the hexadecimal value of a pointer. It returns a size_t, the type used to represent the size of an object, which keeps the code compatible across different platforms. This matters because we are working with hexadecimal digits, which represent binary data (0s and 1s). Again, the division by 16 is used because hexadecimal is base 16.
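The three pointer functions can be sketched as below. The names follow the ones used in this section, but the signatures are assumptions, as are the explicit cast from a pointer to uintptr_t and the choice to print a null pointer as 0x0.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Number of hex digits needed for a non-zero value. */
static size_t ft_ptrlen(uintptr_t value)
{
    size_t len;

    len = 0;
    while (value != 0)
    {
        value /= 16;
        len++;
    }
    return (len);
}

/* Recursively print hex digits, most significant first. */
static void ft_putptr(uintptr_t value)
{
    char digit;

    if (value >= 16)
        ft_putptr(value / 16);
    digit = "0123456789abcdef"[value % 16];
    write(1, &digit, 1);
}

/* Assumed signature: prints "0x" plus the address and updates the counter. */
void ft_printp(const void *ptr, int *count)
{
    uintptr_t value;

    value = (uintptr_t)ptr;
    write(1, "0x", 2);
    *count += 2;
    if (value == 0)
    {
        write(1, "0", 1);   /* assumption: a null pointer prints as 0x0 */
        (*count)++;
        return ;
    }
    ft_putptr(value);
    *count += (int)ft_ptrlen(value);
}
```

Counting the digits separately with ft_ptrlen keeps the recursive printer free of any bookkeeping, at the cost of walking the value twice.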
Unsigned Integers
This function is much more straightforward and does less work, as it expects a non-negative integer. The main difference between ft_printu and ft_putnbr_fd is that ft_printu does not have to check for negative values or handle the minimum-value edge case. The simplicity of ft_printu contrasts with the complexity of ft_putnbr_fd, which must handle negative values.
Again, notice the recursive approach to getting the ASCII value. Each digit must be printed singly, so any value greater than 9 means recursively calling printu with the value divided by 10. The remainder after dividing by 10 is then passed to ft_putchar, and adding '0' (48) gives the ASCII value.
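A minimal sketch of the unsigned case is shown below. The signature is assumed, and the character is written directly rather than through ft_putchar to keep the example self-contained.

```c
#include <unistd.h>

/* Assumed signature: prints n in base 10 and updates the counter. */
void ft_printu(unsigned int n, int *count)
{
    char digit;

    if (n > 9)
        ft_printu(n / 10, count);   /* print the leading digits first */
    digit = (char)(n % 10 + '0');   /* remainder plus '0' gives the ASCII digit */
    if (write(1, &digit, 1) == 1)
        (*count)++;
}
```

With no sign to handle, the whole function is one recursive call and one write.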
Hexadecimals
Firstly, what are hex values and why are they used? Hexadecimal numbers are used whenever the binary representation is important to the context. They allow us to see where one byte ends and the next begins. Hexadecimal is a more convenient and readable format for binary data than a long list of 1s and 0s. A great example of this is working with memory addresses and network addresses, where hexadecimal allows addresses to be separated into their components.
See the example below for using hex values to represent the binary numbers of an IP address, referenced against the base-16 table to see how they are represented. The hex value for the number '192', which is 'C0', can be worked out through continual division by 2.
Binary numbers are calculated by dividing the number by 2 until you get 0. Reading the remainders in reverse gives the binary number, so '192' is calculated by: 192 ÷ 2 = 96 (remainder 0), 96 ÷ 2 = 48 (remainder 0), 48 ÷ 2 = 24 (remainder 0), 24 ÷ 2 = 12 (remainder 0), 12 ÷ 2 = 6 (remainder 0), 6 ÷ 2 = 3 (remainder 0), 3 ÷ 2 = 1 (remainder 1), 1 ÷ 2 = 0 (remainder 1). Collecting the remainders in order gives 00000011, and reversing them gives 11000000. Splitting that into two groups of four bits gives 1100 (12, which is C in hex) and 0000 (0), so 192 is C0.
The print hex function prints an unsigned int in hexadecimal format and tracks the number of characters printed. If the value is not 0, it calls the puthex function to print the hexadecimal digits of the number, then increments the counter by the length of the number's hexadecimal representation (the hexlen function).
The hex length function calculates the length of the hexadecimal representation of an unsigned integer. It initializes a size_t total to count the hexadecimal digits. The while loop divides the value by 16 (base-16 hex) and increments total until the value is 0. It then returns the total count of hexadecimal digits.
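The same arithmetic can be checked in code. The two helpers below are hypothetical and not part of the printf project: one applies the repeated division by 2 to a single byte, the other splits the byte into two hex digits.

```c
/* Fill out with the 8-bit binary text of value (out needs 9 bytes). */
void byte_to_binary(unsigned char value, char out[9])
{
    int i;

    i = 7;
    while (i >= 0)
    {
        out[i] = (char)('0' + value % 2);   /* remainder is the next bit, right to left */
        value /= 2;
        i--;
    }
    out[8] = '\0';
}

/* Fill out with the two hex digits of value (out needs 3 bytes). */
void byte_to_hex(unsigned char value, char out[3])
{
    const char *digits = "0123456789ABCDEF";

    out[0] = digits[value / 16];    /* high four bits */
    out[1] = digits[value % 16];    /* low four bits */
    out[2] = '\0';
}
```

Filling the buffer from right to left does the "reverse the remainders" step implicitly. For 192 the first helper produces 11000000 and the second produces C0.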
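The hex functions described here can be sketched as follows. The signatures are assumptions, including the spec parameter used to choose between lowercase (%x) and uppercase (%X) digits.

```c
#include <stddef.h>
#include <unistd.h>

/* Number of hex digits needed for a non-zero value. */
static size_t ft_hexlen(unsigned int value)
{
    size_t total;

    total = 0;
    while (value != 0)
    {
        value /= 16;
        total++;
    }
    return (total);
}

/* Recursively print hex digits, most significant first. */
static void ft_puthex(unsigned int value, char spec)
{
    const char *digits;

    digits = "0123456789abcdef";
    if (spec == 'X')
        digits = "0123456789ABCDEF";
    if (value >= 16)
        ft_puthex(value / 16, spec);
    write(1, &digits[value % 16], 1);
}

/* Assumed signature: prints value in hex and updates the counter. */
void ft_printhex(unsigned int value, char spec, int *count)
{
    if (value == 0)
    {
        write(1, "0", 1);   /* zero has no digits to recurse on */
        (*count)++;
        return ;
    }
    ft_puthex(value, spec);
    *count += (int)ft_hexlen(value);
}
```

Indexing into a digit string avoids a separate branch for the values 10 to 15, which would otherwise need their own offset from 'a' or 'A'.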
These are the key functions used in my recreation of printf in the C programming language. You can see that it is important to understand the nature and context of different data types when writing these sorts of functions, and to handle any edge cases.
Thanks for taking the time to read, please share/comment if you found it useful!