What are datatypes in C/C++ and what does typecasting even mean?
Abdul Hameed Oluwashegu Tade
System Software Engineer @ Turntabl | Computer Engineering Degree
What are datatypes?
From the hardware's point of view, data is just a contiguous array of bytes in memory. To the hardware these bytes have no meaning or interpretation; they are just data. For high-level software to use them to solve real-life problems, we have to find a way to make sense of these contiguous bytes. The structured interpretation comes from high-level software; to the hardware, bytes starting at one address are no different from bytes loaded from any other address. Datatypes, per my understanding, are therefore the structured ways in which high-level software interprets binary data. Next we discuss the classification of datatypes.
Classification of datatypes
There are three main classifications of data types, namely:
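#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char* argv[])
{
    if (argc < 2)
    {
        puts("Not enough arguments");
        exit(-1);
    }

    /* atoi returns an int; a negative input would wrap around as a size_t */
    size_t size = atoi(argv[1]);
    /* malloc returns void*; C converts it to char* implicitly (C++ needs a cast) */
    char* buff = malloc(size);
    if (buff == NULL)
    {
        puts("Allocation failed");
        exit(-1);
    }

    memset(buff, -1, size);
    free(buff);
    return 0;
}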
We know that pointers hold the addresses of other variables of a given datatype. For the compiler to perform an operation on a pointer, it has to know the type of data the pointer references. In other words, the address stored in a pointer makes sense only when it is associated with a datatype that tells the compiler how to perform operations on pointers of that type. A void pointer is a pointer that is not associated with any type. It just happens to point to a certain byte or bytes in memory, and it cannot be dereferenced until it is converted to a pointer of some concrete type.
3. User defined - They are called user defined because they are like containers that the user defines and fills with data. Enums are enumerated datatypes, and structures and unions are compound datatypes.
What is typecasting then?
Typecasting is changing the view that the compiler has of some bytes of data in memory; it changes the interpretation of data from one view to another. Take the code in the snippet below, where I allocate a block of memory and then cast that block to a pointer of a different type.
1 #include <stdio.h>
2 #include <string.h>
3 #include <stdlib.h>
4
5
6 #define MAX_LEN 50
7
8
9 typedef struct _Sum2Str
10 {
11     double x,y;
12     char* z;
13 } Sum2Str;
14
15
16 int main(int argc, char* argv[])
17 {
18     if(argc < 3)
19     {
20         fprintf(stderr,"Not enough arguments\n");
21         exit(-1);
22     }
23
24
25     Sum2Str s2s;
26     s2s.x = atof(argv[1]);
27     s2s.y = atof(argv[2]);
28
29
30     void *block = malloc(MAX_LEN);
31     char* str = (char*)block;
32     if(str == NULL) exit(-1); /* malloc can fail */
33
34     snprintf(str,MAX_LEN,"%lf",s2s.x + s2s.y);
35
36
37     s2s.z = str;
38
39
40     printf("Sum is %s\n",s2s.z);
41     free(block);
42 }
In the example above, we declare a struct called Sum2Str. It holds two double-precision floating-point values and a pointer to a character array. The program works like this: you supply two numbers as command-line arguments, atof converts them from strings to type double, and the program adds them, converts the sum to a string, and stores the pointer to that string in the z field of the Sum2Str struct. At line 30 (the malloc call) we allocate MAX_LEN bytes of memory and store the pointer to the memory block in the variable "block". We can't actually use the void pointer to read or write the block directly. If we want to write data to it, we have to specify the type of data that will go into the allocated block, which lets the compiler know how to interpret data written there. So typecasting essentially changes the interpretation of a stream of bytes in memory from the compiler's perspective, but down at the hardware level a stream of bytes is still just a stream of bytes, nothing else.
Disclaimer
I am not an expert. This is my understanding of types. I am open to corrections and suggestions. Thank you.
Edit
A function pointer is a special type of pointer. But what is a function pointer? A function pointer holds the address in memory of the first instruction byte of a function body. When a function is compiled, the compiler generates machine code, which is stored at a given offset in the compiled file on disk. During linking, the linker records which imported functions come from which libraries. On Windows this information lives in the import directory table (IDT); on Linux the dynamic symbol table plays this role, and calls to imported functions go through stubs in the Procedure Linkage Table (PLT). The linker also emits a table that maps function symbols to actual addresses: the Global Offset Table (GOT) on Linux and the Import Address Table (IAT) on Windows. At runtime the OS loader parses the compiled binary, extracts the library names, resolves the imported function addresses, and writes each one into its entry in that table. This scenario applies only to dynamic linking. But the question is: if I have a pointer to some memory location, can I cast it to a function pointer? The answer is yes, you can (ISO C does not define this conversion, but common compilers such as gcc support it). But you have to make sure you satisfy the following condition.
Take an example like the one below
#include <stdio.h>

/* Plain data, not machine code: calling it is expected to crash */
char code[] = "Some data expected to be program instructions";
typedef void (*add_double)(double x, double y);

int main()
{
    add_double add_two;
    /* Object-to-function pointer conversion is a compiler extension, not ISO C */
    add_two = (add_double)code;

    add_two(123.8, 24.5);
    return 0;
}
When we compile with gcc 9.3.0 on Ubuntu 18.04 and run the program, we get this as output
[1]    212 segmentation fault (core dumped)  ./a.out
But what is happening here? The array referenced by the variable "code" is data, so it is stored in a data section of the binary file (a writable array like this one goes in .data; string constants and const data go in the read-only .rodata section). When the program is run, the OS loader maps that section into memory pages that, on systems that enforce non-executable data, are marked as not executable. When we cast this block of bytes to a function pointer and call it, the CPU expects to find program instructions there. It tries to fetch code from a page that is marked as not executable, then boom!!! segmentation fault. This is what I have on functions. As always, I am no expert. I am open to correction.
Is it possible to make the above program work? Yes. Here is how.
To make the above code work, we have to do two things, namely
I tried it on both Linux and Windows. It worked on Windows but did not work on Linux, for a reason that I will explain in a later post.
I compiled the function shown below into shellcode, then added the compiled shellcode to the test program.
I got the shellcode below after compilation. The shellcode is in the white box. Each byte value is part of the machine code that the CPU executes directly: either an opcode byte or part of an instruction's operands. The corresponding assembly instructions are shown to the right.
Now let's implement this on Windows and make our shellcode work. Here is my Windows test example.
I created an array of unsigned chars and initialized it with the shellcode data. These are the shellcode bytes that I obtained from compiling the add_double function above. They will be our instruction bytes.
In the program, during execution, I used the Windows API function VirtualProtect to change the permissions of the region in which the shellcode data resides, the .rdata section, from read-only to read, write, execute. That way, when we cast these bytes to a function pointer and call it, it runs successfully. When the example above is compiled and run, I get this
So it works. Now let's try the test program on Linux.
To change permissions on Linux we use the mprotect function. It returns 0 on success and -1 on failure. I added a print statement that prints the string "[-] Unable to change page permissions" and exits the program in case of failure. When we compile and run it, we should get something like this
You can clearly see from the message that mprotect failed. I will explain in another post why it failed. Till then, thanks for reading this article. Like, comment and connect with me on LinkedIn.
In C, a function is not a type - although you can construct a function pointer type.