What are datatypes in C/C++ and what does typecasting even mean?

What are datatypes?

From the hardware's point of view, data is just a contiguous array of bytes in memory. To the hardware these bytes have no meaning or interpretation of their own: bytes loaded from one address are no different from bytes loaded from any other address. For high-level software to solve real-life problems with them, we need a way to make sense of these bytes, and that structured interpretation comes from the software. Datatypes, as I understand them, are therefore the structured ways in which high-level software interprets binary data. Next we discuss the classification of datatypes.

Classification of datatypes

There are three main classifications of datatypes, namely:

  1. Primitive datatypes - The simplest datatypes that the C/C++ compiler supports. More often than not, primitive datatypes have some degree of hardware support. On the x86 and x64 architectures there are dedicated registers for specific kinds of data, e.g. floating point values are held in the x87 FPU registers or in the SSE XMM (128-bit) registers, and there are separate instructions for manipulating floating point values in those registers. Examples of primitive datatypes are char, float, double, etc. Both languages support wide characters: wchar_t is a built-in type in C++ and a typedef provided by <wchar.h> in C. Its size and encoding depend on the platform, typically 16-bit UTF-16 code units on Windows and 32-bit UTF-32 on Linux. void is the type that has no values.
  2. Derived datatypes - These are not standalone types; they are built from other datatypes and only make sense in association with them. Pointers and arrays are examples. Take the code snippet below:

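#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char* argv[])
{
    if(argc < 2)
    {
        puts("Not enough arguments");
        exit(-1);
    }

    /* atoi returns an int; a negative input would wrap to a huge size_t */
    size_t size = atoi(argv[1]);
    char* buff = malloc( size );   /* malloc returns void*, stored here as char* */
    if(buff == NULL)
    {
        puts("Allocation failed");
        exit(-1);
    }

    memset(buff,-1,size);          /* set every byte of the block to 0xFF */
    free(buff);
    return 0;
}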

We know that pointers hold the addresses of other variables of a given datatype. For the compiler to perform an operation on a pointer, it has to know the type of data the pointer references. In other words, the address stored in a pointer only makes sense when it is associated with a datatype, because that type tells the compiler how to dereference the pointer and how far to move it in pointer arithmetic. In the snippet above, buff is a char*, so the block returned by malloc is treated as an array of char. A void pointer is a pointer that is not associated with any type. It just points to some byte or bytes in memory and cannot be dereferenced until it is converted to a pointer to a concrete type.

  3. User-defined datatypes - They are called user defined because the programmer designs them, like containers to be filled with data. Enums are enumerated datatypes, and structures and unions are compound datatypes.

What is typecasting then?

Typecasting changes the view that the compiler has of some bytes of data in memory, that is, it changes the interpretation of the data from one type to another. (A cast between arithmetic types, such as double to int, additionally converts the value; here the focus is on pointer casts.) Take the snippet below, where I allocate a block of memory and then cast the pointer to that block to a pointer of a different type.


1  #include <stdio.h>
2  #include <string.h>
3  #include <stdlib.h>
4
5
6  #define MAX_LEN 50
7
8
9  typedef struct _Sum2Str
10 {
11     double x,y;
12     char* z;
13 } Sum2Str;
14
15
16 int main(int argc, char* argv[])
17 {
18     if(argc < 3)
19     {
20         fprintf(stderr,"Not enough arguments\n");
21         exit(-1);
22     }
23
24
25     Sum2Str s2s;
26     s2s.x = atof(argv[1]);
27     s2s.y = atof(argv[2]);
28
29
30     void *block = malloc(MAX_LEN);
31     char* str = (char*)block;
32     if(str == NULL)
33         exit(-1);
34     snprintf(str,MAX_LEN,"%lf",s2s.x + s2s.y);
35
36
37     s2s.z = str;
38
39
40     printf("Sum is %s\n",s2s.z);
41     free(block);
42     return 0;
43 }

In the example above, we declare a struct called Sum2Str. It holds two double precision floating point values and a pointer to a character array. The program works like this: you supply two numbers as command line arguments, atof converts them from strings to type double, the program adds them, converts the sum to a string, and stores the pointer to that string in the z field of the Sum2Str struct. At line 30 we allocate MAX_LEN bytes of memory and store the pointer to the memory block in the variable "block". We can't dereference the void pointer to the memory block directly. If we want to write data to the block, we have to specify the type of data that will go into it, which lets the compiler know how to interpret the bytes written there; that is what the cast to char* on line 31 does. (C would also perform this conversion from void* implicitly; C++ requires the explicit cast.) So typecasting essentially changes the interpretation of a stream of bytes in memory from the compiler's perspective, but down at the hardware level the bytes are still just bytes.

Disclaimer

I am not an expert. This is my understanding of types. I am open to corrections and suggestions. Thank you.

Edit

A function pointer is a special type of pointer. But what is a function pointer? A function pointer holds the address in memory of the first instruction byte of a function's body. When a function is compiled, the compiler generates machine code for it, and that code is stored at some offset in the compiled file on disk.

With dynamic linking, the linker records which functions the program imports and which libraries they come from. On Windows this information is kept in the import directory table (IDT) of the PE file; on Linux it is kept in the dynamic symbol and relocation tables of the ELF file, and calls to imported functions go through stubs in the Procedure Linkage Table (PLT). The linker also emits a table of address slots, one per imported function: the Global Offset Table (GOT) on Linux and the Import Address Table (IAT) on Windows. At run time, the OS loader parses the binary, loads the named libraries, resolves the addresses of the imported functions and writes them into this table. All of this applies only to dynamic linking.

But the question is: if I have a pointer to some memory location, can I cast it to a function pointer? On mainstream compilers and platforms, yes, although ISO C does not guarantee that a data pointer can be converted to a function pointer. You have to make sure the following conditions are satisfied.

  1. The bytes you are casting to a function pointer should contain valid program instructions, not just data bytes. Otherwise the processor will try to decode bytes that are not meaningful instructions, which typically ends in an invalid opcode or memory access fault that crashes your program.
  2. The memory page that those bytes belong to should be executable. Why? Because if you attempt to execute code at an address in a page that is marked NX (not executable), the CPU raises a fault, which the OS reports as a segmentation fault on Linux or an access violation on Windows.

Take an example like the one below

#include <stdio.h>

/* Plain data bytes, not machine instructions */
char code[] = "Some data expected to be program instructions";
typedef void (*add_double)(double x,double y);

int main()
{
    add_double add_two;
    /* Treat the address of the data as the address of a function */
    add_two = (add_double)code;

    add_two(123.8,24.5);   /* jumps into the data bytes and crashes */
    return 0;
}

When we compile this with gcc 9.3.0 on Ubuntu 18.04 and run it, we get this output:

[1]    212 segmentation fault (core dumped)  ./a.out

But what is happening here? The array "code" is an initialized, writable global, so its bytes are stored in the data section (.data) of the binary file. (A string literal referenced only through a pointer would instead be placed in the read-only data section, .rodata.) When the program is run, the OS loader maps this section into memory pages that are readable and writable but not executable. When we cast these bytes to a function pointer and call it, the CPU expects to find program instructions there and tries to execute code from a page that is marked as not executable, then boom! Segmentation fault. This is what I have on function pointers. As always, I am no expert and I am open to correction.

Is it possible to make the above program work? Yes. Here is how.

To make the above code work, we have to do two things, namely:

  • Change the permissions on the page where "code" is located so that it is executable.
  • Change the raw data bytes to actual instruction bytes.

I tried it on both Linux and Windows. It worked on Windows but did not work on Linux, for a reason that I will explain in a later post.

I compiled the function shown below into shellcode then added the compiled shellcode to the test program.

[Image: the add_double function]

I got the shellcode below after compilation. The shellcode is in the white box. Each byte is part of the encoding of a machine instruction, either an opcode or one of its operands, which the CPU understands directly. The corresponding assembly instructions are shown to the right.

[Image: the shellcode bytes with the corresponding assembly instructions]

Now let's implement this on Windows and make our shellcode work. Here is my Windows test example.

[Image: the Windows test program]

I created an array of unsigned chars and initialized it with the shellcode data. These are the shellcode bytes that I obtained from compiling the add_double function above. They will be our instruction bytes.

In the program, during execution, I used the Windows API function VirtualProtect to change the permissions of the region in which the shellcode data resides, which is the .rdata section, from read-only to read, write and execute. That way, when we cast these bytes to a function pointer and call it, it runs successfully. When the example above is compiled and run, I get this:

[Image: output of the Windows test program]

So it works. Now let's try the test program on Linux.

[Image: the Linux test program]

To change permissions on Linux we use the mprotect function. It returns 0 if it succeeds and -1 if it fails. I added a print statement that prints the string "[-] Unable to change page permissions" in case of a failure and exits the program. When we compile and run it, we get something like this:

[Image: output of the Linux test program]

You can clearly see from the message that mprotect failed. I will explain why it failed in another post. Till then, thanks for reading this article. Like, comment and connect with me on LinkedIn.




In C, a function is not a type - although you can construct a function pointer type.


More articles by Abdul Hameed Oluwashegu Tade

  • Bypassing encapsulation in C++

    One of the most popular programming paradigms in existence is the Object-Oriented Programming. It is a programming…

  • Function Cache in C++

    Sometimes, certain functions that perform very expensive computation, you might want to perform the computations only…

  • Malware Filesystem Redesigned

    Malware File System a redesign File storage is one of the operating system's basic but crucial functions. But an…

  • Memory Leaks "the hell of dynamic memory allocation"

    The computers that we use now a days fall under two general classes of architectures. The Harvard architecture and the…

  • What are opaque types in C ? How do you use them ?

    Sometimes when working with C, you might want to perform some data abstraction. There is an easy technique to do this…
