In Linux, any file that is not a text file is considered a binary file. Because binary files are not human-readable, you can't simply look inside them, but there are commands that let you explore the information they contain, and we are going to explore them in this newsletter.
GDB
GNU Debugger (GDB) is not only good for debugging buggy applications. It can also be used to learn about a program's control flow, change a program's control flow, and modify the code, registers, and data structures of a running program.
Objdump From GNU Binutils
- A simple GNU tool for disassembling binary code. However, when dealing with hostile software, objdump has its own limitations.
- Its primary weakness is that it depends on the ELF section headers and does not perform control flow analysis, both of which reduce its robustness.
- It cannot disassemble a binary that does not have a section header table.
- View all data/code in every section of an ELF file:
- objdump -D <elf_object>
- objdump -d <elf_object> (view only the program code in an ELF)
- objdump -tT <elf_object> (view all symbols, including dynamic symbols)
Objcopy From GNU Binutils
- objcopy can be used to copy and modify ELF sections.
- To copy the .data section from an ELF object to a file, use this line:
- objcopy --only-section=.data <infile> <outfile>
Strace
- Shows information about the system call (syscall) activity of a running program, as well as signals that are caught during execution.
- strace (system call trace) is a tool based on the ptrace(2) system call; it uses the PTRACE_SYSCALL request in a loop.
- The basic strace command for tracing a program is:
- strace -o <outfile> /bin/ls
- The strace command to attach to an existing process is as follows:
- strace -p <pid> -o daemon.out
- The initial output will show you the file descriptor number of each system call that takes a file descriptor as an argument, such as this:
- SYS_read(3, buf, sizeof(buf));
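The PTRACE_SYSCALL loop described above can be made concrete with the minimal C sketch below. It assumes Linux with glibc. `count_syscalls` is a hypothetical helper and is not taken from strace's source. The child requests tracing with PTRACE_TRACEME and execs the target. The parent then keeps resuming it with PTRACE_SYSCALL and counts syscall-entry stops until the child exits. Real strace also decodes arguments and return values, follows child processes, and handles many more edge cases.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `path` under ptrace and returns how many system
   calls it entered, or -1 on error. */
long count_syscalls(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: ask to be traced by the parent, then exec the target.
           A successful execve() stops the child with SIGTRAP. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execl(path, path, (char *)NULL);
        _exit(127); /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1; /* child exited before reaching the post-exec stop */

    /* Syscall stops will report SIGTRAP | 0x80, so they can be told apart
       from ordinary signal-delivery stops. */
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    long entered = 0;
    int in_syscall = 0;  /* syscall stops alternate: entry, exit, entry, ... */
    int pending_sig = 0; /* real signal to re-deliver on the next resume */
    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)pending_sig) == -1
            || waitpid(pid, &status, 0) == -1) {
            kill(pid, SIGKILL);
            waitpid(pid, NULL, 0);
            return -1;
        }
        pending_sig = 0;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            if (!in_syscall)
                entered++;
            in_syscall = !in_syscall;
        } else {
            pending_sig = WSTOPSIG(status); /* pass the signal on */
        }
    }
    return entered;
}
```

For example, `count_syscalls("/bin/true")` counts every system call the program makes, including those made by its dynamic loader. Without PTRACE_O_TRACESYSGOOD, the loop could not reliably distinguish syscall stops from a genuine SIGTRAP.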
ltrace
- ltrace (library trace) works in a very similar way to strace.
- It parses the shared library linking information used by the program and prints the library functions being called.
- Its syntax and flags are similar to those of strace.
Readelf
- It is one of the most powerful tools for analysing ELF binaries.
- It displays almost every piece of information in an ELF file that is needed for reverse engineering.
- To retrieve the section header table:
- readelf -S <elf>
- To retrieve the program header table:
- readelf -l <elf>
- To retrieve the symbol table(s):
- readelf -s <elf>
- To retrieve all of the headers (ELF file header, program headers, and section headers):
- readelf -e <object>
- To retrieve relocation entries:
- readelf -r <object>
- To retrieve the dynamic section entries:
- readelf -d <elf>
- To view only the ELF file header:
- readelf -h <elf>
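The symbol lookup behind `readelf -s` can be sketched in C as shown below. The sketch assumes a 64-bit, well-formed ELF whose symbol table (.symtab or .dynsym) and matching string table have already been located and loaded into memory. `find_symbol` and `sym_type_name` are hypothetical helper names. `Elf64_Sym`, `ELF64_ST_TYPE`, and the `STT_*` constants come from the system's <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: looks up `name` in an in-memory symbol table.
   st_name is an offset into the associated string table (no bounds checks:
   a well-formed table is assumed). Returns the symbol, or NULL. */
const Elf64_Sym *find_symbol(const Elf64_Sym *syms, size_t count,
                             const char *strtab, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(strtab + syms[i].st_name, name) == 0)
            return &syms[i];
    return NULL;
}

/* Returns a short type label, similar to the "Type" column of readelf -s. */
const char *sym_type_name(const Elf64_Sym *sym)
{
    switch (ELF64_ST_TYPE(sym->st_info)) {
    case STT_NOTYPE:  return "NOTYPE";
    case STT_OBJECT:  return "OBJECT";
    case STT_FUNC:    return "FUNC";
    case STT_SECTION: return "SECTION";
    case STT_FILE:    return "FILE";
    default:          return "OTHER";
    }
}
```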
_______________________________________________________________________
ELF Binary Format
- In order to reverse engineer Linux binaries, you must have a deep understanding of the binary format.
- ELF has become the standard binary format for Unix and Unix-like OSes.
- In Linux, BSD variants, and other OSes, the ELF format is used for executables, shared libraries, object files, kernel boot images, and core dump files.
- We will explore the following topics to understand the ELF binary format:
- ELF file Types
- Program Header
- Section Header
- Symbols
- Relocation
- Dynamic Linking
- Coding an ELF Parser
ELF File Type
- ET_NONE: This is an unknown type. It indicates that the file type is unknown, or has not yet been defined.
- ET_REL: This is a relocatable file, i.e. an object file that has not yet been linked into an executable.
- ET_EXEC: This is an ELF executable file. These files are also known as programs, and they are the entry point of how a process begins running.
- ET_DYN: This is a shared object. The file is marked as a dynamically linkable object file, also known as a shared library.
- ET_CORE: This is an ELF type that marks a core file. A core file is a dump of a full process image at the time of a program crash, or when the process has been delivered a SIGSEGV signal (segmentation violation). GDB can read these files and aid in debugging to determine what caused the program to crash.
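The file type is stored in the e_type field of the ELF header. That field sits immediately after the 16-byte e_ident array, which begins with the magic bytes 0x7f 'E' 'L' 'F'. Below is a minimal C sketch of reading it. It assumes the header bytes are already in memory and that the file's byte order matches the host's. The helper names are hypothetical. The ET_* constants come from <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if buf starts with the ELF magic "\x7f" "ELF", else 0. */
int is_elf(const unsigned char *buf, size_t len)
{
    return len >= SELFMAG && memcmp(buf, ELFMAG, SELFMAG) == 0;
}

/* Reads e_type, which follows e_ident (offset 16) in both 32- and 64-bit
   headers. Assumes host byte order. Returns -1 if not an ELF header. */
int elf_file_type(const unsigned char *buf, size_t len)
{
    uint16_t type;
    if (len < EI_NIDENT + sizeof type || !is_elf(buf, len))
        return -1;
    memcpy(&type, buf + EI_NIDENT, sizeof type);
    return type;
}

/* Maps an e_type value to the names used in the list above. */
const char *elf_type_name(int type)
{
    switch (type) {
    case ET_NONE: return "ET_NONE";
    case ET_REL:  return "ET_REL";
    case ET_EXEC: return "ET_EXEC";
    case ET_DYN:  return "ET_DYN";
    case ET_CORE: return "ET_CORE";
    default:      return "unknown";
    }
}
```

Position-independent executables, which many modern Linux distributions build by default, also report ET_DYN. ET_DYN alone therefore does not distinguish a shared library from a program.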
ELF Program Headers
- They are necessary for program loading, and they describe the segments in the binary.
- Segments are understood by the kernel at load time and describe the memory layout of an executable on disk and how it should translate to memory.
- The program header table is accessed using the offset found in e_phoff (the program header table offset).
- There are five common program header types:
- PT_LOAD
- PT_DYNAMIC
- PT_NOTE
- PT_INTERP
- PT_PHDR
- PT_LOAD:
- This type indicates that the segment is going to be loaded (mapped) into memory.
- A dynamically linked ELF executable will generally contain the following two loadable segments:
- The text segment for program code
- The data segment for global variables and dynamic linking information
- For more details, refer to the elf(5) man page in Linux.
- PT_DYNAMIC
- It is specific to dynamically linked executables and contains information necessary for the dynamic linker (tagged values and pointers).
- It also contains:
A list of shared libraries that are to be linked at runtime
The address/location of the Global Offset Table (GOT), discussed in the ELF Dynamic Linking section.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Tag name      Description
DT_HASH       Address of the symbol hash table
DT_STRTAB     Address of the string table
DT_SYMTAB     Address of the symbol table
DT_REL        Address of the Rel relocation table
DT_RELASZ     Size in bytes of the Rela relocation table
DT_RELAENT    Size in bytes of a Rela relocation table entry
DT_STRSZ      Size in bytes of the string table
DT_SYMENT     Size in bytes of a symbol table entry
DT_INIT       Address of the initialization function
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- PT_NOTE
- This segment holds auxiliary notes that are pertinent to a specific vendor or system, for example for verification purposes.
- PT_INTERP
- This small segment contains only the location and size of a null-terminated string giving the path of the program interpreter (the dynamic linker), e.g. /lib64/ld-linux-x86-64.so.2.
- PT_PHDR
- This segment contains the location and size of the program header table itself.
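Two of the lookups described above can be sketched in C as follows. The first walks the program header table at e_phoff. The second reads the shared-library list out of the dynamic segment, where each library is stored as a DT_NEEDED entry. The sketch assumes a 64-bit ELF image already read into memory, with e_phentsize equal to sizeof(Elf64_Phdr). `count_phdrs`, `phdr_type_name`, and `needed_libs` are hypothetical helpers. For `needed_libs`, the caller is assumed to have already located the dynamic array and the string table that DT_STRTAB points to.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Names for the five common program header types discussed above. */
const char *phdr_type_name(Elf64_Word type)
{
    switch (type) {
    case PT_LOAD:    return "PT_LOAD";
    case PT_DYNAMIC: return "PT_DYNAMIC";
    case PT_NOTE:    return "PT_NOTE";
    case PT_INTERP:  return "PT_INTERP";
    case PT_PHDR:    return "PT_PHDR";
    default:         return "other";
    }
}

/* Counts program headers of `type` by walking the table at e_phoff.
   Returns -1 if the image is not a well-formed ELF64 header/table. */
int count_phdrs(const unsigned char *img, size_t size, Elf64_Word type)
{
    Elf64_Ehdr eh;
    if (size < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh); /* memcpy avoids unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (eh.e_phoff > size || (size - eh.e_phoff) / sizeof(Elf64_Phdr) < eh.e_phnum)
        return -1;

    int n = 0;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, img + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type == type)
            n++;
    }
    return n;
}

/* Collects up to `max` DT_NEEDED names from a DT_NULL-terminated dynamic
   array; each d_val is an offset into the dynamic string table. */
size_t needed_libs(const Elf64_Dyn *dyn, const char *strtab,
                   const char **out, size_t max)
{
    size_t n = 0;
    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == DT_NEEDED && n < max)
            out[n++] = strtab + dyn->d_un.d_val;
    return n;
}
```

`ldd` reports more than this. It runs the dynamic loader and shows the fully resolved, transitive set of libraries. DT_NEEDED lists only the direct dependencies.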
________________________________________________________________________
ELF Section Header
- A section is not a segment. Segments are necessary for program execution, and within each segment lies code or data that is divided into sections.
- A section header table exists to reference the location and size of these sections and is primarily for linking and debugging purposes. It is not necessary for program execution, because section headers do not describe the memory layout of the program; that is the job of the program headers.
- The readelf -l command shows which sections are mapped to which segments, which helps to visualize the relationship between sections and segments.
- Every ELF object has sections, but not every ELF object has section headers; this is usually because someone has deliberately removed the section header table, which is not the default.
- Section headers are convenient for granular inspection of which parts or sections of an ELF object we are viewing.
- The text segment will contain the following sections:
[.text]: This is the program code
[.rodata]: This is read-only data
[.hash]: This is the symbol hash table
[.dynsym]: This is the shared object symbol data
[.dynstr]: These are the shared object symbol names
[.plt]: This is the procedure linkage table
[.rel.got]: This is the G.O.T relocation data
- The data segment will contain the following sections:
[.data]: These are the globally initialized variables
[.dynamic]: These are the dynamic linking structures and objects
[.got.plt]: This is the global offset table
[.bss]: These are the globally uninitialized variables
- No program headers exist in relocatable objects (ELF files of type ET_REL), because .o files are meant to be linked into an executable, not loaded directly into memory; therefore, readelf -l yields no program headers for an object file such as test.o. Linux loadable kernel modules are actually ET_REL objects and are an exception to the rule, because they do get loaded directly into kernel memory and relocated on the fly.
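Locating a section by name requires two things. One is the section header table, found through e_shoff and e_shnum. The other is the section-name string table, whose index is e_shstrndx. The minimal C sketch below assumes a 64-bit ELF image already in memory. `find_section` is a hypothetical helper. It does not handle the extended-numbering case used when a file has SHN_LORESERVE (0xff00) or more sections.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the section called `name`
   (copying its header into *out if out is non-NULL), or -1 if the image
   has no usable section header table or no such section. */
int find_section(const unsigned char *img, size_t size, const char *name, Elf64_Shdr *out)
{
    Elf64_Ehdr eh;
    if (size < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    /* A stripped section header table typically shows up as e_shoff == 0. */
    if (eh.e_shoff == 0 || eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        return -1;
    if (eh.e_shoff > size || (size - eh.e_shoff) / sizeof(Elf64_Shdr) < eh.e_shnum)
        return -1;

    /* Header of the section-name string table (.shstrtab). */
    Elf64_Shdr strsec;
    memcpy(&strsec, img + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof strsec, sizeof strsec);
    if (strsec.sh_offset > size || strsec.sh_size > size - strsec.sh_offset)
        return -1;
    const char *names = (const char *)img + strsec.sh_offset;

    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, img + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_name >= strsec.sh_size)
            continue;
        /* Only compare names that are NUL-terminated inside the table. */
        size_t room = strsec.sh_size - sh.sh_name;
        if (memchr(names + sh.sh_name, '\0', room) && strcmp(names + sh.sh_name, name) == 0) {
            if (out)
                *out = sh;
            return (int)i;
        }
    }
    return -1;
}
```

This lookup depends entirely on the section header table. It therefore fails on a binary whose section headers have been removed, which is the same weakness noted for objdump earlier.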
ELF Relocations
- Relocation is the process of connecting symbolic references with symbolic definitions.
- Relocations are literally a mechanism for binary patching, and even hot-patching in memory when the dynamic linker is involved.
- An implicit addend occurs when the relocation records are stored in ElfN_Rel structures, which have no r_addend field; the addend is therefore stored in the relocation target itself.
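The C sketch below illustrates the implicit addend. It applies an i386 R_386_32 relocation (S + A) from an Elf32_Rel record, reading the addend from the bytes being patched. It assumes an ET_REL object, where r_offset is an offset into the section being relocated. It also assumes a little-endian host that matches the object's byte order. `apply_rel_386_32` is a hypothetical helper. A real linker would also resolve S by looking up the symbol index from ELF32_R_SYM in the symbol table; here the caller simply passes S in.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: applies one R_386_32 relocation (result = S + A) to
   a section buffer. Elf32_Rel has no r_addend, so A is whatever value is
   already stored at the target. Returns 0 on success, -1 on an unsupported
   type or an out-of-range offset. */
int apply_rel_386_32(unsigned char *sec, size_t sec_size,
                     const Elf32_Rel *rel, uint32_t sym_value)
{
    if (ELF32_R_TYPE(rel->r_info) != R_386_32)
        return -1;
    if (sec_size < 4 || rel->r_offset > sec_size - 4)
        return -1;

    uint32_t addend, result;
    memcpy(&addend, sec + rel->r_offset, sizeof addend); /* implicit addend A */
    result = sym_value + addend;                         /* S + A */
    memcpy(sec + rel->r_offset, &result, sizeof result); /* patch in place */
    return 0;
}
```

With an ElfN_Rela record, such as those used on x86-64, the addend would instead come from the r_addend field. The target bytes would simply be overwritten.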
Relocatable Code Injection-Based Binary Patching
- Relocatable code injection is a technique that hackers, virus writers, or anyone who wants to modify the code in a binary may use to relink a binary after it has already been compiled and linked into an executable.
- There is a powerful tool called ERESI that is capable of relocatable code injection (aka ET_REL injection).
- Another tool, Quenya, has many features and capabilities; one of them is injecting object code into an executable. This can be very useful for patching a binary by hijacking a given function.
- Let us pretend we are an attacker and we want to infect a 32-bit program that calls puts() to print Hello World. Our goal is to hijack puts() so that it calls evil_puts():
- What Is An ELF?
- The Executable and Linkable Format is a file format for executable files, shared libraries, and object files on UNIX-like systems.
- It is a combination of tightly knit data structures.
- You can read about ELF in the Linux manual page elf(5).
- Executable File
- A file that can be run by the operating system. It is generated by linking one or more object files, and it can also use dynamic linking to access functions in shared object files. The main difference between an executable and other ELF files is that an executable has an entry point.
- Shared Object File
- Libraries are present in the form of shared object files; these libraries are also called shared libraries.
- Running ldd <elf> lists the shared object files used by the program.
- They are called so because they can be shared among multiple processes.
- Object File
- The direct machine-code equivalent of a C source file.
- It has just a little metadata (which is part of ELF) to keep the code organized.
- It is not yet part of an executable or a shared object.
- An object file is an intermediate file used by the linker as input for creating an executable.
- An object file needs to go through a separate linking step to produce the executable file.
C source code (hello.c)  ---->  | Preprocessor |  ---->  hello.i (preprocessed C source file)
hello.i                  ---->  | Compiler     |  ---->  hello.s (assembly code)
hello.s                  ---->  | Assembler    |  ---->  hello.o (object code)
hello.o + libraries      ---->  | Linker       |  ---->  hello / a.out (executable)
Generally, the output files generated by the preprocessor, compiler, and assembler are stored temporarily in the /tmp directory and are deleted as soon as the executable is generated. With gcc's -save-temps option, those intermediate files are kept as well, which helps in our analysis. There are 4 sub-processes, so 4 files are generated: code.i, code.s, code.o, and code1, where code1 is the final executable.
Coding The ELF Parser
- We need to pass the file name of the ELF file to be parsed to the library. We can define a function like:
- char *filepath = "./a.out"; /* This can be any ELF file */ lib_path(filepath);
- We encountered something called the ELF header, which is always the first thing present in any ELF file. If a programmer wants to dump the ELF header, the library call would look like:
- lib_dump_elf_header();
- The API above works for only one ELF file. If a programmer wants to analyse multiple files, they can follow these steps:
- int des = lib_path("./a.out");
- lib_dump_elf_header(des);
- free_list_init(): This function initializes the structure free_us
- free_list_fini(): Deinitializes the structure.
- dd_addr_to_list(void *addr): Adds the specified address to the list.
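The calls above belong to the parser library being described. As an independent illustration of what dumping the ELF header involves, here is a minimal C sketch that uses only <elf.h> and stdio. `read_elf_header` and `print_elf_header` are hypothetical names, not that library's API. The code assumes a 64-bit ELF file whose byte order matches the host.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the ELF header of the 64-bit file at `path`.
   Returns 0 on success, -1 if the file cannot be read or is not ELF64. */
int read_elf_header(const char *path, Elf64_Ehdr *eh)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t got = fread(eh, 1, sizeof *eh, f);
    fclose(f);
    if (got != sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0
        || eh->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/* Prints a few of the fields that `readelf -h` also shows. */
void print_elf_header(const Elf64_Ehdr *eh, FILE *out)
{
    fprintf(out, "Type:                %u\n", (unsigned)eh->e_type);
    fprintf(out, "Machine:             %u\n", (unsigned)eh->e_machine);
    fprintf(out, "Entry point address: 0x%llx\n", (unsigned long long)eh->e_entry);
    fprintf(out, "Program headers at:  %llu (%u entries)\n",
            (unsigned long long)eh->e_phoff, (unsigned)eh->e_phnum);
    fprintf(out, "Section headers at:  %llu (%u entries)\n",
            (unsigned long long)eh->e_shoff, (unsigned)eh->e_shnum);
    fprintf(out, "Section name table:  index %u\n", (unsigned)eh->e_shstrndx);
}
```

For example, passing "/proc/self/exe" to `read_elf_header` inspects the running program itself.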
Linux Process Tracing
- With ptrace, we can have full control over a program's execution flow, which means that we can do some very interesting things, ranging from memory virus infection and virus analysis/detection to userland memory rootkits, advanced debugging tasks, hot-patching, and reverse engineering.
- In Linux, the ptrace(2) system call is the userland means of accessing a process address space.
- The ptrace system call is very useful for both reverse engineers and malware authors.
- It gives a programmer the ability to attach to a process and modify its memory, which can include injecting code and modifying important data structures such as the Global Offset Table (GOT) for shared library redirection.
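The C sketch below is a minimal illustration of accessing another process's address space with ptrace(2). It assumes Linux with glibc. The program forks a child that requests tracing with PTRACE_TRACEME and then stops itself. The parent reads a word of the child's memory with PTRACE_PEEKDATA, overwrites it with PTRACE_POKEDATA, and reads it back. `target_value` and `peek_and_poke_child` are hypothetical names. Because fork() duplicates the address space, the variable sits at the same virtual address in both processes, so the parent can use its own `&target_value`.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical variable inspected in the traced child. */
static long target_value = 1;

/* Returns 0 on success and stores the child's value before and after the
   poke in *seen and *after_poke; returns -1 on error. */
int peek_and_poke_child(long *seen, long *after_poke)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        target_value = 42; /* only the child's copy changes */
        raise(SIGSTOP);    /* stop so the tracer can inspect us */
        _exit(0);
    }

    int status, rc = -1;
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* PEEKDATA returns the word itself, so errors are detected via errno. */
    errno = 0;
    *seen = ptrace(PTRACE_PEEKDATA, pid, &target_value, NULL);
    if (errno == 0
        && ptrace(PTRACE_POKEDATA, pid, &target_value, (void *)1337L) == 0) {
        errno = 0;
        *after_poke = ptrace(PTRACE_PEEKDATA, pid, &target_value, NULL);
        rc = (errno == 0) ? 0 : -1;
    }
    kill(pid, SIGKILL); /* done inspecting: terminate and reap the child */
    waitpid(pid, NULL, 0);
    return rc;
}
```

Tracing a process that is already running would begin with PTRACE_ATTACH (or PTRACE_SEIZE) on its PID rather than with PTRACE_TRACEME in a forked child.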
------------------------------------------------------------------------------------------------------