"If it were really the case that terrorists "hate us for our freedoms," we'd be getting more popular with Al Qaeda every month." -- Julian Sanchez of Reason.com
Briefly, a buffer overflow is a bug that affects low-level code (typically C and C++).
When a program with this bug gets unexpected input from a user, it may crash. The programmer may not realize how big a deal this is. An attacker, on the other hand, may notice the bug and use it to do much worse than crash the program: steal private information, corrupt information, or run arbitrary code of the attacker's choosing.
Even though hardware and software defenses exist or are still being developed, these languages remain very popular, and code written in them remains vulnerable.
A buffer overflow in Microsoft IIS, Apache httpd, or SQL Server is bad enough. Now imagine an embedded system (industrial control systems, automobiles) written in C/C++ that is vulnerable to buffer overflow. Yeah, that would not be fun...
As terminology, a buffer overflow means any access outside of a buffer's bounds. This can be an over-read or an over-write. Let's get started with the essentials and learn how code works with the architecture.
We all know that a running program is stored in memory. Its addresses (the virtual addresses the process sees) are written as hex values, running from 0x00000000 to 0xffffffff on a 32-bit system. Here is an example:
CODE :
0x00000000==========================================0xffffffff
(_compile time__)(_runtime_________________)(set when starts)
--------------------------------------------------------------
Text, data data || HEAP, --- , STACK_______|| cmdline & env
--------------------------------------------------------------
_int y=10;int x;__malloc();__int t(){int x;}________________
==============================================================
IMPORTANT: Stack and heap grow in opposite directions.
CAUTION: Local variables are typically pushed in the same order as they appear in the code, while arguments are pushed in reverse order.
That said, the allocation of local variables is ultimately up to the compiler. Variables could be allocated in any order, or not allocated at all and kept only in registers, depending on the optimization level.
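To see the layout above on a real machine, here is a minimal sketch that records the address of a global, a heap block, and a local variable. The names `struct layout`, `sample_layout`, and `print_layout` are made up for this illustration, and the exact addresses (and even their relative order) depend on the OS, the compiler, and address randomization.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

int y = 10;  /* initialized global, placed at compile time */
int x;       /* uninitialized global, also placed at compile time */

/* Hypothetical helper type: one sample address per memory region. */
struct layout {
    uintptr_t global_addr;  /* address of the global y */
    uintptr_t heap_addr;    /* address returned by malloc */
    uintptr_t stack_addr;   /* address of a local variable */
};

struct layout sample_layout(void)
{
    struct layout l;
    int local = 0;
    void *block = malloc(16);

    /* Convert to integers right away so nothing dangles after return. */
    l.global_addr = (uintptr_t)(void *)&y;
    l.heap_addr   = (uintptr_t)block;
    l.stack_addr  = (uintptr_t)(void *)&local;

    free(block);
    return l;
}

void print_layout(void)
{
    struct layout l = sample_layout();

    printf("global y : 0x%" PRIxPTR "\n", l.global_addr);
    printf("heap     : 0x%" PRIxPTR "\n", l.heap_addr);
    printf("stack    : 0x%" PRIxPTR "\n", l.stack_addr);
}
```

Calling `print_layout()` from `main` prints the three addresses. On a typical Linux build they come out in increasing order (global, heap, stack), but that ordering is an observation, not a guarantee.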
The next checkpoint is to understand how those variables are accessed.
Let's assume we have a function like this:
CODE :
void func(char *arg1, int arg2, int arg3)
{
...
loc2++; //increment loc2 by 1;
...
}
The int variable loc2 must be incremented by 1, but what is the address of loc2? Local variables of a function are stored in its stack frame. The architecture's solution is a "frame pointer", which is generally kept in the %ebp register. The locations of local variables are computed relative to %ebp.
For example, the compiler knows that loc2 is stored 8 bytes below %ebp, at -8(%ebp), no matter where the function is called from.
So far so good, but how do we return from the function?
Let's assume:
CODE :
int main()
{
...
func("Word", 10, -5);
...
}
We know how %ebp works inside the function, but we do not yet know what happens before the call. main is the starting point of the program and has local variables of its own, so %ebp must also let main find them.
At the moment of the call, %ebp still points to the caller's frame.
Once we enter the other function, %ebp is set up for the new frame, and unless the old value is saved there is no way to get back to main's frame afterwards. The solution uses the stack pointer (%esp):
1- Push %ebp before the locals
2- Set %ebp to the current %esp
3- Set %ebp to (%ebp), the saved value, at return
While the program runs, the instruction pointer (%eip) moves from one instruction to the next. When it reaches the call to func, %eip jumps into that function and starts executing its instructions. Afterwards we want to resume where we left off in main. The solution is to store the next %eip (the return address) on the stack just before the saved %ebp, so that it sits right next to it in the function's stack frame.
1- Push the next %eip before the call
2- Set %eip to the value 4 bytes above %ebp (4(%ebp)) at return
Summary:
CODE :
Calling function:
1. Push the arguments onto the stack in reverse order
2. Push the return address (%eip)
3. Jump to the function's address
Called function:
4. Push the old frame pointer onto the stack (%ebp)
5. Set the frame pointer (%ebp) to where the end of the stack is right now (%esp)
6. Push local variables onto the stack
Returning function:
7. Reset the previous stack frame: %esp=%ebp , %ebp=(%ebp)
8. Jump back to the return address: %eip=4(%esp)
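The saved frame pointer and return address from the summary can be observed from C, at least roughly. The sketch below relies on `__builtin_frame_address` and `__builtin_return_address`, which are GCC/Clang extensions and not standard C. The type `struct frame_info` is invented for this example, and `func` here is only a stand-in with the same parameter list as the earlier listing.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of what one stack frame looks like from inside. */
struct frame_info {
    uintptr_t frame_pointer;   /* roughly the value of %ebp inside func */
    uintptr_t return_address;  /* the saved %eip that func returns to */
    uintptr_t local;           /* address of the local variable loc2 */
};

/* noinline so that func really gets a stack frame of its own. */
__attribute__((noinline))
struct frame_info func(char *arg1, int arg2, int arg3)
{
    int loc2 = arg2 + arg3;
    struct frame_info info;

    (void)arg1;  /* unused in this sketch */
    info.frame_pointer  = (uintptr_t)__builtin_frame_address(0);
    info.return_address = (uintptr_t)__builtin_return_address(0);
    info.local          = (uintptr_t)(void *)&loc2;
    return info;
}

void print_frame(void)
{
    struct frame_info info = func("Word", 10, -5);

    printf("frame pointer  : %#lx\n", (unsigned long)info.frame_pointer);
    printf("return address : %#lx\n", (unsigned long)info.return_address);
    printf("local loc2     : %#lx\n", (unsigned long)info.local);
}
```

Calling `print_frame()` from `main` shows that the local sits close to the frame pointer, while the return address points back into the code of the caller. The exact distances depend on the compiler and optimization level.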
Now we know how the architecture works, and we have the basics needed to understand how a buffer overflow attack works. From this point I will explain "the attack" with an easy example, just to make it easier to visualize.
CODE :
int main()
{
char *mystr = "WorkNow";
func(mystr);
...
}
We see that main calls func with mystr as its argument. Inside func there is a char array that holds only 4 chars (4 bytes). We also know that a C string must end with "\0".
For those who don't know what strcpy does: strcpy(arg1, arg2) copies the string arg2 into arg1, byte by byte, up to and including the terminating "\0". It does not check whether arg1 is large enough.
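Putting the description above together, func might look roughly like the sketch below. This is an assumed reconstruction from the text (a 4-byte char array filled by strcpy), not a confirmed listing. The helper `bytes_needed` and the constant `BUFFER_SIZE` are made up here to make the arithmetic explicit.

```c
#include <stddef.h>
#include <string.h>

#define BUFFER_SIZE 4  /* the 4-char array described in the text */

/* Bytes strcpy would write for src: its characters plus the final '\0'. */
size_t bytes_needed(const char *src)
{
    return strlen(src) + 1;
}

/* Assumed shape of func: copies its argument with no bounds check. */
void func(char *arg1)
{
    char buffer[BUFFER_SIZE];

    /* Overflows buffer whenever bytes_needed(arg1) > BUFFER_SIZE. */
    strcpy(buffer, arg1);
}
```

For "WorkNow", `bytes_needed` returns 8, twice what buffer can hold, so calling `func` with it writes 4 bytes past the end of the array. A string such as "abc" needs exactly 4 bytes and fits.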
When strcpy starts working, the 4 bytes of buffer receive the letters 'W','o','r','k':
CODE :
0x00000000==========================================0xffffffff
--------------------------------------------------------------
_______|| W o r k ||%ebp||%eip||$arg1||_____________________
--------------------------------------------------------------
_______||__buffer_||________________________________________
==============================================================
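strcpy still has 4 more bytes to store (the remaining letters plus "\0"), so it keeps writing past the end of buffer into the space that holds the saved %ebp. Nothing stops it, because C performs no bounds checking.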
CODE :
0x00000000==========================================0xffffffff
--------------------------------------------------------------
_______|| W o r k || N o w \0 ||%eip||$arg1||________________
--------------------------------------------------------------
_______||__buffer_||________________________________________
==============================================================
Note that the letters are not stored as such. Each character is stored as its numeric byte value (for example, 'W' is 0x57 in ASCII).
When func returns, the saved %ebp is restored so that main can continue until the end of the process. But where is the saved %ebp now? Nowhere, because it has been overwritten. main ends up with a bogus frame pointer, and the program typically crashes with a segmentation fault (SEGFAULT).
Now consider a variant of func that has a second local variable, authenticated, sitting between buffer and the saved %ebp. We overflow the buffer as before, but this time the extra bytes land in authenticated. The program does not hit a SEGFAULT and continues to work.
CODE :
0x00000000==========================================0xffffffff
--------------------------------------------------------------
_____|| W o r k || N o w \0 ___||%ebp||%eip||$arg1||_______
--------------------------------------------------------------
_____||_buffer__||authenticated||__________________________
==============================================================
Right after the strcpy call, an if statement checks whether authenticated is true. Since we overflowed the buffer and overwrote authenticated, it is not 0 anymore. Any non-zero value makes the if condition succeed, so the protected code runs.
In this example, we made the program do something it was not intended to do.
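Overflowing a real stack buffer is undefined behavior, and the actual placement of locals is up to the compiler. So the sketch below simulates the diagram instead of relying on it: a struct stands in for the two locals, and the copy is confined to that struct. The names `struct fake_frame` and `authenticated_after_copy` are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for func's locals, laid out as in the diagram above. */
struct fake_frame {
    char buffer[4];
    int  authenticated;
};

/* Returns authenticated after copying input the way strcpy would. */
int authenticated_after_copy(const char *input)
{
    struct fake_frame frame;
    size_t n = strlen(input) + 1;      /* strcpy copies the '\0' too */

    /* The simulation assumes authenticated directly follows buffer. */
    assert(offsetof(struct fake_frame, authenticated) == sizeof frame.buffer);

    if (n > sizeof frame)              /* never write outside the struct */
        n = sizeof frame;

    memset(&frame, 0, sizeof frame);   /* authenticated starts at 0 */
    memcpy(&frame, input, n);          /* bytes past buffer land in authenticated */
    return frame.authenticated;
}
```

With "Wor" (4 bytes including the terminator) the function returns 0. With "WorkNow" the bytes 'N', 'o', 'w' end up in authenticated, so the result is non-zero and an `if (authenticated)` check would pass.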
Can we only overwrite 4 bytes beyond the buffer's space? No. We can overwrite as much of the stack as we want. What does this mean? It means we can change the stack, including the saved return address, and inject our own code to do whatever we want. While doing this we must be careful not to have "\0" anywhere in the middle of our code, because strcpy stops copying at the first "\0".
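The "\0" restriction is easy to check with a harmless copy. In this minimal sketch the array `payload` and the function `payload_bytes_copied` are made up for the example, and the destination is large enough that nothing overflows.

```c
#include <stddef.h>
#include <string.h>

/* A 6-byte payload with a '\0' in the middle. */
static const char payload[] = { 'A', 'B', '\0', 'C', 'D', '\0' };

/* Counts how many non-zero payload bytes strcpy actually delivers. */
size_t payload_bytes_copied(void)
{
    char dest[sizeof payload] = { 0 };  /* big enough: no overflow here */
    size_t i, count = 0;

    strcpy(dest, payload);              /* stops after the first '\0' */

    for (i = 0; i < sizeof dest; i++)
        if (dest[i] != '\0')
            count++;
    return count;
}
```

Only 'A' and 'B' arrive, so the function returns 2. The bytes 'C' and 'D' behind the first "\0" are never copied, which is why a payload delivered through strcpy cannot contain zero bytes.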
The Heartbleed attack, which is a kind of overflow attack (an over-read), works against SSL servers to obtain private information, but not to inject and run code. It cannot be detected by the usual anti-virus software, because no harmful code is run: the server simply sends the information to us. There are different types of this attack for different purposes, which can be found on the internet. They are all based on this logic.
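The over-read idea can be sketched in the same safe, simulated style. This is not the actual SSL code, only an illustration of trusting a length supplied by the client. `struct server_memory` and `echo` are hypothetical names, and the read is clamped so that it never leaves the struct.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for server memory: the request payload sits next to a secret. */
struct server_memory {
    char payload[8];
    char secret[8];
};

/* Echoes claimed_len bytes back, trusting the length the client sent. */
size_t echo(const struct server_memory *mem, size_t claimed_len,
            char *reply, size_t reply_size)
{
    if (claimed_len > sizeof *mem)    /* keep the simulation inside the struct */
        claimed_len = sizeof *mem;
    if (claimed_len > reply_size)     /* never overflow the reply buffer */
        claimed_len = reply_size;

    /* The bug: no check against the real size of payload. */
    memcpy(reply, mem, claimed_len);
    return claimed_len;
}
```

If the client sends a 2-byte payload but claims 16 bytes, the reply contains the payload followed by the contents of secret. A correct server would compare claimed_len against the number of payload bytes it actually received.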
HackThisSite is the collective work of the HackThisSite staff, licensed under a CC BY-NC license.
We ask that you inform us upon sharing or distributing.