A buffer overflow vulnerability occurs when a program accepts more data than the buffer reserved for it can hold. The excess data spills into adjacent memory and may overwrite other data. As a result, the program might report an error, crash, or behave differently. Such vulnerabilities are also called buffer overruns.
Some programming languages, such as C and C++, are more susceptible to buffer overflow issues than others. This is because they are low-level languages that rely on the developer to allocate memory and do not check array bounds automatically. Most languages commonly used on the web, such as PHP, Java, JavaScript, or Python, are much less prone to buffer overflow exploits because they manage memory allocation on behalf of the developer. However, they are not completely safe: some of them allow direct memory manipulation and they often use core functions that are written in C/C++.
Buffer overflow vulnerabilities are difficult to find and exploit. They are also not as common as other vulnerabilities. However, buffer overflow attacks may have very serious consequences. Such attacks often let the attacker gain shell access and therefore full control of the operating system. Even if the attacker cannot gain shell access, buffer overflow attacks may stop running programs and, as a result, cause a Denial of Service.
Types of Buffer Overflow Vulnerabilities
There are two primary types of buffer overflow vulnerabilities: stack overflow and heap overflow.
In the case of stack buffer overflows, the issue applies to the stack, which is the memory region a program uses primarily to store local variables and function return addresses. The data on the stack is stored and retrieved in an organized fashion (last-in-first-out), stack allocation is handled automatically rather than by the developer, and access to the stack is fast.
In the case of heap buffer overflows, the issue applies to the heap, which is the memory space used to store dynamic data. The amount of memory that needs to be reserved is decided at runtime and it is managed by the program, not the operating system. Access to the heap is slower but the space on the heap is only limited by the size of virtual memory.
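The difference between the two memory areas can be sketched in C as follows. This is a minimal illustration, not taken from any particular program: the function names `make_heap_copy` and `stack_copy_length` and the 16-byte buffer size are assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Copies `text` into a heap buffer whose size is decided at run time.
   Returns NULL if allocation fails; the caller must free() the result. */
char *make_heap_copy(const char *text) {
    size_t needed = strlen(text) + 1;   /* +1 for the terminating '\0' */
    char *heap_buf = malloc(needed);    /* heap: requested by the program */
    if (heap_buf == NULL) {
        return NULL;
    }
    memcpy(heap_buf, text, needed);
    return heap_buf;
}

/* Copies `text` into a fixed-size stack buffer, truncating if necessary,
   and returns the length of the stored string. */
size_t stack_copy_length(const char *text) {
    char stack_buf[16];                 /* stack: released automatically on return */
    size_t n = strlen(text);
    if (n >= sizeof stack_buf) {
        n = sizeof stack_buf - 1;       /* never write past the end */
    }
    memcpy(stack_buf, text, n);
    stack_buf[n] = '\0';
    return strlen(stack_buf);
}
```

In both cases the buffer has a fixed capacity once it exists, so either kind can overflow if the program writes more bytes than it reserved.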
How Does a Buffer Overflow Work
In a simple program, you may want the user to enter an email address. Therefore, you create a string variable. You allocate 64 bytes to the variable because you do not expect an email string to be longer than 64 characters. However, you trust the user input too much and do not check if the length of the entered string exceeds the size of the buffer.
If the user then enters 100 characters, the remaining 36 characters are written into memory allocated to another variable. This causes the value of that variable to change and the behavior of the program to change as well. In most cases, this leads to a simple segmentation fault, but it may have more serious consequences. To understand how this may influence program execution, we shall assume that the vulnerability is a stack overflow and that it appears in a C program.
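To make the arithmetic in the email example concrete, here is a small sketch that computes how many bytes would land outside the buffer. It does not perform the overflow itself. The function name `bytes_past_end` is hypothetical. Note that copying a C string also writes a terminating null byte, so a 100-character input spills 36 characters plus one null byte past a 64-byte buffer.

```c
#include <stddef.h>

/* Number of bytes that a C string of `input_len` characters would write
   past the end of a `buf_size`-byte buffer if copied without a length check.
   The copy also writes a terminating '\0', hence the + 1. */
size_t bytes_past_end(size_t buf_size, size_t input_len) {
    size_t needed = input_len + 1;
    return needed > buf_size ? needed - buf_size : 0;
}
```

Calling `bytes_past_end(64, 100)` returns 37, while a 63-character input fits exactly and yields 0.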
A C program uses the stack to store a set of data for every function call. The set of data is called a stack frame and it typically includes the function's arguments, the values of its local variables, and the return address. Here is a simple source code example to explain how the stack works:
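void func(void);

int main(void) {
    int mv1;   /* locals of main() live in the stack frame of main() */
    int mv2;
    func();
    return 0;
}

void func(void) {
    int fv1;   /* locals of func() live in a new frame on top of the stack */
    int fv2;
}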
When you run the program, it starts with the main() function. The program stores the values of the main() function variables on the top of the stack (mv1 and mv2). Then the main() function calls the func() function and it stores the values of its variables on the top of the stack (fv1 and fv2). When the func() function finishes running, the top of the stack is forgotten, the current function returns to main(), and the program has access to mv1 and mv2 again.
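One way to observe this on a real system is to print the addresses of local variables in a caller and in the function it calls. The following is only a sketch: the function names are hypothetical, and the printed addresses (and whether the stack grows up or down) depend on the platform and compiler.

```c
#include <stdio.h>

/* Runs in its own stack frame while the caller's frame is still live. */
static int inner(const int *caller_local) {
    int fv1 = 0;
    printf("inner: fv1 at %p\n", (void *)&fv1);
    /* Two live objects never share an address, so this is always 1. */
    return &fv1 != caller_local;
}

/* Returns 1 when the caller's local and the callee's local are stored
   at different addresses, which is the case while both frames exist. */
int frames_are_distinct(void) {
    int mv1 = 0;
    printf("outer: mv1 at %p\n", (void *)&mv1);
    return inner(&mv1);
}
```

On most desktop platforms the two addresses are close to each other, which is why data written past the end of a local buffer can reach the bookkeeping data of the same frame.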
For this to be possible, the program remembers the current position on the stack (stack pointer) and the memory location where it needs to return after the current function is finished (return address). The trick behind a stack overflow attack is to overwrite this return address so that the program jumps to the attacker’s malicious code.
The malicious content that the attacker sends to a faulty program is usually composed of three parts:
- A chain of bytes that represent the NOP instruction
- A new return address that points to the NOP bytes
- Arbitrary code (usually a shellcode) located somewhere in the middle of the chain of NOP bytes
When the buffer overflow occurs in our example, it causes the program to jump to the chain of NOP bytes (instead of jumping back to the main() function). The NOP instructions do nothing, so execution slides through them until the program reaches the shellcode in the middle of them. The shellcode executes an operating system shell, giving the attacker access to the system with the privileges of the vulnerable program.
Here is a very simple example of a C program that is vulnerable to a stack overflow:
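#include <string.h>

void func(char *v);

int main(int argc, char *argv[]) {
    func(argv[1]);       /* passes the first command-line argument unchecked */
    return 0;
}

void func(char *v) {
    char buffer[10];
    strcpy(buffer, v);   /* no length check: overflows if v needs more than 10 bytes */
}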
The strcpy function in the above example copies the command-line argument into the destination buffer variable without checking the string length. The program allocates only 10 bytes to the buffer string, which is enough for 9 characters plus the terminating null byte, so any longer argument makes strcpy overflow the buffer. If we compile this program as vulnprog, the following command-line call is harmless:
$ vulnprog AAAAAAAAA
However, the following call causes a buffer overflow:
$ vulnprog AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
How To Prevent a Buffer Overflow
Preventing buffer overflow errors is not much different from preventing many other vulnerabilities. It all comes down to distrusting user input. In the case of buffer overflow vulnerabilities, the developer must check the input length before using any functions that might cause an overflow to happen.
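Applied to the vulnerable func() shown earlier, such a length check might look like the following sketch. The name `func_checked` and its return convention are assumptions made for this example.

```c
#include <string.h>

/* Hypothetical checked variant of func(): refuses input that does not fit.
   Returns 0 on success, -1 if `v` is NULL or too long for the buffer. */
int func_checked(const char *v) {
    char buffer[10];
    if (v == NULL || strlen(v) >= sizeof buffer) {
        return -1;            /* would overflow: reject before copying */
    }
    strcpy(buffer, v);        /* safe here: the length was verified above */
    return 0;
}
```

The comparison uses `>=` rather than `>` because the buffer must also hold the terminating null byte, so a 10-character argument is already one byte too long.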
However, to err is human and it is not uncommon for developers to forget this basic rule. Code reviewers might miss such errors as well. That is why the safest basic method in C is to avoid the following five unsafe functions that can lead to a buffer overflow vulnerability: printf, sprintf, strcat, strcpy, and gets.
Unfortunately, standard C provides few safe alternatives, such as fgets (to be used instead of gets) and, since C99, snprintf (to be used instead of sprintf). Various platforms have their own non-standard implementations. For example, the Microsoft version of C includes sprintf_s, strcpy_s, and strcat_s. On Linux/UNIX systems, the best choice is to ban unsafe functions and enforce the use of the Safe C Library.
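As a rough illustration of bounded alternatives that are available in standard C, the sketch below wraps fgets and snprintf (the latter standardized in C99). The wrapper names `format_greeting` and `read_line` and their return conventions are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Bounded replacement for a sprintf() call: writes at most `size` bytes
   into `dst`. Returns 1 if the whole string fit, 0 if it was truncated
   or an encoding error occurred. */
int format_greeting(char *dst, size_t size, const char *name) {
    int written = snprintf(dst, size, "Hello, %s!", name);
    return written >= 0 && (size_t)written < size;
}

/* Bounded replacement for gets(): reads at most size - 1 characters of
   one line from `in` and strips the trailing newline, if present.
   Returns 1 on success, 0 on end of file or read error. */
int read_line(char *dst, size_t size, FILE *in) {
    if (fgets(dst, (int)size, in) == NULL) {
        return 0;
    }
    dst[strcspn(dst, "\n")] = '\0';
    return 1;
}
```

Both functions take the destination size as a parameter, so the caller can pass `sizeof buffer` and the library call never writes past it. Checking the return value of snprintf matters because silent truncation can be a bug in its own right.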
You can also protect against buffer overflows by using an extension of a compiler that uses canaries. The canaries are special values that the compiler places on the stack between the location of the buffer and the location of control data. When a buffer overflow occurs, it is the canary that is corrupted first and this corruption can be immediately detected. There are many compiler extensions that use canaries, for example, StackGuard and ProPolice.
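The idea behind a canary can be simulated in ordinary C. The sketch below is only a model: real canaries are inserted by the compiler, are usually random, and are checked before a function returns. Here the struct `guarded`, the constant `CANARY_VALUE`, and all function names are assumptions, and the "overflow" deliberately stays inside the struct so that no real out-of-bounds write happens.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_VALUE 0xC0FFEE42u   /* arbitrary marker chosen for this sketch */

/* Simulated frame: a buffer followed by a guard value. */
struct guarded {
    char buffer[8];
    unsigned int canary;
};

void guarded_init(struct guarded *g) {
    memset(g->buffer, 0, sizeof g->buffer);
    g->canary = CANARY_VALUE;
}

/* Deliberately unchecked write of `n` bytes starting at the buffer.
   The caller must keep n <= sizeof(struct guarded) so that the write
   stays inside the struct. */
void sloppy_fill(struct guarded *g, size_t n) {
    memset((unsigned char *)g, 'A', n);
}

/* Returns 1 if the guard value is unchanged, 0 if it was overwritten. */
int canary_intact(const struct guarded *g) {
    return g->canary == CANARY_VALUE;
}
```

Filling exactly `sizeof g.buffer` bytes leaves the canary intact, while filling the whole struct overwrites it, which is the condition a compiler-generated check would detect before using the corrupted control data.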
Operating System Buffer Overflow Protection Mechanisms
For a buffer overflow attack to succeed, the attacker usually must know where the buffer will be located in the computer memory. In the past, this was as simple as running a debugger on the local computer and checking the memory addresses. Current operating systems make it much more difficult.
All modern operating systems include a protection mechanism called address space layout randomization (ASLR). Thanks to this mechanism, the executable file may be loaded at a different memory location each time it runs. Therefore, the attacker cannot easily predict which memory address to jump to and many buffer overflow attack attempts fail.
Another technique that helps prevent buffer overflow attacks is executable space protection (on Windows: data execution prevention – DEP). Thanks to this technique, the attacker cannot execute code if it is located in the memory space assigned to the stack or heap and in some cases, also other areas. This makes it impossible to directly call a shellcode but attackers may use advanced tricks such as return-oriented programming.
However, an attacker may try to evade both these protection mechanisms on x86 architectures by using a ret2reg attack. What they need to do is to find a module (DLL) that is not protected by ASLR or DEP. If they can find a JMP ESP instruction (jump to stack, byte combination \FF\E4) in that module, they can use the location of this instruction as the return address. The program will jump to this location, execute the jump instruction (JMP ESP), and jump to the current location of the stack, which is right after the return address (before the shellcode).
Detecting Web-Related Buffer Overflows
Web applications and web pages are rarely susceptible to buffer overflow vulnerabilities because they are not written in C or C++. However, these errors happen in underlying software such as web servers, web application servers, or interpreters.
The Acunetix web vulnerability scanner checks for such errors in web software. Take a demo and find out more about running scans against your web server.