Buffer overflow attacks are popular in the world of hacking because so many attack types are based on them. Creating buffer overflows is also said to be not only intriguing, but humbling as well.

A whole world of security professionals conducts research in an effort to be first to get the word out. Vulnerability scanners are constantly upgraded to assess zero-day exploits (exploits that do not yet have a patch), and IDS receive signature updates within minutes of examples being found in the wild. Some commercial penetration testing products walk an ethical line by releasing zero-day exploit modules.

There are arguments as to whether buffer overflows are really preventable, and in practice it may be impossible to avoid buffer overflow vulnerabilities completely. This issue lies at the heart of the application development process.

Looking at this from the attacker's viewpoint, opportunity is the objective, not the assessment, not the quality of any development program. The security team can be relentless in defending the network, but nothing can truly curtail the efforts of the attacker.

The Theory Behind Buffer Overflows

Buffer Overflow Vulnerabilities Defined

A buffer overflow occurs when input given to an application exceeds the amount of memory that was set aside to store it. The input is accepted anyway and overwrites other critical data, such as the register values and counters that the Central Processing Unit (CPU) requires to manage the running of the program.
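The definition above can be illustrated with a small simulation. This is only a sketch in Python, not real process memory: the 8-byte buffer, the 4-byte "saved value" that sits directly after it, and all function names are assumptions made for illustration.

```python
# Simulated memory: an 8-byte buffer followed directly by a 4-byte saved value.
# The layout and names are hypothetical and exist only for illustration.
BUFFER_SIZE = 8
SAVED_VALUE = b"\xde\xad\xbe\xef"


def make_memory() -> bytearray:
    """Return 8 zeroed buffer bytes followed by the 4-byte saved value."""
    return bytearray(BUFFER_SIZE) + bytearray(SAVED_VALUE)


def unchecked_copy(memory: bytearray, data: bytes) -> None:
    """Copy data to the start of memory without comparing it to BUFFER_SIZE."""
    memory[0:len(data)] = data


def saved_value(memory: bytearray) -> bytes:
    """Read back the 4 bytes stored right after the buffer."""
    return bytes(memory[BUFFER_SIZE:BUFFER_SIZE + 4])


# Input that fits leaves the saved value alone.
mem = make_memory()
unchecked_copy(mem, b"AAAA")
print(saved_value(mem).hex())  # deadbeef

# Input longer than the buffer spills into the neighbouring bytes.
mem = make_memory()
unchecked_copy(mem, b"A" * 8 + b"BBBB")
print(saved_value(mem).hex())  # 42424242
```

In a real program the neighbouring bytes might be control data the CPU relies on, which is why the overwrite matters.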

The application can receive input in several ways. It can come from direct interaction, such as when the application pauses to ask the user a question and waits for a reply. It can be a data file, such as a graphic file received and displayed by a Web browser. It can also be a remote request to a service listening on an open network port. Basically, any data an application receives for processing must have a place in memory to store it (a buffer), and the program instructions load and handle it.

Assembly, Machine and Interpreters

To understand buffer overflows, there are several terms with which you must be comfortable. These are:

  • Machine, Assembly, Byte Code, High Level Code
  • Disassemblers and Debuggers
  • Compilers and Interpreters
  • Big-Endian, Little-Endian
  • Boolean Logic
  • Stacks, Heaps, Registers

Machine, Assembly, Byte Code, High Level Code

Machine code is what the CPU actually runs. An exploit script will show it as a block of hexadecimal bytes that load into memory and run exactly as they are.

Assembly is a language that uses the commands of the CPU architecture directly. Mnemonics are abbreviations of instructions used to represent actual machine code. Each CPU has mnemonics that are specific to its particular architecture. An assembler is an application that translates the assembly into machine code.

Byte Code is an intermediary language used by platforms such as Java and .NET. It requires a virtual machine interpreter to get to the machine code level. Memory and resource management is the task of the virtual machine, so the programmer can focus more on the intended function of an application and less on the mechanics of how the computer works.

High Level Code is written in a more understandable syntax and must be compiled. Because compilers are available for a variety of platforms, the written code is portable; in other words, once written it can be compiled for multiple systems. If you know how to read the language, High Level Code is similar to having documentation of how a problem is solved.

Disassemblers and Debuggers

Disassemblers turn the machine code back into assembly which makes the code easier to analyze.

Debuggers will run the code as a step-by-step process at run time or within a virtual machine that allows the analyst to observe what occurs with each instruction. The data that is stored in memory addresses and registers can be closely monitored as the program executes.

Analyzing code through disassemblers and debuggers can be even better than having the original source code. High Level languages often contain obfuscated statements, whereas looking directly at what is actually happening can be more accurate for the purpose of reverse engineering.

Compilers and Interpreters

Compilers turn High Level code into machine language. Another task is to look for possible errors, dangerous functions or ambiguities. If the syntax of the High Level language is not perfectly written, the compiler cannot do its job. This is what makes programming frustrating to many people. If a punctuation mark is missing or wrong, the compiler will fail.

Interpreters run scripts and translate the code into machine language while the code is processing. This causes a performance hit, but the code is easier to maintain. The biggest danger with an interpreter is not knowing in advance whether the code can run, so problems are discovered the hard way at run time. Try-catch blocks help mitigate this risk, but only if the programmer suspected at design time that there could be an issue.

Big-Endian, Little-Endian

The number of bits on a bus determines the width of a computer system. If the CPU has physical pins that can handle 32 signals at a time on its three buses (data, control and address), then it can be said to handle 4 bytes of data with each clock cycle.

Some data is 32 bits long and is considered a double word, and its 4 bytes have to be loaded into memory in some order. For example, in the number 1234, treating each digit as one byte, '1' is the high order byte while '4' is the low order byte. The value 1234 can be loaded into memory in either direction without losing meaning, as long as it is known how to process it. Big-Endian systems would load it as 1234 while Little-Endian systems would load it as 4321.

The sequential order in which bytes are arranged into larger numerical values when stored in memory or transmitted over digital links is referred to as Endianness. With Big-Endian, the most significant byte (the byte containing the most significant bit) is stored first, at the lowest address, or sent first. The following bytes are stored or sent in order of decreasing significance, so the least significant byte (the one containing the least significant bit) is stored last, at the highest address, or sent last. Little-Endian reverses this order.
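The two byte orders can be seen directly with Python's standard struct module. The value 0x01020304 is just an illustrative choice, picked so that each byte is easy to recognize.

```python
import struct

value = 0x01020304  # a 32-bit double word; 0x01 is the most significant byte

big = struct.pack(">I", value)     # Big-Endian: most significant byte first
little = struct.pack("<I", value)  # Little-Endian: least significant byte first

print(big.hex())     # 01020304
print(little.hex())  # 04030201

# The same four bytes read back with the wrong byte order give a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0x4030201
```

The last line shows why the byte order has to be known before raw memory or network data can be interpreted.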

All of this matters when it comes to how data is interpreted and what analysis tools show you. It is important that researchers know the architecture and how the tools they use will present the data. They must also know whether certain data fields are 1 byte, 2 bytes, 4 bytes or longer. This is important in forensic work and equally important when reverse engineering code. Lastly, the researcher must keep in mind that the architecture might support switching Endianness.

Boolean Logic

Computers are built from billions of two-state switches known as logic gates. These gates are either on or off, or they store an on or off state. To put it in simpler terms, 0 represents FALSE (off) while 1 represents TRUE (on).

Machine language is a hexadecimal representation of the binary signals that essentially create circuits within this network of gates. The rules that govern what should happen in given situations are called the logic. Boolean logic is the way to understand how this works and how computer systems are designed.

The functions included in Boolean logic are:

  • AND - If there is any doubt, the answer is false.
  • OR - If there is any truth, the answer is true.
  • XOR - Things that are different get attention while things that are the same are ignored.
  • NOT - Does the opposite, no matter what.

Logic gates have two inputs and, depending on the function, the output varies. This table represents the gates.

A  B  AND  OR  XOR
0  0   0   0    0
0  1   0   1    1
1  0   0   1    1
1  1   1   1    0

The NOT gate has only one input signal and will invert that signal.
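The truth table can be reproduced with Python's bitwise operators. This is a small sketch; the function names gate_outputs and not_gate are made up for this example.

```python
def gate_outputs(a: int, b: int) -> tuple:
    """Return (AND, OR, XOR) for two single-bit inputs."""
    return (a & b, a | b, a ^ b)


def not_gate(a: int) -> int:
    """Invert a single bit: 0 becomes 1 and 1 becomes 0."""
    return a ^ 1


# Print one row per input combination: A, B, AND, OR, XOR.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, *gate_outputs(a, b))
```

Each printed row has the same five values as the corresponding row of the table.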

Stacks, Heaps and Registers

Data is stored in one of two places: on non-volatile storage such as the hard drive, or on volatile storage, the random access memory (RAM). A place to process your work is also needed, and stacks and heaps are used for this purpose. Heaps hold data that lasts for the long-term running of a program, while stacks hold data for temporary processes such as function calls. Registers are small storage locations inside the CPU that hold temporary values such as counters. They are similar to the sticky notes you find on computer monitors: they help you track the current status of things.

In reality, the small stuff does matter. If the registers become corrupted, bad decisions are made. Should the stacks and heaps become trashed, the most likely outcome is some sort of crash that causes a manual or automatic reboot of the system. The key to a buffer overflow attack is to get to those "sticky notes" and misinform the processor.

Working with Overflow Exploits

Understanding the Risks

The typical buffer overflow exploit script is written in C and contains the exploit and the payload. The exploit reaches the vulnerability; it is basically the input the target application is willing to receive. The payload contains raw machine language that can execute once the exploit code has inserted it directly into memory.

Both researchers and penetration testers need to know the origin of the code they download from the Internet; attackers, however, could not care less. It is easy to embed backdoor code into any exploit. It will do what it is supposed to do but could do a lot more. Unless you wrote the code personally, you should not fully trust it.

A common practice is to intentionally introduce bugs into published exploit scripts so that attackers must fix them before use. Another common practice is to include a harmless payload and make no attempt at stealth in the execution of the attack. This is better known as the Proof of Concept approach.

The question remains: what privileges does the shellcode have when the exploit runs? The answer depends on who is running the application or service that was attacked. If the exploit is successful, the shellcode assumes the security context of the user running the target. For example, if that user has full administrator rights, the shellcode will have them as well.

Monitoring and Detecting

Here are some things that you need to be aware of when considering buffer overflow vulnerabilities.

  • Requirements of Design
  • Dangerous Functions
  • Bounds Checking
  • Canary Bytes
  • IDS Signatures

Requirements of Design

Any application development project should be managed in accordance with relevant standards for the industry in which the application will be used. At the very least, every application should meet these criteria:

  • No bypassing of authentication is possible.
  • No user input can be used to exploit vulnerabilities.
  • No Shrink-Wrap Code vulnerabilities.

Shrink-wrap code refers to shared program libraries and possibly to entire applications that are utilized in the creation of new products.

Dangerous Functions

Following the best possible secure coding practices is the most important technique in defending against buffer overflow vulnerabilities. In the C language, strcpy() (String Copy) is a dangerous function because it performs no bounds checking, and strncpy() is risky as well because it can leave the destination without a terminating null byte. Some compilers will flag the use of these and similar functions.

Bounds Checking

Making sure that a user's first name cannot be entered as 300 bytes is known as either Bounds Checking or Input Sanitizing. Code checks that certain criteria are enforced whenever input is requested. Input that meets these constraints, such as not being larger than the space that the memory allocation function (malloc()) set up for the variable in the first place, is known as Clean Input. Special characters must be properly rejected or escaped, meaning they are not processed by the interpreter but rather ignored or treated literally as just characters.
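The first-name example can be sketched as a validation function. This is a minimal illustration, not a complete input-sanitizing policy: the 32-byte limit, the set of allowed characters, and the function name read_first_name are all assumptions chosen for the example.

```python
MAX_NAME_BYTES = 32  # hypothetical limit, chosen only for this example


def read_first_name(raw: bytes, limit: int = MAX_NAME_BYTES) -> str:
    """Check length and characters before the input is stored anywhere."""
    # Bounds check: reject anything larger than the space set aside for it.
    if len(raw) > limit:
        raise ValueError(f"input is {len(raw)} bytes, limit is {limit}")

    # Malformed byte sequences raise UnicodeDecodeError, a kind of ValueError.
    text = raw.decode("utf-8")

    # Sanitizing: allow only letters plus a few characters common in names.
    if not all(ch.isalpha() or ch in " -'" for ch in text):
        raise ValueError("input contains unexpected characters")

    return text


print(read_first_name(b"Alice"))  # Alice
```

A 300-byte input, or a name containing characters such as ";" or "<", is rejected with ValueError before it ever reaches the code that stores it.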

Canary Bytes

Canary Bytes are a known value, for example the last four bytes of the variable space in the stack frame, placed there either by the programmer or automatically by the compiler. Their purpose is to reveal whether an attempt has been made to overwrite them, which would indicate a buffer overflow. This technique can also be used as a troubleshooting measure when debugging code.
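The canary idea can be simulated in a few lines. This is a sketch under stated assumptions, not how a compiler implements stack protection: memory is modelled as a Python bytearray, the 8-byte buffer and 4-byte random canary are illustrative sizes, and the function names are hypothetical.

```python
import os

BUFFER_SIZE = 8  # illustrative size of the variable space
CANARY_SIZE = 4  # illustrative size of the canary


def new_frame():
    """Return (memory, canary): a zeroed buffer followed by a random canary."""
    canary = os.urandom(CANARY_SIZE)
    memory = bytearray(BUFFER_SIZE) + bytearray(canary)
    return memory, canary


def write_input(memory: bytearray, data: bytes) -> None:
    """Copy data to the start of memory without any bounds check."""
    memory[0:len(data)] = data


def canary_intact(memory: bytearray, canary: bytes) -> bool:
    """Compare the bytes after the buffer with the value stored at setup."""
    return bytes(memory[BUFFER_SIZE:BUFFER_SIZE + CANARY_SIZE]) == canary


memory, canary = new_frame()
write_input(memory, b"A" * 6)          # fits inside the buffer
print(canary_intact(memory, canary))   # True
```

If the input ran past the 8-byte buffer, the comparison in canary_intact would almost certainly fail, and the program could stop before using any data that sits beyond the canary.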

IDS Signatures

Intrusion Detection Systems (IDS) can look for NOP sleds. A NOP sled (also known as a NOP slide or NOP ramp) is a sequence of NOP (no-operation) instructions meant to slide the CPU's instruction execution flow to its final, desired destination whenever the program branches to a memory address anywhere on the slide.
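A very simplified version of such a check is sketched below. It assumes the x86 architecture, where the single-byte NOP opcode is 0x90; the 16-byte threshold and the function names are hypothetical, and a real IDS signature would be considerably more elaborate.

```python
NOP = 0x90            # single-byte NOP opcode on x86
MIN_SLED_LENGTH = 16  # hypothetical threshold, chosen only for this example


def longest_nop_run(data: bytes) -> int:
    """Return the length of the longest run of consecutive NOP bytes."""
    longest = current = 0
    for byte in data:
        if byte == NOP:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def looks_like_nop_sled(data: bytes, threshold: int = MIN_SLED_LENGTH) -> bool:
    """Flag data that contains at least `threshold` NOP bytes in a row."""
    return longest_nop_run(data) >= threshold


print(looks_like_nop_sled(b"GET /index.html HTTP/1.1"))  # False
print(looks_like_nop_sled(b"\x41" * 4 + b"\x90" * 32))   # True
```

Counting only 0x90 bytes is the simplest possible signature; it would miss sleds built from other instructions that have no useful effect.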

Tools For Performing Exploits

Compiling Scripts

Exploit scripts distributed as source code must be compiled before they can run. The command "gcc" (originally the GNU C Compiler, now the GNU Compiler Collection) is the classic tool for compiling. On Linux, "gdb" (the GNU Debugger) is the command line tool for debugging and disassembly. On Windows, two powerful tools are "IDA Pro" and "OllyDbg".

Metasploit Framework

Metasploit is a tool used for vulnerability testing. It is time consuming to compile and run scripts with different combinations of exploit, payload, NOP sled, encoding and basic parameters such as a destination target IP address and port. Metasploit makes each of these components modular, which allows the tester to quickly select a combination and run it. It provides both a Web-based interface and a command line interface.

The basic idea of how Metasploit works is:

  • Select an exploit
  • Select a target OS or service version
  • Select a payload
  • Configure options
  • Attack