Introduction to Exploit Development

Shameless plug

This course is given to you for free by The Perkins Cybersecurity Educational Fund: https://perkinsfund.org/

Please consider donating to The Perkins Cybersecurity Educational Fund

You can also support The Perkins Cybersecurity Educational Fund by buying them a coffee

Sponsor

Special thanks to the sponsor of this course: Backyard Bandwidth!

Enterprise-grade performance at a price that makes sense. No hidden fees, no catch—just reliable, affordable services with a commitment to privacy. Janky But Reliable.

NOTE:

This course assumes that you understand the basics of ASM, shellcode, Python, and C. You can find the relevant courses here:

What will be covered?

What is exploit dev?

Exploit development is the process of identifying and understanding software vulnerabilities, then writing code that takes advantage of them. This allows the developer to trigger unintended behavior on the target. In edgier terms: exploit development is turning bugs into weapons.

Below is a simple process breakdown on exploit development:

Finding possible vulnerabilities
- Identify a possible issue or weakness in the software.
Understanding how the bug can be exploited
- Once you have found the bug, determine how much control it gives you over the program's execution.
Writing and testing the exploit
- Your goal is to develop reliable, repeatable code with a high success rate.
Bypassing security protections
- Modern systems have security features that protect against exploits. Most of these will not be covered in depth in this course.

Tools of the trade

Debuggers

| Tool | Description |
|------|-------------|
| GDB | The GNU Debugger, used for debugging binaries on Linux. Essential for analyzing crashes and reverse engineering memory corruption bugs. |
| GDB-PEDA | A GDB extension with exploit development helpers built in, making it easier to analyze memory and registers. |
| GEF | Another powerful GDB plugin, lightweight and packed with heap analysis and ROP tools. |
| WinDbg | The Windows Debugger, used for analyzing Windows binaries and kernel debugging. |
| LLDB | The LLVM project's debugger and the default debugger on macOS; an alternative to GDB. |
| x64dbg | GUI debugger useful for debugging and dumping Windows binaries. |

Binary Analysis Tools

| Tool | Description |
|------|-------------|
| IDA Pro | The industry standard for reverse engineering. It provides a graphical disassembler and decompiler. |
| Ghidra | An open-source alternative to IDA Pro, developed by the NSA. Useful for static analysis and reverse engineering. |
| radare2 | A lightweight disassembler and debugger, packed with scripting capabilities. |
| Binwalk | Extracts and analyzes firmware and binary blobs to find embedded files and code. |
| Checksec | Displays the security protections enabled in a binary (e.g., NX, PIE, RELRO, stack canaries). |

Exploitation Frameworks

| Tool | Description |
|------|-------------|
| Pwntools | A powerful Python library for exploit development, useful for building payloads and automating exploits. |
| ROPgadget | Finds useful ROP gadgets in a binary for Return-Oriented Programming (ROP) exploits. |
| OneGadget | Finds "one gadget" addresses in libc that spawn a shell through execve("/bin/sh") with a single jump. |
| angr | A binary analysis framework that can symbolically execute programs to find exploitable conditions. |
| Metasploit | A penetration testing framework that includes prebuilt exploits for various vulnerabilities. |

Shellcode Generation

| Tool | Description |
|------|-------------|
| MSFVenom | Generates custom shellcode payloads for various architectures. |
| nasm | Assembler used for writing custom x86 and x86_64 shellcode. |
| objdump | Disassembles binaries and extracts shellcode from compiled programs. |
| strace | Traces system calls made by a binary to identify possible vulnerabilities. |

Fuzzing

| Tool | Description |
|------|-------------|
| AFL++ | A fast coverage-guided fuzzer that automatically finds crashing inputs. |
| Honggfuzz | A modern fuzzer with built-in sanitizer integration. |
| Radamsa | A mutation-based fuzzer that generates test cases by mutating sample inputs. |
| zzuf | Corrupts input to test how a program reacts to unexpected data. |

Networking & Exploit Testing

| Tool | Description |
|------|-------------|
| Wireshark | Captures and analyzes network traffic to inspect exploit behavior. |
| Tcpdump | A command-line alternative to Wireshark for packet analysis. |
| Netcat | A simple tool for sending and receiving data over TCP/UDP. |
| Burp Suite | A web security tool for testing web-based exploits and injections. |

Virtual Machines & Sandboxes

| Tool | Description |
|------|-------------|
| QEMU | Lightweight emulator for testing kernel and system-level exploits. |
| VMware | Used for running Windows and Linux virtual machines for testing exploits. |
| VirtualBox | A free alternative to VMware for sandboxed environments. |
| Docker | Runs isolated containers, useful for quick testing environments. |

Introduction to fuzzing

Fuzzing is an automated technique for finding vulnerabilities in software. A fuzzer feeds random, unexpected, or malformed inputs into a target program and observes how it reacts, watching for crashes, hangs, or other abnormal behavior. It helps identify issues such as buffer overflows, memory corruption, and format string vulnerabilities (all of which are covered later in this course).

Types of fuzzing

Dumb fuzzing
- Completely random inputs.
- Does not consider the application structure.
- Useful for simple programs.
- Has a low hit rate for complex applications.
Smart (structure aware) fuzzing
- Uses knowledge of input structures.
- Generates mutated test cases that follow expected structure.
- Useful for binaries that are expecting certain input types.
Coverage guided fuzzing
- Uses code instrumentation to track execution paths.
- Prioritizes inputs to increase code coverage.
- Useful for finding bugs deep in applications.
Grammar based fuzzing
- Generates syntactically correct inputs.
- Often used for web applications, interpreters, and scripting engines.
Mutation based fuzzing
- Starts with valid inputs and mutates them slightly.
- Useful for programs that require structured input.
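
To make the mutation-based approach above more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: `parse_record` is a hypothetical toy target (a one-byte length prefix followed by a payload), and an `IndexError` stands in for the out-of-bounds read that a real C parser might perform. Real fuzzers such as AFL++ are far more capable; this only shows the "start from a valid input and mutate it slightly" loop.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy target (hypothetical): 1-byte length prefix followed by a payload."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        # A real C parser might read out of bounds here; we raise instead.
        raise IndexError("length prefix exceeds available data")
    return payload


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Flip one bit, insert one byte, or delete one byte of a valid seed."""
    data = bytearray(seed)
    choice = rng.randrange(3)
    pos = rng.randrange(len(data))
    if choice == 0:
        data[pos] ^= 1 << rng.randrange(8)      # bit flip
    elif choice == 1:
        data.insert(pos, rng.randrange(256))    # byte insertion
    elif len(data) > 1:
        del data[pos]                           # byte deletion
    return bytes(data)


def fuzz(seed: bytes, iterations: int = 1000, rng_seed: int = 0):
    """Return the mutated inputs that made the target raise IndexError."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            parse_record(candidate)
        except IndexError:
            crashes.append(candidate)           # our stand-in for a crash
        except ValueError:
            pass                                # rejected cleanly, not interesting
    return crashes


if __name__ == "__main__":
    found = fuzz(b"\x05hello")
    print(f"{len(found)} crashing inputs, for example {found[:3]}")
```

Seeding the random generator keeps runs reproducible, which matters when you later need to replay the exact input that caused a crash.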

Creating patterns for fuzzing

When fuzzing programs, it is essential to use a unique, easily identifiable pattern, especially when hunting for buffer overflows or memory corruption. Cyclic patterns can be generated by many tools; in this course we will use pwntools:

Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pwn import *
[*] Checking for new versions of pwntools
    To disable this functionality, set the contents of /home/salty/.cache/.pwntools-cache-3.8/update to 'never' (old way).
    Or add the following lines to ~/.pwn.conf or ~/.config/pwn.conf (or /etc/pwn.conf system-wide):
        [update]
        interval=never
[*] You have the latest version of Pwntools (4.14.0)
>>> pattern = cyclic(100)
>>> print(pattern)
b'aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa'
>>>

We can use this pattern as the input and check which bytes end up in EIP (or RIP on 64-bit targets) when the program crashes. Once we have that value, we can find its exact offset in the pattern:

>>> # example value read from EIP at crash time
>>> crash = 0x61616172
>>> print(cyclic_find(crash))
68
>>>

This allows us to determine exactly which part of our cyclic pattern produced the crash!
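
For readers curious how such a pattern works, the sketch below rebuilds the idea in plain Python without pwntools. It is not the pwntools implementation; `cyclic_sketch` and `cyclic_find_sketch` are hypothetical names, and the code assumes 4-byte subsequences over the lowercase alphabet and a little-endian 32-bit crash value. The pattern is a de Bruijn sequence, so every 4-byte window appears only once, which is why a crash value maps back to a single offset.

```python
import struct
from itertools import islice
from string import ascii_lowercase


def de_bruijn(alphabet: bytes, n: int):
    """Lazily yield a de Bruijn sequence over `alphabet` with window size n."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic_sketch(length: int, n: int = 4) -> bytes:
    """Return the first `length` bytes of the pattern."""
    return bytes(islice(de_bruijn(ascii_lowercase.encode(), n), length))


def cyclic_find_sketch(value: int, search_len: int = 20000) -> int:
    """Return the offset of a little-endian 32-bit value, or -1 if absent."""
    needle = struct.pack("<I", value)
    return cyclic_sketch(search_len).find(needle)


if __name__ == "__main__":
    print(cyclic_sketch(32))                 # b'aaaabaaacaaadaaaeaaafaaagaaahaaa'
    print(cyclic_find_sketch(0x61616172))    # 68, matching the session above
```

The value 0x61616172 is the bytes "raaa" in little-endian order, and "raaa" starts 68 bytes into the pattern, so 68 bytes of padding precede whatever landed in the register.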

Fuzzing is an essential technique for exploit development. It automates the task of bug discovery, finds memory corruption issues, and provides valuable insights into software security.

Fuzzing mitigations

Use safe memory functions.
Enable compiler protections.
Implement input validation.
Monitor logs.
Fuzz your own software before release.

Buffer overflows

A buffer overflow occurs when software writes more data into a buffer than it can hold, which leads to memory corruption. This allows an attacker to overwrite adjacent memory and can potentially lead to code execution. The following code can be downloaded as a compiled binary directly from here, or you can see below for the exact code:

#include <stdio.h>
#include <string.h>

void vulnerable_function(char *input) {
    char buffer[16];
    strcpy(buffer, input);
    printf("You entered: %s\n", buffer);
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <input>\n", argv[0]);
        return 1;
    }
    vulnerable_function(argv[1]);
    return 0;
}

What is the issue here?

strcpy blindly copies input into the buffer variable without checking its length.
If the input is longer than the buffer (16 bytes), the copy overwrites adjacent memory.
This can allow us to overwrite the function's return address, giving us control of execution.
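
The three points above can be modeled without touching real memory. The following sketch is a simulation only, under stated assumptions: a simplified 32-bit frame laid out as buffer, saved EBP, then return address, with made-up values for `SAVED_EBP` and `RET_ADDR`. Real compilers may add padding or reorder locals, so the true offset has to be measured (for example with a cyclic pattern).

```python
import struct

BUF_SIZE = 16            # matches char buffer[16] in the example
SAVED_EBP = 0xBFFFF000   # hypothetical saved frame pointer
RET_ADDR = 0x08048456    # hypothetical return address into main


def make_frame() -> bytearray:
    """Model a simplified frame: [buffer][saved EBP][return address]."""
    return bytearray(BUF_SIZE) + struct.pack("<II", SAVED_EBP, RET_ADDR)


def unsafe_copy(frame: bytearray, data: bytes) -> None:
    """Mimic strcpy: copy every byte plus a NUL, ignoring the buffer size."""
    data = data + b"\x00"
    frame[0:len(data)] = data


def bounded_copy(frame: bytearray, data: bytes) -> None:
    """Mimic a bounded copy: truncate to the buffer and always NUL-terminate."""
    data = data[:BUF_SIZE - 1] + b"\x00"
    frame[0:len(data)] = data


def return_address(frame: bytearray) -> int:
    """Read the 32-bit value stored where the return address lives."""
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]


if __name__ == "__main__":
    frame = make_frame()
    # 16 bytes fill the buffer, 4 cover saved EBP, the next 4 replace the return address.
    unsafe_copy(frame, b"A" * 20 + struct.pack("<I", 0x42424242))
    print(hex(return_address(frame)))   # 0x42424242

    frame = make_frame()
    bounded_copy(frame, b"A" * 100)
    print(hex(return_address(frame)))   # 0x8048456, unchanged
```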

To trigger the overflow, all we need to do is send more data than the buffer expects. To see the effect clearly, we compile the program with stack protections disabled:

$ gcc -fno-stack-protector -z execstack -Wall .github/exploit_dev_files/example_c.c -o .github/exploit_dev_files/example_c

And you can trigger the buffer overflow like this:

$ ./example_c
Usage: ./example_c <input>
$ ./example_c AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault

As you can see above, we get a segmentation fault. This means that the program attempted to access memory that it is not allowed to access. The easiest way to prevent buffer overflows is to compile the code with protections enabled and to use bounded functions such as strncpy (remember to NUL-terminate the result yourself) or snprintf instead of strcpy. We have taken the liberty of putting together a list of compiler safety flags in gcc:

| Flag | Purpose |
|------|---------|
| -fstack-protector | Adds stack canaries to functions with vulnerable buffers, such as character arrays. |
| -fstack-protector-all | Adds stack canaries to all functions. |
| -fstack-clash-protection | Probes the stack during large allocations so they cannot jump over the guard page into adjacent memory. |
| -D_FORTIFY_SOURCE=2 | Enhances functions like strcpy(), memcpy(), and sprintf() to detect buffer overflows at compile time and runtime (requires optimization, e.g. -O2). |
| -Wstack-protector | Warns about functions that are not protected when -fstack-protector is active. |
| -Wl,-z,relro | Marks relocation sections of the ELF as read-only after relocation, protecting them from being overwritten. |
| -Wl,-z,now | Forces immediate symbol resolution at load time, reducing attack surfaces like lazy binding exploits. |
| -Wl,-z,noexecstack | Marks the stack as non-executable, preventing shellcode execution on the stack. |
| -Wl,-z,nodlopen | Marks the object as not loadable through dlopen(). |
| -fsanitize=address | Enables AddressSanitizer (ASan) to detect buffer overflows, use-after-free, and memory leaks. |
| -fsanitize=undefined | Enables Undefined Behavior Sanitizer (UBSan) to detect undefined behavior such as signed integer overflow and invalid shifts. |
| -fsanitize=leak | Enables LeakSanitizer to detect memory leaks. |
| -fsanitize=bounds | Detects out-of-bounds array accesses. |
| -fsanitize=thread | Enables ThreadSanitizer to detect data races in multi-threaded programs. |
| -g | Adds debug symbols for better debugging in tools like gdb. |
| -ggdb | Provides more detailed debugging info for gdb. |

NOTE: There are probably more that will protect you, these are just what we could think of at the time

ROP (Return Oriented Programming)

Return oriented programming (ROP) is an advanced exploitation technique used to execute attacker-chosen logic in a program that has security measures enabled. It bypasses traditional buffer overflow protections by reusing code that already exists in the process, in small pieces called "gadgets". ROP is needed because modern operating systems enforce non-executable (NX) stacks, meaning that even if you overflow the buffer and write shellcode into memory, it cannot be executed. With ROP you sidestep this protection by using gadgets that already live in the executable parts of the program instead of trying to inject shellcode into an NX stack.

Core ROP idea

Overwrite return addresses on the stack with the addresses of useful instruction sequences.
Each gadget must end with a ret instruction so that multiple gadgets can be chained.
Control flow is hijacked to perform arbitrary operations.

ROP gadgets

A ROP gadget is a small sequence of assembly instructions that ends with a ret instruction. By chaining these together to form a "ROP chain" you can perform complex tasks. As an example in a binary file, you might have:

pop eax
ret

This gadget pops a value from the stack into eax and then returns to the next address on the stack. Another example may be:

mov dword ptr [ecx], eax
ret

This gadget writes the value in eax to the memory address stored in ecx. By choosing the correct gadgets you can manipulate an entire program's execution flow using preexisting code.
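
As a rough illustration of how tools like ROPgadget locate such sequences, the sketch below scans a byte string for the two-byte pattern `pop reg; ret`. It is a simplification under stated assumptions: it only knows the single-byte 32-bit x86 `pop` opcodes (0x58 to 0x5F, skipping `pop esp`) and `ret` (0xC3), and the sample bytes and base address are invented for the example. Real tools disassemble properly and handle far longer gadgets.

```python
# Single-byte x86 opcodes for `pop <reg>` (pop esp, 0x5C, deliberately left out).
POP_REGS = {0x58: "eax", 0x59: "ecx", 0x5A: "edx", 0x5B: "ebx",
            0x5D: "ebp", 0x5E: "esi", 0x5F: "edi"}
RET = 0xC3


def find_pop_ret(code: bytes, base: int = 0):
    """Return (address, text) for every `pop reg; ret` byte pair in `code`."""
    hits = []
    for i in range(len(code) - 1):
        if code[i] in POP_REGS and code[i + 1] == RET:
            hits.append((base + i, f"pop {POP_REGS[code[i]]} ; ret"))
    return hits


if __name__ == "__main__":
    # Hypothetical code bytes: nop; pop eax; ret; mov eax, ebx; pop ebx; ret
    sample = bytes([0x90, 0x58, 0xC3, 0x89, 0xD8, 0x5B, 0xC3])
    for addr, text in find_pop_ret(sample, base=0x08048000):
        print(f"{addr:#010x}: {text}")
```

Because x86 instructions have variable length, a scanner like this can also find gadgets that start in the middle of an intended instruction, which is one reason real binaries contain more gadgets than their source code suggests.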

Building a ROP chain

To build a ROP chain we will create a vulnerable program that you can download as a compiled binary here.

#include <stdio.h>
#include <string.h>

void win() {
    printf("You've won! Code execution achieved.\n");
}

void vulnerable_function() {
    char buffer[64];
    gets(buffer);
}

int main() {
    vulnerable_function();
    return 0;
}

This function's gets(buffer) call allows us to overflow the stack and overwrite return addresses. Now we can build the theoretical ROP chain!

Assume the win() function is located at 0x080484b6. Instead of injecting shellcode, the simplest attack overwrites the return address with the address of win(). For a more realistic goal, such as executing /bin/sh using gadgets in libc, we need to do the following:

Find gadgets to control the registers used by the execve system call
Find the address of a "/bin/sh" string
Find an int 0x80 gadget to trigger the system call (or, alternatively, the address of libc's system() function)

NOTE: It is worth noting that in more complex situations you may not have direct access to useful functions like this.

We can do this using theoretical gadgets like this:

pop eax
ret         ;; load syscall

pop ebx
ret         ;; load address of /bin/sh

pop ecx
ret         ;; null pointer for argv

pop edx
ret         ;; null pointer for envp

int 0x80    ;; trigger execve("/bin/sh", NULL, NULL)

The full payload for the exploit might look something like this:

padding + pop_eax + 0xb + pop_ebx + address of "/bin/sh" + pop_ecx + NULL + pop_edx + NULL + int_0x80

While a visual representation of this looks like the following:

# Stack before the exploit is crafted
-----------------------------------
|  Return Address (main)          |
|---------------------------------|
|  Saved EBP                      |
|---------------------------------|
|  Buffer (64 bytes)              |
|---------------------------------|
|  Overflow Starts Here           |
-----------------------------------

# Stack after the exploit is crafted
-----------------------------------
|  Address of ROP Gadget 1        | 
|---------------------------------|
|  Argument for Gadget 1          |
|---------------------------------|
|  Address of ROP Gadget 2        |
|---------------------------------|
|  Argument for Gadget 2          |
|---------------------------------|
|  Address of "system"            | 
|---------------------------------|
|  Address of "/bin/sh"           | 
-----------------------------------

The stack "unwinds" through ret instructions executing each gadget in sequence.
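
The unwinding described above can be followed step by step in a toy emulator. This is a sketch under loud assumptions: the gadget addresses in `GADGETS`, the `BINSH` address, and the chain layout are all hypothetical, and nothing here touches a real process. It only models how each ret pulls the next gadget address off the stack and how each pop gadget consumes the word that follows it. The value 11 is the execve system call number on 32-bit x86 Linux.

```python
# Hypothetical gadget addresses; a real chain uses addresses found in the target.
GADGETS = {
    0x1000: ("pop", "eax"),
    0x1002: ("pop", "ebx"),
    0x1004: ("pop", "ecx"),
    0x1006: ("pop", "edx"),
    0x1008: ("int80", None),
}

BINSH = 0x2000  # hypothetical address of the string "/bin/sh"

# Stack contents starting at the overwritten return address.
CHAIN = [0x1000, 11, 0x1002, BINSH, 0x1004, 0, 0x1006, 0, 0x1008]


def run_chain(stack):
    """Emulate `ret` taking gadget addresses off the stack, one by one."""
    regs = {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0}
    stack = list(stack)              # index 0 is the overwritten return address
    trace = []
    while stack:
        addr = stack.pop(0)          # `ret` loads the next address into EIP
        op, reg = GADGETS[addr]
        if op == "pop":
            regs[reg] = stack.pop(0)  # the gadget consumes its argument
            trace.append(f"pop {reg}")
        else:
            trace.append("int 0x80")
            break                    # system call reached; stop emulating
    return regs, trace


if __name__ == "__main__":
    regs, trace = run_chain(CHAIN)
    print(" -> ".join(trace))
    print({name: hex(value) for name, value in regs.items()})
```

At the point where the emulator stops, eax holds the system call number, ebx points at the command string, and ecx and edx are NULL, which is the register state execve("/bin/sh", NULL, NULL) expects.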

Defending against ROP

Modern systems have defenses to prevent ROP such as:

ASLR (address space layout randomization): randomizes memory addresses, making gadget addresses hard to predict
Stack canaries: detect stack overflows before the overwritten return address is used
CFI (control flow integrity): ensures that execution follows only valid control flow paths
CET/Shadow Stacks: block ROP by keeping a protected copy of return addresses and checking it on every return

TL;DR

ROP chaining allows an attacker to bypass DEP/NX by chaining preexisting instruction sequences to manipulate stack behavior. There are plenty of ways to make ROP much harder, starting with being a decent developer: avoid unsafe functions and keep compiler and OS protections enabled.

Heap exploitation

Heap exploitation is a class of attacks that targets vulnerabilities in dynamic memory management. It exploits weaknesses in how a program uses the memory allocator (malloc and free in C, new and delete in C++) or in the allocator itself. The heap is different from the stack, which is used for function call management. Common heap exploitation techniques include:

Heap overflow
Use-After-Free
Double free

NOTE: This is not an extensive list, but for this course we will not get into all of them.

Heap Spraying

Before we can get into these common techniques, we should touch on heap spraying. Heap spraying is a technique where you allocate many predictable objects in memory, which increases the likelihood of your data ending up at a desired location. It has been used a lot in browser exploits, where script code fills the engine's heap like so:

var shellcode = unescape("%u4141%u4141%u4141%u4141");
var spray = [];
for (var i = 0; i < 10000; i++) {
    spray[i] = shellcode;
}

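The above code illustrates the idea of filling the heap with a large number of identical spray buffers. Note that a real spray has to force a fresh allocation for every entry, because assigning the same string repeatedly only copies a reference. The result is that the sprayed data lands at predictable places in memory, so it can be reached more easily at a later time.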

Heap overflows

A heap overflow occurs when a program writes more data into a heap-allocated buffer than its allocated size. This leads to memory corruption. You can download this example as a compiled binary here.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char *buffer = (char *)malloc(16);
    strcpy(buffer, "AAAAAAAAAAAAAAAAAAAAAAAAAAAA");  // writes past the 16-byte allocation
    printf("Buffer: %s\n", buffer);
    free(buffer);
}

Here, the buffer variable is allocated 16 bytes. As you can see, we attempt to copy more data into it than was allocated. This can overwrite adjacent heap metadata and potentially allow an attacker to control execution flow. Assuming the overwritten memory includes function pointers or metadata used by the allocator, we can manipulate them to possibly gain code execution.

Use-After-Free

A UAF (use-after-free) occurs when a program continues to use memory after it has been freed. This allows us to exploit things like dangling pointers. You can find a downloadable compiled binary here.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char *buffer = malloc(16);
    free(buffer);                  // buffer is now a dangling pointer
    strcpy(buffer, "Exploited!");  // use after free: writes into freed memory
    printf("%s\n", buffer);
}

This program uses memory after it has been freed. This can lead to exploitation by allowing attackers to control the contents of an allocation they normally wouldn't have control over.
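
Why a dangling pointer is dangerous becomes clearer with a toy model of an allocator. The sketch below is a simulation, not how glibc works internally: `ToyHeap` is a hypothetical class with fixed-size chunks and a last-in, first-out free list, and chunk indexes play the role of pointers. It shows that the next allocation after a free can receive the same chunk, so the stale handle now sees someone else's data.

```python
class ToyHeap:
    """Toy allocator (hypothetical): fixed-size chunks and a LIFO free list."""

    def __init__(self, chunks: int = 4, size: int = 16):
        self.memory = [bytearray(size) for _ in range(chunks)]
        self.free_list = list(range(chunks - 1, -1, -1))

    def malloc(self) -> int:
        return self.free_list.pop()      # most recently freed chunk comes back first

    def free(self, chunk: int) -> None:
        self.free_list.append(chunk)     # the caller's copy of `chunk` is not cleared


def demo():
    heap = ToyHeap()
    victim = heap.malloc()
    heap.memory[victim][:5] = b"guest"
    heap.free(victim)                    # `victim` is now a dangling handle
    attacker = heap.malloc()             # the same chunk is handed out again
    heap.memory[attacker][:5] = b"admin"
    # Reading through the stale handle returns the new owner's data.
    return victim, attacker, bytes(heap.memory[victim][:5])


if __name__ == "__main__":
    victim, attacker, stale_view = demo()
    print(victim == attacker, stale_view)   # True b'admin'
```

Setting a pointer to NULL immediately after freeing it is the usual defensive habit, since any later use then fails loudly instead of silently touching reused memory.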

Double free

Double frees happen when the same memory is freed twice. This can lead to heap corruption. You can download a copy of the double free example as a compiled binary here.

#include <stdio.h>
#include <stdlib.h>

int main() {
    char *buffer = malloc(16);
    free(buffer);
    free(buffer);  // second free of the same pointer: double free
}

As you can see in the above code, free is called twice on the same buffer, one call right after the other. This can corrupt the allocator's "freelist", because the same chunk is now recorded as free twice. We can then manipulate the heap allocator into handing out that chunk more than once, leading to an arbitrary write or the ability to hijack function pointers.
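
The freelist corruption described above can be sketched with a plain Python list. This is a model under stated assumptions rather than any real allocator: `FreeList` is a hypothetical class, chunk ids stand in for addresses, and the optional check is loosely inspired by the double-free detection that hardened allocators perform.

```python
class FreeList:
    """Toy LIFO free list (hypothetical); chunk ids stand in for addresses."""

    def __init__(self, detect_double_free: bool = False):
        self.chunks = []
        self.detect = detect_double_free

    def free(self, chunk_id: int) -> None:
        if self.detect and chunk_id in self.chunks:
            raise RuntimeError("double free detected")
        self.chunks.append(chunk_id)

    def malloc(self) -> int:
        return self.chunks.pop()


def double_free_demo():
    """Free the same chunk twice, then allocate twice."""
    fl = FreeList()
    fl.free(7)
    fl.free(7)            # the same chunk is now listed as free twice
    first = fl.malloc()
    second = fl.malloc()  # two live allocations share one chunk
    return first, second


if __name__ == "__main__":
    print(double_free_demo())   # (7, 7)
    try:
        hardened = FreeList(detect_double_free=True)
        hardened.free(7)
        hardened.free(7)
    except RuntimeError as err:
        print(err)              # double free detected
```

Once two owners hold the same chunk, whatever one of them writes is interpreted by the other as its own data, which is how a double free turns into a controlled write.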

Defense Against Heap Exploitation

Canaries or Heap Cookies: random values placed before chunks to detect corruption
ASLR (address space layout randomization): randomizes heap locations to prevent predictable exploits
DEP (data execution prevention): marks heap memory as non-executable
Hardened allocators: modern allocators add integrity checks, such as glibc's tcache double-free detection and safe-linking of free list pointers

Modern Exploit Techniques

Exploitation is an ever-evolving landscape. It is constantly growing as attackers discover new and more creative exploit techniques. While buffer overflows and ROP chains remain fundamental, modern exploits are more sophisticated and require more depth to gain control. This section will cover advanced exploitation practices and topics that are beyond the traditional memory corruptions.

Format String Vulnerabilities

Format string exploits are a powerful yet underrated technique that can provide information leaks, arbitrary memory writes, and even code execution. C functions like printf, fprintf, and snprintf take a format string containing specifiers such as %s, %d, and %p that describe how the remaining arguments should be displayed. If a developer passes user-controlled input as the format string itself instead of as an argument, the attacker controls which specifiers are processed. This code can be downloaded as a compiled binary here.

#include <stdio.h>
void vulnerable_function(char *user_input) {
    printf(user_input);
}
int main(int argc, char *argv[]) {
    vulnerable_function(argv[1]);
    return 0;
}

The above code can cause information disclosure through specifiers like %p and %s, and even memory writes through %n. Leaking a value is fairly simple and can be done like so:

$ .github/exploit_dev_files/string_vuln %p
0x7fffffffdc38  # we now have the stack value

By combining the leaks and arbitrary writes, it is entirely possible to get a full takeover of this program.

Mitigating this attack

Always use printf("%s", variable); to explicitly specify a format.
Enable FORTIFY_SOURCE, ASLR, and Stack Canaries.

Arbitrary Read/Write Primitives

Exploiting is not always about direct code execution. Sometimes you need some help along the way, or intermediate footholds that act as your stepping stones. An arbitrary read lets an attacker read a chosen memory location; this is often used to leak things such as ASLR-protected addresses. An arbitrary write allows modifying memory, which can lead to overwriting function pointers, hijacking control flow, or privilege escalation. Take the following pseudocode:

struct user {
    char name[16];
    int admin;
};

void edit_user(struct user *u) {
    printf("Enter new name: ");
    gets(u->name);
}

Because gets() performs no length check, we can overflow the name field and write the number 1 into the adjacent admin field, possibly granting us unauthorized access. The above code is a simple illustration of a write primitive. The following pseudocode is a good demonstration of an arbitrary read:

struct user {
    char name[16];
    int admin;
};

void leak_address(struct user *u) {
    int index;
    printf("Enter an index to read: ");
    scanf("%d", &index);

    char *ptr = (char *)u + index;
    printf("Leaked value: 0x%02x\n", (unsigned char)*ptr);
}

Because index is never bounds-checked, the pointer can be moved outside the name field, and the program leaks whatever byte is stored there.
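
Both primitives can be tried safely with Python's ctypes module, which lays the struct out the same way C does. The sketch below is a simulation under stated assumptions: `unsafe_set_name`, `safe_set_name`, and `read_byte` are hypothetical helpers, the copy is deliberately capped at the size of the struct so nothing outside it is touched, and reads are served from a copy of the struct's own bytes.

```python
import ctypes
import sys

NAME_LEN = 16


class User(ctypes.Structure):
    """Mirror of `struct user { char name[16]; int admin; }`."""
    _fields_ = [("name", ctypes.c_char * NAME_LEN), ("admin", ctypes.c_int)]


def unsafe_set_name(user: User, data: bytes) -> None:
    """Mimic gets(): copy without checking against the size of name[]."""
    # Simulation guard: never write beyond the struct itself.
    if len(data) > ctypes.sizeof(User):
        raise ValueError("payload larger than the simulated struct")
    ctypes.memmove(ctypes.addressof(user), data, len(data))


def safe_set_name(user: User, data: bytes) -> None:
    """Bounded copy: truncate so the name always fits with a terminator."""
    user.name = data[:NAME_LEN - 1]


def read_byte(user: User, index: int, check_bounds: bool) -> int:
    """Read one byte relative to the start of name[]."""
    if check_bounds and not 0 <= index < NAME_LEN:
        raise IndexError("index outside name[]")
    return bytes(user)[index]    # served from a copy of the struct's bytes


if __name__ == "__main__":
    one = (1).to_bytes(ctypes.sizeof(ctypes.c_int), sys.byteorder)
    user = User()
    unsafe_set_name(user, b"A" * NAME_LEN + one)
    print(user.admin)            # 1: the overflow reached the admin field

    leaked = bytes(read_byte(user, i, check_bounds=False)
                   for i in range(NAME_LEN, ctypes.sizeof(User)))
    print(int.from_bytes(leaked, sys.byteorder))   # 1: admin read via name index
```

The same two fixes apply in the C code: bound every copy by the destination size, and validate every index before using it to form a pointer.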

Kernel exploits

Userland is fun, but what happens when we need to dive a little deeper? Eventually you get to kernel exploits! By breaking into the kernel, you officially own that system. In a nutshell: the kernel controls everything, including memory, processes, and hardware. Exploiting the kernel means executing code in Ring 0 with kernel privileges, which goes beyond even root and gives you full control over everything.

Common Kernel Exploits

NULL pointer dereference: if the kernel dereferences a NULL pointer and the attacker is able to map user memory at 0x0 (modern kernels restrict this through mmap_min_addr), the kernel can be tricked into executing attacker-controlled code.
Race conditions: a bug where multiple threads or processes access shared state in an unsafe way, which can lead to privilege escalation.
Use-After-Free (UAF): reallocating freed kernel objects with malicious data that you control.

NOTE: This is not an extensive list, and we will not be going over all of them in this course.

We can demonstrate a UAF by providing the following simple example:

int fd = open("/dev/path/to/something", O_RDWR);

// free the object
ioctl(fd, FREE_OBJECT, NULL);

// using after it has been freed - possible to hijack
ioctl(fd, USE_OBJECT, NULL);

By performing heap spraying techniques, it is possible for attackers to replace freed objects with controlled data and hijack execution. We fill the heap with our controlled data so that it occupies the memory of the freed object. When the kernel then uses the object after it has been freed, it unknowingly accesses our controlled data. If the original object contains something like a function pointer, structure address, or other critical field, that field now holds a value we chose. This can lead to code execution, privilege escalation, or arbitrary memory access.

Mitigations

KASLR (Kernel Address Space Layout Randomization): randomizes the kernel memory locations.
SMEP (Supervisor Mode Execution Prevention): prevents execution of userland code within the kernel.
Memory tagging: detects UAF and buffer overflows dynamically.

That's all folks

You're done! You have completed this introductory course! We have covered a broad spectrum of exploit development, from fundamental memory corruption to advanced ROP techniques. Understanding this topic is essential for offensive and defensive security. As you have seen, this field blends art and science and requires creativity, patience, and a deep understanding of system internals. We hope you received as much out of this course as we gained from writing it!


