Smallest runnable Mach-O
Saved from my answer on CodeGolf StackExchange
Smallest runnable Mach-O has to be at least 0x1000
bytes. Because of XNU limitation, file has to be at least of PAGE_SIZE
.
See xnu-4570.1.46/bsd/kern/mach_loader.c
, around line 1600.
However, if we don’t count that padding, and only count meaningful payload, then minimal file size runnable on macOS is 0xA4
bytes.
It has to start with mach_header (or fat_header
/ mach_header_64
, but those are bigger).
struct mach_header {
uint32_t magic; /* mach magic number identifier */
cpu_type_t cputype; /* cpu specifier */
cpu_subtype_t cpusubtype; /* machine specifier */
uint32_t filetype; /* type of file */
uint32_t ncmds; /* number of load commands */
uint32_t sizeofcmds; /* the size of all the load commands */
uint32_t flags; /* flags */
};
It’s size is 0x1C
bytes.
magic
has to be MH_MAGIC
.
I’ll be using CPU_TYPE_X86
since it’s an x86_32
executable.
filtetype
has to be MH_EXECUTE
for executable, ncmds
and sizeofcmds
depend on commands, and have to be valid.
flags
aren’t that important and are too small to provide any other value.
Next are load commands.
Header has to be exactly in one mapping, with R-X rights – again, XNU limitations.
We’d also need to place our code in some R-X mapping, so this is fine.
For that we need a segment_command
.
Let’s look at definition.
struct segment_command { /* for 32-bit architectures */
uint32_t cmd; /* LC_SEGMENT */
uint32_t cmdsize; /* includes sizeof section structs */
char segname[16]; /* segment name */
uint32_t vmaddr; /* memory address of this segment */
uint32_t vmsize; /* memory size of this segment */
uint32_t fileoff; /* file offset of this segment */
uint32_t filesize; /* amount to map from the file */
vm_prot_t maxprot; /* maximum VM protection */
vm_prot_t initprot; /* initial VM protection */
uint32_t nsects; /* number of sections in segment */
uint32_t flags; /* flags */
};
cmd
has to be LC_SEGMENT
, and cmdsize
has to be sizeof(struct segment_command) => 0x38
.
segname
contents don’t matter, and we’ll use that later.
vmaddr
has to be valid address (I’ll use 0x1000
), vmsize
has to be valid & multiple of PAGE_SIZE
, fileoff
has to be 0
, filesize
has to be smaller than size of file, but larger than mach_header
at least (sizeof(header) + header.sizeofcmds
is what I’ve used).
maxprot
and initprot
have to be VM_PROT_READ | VM_PROT_EXECUTE
. maxport
usually also has VM_PROT_WRITE
.
nsects
are 0, since we don’t really need any sections and they’ll add up to size.
I’ve set flags
to 0.
Now, we need to execute some code. There are two load commands for that: entry_point_command
and thread_command
.
entry_point_command
doesn’t suit us: see xnu-4570.1.46/bsd/kern/mach_loader.c
, around line 1977:
1977 /* kernel does *not* use entryoff from LC_MAIN. Dyld uses it. */
1978 result->needs_dynlinker = TRUE;
1979 result->using_lcmain = TRUE;
So, using it would require getting DYLD to work, and that means we’ll need __LINKEDIT
, empty symtab_command
and dysymtab_command
, dylinker_command
and dyld_info_command
. Overkill for “smallest” file.
So, we’ll use thread_command
, specifically LC_UNIXTHREAD
since it also sets up stack which we’ll need.
struct thread_command {
uint32_t cmd; /* LC_THREAD or LC_UNIXTHREAD */
uint32_t cmdsize; /* total size of this command */
/* uint32_t flavor flavor of thread state */
/* uint32_t count count of uint32_t's in thread state */
/* struct XXX_thread_state state thread state for this flavor */
/* ... */
};
cmd
is going to be LC_UNIXTHREAD
, cmdsize
would be 0x50
(see below).
flavour
is x86_THREAD_STATE32
, and count is x86_THREAD_STATE32_COUNT
(0x10
).
Now the thread_state
. We need x86_thread_state32_t
aka _STRUCT_X86_THREAD_STATE32
:
#define _STRUCT_X86_THREAD_STATE32 struct __darwin_i386_thread_state
_STRUCT_X86_THREAD_STATE32
{
unsigned int __eax;
unsigned int __ebx;
unsigned int __ecx;
unsigned int __edx;
unsigned int __edi;
unsigned int __esi;
unsigned int __ebp;
unsigned int __esp;
unsigned int __ss;
unsigned int __eflags;
unsigned int __eip;
unsigned int __cs;
unsigned int __ds;
unsigned int __es;
unsigned int __fs;
unsigned int __gs;
};
So, it is indeed 16 uint32_t
’s which would be loaded into corresponding registers before thread is started.
Adding header, segment command and thread command gives us 0xA4
bytes.
Now, time to craft the payload.
Let’s say we want it to print Hi Frand
and exit(0)
.
Syscall convention for macOS x86_32:
- arguments passed on the stack, pushed right-to-left
- stack 16-bytes aligned (note: 8-bytes aligned seems to be fine)
- syscall number in the eax register
- call by interrupt
See more about syscalls on macOS here.
So, knowing that, here’s our payload in assembly:
push ebx #; push chars 5-8
push eax #; push chars 1-4
xor eax, eax #; zero eax
mov edi, esp #; preserve string address on stack
push 0x8 #; 3rd param for write -- length
push edi #; 2nd param for write -- address of bytes
push 0x1 #; 1st param for write -- fd (stdout)
push eax #; align stack
mov al, 0x4 #; write syscall number
#; --- 14 bytes at this point ---
int 0x80 #; syscall
push 0x0 #; 1st param for exit -- exit code
mov al, 0x1 #; exit syscall number
push eax #; align stack
int 0x80 #; syscall
Notice the line before first int 0x80
.
segname
can be anything, remember? So we can put our payload in it. However, it’s only 16 bytes, and we need a bit more.
So, at 14
bytes we’ll place a jmp
.
Another “free” space is thread state registers.
We can set anything in most of them, and we’ll put rest of our payload there.
Also, we place our string in __eax
and __ebx
, since it’s shorter than mov’ing them.
So, we can use __ecx
, __edx
, __edi
to fit the rest of our payload.
Looking at difference between address of thread_cmd.state.__ecx
and end of segment_cmd.segname
we calculate that we need to put jmp 0x3a
(or EB38
) in last two bytes of segname
.
So, our payload assembled is 53 50 31C0 89E7 6A08 57 6A01 50 B004
for first part, EB38
for jmp, and CD80 6A00 B001 50 CD80
for second part.
And last step – setting the __eip
. Our file is loaded at 0x1000
(remember vmaddr
), and payload starts at offset 0x24
.
Here’s xxd
of result file:
00000000: cefa edfe 0700 0000 0300 0000 0200 0000 ................
00000010: 0200 0000 8800 0000 0000 2001 0100 0000 .......... .....
00000020: 3800 0000 5350 31c0 89e7 6a08 576a 0150 8...SP1...j.Wj.P
00000030: b004 eb38 0010 0000 0010 0000 0000 0000 ...8............
00000040: a400 0000 0700 0000 0500 0000 0000 0000 ................
00000050: 0000 0000 0500 0000 5000 0000 0100 0000 ........P.......
00000060: 1000 0000 4869 2046 7261 6e64 cd80 6a00 ....Hi Frand..j.
00000070: b001 50cd 8000 0000 0000 0000 0000 0000 ..P.............
00000080: 0000 0000 0000 0000 0000 0000 2410 0000 ............$...
00000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000a0: 0000 0000 ....
Pad it with anything up to 0x1000
bytes, chmod +x and run :)
Or downloaded padded file here.
P.S. About x86_64 – 64bit binaries are required to have __PAGEZERO
(any mapping with VM_PROT_NONE
protection covering page at 0x0). IIRC they [Apple] didn’t make it required on 32bit mode only because some legacy software didn’t have it and they’re afraid to break it.