Saved from my answer on CodeGolf StackExchange

Smallest runnable Mach-O has to be at least 0x1000 bytes. Because of XNU limitation, file has to be at least of PAGE_SIZE. See xnu-4570.1.46/bsd/kern/mach_loader.c, around line 1600.

However, if we don’t count that padding, and only count meaningful payload, then minimal file size runnable on macOS is 0xA4 bytes.

It has to start with mach_header (or fat_header / mach_header_64, but those are bigger).

struct mach_header {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */

It’s size is 0x1C bytes.
magic has to be MH_MAGIC.
I’ll be using CPU_TYPE_X86 since it’s an x86_32 executable.
filtetype has to be MH_EXECUTE for executable, ncmds and sizeofcmds depend on commands, and have to be valid.
flags aren’t that important and are too small to provide any other value.

Next are load commands. Header has to be exactly in one mapping, with R-X rights – again, XNU limitations.
We’d also need to place our code in some R-X mapping, so this is fine.
For that we need a segment_command.

Let’s look at definition.

struct segment_command { /* for 32-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT */
	uint32_t	cmdsize;	/* includes sizeof section structs */
	char		segname[16];	/* segment name */
	uint32_t	vmaddr;		/* memory address of this segment */
	uint32_t	vmsize;		/* memory size of this segment */
	uint32_t	fileoff;	/* file offset of this segment */
	uint32_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */

cmd has to be LC_SEGMENT, and cmdsize has to be sizeof(struct segment_command) => 0x38.
segname contents don’t matter, and we’ll use that later.

vmaddr has to be valid address (I’ll use 0x1000), vmsize has to be valid & multiple of PAGE_SIZE, fileoff has to be 0, filesize has to be smaller than size of file, but larger than mach_header at least (sizeof(header) + header.sizeofcmds is what I’ve used).

maxprot and initprot have to be VM_PROT_READ | VM_PROT_EXECUTE. maxport usually also has VM_PROT_WRITE.
nsects are 0, since we don’t really need any sections and they’ll add up to size. I’ve set flags to 0.

Now, we need to execute some code. There are two load commands for that: entry_point_command and thread_command.
entry_point_command doesn’t suit us: see xnu-4570.1.46/bsd/kern/mach_loader.c, around line 1977:

1977  	/* kernel does *not* use entryoff from LC_MAIN.	 Dyld uses it. */
1978  	result->needs_dynlinker = TRUE;
1979  	result->using_lcmain = TRUE;

So, using it would require getting DYLD to work, and that means we’ll need __LINKEDIT, empty symtab_command and dysymtab_command, dylinker_command and dyld_info_command. Overkill for “smallest” file.

So, we’ll use thread_command, specifically LC_UNIXTHREAD since it also sets up stack which we’ll need.

struct thread_command {
	uint32_t	cmd;		/* LC_THREAD or  LC_UNIXTHREAD */
	uint32_t	cmdsize;	/* total size of this command */
	/* uint32_t flavor		   flavor of thread state */
	/* uint32_t count		   count of uint32_t's in thread state */
	/* struct XXX_thread_state state   thread state for this flavor */
	/* ... */

cmd is going to be LC_UNIXTHREAD, cmdsize would be 0x50 (see below).
flavour is x86_THREAD_STATE32, and count is x86_THREAD_STATE32_COUNT (0x10).

Now the thread_state. We need x86_thread_state32_t aka _STRUCT_X86_THREAD_STATE32:

#define	_STRUCT_X86_THREAD_STATE32	struct __darwin_i386_thread_state
    unsigned int	__eax;
    unsigned int	__ebx;
    unsigned int	__ecx;
    unsigned int	__edx;
    unsigned int	__edi;
    unsigned int	__esi;
    unsigned int	__ebp;
    unsigned int	__esp;
    unsigned int	__ss;
    unsigned int	__eflags;
    unsigned int	__eip;
    unsigned int	__cs;
    unsigned int	__ds;
    unsigned int	__es;
    unsigned int	__fs;
    unsigned int	__gs;

So, it is indeed 16 uint32_t’s which would be loaded into corresponding registers before thread is started.

Adding header, segment command and thread command gives us 0xA4 bytes.

Now, time to craft the payload.
Let’s say we want it to print Hi Frand and exit(0).

Syscall convention for macOS x86_32:

  • arguments passed on the stack, pushed right-to-left
  • stack 16-bytes aligned (note: 8-bytes aligned seems to be fine)
  • syscall number in the eax register
  • call by interrupt

See more about syscalls on macOS here.

So, knowing that, here’s our payload in assembly:

push   ebx          #; push chars 5-8
push   eax          #; push chars 1-4
xor    eax, eax     #; zero eax
mov    edi, esp     #; preserve string address on stack
push   0x8          #; 3rd param for write -- length
push   edi          #; 2nd param for write -- address of bytes
push   0x1          #; 1st param for write -- fd (stdout)
push   eax          #; align stack
mov    al, 0x4      #; write syscall number
#; --- 14 bytes at this point ---
int    0x80         #; syscall
push   0x0          #; 1st param for exit -- exit code
mov    al, 0x1      #; exit syscall number
push   eax          #; align stack
int    0x80         #; syscall

Notice the line before first int 0x80.
segname can be anything, remember? So we can put our payload in it. However, it’s only 16 bytes, and we need a bit more.
So, at 14 bytes we’ll place a jmp.

Another “free” space is thread state registers.
We can set anything in most of them, and we’ll put rest of our payload there.

Also, we place our string in __eax and __ebx, since it’s shorter than mov’ing them.

So, we can use __ecx, __edx, __edi to fit the rest of our payload. Looking at difference between address of thread_cmd.state.__ecx and end of segment_cmd.segname we calculate that we need to put jmp 0x3a (or EB38) in last two bytes of segname.

So, our payload assembled is 53 50 31C0 89E7 6A08 57 6A01 50 B004 for first part, EB38 for jmp, and CD80 6A00 B001 50 CD80 for second part.

And last step – setting the __eip. Our file is loaded at 0x1000 (remember vmaddr), and payload starts at offset 0x24.

Here’s xxd of result file:

00000000: cefa edfe 0700 0000 0300 0000 0200 0000  ................
00000010: 0200 0000 8800 0000 0000 2001 0100 0000  .......... .....
00000020: 3800 0000 5350 31c0 89e7 6a08 576a 0150  8...SP1...j.Wj.P
00000030: b004 eb38 0010 0000 0010 0000 0000 0000  ...8............
00000040: a400 0000 0700 0000 0500 0000 0000 0000  ................
00000050: 0000 0000 0500 0000 5000 0000 0100 0000  ........P.......
00000060: 1000 0000 4869 2046 7261 6e64 cd80 6a00  ....Hi Frand..j.
00000070: b001 50cd 8000 0000 0000 0000 0000 0000  ..P.............
00000080: 0000 0000 0000 0000 0000 0000 2410 0000  ............$...
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000                                ....

Pad it with anything up to 0x1000 bytes, chmod +x and run :) Or downloaded padded file here.

P.S. About x86_64 – 64bit binaries are required to have __PAGEZERO (any mapping with VM_PROT_NONE protection covering page at 0x0). IIRC they [Apple] didn’t make it required on 32bit mode only because some legacy software didn’t have it and they’re afraid to break it.

See more info and a tool to play with this on GitHub