What’s this about?

I was looking through issues on frida-core, and #124 got my attention. Here’s the question:

How do I print the value of the third argument for this function?

CommonUtils::decodeCStringForBase64(char const*, char const*, std::string &)

Currently I can print the first and second argument by using Memory.readUtf8String.

So let’s have a look at how to print that argument by digging into how the C++ compiler and linker work.

The plan

The most obvious way to print a std::string with Frida is to use std::string::c_str() to get a char *:

const char* c_str() const;

Returns a pointer to a null-terminated character array with data equivalent to those stored in the string.

Then we could use Memory.readUtf8String on the returned pointer. Should be easy, huh?

Setup

All of this was written and tested on x86_64 macOS.

I’m going to intercept calls to the interceptMe function in the following target.cpp:

#include <cstdlib>
#include <iostream>
#include <string>

void interceptMe(std::string &str) {
  std::cout << str << std::endl;
}

int main(void) {
  std::string s;
  while (std::cin) {
    std::getline(std::cin, s);
    interceptMe(s);
  }

  return EXIT_SUCCESS;
}

I’ll use Frida’s JS CLI in this article.

Finding the function

Let’s get the address of interceptMe first:

[Local::ProcName::target]-> Module.enumerateExportsSync('target').filter(function(exp) { return exp.name.indexOf('interceptMe') !== -1; })
[
    {
        "address": "0x101fc43d0",
        "name": "_Z11interceptMeRNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE",
        "type": "function"
    }
]

I’m not using Module.findExportByName here because of name mangling, which we’ll discuss later.
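To make the filtering step a bit more reusable, here is a minimal sketch in plain JavaScript. It runs on a copy of the REPL output instead of a live process; findFunctionExports is a name I made up, and the second sample entry (with its address) is hypothetical.

```javascript
// Sample data: the first entry is copied from the REPL output; in a live
// session the array would come from Module.enumerateExportsSync('target').
const sampleExports = [
  {
    address: '0x101fc43d0',
    name: '_Z11interceptMeRNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE',
    type: 'function'
  },
  // Hypothetical entry, only here to give the filter something to reject.
  { address: '0x101fc4400', name: 'main', type: 'function' }
];

// Hypothetical helper: keep function exports whose (mangled) name contains `fragment`.
function findFunctionExports(exports, fragment) {
  return exports.filter(function (exp) {
    return exp.type === 'function' && exp.name.indexOf(fragment) !== -1;
  });
}

const matches = findFunctionExports(sampleExports, 'interceptMe');
console.log(matches.map(function (exp) { return exp.address; })); // [ '0x101fc43d0' ]
```

A substring match can return more than one export (overloads, or other functions that merely contain the fragment), so it is worth checking how many entries came back before picking an address.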

Passing by reference

Passing an object by reference is usually just passing a pointer with “syntactic sugar”. The C++ standard doesn’t define how references have to be implemented, but in practice they are almost always implemented as pointers. So we can just interpret the first parameter of interceptMe as a pointer to std::string.

Calling convention

Frida can hand you arguments through the args parameter of an onEnter callback, but it’s worth knowing where they actually live, so here I’ll read the argument straight from the CPU context (using frida-trace via frida-compile can possibly help too, but I haven’t tried it yet).

x86_64 macOS uses the System V AMD64 ABI’s calling convention, which means that the first argument to interceptMe is passed in the RDI register. That register is accessible in Interceptor.attach callbacks through this.context.rdi.
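For functions with more than one argument, the same convention tells you which register to look at. Here is a small sketch in plain JavaScript; sysvArgRegister is a hypothetical helper name, and the table only covers integer and pointer arguments (floating-point arguments go through the xmm registers and are not counted here).

```javascript
// Registers used for the first six integer or pointer arguments
// in the System V AMD64 calling convention, in order.
const SYSV_AMD64_INT_ARG_REGISTERS = ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9'];

// Hypothetical helper: name of the register holding integer/pointer
// argument number `index` (0-based).
function sysvArgRegister(index) {
  if (!Number.isInteger(index) || index < 0) {
    throw new RangeError('argument index must be a non-negative integer');
  }
  if (index >= SYSV_AMD64_INT_ARG_REGISTERS.length) {
    throw new RangeError('argument ' + index + ' is passed on the stack, not in a register');
  }
  return SYSV_AMD64_INT_ARG_REGISTERS[index];
}

console.log(sysvArgRegister(0)); // rdi
console.log(sysvArgRegister(2)); // rdx
```

Inside a callback you could then read this.context[sysvArgRegister(2)]. For the three-argument function from the original question that would be rdx, assuming it is a free or static function; for a non-static member function the hidden this pointer takes rdi and everything shifts by one.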

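Let’s try it (use the address from your own session; it changes between runs, which is why the one below differs from the output above):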

[Local::ProcName::target]-> Interceptor.attach(ptr("0x103e2c3d0"), function() { console.log(this.context.rdi); })

Now switch back to the terminal with target and type something while watching the terminal with frida. You’ll quickly see an address logged. It stays the same on every call, though, since we always pass the same object to interceptMe.

Getting string contents – The easy way

What if you had a function which accepts a pointer to std::string and returns its ::c_str()? It’d be cool, right? Sadly, our target doesn’t have such a function, and finding it in libc++ is hard. But wait, can’t we just “inject” our own C++ code into the target? Good news: we can, by building it as a dynamic library.

Shared libraries

Dynamic (shared) libraries are beyond the scope of this article, so let me just quote Wikipedia:

A shared library or shared object is a file that is intended to be shared by executable files and further shared object files. Modules used by a program are loaded from individual shared objects into memory at load time or run time, rather than being copied by a linker when it creates a single monolithic executable file for the program.

Making a dylib

Let’s make a simple function which accepts a reference to std::string and returns its ::c_str():

#include <string>
extern "C" {
const char *toUTF8Ref(std::string &str) {
  return str.c_str();
}
}

extern "C" disables some C++ features for the functions inside it, most importantly name mangling (hold on, we’re almost there!). That makes it much easier to look our function up at runtime.

Compile it with clang: clang -dynamiclib getstr_dl.cpp -lc++ -o getstr_dl.dylib

Loading the lib

Sadly, frida doesn’t have any module for conveniently working with dynamic libs. POSIX standardizes a set of functions for working with dynamic libraries, which are declared in dlfcn.h. We’ll need dlopen and dlsym (and dlclose to tidy up). Have a look at dlopen(3):

void *dlopen(const char *filename, int flag); The function dlopen() loads the dynamic library file named by the null-terminated string filename and returns an opaque “handle” for the dynamic library. If filename contains a slash (“/”), then it is interpreted as a (relative or absolute) pathname.

void *dlsym(void *handle, const char *symbol); The function dlsym() takes a “handle” of a dynamic library returned by dlopen() and the null-terminated symbol name, returning the address where that symbol is loaded into memory.

Here’s how you’d load our toUTF8Ref in C++:

void *handle = dlopen("/path/to/getstr_dl.dylib", RTLD_LAZY); // or RTLD_NOW, doesn't really matter here
void *toUTF8Ref_ptr = dlsym(handle, "toUTF8Ref");
/* use toUTF8Ref_ptr */
dlclose(handle);

The same thing can easily be done in frida. Let’s get the dlopen and dlsym functions:

const dlopen = new NativeFunction(Module.findExportByName(null, 'dlopen'), 'pointer', ['pointer', 'int'])
const dlsym = new NativeFunction(Module.findExportByName(null, 'dlsym'), 'pointer', ['pointer', 'pointer'])

But what about RTLD_LAZY | RTLD_NOW? Just look them up in your platform’s dlfcn.h. On macOS, RTLD_LAZY is defined as 0x1:

const RTLD_LAZY = 1;
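If you script this for more than one platform, the flag values are worth keeping in one place. Below is a rough sketch in plain JavaScript with the values as I know them from dlfcn.h on macOS and on Linux with glibc. The platform keys and the dlopenFlags helper are my own naming, and you should still check the header on your target, since other platforms and libcs may differ.

```javascript
// Flag values as defined in dlfcn.h on macOS ("darwin") and on Linux with glibc.
// Assumption: any other platform or libc needs its own entry, taken from its header.
const DLOPEN_FLAGS = {
  darwin: { RTLD_LAZY: 0x1, RTLD_NOW: 0x2, RTLD_LOCAL: 0x4, RTLD_GLOBAL: 0x8 },
  linux: { RTLD_LAZY: 0x1, RTLD_NOW: 0x2, RTLD_LOCAL: 0x0, RTLD_GLOBAL: 0x100 }
};

// Hypothetical helper: OR the named flags together for the given platform.
function dlopenFlags(platform, names) {
  const table = DLOPEN_FLAGS[platform];
  if (table === undefined) {
    throw new Error('no flag table for platform: ' + platform);
  }
  return names.reduce(function (acc, name) {
    if (!(name in table)) {
      throw new Error('unknown flag: ' + name);
    }
    return acc | table[name];
  }, 0);
}

console.log(dlopenFlags('darwin', ['RTLD_LAZY']));               // 1
console.log(dlopenFlags('darwin', ['RTLD_NOW', 'RTLD_GLOBAL'])); // 10
```

The result is the plain integer you would pass as the second argument of dlopen.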

Now load getstr_dl and get the handle:

var handle = dlopen("/path/to/getstr_dl.dylib", 1);

…or you’ll get Error: invalid argument value instead. We can’t pass JS strings to a NativeFunction; we have to allocate all the strings in the process memory first:

var path = Memory.allocUtf8String("/path/to/getstr_dl.dylib");
var symb = Memory.allocUtf8String("toUTF8Ref");

Now finally load it and get address of toUTF8Ref:

var handle = dlopen(path, RTLD_LAZY);
var toUTF8Ref_ptr = dlsym(handle, symb);

And make a NativeFunction of it:

var toUTF8Ref = new NativeFunction(toUTF8Ref_ptr, 'pointer', ['pointer']);

Using the lib

We know how to get a pointer to the string passed to interceptMe, and how to get its ::c_str() using our dylib. Let’s put everything together:

Interceptor.attach(ptr("0x103e2c3d0"), function() {
	console.log(Memory.readUtf8String(
		toUTF8Ref(this.context.rdi)
	)); 
})

Switch back to the terminal with target again, and type Frida is cool. You’ll see the same string printed in the terminal with frida. Voilà.

The hard way

Name mangling

C++ has amazing features: classes, templates, namespaces, function overloading, etc. But the linker knows nothing about those features, and that’s why the compiler uses name mangling:

Name mangling is the encoding of function and variable names into unique names so that linkers can separate common names in the language.

Unfortunately, C++ does not have a standard mangling scheme, so each compiler is free to use its own. In fact, C++ has no standard ABI, which introduces other problems for reverse engineering. However, modern GCC, Clang and the Intel compiler use the same scheme, compliant with the Itanium C++ ABI.

You can read more about C++ Name mangling and demangling here

However, the easiest way to get the symbol we need is… just to grep the libc++ symbols!

Finding the symbol

In C++ std::string is std::basic_string<char>, so we want std::basic_string<char, ...>::c_str(). Let’s get symbols from libc++ and grep them:

$ nm /usr/lib/libc++.dylib | grep basic_string | grep 5c_str
000000000003f450 t __ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE5c_strEv
0000000000042882 t __ZNKSt3__112basic_stringIwNS_11char_traitsIwEENS_9allocatorIwEEE5c_strEv

I’m grepping for 5c_str and not just c_str because basic_string itself contains c_str, so the latter wouldn’t filter anything out. The 5 is there because the mangling scheme prefixes each name with its length (and I know that "c_str".length === 5 :D).
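To see where that length prefix comes from, here is a tiny sketch of the simplest part of the Itanium scheme in plain JavaScript. lengthPrefixed and mangleSimple are hypothetical helpers; they only handle plain (optionally nested) names, and the parameter encoding has to be supplied already encoded. Templates, const methods, constructors and substitutions are not covered.

```javascript
// Each identifier is encoded as <length><identifier>.
function lengthPrefixed(identifier) {
  return String(identifier.length) + identifier;
}

// Hypothetical helper for the simplest cases only:
// "_Z", then the name, then the already-encoded parameter types.
function mangleSimple(nameParts, paramCodes) {
  const encoded = nameParts.map(lengthPrefixed).join('');
  // Nested names (ns::fn, Class::method) are wrapped in N...E;
  // a single unqualified name is not.
  const body = nameParts.length > 1 ? 'N' + encoded + 'E' : encoded;
  return '_Z' + body + paramCodes;
}

console.log(lengthPrefixed('c_str')); // 5c_str

// The parameter part is copied from the export name frida printed for interceptMe.
const stringRefParam = 'RNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE';
console.log(mangleSimple(['interceptMe'], stringRefParam));
```

The last line prints the same name frida reported for interceptMe. Note that nm on macOS shows one extra leading underscore in front of every symbol.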

Demangle those names (use c++filt or demangler.com):

_std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::c_str() const
_std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t> >::c_str() const

OK, so we need the first one, which has offset 0x3f450.
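If you would rather not copy offsets by hand, the nm output is easy to parse. Here is a minimal sketch in plain JavaScript that works on the two lines above; parseNmLine is a hypothetical helper, and it assumes the three-column "address type name" format.

```javascript
// The two lines of nm output from the grep above.
const nmOutput = [
  '000000000003f450 t __ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE5c_strEv',
  '0000000000042882 t __ZNKSt3__112basic_stringIwNS_11char_traitsIwEENS_9allocatorIwEEE5c_strEv'
].join('\n');

// Hypothetical helper: split one "address type name" line into its parts.
function parseNmLine(line) {
  const match = /^([0-9a-fA-F]+)\s+(\S)\s+(\S+)$/.exec(line.trim());
  if (match === null) {
    return null; // e.g. undefined symbols, which have no address column
  }
  return {
    offset: parseInt(match[1], 16),
    type: match[2],
    isLocal: match[2] === match[2].toLowerCase(), // lowercase letter = not exported
    name: match[3]
  };
}

const symbols = nmOutput.split('\n').map(parseNmLine).filter(function (sym) {
  return sym !== null;
});

// "Ic" is the template argument char; "Iw" would be wchar_t.
const charCStr = symbols.find(function (sym) {
  return sym.name.indexOf('basic_stringIcNS') !== -1;
});
console.log('0x' + charCStr.offset.toString(16)); // 0x3f450
```

The offset is specific to the libc++ build it was read from, so it has to be looked up again for every OS version.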

Instance methods

C++ instance methods usually have an implicit parameter this, which points to the instance the method is called on. I wasn’t able to find any good references on this, so here’s my horrible example (it’s even more horrible because of my poor knowledge of asm). If you know a better argument/demonstration, please share it with me.

class Cls { 
public:
	int z; // for padding
	int y = 0; // to force default ctor
	void bar(int x) {
		y += x;
	};
};

int main(void) {
	Cls inst;
	inst.bar(3);
}

Let’s compile and disassemble it:

_main:
0000000100000f10	pushq	%rbp
0000000100000f11	movq	%rsp, %rbp
0000000100000f14	subq	$0x10, %rsp
0000000100000f18	leaq	-0x8(%rbp), %rdi
0000000100000f1c	callq	__ZN3ClsC1Ev ## Cls::Cls()
0000000100000f21	leaq	-0x8(%rbp), %rdi
0000000100000f25	movl	$0x3, %esi
0000000100000f2a	callq	0x100000f96 ## symbol stub for: __ZN3Cls3barEi
0000000100000f2f	xorl	%eax, %eax
0000000100000f31	addq	$0x10, %rsp
0000000100000f35	popq	%rbp
0000000100000f36	retq
0000000100000f37	nopw	(%rax,%rax)
__ZN3ClsC1Ev:
0000000100000f40	pushq	%rbp
0000000100000f41	movq	%rsp, %rbp
0000000100000f44	subq	$0x10, %rsp
0000000100000f48	movq	%rdi, -0x8(%rbp)
0000000100000f4c	movq	-0x8(%rbp), %rdi
0000000100000f50	callq	__ZN3ClsC2Ev ## Cls::Cls()
0000000100000f55	addq	$0x10, %rsp
0000000100000f59	popq	%rbp
0000000100000f5a	retq
0000000100000f5b	nopl	(%rax,%rax)
__ZN3Cls3barEi:
0000000100000f60	pushq	%rbp
0000000100000f61	movq	%rsp, %rbp
0000000100000f64	movq	%rdi, -0x8(%rbp)
0000000100000f68	movl	%esi, -0xc(%rbp)
0000000100000f6b	movq	-0x8(%rbp), %rdi
0000000100000f6f	movl	-0xc(%rbp), %esi
0000000100000f72	addl	0x4(%rdi), %esi
0000000100000f75	movl	%esi, 0x4(%rdi)
0000000100000f78	popq	%rbp
0000000100000f79	retq
0000000100000f7a	nopw	(%rax,%rax)
__ZN3ClsC2Ev:
0000000100000f80	pushq	%rbp
0000000100000f81	movq	%rsp, %rbp
0000000100000f84	movq	%rdi, -0x8(%rbp)
0000000100000f88	movq	-0x8(%rbp), %rdi
0000000100000f8c	movl	$0x0, 0x4(%rdi)
0000000100000f93	popq	%rbp
0000000100000f94	retq

Let’s start with _main. After the leaq at ..f18, rdi contains the address of inst, which lives on the stack. Then Cls::Cls() is called. As we know from the calling convention section, rdi holds the first integer or pointer argument. After inst is initialized, rdi is loaded with the address of inst again, and 0x3 is loaded into esi (the lower 32 bits of rsi). Then Cls::bar(int) is called. Since rdi holds the first argument and rsi the second, Cls::bar(int) actually has an implicit first argument Cls * const this and a second argument int x.

Now look at Cls::Cls(). It just forwards to the second constructor symbol (__ZN3ClsC2Ev), which at ..f8c stores 0x0 at the address rdi + 4 bytes. y has an offset of 4 bytes in Cls because the first 4 bytes are used by z.

Let’s have a look at Cls::bar(int) too. The useful instructions are the addl at ..f72, which adds the value stored at rdi + 4 bytes (that’s y) to esi, and the movl at ..f75, which writes the sum back to rdi + 4 bytes.

So we can see once more that this is passed as implicit argument to any non-static member function.

So std::string’s c_str implementation takes a pointer to the string as its first argument and returns a pointer to a char array.

Making a NativeFunction

Sadly, frida doesn’t currently have an API to access “private” symbols (Module.enumerateSymbols would be nice :D). And the symbol we need is private: notice the lowercase t in the nm output. So we’ll just take the offset from the nm output and add it to the base address of libc++.1.dylib:

var string_c_str_ptr = Module.findBaseAddress('libc++.1.dylib').add(0x3f450);
var string_c_str = new NativeFunction(string_c_str_ptr, 'pointer', ['pointer']);

UPD: looks like Module.enumerateSymbols has since been added to frida. I’m still leaving this in to demonstrate how custom offsets can be used.

Attaching

We can use the same code as we had with toUTF8Ref:

Interceptor.attach(ptr("0x103e2c3d0"), function() {
	console.log(Memory.readUtf8String(
		string_c_str(this.context.rdi)
	)); 
})

Switch back to the terminal with target once more, and type I love Frida <3. You’ll see the same string printed in the terminal with frida. We did it once again, yay :D

UPD: Simplest way

You can also just look at how std::string::c_str() works and replicate the same behaviour. That has already been done for libc++ (and Apple’s libstdc++), and is available from frida-codeshare here

Pass the pointer from this.context.rdi to readStdString the same way you did with string_c_str and toUTF8Ref.
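To give an idea of what “replicating the behaviour” means, here is a rough sketch in plain JavaScript that decodes the raw 24 bytes of a std::string object. This is not the codeshare script. It assumes the classic libc++ layout on x86_64: if the lowest bit of the first byte is clear, the string is stored inline starting at byte 1 and the first byte holds length * 2; otherwise the object holds capacity, size and a heap pointer, 8 bytes each. decodeLibcxxString is a hypothetical name, and other architectures or library versions may lay things out differently.

```javascript
// Assumption: classic libc++ std::string layout on a 64-bit little-endian
// target (24-byte object, lowest bit of byte 0 set for heap-allocated strings).
function decodeLibcxxString(bytes) {
  if (bytes.length < 24) {
    throw new RangeError('expected the 24 bytes of a std::string object');
  }
  const isLong = (bytes[0] & 1) === 1;
  if (!isLong) {
    const size = bytes[0] >> 1;
    // ASCII only, to keep the sketch dependency-free.
    const chars = Array.from(bytes.subarray(1, 1 + size));
    return { isLong: false, size: size, text: String.fromCharCode.apply(null, chars) };
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    isLong: true,
    size: Number(view.getBigUint64(8, true)),                    // second word: size
    dataPointer: '0x' + view.getBigUint64(16, true).toString(16) // third word: char *
  };
}

// Build the bytes of a short string holding "Frida" by hand.
const shortString = new Uint8Array(24);
shortString[0] = 5 << 1;
'Frida'.split('').forEach(function (ch, i) {
  shortString[1 + i] = ch.charCodeAt(0);
});
console.log(decodeLibcxxString(shortString).text); // Frida
```

In a frida session the bytes would come from reading 24 bytes at the pointer in this.context.rdi, and for the long case you would read a C string at dataPointer instead of decoding inline bytes.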