Anti-instrumentation techniques: I know you’re there, Frida!


Some days ago, due to a task I’m still working on, I started using Frida. At first glance, it is a great option, especially the Python bindings, for developing quick scripts to instrument a program. It’s multi-platform and multi-arch, it has bindings for Python, Node.js and .NET, and it offers many other benefits.

After some days of use, I found that some particular APIs lack functionality, but this is understandable given that Frida has had fewer years of development than, for example, Pin [1] and DynamoRIO [2], which are more mature. However, the Frida authors are continuously adding new features (in fact, while I’m writing this post, a new version of Frida, 6.0, was released some hours ago), fixing bugs and dealing with people like me who are constantly asking questions and throwing out useless ideas 😛. At this point I must thank Ole André Vadla Ravnås, the creator and main developer of Frida, for not killing me after days of receiving tons of PMs from me on Twitter bothering him with questions.

What’s Frida?

Ok, now, first things first: for those who are still wondering WTF dynamic binary instrumentation and Frida are, here’s what the author has to say about it:

So what is Frida, exactly?

It’s Greasemonkey for native apps, or, put in more technical terms, it’s a dynamic code instrumentation toolkit. It lets you inject snippets of JavaScript into native apps on Windows, Mac, Linux, iOS and Android. Frida also provides you with some simple tools built on top of the Frida API. These can be used as-is, tweaked to your needs, or serve as examples of how to use the API.

Some words about DBI

So, what makes Frida, and any other binary instrumentation framework, so special? Dynamic binary instrumentation is a method to analyze the behavior of a program at runtime by injecting instrumentation code into the target process. This code executes as a normal part of the process, but it lets us intercept or hook parts of the target’s own code in order to analyze it. A more detailed definition can be found here.

For example, let’s say we want to count how many times an INC instruction gets executed while the process is running. Then, we “instrument” all the INC instructions residing on the process code and set a callback to our own function where we are going to increment a variable, starting from 0 to n. This is exactly what this example of Pin does.

DBI frameworks Vs Debuggers

Some readers may say: “Hey, but you can do this using a debugging API and some hooks”. Yes, we can, but … one of the premises of binary instrumentation, or instrumentation in general, is to remain transparent. So, while using a debugger and hooks gives you the same result, at least in this particular example, a debugger is not transparent at all and its presence can be easily detected, as demonstrated by the numerous papers and tutorials you can find on the web. A nice compendium of anti-debugging techniques written by Peter Ferrie can be found here.

Transparency is a good thing for us as reverse engineers because these anti-debugging techniques are commonly used in packers and malware, so using a DBI framework could be an option to stay in the shadows.

Transparency comes from the fact that the original code of the program is NEVER executed. A “copy” of the original code, carrying the instrumentation, is made and executed somewhere else in memory.

Currently, there are many tools out there relying on DBI to, for example, unpack code or analyze malware [3] [4]. By using a DBI framework we sidestep anti-debugging techniques, because packers and malware were not aware of this new era of DBI. Until now.

Malware updates its defensive line

Some days ago, my friend Francisco Falcón let me know that a new piece of malware is using anti-instrumentation techniques. Even though this could be the first malware out there using anti-DBI techniques, the techniques themselves are not new. Francisco and I presented “Dynamic Binary Instrumentation Frameworks: I know you’re there spying on me” at Recon 2012. In this talk, we enumerated a lot of anti-DBI techniques (mostly focused on the Intel Pin DBI framework) and we stated that future malware might use these techniques to make the job of a reverse engineer harder. We were right.

There was also another presentation of this kind at BlackHat USA 2014, given by Xiaoning Li and Kang Li, called “Defeating the Transparency Features of Dynamic Binary Instrumentation”. This presentation focused mostly on DynamoRIO.

Following this previous work, and given that I found myself using a DBI framework again, I decided to do a little research into how this framework works and how we can detect its presence.

Returning again to Frida

The main idea of this post is to analyze Frida’s behavior and see if we can take advantage of it in order to detect its presence.

From what I have seen while experimenting these days, Frida has two modes of operation:

  1. The first one, which I’ll call the “normal” mode, is the one provided through the Interceptor API. When using Interceptor, Frida hooks the function we want and runs the code in our callback in order to, for example, get the function arguments. Why do I call this mode “normal”? Because this is a very well-known method of hooking code in a live process and has nothing to do with instrumentation per se. When using Interceptor, Frida writes a trampoline (JMP) at the function prologue which redirects the code flow to our callback. After executing our callback code, Frida returns execution to the real function. From the most orthodox point of view, this is not a good method of “instrumentation” because it violates the basic principle of transparency: first, you are really executing the original program code and second, you are overwriting the original program code.
  2. The second mode is provided by the Stalker API. When using Stalker, you are getting instrumentation per se. A copy of the original code is made and placed in a previously allocated memory region. This copy carries the instrumentation code with it, and it is this copy that is really executed. However, when using Stalker, you can only receive events when a CALL/RET or ANY instruction is executed. Currently, you can’t instrument one specific instruction.

Please don’t misunderstand what I say in point 1. I’m not saying “Frida is bad, don’t use it” because of the use of hooks, no. I’m saying that if you need real transparency, don’t use the Interceptor, because it acts mostly like a debugger and we know that debuggers can easily be detected.

Taking advantage of Frida’s behavior

The following lines describe some of Frida’s inner behavior that allows us to detect, or at least suspect, that it is instrumenting our process.

Before starting, you may ask yourself “Why does this guy want to help the bad guys by giving them a tool to improve their defensive line?”. Well, it doesn’t work that way. I’m not helping the bad guys; the bad guys may already be using some of these techniques in the wild. In fact, I already pointed you to a link where a new malware is taking advantage of anti-DBI techniques to detect Pin/DynamoRIO. So, it is a good idea to give the good guys the same information, so they know what to expect when analyzing malware/packers/malicious code.

That said … let’s rock!

Detect Frida by hooks

When we mentioned Frida’s Interceptor API, we highlighted the use of hooks at the function’s prologue. Frida writes a trampoline to a stub where the code we wrote in our callback gets executed. For example, take a look at the following script:

import sys
import pefile
import frida

def getExportedFunctionRva(func_name, func_module):
	func_rva = -1
	pe = pefile.PE(r'C:\Windows\System32\%s' % func_module)

	if hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
		if type(func_name) == str:
			for symbol in pe.DIRECTORY_ENTRY_EXPORT.symbols:
				if symbol.name == func_name:
					func_rva = symbol.address
		elif type(func_name) == list:
			func_rva = []
			for name in func_name:
				for symbol in pe.DIRECTORY_ENTRY_EXPORT.symbols:
					if symbol.name == name:
						func_rva.append(symbol.address)
		else:
			raise TypeError('Unknown data type.')
	else:
		print 'Module %s doesn\'t have a DIRECTORY_ENTRY_EXPORT entry' % func_module
		sys.exit(1)
	return func_rva

def on_message(message, data):
	print "[%s] -> %s" % (message, data)
	#print '[+] Received msg from process: ' + message['payload']

def main(target_process):
	rtlallocateheap_rva = getExportedFunctionRva('RtlAllocateHeap', 'ntdll.dll')

	session = frida.attach(target_process)
	script = session.create_script("""
var RtlAllocateHeapAddr = Module.findBaseAddress('ntdll.dll').add(0x%x);
console.log('RtlAllocateHeap address: ' + RtlAllocateHeapAddr.toString());

console.log('>> Hooking ntdll!RtlAllocateHeap...');

Interceptor.attach(RtlAllocateHeapAddr, {
	onEnter: function (args){
		console.log('[+] RtlAllocateHeap called from ' + this.returnAddress.sub(6).toString());
		console.log('[+] HeapHandle: ' + args[0].toString());
		console.log('[+] Flags: ' + args[1].toString());
		console.log('[+] Size: ' + args[2].toString());
		},
	onLeave: function (retval){
		console.log('[+] Returned address: ' + retval.toString());
		}
	});
""" % rtlallocateheap_rva) 

	script.on('message', on_message)
	script.load()
	raw_input('[!] Press <Enter> at any time to detach from instrumented program.\n\n')
	session.detach()

if __name__ == '__main__':
	if len(sys.argv) < 2:
		print 'Usage: %s <process name or PID>' % __file__
		sys.exit(1)

	try:
		target_process = int(sys.argv[1])
	except ValueError:
		target_process = sys.argv[1]

	main(target_process)


As you probably guessed, the script intercepts RtlAllocateHeap and prints out its arguments and return value.

If we look at the RtlAllocateHeap function in ntdll.dll using a debugger, we can see that a trampoline was written there:

frida-rtlallocateheap-hook

Normally, where you see the JMP there is a MOV EDI,EDI. This JMP goes to the Frida stub:

frida-stub

This stub is located in a previously allocated memory region and you can see some calls to the frida-agent-32.dll (we are on a 32-bit platform).

In this case, detection is based on finding these trampolines. Detecting these inline hooks in memory is fairly simple: calculate the address of the function, disassemble the prologue and check for the JMP opcode (0xE9). There are also tools available to accomplish this task, like the one mentioned here.

This is not a Frida-only detection technique but a more general approach. However, it will tell you that something wrong is happening inside your process.
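
The prologue check described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the helper names (jmp_target, read_prologue) are mine, the reader works only on Windows and only inside the process being checked, and it only recognizes the 5-byte relative JMP (0xE9) shown in the screenshots, so other hook shapes would go unnoticed.

```python
import ctypes
import struct
import sys

JMP_REL32 = 0xE9  # opcode of the 5-byte relative JMP on x86


def jmp_target(prologue, address, bits=32):
    """Return the destination of a JMP rel32 found at `address`, or None."""
    if len(prologue) < 5 or prologue[0] != JMP_REL32:
        return None
    (rel32,) = struct.unpack_from('<i', prologue, 1)
    # The displacement is relative to the end of the 5-byte instruction.
    return (address + 5 + rel32) % (1 << bits)


def read_prologue(module_name, func_name, size=5):
    """Windows only: read the first bytes of an export in our own process."""
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.GetModuleHandleW.restype = ctypes.c_void_p
    kernel32.GetModuleHandleW.argtypes = [ctypes.c_wchar_p]
    kernel32.GetProcAddress.restype = ctypes.c_void_p
    kernel32.GetProcAddress.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    module = kernel32.GetModuleHandleW(module_name)
    address = kernel32.GetProcAddress(module, func_name.encode('ascii'))
    if not address:
        raise OSError('cannot resolve %s!%s' % (module_name, func_name))
    return address, ctypes.string_at(address, size)


if __name__ == '__main__' and sys.platform == 'win32':
    pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8
    addr, code = read_prologue('ntdll.dll', 'RtlAllocateHeap')
    target = jmp_target(code, addr, pointer_bits)
    if target is None:
        print('no JMP at the prologue')
    else:
        print('prologue jumps to 0x%x' % target)
```

A hit is a hint, not proof: some legitimate functions also start with a JMP, so it makes sense to check whether the target lands outside any known module before raising an alarm.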

Detect Frida by frida-agent-*.dll

As you probably noticed in the previous section, Frida injects its own DLL in order to perform the instrumentation. So, this is a very good thing to take advantage of. The detection is, again, fairly easy: we just enumerate all the modules in our process and get their names. If any of them matches “frida-agent-32.dll” or “frida-agent-64.dll”, then bingo!

On other platforms the name of the Frida agent may change, for example, to frida-agent-*.so on Linux.
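
As a rough sketch of the name-matching step, assuming we already have the list of module paths loaded in our process: the function names here are hypothetical, and loaded_modules_linux is an assumption of mine that only covers Linux by reading /proc/self/maps. On Windows the list would have to come from a module enumeration API instead.

```python
import fnmatch

# Name patterns taken from the text; other platforms may use different names.
AGENT_PATTERNS = ('frida-agent-*.dll', 'frida-agent-*.so')


def find_frida_agents(module_paths, patterns=AGENT_PATTERNS):
    """Return the module paths whose file name matches an agent pattern."""
    hits = []
    for path in module_paths:
        # Accept Windows and POSIX separators, compare case-insensitively.
        name = path.replace('\\', '/').rsplit('/', 1)[-1].lower()
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            hits.append(path)
    return hits


def loaded_modules_linux():
    """Linux only: list the files mapped into the current process."""
    paths = set()
    with open('/proc/self/maps') as maps:
        for line in maps:
            fields = line.split(None, 5)
            if len(fields) == 6 and fields[5].startswith('/'):
                paths.add(fields[5].strip())
    return sorted(paths)
```

Keep in mind that a file name is trivial to change, so this check is best combined with the other techniques in this post.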

In the following screenshot, we can see the injected DLL in the module list of the notepad.exe process:

frida-agent-dll

Detect Frida by string patterns

This is another very basic detection. Given that a module is injected into our process, we can search for possible and unique strings that reveal the presence of Frida.

Luckily for us, there are tons of strings we can use to make a reliable detection. Some of these strings seem to be the names of internal functions used by Frida:

frida-strings

Detection consists of searching memory for some specific string patterns. If we find them, then Frida is there once more. We must not rely on just one string but on several of them, to make the detection more reliable.
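
The idea of requiring several strings instead of one can be sketched like this. The two markers below are the only ones named in this post; a real list would be built from strings actually observed in the agent, and both the function names and the threshold value are assumptions for illustration. The buffers are expected to be raw bytes read from the memory regions of the process.

```python
# Marker strings that appear in this post. Extend the tuple with strings
# observed in the agent binary you are dealing with.
MARKERS = (b'frida_agent_main', b'frida-agent')


def find_markers(buffer, markers=MARKERS):
    """Return the set of markers that occur in `buffer` (a bytes object)."""
    return {marker for marker in markers if marker in buffer}


def looks_like_frida(buffers, markers=MARKERS, threshold=2):
    """True if at least `threshold` distinct markers appear in the buffers."""
    found = set()
    for buffer in buffers:
        found |= find_markers(buffer, markers)
        if len(found) >= threshold:
            return True
    return False
```

One practical detail: a scanner that stores the markers in plain form will find them in its own memory, so a real implementation would need to obfuscate them or skip its own regions.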

Detect Frida using exported function names

The Frida agent module has an IMAGE_DIRECTORY_ENTRY_EXPORT, where the export table is located. There, we can find just one exported function used by Frida, which is enough for us to perform a detection:

frida-agent-export-table

By enumerating all the modules in our process, looking for an export table in each one and checking whether “frida_agent_main” is there, we can perform a nice detection.
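
A minimal sketch of that check, reusing pefile as in the script above. It assumes each loaded module can be opened from disk by its path, which is a simplification of mine: an in-process detector would walk the export directory in memory instead. The function names are hypothetical.

```python
def export_names(path):
    """Return the exported symbol names of the PE file at `path`."""
    import pefile  # third-party; imported lazily so the rest works without it

    pe = pefile.PE(path)
    try:
        if not hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
            return []
        names = []
        for symbol in pe.DIRECTORY_ENTRY_EXPORT.symbols:
            name = symbol.name
            if not name:  # exports by ordinal only have no name
                continue
            if isinstance(name, bytes):
                name = name.decode('ascii', 'replace')
            names.append(name)
        return names
    finally:
        pe.close()


def modules_exporting(exports_by_module, wanted='frida_agent_main'):
    """Return the modules whose export list contains `wanted`."""
    return [module for module, names in exports_by_module.items()
            if wanted in names]
```

The table passed to modules_exporting would be built with something like {path: export_names(path) for path in module_paths}, where module_paths is the list of modules loaded in the process.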

Detect Frida by parent process

When instrumenting a process with Frida you can attach to a running process using frida.attach, or start it from the beginning using the frida.spawn/frida.attach combination. When using spawn(), Python is responsible for loading and executing our Frida script and therefore becomes our parent process, as the following screenshot demonstrates:

frida-parent-process

During normal execution, the parent process of an application launched by the user is usually explorer.exe. This difference in the parent process can be used to detect abnormal behavior.
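
The comparison itself is tiny; a sketch could look like the following. The list of expected parents contains only explorer.exe because that is the only one named here, and the function name is hypothetical. Resolving the parent's image name is left to the caller: os.getppid() gives the parent PID, and turning that PID into a name needs a platform-specific process listing.

```python
import os

# Parents we consider normal. Only explorer.exe is named in the text, so a
# program started from a console would also be flagged by this list.
EXPECTED_PARENTS = ('explorer.exe',)


def is_unexpected_parent(parent_image, expected=EXPECTED_PARENTS):
    """True if the parent's image name is not one we consider normal."""
    name = os.path.basename(parent_image.replace('\\', '/')).lower()
    return name not in expected


if __name__ == '__main__':
    print('parent PID: %d' % os.getppid())
```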

Detect Frida by named pipe

As we’ve seen, Frida injects a new module (DLL), called frida-agent, into the instrumented process. This agent is how Frida communicates with Python: the agent acts as the server and Python as the client. For this communication between Python and the shared library, Frida uses a named pipe. A handle to the named pipe is needed in order to read from it and write to it, so by enumerating all the open handles in our process and checking their type and name we can detect, once again, the presence of Frida.

In the following image, you can see the named pipe used by Frida:

frida-named-pipe

Some considerations about these techniques

All these techniques were developed using the latest version of Frida at the time of writing, v6.0, on Windows 7 Ultimate x86 with the frida-python bindings. There might be some differences when running Frida on other platforms or architectures, so some of the presented techniques could be OS dependent. Honestly, I haven’t tested that yet. But I think this is a good starting point.

Conclusion

As you can see, even though dynamic binary instrumentation is more transparent than other dynamic approaches, like a debugger, it still has some little “issues” we can take advantage of in order to detect its presence.

One more thing: I wrote some eXait plugins to test these techniques automatically. Grab them from here.

I hope you enjoyed reading this post as much as I enjoyed writing it.

See you around!

References
