22 March 2014

Get Off My Lawn!

Using LD_PRELOAD to hijack functions that would normally be resolved at runtime from a legitimate shared library isn't a new concept. The technique has been around for ages, not just for malicious purposes, but also as a debugging aid.

For example, LD_PRELOAD came in quite useful for me when I was running gprof on a threaded application (gprof will not collect statistics on anything but the parent thread). You can work around that with wrappers around the various pthread functions that manually set the profile timer via setitimer(ITIMER_PROF, ...).

This technique, though, has had a resurgence in popularity in the rootkit world. This is probably due to how little skill one needs in order to create such a thing. Hijacking functions in the kernel's syscall table has become a more tedious act now that the table is no longer exported and is read-only, so most people cannot be bothered with writing kernel-based rootkits. That would require an IQ above 50 and a desire to do something proper.

What we are seeing now is a new generation of both hackers and administrators. The success of such rootkits is actually a failure caused by corporations drilling into our heads that to make a good living in security, Microsoft is the only game in town.

This new generation comes forth and asks themselves "OK, how do I make money?". Two doors are open: A creaking dusty one with a few trinkets strewn in front, the other, bright neon lights, sparkles and a carpet lined with money. The first door leads to a job doing Linux security development, the latter being everything Microsoft.

This is not to say that Linux lacks security, far from it. What it does lack, in my opinion, is proper security applications, written by proper developers, with proper knowledge of Linux internals. What we have now is a scattered lot of open-source projects, most of which are of the poorest quality in both design and performance. Then some poor sob has to glue them all together in some fashion to create a semi-working analytics system. The result is usually far from stellar.

It is this lack of interest that has led us to such ignorance that something as simple as an LD_PRELOAD rootkit is being widely and successfully used. Think of it this way: There is no real reason for an attacker to try very hard to hide, when nobody is trying very hard to find them.

The level of skill needed to maintain administrative rights on a Windows-based operating system has grown exponentially, because the efforts to thwart, monitor, detect, and contain attackers come from proper security applications, written by proper developers, with proper knowledge of Windows internals.

The level of skill needed to maintain administrative rights on a Linux based system has stagnated. Just last year someone found a remote exploit for telnetd. Really? TELNET?

How is process formed?

Without going into nitty gritty details, an ELF binary contains some sections that inform the dynamic linker what it wants, and where it might be found. Unresolved functions will stay unresolved until they are explicitly called.

The first execution of an unresolved function will jump to a pre-defined area that, in turn, finds the real address of the function being called. After the function has been successfully found, the resolved address is patched into another area, so any subsequent calls will jump directly to the real function.

This is, of course, a very bland way to describe the GOT and PLT, a subject which does not need another article written about it.

One of the primary things one should know is that the dynamic linker is lazy. It searches the loaded shared libraries for each unresolved function in the order they were loaded, and it locates those libraries through any RPATH/RUNPATH recorded in the binary, LD_LIBRARY_PATH, the ld.so cache, and the default system directories (gcc -print-search-dirs shows the link-time equivalent of that list). So if you have the function silly in both libfoo.so and libbar.so, whichever library is queried first wins the resolution game, completely ignoring the other.

In the end, no matter what order the dynamic linker searches, the libraries named in LD_PRELOAD will always be searched before every other shared library (same goes for entries within /etc/ld.so.preload).

Detecting & Avoiding LD_PRELOAD

The following sections describe a few cute tricks to determine if your program is, or has been, executed with an LD_PRELOAD method, and finally how to keep yourself safe from such idiocy.

The Obvious Method

Absurd amounts of debate over this have been made, and I personally find it unmaintainable and quite silly. The method? Statically link everything.

In reality, this heated debate is something of a relic. The last time I thought about this subject was a few nights ago at a friend's house, discussing the "old days" when this was considered relevant. That is: more than 10 years ago. We had some good chuckles, while realizing just how old we were.

Let us not forget, if you're writing commercial software and using GPL-licensed libraries, static linking is a big no-no.

The Lazy Method

A good friend of mine would always ask the following question during an interview: "What is the third argument of main?". While to some this is obvious, a majority would either answer with a question "there is a third argument to main?", or just sit in silence, too scared to answer.

But the question is a good question, and very relevant to this little article. The third argument is simply the local environment: a NULL-terminated array of "NAME=value" strings, conventionally named envp.

A quick example of using the environment argument to detect an LD_PRELOAD would look something like this.
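A minimal sketch of the idea, not a definitive implementation. The helper name env_has_preload is hypothetical, and I am assuming the check only needs to spot an entry that begins with LD_PRELOAD=. It compares the prefix byte by byte, so no libc string function is involved at all.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if envp holds an entry beginning with
 * "LD_PRELOAD=", 0 otherwise. The prefix is compared by hand so that no
 * libc function (getenv, strcmp, ...) is called along the way. */
int env_has_preload(char **envp)
{
    static const char key[] = "LD_PRELOAD=";

    if (envp == NULL)
        return 0;

    for (; *envp != NULL; envp++) {
        size_t i = 0;

        /* Advance while the entry still matches the key. */
        while (key[i] != '\0' && (*envp)[i] == key[i])
            i++;

        /* Reaching the end of the key means the whole prefix matched. */
        if (key[i] == '\0')
            return 1;
    }
    return 0;
}

/* Intended use, from a main() that takes the third argument:
 *
 *   int main(int argc, char **argv, char **envp)
 *   {
 *       if (env_has_preload(envp)) {
 *           puts("no thanks");
 *           return 1;
 *       }
 *       puts("Yay!");
 *       return 0;
 *   }
 */
```

The main() in the trailing comment is where the "Yay!" and "no thanks" messages would come from; the helper itself stays silent so it can be reused.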

This avoids calling the getenv function which could easily be overwritten by a preload. I usually wrap this functionality around some sort of obfuscation method, so it's not easy to see to the naked eye. Usually with XOR or something equally as silly.

So does it work?

$ ./blah
Yay!

$ LD_PRELOAD=stupid ./blah
no thanks

Apparently. It must be noted, though, that this is only a check local to your application, and I'm not 100% sure you can even trust envp; maybe someone smarter than me could answer that, as I cannot be bothered to look right now. Also, strcmp could already be hijacked. If it gets that scary, write a key/value parser using a macro or something.

The Assurance Method

This method involves detecting whether there has been a global LD_PRELOAD installed. As simple as it seems, it's still a viable method of detection.

Using the programming interface to the dynamic linking loader, we can open specific libraries with dlopen() and look up functions in them directly with dlsym(), passing the handle returned by dlopen() as the first argument.

For each of these lookups, we call dlsym() on the same function name again, this time passing RTLD_NEXT as the handle. When using RTLD_NEXT, you are essentially emulating what the linker would do at runtime. This would include the LD_PRELOAD path.

Knowing both the "good" address and the "other" address, we can compare them and, if they differ, terminate the application. Optionally, we can use the function dladdr to give us some more information about the potentially malicious hijack.
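The comparison can be sketched roughly as follows, assuming glibc on Linux. The helper names check_symbol and check_all are hypothetical, as is the exact list of functions; the message format simply mirrors the sample output in this section. Whether to terminate on a mismatch is left to the caller.

```c
#define _GNU_SOURCE /* for RTLD_NEXT, dladdr() and Dl_info on glibc */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper. `trusted` is a handle from dlopen() on the library we
 * trust. Returns 0 if the normal lookup agrees with it, 1 if the symbol
 * resolves somewhere else, -1 if it cannot be found at all. */
int check_symbol(void *trusted, const char *name)
{
    void *good  = dlsym(trusted, name);   /* address inside the trusted library */
    void *other = dlsym(RTLD_NEXT, name); /* what the usual search order yields */
    Dl_info info;

    printf("Checking %s\n", name);

    if (good == NULL || other == NULL)
        return -1;
    if (good == other)
        return 0;

    /* dladdr() reports which loaded object the suspicious address lives in. */
    if (dladdr(other, &info) != 0 && info.dli_fname != NULL)
        printf("Function %s, hijacked from the file '%s' at address '%p'\n",
               name, info.dli_fname, other);
    return 1;
}

/* Runs the check over a fixed list. Returns the number of symbols that did
 * not check out, or -1 if the trusted library could not be opened. */
int check_all(const char *libc_path)
{
    static const char *names[] = { "malloc", "accept", "execve",
                                   "open", "rmdir", "fopen" };
    void *trusted = dlopen(libc_path, RTLD_LAZY);
    int bad = 0;
    size_t i;

    if (trusted == NULL)
        return -1;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
        if (check_symbol(trusted, names[i]) != 0)
            bad++;

    dlclose(trusted);
    return bad;
}
```

A caller would do something like check_all("libc.so.6") early in main() and bail out on any non-zero result. On older glibc releases this needs -ldl at link time.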

So does it work?

$ ./blah
Checking malloc
Checking accept
Checking execve
Checking open
Checking rmdir
Checking fopen
yay!

$ LD_PRELOAD=./hijack_stuff.so ./blah
Checking malloc
Function malloc, hijacked from the file '/home/mthomas/hijack_stuff.so' at address '0x7f54403a168c'

Looks as if it worked, but it must be noted I did not load libc using the full path. This means there is a potential for an attacker to sit a libc.so.6 somewhere in the search path. I would suggest passing in the full path to the library you "trust".

The Overload Method

All of the previous methods we looked at were purely reactive: terminate upon detection. This is quite silly, as your application has now become useless due to the paranoia of being hijacked with malicious code. Ideally we want our applications to run, but still be guarded from potential malicious hijacks. And to be even more picky, I do not want to bother with modifying a bunch of work I've already done to use "safe" functions. Take a look at VSFTPD at some point in your life, and be horrified by creating wrapper functions around direct system calls.

After implementing this method, I referred to it as "cute". This is another word for "useful-abuse". We abuse several GCC builtins and funny __asm__ hacks. With that said, it should be obvious that this is a very gcc-centric method.

The general flow is as follows:

  • Create a new library
  • Choose functions you use that may be targets for an LD_PRELOAD hijack
  • Create an initializer function that uses the dl API as described above, and store the real function addresses.
  • Create wrapper functions around each of these functions, which in turn, call the previously saved versions.
  • Trick your program into thinking the wrapper functions are the real thing. I.e., my_fopen() is exposed as fopen()
  • Hide the initialization phase, so that the only thing one must do is link against this new library.

For the sake of brevity, we will only use open and fopen in this example.

The overloading trick is not included in this section; it will be exposed and explained afterwards. But this is our base layout of the library.
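What follows is a sketch of that layout under a few assumptions, with the header and the source file folded into one listing. The names fopenfn, openfn, safe_fopen and safe_open come from this article; safe_libc_init is a hypothetical name for the initializer, "libc.so.6" is assumed to be the soname of the trusted C library, and the printed messages mirror the sample output further down.

```c
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* --- what would live in the header, safe_libc.h --- */
void  safe_libc_init(void) __attribute__((constructor(101)));
FILE *safe_fopen(const char *path, const char *mode);
int   safe_open(const char *path, int flags, ...);

/* --- what would live in safe_libc.c --- */
static FILE *(*fopenfn)(const char *, const char *);
static int   (*openfn)(const char *, int, ...);

/* Runs before main(): resolve the real functions straight from libc. */
void safe_libc_init(void)
{
    /* Assumption: a bare soname is used here; a full path is safer. */
    void *libc = dlopen("libc.so.6", RTLD_LAZY);

    puts("Initializing safe_libc!");
    if (libc == NULL)
        abort();

    fopenfn = (FILE *(*)(const char *, const char *))dlsym(libc, "fopen");
    openfn  = (int (*)(const char *, int, ...))dlsym(libc, "open");
    if (fopenfn == NULL || openfn == NULL)
        abort();
}

FILE *safe_fopen(const char *path, const char *mode)
{
    puts("safe_fopen()");
    return fopenfn(path, mode);
}

int safe_open(const char *path, int flags, ...)
{
    mode_t mode = 0;

    /* The third argument only exists when a file may be created
     * (O_TMPFILE is ignored here for brevity). */
    if (flags & O_CREAT) {
        va_list ap;

        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }
    puts("safe_open()");
    return openfn(path, flags, mode);
}
```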

So in the above code, we have used dlsym to load both the fopen and open functions into the global variables fopenfn and openfn respectively. Two new functions were created, simple wrapper functions that call the pointers found from dlsym.

In the header file you will notice the use of the GCC constructor attribute. The number there is the "priority". 0-100 are reserved, so 101 is the lowest we can use to make sure this is always called first. This code will be executed prior to main(), somewhere in the startup path that begins at _start.
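To see the priority ordering in isolation, here is a small sketch for GCC or Clang. The function and variable names are made up for the illustration; each constructor just records its own priority so the order can be inspected afterwards.

```c
/* Records the order in which the constructors below actually ran. */
int ctor_order[2];
int ctor_count;

/* Appears first in the file, but the higher number means it runs second. */
__attribute__((constructor(102)))
static void runs_second(void)
{
    ctor_order[ctor_count++] = 102;
}

/* 101 is the smallest priority outside the reserved range, so this runs
 * ahead of the other prioritized constructor and before main(). */
__attribute__((constructor(101)))
static void runs_first(void)
{
    ctor_order[ctor_count++] = 101;
}
```

By the time main() starts, ctor_order holds 101 followed by 102, regardless of the order the functions appear in the source.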

At this point, we would still need to export all of the wrapper functions and modify all of our code to use them instead of the real thing. This is where the previously stated "cute" part comes into the picture.

It's very simple: create a declaration for each of your wrapper functions using GCC's __asm__() label syntax, the argument of which is the symbol name of the function you wish to "redirect" into the wrapper function.

I implemented a tiny macro to do this for me:

#define OVERLOAD_SYMBOL(x, y) typeof(x)(x)__asm__(y)

The trick here is the y. This is the symbol name of the function you wish to redirect to x. So take the above code, and simply add this to the bottom:

Now for the fun part. Create a simple directory tree.

mkdir ~/preload_reload/
mkdir ~/preload_reload/safe_libc
mkdir ~/preload_reload/hijacker

Place the following code into ~/preload_reload/hijacker/hijack.c
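A minimal sketch of such a hijacker, assuming glibc on Linux. The messages match the sample output further down; forwarding to the real functions through dlsym(RTLD_NEXT, ...) is my assumption, suggested by the -ldl on the build line.

```c
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Same name and signature as libc's fopen(), so a preloaded copy is found
 * first. Announces itself, then forwards to the next fopen in line. */
FILE *fopen(const char *path, const char *mode)
{
    FILE *(*real_fopen)(const char *, const char *) =
        (FILE *(*)(const char *, const char *))dlsym(RTLD_NEXT, "fopen");

    puts("HIJACKED FOPEN!");
    return real_fopen != NULL ? real_fopen(path, mode) : NULL;
}

/* Same treatment for open(), which is variadic because of the mode. */
int open(const char *path, int flags, ...)
{
    int (*real_open)(const char *, int, ...) =
        (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
    mode_t mode = 0;

    /* The third argument only exists when a file may be created. */
    if (flags & O_CREAT) {
        va_list ap;

        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }
    puts("HIJACKED OPEN!");
    return real_open != NULL ? real_open(path, flags, mode) : -1;
}
```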

Place our test code, which simply attempts an fopen and an open on a file supplied on the command line, into ~/preload_reload/main.c
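The test program can be as small as the sketch below. The helper name try_both is hypothetical; the point is only that the code calls plain fopen() and open() and knows nothing about hijackers or safe_libc.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: opens path once with fopen() and once with open().
 * Returns 0 if both calls succeed, 1 otherwise. */
int try_both(const char *path)
{
    FILE *fp = fopen(path, "r");
    int fd;

    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    fclose(fp);

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    close(fd);
    return 0;
}

/* main.c then only needs:
 *
 *   int main(int argc, char **argv)
 *   {
 *       return argc < 2 ? 1 : try_both(argv[1]);
 *   }
 */
```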

Finally, add the following files into ~/preload_reload/safe_libc

Next, change your working directory to ~/preload_reload and compile everything. Initially we will compile our main program WITHOUT safe_libc linked.

# generate the LD_PRELOAD shared object
gcc -fPIC -shared -o hijacker/hijack.so hijacker/hijack.c -ldl

# compile our main.c program without linking against safe_libc
gcc -L./safe_libc -I. main.c -o main

Now attempt to run ./main with the hijacker/hijack.so as an LD_PRELOAD

$ LD_PRELOAD=./hijacker/hijack.so ./main main.c
HIJACKED FOPEN!
HIJACKED OPEN!

This is bad, so let's compile and link our libsafe_libc library

# Generate libsafe_libc
gcc -c -o safe_libc/safe_libc.o safe_libc/safe_libc.c
ar rcs safe_libc/libsafe_libc.a safe_libc/safe_libc.o

# Recompile main, this time linking against libsafe_libc
gcc -L./safe_libc -I. main.c -o main -lsafe_libc -ldl

Let's attempt that LD_PRELOAD hijack again

$ LD_PRELOAD=./hijacker/hijack.so ./main main.c
Initializing safe_libc!
safe_fopen()
safe_open()

As we can see here, we have deflected the attempted LD_PRELOAD hijack of both open and fopen without touching the real code. Pretty neat, yeah?

Closing Remarks

I hope this was of interest to you, and I apologize if I glossed over anything that might be of more interest. I am just attempting to throw out a few methods to protect yourself and your code from such silly attacks. Also, I started to get lazy towards the end of writing this. So if it seems rushed, that's the reason.

I also apologize for any inaccuracies. Feel free to call me out and/or physically bash me with a stick.

While each method here could be used independently, they can also be used in conjunction for both detection and protection. Just my opinion. This is not fool-proof, as libdl can be hijacked too. And nothing is stopping someone from doing a runtime injection. I am only trying to convey simple methods to deter simple attacks.