Learning Go Binary Reverse Engineering through Malware Analysis
Posted 2024/11/06
My last malware-related project covered on this blog was setting up my honeypot server and running some analytics on the traffic it was receiving. I set up that honeypot server with the intention of collecting malware samples to reverse engineer, and this bot was one sample that piqued my interest. Here is my journey into the world of reverse engineering Go binaries and learning how Go works at a low level.[1]
Threat Intelligence
Before we talk about analyzing the binary and Go being weird, here is some basic threat intelligence information.
Indicators of Compromise
File Information
File Type: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=HjkOUQdTP3LWmpfPkHeS/c71tNEcp__rNDf2Dahrq/azEhjfHtK4nvNTv5WH_D/B5Ck3lybWWsL94Sp9e2X, stripped
File Size: 2.21 MB (2321304 bytes)
DiE Engine Output: ELF: Go(1.10.x-1.17.x)[EXEC AMD64-64]
GoAnalyzer Output: go1.13.8
MITRE ATT&CK
- Discovery
  - T1046: Network Service Discovery
- Credential Access
  - T1110: Brute Force
    - T1110.001: Password Guessing
    - T1110.003: Password Spraying
- Initial Access
  - T1078: Valid Accounts
    - T1078.001: Valid Accounts: Default Accounts
  - T1133: External Remote Services
- Execution
  - T1059: Command and Scripting Interpreter
    - T1059.004: Command and Scripting Interpreter: Unix Shell
- Defence Evasion
  - T1027: Obfuscated Files or Information
    - T1027.008: Obfuscated Files or Information: Stripped Payloads
- Command and Control
  - T1095: Non-Application Layer Protocol
  - T1571: Non-Standard Port
- Impact
  - T1499: Endpoint Denial of Service
    - T1499.001: Endpoint Denial of Service: OS Exhaustion Flood
  - T1498: Network Denial of Service
    - T1498.001: Network Denial of Service: Direct Network Flood
Initial Attack
The initial attack on my honeypot for this particular sample occurred on August 7th, 2024. It appears that an automated scanner at 194.50.16.221 (Alsycon B.V. VPS) spotted my “unsecured SSH server” and attempted to run a series of shell commands on the system. The scanner tried this a few times throughout that week, and most attacks looked like the following:
Some analysis of this command: it first extracts CPU information from the system and attempts to change the user password. It then attempts to kill other malware running on the system named bin.x86_64, which is a pretty common name for malware binaries that I have seen on my honeypot. Afterwards, it tries to download the malware using both wget and curl into the /tmp directory, gives the file the appropriate permissions, and executes it. The final thing that piqued my interest is that the scanner blocked the IP addresses of competing malware’s command & control (C2) infrastructure using iptables. This was really interesting because port scanning these IP addresses helped me find new web servers and FTP servers filled with more samples to analyze in the future.
Like the malware sample, the scanner appears to be written in Go, as it connected with the SSH client version string SSH-2.0-Go.
Recovering Symbols and Other Ghidra Setup
The binary that was acquired is an ELF file that is stripped of debugging symbols. This means we shouldn’t have access to function names and variable names as written by the malware author. However, the Go compiler is very silly and doesn’t strip function names from binaries completely, as they are included in the .gopclntab section of the binary and used by the Go runtime. This section is a function table that contains each function’s address along with metadata like the function name. Luckily, other people have made extensions and scripts for recovering these function names! Personally, I use GolangAnalyzerExtension[2] for Ghidra to recover these function names, string literals, and interface/struct type definitions.
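To make the idea concrete, here is a minimal sketch of pulling function names out of .gopclntab with Go's own standard library (debug/elf and debug/gosym). This is not how GolangAnalyzerExtension works internally; the listFunctions helper is a name invented for this example, and the sketch assumes the binary still has intact section headers with a section literally named .gopclntab.

```go
package main

import (
	"debug/elf"
	"debug/gosym"
	"errors"
	"fmt"
	"os"
)

// listFunctions returns the function names recorded in the .gopclntab
// section of a Go ELF binary. It never touches the ELF symbol table, so a
// stripped binary works as long as the section headers are intact.
func listFunctions(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	pclntab := f.Section(".gopclntab")
	text := f.Section(".text")
	if pclntab == nil || text == nil {
		return nil, errors.New("missing .gopclntab or .text section")
	}
	data, err := pclntab.Data()
	if err != nil {
		return nil, err
	}

	// Passing a nil symbol table is deliberate: for Go 1.2+ binaries gosym
	// can build its function list from the pclntab data alone.
	table, err := gosym.NewTable(nil, gosym.NewLineTable(data, text.Addr))
	if err != nil {
		return nil, err
	}

	names := make([]string, 0, len(table.Funcs))
	for _, fn := range table.Funcs {
		names = append(names, fn.Name)
	}
	return names, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: listfuncs <go-elf-binary>")
		os.Exit(1)
	}
	names, err := listFunctions(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	for _, name := range names {
		fmt.Println(name)
	}
}
```

Run against a sample, this should print the runtime's functions alongside anything in package main, which is a quick sanity check before loading the binary into Ghidra.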
Another quirk is that Ghidra doesn’t understand Golang strings normally because the string type in Go is defined like this:
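A minimal sketch of that two-field layout, written in Go itself. The stringHeader type and headerOf helper are names invented for this example and are not runtime APIs; the sketch only assumes the pointer-then-length layout described in this section.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringHeader mirrors the two-word layout of a Go string value:
// a pointer to the bytes followed by the length.
type stringHeader struct {
	Data unsafe.Pointer // first byte of the string; literals usually live in .rodata
	Len  int            // length in bytes; there is no NUL terminator
}

// headerOf reinterprets a string value as its underlying header.
func headerOf(s *string) stringHeader {
	return *(*stringHeader)(unsafe.Pointer(s))
}

func main() {
	s := "tcp"
	h := headerOf(&s)
	fmt.Printf("data=%p len=%d size=%d bytes\n", h.Data, h.Len, unsafe.Sizeof(s))
}
```

Because the length travels with the pointer, nothing marks the end of the string in memory, which is why a tool that looks for NUL-terminated C strings misreads the data.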
The string’s char* pointer points into the .rodata section of the binary. This section contains a binary blob of all string literals used by the program and runtime, appended together into one massive string with no NUL terminators between them. Strings used by the program are statically or dynamically initialized into the string struct with data from this blob by the compiler or runtime. This also requires some scripting to help Ghidra understand, but the GolangAnalyzerExtension does this for us 🎉.
Type definitions for structs are also bundled into the binary for >= Go 1.5 and can be extracted with GolangAnalyzerExtension or GoReSym.
Random Quirks about Go
Most decompilers are designed to decompile binaries to a pseudo-C language, so using them on binaries written in languages like Go usually results in decompiler output that is very messy. Different tasks the Go runtime does, like garbage collection and exception handling, get mixed into the decompiler output, and reading it involves picking out which instructions belong to the runtime and which belong to the actual program you are reverse engineering. Go also does a lot of other things differently than C/C++, like calling conventions and data types.
First, let’s look at the Go compiler’s calling conventions. A calling convention defines how functions within a program are called, and how arguments and return values are passed between them. Most compilers targeting x86_64 systems pass arguments in registers and receive return values through registers. Go before 1.17, which includes the Go 1.13.8 toolchain that built this sample, instead passes all arguments on the stack and returns values on the stack as well (Go 1.17 and later use a register-based ABI on amd64). Most C/C++ compilers also have callee-saved registers that are used for temporary values throughout the execution of a function, whereas Golang spills temporaries onto the stack. For more information about the calling convention on an assembly level, Dr. Raphael Poss has a very thorough article on his blog.
How does this impact us when doing reverse engineering? Ghidra expects arguments to be passed through registers and doesn’t really understand Go’s calling convention, especially with optimized binaries. Enabling Decompiler Parameter ID analysis helps somewhat in identifying the number of arguments and return values, but many temporary values and variables will have to be manually analyzed by tracking the use of different stack memory addresses.
Now onto Go data types. As mentioned earlier, Go doesn’t use regular C-style strings like C/C++. Each string is a struct of a char*-like data pointer and an int value holding the string’s length. This is also true of many array/slice types in Go. Slice types are made up of a data pointer to the values in memory, an int value holding the length, and an int capacity value, which strings lack.[3] This means that when a function has a string or slice argument, you’ll usually see two or three arguments for each string/slice passed in Ghidra’s decompiler output. Interface types also consist of two pointers: one to a vtable and another to the interface data.
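The word counts above can be checked directly. This is a small sketch, assuming the standard gc toolchain; headerWords and wordSize are names made up for the example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// wordSize is the size of one pointer-sized machine word on this platform.
const wordSize = unsafe.Sizeof(uintptr(0))

// headerWords reports how many machine words a string, a slice, and an
// interface value each occupy. That count is roughly how many separate
// values a decompiler shows when one of them is passed to a function.
func headerWords() (str, slice, iface uintptr) {
	var s string // data pointer + length
	var b []byte // data pointer + length + capacity
	var e error  // table/type pointer + data pointer
	return unsafe.Sizeof(s) / wordSize,
		unsafe.Sizeof(b) / wordSize,
		unsafe.Sizeof(e) / wordSize
}

func main() {
	str, slice, iface := headerWords()
	fmt.Println("string words:   ", str)
	fmt.Println("slice words:    ", slice)
	fmt.Println("interface words:", iface)
}
```

A function taking two strings and returning two interface values therefore moves eight words in total, which is worth keeping in mind when counting stack slots in a listing.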
For example, here’s what a call to net.Dial(network, address string) (Conn, error)[4] from the malware’s code looks like in Ghidra:
With these quirks in mind and remembering that the decompiler output also includes instructions from the Go runtime, reverse engineering Go binaries isn’t too painful with the right tooling, the Go documentation and standard library source code handy, and a bit of pattern recognition… Most of the work becomes developing an intuition for which instructions belong to the runtime and then ignoring them, translating the rest of the instructions from decompiler output to Go source code. There are also fun quirks in Go’s concurrent Goroutines, WaitGroups,[5] and defer statements, but surely that won’t become a problem for us later…[6]
Back to the malware
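We figured out how Go works! Now back to the silly little malware that someone tried attacking me with. GolangAnalyzer and GoReSym managed to find this list of functions (note that the functions ending with .func1 are not overloads, since Go has no function overloading; they are compiler-generated names for anonymous functions defined inside the named function):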
Let’s start by reversing main.main like any sane person doing reverse engineering would do. This function is pretty simple if you ignore all the random instructions the Go runtime is doing: it runs main.ensureSingleInstance, sets up a couple of variables, and calls the main.listenForCommands function (note that any Go code in this section is my best-effort guess at translating Ghidra’s gibberish back to Golang and probably won’t compile):
The function has a reference to the InfectedNight Mirai variant, but from what I could find, there is no resemblance between this sample and the InfectedNight sample that IBM X-Force made a report on other than the string “InfectedNight_did_its_job!”.
main.ensureSingleInstance() is pretty simple: the function opens a TCP listener on a hard-coded port, and if the listener fails to open because a process is already listening on that port, it exits the program.
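The port-as-lock pattern described here can be sketched in a few lines. This is an illustration of the general technique, not the sample's code: acquireInstanceLock is a hypothetical name, and 127.0.0.1:45678 is a placeholder address, not the port the sample uses.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// acquireInstanceLock tries to bind a TCP listener on addr. While the
// returned listener stays open, a second bind on the same address fails,
// so the port behaves like a machine-wide lock.
func acquireInstanceLock(addr string) (net.Listener, error) {
	return net.Listen("tcp", addr)
}

func main() {
	// Placeholder address chosen for this sketch only.
	lock, err := acquireInstanceLock("127.0.0.1:45678")
	if err != nil {
		fmt.Println("another instance appears to be running:", err)
		os.Exit(1)
	}
	defer lock.Close()
	fmt.Println("holding instance lock on", lock.Addr())
}
```

For an analyst, the useful side effect is that a listening socket on that hard-coded port is a host-level indicator that the bot is running.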
main.listenForCommands() is where the code gets interesting. This function contacts the C2 server over TCP, performs a custom handshake, and then continuously listens for binary-encoded commands. My understanding of the handshake and custom TCP protocol is shaky because by the time I stopped procrastinating on this project, the C2 server had been taken down 😠 and I couldn’t try dynamic analysis against it. The function makes it clear that commands include a command number, a victim IP, and an attack duration, and that a switch statement selects between 4 different attack types and a command to kill the malware:[7]
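As a rough sketch of what decoding and dispatching such a command could look like: the wire layout below (1-byte command number, 4-byte IPv4 address, 4-byte big-endian duration in seconds), the command numbers, and every identifier are assumptions made up for illustration, since the real protocol could not be confirmed. The handlers are harmless stubs that only format a string.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
	"time"
)

// command is one decoded message. The field widths are hypothetical.
type command struct {
	ID       uint8
	Victim   net.IP
	Duration time.Duration
}

// commandSize is the assumed fixed message length: 1 + 4 + 4 bytes.
const commandSize = 9

// decodeCommand parses a fixed-size binary command from buf.
func decodeCommand(buf []byte) (command, error) {
	if len(buf) < commandSize {
		return command{}, errors.New("short command")
	}
	return command{
		ID:       buf[0],
		Victim:   net.IPv4(buf[1], buf[2], buf[3], buf[4]),
		Duration: time.Duration(binary.BigEndian.Uint32(buf[5:9])) * time.Second,
	}, nil
}

// stubHandler stands in for an attack handler; it only reports what it was asked.
func stubHandler(name string, cmd command, results chan<- string) {
	results <- fmt.Sprintf("%s: target %s for %s", name, cmd.Victim, cmd.Duration)
}

// dispatch picks a handler by command number and starts it as a Goroutine.
// It reports whether the command number was recognized.
func dispatch(cmd command, results chan<- string) bool {
	switch cmd.ID {
	case 1: // hypothetical command number
		go stubHandler("command 1", cmd, results)
	case 2: // hypothetical command number
		go stubHandler("command 2", cmd, results)
	default:
		return false
	}
	return true
}

func main() {
	raw := []byte{1, 192, 0, 2, 10, 0, 0, 0, 30}
	cmd, err := decodeCommand(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	results := make(chan string, 1)
	if dispatch(cmd, results) {
		fmt.Println(<-results)
	}
}
```

A decoder like this is mostly useful on the defensive side, for labelling captured C2 traffic once the real field layout is known.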
If you know anything about Go, you’ll notice that this is our first function with concurrency! The code above has silly syntax where it says go f(), which calls that function as a Goroutine.
Super fast intro to Goroutines and WaitGroups:
A Goroutine is how you can split a function off into its own little thread so that you can do tasks concurrently. However, if you split a task off and want to wait for it to finish, you need some way of knowing when that thread is complete. This is what WaitGroups do: they are basically just a counter that lets you count how many Goroutine threads are active. By passing a pointer to the WaitGroup to the Goroutine as an argument, the Goroutine can decrement the counter when it has finished executing.
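A minimal sketch of that counter in action, using only the standard sync package. The runWorkers and worker functions are names invented for this example and do nothing except count themselves.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers starts n Goroutines and blocks until every one has finished.
// It returns how many workers ran to completion.
func runWorkers(n int) int64 {
	var wg sync.WaitGroup
	var finished int64

	for i := 0; i < n; i++ {
		wg.Add(1) // one more Goroutine to wait for
		go worker(&wg, &finished)
	}

	wg.Wait() // blocks until the counter drops back to zero
	return atomic.LoadInt64(&finished)
}

// worker takes a pointer to the WaitGroup; a WaitGroup copied by value
// would be a separate counter that the parent never sees change.
func worker(wg *sync.WaitGroup, finished *int64) {
	atomic.AddInt64(finished, 1)
	wg.Done() // decrement the counter now that the work is complete
}

func main() {
	fmt.Println("workers finished:", runWorkers(4))
}
```

In decompiler output, the Add, Done, and Wait calls show up as calls into sync.(*WaitGroup), which makes them handy landmarks for spotting where Goroutines start and where the parent waits.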
Why did I mention that these would become a problem earlier? Because Ghidra struggles to produce comprehensible decompiler output when Goroutines are called! The line go handleAttack(victim, duration) ends up looking like this in Ghidra:
The runtime.newproc function is how Goroutines are actually implemented, but this output has no information about the Goroutine’s arguments! Finding the arguments for the Goroutine basically involves reverse engineering the routine’s function signature, staring at the assembly to see what is at the top of the stack at the given moment, and making educated guesses about what the function expects. I said reverse engineering Go wasn’t toooooooooo painful, right? Surely this won’t be the only weird Golang construct that will give us problems…
Anyways, I love the super descriptive function names we are getting; it’s funny that our attacker tried to hide these names from us by stripping the binary and the Go compiler included them anyway. Let’s continue down the rabbit hole of questionable design decisions in these attacks.
handleAttack()
I started looking at this function thinking it would be a pretty standard TCP packet spam attack… but wait, what’s sendMinecraftPackets()? Also, we have our first use of WaitGroups here!
Turns out our malware author decided the most effective way of doing a TCP packet spam attack was to simultaneously send large TCP packets and simulated Minecraft packets, for some reason. Those two functions look something like this:
You might notice we have another weird Golang construct here, the defer statement! A defer statement is used to make a function call happen at the end of the function, and is usually used for function cleanup. These also look very weird in our decompiler output and are something you have to look out for. The defer statement itself looks like this:
and the return statement for any function with a defer statement ends up looking like this:
Looking out for these lines in your decompiler output lets you know what function is being deferred and when it eventually gets called. In this case, the function defers calling the Done() method of our handleAttack() function’s WaitGroup, letting the parent function know that the routine is finished by decrementing the WaitGroup counter when it is done running. defer statements are also used throughout the program to close network sockets when a function completes execution.
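Here is a small sketch of the defer pattern at source level, so there is something to match the decompiler output against. As background, Go 1.13 generally lowers a defer statement to a call to runtime.deferproc or runtime.deferprocStack and emits a call to runtime.deferreturn before the function returns. The deferredOrder function is a name made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// deferredOrder runs a Goroutine that defers two calls and returns the
// order in which the body and the deferred cleanup actually executed.
func deferredOrder() []string {
	var order []string
	var wg sync.WaitGroup
	record := func(step string) { order = append(order, step) }

	wg.Add(1)
	go func() {
		// Deferred calls run last-in, first-out when the function returns,
		// so record("cleanup") runs before wg.Done().
		defer wg.Done()
		defer record("cleanup")
		record("body")
	}()

	wg.Wait() // returns only after the deferred wg.Done() has run
	return order
}

func main() {
	fmt.Println(deferredOrder())
}
```

Putting wg.Done() in a defer means the counter is decremented on every return path, which is presumably why the pattern shows up in every attack Goroutine.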
SendRawTCP and SimulateMinecraftPacket are pretty simple functions that send TCP data using sockets and generate Minecraft packet data respectively.
handleUDPFlood(), handleIPIPAttack(), and handleTCPJunkAttack()
With all the quirks of Golang finally out of the way for the rest of this binary, figuring out what each function and attack does isn’t too hard. None of the other attacks have as questionable or interesting code as handleAttack() 😔.
All the attack functions are broken down into one top-level handle[attack_name]() function that handles WaitGroups and calls Goroutines to make attacks run with multiple threads, a handle[attack_name].func1() closure (an anonymous function inside the handler, which the compiler names after its parent) that handles decrementing the WaitGroup, and lastly a send[attack_name]() function that has the implementation of the actual attack.
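The .func1 naming can be reproduced with a harmless stand-in. In this sketch, handleDemo, sendDemo, and isFirstClosure are hypothetical names mirroring the handle/func1/send split described above; sendDemo does nothing at all.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
	"strings"
	"sync"
)

// handleDemo plays the role of a handle[attack_name]() function: it starts
// a Goroutine from an anonymous function and returns the symbol name the
// compiler gave that anonymous function.
//
//go:noinline
func handleDemo() string {
	var wg sync.WaitGroup
	task := func() {
		defer wg.Done() // the closure is what decrements the WaitGroup
		sendDemo()
	}
	wg.Add(1)
	go task()
	wg.Wait()
	return runtime.FuncForPC(reflect.ValueOf(task).Pointer()).Name()
}

// sendDemo stands in for a send[attack_name]() worker; it is a no-op.
func sendDemo() {}

// isFirstClosure reports whether name looks like the first anonymous
// function declared inside some parent function.
func isFirstClosure(name string) bool {
	return strings.HasSuffix(name, ".func1")
}

func main() {
	name := handleDemo()
	fmt.Println(name, isFirstClosure(name))
}
```

A second anonymous function in the same parent would get the next number, so a name like handleX.func2 in a function list points at another closure rather than a variant of handleX.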
handleUDPFlood() is a simple UDP data spam attack that just spams random data at the target.
handleIPIPAttack() is an implementation of an IP fragmentation attack. It uses a custom IPv4Header struct that GolangAnalyzer defined for us.
handleTCPJunkAttack() appears to be a SYN flood attack. It uses a custom TCPHeader struct.
And just like that, we’ve figured out everything that the malware bot does! It is a simple DDoS bot that communicates with a C2 server over raw TCP and implements 4 different types of DDoS attacks using Goroutines for concurrency.
Overall thoughts
I tried searching around grep.app and GitHub to see if the code for this malware had been leaked or was reused from somewhere else. The function name SimulateMinecraftPacket seemed unique enough that if it existed, I would find a match. Nothing showed up, which seems to suggest the malware author wrote this code from scratch.
The lack of any cryptography, the hard-coded information like C2 IPs, and the lack of any detection evasion other than self-deletion seem to suggest that the malware author is not very competent. I wish the C2 server that this malware used was still up so that I could try dynamically testing this sample and dissect its network protocol completely. Alas, that is not the case, so I got as much information as I could statically.
However, it was a fun exercise in learning how Go binaries work and discovering all the weird quirks of the language! Might take a break from malware research for now and look at doing some vulnerability research projects. Thanks for reading!
[1] Also was learning Go as a programming language at the same time, never really worked with it before. This is the worst possible way to learn Go; if you are actually interested in the language, decompiling the language is a pretty painful way to do it.
[2] An amazing Ghidra extension that I am now the Nixpkgs package maintainer for!
[3] strings in Go are not mutable and as such don’t have a capacity value.
[4] func Dial(network, address string) (Conn, error), for those unfamiliar with Go syntax, means two arguments named network and address of type string, and two return values that are of type net.Conn and error.
[5] Definitely did not need to call a friend who works for Google to get him to explain WaitGroups properly to me… /s
[6] ”…reverse-engineering Go binaries is about as fun as pulling teeth from an adorable blue gopher.” - Emily Trau (2022)
[7] The switch statement was basically a really weird double-nested if-else statement for some reason… “Wait till you learn that switch statements is actually syntactic sugar for if-else” - my friend at Google