Debug & Chill #1
Hi there!
Welcome to the first episode in my debugging series. I originally started writing this back in 2020, but only now have I had the time to translate and rewrite it into a fun, hands-on exploration of debugging. Throughout this series, I’ll show you various ways to approach problems, explaining my thought process along the way.
Let’s dive right into the first debugging session!
Background
It’s around 2020, and my coworker has told me about a new technology called OpenNebula (often referred to as “one”). If you’re unfamiliar, here’s a brief explanation:
“OpenNebula is an open source platform delivering a simple but feature-rich and flexible solution to build and manage enterprise clouds for virtualized services, containerized applications, and serverless computing.” — ReadMe.md, 2025
In essence, it’s somewhat similar to the VMware stack (ESXi, vSphere, etc.). It lets you create virtual machines, set up networks, and connect these VMs much like you would with any major cloud provider.
I won’t delve into broader virtualization topics here, but stay tuned: a future episode will cover that in more depth.
The Setup
For this scenario, my coworker created two Ubuntu VMs: Client and Server, both on the latest Ubuntu version. He also set up a network between them and assigned IPs on the same subnet.
So the network looks something like this:
The Problem
When we try to run the following command on the Client VM:
ssh root@1.1.1.2
…it just hangs. No output, no prompt—nothing.
What would you do in this situation?
(Pause here if you want to think about possible issues before reading on.)
From experience, I can think of three main categories of problems:
1. Something is misconfigured on one of the machines.
2. There’s a network configuration problem within OpenNebula.
3. There’s a hardware issue causing packet drops.
To check for network/hardware issues (2 and 3), I ran a simple:
ping 1.1.1.2
and the ping worked flawlessly. That rules out basic connectivity problems.
Next, I tried:
python3 -m http.server
on the Server VM, then from the Client:
curl http://1.1.1.2:8000/
No luck—curl never got a response. This suggested an issue at the transport layer (TCP/UDP) rather than raw connectivity (ICMP ping was fine, but TCP traffic failed).
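To make the "hang" more precise, it can help to distinguish a connection that is actively refused from one that silently times out. The following is a minimal sketch, not something from the original session: `probe_tcp` is a hypothetical helper name, and the 3-second timeout is an arbitrary assumption.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', or 'timeout'.

    'refused' means something answered with a reset, so packets are flowing.
    'timeout' means no answer came back at all, which is what a dropped SYN
    or a dropped SYN-ACK looks like from the client's point of view.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Anything else (no route, unreachable network, ...) is reported as-is.
        return f"error: {exc}"
```

Run from the Client against `1.1.1.2` on port 8000, a result of `timeout` rather than `refused` would be consistent with the hanging curl described above: nothing is answering, not even with a rejection.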
Digging Deeper With Tools
tcpdump is a personal favorite for troubleshooting network issues because it lets you watch packets in real time. I ran it on both VMs while attempting the curl command again. On the Client side, I could see a SYN packet being sent, but it never appeared on the Server side. If packets aren’t arriving, you need to figure out where they’re getting dropped.
A critical thing to know about tcpdump is that it captures inbound packets before firewall tools like iptables process them. So if a packet never shows up in an inbound capture, the local firewall on that machine can’t be what dropped it.
In general, the “flow” of a packet in Linux goes something like this:
IN: Wire -> NIC -> tcpdump -> netfilter/iptables
OUT: iptables -> tcpdump -> NIC -> Wire
If you’re interested in an in-depth explanation, you can read this article.
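The reasoning that follows from this packet flow can be written down as a small decision function. This is only a sketch of the logic, assuming one capture on the sender and one on the receiver; `locate_drop` and its return strings are made up for illustration.

```python
def locate_drop(seen_leaving_client: bool, seen_arriving_server: bool) -> str:
    """Narrow down where a packet was lost from two tcpdump observations.

    Relies on the capture points in the flow above: outbound packets are
    captured after iptables, inbound packets before it.
    """
    if seen_leaving_client and seen_arriving_server:
        # The packet reached the server's capture point, so any drop happens
        # later: the server's netfilter/iptables rules or the application.
        return "after server capture: check server firewall or application"
    if seen_leaving_client and not seen_arriving_server:
        # It passed the client's firewall but never reached the server's
        # capture point, so the server's firewall cannot be responsible.
        return "between capture points: check NICs, drivers, virtual network"
    if not seen_leaving_client and not seen_arriving_server:
        # It never reached the client's outbound capture point.
        return "before client capture: check client routing or firewall"
    # Seen on the server but not on the client: the captures disagree.
    return "inconsistent: check capture interface and filter"


# The situation in this post: SYN visible on the Client, absent on the Server.
print(locate_drop(seen_leaving_client=True, seen_arriving_server=False))
```

In this session the second branch applies, which is why the next step looks at the NIC and driver settings rather than at iptables rules.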
Another valuable tool is ethtool, which allows you to query and adjust network driver settings. Running:
ethtool -k eth0
on both VMs, I noticed TCP offload was enabled.
TCP Offload is a technique that offloads some IP or TCP tasks to the network interface card (NIC) itself. It can dramatically improve performance under heavy loads, but if the driver configuration is buggy or if there’s an incompatibility, it can result in packet loss or other odd behavior.
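The `ethtool -k` listing is long, so filtering it down to the features that are switched on makes it easier to compare two machines. Here is a minimal sketch that parses the `name: on|off` lines of that output. The helper name `enabled_features` is hypothetical, and the sample text is illustrative only, not the output captured during this session.

```python
from typing import List


def enabled_features(ethtool_output: str) -> List[str]:
    """Return the feature names that `ethtool -k` reports as 'on'."""
    features = []
    for line in ethtool_output.splitlines():
        name, sep, state = line.strip().partition(":")
        if not sep:
            continue  # not a "name: value" line
        tokens = state.split()
        # The state may be followed by a marker such as "[fixed]".
        if tokens and tokens[0] == "on":
            features.append(name)
    return features


# Illustrative sample in the shape of `ethtool -k eth0` output (assumed values).
SAMPLE = """\
Features for eth0:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
tcp-segmentation-offload: on
generic-receive-offload: off
"""

print(enabled_features(SAMPLE))
```

Feeding it the real output from both VMs (for example via `subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout`) would give two short lists that are easy to compare.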
I decided to disable it temporarily:
ethtool -K eth0 tx off rx off
With offloading disabled, I retried the HTTP request and… success! SSH also worked again.
While I didn’t investigate the deeper cause, I suspect it relates to driver settings or OpenNebula’s own host configuration. Turning off offloads resolved the issue in this case.