eBPF and bcc: Your First Program

Created at 2019-08-31 23:55:02 UTC
Last modified at 2019-09-01 22:44:28 UTC
Tags: ebpf, software

Skellig Michael in County Kerry, West Ireland. I'm going with the CS textbook technique of using unrelated cover pictures

In the last post, I gave a brief summary of what eBPF and bcc are. In this blog I'll summarize how to install eBPF + bcc and write your first (Python) program, which amounts to a hello world.

Installation

Let's begin with a bit of context. eBPF is not something you install. As mentioned in the last blog entry, eBPF is the new, extended form of a previously existing syscall. The bare minimum kernel version required is 4.4. In my case, I'm running 4.15 on Ubuntu 18.04:

$ uname -a
Linux ########### 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.3 LTS
Release:        18.04
Codename:       bionic
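If you'd rather check the kernel version from Python than eyeball the uname output, something like the following should do. It's a minimal sketch: the helper names are mine, and the 4.4 threshold is simply the minimum quoted above.

```python
import platform
import re

MINIMUM_KERNEL = (4, 4)  # the minimum version quoted in this post


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '4.15.0-58-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_new_enough(release, minimum=MINIMUM_KERNEL):
    """True if the release is at least `minimum` (plain tuple comparison)."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # on Linux, the same string `uname -r` prints
    try:
        verdict = "OK" if kernel_is_new_enough(release) else "too old"
    except ValueError:
        verdict = "could not parse"
    print(release, verdict)
```

Comparing tuples rather than strings matters here: as strings, "4.15" would sort before "4.4".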

Next up, you need to install bcc, or the eBPF compiler collection (notice how we love to ignore that starting e, it's a common trope). The main repo with bcc can be found on GitHub, under iovisor/bcc. Their install instructions are actually quite good and extensive, but I want to note a couple of pitfalls I found with the installation guide for Ubuntu.

It turns out that, while Ubuntu does have its own package for bcc, it's not the most up to date, and it doesn't come with a Python 3 package, only a Python 2 one. This is particularly bad considering IT'S ENCOURAGING PEOPLE TO STILL WRITE NEW PYTHON 2 CODE IN 2019 DAMMIT. Anyways, to get around the issue, I recommend using iovisor's upstream, signed packages, and additionally installing the Python 3 package (yes, it's not installed by default. No, I don't know why). Basically run:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 4052245BD4284CDD
echo "deb https://repo.iovisor.org/apt/$(lsb_release -cs) $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/iovisor.list
sudo apt update
sudo apt install bcc-tools libbcc-examples linux-headers-$(uname -r) python3-bcc

I also recommend installing ipython:

$ sudo apt install python3-ipython

Not because it's necessary, but it's great for playing around with bcc.
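Before moving on, it's worth confirming that Python 3 can actually see the bcc module. Here's a small sketch of such a check (the function name is just something I made up). It only looks the module up without importing it, so it doesn't need root:

```python
import importlib.util


def bcc_is_importable():
    """True if Python 3 can locate the `bcc` module (without importing it)."""
    return importlib.util.find_spec("bcc") is not None


if __name__ == "__main__":
    if bcc_is_importable():
        print("bcc found: python3-bcc looks installed")
    else:
        print("bcc not found: check that python3-bcc is installed")
```

If this reports that bcc is missing even though you installed bcc-tools, the likely culprit is the separate python3-bcc package mentioned above.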

First Program

Now to the fun part. How do you actually get a working bcc hello world program? Well, let me show you what the hello world of bcc looks like:

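from bcc import BPF

# The BPF program itself: restricted C, compiled by bcc when the BPF object is created.
BPF_PROGRAM = r"""
int hello(void *ctx) {
  bpf_trace_printk("Hello world! File opened\n");
  return 0;
}
"""

# Compile the program, then run hello() on every entry to the clone() syscall.
bpf = BPF(text=BPF_PROGRAM)
bpf.attach_kprobe(event=bpf.get_syscall_fnname("clone"), fn_name="hello")

while True:
    try:
        # trace_fields() returns (task, pid, cpu, flags, timestamp, message).
        (_, _, _, _, _, msg_b) = bpf.trace_fields()
        msg = msg_b.decode('utf8')
        if "Hello world" in msg:
            print(msg)
    except ValueError:
        # Lines in another format fail to unpack; skip them.
        continue
    except KeyboardInterrupt:
        break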

Ok, that's quite a bit of code for a hello world. Let's step through it. First off, imports:

from bcc import BPF

This will import the class we use to create the BPF object.

BPF_PROGRAM = ...

This large string contains the actual BPF code itself. Notice it's basically C code with a few restrictions imposed by the eBPF virtual machine. This takes us down a bit of a tangent to talk about:

The eBPF Verifier

One of the biggest issues with letting anyone with root access write and execute code inside the kernel is that they could, accidentally or maliciously, write code that causes the kernel to panic, seize up, or otherwise act in unsafe ways. To avoid this, the kernel verifies that your code is safe before executing it. It restricts you from reading memory outside a given range, rejects any jump that lands outside well-defined space as well as any jump backwards, and, if the flags are set properly, also prohibits pointer arithmetic. You can read a bit more about the verifier here, under The eBPF in-kernel verifier, but the gist is that it's extremely cautious. The biggest consequence is that you cannot create any loop in your code (lest it become an infinite loop and hang your kernel); bounded loops only became acceptable to the verifier in Linux 5.3, which is newer than the kernel used here. Notice that this means that, technically, the BPF language is not Turing-complete (which, considering where it runs, might be for the best).
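To make the jump rules a bit more concrete, here's a toy model in Python. To be clear, this is not how the kernel's verifier is implemented: the instruction format is invented for illustration, and the real thing checks far more (memory accesses, register state, and so on). It only demonstrates the two jump checks described above: nothing may jump backwards, and nothing may jump outside the program.

```python
def toy_verify(program):
    """Reject programs with backward or out-of-range jumps.

    `program` is a list of tuples: ("op",) for a plain instruction, or
    ("jmp", offset) for a jump relative to the next instruction.
    This is a made-up instruction format, not real eBPF bytecode.
    """
    for index, instruction in enumerate(program):
        if instruction[0] != "jmp":
            continue
        offset = instruction[1]
        target = index + 1 + offset
        if offset < 0:
            return False, f"instruction {index}: backward jump (possible loop)"
        if target >= len(program):
            return False, f"instruction {index}: jump outside the program"
    return True, "ok"


# Three hypothetical programs: one fine, one looping, one jumping off the end.
straight_line = [("op",), ("jmp", 1), ("op",), ("op",)]
looping = [("op",), ("op",), ("jmp", -3)]
runaway = [("op",), ("jmp", 10)]

for program in (straight_line, looping, runaway):
    print(toy_verify(program))
```

With only forward jumps allowed, every path through the program has to terminate, which is exactly why loops are off the table.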

Back to the topic at hand. Let's look at the hello() function in our bcc program:

int hello(void *ctx) {
  bpf_trace_printk("Hello world! File opened\n");
  return 0;
}

There are a few things happening here, but the most important part is the bpf_trace_printk() function. Where did that come from and what does it do? Well, this is one of many BPF helpers: functions provided by the BPF virtual machine to perform some basic tasks. These are then wrapped in bcc as C functions. This particular helper lets you print to the trace pipe, found at /sys/kernel/debug/tracing/trace_pipe.
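Since the trace pipe is just a file, you can also peek at it without bcc. The sketch below reads a handful of lines from it; the function name and the limit parameter are my own inventions. Bear in mind that reading the real pipe needs root and blocks until something writes to it.

```python
import itertools

TRACE_PIPE = "/sys/kernel/debug/tracing/trace_pipe"


def read_trace_lines(path=TRACE_PIPE, limit=5):
    """Return up to `limit` decoded lines from the trace pipe (or any file)."""
    with open(path, "rb") as pipe:
        raw_lines = itertools.islice(pipe, limit)
        # Decode leniently: the pipe is shared, so other tools may write to it.
        return [raw.decode("utf8", errors="replace").rstrip("\n") for raw in raw_lines]
```

Calling read_trace_lines() as root while a BPF program is printing should return the next five lines written to the pipe. Passing a different path is handy for trying the function out on an ordinary file.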

Where do I find all these random helper functions, you may ask? Well, there's a README for that! https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md. Save this link with your life, in your bookmarks, or wherever you might save such links. It's a lifesaver of a reference guide when writing bcc programs.

Next up, the actual BPF initialization:

bpf = BPF(text=BPF_PROGRAM)
bpf.attach_kprobe(event=bpf.get_syscall_fnname("clone"), fn_name="hello")

Here we do two things. First, we compile our BPF program and get it ready to load into the kernel (first line). Second, we attach our function hello() to a kprobe, so that it runs whenever that probe fires (second line).

What the hell does that mean? Well, this brings us to the next topic:

What exactly am I tracing with BPF?

Broadly speaking, there's 4 categories of things you can attach your program to and trace:

kprobes
uprobes
tracepoints
USDT (User Statically-Defined Tracing)

Is this absolutely everything you can do with eBPF? Well, no, but for now let's limit ourselves to these 4 things. I've chosen this ordering for a reason. We can put these 4 in a table like so:

              User programs   Kernel
Old (ad-hoc)  uprobes         kprobes
New (stable)  USDT            tracepoints

So what does this separation mean? Well, it turns out that, with eBPF, not only can you trace kernel functions and entry/exit points, but you can also trace user-space code. How you do either depends on whether you want to use the old or new functionality. The old ones, uprobes and kprobes, have existed for longer than eBPF has been around. kprobes in particular were originally there so kernel developers could debug issues in the kernel, using handler code that had to be loaded into the kernel itself as a module.

However, the biggest downside to kprobes is that they don't have a defined interface: they hook ordinary kernel functions, so whenever a new version of the kernel comes along, a subset of them could completely change and become incompatible. There is no guarantee or maintenance of either API or ABI. To get around this issue, kernel developers decided to create a more formalized tracing API, which makes actual guarantees on consistency and compatibility. They peppered these tracepoints all around the kernel, at useful-to-trace points, like syscall entries and exits, interrupts, TCP events, and signals. You can find most of these in /sys/kernel/debug/tracing/events.
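If you want to see which tracepoints your own kernel offers, you can walk that directory. Here's a rough sketch (the function name is mine) that assumes the usual layout of one directory per category with one subdirectory per event. It builds names in the category:event form:

```python
import os

EVENTS_ROOT = "/sys/kernel/debug/tracing/events"


def list_tracepoints(root=EVENTS_ROOT):
    """Return 'category:event' names found under `root`, ordered by category then event."""
    names = []
    for category in sorted(os.listdir(root)):
        category_dir = os.path.join(root, category)
        if not os.path.isdir(category_dir):
            continue  # skip plain control files such as 'enable'
        for event in sorted(os.listdir(category_dir)):
            if os.path.isdir(os.path.join(category_dir, event)):
                names.append(f"{category}:{event}")
    return names


if __name__ == "__main__":
    try:
        for name in list_tracepoints()[:10]:
            print(name)
    except OSError as error:  # usually "not root" or debugfs not mounted
        print(f"could not list {EVENTS_ROOT}: {error}")
```

The category:event spelling is also what bcc's attach_tracepoint() expects in its tp argument, for example "syscalls:sys_enter_clone".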

Meanwhile, in user space, we have another two parallel concepts: uprobes and USDTs. They serve analogous purposes to kprobes and tracepoints respectively. The difference is that these events get published from user space as opposed to coming from the kernel. They're extremely useful debugging tools.
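For reference, here's how the four quadrants map onto the bcc Python API, written as a small lookup table. The dictionary and helper are only an illustration of mine. The method names are bcc's, but the arguments are elided, so check the reference guide for the exact parameters.

```python
# (where the traced code lives, old/new style) -> bcc Python entry point
ATTACH_ENTRY_POINTS = {
    ("kernel", "old"): "BPF.attach_kprobe(event=..., fn_name=...)",
    ("kernel", "new"): "BPF.attach_tracepoint(tp=..., fn_name=...)",
    ("user", "old"): "BPF.attach_uprobe(name=..., sym=..., fn_name=...)",
    ("user", "new"): "USDT(...).enable_probe(probe=..., fn_name=...)",
}


def entry_point_for(space, style):
    """Look up the bcc call for a quadrant, e.g. ("kernel", "old")."""
    try:
        return ATTACH_ENTRY_POINTS[(space, style)]
    except KeyError:
        raise ValueError(f"unknown quadrant: {space!r}, {style!r}") from None


for (space, style), call in ATTACH_ENTRY_POINTS.items():
    print(f"{space:6} {style:3} -> {call}")
```

USDT is the odd one out: rather than calling a method on the BPF object, you build a USDT context first and hand it to BPF() when you create it.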

Now that we have this context, we can finally explain our line of code:

bpf.attach_kprobe(event=bpf.get_syscall_fnname("clone"), fn_name="hello")

Basically, this tells BPF that, whenever the kprobe with the name specified by the parameter event fires, it should call the function hello() in our BPF program. Here, we're associating our code with the kprobe for the clone() syscall, so every time said syscall gets called anywhere on the system, hello() will get executed. To figure out the specific event that corresponds to that syscall, we fetch the name of the kernel function using get_syscall_fnname(), a method provided by the Python bcc library (which, on my system, evaluates to "sys_clone").
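Why go through a lookup at all instead of hardcoding "sys_clone"? Because the kernel function behind a syscall isn't named the same everywhere: newer x86_64 kernels, for instance, use an "__x64_sys_" prefix. The sketch below illustrates the idea only and is not bcc's actual implementation: the function, the prefix list, and the symbol sets are stand-ins I wrote for demonstration.

```python
# A couple of prefixes that kernels have used for syscall entry functions.
CANDIDATE_PREFIXES = ["sys_", "__x64_sys_"]


def resolve_syscall_fnname(name, kernel_symbols):
    """Return the first prefixed name that exists in `kernel_symbols`."""
    for prefix in CANDIDATE_PREFIXES:
        candidate = prefix + name
        if candidate in kernel_symbols:
            return candidate
    raise LookupError(f"no known kernel function for syscall {name!r}")


# Hypothetical symbol tables standing in for two different kernels.
older_kernel = {"sys_clone", "sys_open"}
newer_kernel = {"__x64_sys_clone", "__x64_sys_open"}

print(resolve_syscall_fnname("clone", older_kernel))
print(resolve_syscall_fnname("clone", newer_kernel))
```

Letting the library pick the name means the same script keeps working when you move it to a machine with a different kernel.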

I can't emphasize enough what we just did here, so I'll say it again. Every time any program on the machine calls the clone() system call, the hello() function of our program gets executed. Basically, every time you type a command into bash and press enter, our code gets executed. Every time a background process kicks off grep because the programmers couldn't be bothered to put a regex library in their binary, our code gets executed. Every time systemd decides to kick off some new process, our code gets executed. This gives us a truly awe-inspiring amount of control and visibility over our machine. Whereas a program that polls for process information (like top or ps) might miss a process that starts and ends quickly, an eBPF-powered program won't miss a single process going through your system.

Continuing on with our code (almost done here, just bear with me a bit more), we get to the body of our infinite loop:

(_, _, _, _, _, msg_b) = bpf.trace_fields()
msg = msg_b.decode('utf8')
if "Hello world" in msg:
    print(msg)

So, what does the trace_fields() method do? Basically, it reads the messages output in /sys/kernel/debug/tracing/trace_pipe and hands us each one as a 6-tuple of data (sometimes it fails to unpack values because lines in a different format get published, hence the except ValueError). The tuple carries very useful information, such as the relevant PID, the timestamp, and the message printed out. We then search for "Hello world" in the message, and if it's present, we print out the message and call it a day.
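To get a feel for what that unpacking involves, here's a simplified parser for a single trace pipe line. Treat it as a sketch under assumptions: the sample line is invented, the exact line format varies between kernel versions, and bcc's own parsing is done differently. Like trace_fields(), it raises ValueError for lines it doesn't understand.

```python
import re
from collections import namedtuple

TraceFields = namedtuple("TraceFields", "task pid cpu flags ts msg")

# Assumed layout: "<task>-<pid> [<cpu>] <flags> <timestamp>: <symbol>: <message>"
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<sym>[^:]+):\s(?P<msg>.*)$"
)


def parse_trace_line(line):
    """Split one trace pipe line into a 6-tuple, or raise ValueError."""
    match = TRACE_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"unexpected trace line format: {line!r}")
    return TraceFields(
        task=match.group("task"),
        pid=int(match.group("pid")),
        cpu=int(match.group("cpu")),
        flags=match.group("flags"),
        ts=float(match.group("ts")),
        msg=match.group("msg"),
    )


# An invented line in the assumed format, for demonstration only.
SAMPLE_LINE = "            bash-1234  [001] .... 5678.123456: 0x00000001: Hello world! File opened"

fields = parse_trace_line(SAMPLE_LINE)
print(fields.task, fields.pid, fields.msg)
```

The trace pipe is shared by everything on the system that writes to it, which is why filtering on "Hello world" afterwards is still a good idea.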

Conclusion

To summarize, we managed to create a working Hello World program in eBPF. We were able to write a program that prints out "Hello world!" after every single clone() syscall going through our system (a bit of a strange use of eBPF, but ok). Next up, we'll talk about how you might make a more useful eBPF program (probably in C++, since performance might be key in eBPF programs running in production). If you want to get the code I'm using for these tutorials, go to my GitHub repo here. You can read the next entry in this tutorial here.