eBPF and bcc: An Introduction
Whenever you look around at acronyms in technology, especially in systems, you find all these legacy names based on uses for those tools and systems that don't really make sense anymore. Instead of fixing the names, we've all collectively agreed to ignore the unhelpful names and pretend that they make perfect sense. eBPF is not only a prime example of this, but I'd go so far as to say that it is the example of this issue. No other acronym feels so woefully inadequate to describe what something does.
eBPF stands for the (extended) Berkeley Packet Filter. It is basically a Linux kernel facility, driven through a single syscall, that allows you to run (almost) arbitrary bytecode in a virtualized environment inside the kernel. This bytecode can do an extremely wide range of tasks, from tracing syscalls to catching and analyzing signals, memory mappings, and interrupts, and, as a bonus, it can even filter packets every now and then (actually, it has a very good set of tracing tools for the TCP/IP stack, but that doesn't quite have the same punch, does it?). It's an extremely powerful tool for extracting a lot of information from your system without disrupting it much in the process. The bytecode is checked by an in-kernel verifier and then typically JIT-compiled to native machine code (x86, for instance), so it leaves a minimal footprint on the kernel.
So, you may ask, why should you care? Simply put, there's no other tool that will give you as good access to metrics on the kernel. It lets you extract data out of the kernel and send it elsewhere, be that a file, someone's terminal, or even some external service, which could, for example, aggregate data from your fleet of thousands of machines and let you investigate issues with an impressive level of detail. In short, eBPF is like that magical toolbox filled with tools you have no idea how to use, but which you know will somehow get you out of some issue at some nebulous point in the future.
Let's talk a little bit about how you would get started on eBPF. Everything related to eBPF goes back to the bpf() syscall. Give that man page a quick read (or even better, read the one on your Linux system), but note that most development for eBPF does not use the syscall directly. Most users who don't feel a sudden urge for masochism will use bcc.
bcc, or the BPF Compiler Collection, is a set of tools that let you write C-style code and compile it down to eBPF bytecode. It also provides a lot of tooling and wrappers to make it easier to manage your code and control how you'll get the data back. Overall, it makes eBPF development a breeze. It has multi-language support, including both C++ and Python, making it very versatile. I recommend taking a look at the GitHub repo, as well as at some of the examples, to get an idea of what you can do with bcc.
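To make that a bit more concrete, here is a minimal sketch of what a bcc program can look like from Python, in the spirit of the "hello world" example in the bcc repo. It assumes the bcc Python bindings and matching kernel headers are installed and that you run it as root; the names BPF_PROGRAM, hello, and run are my own choices for illustration, not anything bcc requires.

```python
# Minimal bcc sketch: print a trace line whenever a process calls clone().
# Assumptions: bcc's Python bindings are installed and the script runs as root.

# The eBPF side, written in restricted C. bcc compiles it to eBPF bytecode
# when the BPF object is created.
BPF_PROGRAM = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello from eBPF!\n");
    return 0;
}
"""


def run(program: str = BPF_PROGRAM) -> bool:
    """Load the program, attach it to clone(), and stream its trace output."""
    try:
        # Imported lazily so this file can still be loaded without bcc.
        from bcc import BPF
    except ImportError:
        print("bcc is not installed; see the bcc GitHub repo for instructions.")
        return False

    b = BPF(text=program)
    # The kernel symbol for a syscall varies by architecture and version,
    # so let bcc resolve it instead of hard-coding a name.
    b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
    print("Tracing clone()... press Ctrl-C to stop.")
    try:
        b.trace_print()  # blocks, printing lines from the kernel trace pipe
    except KeyboardInterrupt:
        pass
    return True


if __name__ == "__main__":
    run()
```

Run it with sudo, open another terminal, and start a few processes; each clone() call should produce one line of output. Note that bpf_trace_printk is a debugging convenience, and real tools usually send data back through maps or perf buffers instead.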
Now that I've summarized what eBPF is and why you might be interested, the next step will be getting started on the setup and trying out a few examples. I'll finish this post here, but click here to read the next entry.