Exploring GDB's Python API with Jupyter

GDB — the most common console debugger on Linux systems — has a Python API for adding new debugger commands, writing pretty-printers for complex data structures, and automating debugging tasks.

While Python scripts can be loaded from files, it is nice to interactively explore the API or the debugged program. IPython would be perfect for the job, but starting it directly inside GDB doesn't work well. Fortunately, it's easy to launch an IPython kernel and connect with an external Jupyter console.

Launching the kernel from the gdb prompt:

(gdb) python
>import IPython; IPython.embed_kernel()
>end

This gives the following message:

To connect another client to this kernel, use:
    --existing kernel-12688.json

We can start Jupyter on a separate terminal and connect to this kernel:

$ jupyter console --existing kernel-12688.json
In [1]: gdb.newest_frame().name()
Out[1]: 'main'

The GDB Python API is then available from the gdb module within this Python session. To get started, I'd suggest the API documentation or this series of tutorials.
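
For example, here are a few things one might try from such a session (a quick sketch; the variable name 'argc' is just a placeholder for whatever exists in the debugged program):

# Walk the call stack from the innermost frame outwards
frame = gdb.newest_frame()
while frame is not None:
    print(frame.name())
    frame = frame.older()

# Evaluate an expression in the debugged program, and capture the output
# of a CLI command as a string
argc = gdb.parse_and_eval('argc')
locals_text = gdb.execute('info locals', to_string=True)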

Currently, only the console client can connect to existing kernels. Support in the Notebook and in JupyterLab is tracked in this GitHub issue. Even with the limited capabilities of the console client, it's a great way to explore the API and to tackle more complicated debugging problems that require automation to solve.

PyDays 2017

PyDays 2017 was Austria's first conference dedicated to the Python programming language. It took place on May 5 and May 6 and was graciously hosted by the Linuxwochen Wien at FH Technikum in Vienna. It was great on many levels: meeting new people excited about Python and where it's headed, over 20 talks, interesting hallway conversations, etc.

I helped out with the organization of the conference, mostly taking care of catering. We (the organization team) are very happy with how everything went. The atmosphere was welcoming and open, the quality of the talks and workshops was very good, attendance was great, and people seemed to have a really good time. From an organizational perspective, the talks mostly stayed on time, the audio/video hardware worked fine, and the PyDays booth was well received. Giving out snacks, coffee, soda, and cake during the breaks also worked out nicely. The feedback, both at and after the conference, was very positive.

In addition to co-organizing the conference, I also gave a talk about Dask, a Python library for parallel computing. It provides data structures similar to NumPy arrays or Pandas DataFrames that process operations in parallel and scale to out-of-memory datasets. Dask is exciting since it provides the analytical power and familiar mental model of DataFrames, but handles hundreds of gigabytes of data and allows seamless scaling from a single laptop to a computing cluster. The talk was well received, and there was a lot of interest at the end and in the hallway afterwards. The slides are online if you're interested.
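
To give a flavour of the API, here is a minimal sketch (the file pattern and column names are made up for illustration):

import dask.dataframe as dd

# Lazily read a directory of CSV files; the data never has to fit into memory at once
df = dd.read_csv('measurements-*.csv')

# Operations build up a task graph; .compute() executes it in parallel
result = df.groupby('station').temperature.mean().compute()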

Finally, I'd like to thank my co-organizers, Claus, Kay, Helmut and Sebastian, as well as the people who helped out during the conference, for all their great work to make the PyDays a success. I'd also like to thank our sponsors: the Python Software Foundation, the Python Software Verband e.V., T-Mobile, UBIMET, and JetBrains.

Overall it was great to see so much community interest in Python. If you're passionate about Python as well, join us on the Python Austria Slack and our meetups!

Working with angles (is surprisingly hard)

Due to the periodic nature of angles, especially the discontinuity at 2π / 360°, working with them presents some subtle issues. For example, the angles 5° and 355° are "close" to each other, but are numerically quite different. In a geographic (GIS) context, this often surfaces when working with geometries spanning Earth's date line at -180°/+180° longitude, which frequently requires multiple code paths to obtain the desired result.
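
A small illustration of the issue (my own example values, in radians):

import math

a, b = math.radians(5), math.radians(355)

print(abs(a - b))                               # ~6.11 rad (350°): numerically far apart
print(min(abs(a - b), 2*math.pi - abs(a - b)))  # ~0.17 rad (10°): the actual angular distance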

I've recently run into the problem of computing averages and differences of angles. The difference should increase linearly: it should be 0 if the angles are equal, negative if the second angle lies to one side of the first, and positive if it lies on the other side (whether that's left or right depends on whether the angles increase in the clockwise or in the counter-clockwise direction). Getting that right, and coming up with test cases to prove that it's correct, was quite interesting. As Bishop notes in Pattern Recognition and Machine Learning, it's often simpler to perform operations on angles in a 2D (x, y) space and then transform back to angles using the atan2() function. I've used that approach for the averaging function; the difference is calculated using modulo arithmetic.

Here's the Python version of the two functions:

import math


def average_angles(angles):
    """Average (mean) of angles

    Return the average of an input sequence of angles. The result is between
    ``0`` and ``2 * math.pi``.
    If the average is not defined (e.g. ``average_angles([0, math.pi])``),
    a ``ValueError`` is raised.
    """

    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)

    if x == 0 and y == 0:
        raise ValueError(
            "The angle average of the inputs is undefined: %r" % angles)

    # To get outputs from -pi to +pi, delete everything but math.atan2() here.
    return math.fmod(math.atan2(y, x) + 2 * math.pi, 2 * math.pi)


def subtract_angles(lhs, rhs):
    """Return the signed difference between angles lhs and rhs

    Return ``(lhs - rhs)``; the value will be within ``[-math.pi, math.pi)``.
    Both ``lhs`` and ``rhs`` may either be zero-based (within
    ``[0, 2*math.pi]``), or ``-pi``-based (within ``[-math.pi, math.pi]``).
    """

    return math.fmod((lhs - rhs) + math.pi * 3, 2 * math.pi) - math.pi
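
A quick sanity check of the two functions (the example values are mine, not from the original tests):

print(math.degrees(average_angles([math.radians(355), math.radians(5)])))
# -> close to 0° (or equivalently 360°); the naive arithmetic mean of 180° would be wrong

print(math.degrees(subtract_angles(math.radians(5), math.radians(355))))
# -> ~10.0; a +10° step, not a -350° one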

The code, along with test cases, can also be found in this GitHub Gist. Translating these functions to other languages should be straightforward; sin()/cos()/fmod()/atan2() are pretty ubiquitous.

Resource management with Python

There should be one – and preferably only one – obvious way to do it.

There are multiple ways to manage resources in Python, but only one of them is safe, reliable, and Pythonic.

Before we dive in, let's examine what resources can mean in this context. The most obvious examples are open files, but the concept is broader: it includes locked mutexes, started client processes, or a temporary directory change using os.chdir(). The common theme is that all of these require some sort of cleanup that must reliably be executed in the future. The file must be closed, the mutex unlocked, the process terminated, and the current directory must be changed back.

So the core question is: how to ensure that this cleanup really happens?

Failed solutions

Manually calling the cleanup function at the end of a code block is the most obvious solution:

f = open('file.txt', 'w')
do_something(f)
f.close()

The problem with this is that f.close() will never be executed if do_something(f) raises an exception. So we'll need a better solution.

C++ programmers see this and try to apply the C++ solution: RAII, where resources are acquired in an object's constructor and released in the destructor:

class MyFile(object):
    def __init__(self, fname):
        self.f = open(fname, 'w')

    def __del__(self):
        self.f.close()

my_f = MyFile('file.txt')
do_something(my_f.f)
# my_f.__del__() automatically called once my_f goes out of scope

Apart from being verbose and a bit un-Pythonic, it's also not necessarily correct. __del__() is only called once the object's refcount reaches zero, which can be prevented by reference cycles or leaked references. Additionally, until Python 3.4 some __del__() methods were not called during interpreter shutdown.

A workable solution

The way to ensure that cleanup code is called in the face of exceptions is the try ... finally construct:

f = open('file.txt', 'w')
try:
    do_something(f)
finally:
    f.close()

In contrast to the previous two solutions, this ensures that the file is closed no matter what (short of an interpreter crash). It's a bit unwieldy, especially when you think about try ... finally statements sprinkled all over a large code base. Fortunately, Python provides a better way.

The correct solution™

The Pythonic solution is to use the with statement:

with open('file.txt', 'w') as f:
    do_something(f)

It is concise and correct even if do_something(f) raises an exception. Nearly all built-in classes that manage resources can be used in this way.

Under the covers, this functionality is implemented using objects known as context managers, which provide __enter__() and __exit__() methods that are called at the beginning and end of the with block. While it's possible to write such classes manually, an easier way is to use the contextlib.contextmanager decorator.
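
For comparison, a hand-written context manager class might look roughly like this (a sketch, using the same placeholder acquire_resource()/release_resource() functions as the decorator example below):

class ManagedResource(object):
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        # Called when the with block is entered; the return value is bound by "as"
        self.r = acquire_resource(self.name)
        return self.r

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the block is left, even if an exception was raised
        release_resource(self.r)
        return False  # don't suppress exceptions

The decorator-based version achieves the same with less boilerplate: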

from contextlib import contextmanager

@contextmanager
def managed_resource(name):
    r = acquire_resource(name)
    try:
        yield r
    finally:
        release_resource(r)

with managed_resource('file.txt') as r:
    do_something(r)

The contextmanager decorator turns a generator function (a function with a yield statement) into a context manager. This way it is possible to make arbitrary code compatible with the with statement in just a few lines of Python.

Note that try ... finally is used as a building block here. In contrast to the previous solution, it is hidden away in a utility resource manager function, and doesn't clutter the main program flow, which is nice.

If the client code doesn't need to obtain an explicit reference to the resource, things are even simpler:

@contextmanager
def managed_resource(name):
    r = acquire_resource(name)
    try:
        yield
    finally:
        release_resource(r)

with managed_resource('file.txt'):
    do_something()
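
As a concrete example, the temporary directory change mentioned at the beginning could be wrapped like this (a small sketch):

import os
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily chdir() into path, restoring the previous directory afterwards."""
    old_cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_cwd)

with working_directory('/tmp'):
    do_something()  # runs with /tmp as the current working directory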

Sometimes the argument comes up that this makes it harder to use those resources in interactive Python sessions – you can't wrap your whole session in a gigantic with block, after all. The solution is simple: just call __enter__() on the context manager manually to obtain the resource:

cm_r = managed_resource('file.txt')
r = cm_r.__enter__()
# Work with r...
cm_r.__exit__(None, None, None)

The __exit__() method takes three arguments; passing None here is fine (they are used to pass exception information, where applicable). Another option in interactive sessions is to not call __exit__() at all, if you can live with the consequences.

Wrap Up

Concise, correct, Pythonic. There is no reason to ever manage resources in any other way in Python. If you aren't using it yet, start now!

libconf - a Python reader for libconfig files

This weekend, I uploaded my first package to PyPI: libconf, a pure-Python reader for files in libconfig format. This configuration file format is reminiscent of JSON and is mostly used in C/C++ projects through the libconfig library. It looks like this:

version = 7;
window: {
   title: "libconfig example"
   position: { x: 375; y: 210; w: 800; h: 600; }
};
capabilities: {
   can-do-lists: (true, 0x3A20, ("sublist"), {subgroup: "ok"})
   can-do-arrays: [3, "yes", True]
};

There are already two Python implementations: pylibconfig2 is a pure-Python reader licensed under GPLv3, and python-libconfig provides bindings for the libconfig C++ library. The first one I didn't like because of its licensing, the second one because of the more involved installation procedure. Also, I kind of enjoy writing parsers.

So I sat down over the Easter weekend and wrote libconf. It's a pure-Python reader for libconfig files with an interface similar to Python's json module. There are two main methods: load(f) and loads(string). Both return a dict-like data structure that can be indexed (config['version']), but supports attribute access as well (config.version):

>>> import libconf
>>> with open('example.cfg') as f:
...     config = libconf.load(f)
>>> config['window']['title']
'libconfig example'
>>> config.window.title
'libconfig example'
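
The string-based loads() works the same way (a hypothetical one-liner):

>>> libconf.loads('version = 7;')['version']
7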

It was a fun little project. Creating a recursive descent parser is pretty straightforward, especially for such a simple file format. Writing documentation, packaging and uploading to GitHub and PyPI took longer than coding up the implementation itself.