Debian's most annoying warning message: "Setting locale failed"

You ssh into a server or you enter a chroot, and the console overflows with these messages:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US:en",
        LC_ALL = (unset),
        LC_TIME = "en_US.utf8",
        LC_CTYPE = "de_AT.UTF-8",
        LC_COLLATE = "C",
        LC_MESSAGES = "en_US.utf8",
        LANG = "de_AT.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory

Gaaaaarrrrh!

Fortunately the fix is easy (adjust the locale names for your situation):

locale-gen en_US.UTF-8 de_AT.UTF-8

Finally, peace on the console.
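
The locale names to generate can be read off the environment variables listed in the warning. Here is a minimal Python sketch of that step; the helper name requested_locales is made up for this example, and it only collects the names without generating anything. Whether locale-gen wants the spelling "utf8" or "UTF-8" is not checked here.

```python
import os


def requested_locales(env=None):
    """Return the distinct locale names referenced by LANG and LC_* variables."""
    if env is None:
        env = os.environ
    names = set()
    for key, value in env.items():
        if key != "LANG" and not key.startswith("LC_"):
            continue  # LANGUAGE is a message priority list, not a locale name
        if value and value not in ("C", "POSIX"):
            names.add(value)  # "C" and "POSIX" are built in, nothing to generate
    return sorted(names)


# The settings from the warning above
env = {
    "LANGUAGE": "en_US:en",
    "LC_TIME": "en_US.utf8",
    "LC_CTYPE": "de_AT.UTF-8",
    "LC_COLLATE": "C",
    "LC_MESSAGES": "en_US.utf8",
    "LANG": "de_AT.UTF-8",
}
print("locale-gen " + " ".join(requested_locales(env)))
```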

Profiling

Profiling is hard. Measuring the right metric and correctly interpreting the obtained data can be difficult even for relatively simple programs.

For performance optimization, I'm a big fan of the poor man's profiler: run the binary to analyze under a debugger, periodically stop the execution, get a backtrace and continue. After doing this a few times, the hotspots will become apparent. This works amazingly well in practice and gives a reliable picture of where time is spent, without the danger of skewed results from instrumentation overhead.
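
The same idea can be tried from inside a Python program, without a debugger. The sketch below is only an analogy to the debugger workflow: a helper thread periodically looks at the main thread's current stack and counts which function is on top. The names profile, sample_stacks and busy are made up for the example, and sys._current_frames() is a CPython implementation detail.

```python
import collections
import sys
import threading
import time


def sample_stacks(thread_id, interval, stop_event, counts):
    """Periodically record the innermost Python function of one thread."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def profile(func, interval=0.005):
    """Run func() in the calling thread while a helper thread samples it."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), interval, stop_event, counts),
    )
    sampler.start()
    try:
        func()
    finally:
        stop_event.set()
        sampler.join()
    return counts


def busy(duration=0.2):
    """Burn CPU for roughly `duration` seconds."""
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        pass


counts = profile(busy)
print(counts.most_common(3))
```

Functions that show up in many samples are the hotspots. Only the innermost function is counted here; a real tool would record the whole backtrace.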

Sometimes it's nice to get a more fine-grained view: not only finding the hotspot, but getting an overview of how much time is spent where. That's where 'real' profilers come in handy.

Under Windows, I like the built-in "Event Tracing for Windows" (ETW), which produces files that can be analyzed with Xperf/Windows Performance Analyzer. It is a really well thought out system, and the Xperf UI is amazing in the analyzing abilities that it offers. Probably the best place to start reading up on this is ETW Central.

Under Linux, I haven't found a profiler I can really recommend, yet. gprof and sprof are both ancient and have severe limitations. OProfile may be nice, but I haven't had a chance to use it yet, as it wasn't available for my Ubuntu LTS release.

I have used Callgrind from the Valgrind toolkit in combination with the KCachegrind GUI analyzer. I typically invoke it like this:

valgrind --tool=callgrind --callgrind-out-file=callgrind-cpu.out ./program-to-profile
kcachegrind callgrind-cpu.out

Callgrind works by instrumenting the binary under test. It slows down program execution, often by a factor of 10. Further, it only measures CPU time, so sleeping times are not included. This makes it unsuitable for programs that wait a significant amount of time for network or disk operations to complete. Despite these drawbacks, it's pretty handy if CPU time is all that you're interested in.
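
The difference between CPU time and wall-clock time is easy to see in a few lines of Python. This is just an illustration of the two metrics, not of how Callgrind measures them; the helper name measure is made up for the example.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) spent in func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    return (time.perf_counter() - wall_start,
            time.process_time() - cpu_start)


# Sleeping takes wall-clock time but almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.2))
print("wall: %.3fs, cpu: %.3fs" % (wall, cpu))
```

A profiler that only counts CPU time sees next to nothing of the sleep, even though the program spent its whole run waiting.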

If blocking times are important (as they are for so many modern applications - we generally spend less time computing and more time communicating), gperftools is a decent choice. It includes a CPU profiler that can be run in real-time sampling mode, and the results can be viewed in KCachegrind. It is recommended to link libprofiler.so into the binary to analyze, but using LD_PRELOAD works decently well:

CPUPROFILE_REALTIME=1 CPUPROFILE=prof.out LD_PRELOAD=/usr/lib/libprofiler.so ./program-to-profile
google-pprof --callgrind ./program-to-profile prof.out > callgrind-wallclock.out
kcachegrind callgrind-wallclock.out

When it works, this gives a good overall profile of the application. Unfortunately, it sometimes fails: on amd64, there are sporadic crashes from within libunwind. These can simply be ignored by rerunning the profile; in my experience, about half of the runs produce usable data.

The more serious problem is that CPUPROFILE_REALTIME=1 causes gperftools to use SIGALRM internally, which conflicts with applications that want to use that signal themselves. Looking at the profiler source code, it should be possible to work around this limitation with the undocumented CPUPROFILE_PER_THREAD_TIMERS and CPUPROFILE_TIMER_SIGNAL environment variables, but I haven't gotten that to work yet.

You'd think that perf has something to offer in this area as well. Indeed, it has a CPU profiling mode (with nice flamegraph visualizations) and a sleeping time profiling mode, but I couldn't find a way to combine the two to get a real-time profile.

Overall, there still seems to be room for a good, reliable real-time sampling profiler under Linux. If I'm missing something, please let me know!

Returning generators from with statements

Recently, an interesting issue came up at work that involved a subtle interaction between context managers and generator functions. Here is some example code demonstrating the problem:

import contextlib


@contextlib.contextmanager
def resource():
    """Context manager for some resource"""

    print("Resource setup")
    yield
    print("Resource teardown")


def _load_values():
    """Load a list of values (requires resource to be held)"""

    for i in range(3):
        print("Generating value %d" % i)
        yield i


def load_values():
    """Load values while holding the required resource"""

    with resource():
        return _load_values()

This is the output when run:

>>> for val in load_values(): pass
Resource setup
Resource teardown
Generating value 0
Generating value 1
Generating value 2

Whoops. The resource is destroyed before the values are actually generated. This is obviously a problem if the generator depends on the existence of the resource.

When you think about it, it's pretty clear what's going on. Calling _load_values() produces a generator object, whose code is only executed when values are requested. load_values() returns that generator, exiting the with statement and leading to the destruction of the resource. When the outer for loop (for val) comes around to iterating over the generator, the resource is long gone.
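
The lazy execution of generator bodies can be checked in isolation. In this small sketch (the names numbers and events are made up for the example), nothing inside the function runs until the first value is requested:

```python
events = []


def numbers():
    events.append("body started")
    yield 1


gen = numbers()           # only creates the generator object
events_after_call = list(events)

first = next(gen)         # now the body runs up to the first yield
events_after_next = list(events)

print(events_after_call, events_after_next)
```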

How do you solve this problem? In Python 3.3 and newer, you can use the yield from syntax to turn load_values() into a generator as well. The execution of load_values() is halted at the yield from point until the child generator is exhausted, at which point it is safe to dispose of the resource:

def load_values():
    """Load values while holding the required resource"""

    with resource():
        yield from _load_values()

In older Python versions, an explicit for loop over the child generator is required:

def load_values():
    """Load values while holding the required resource"""

    with resource():
        for val in _load_values():
            yield val

Still another method would be to turn the result of _load_values() into a list and return that instead. This incurs higher memory overhead since all values have to be held in memory at the same time, so it's only appropriate for relatively short lists.
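
A sketch of the list variant, repeating the example functions so that it runs on its own. The log list is an addition for this sketch and stands in for the print() calls, so the order of events can be inspected:

```python
import contextlib

log = []  # stands in for the print() calls of the original example


@contextlib.contextmanager
def resource():
    """Context manager for some resource"""
    log.append("setup")
    yield
    log.append("teardown")


def _load_values():
    """Load a list of values (requires resource to be held)"""
    for i in range(3):
        log.append("value %d" % i)
        yield i


def load_values():
    """Load values while holding the required resource"""
    with resource():
        # list() drains the generator while the resource is still held
        return list(_load_values())


values = load_values()
print(log)
```

Here all values are generated between setup and teardown, at the price of building the whole list up front.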

To sum up, it's a bad idea to return generators from inside with statements. What's going on isn't terribly confusing, but it's a wee bit subtle, and not many people think about it until they run into the issue. Hope this heads-up helps.

published November 12, 2015
tags python

A better way for deleting Docker images and containers

In one of my last posts, I described the current (sad) state of managing Docker container and image expiration. Briefly, Docker creates new containers and images for many tasks, but there is no good way to automatically remove them. The best practice seems to be a rather hack-ish bash one-liner.

Since this wasn't particularly satisfying, I decided to do something about it. Here, I present docker-cleanup, a Python application for removing containers and images based on a configurable set of rules.

This is a rules file example:

# Keep currently running containers, delete others if they last finished
# more than a week ago.
KEEP CONTAINER IF Container.State.Running;
DELETE CONTAINER IF Container.State.FinishedAt.before('1 week ago');

# Delete dangling (unnamed and not used by containers) images.
DELETE IMAGE IF Image.Dangling;

Clear, expressive, straightforward. The rule language can do a whole lot more and provides a readable and intuitive way to define removal policies for images and containers.

Head over to GitHub, give it a try, and let me know what you think!

Using Python slice objects for fun and profit

Just a quick tip about the little-known slice objects in Python. They are used to implement the slicing syntax for sequence types (lists, strings):

s = "The quick brown fox jumps over the lazy dog"

# s[4:9] is internally converted (and equivalent) to s[slice(4, 9)].
assert s[4:9] == s[slice(4, 9)]

# 'Not present' is encoded as 'None'
assert s[20:] == s[slice(20, None)]

Slice objects can be used in normal code too, for example for tracking regions in strings: instead of having separate start_idx and end_idx variables (or writing a custom class/namedtuple), simply roll the indices into a slice.

# A column-aligned table:
table = ('REPOSITORY   TAG      IMAGE ID       CREATED       VIRTUAL SIZE',
         '<none>       <none>   0987654321AB   2 hours ago   385.8 MB',
         'chris/web    latest   0123456789AB   2 hours ago   385.8 MB',
        )
header, *entries = table

# Compute the column slices by parsing the header. Gives a list of slices.
slices = find_column_slices(header)

for entry in entries:
    repo, tag, image_id, created, size = [entry[sl].strip() for sl in slices]
    ...

This is mostly useful when the indices are computed at runtime and applied to more than one string.
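
The find_column_slices() helper is not shown above. One possible implementation, as a sketch: it assumes that column titles contain at most single spaces, that columns are separated by two or more spaces, and that every entry is left-aligned with its title. The short table used here is made up for the example.

```python
import re


def find_column_slices(header):
    """Return one slice per column of a column-aligned table header."""
    # A column title is a run of words separated by single spaces;
    # two or more spaces separate columns.
    starts = [m.start() for m in re.finditer(r"\S+(?: \S+)*", header)]
    # Each column extends to the start of the next; the last is open-ended.
    ends = starts[1:] + [None]
    return [slice(start, end) for start, end in zip(starts, ends)]


header = "NAME      IMAGE ID   SIZE"
row = "web-app   0123AB     385.8 MB"
name, image_id, size = [row[sl].strip() for sl in find_column_slices(header)]
print(name, image_id, size)
```

Leaving the last slice open-ended (end of None) means that entries longer than the header line are not cut off.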

More generally, slice objects encapsulate regions of strings/lists/tuples, and are an appropriate tool for simplifying code that operates on start/end indices. They provide a clean abstraction, make the code more straightforward and save a bit of typing.