You ssh into a server or enter a chroot, and the console overflows with messages like these:

    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US:en",
        LC_ALL = (unset),
        LC_TIME = "en_US.utf8",
        LC_CTYPE = "de_AT.UTF-8",
        LC_COLLATE = "C",
        LC_MESSAGES = "en_US.utf8",
        LANG = "de_AT.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: Cannot set LC_MESSAGES to default locale: No such file or directory
    locale: Cannot set LC_ALL to default locale: No such file or directory

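Each LC_* and LANG value in the warning must name a locale that is actually installed. As a quick diagnostic (a sketch, not part of the fix), the Python snippet below tries to activate each one and reports those that fail.

It makes a few assumptions:

* It assumes a Unix-like system.
* It skips LANGUAGE.
* It probes every name through LC_CTYPE, which is a simplification. `missing_locales()` is a hypothetical helper name.

```python
import locale
import os

# Variables from the warning that each hold a single locale name. LANGUAGE is
# left out because it is a colon-separated list used only for messages.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_MESSAGES", "LANG")


def missing_locales(env=None):
    """Return {variable: value} for each set variable whose locale can't be loaded."""
    env = os.environ if env is None else env
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, doesn't change anything
    missing = {}
    try:
        for var in LOCALE_VARS:
            value = env.get(var)
            if not value:
                continue  # unset (or empty) variables are skipped
            try:
                # Simplification: every name is probed through LC_CTYPE.
                locale.setlocale(locale.LC_CTYPE, value)
            except locale.Error:
                missing[var] = value
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous setting
    return missing


for var, value in missing_locales().items():
    print(f"{var}={value!r} is not available on this system")
```

The variables it prints are the locale names to generate or install.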
Fortunately the fix is easy (adjust the locale names for your situation):
Profiling is hard. Measuring the right metric and correctly interpreting the
obtained data can be difficult even for relatively simple programs.
For performance optimization, I'm a big fan of the poor man's
profiler: run the binary to analyze under a debugger,
periodically stop the execution, get a backtrace and continue. After doing this
a few times, the hotspots will become apparent. This works amazingly well in
practice and gives a reliable picture of where time is spent, without the
danger of skewed results from instrumentation overhead.
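The same idea carries over to Python programs. The sketch below is an in-process analogue, not a debugger-based profiler:

* A hypothetical `profile()` helper runs the target in a worker thread.
* Every few milliseconds, it grabs that thread's current stack through CPython's `sys._current_frames()` and counts the innermost function.
* Unlike a debugger, it never actually stops the target. It relies on the interpreter switching between threads.

The workload functions are made up for illustration.

```python
import collections
import sys
import threading
import time
import traceback


def profile(func, *args, interval=0.01):
    """Run func(*args) in a worker thread and sample its stack from this thread.

    Returns a Counter mapping the innermost function name to the number of
    samples in which that function was executing.
    """
    worker = threading.Thread(target=func, args=args)
    worker.start()
    counts = collections.Counter()
    while worker.is_alive():
        # CPython-specific: a snapshot of every thread's current frame.
        frame = sys._current_frames().get(worker.ident)
        if frame is not None:
            innermost = traceback.extract_stack(frame)[-1]
            counts[innermost.name] += 1
        time.sleep(interval)  # let the worker run until the next sample
    worker.join()
    return counts


def slow_part():
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def fast_part():
    return sum(range(1_000))


def workload():
    for _ in range(3):
        fast_part()
        slow_part()


counts = profile(workload)
for name, samples in counts.most_common():
    print(f"{samples:5d}  {name}")
```

Counting whole stacks instead of only the innermost frame would get closer to reading a handful of backtraces by eye.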
Sometimes it's nice to get a more fine-grained view.
That is, not only find the hotspot, but get an overview of how much time
is spent where. That's where 'real' profilers come in handy.
Under Windows, I like the built-in "Event Tracing for Windows" (ETW),
which produces files that can be analyzed with Xperf/Windows Performance
Analyzer. It is a really well-thought-out system, and the analysis capabilities
of the Xperf UI are amazing. Probably the best place to start
reading up on this is ETW Central.
Under Linux, I haven't yet found a profiler I can really recommend. gprof
and sprof are both ancient and have severe limitations. OProfile
may be nice, but I haven't had a chance to use it yet, as it wasn't available
for my Ubuntu LTS release.
I have used Callgrind from the Valgrind toolkit in combination with the
KCachegrind GUI analyzer. I typically invoke it like this:
Callgrind works by instrumenting the binary under test. It slows down
program execution, often by a factor of 10. Further, it only measures CPU time,
so sleeping times are not included. This makes it unsuitable for programs that wait
a significant amount of time for network or disk operations to complete.
Despite these drawbacks, it's pretty handy if CPU time is all that you're interested in.
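The gap between CPU time and wall-clock time is easy to see in plain Python. This is not Callgrind. It only illustrates why a CPU-time measurement misses waiting.

`measure()` and `waits_for_io()` are made-up helpers. The sleep stands in for a network or disk wait.

```python
import time


def measure(func, *args):
    """Return (wall_seconds, cpu_seconds) spent in func(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process; sleeping adds nothing
    func(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def waits_for_io():
    time.sleep(0.3)  # stand-in for a blocking network or disk operation


wall, cpu = measure(waits_for_io)
print(f"wall clock: {wall:.3f}s, CPU: {cpu:.3f}s")
```

The wall-clock figure includes the full wait, while the CPU figure stays near zero. A CPU-time profiler sees the same picture.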
If blocking times are important (as they are for so many modern applications; we
generally spend less time computing and more time communicating),
gperftools is a decent choice. It includes a CPU profiler that can be run in
real-time sampling mode, and the results can be viewed in KCachegrind. The
recommended way is to link libprofiler.so into the binary to analyze, but using
LD_PRELOAD works decently well:
If it works, this gives a good overall profile of the application.
Unfortunately, it sometimes fails: on amd64, there are sporadic crashes
from within libunwind. It's possible to just ignore those and rerun the profile,
since interesting data is still obtained about 50% of the time.
The more serious problem is that CPUPROFILE_REALTIME=1 causes gperftools
to use SIGALRM internally, conflicting with any application that wants to
use that signal for itself. Looking at the profiler source code, it should
be possible to work around this limitation with the undocumented
CPUPROFILE_PER_THREAD_TIMERS and CPUPROFILE_TIMER_SIGNAL environment
variables, but I haven't managed to get that to work yet.
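To see why real-time sampling and SIGALRM go together, here is a minimal Python sketch of a wall-clock sampling profiler built on the same kernel mechanism. A repeating ITIMER_REAL timer delivers SIGALRM, and the handler records which function was running.

This illustrates the technique; it is not gperftools' code.

* The `RealTimeSampler` class and the `busy()`/`blocked()` workloads are made up for the example.
* It only works on Unix, from the main thread.

```python
import collections
import signal
import time


class RealTimeSampler:
    """Wall-clock sampling profiler driven by ITIMER_REAL / SIGALRM (Unix only).

    Must be used from the main thread, where Python runs signal handlers.
    """

    def __init__(self, interval=0.005):
        self.interval = interval
        self.counts = collections.Counter()
        self._old_handler = None

    def _on_alarm(self, signum, frame):
        # `frame` is the Python frame the main thread was executing when the
        # timer fired; a function blocked in time.sleep() still gets sampled.
        if frame is not None:
            self.counts[frame.f_code.co_name] += 1

    def __enter__(self):
        # Installing our handler replaces whatever SIGALRM handler the program
        # had: the same kind of conflict described above.
        self._old_handler = signal.signal(signal.SIGALRM, self._on_alarm)
        signal.setitimer(signal.ITIMER_REAL, self.interval, self.interval)
        return self

    def __exit__(self, *exc_info):
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        old = self._old_handler
        signal.signal(signal.SIGALRM, signal.SIG_DFL if old is None else old)
        return False


def busy():
    total = 0
    for i in range(2_000_000):
        total += i
    return total


def blocked():
    time.sleep(0.3)  # stand-in for waiting on the network or the disk


with RealTimeSampler() as sampler:
    busy()
    blocked()
print(sampler.counts.most_common())
```

Swapping ITIMER_REAL for ITIMER_PROF (which delivers SIGPROF and only advances while the process uses CPU) would turn this into a CPU-time profiler. `blocked()` would then receive almost no samples.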
You'd think that perf has something to offer in this area as well.
Indeed, it has a CPU profiling mode (with nice flamegraph visualizations)
and a sleeping time profiling mode, but I couldn't find a way to combine the
two to get a real-time profile.
Overall, there still seems to be room for a good, reliable real-time sampling
profiler under Linux. If I'm missing something, please let me know!
Recently, an interesting issue came up at work that involved a subtle
interaction between context managers and generator functions. Here is some
example code demonstrating the problem:

    import contextlib

    @contextlib.contextmanager
    def resource():
        """Context manager for some resource"""
        print("Resource setup")
        yield
        print("Resource teardown")

    def _load_values():
        """Load a list of values (requires resource to be held)"""
        for i in range(3):
            print("Generating value %d" % i)
            yield i

    def load_values():
        """Load values while holding the required resource"""
        with resource():
            return _load_values()

    for val in load_values():
        print(val)

Whoops. The resource is destroyed before the values are actually generated.
This is obviously a problem if the generator depends on the existence of the resource.
When you think about it, it's pretty clear what's going on. Calling
_load_values() produces a generator object, whose code is only executed when
values are requested. load_values() returns that generator, exiting the
with statement and leading to the destruction of the resource. When the outer
for loop (for val) comes around to iterating over the generator, the
resource is long gone.
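The key fact here is that calling a generator function doesn't run any of its body. It is easy to check in isolation. `numbers()` and the `events` list are illustrative names only.

```python
events = []


def numbers():
    """A tiny generator that records when its body starts executing."""
    events.append("body started")
    yield 1
    yield 2


gen = numbers()             # creates a generator object; no body code runs yet
after_call = list(events)   # snapshot of events at this point
first = next(gen)           # the body runs now, up to the first yield
after_next = list(events)
print(after_call, after_next, first)
```

The first snapshot is empty. The body only starts once `next()` asks for a value, which is exactly when the resource in `load_values()` has already been released.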
How do you solve this problem? In Python 3.3 and newer, you can use the yield
from syntax to turn load_values() into a generator as well. The
execution of load_values() is halted at the yield from point until the
child generator is exhausted, at which point it is safe to dispose of the
resource:

    def load_values():
        """Load values while holding the required resource"""
        with resource():
            yield from _load_values()

In older Python versions, an explicit for loop over the child generator is
required:

    def load_values():
        """Load values while holding the required resource"""
        with resource():
            for val in _load_values():
                yield val

Still another method would be to turn the result of _load_values() into a
list and return that instead. This incurs higher memory overhead since all
values have to be held in memory at the same time, so it's only
appropriate for relatively short lists.
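Here is a minimal sketch of the list-based variant. An `events` list replaces the `print()` calls so the order of setup, generation and teardown can be checked. That recording is added for illustration and is not part of the original example.

```python
import contextlib

events = []


@contextlib.contextmanager
def resource():
    """Context manager for some resource (records setup/teardown)."""
    events.append("setup")
    yield
    events.append("teardown")


def _load_values():
    """Load a list of values (requires resource to be held)"""
    for i in range(3):
        events.append("generate %d" % i)
        yield i


def load_values():
    """Load values while holding the required resource; returns a list."""
    with resource():
        # list() drains the generator while the resource is still held.
        return list(_load_values())


values = load_values()
print(values, events)
```

Because `list()` consumes every value before the `with` block exits, "teardown" is recorded last.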
To sum up, it's a bad idea to return generators from under with statements.
While it's not terribly confusing what's going on, it's a wee bit subtle, and
not many people think about this until they run into the issue. Hope this helps!
In one of my last posts, I described the current (sad) state of
managing Docker container and image expiration. Briefly, Docker creates
new containers and images for many tasks, but there is no good way to
automatically remove them. The best practice seems to be a rather hack-ish workaround.
Since this wasn't particularly satisfying, I decided to do something
about it. Here, I present docker-cleanup, a Python application
for removing containers and images based on a configurable set of rules.
This is a rules file example:

    # Keep currently running containers, delete others if they last finished
    # more than a week ago.
    KEEP CONTAINER IF Container.State.Running;
    DELETE CONTAINER IF Container.State.FinishedAt.before('1 week ago');

    # Delete dangling (unnamed and not used by containers) images.
    DELETE IMAGE IF Image.Dangling;

Clear, expressive, straightforward. The rule language can do a whole lot more
and provides a readable and intuitive way to define removal policies for images
and containers.
Head over to GitHub, give it a try, and let me know what you think!
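To make the container rules concrete, here is a rough Python equivalent of that policy. It is not how docker-cleanup is implemented, and it makes several assumptions:

* Container data is shaped like `docker inspect` output (`State.Running`, `State.FinishedAt`).
* A matching KEEP rule wins over a DELETE rule.
* The function names are hypothetical.

The ordering matters: a container that has never exited typically reports a zero FinishedAt (`0001-01-01T00:00:00Z`), which would otherwise count as "more than a week ago".

```python
from datetime import datetime, timedelta, timezone


def parse_docker_time(value):
    """Parse a timestamp like '2015-06-01T12:00:00.123456789Z' (fraction dropped)."""
    without_fraction = value.rstrip("Z").split(".")[0]
    parsed = datetime.strptime(without_fraction, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def should_delete_container(container, now):
    """Mirror the two container rules from the example rules file.

    `container` is assumed to be a dict shaped like `docker inspect` output.
    """
    state = container["State"]
    # KEEP CONTAINER IF Container.State.Running;  (checked first, so it wins)
    if state["Running"]:
        return False
    # DELETE CONTAINER IF Container.State.FinishedAt.before('1 week ago');
    finished_at = parse_docker_time(state["FinishedAt"])
    return finished_at < now - timedelta(weeks=1)


now = datetime(2015, 6, 15, tzinfo=timezone.utc)
running = {"State": {"Running": True, "FinishedAt": "0001-01-01T00:00:00Z"}}
old = {"State": {"Running": False, "FinishedAt": "2015-06-01T08:30:00.123456789Z"}}
recent = {"State": {"Running": False, "FinishedAt": "2015-06-13T08:30:00Z"}}
for name, container in [("running", running), ("old", old), ("recent", recent)]:
    print(name, should_delete_container(container, now))
```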
Just a quick tip about the little-known slice objects in Python.
They are used to implement the slicing syntax for sequence types (lists,
strings, tuples and so on):

    s = "The quick brown fox jumps over the lazy dog"

    # s[4:9] is internally converted (and equivalent) to s[slice(4, 9)].
    assert s[4:9] == s[slice(4, 9)]

    # 'Not present' is encoded as 'None'
    assert s[20:] == s[slice(20, None)]

Slice objects can be used in normal code too, for example for tracking
regions in strings: instead of having separate start_idx and end_idx
variables (or writing a custom class/namedtuple) simply roll
the indices into a slice.

    # A column-aligned table:
    table = (
        'REPOSITORY   TAG      IMAGE ID       CREATED       VIRTUAL SIZE',
        '<none>       <none>   0987654321AB   2 hours ago   385.8 MB',
        'chris/web    latest   0123456789AB   2 hours ago   385.8 MB',
    )
    header, *entries = table

    # Compute the column slices by parsing the header. Gives a list of slices.
    # (find_column_slices() itself is not shown here.)
    slices = find_column_slices(header)
    for entry in entries:
        repo, tag, image_id, created, size = [entry[sl].strip() for sl in slices]
        ...

This is mostly useful when the indices are computed at runtime and applied to
more than one string.
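The `find_column_slices()` helper in the table example isn't shown. Below is one possible implementation. It is a guess at the helper, not the original.

It assumes column titles are separated by at least two spaces, while a single space may appear inside a title (as in IMAGE ID).

```python
import re


def find_column_slices(header):
    """Return one slice per column, derived from the header line.

    Assumption: column titles are separated by two or more spaces, while a
    single space may occur inside a title (e.g. 'IMAGE ID').
    """
    # A title is a run of words joined by single spaces.
    starts = [m.start() for m in re.finditer(r"\S+(?: \S+)*", header)]
    # Each column extends to where the next one starts; the last is open-ended.
    stops = starts[1:] + [None]
    return [slice(start, stop) for start, stop in zip(starts, stops)]


header = "NAME     SIZE   LAST USED"
row = "a.txt    3 kB   2 days ago"
fields = [row[sl].strip() for sl in find_column_slices(header)]
print(fields)
```

Leaving the last slice open-ended (stop of `None`) means rows longer than the header are still captured in full.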
More generally, slice objects encapsulate regions of strings/lists/tuples,
and are an appropriate tool for simplifying code that operates on start/end
indices. They provide a clean abstraction, make the code more straightforward,
and save a bit of typing.
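Since a slice is an ordinary object, its bounds are available as `start`, `stop` and `step`. `slice.indices()` turns it into concrete positions for a given length. Here is a short illustration reusing the example string from the beginning of this tip.

```python
s = "The quick brown fox jumps over the lazy dog"

region = slice(4, 9)
print(region.start, region.stop, region.step)  # bounds are plain attributes
print(s[region])

# indices() resolves None and negative bounds against a concrete length and
# returns a (start, stop, step) tuple that can be fed straight into range().
tail = slice(-3, None)
print(tail.indices(len(s)))
positions = list(range(*tail.indices(len(s))))
print(s[tail], positions)
```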