A sysadmin diary — LiveJournal

Sat, Oct. 28th, 2017, 10:29 am
Apache Kafka at Meetic (2015)

Recording of the talk Matthieu Robin and I gave at Forum PHP 2015 (in French)

https://www.youtube.com/watch?v=K0nfQyvcxXM

Sun, Sep. 11th, 2011, 07:34 pm
Reducing Stud memory usage

I owe a big thanks to Bumptech for freeing Stud, a robust, high-performance SSL unwrapper. HTTPS is almost getting cheap now :).

Until Stud, we had no really good option to unwrap many connections (HTTPS or otherwise) at the infrastructure edge:
  • Pound, Apache, Squid, Stunnel, ... are slow, and either consume too many resources per connection, fork at each connection, use select or poll on Linux, or break websockets, ...
  • Nginx (and lighttpd for that matter) can't proxy websockets (which I need). And nginx may not perform so well when stuffed with many HTTPS connections (see Vincent Bernat's nice benchmarks).
  • HAProxy and Varnish are great reverse proxies, but they don't deal with SSL.
But Stud's memory footprint, while remaining way above Stunnel or Apache, is still not far from 1 MB per connection. Acceptable, but not ideal when you have many persistent connections. This is mostly due to a known difficulty with OpenSSL (see how Paul Querna fixed HTTPS memory consumption for nodejs).

Recent OpenSSL versions (since the 1.0.0 release) offer an (undocumented) SSL context option, SSL_OP_NO_COMPRESSION, to get rid of compression (and its huge buffers) altogether. After all, compression should happen on the backend application server rather than at the edge. OpenSSL 1.0.0 also brings SSL_MODE_RELEASE_BUFFERS, which, albeit less spectacular, helps reduce memory usage too.
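
To make the compression flag concrete without reading C, here is a minimal sketch using Python's standard ssl module, which exposes the same OpenSSL option as ssl.OP_NO_COMPRESSION. This is not the Stud patch (Stud is written in C and talks to OpenSSL directly); the helper name make_edge_context is made up for the illustration. The ssl module does not appear to offer a public switch for SSL_MODE_RELEASE_BUFFERS, so only the compression option is shown.

```python
import ssl


def make_edge_context() -> ssl.SSLContext:
    """Build a server-side TLS context with TLS-level compression disabled.

    Hypothetical helper, for illustration only.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Python's name for OpenSSL's SSL_OP_NO_COMPRESSION. Recent Python
    # versions already set it by default; setting it again is harmless.
    ctx.options |= ssl.OP_NO_COMPRESSION
    return ctx


ctx = make_edge_context()
# Check that the flag is really set on the context.
print(bool(ctx.options & ssl.OP_NO_COMPRESSION))
```

A real server would also need a certificate and key loaded with ctx.load_cert_chain() before accepting connections.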

So I patched Stud and wrote a couple of small Python scripts to measure the memory usage.

Here are the SSL_OP_NO_COMPRESSION savings:
[Graph: Stud with SSL_OP_NO_COMPRESSION]
And here is Stud patched for SSL_OP_NO_COMPRESSION compared to Stud with SSL_OP_NO_COMPRESSION + SSL_MODE_RELEASE_BUFFERS:
[Graph: Stud with SSL_OP_NO_COMPRESSION + SSL_MODE_RELEASE_BUFFERS]

196 KB of the remaining allocated memory comes from the two ring buffers per connection (one to the frontend, one to the backend), each containing 3 slots of 1024 * 32 bytes (2 * 3 * 32,768 = 196,608 bytes).

How much throughput would we sacrifice by lowering ring slots to 16 KB? That's for another benchmark ;).
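
The memory side of that trade-off is simple arithmetic. The short sketch below redoes it with the figures quoted above (two rings per connection, 3 slots each); the function name ring_bytes is made up for the example, and it says nothing about throughput.

```python
RINGS_PER_CONNECTION = 2   # one towards the frontend, one towards the backend
SLOTS_PER_RING = 3         # slots in each ring buffer


def ring_bytes(slot_bytes: int) -> int:
    """Bytes of ring buffer memory held per connection for a given slot size."""
    return RINGS_PER_CONNECTION * SLOTS_PER_RING * slot_bytes


current = ring_bytes(32 * 1024)   # slot size described above
halved = ring_bytes(16 * 1024)    # the 16 KB slots being considered

print(current)           # 196608
print(halved)            # 98304
print(current - halved)  # 98304 bytes saved per connection
```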

Sat, Feb. 20th, 2010, 02:06 pm
The Cache Directory Tagging Standard

I stumbled upon this useful specification recently, and thought it deserves some more publicity.

The Cache Directory Tagging Standard describes how to tag a directory's content as "cache" (in practice, "unimportant data") by just creating a CACHEDIR.TAG file.
Say you don't want Firefox's cache to bloat your daily desktop backups. Create the CACHEDIR.TAG like this:
echo Signature: 8a477f597d28d172789f06886806bc55 > \
  ~/.mozilla/firefox/<your profile>/Cache/CACHEDIR.TAG
Then use the GNU tar "--exclude-caches" option to back up your home directory.
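
For tools that don't understand the tag yet, the check is easy to write. Here is a minimal Python sketch, assuming the rule from the specification that a valid tag is a regular file named CACHEDIR.TAG whose content starts with the signature line above. The function names tag_cache_dir and is_cache_dir are made up for the example, and the demo runs on a throwaway temporary directory, not on a real Firefox profile.

```python
import tempfile
from pathlib import Path

# Fixed header from the Cache Directory Tagging Standard (43 bytes).
SIGNATURE = b"Signature: 8a477f597d28d172789f06886806bc55"


def tag_cache_dir(directory: Path) -> Path:
    """Create CACHEDIR.TAG inside `directory` and return the tag's path."""
    tag = directory / "CACHEDIR.TAG"
    tag.write_bytes(SIGNATURE + b"\n")
    return tag


def is_cache_dir(directory: Path) -> bool:
    """True if `directory` contains a CACHEDIR.TAG starting with the signature."""
    tag = directory / "CACHEDIR.TAG"
    # Accept only a regular file, not a symlink to one.
    if tag.is_symlink() or not tag.is_file():
        return False
    with tag.open("rb") as fh:
        return fh.read(len(SIGNATURE)) == SIGNATURE


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    print(is_cache_dir(cache))   # False: no tag yet
    tag_cache_dir(cache)
    print(is_cache_dir(cache))   # True: tag present and valid
```

A backup script could call is_cache_dir() on each directory it walks and skip the tagged ones.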

Some well-behaved applications (like libdvdcss and ccache) already create CACHEDIR.TAG in their cache folders. It would be much more useful if more of them could be taught to tag their cache or to ignore tagged content. This would be especially useful for rsync, cpio, git/svn, backuppc, bacula, etc.

Even though Linux standards (FHS and XDG) already define two different paths to store caches (~/.cache/ and /var/cache/), this specification remains relevant: tags can be used for non-cache directories, and anyway many applications ignore those standards.

From my experience, PHP web developers seem to have a hard time following any Unix FS spec, when they don't simply take the write bit on their own code folder for granted. Mixing everything - logs, libs, confs, application code, cache, shared data - in the same spaghetti FS hierarchy seems to be a common pattern here... So specifications like cachedir tagging allow sysadmins to work around those strong MS Windows habits.

Sun, Jul. 15th, 2007, 05:47 pm
All praise systemtap

Systemtap is an easy and powerful - yet kludgy - framework to instrument (Linux) kernel internals. It lets you define probes triggered at function entry or exit, and even lets you dereference function arguments (i.e. you can dig all the way down through structure members). Those probes are scriptable with a concise language, and can be loaded at runtime (you can run a new script tracing the kernel without rebooting).

It has proven quite useful for my current needs (detecting power hogs on a running Linux desktop). Here's an example. I wanted to pinpoint all applications spinning up block devices for no reason. This includes not only applications reading and writing files, but also those causing inode metadata changes (like atime), as long as they actually make the disks spin (reading cached metadata is OK).

Linux offers an ugly procfs interface for this purpose: 
echo 1 > /proc/sys/vm/block_dump

This will log all applications causing block device accesses in ... the kernel ring buffer. So you end up with a polluted dmesg and klogd/syslogd logging like mad (causing new disk activity, and so on). Knowing nothing about the kernel's internals, I just grepped for block_dump to find every instrumented function, and emulated this with the following systemtap script:

#! /usr/bin/env stap
# Display block I/O consumers (doing reads, writes and dirtied inodes),
# exactly as "echo 1 > /proc/sys/vm/block_dump"
# but on stdout rather than polluting kernel ring buffer (dmesg).

probe kernel.function("submit_bio") {
        op = $rw & 1 ? "write" : "read"
        printf("%s(%d) %s on device %s\n", execname(), pid(), op,
                kernel_string($bio->bi_bdev->bd_disk->disk_name))
}

probe kernel.function("__mark_inode_dirty") {
        s_id = kernel_string($inode->i_sb->s_id)
        if (($inode->i_state & $flags) != $flags && ($inode->i_ino || s_id == "bdev")) {
                printf("%s(%d) dirtied inode %d on device %s\n",
                        execname(), pid(), $inode->i_ino, s_id)
        }
}

Simple, isn't it? So I started cooking up a top(1)-like utility to trace the same things. Problem: I don't know how to clear the screen without this ugly system("clear"). Any thoughts?

PS: would it be acceptable to convert the block_dump interface to something more like /proc/timer_stats?

Sun, Jul. 15th, 2007, 12:15 pm
Legs on the road to power efficiency

Intel's PowerTOP utility made me aware of the Linux power consumption mess. There's a lot of low-hanging fruit here. A short list of things I'll investigate:
  • NetworkManager. Freakin' power drain, hard to fix. More on this in a later post.
  • SCIM. This one sucks power on Asian users' Linux desktops. Bad SCIM, bad.
  • Red Hat bug 204948 aka "Userspace sucks (wakeups)"
  • Ubuntu's misnamed power-management-in-ubuntu blueprint
  • thinkpad-keys, in Ubuntu's hotkey-setup. Unneeded with kernel 2.6.22 and later: ensure it's replaced by proper ACPI event handling.
  • High Resolution Timers patchset. The "force enable hpet" series makes quite a difference on my ICH4-M system.
  • Sensible defaults in distro setups, like the AC97 power saving feature, an efficient frequency scaling governor, ...
  • Why the hell isn't the thinkpad_acpi (formerly ibm_acpi) kernel module autoloaded?
  • Write tools to track relevant things that PowerTOP doesn't show (like block I/O and DMA activity). Systemtap will be of use.
On my X40 laptop, the default Ubuntu Gutsy desktop drains ~14 W (thanks to NetworkManager's behavior, it can't even enter the C3 or C4 ACPI C-states). Manual tweaks bring it down to ~11 W: this should be the default setup. Despite Arjan's recent LKML post, there's still room for improvement.

Sun, Jul. 15th, 2007, 11:53 am
First post

First post