Sean McLemon

Fedora - Jupyter on a Linux remote using systemd

When you want to do some experimentation or put together a simple code-based presentation Jupyter notebooks are a powerful tool to have at your disposal. But if you use a number of devices over a few locations it can be useful to have a single instance hosted somewhere central (Linode, Digital Ocean, wherever) that you can access from any device wherever you are. There are a handful of ways that you can achieve this:

  1. log in to your remote machine, set Jupyter up and run jupyter notebook (perhaps in a tmux session) then log out - do this whenever your machine reboots
  2. as above but using an existing docker image
  3. spin up an Azure notebook
  4. ... or we could do something like #1 - but have it setup under a separate user and administered via a systemd service

All four of the above are fine for different reasons and use-cases but here I'll talk about how I put #4 together in a little Linode instance running Fedora 25 - it's relatively simple, you can control over the kernels installed, and it's another excuse to get a bit more in-depth with another Linux subsystem (systemd).

Requirements

All you need is a Linux system which uses systemd (Fedora 15.0 or newer, Debian 8.x or newer, Ubuntu 15.04 or newer, for example) which you have sudoer level access on, and Python 3.x. It's probably pretty straight-forward to set this up on systems using the SysV init but I won't cover them here.

Install and Set Up Jupyter 

First thing we need to do is install Jupyter and set up the user context which the Jupyter will be run under - which is a user called "jupyter":

$ sudo python3 -m ensurepip
$ sudo pip install jupyter
$ sudo useradd jupyter
$ sudo passwd jupyter

Next we should switch to the new jupyter user, create the directory our notebooks will live in and generate the Jupyter config we'll mess around with:

$ su - jupyter
$ mkdir notebooks
$ jupyter notebook --generate-config

The last command will create a new file ~/.jupyter/jupyter_notebook_config.py which we'll do a little messing around with shortly, but before this we'll set up a password 

$ python3
Python 3.5.2 (default, Sep 14 2016, 11:28:32) 
[GCC 6.2.1 20160901 (Red Hat 6.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from notebook.auth import passwd
>>> passwd() # below I enter "password123"
Enter password: 
Verify password: 
'sha1:2eff88aac285:385c87867bd18fe852ee1d56b1010d4beed96969'

This will be used to log in to the application when its running. Open up the ~/.jupyter/jupyter_notebook_config.py file in a text editor and add/modify the following lines (using the SHA1 hash returned by the above):

c.NotebookApp.port = 8888
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.password = 'sha1:2eff88aac285:385c87867bd18fe852ee1d56b1010d4beed96969'

Setting up a Jupyter systemd service

Now we want to create a new systemd service so we can make sure our Jupyter notebook runs on startup, handles logging nicely and has all the other bells-and-whistles afforded to us by systemd. This is surprisingly simple - we want to create a new file jupyter.service in /usr/lib/systemd/system - this will tie together our newly installed Jupyter software and our newly setup jupyter user - using your favourite text editor create it so it looks like the below:

$ sudo cat /usr/lib/systemd/system/jupyter.service
[Unit]
Description=Jupyter

[Service]
Type=simple
PIDFile=/var/run/jupyter.pid
ExecStart=/usr/bin/jupyter notebook --no-browser
WorkingDirectory=/home/jupyter/notebooks
User=jupyter
Group=jupyter

[Install]
WantedBy=multi-user.target%

Now all that's left to do is cross our fingers, enable our services, kick them off and browse to our remote box and login with our password:

$ sudo systemctl daemon-reload
$ sudo systemctl enable jupyter
$ sudo systemctl start jupyter

And if you want you can stop here - bookmark your http://www.xxx.yyy.zzz:port address and you're all set!

Conclusion

This was initially just an experiment - an excuse to test out my ability to put together a systemd .service file and do something more with a mostly-idle linux server sitting in a facility somewhere in Amsterdam. However I have found that I really like using this setup. When I was first shown Jupyter (née IPython) I was unimpressed and didn't see the point. However over the last few days I've been working through Project Euler problems again while teaching myself F# (using the IfSharp kernel) and I have found that it lends itself very well to my problem solving workflow on Project Euler.

Fedora - getting Tor and SELinux to play nice

tl;dr 

If you have weird SELinux permissions issues using Tor on Fedora, skip to "The Solution" we're basically gonna add a couple of custom SELinux policies and update the permissions on the /var/lib/tor directory.

Intro

I'm trying to get a little bit out of my cosy Debian comfort zone, and since I have a few friends working at Red Hat figured I'd try out Fedora. While I was teaching myself about systemd however I ran into an issue - starting up a Tor hidden service using systemd was fine immediately after it was installed, but after a reboot it'd repeatedly fail - the following is what is displayed when I ran systemctl status tor.service:

    tor.service - Anonymizing overlay network for TCP
       Loaded: loaded (/usr/lib/systemd/system/tor.service; enabled; vendor preset: disabled)
       Active: failed (Result: start-limit-hit) since Mon 2017-02-20 11:16:44 CET; 2min 42s ago
      Process: 1150 ExecStartPre=/usr/bin/tor --runasdaemon 0 --defaults-torrc /usr/share/tor/defaults-torrc -f /etc
    
    Feb 20 11:16:44 localhost.localdomain systemd[1]: tor.service: Service hold-off time over, scheduling restart.
    Feb 20 11:16:44 localhost.localdomain systemd[1]: Stopped Anonymizing overlay network for TCP.
    Feb 20 11:16:44 localhost.localdomain systemd[1]: tor.service: Start request repeated too quickly.
    Feb 20 11:16:44 localhost.localdomain systemd[1]: Failed to start Anonymizing overlay network for TCP.
    Feb 20 11:16:44 localhost.localdomain systemd[1]: tor.service: Unit entered failed state.
    Feb 20 11:16:44 localhost.localdomain systemd[1]: tor.service: Failed with result 'start-limit-hit'.

The Problem

Looking a little closer at the logs in journalctl it seems that the tor process is not able to access the directory structure under /var/lib/tor - the toranon user's home directory. 

    Feb 20 11:16:43 localhost.localdomain tor[1150]: Feb 20 11:16:43.033 [warn] Directory /var/lib/tor/ssh/ cannot be read: Permission denied
    Feb 20 11:16:43 localhost.localdomain tor[1150]: Feb 20 11:16:43.033 [warn] Failed to parse/validate config: Failed to configure rendezvous options. See logs for details.
    Feb 20 11:16:43 localhost.localdomain tor[1150]: Feb 20 11:16:43.034 [err] Reading config failed--see warnings above.

This appears to be down to SELinux kicking in and telling us the process is trying to do something it's not explicitly permitted to do according the the SELinux policies currently loaded. A quick google search for this error turns up a handful of results from other Fedora users:

In each of these a couple of workarounds are proposed, like disabling SELinux and setting the hidden service directory to be the /var/lib/tor directory. Disabling SELinux would probably work fine, but I'd rather not do that - I'd rather understand what's going on and eventually fix it properly. I also don't want to use the other workaround - since that would prevent me from running two separate hidden services, and what if I want to run ssh and operate a cryptocurrency-based online drugs supermarket[1]?

After a bit more digging I found a bug report on Red Hat's Bugzilla which described exactly the problem I saw (working, ten failing after reboot). However it also confirmed that as-at 14th February 2017 this was still an open issue (poor Kyle Marek spent his Valentines Day debugging Tor) - https://bugzilla.redhat.com/show_bug.cgi?id=1375369 so there's no "proper" fix yet.

The Solution

Until there's an "official" solution there is a semi-smart workaround proposed by the SELinux Alert Browser - to generate a local policy module to permit the access that SELinux restricted. The following steps assume that you've setup a hidden service in your /etc/tor/torrc and that it's failing to start with some weird permissions error.

Firstly let's sort out the permissions for the toranon user's home directory - some people reported that the root user owned some folders in this directory which isn't really desirable:

So let's do this, and sort out the permissions for the toranon user's home directory too.

    $ sudo find /var/lib/tor ! -user toranon
    $ sudo chown toranon /var/lib/tor/some/folder
    $ sudo find /var/lib/tor ! -group toranon
    $ sudo chown :toranon /var/lib/tor/some/folder

In my case /var/lib/tor itself was owned by root - I moved it to toranon just in case. Next let's add an SELinux policy to give the Tor service the permissions it wants:

    $ sudo ausearch -c 'tor' --raw | audit2allow -M tor-workaround
    ******************** IMPORTANT ***********************
    To make this policy package active, execute:
    
    semodule -i tor-workaround.pp
    $ sudo semodule -i tor-workaround.pp

Now after a reboot we should see that the service has successfully started up without any errors

    $ sudo systemctl reboot
    (later...)
    $ sudo systemctl status tor.service
     tor.service - Anonymizing overlay network for TCP
       Loaded: loaded (/usr/lib/systemd/system/tor.service; enabled; vendor preset: 
       Active: active (running) since Sun 2017-02-19 15:49:42 CET; 18min ago
      Process: 768 ExecStartPre=/usr/bin/tor --runasdaemon 0 --defaults-torrc /usr/s
     Main PID: 825 (tor)
        Tasks: 1 (limit: 4915)
       CGroup: /system.slice/tor.service
               └─825 /usr/bin/tor --runasdaemon 0 --defaults-torrc /usr/share/tor/de
    
    Feb 19 15:49:42 localhost.localdomain Tor[825]: Parsing GEOIP IPv6 file /usr/sha
    Feb 19 15:49:42 localhost.localdomain Tor[825]: Bootstrapped 0%: Starting
    Feb 19 15:49:42 localhost.localdomain Tor[825]: Bootstrapped 80%: Connecting to 
    Feb 19 15:49:42 localhost.localdomain systemd[1]: Started Anonymizing overlay ne
    Feb 19 15:49:42 localhost.localdomain Tor[825]: Signaled readiness to systemd
    Feb 19 15:49:43 localhost.localdomain Tor[825]: Opening Control listener on /run
    Feb 19 15:49:43 localhost.localdomain Tor[825]: Bootstrapped 85%: Finishing hand
    Feb 19 15:49:43 localhost.localdomain Tor[825]: Bootstrapped 90%: Establishing a
    Feb 19 15:49:44 localhost.localdomain Tor[825]: Tor has successfully opened a ci
    Feb 19 15:49:44 localhost.localdomain Tor[825]: Bootstrapped 100%: Done

Conclusion

It was a little bit unfortunate that I bumped into this when I was trying to familiarise myself with systemd, but it was good to have it sorted and I think that the next thing I should explore is SELinux. Perhaps I could understand and contribute the proper fix since this is just a little temporary workaround. 

[1] - note: I do not want to run a cryptocurrency-based online drugs supermarket

XMonad - Issues after dist-upgrade

Every other time I run apt-get dist-upgrade my XMonad fails to rebuild with the error "Failed to load interface for âXMonad.Actions.Volumeâ"

The annoying thing is happens infrequently enough that each time I forget what the resolution is, google a bunch of things, then eventually stumble on the answer and move on. Anyway the reason it's a little tricky is that most pages or posts suggest that the correct resolution is to issue the following command

    $ sudo cabal update && sudo cabal install xmonad-extras

However for some reason I have a bunch of these Haskell packages installed in ~/.ghc and running this command using sudo doesn't update these, so instead I have to run it normally:

    $ cabal update && cabal install xmonad-extras

And that's it!

IronPython - Integer performance improvements

When I was puddling around in IronPython ahead of an upcoming project I spotted something interesting - when we want to deal with integers within the IronPython interpreter we frequently call a function in Microsoft.Scripting.Runtime.ScriptingRuntimeHelpers called Int32ToObject:

    public static object Int32ToObject(Int32 value) {
        // caches improves pystone by ~5-10% on MS .Net 1.1, this is a very integer intense app
        // TODO: investigate if this still helps perf. There's evidence that it's harmful on
        // .NET 3.5 and 4.0

        if (value < MAX_CACHE && value >= MIN_CACHE) {
            return cache[value - MIN_CACHE];
        }
        return (object)value;
    }
For integers in the range -100 and 1,000 we maintain a cached array of objects already converted from Int32. This is called a lot even in relatively innocuous looking programs - to see this we can a simple Console.Out.WriteLine("Int32ToObject") in there and count the number of times it occurs in a program that simply prints "Hello, world!" we can see this:
    $ cat hello.py
    print "Hello, world!"
    $ ./bin/Debug/ipy.exe hello.py | grep -c "Int32ToObject"
    1317

The code itself specifically references the pystone benchmark (which I found here) in a comment suggesting that we could see a performance improvement on pystone with versions of .NET newer than 3.5 - which appears to be the minimum version later versions of IronPython supports.

I built the Release configuration of IronPython both before and after removing this cache functionality, and tested the default pystone benchmark on my work computer (a pretty hefty 8-core Xeon E3-1275 @ 3.6 GHz, with 32GB RAM) - the results are below where I ran the test 10 times and took the average. The values output by the benchmark are "pystones per second" - where one "pystone" is an iteration through the main loop inside the Proc0() function which performs a number of integer operations and function calls:


Before  After 
1. 238262 234663
2. 239115 234595
3. 245149 245931
4. 237845 302562
5. 228906 295027
6. 248294 275535
7. 258694 297271
8. 246650 282791
9. 235741 296104
10. 233604 274396
Average 241226 273887

So with the fix we see 32,661 more of these iterations per-second, which is roughly a 13.5% improvement. This makes sense - presumably casting each int to object has been improved so that it's nearly free, leaving the overhead being a simple function call.

2016 and 2017

Reflecting on 2016

It's easy to lose track of achievements and get bogged down in stress in the short term so at the end of the year I like to look back at what I've done and think about what I liked or what went well.

Debian/Linux

First off not a huge one but when I replaced my ailing MacBook Air with a Thinkpad X250 I moved to using Linux and Xmonad as my daily driver (posts one, two and three on this). This relatively minor switch actually kicked off a relatively productive year in terms of professional development - I ended up having to re-learn a lot about Linux that I had forgotten, and bit-by-bit ended up learning and creating more and more. Many times on macOS I'd find some library or utility wouldn't work without a lot of config, and I'd often just give up - once I got past the initial "setup my environment" step Debian ended up smoother than macOS.

Python

I created and released a handful of Python packages on PyPI (em73xx, rbcz, xnm, xsms) - none of them have huge widespread appeal but they're fairly well written and let me work through the process of releasing on PyPI, producing documentation. I also wrote a fair amount about Python, and worked through Philip Guo's excellent CPython Internals course which I thoroughly enjoyed.

Blog

I wrote a total of 28 blog posts in 2016, which is slightly more than one every fortnight. They've only had a few thousand views in total but I've thoroughly enjoyed learning and creating something new, then writing it up. It's good to practice any sort of technical writing (even if it's an informal blog post) and it's pretty rewarding to look back on what you've done. Here's a handful of the ones I enjoyed writing most

Homepage

I also put together a new homepage @ www.mclemon.io. My company use ASP.NET MVC for all new UIs, and since I'm not always involved in the front-end development I felt like I was falling behind. Instead of shying away from this and resolving to look into it "later" I tackled it head on and set about creating a simple home page with MVC and deploy it to Azure. What started off as a simple static "Contact" page grew into a neat little aggregator of my Posthaven blogs, which I can extend pretty easily to pull in other stuff (I've tried out GitHub and Twitter, but my commits push everything out of the way, and I rarely post anything useful on Twitter!). It should degrade reasonably well on mobile, but as you can see from the screenshot there's a couple of whitespace issues I clearly need to tackle!

2017

I don't like resolutions, but I do think it's good to think ahead about some things I'd like to work on - even if I ultimately get sidetracked on something else.

IronPython

I've already written a couple of articles about IronPython, but I'd like to keep diving deeper. I have a really nice series of posts in the works which could end up being useful to the community at large so I'm pretty stoked about getting them completed. I'd ultimately like to start contributing to the project, hopefully by the time I've finished these I'll be knowledgeable enough to be of use to Alex Earl et al.

Czech

My Czech has stagnated - I've got some friends I write in Czech to but I'd like to read and write far more regularly. I've found that if I sit down I can break down most sentences with a bit of patience but my vocabulary and sentence construction is still pretty poor. 

Books

There are so many books I'd like to read - but in terms of professional development I'd like to revisit some fundamentals - so working my way through the classic SICP is top of the list, but I'd also like to work through Grokking Algorithms In Python as it looked like a nice read.

Alfie

My little dog is adorable, but he's way too boisterous - I'd like him to:

  1. return on command every time I call him (without resorting to tricking him with biscuits and the frisbee!)
  2. walk to "heel" better
  3. not jump on strangers
  4. chill out in the pub

Of course #4 is the most important one :)

Clock

I mostly cracked the Brno astronomical clock (aka the cock clock) at summer, and started writing a blog post about it - but I needed to create a couple more timelapses and visualisations to complete the picture and never found time to do it. This clock is kinda famous for being unintelligible so it'd be nice to share the knowledge with the world!

General Focus

In general I'd like to focus more on a couple of disciplines or topics and become more of an expert in them. Looking back at my posts I've covered a fairly broad spectrum of things, but I never really went into much detail on any of them. I'm spreading myself a little thinly and it doesn't help that I've got the attention span of a goldfish!


Discover a Linux Utility - xssstate

To learn more about Debian and Linux in general I'm selecting utilities at random from my PATH using the command below, learning what they do and writing a blog post about it. Previously: Part 1Part 2, Part 3

    $ (for folder in `echo $PATH | sed "s/:/\\n/g"`; do ls -1 $folder; done; ) | shuf -n 1 | xargs man

Today's utility is xssstate, which lets your check the status of X window system's screensaver. It's written by the suckless guys, who've created a number of very good tools, such as surf (a minimalist web browser) and dmenu (autocompleting program launcher), both of which I use regularly.

The utility itself is pretty simple, there are only four command line switches including -v, so this will be pretty short post. First we can check if the screensaver is currently enabled using -t switch:

    $ xssstate -s
    off

Obviously the screensaver is off, since I am actively using this computer - however if the screensaver was active it'd print "on" and if it was disabled altogether you'd see "disabled".

To check the time idle in milliseconds, use the -i switch:

    $ xssstate -i
    2
    $ sleep 5 && xssstate -i
    4947

And to get time in milliseconds until the screensaver activates, invoke it with -t:

    $ xssstate -t
    599998
    $ sleep 10 && xssstate -t
    590052

The way the utility does this is by using some functionality provided by a X11 library, wrapped in a handful of switch statements (they have their own neat little github-style source browser if you want to check out xssstate.c in its entirety):

    // ...
    info = XScreenSaverAllocInfo();
    XScreenSaverQueryInfo(dpy, DefaultRootWindow(dpy), info);

    if (showstate) {
    	switch(info->state) {
	case ScreenSaverOn:
		printf("on\n");
		break;
	case ScreenSaverOff:
		printf("off\n");
		break;
	case ScreenSaverDisabled:
		printf("disabled\n");
		break;
	}
    } else if (showtill) {
	switch(info->state) {
	case ScreenSaverOn:
		printf("0\n");
		break;
	case ScreenSaverOff:
		printf("%lu\n", info->til_or_since);
		break;
	case ScreenSaverDisabled:
		printf("-1\n");
		break;
	}
    } else if (showidle) {
	printf("%lu\n", info->idle);
    }
    // ...

When I do these articles I like to show some practical real-life usage of the utility - in this case I decided to add a little timer to my xmobar showing how long my computer had been idle. To this I added a Run Com entry to my xmobarrc:

    -- also stick %xssstate% into the template
    Run Com "xssstate" [ "-t" ] "xssstate" 10,

This ends up showing with something like the below - apologies for shaky-cam!

IronPython - Efficient string concatenation

Previously I'd done a bit of fiddling around with Python, revisiting some string concatenation benchmarks from an old-ish article and trying to explain some unexpected results. After playing around a bit with IronPython I was curious whether it'd be faster or slower than CPython on windows.

I installed the latest versions of both IronPython (2.7.5) and CPython (2.7.12) into my Windows 10 VM and re-ran the same set of tests.

The most interesting thing I learned was that some changes to how memory was allocated for the new buffer caused the "naive" method to be on par with the optimised version. As it turns out, IronPython doesn't actually have this - so running stest.py we get the following:

    $ ipy64 stest.py 1
    method 1
    time 2406.60858154 ms
    output size  86 kb

    $ ipy64 stest.py 6
    method 6
    time 46.9284057617 ms
    output size  86 kb

IronPython is a totally different beast to CPython, so my previous method of debugging - taking the code and examining it with the dis module doesn't yield anything useful:

This is because it compiles it into a different set of instructions to be executed using the .NET CLR (it's important to note that it does not go directly to .NET IL, there's still a level of instructions above this similar to CPythons opcodes).

However since we're on Windows with .NET we do have Visual Studio - which is arguably easier than working through python bytecode instructions in a text editor. To begin with it's extremely simple to find out where exactly we spend most of our execution time using dotTrace by JetBrains:


So the program execution is split with roughly 50% spent in initialisation (ImportSite, again!) but that's not included in our benchmark, however the remaining 50% is spent in String.Concat() in mscorlib (source here) which is what we're interested in:

    [System.Security.SecuritySafeCritical]  // auto-generated
    public static String Concat(String str0, String str1) {
        Contract.Ensures(Contract.Result() != null);
        Contract.Ensures(Contract.Result().Length ==
            (str0 == null ? 0 : str0.Length) +
            (str1 == null ? 0 : str1.Length));
        Contract.EndContractBlock();

        if (IsNullOrEmpty(str0)) {
            if (IsNullOrEmpty(str1)) {
                return String.Empty;
            }
            return str1;
        }

        if (IsNullOrEmpty(str1)) {
            return str0;
        }

        int str0Length = str0.Length;
        
        String result = FastAllocateString(str0Length + str1.Length);
        
        FillStringChecked(result, 0,          str0);
        FillStringChecked(result, str0Length, str1);
        
        return result;
    }

This explains why things are so slow - when concatenating two strings there are no realloc-based tricks like CPython had - we allocate a new memory buffer every time, copy both strings into it, and let the garbage collector handle the old buffers.

Sadly it's pretty non-trivial for someone like me to implement a similar optimisation here - since we depend on the underlying string implementation in .NET we're stuck with how string concatenation was implemented there. I toyed with the idea of re-writing a hacky reimplementation of FastAllocateString as FastReallocateString specifically for the += operator (it's possible to do - we need to change PythonBinaryOperationBinder.BindDelegate() to handle Add and AddAssign differently) this would've involved getting stuck into the mscorlib sources in coreclr - and I'd rather stay in Python-land for the time being.

However since it's possible to access the .NET standard libraries from IronPython we can at least test how System.Text.StringBuilder performs, since it is designed to solve this very problem. So I setup the stest.py code I previously used, and re-ran them on my Windows 10 VM for both CPython 2.7.12 and IronPython 2.7.5. Just to quickly recap, here are the string concatenation methods I tested:

Method 1: simple concatenation: s1 += s2

Method 2: concatenation using MutableString (s1 += s2, where s1 is a MutableString)

Method 3: appending to a long array of char

Method 4: building a list of strings, then calling "".join()

Method 5: writing to a cStringIO buffer in memory using write()

Method 6: same as Method 4 but using a list comprehension inline

Method 7: using System.Text.StringBuilder (IronPython only)

runtime (ms) concatenations per second
method 1 16.00 1,250,000
method 2 108.99 183,503
method 3 14.99 1,334,222
method 4 3.00 6,666,666
method 5 6.00 3,333,333
method 6 2.00 10,000,000

runtime (ms) concatenations per second
method 1 2517.44 7,944
method 2 3968.87 5,039
method 3 25.39 787,711
method 4 42.13 474,721
method 5 35.56 562,429
method 6 33.22 602,046
method 7 22.43 891,662

Conclusion

So in IronPython the most efficient way to join strings together is to hook into .NET's System.Text library and use StringBuilder, no surprises there. What was surprising was how much slower IronPython was than CPython. I'm curious if this is just a freak result or if IronPython is known to be pretty slow. I'll probably not attempt to speed up the AddAssign op in IronPython, however I would like to investigate why IronPython is so slow, and if there are any plans to improve things. In addition I was surprised that the realloc trick had little-to-no effect on CPython in Windows (i.e. method 1 was slow even on 2.7.12).

I am a little sick of this benchmark now - I might revisit it one final time to compare it across CPython, IronPython, PyPy and Pyjion to complete the picture, but only if I'm really bored :)

IronPython - tackling some unexpected Exceptions

After I listened to an episode of Talk Python To Me featuring Alex Earl from the IronPython project I learned that not only is IronPython not dead/dying, but it's actually seeing a bit of a resurgence recently. I grabbed the sources from the IronLanguages GitHub, setup my dev environment, opened it up and launched the IronPythonConsole project hoping to see the familiar python REPL.

However instead I saw that it had hit an exception:

I was frustrated at first, thinking I'd done something wrong, but realised that getting to the bottom of an Exception was a fine way to introduce yourself to a new codebase.

The exception itself is a ZipImportError with the text "not a Zip file" and is thrown in the constructor for zipimporter.

Python Confession #1: I had never heard of or used zipimporter before.

Since I'd never heard of the class before I had no idea why the IronPython runtime would be calling this and especially on something which didn't appear to exist. So it's time to dig through the call stack to see where this comes from:

It appears that PythonCommandLine.ImportSite kicks this process off so that's where I started looking:

    private void ImportSite() {
        if (Options.SkipImportSite)
            return;
    
        try {
            Importer.ImportModule(PythonContext.SharedContext, null, "site", false, -1);
        } catch (Exception e) {
            Console.Write(Language.FormatException(e), Style.Error);
        }
    }

It turns out that site is a special Python module which is imported by default when the interpreter starts (for all implementations - IronPython, JythonPyPy and good old vanilla CPython). It's responsible for doing some platform-specific module path setup.

Python Confession #2: I had never heard of the site module before.

So how does importing site cause us to execute some code relating to zipimporter? Searching through the call stack at the point of the Exception shows that all this seems to come from FindImporterForPath which takes every function in path_hooks and attempts to apply it to the path we're importing.

    /// 
    /// Finds a user defined importer for the given path or returns null if no importer
    /// handles this path.
    /// 
    private static object FindImporterForPath(CodeContext/*!*/ context, string dirname) {
        List pathHooks = PythonContext.GetContext(context).GetSystemStateValue("path_hooks") as List;

        foreach (object hook in (IEnumerable)pathHooks) {
            try {
               object handler = PythonCalls.Call(context, hook, dirname);

                if (handler != null) {
                    return handler;
                }
            } catch (ImportException) {
                // we can't handle the path
            }
        }

So we call every path_hook with the module we're importing as an argument using PythonCalls.Call()The path_hooks themselves come from the sys module:

    sys.path_hooks

A list of callables that take a path argument to try to create a finder for the path. If a finder can be created, it is to be returned by the callable, else raise ImportError.

Python Confession #3: I had never heard of or used path_hooks before.

So what is in path_hooks? If I keep hitting continue on the Visual Studio debugger the Exception is caught, I reach the python REPL and can inspect what is in sys.path_hooks:

And there it is - zipimporter. Now we're approaching an explanation - when the IronPython interpreter is initialised it imports the site module which takes the everything in path_hooks and applies them to all the modules in our path - but since there are no .zip files anywhere in our path zipimporter (the only path hook) cannot find anything to operate on, so throws an exception which is normally caught and handled.

So this is normal behaviour - the exception is even expected, since path_hooks' documentation states that if a path_hook fails it raises an Exception.

Conclusion

OK nothing special has happened here since IronPython is behaving exactly as it should, however unexpected it was to me. That said, this is a very nice way to learn some new Python concepts, like:

  1. the zipimport module
  2. the site module
  3. sys.path_hooks 

I even have a half-working clone of zipimporter for tarballs called tgzimporter, but there's little need for such functionality as I suspect that even zipimporter is seldom used. 

It would've been easy to just keep hitting the "F5" key until I hit the REPL, but then I would likely have struggled to find a way to approach the source code and perhaps would've put it off indefinitely. Hopefully now I'll find some way to continue to improve my understanding and contribute to the IronPython project.

Thinkpad X250 - SMS and GPS via Python

When I was messing around trying to get the 4G modem working on my Thinkpad X250 for this post I ended up doing a little debugging by connecting to the modem over serial using cu:

    $ cu -h -l /dev/ttyACM0
    Connected.
    AT^M
    OK

It turns out that the Sierra EM7345 modem in my Thinkpad can be controlled through AT commands sent over serial. Over at zukota.com there's a guy who's done a seriously good job of experimenting with this modem and documenting what was previously not very well documented.  

For example if we wanted to use a SIM locked with the PIN "1234", then send the message "Hello, world!" to the Czech number 775123456, we would connect as above and enter the following:

    AT+CPIN="1234"^M

    OK
    AT+CMGS="+420775356278"^M
    > Hello, world!^Z^M
    +CMGS: 40

    OK

Getting the GPS position involved issuing the XLCSLSR at command and parsing it's output. This is a little more complicated, since there's a slight delay in the response, and the response contains dozens of fields (I've included the request/response in its entirety below so you can see):

    AT+XLCSLSR=1,1,,,,,,,,,,^M
    +XLCSLSR: request id 2

    OK

    +XLCSLSR: 2, 49.195669 N, 16.606075 E, 119.996932, 48.743179, 39.616302,143, 100.997169,67,2016/05/10,18:38:22,0,1,75.45,2.28,-0.25,2.20,0.64,239919,239919.74,,,4.50,2.50,3.50,118.92,62.80,100.98,,,1,1896,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4,73.04,70.04,21.00,1,,,,21 ,61.03,283.14,41.00,1,,,,16 ,20.01,173.09,30.00,1,,,,10 ,17.01,100.05,32.00,1,,,,29 

    OK

I was delighted when I found this out, as I had recently discovered the pyserial library, and realised I could interact with the modem and explore/expose some of the modem's functionality via a Python library.

em73xx

I rolled up my sleeves and wrote a library called em73xx (PyPI, GitHub) which will issue the necessary AT commands to send and receive SMS messages, and retrieve the current location using the modem's GPS functionality.

To install the library either retrieve from pypi using pip:

    $ pip install em73xx

... or you can clone the smcl/em73xx repo and install using setup.py:

    $ git clone https://github.com/smcl/py-em73xx
    $ cd py-em73xx
    $ python setup.py install

Once it's installed you can use em73xx by importing the Modem class and instantiating it with the path to the serial device on your system (probably /dev/ttyACM0, as below) and optionally specify the PIN for the sim inserted:

    from em73xx import Modem
    modem = Modem("/dev/ttyACM0", pin="1234")

To retrieve all the SMS messages on the SIM, and print them to stdout we can use the getSMS method:

    messages = modem.getSMS()
    for m in messages:
        print("from %s: % (m.sender))
        print("\t%s" % (m.message))

If we want to send a message we can use the sendSMS method:

    # request an 60 minute tram ticket in brno!
    modem.sendSMS("90206", "BRNO")

And finally if we want to retrieve the current GPS position, we can use the getGPS method - note that the modem can fail to get a GPS fix, and will return None if so:

    gps = modem.getGPS()
    if gps:
        print(gps.latitude)
        print(gps.longitude)

xsms

Ultimately I started this because I wanted to have a simple utility that integrated with xmobar which would inform me whether I had received a new SMS, and would allow me to reply to existing messages or compose an entirely new one. To achieve this I wrote xsms which is a utility that will either print to stdout the number of unread messages (for xmobar) or launch a GUI, depending on the command line switches used.

Again you can either retrieve xsms from pypi using pip:

    $ pip install xsms

... or clone the smcl/xsms repo from github and install using setup.py:

    $ git clone https://github.com/smcl/xsms
    $ cd xsms
    $ python setup.py install

Once xsms is installed you can either launch it standalone.

    $ python -m xsms --device=/dev/ttyACM0

And if you want to have a little indicator in your xmobar you can use something like the below (which takes advantage of the ability to specify the font via tags to easily get some icons from Font Awesome):

    -- assumes you have Font Awesome installed and used here:
    -- additionalFonts = ["xft:FontAwesome-10"],
    Run Com "/usr/bin/python" [ "-m", "xsms", "-d", "/dev/ttyACM0", "-p", "1234", "-r", "", "-u", " %d" ] "xsms" 600,

So when you add %sms% to your xmobarrc's template and restart you'll see something like this:

... and if you want to be able to click the icon to raise the GUI, you can surround it with an <action> which invokes xsms:

  template = "%StdinReader% }{ ... stuff ... <action=`python -m xsms -g -d /dev/ttyACM0 -p 1234`>%xsms%</action> ... "

Conclusion

This is an idea that sat half-finished in my ~/dev folder for about 6 months, and I'm really glad that I was able to take it from a single hacky one-shot script to two fully-fledged packages released on PyPI and ready to be used. There is still some additional functionality I'd like to add to em73xx, for example I only grab longitude/latitude from the GPS data and I'd like to be able to reset or power-cycle the modem in case of problems, however it's in pretty good shape overall.

As for xsms, it's the second Tkinter application I've put together, and while I'm finding it easier each time I write something (with fewer cut-paste lines from effbot.org's Tkinter docs) it's starting to be a little unwieldy. For example the ttk module allows you to add widgets to your application with a unified theme, but it's missing a multi-line text entry widget - which is something I use for inputting and displaying the SMS messages. This meant that I had to either decide to put off styling xsms or add some hacks to customise the Tkinter.Text widget that I ultimately used. In addition programatically constructing the UI using the grid() and pack() layout managers feels a little like creating a webpage laid out using nested <table> elements in the late 90's. Ultimately if I find myself writing a Python desktop app in future I'll spend a little more time investigating the frameworks available and weighing them up against using Tkinter, now that I'm broadly familiar with it.

Useful Links

Discover a Linux Utility - jjs

To learn more about Debian and Linux in general I'm selecting utilities at random from my PATH using the command below, learning what they do and writing a blog post about it. Previously: Part 1, Part 2

    $ (for folder in `echo $PATH | sed "s/:/\\n/g"`; do ls -1 $folder; done; ) | shuf -n 1 | xargs man

The random command I'm looking at this time is jjs - whose summary is simply "Invokes the Nashorn engine" which is a little vague and as it turns out is underselling things slightly. Nashorn is a Javascript engine written in Java for the Java VM with very neat integration and access to the range of libraries and functionality provided by the JDK.

While I'm not a Java or Javascript developer by trade I am surprised that I had never seen this pop up on Hacker News or lobste.rs before, and I'm sure many professional Java devs aren't particularly familiar with it either. I was even more surprised how quick it was to get productive (the man page suggests using the println function, which doesn't exist) since my hazy memories from using Java at university involved fiddling around with the CLASSPATH env variable and launching things in weird ways. 

Entering jjs takes you to a REPL prompt where you can muck around with javascript to your heart's content while you get familiar - here you'll see what I mean about the example in the manpage, println should be print:
    $ jjs
    jjs> println("hello, world!")
    :1 ReferenceError: "println" is not defined
    jjs> print("hello, world!") 
    hello, world!
    jjs> function add(x, y) { return x + y }
    function add(x, y) { return x + y }
    jjs> add(10, 20)
    30
    jjs> function fib(n) { if (n < 1) { return 0 } else if (n <= 2) { return 1 } else { return fib (n - 1) + fib (n - 2)} }
    function fib(n) { if (n < 1) { return 0 } else if (n <= 2) { return 1 } else { return fib (n - 1) + fib (n - 2)} }
    jjs> fib(3)
    2
    jjs> fib(50)
    12586269025
    jjs>
What I really like is that there's a fuss-free interface to the entire JDK through an object conveniently called java:
    jjs> var str = new java.lang.String("Hello, world!")
    jjs> java.lang.System.out.println(str)
    Hello, world!
    jjs> var dict = new java.util.HashMap()
    jjs> dict.put("foo", "bar")
    null
    jjs> dict.put("baf", "baz")
    null
    jjs> dict.forEach(function(k,v) { print(k + v); }) 
    bafbaz
    foobar
That final line pretty much sums up why I think this is cool - having created an instance of a java.util.ashMap we iterate over it using its forEach method but we can give it a javascript lambda function as an argument to apply to each key/value pair. There's no denying that this is neat :)

I wanted to do a little measurement between the V8 compiler used by Chrome and Node.js but it turns out this has already been done: http://blog.nidi.guru/tech/java/2015/10/05/graphviz-with-java/


So Nashorn is actually a good deal slower than V8 (and that's with a cold VM, warmed up the difference is more stark) - which isn't a huge problem I suppose unless you're doing some really heavy lifting using Nashorn. I don't think many people are.

My previous "Discover ..." posts generally had to lay a bit of groundwork to introduce various OS concepts before the utility could be understood. However since Java and Javascript are both pretty commonplace there's no introduction needed, and the best way to show it off and understand what it's capable of is to write some code.

So I implemented four demo programs which are small enough to comprehend in a minute or so, explore the interesting relationship between js/JDK and demonstrates some relatively common use-cases for programming languages:
  1. shell scripting
  2. unix system utilities
  3. visualisation
  4. web services

Demo 1 - a simple shell script

I'd recently read a blog post at IBM developerWorks tracing this history of UNIX shells (csh, tcsh, bash, etc) and implementing the same task in each one, I figured that this was as good a task as any to start with. I implemented this as findexec.js below:


The code is a little less elegant than the bash version, we rely pretty heavily on java.io.File to accomplish some of the things built into bash, but realistically we're using the wrong tool for the job here.
    $ jjs findexec.js -- /usr/local/bin
    /usr/local/bin/fsv
    /usr/local/bin/n-m
    /usr/local/bin/behave
    /usr/local/bin/tzupdate
    /usr/local/bin/dotnet

Demo 2 - a unix-y utility

The next program I wrote was a unix-style system utility that reverses the order of its input, which is either
  1. a list of lines read in from stdin (piped or input)
  2. a list of files supplied as arguments, which are each written reversed
This was a little more fun to write - couple of interesting things here. First was ability to use generics - could just create new ArrayList without specifying the type. Second was polymorphism between a Java stdlib class BufferedReader and a javascript class I wrote MultipleFileReader which both happen to implement a method readLine() but which don't explicitly implement any common interface or abstract class. 

I implemented this as filerev.js, which is a wee bit long for this blog but can be found at this Gist on GitHub. Below is a little snippet showing its usage:
    $ cat > foo << EOF
    > herp
    > derp
    > blorp
    > EOF
    $ jjs filerev.js -- foo
    blorp
    derp
    herp
    $ cat foo | jjs filerev.js 
    blorp
    derp
    herp

Demo 3 - a JavaFX utility

In the manpage I noticed the -fx which "launches the script as a JavaFX application". I hadn't used Java since university so I had no clue what JavaFX was, but it's apparently a set of libraries for writing graphical applications in Java, and mostly replaces Swing/AWT toolkits among other things.

After I read a bunch of documentation and puddled around a bit I decided that I wanted an application which can quickly produce little line graphs from input piped via stdin (and dynamically resizes according to the max/min values in the dataset).

This was a little trickier than the previous two examples but after finding the documentation on extending abstract classes (for the AnimationTimer) the rest was surprisingly straight forward. I created a file containing a a few repetitions of a sine wave, and piped it in to generate the visualisation below:


Again, the code is a Gist on GitHub as plotstdin.js along with the input data (and the code that generated it).

Demo 4 - a web service

For my final example I wanted to spin up a quick and simple web service, however this apparently not so straight-forward in Java. While the JDK has plenty of libraries available, there's no equivalent to SimpleHttpServer in Python or http.createServer() in Node.js - it seems that you need to use a third party package like Tomcat or Jetty. This presents a bit of a problem, since I want to create a little self-contained example without having to resort to resolve any dependencies using Gradle and Maven, which I'm unfamiliar with and would be tricky to work with using Nashorn.

However I found a nice project on GitHub which handles most of this called Nasven which wraps Maven and let's me easily add a dependency for Spark (lightweight web framework) and launch it.

I created a repo called thumbooo which will spin up a simple web service running on port 8080, with a single POST resource that produces < 100 x 100 pixel thumbnails for submitted images.

While Nasven handled the toughest part, I still ran into trouble since Nashorn was unable to determine which of the three ImageIO.write() functions it should call since the image resizing produced an object of type ToolkitImage and it expected a RenderedImage

You can ran the following to start the server:

    $ git clone https://github.com/smcl/thumbooo
    Cloning into 'thumbooo'...
    remote: Counting objects: 42, done.
    remote: Compressing objects: 100% (37/37), done.
    remote: Total 42 (delta 18), reused 17 (delta 4), pack-reused 0
    Unpacking objects: 100% (42/42), done.
    Checking connectivity... done.
    $ cd thumbooo
    $ jjs -scripting nasven.js -- src

and in another xterm use cURL to POST an image to the /thumb endpoint:

    $ curl -L -o thumb.png --form "file=@lenna.png" http://localhost:8080/thumb
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  494k    0 32358  100  462k   638k  9353k --:--:-- --:--:-- --:--:-- 9447k

This resized the original 512x512 Lenna image ...


... to a 100x100 thumbnail:

 

Conclusion

I'm impressed with the integration between Javascript and Java that Nashorn provides, as demonstrated by the snippets above it's relatively flexible and easy to use (if a tad slow!). As a technical exercise it's very impressive, however I'm still not 100% sure what it was originally meant to be used for. Perhaps as a non-js/Java developer I'm missing something, but I feel like there are already better options available for individual hackers all the way to Enterprise-scale projects. I certainly had fun using it though!