Posts for Tag: internals

IronPython - Integer performance improvements

When I was puddling around in IronPython ahead of an upcoming project I spotted something interesting - when we want to deal with integers within the IronPython interpreter we frequently call a function in Microsoft.Scripting.Runtime.ScriptingRuntimeHelpers called Int32ToObject:

    public static object Int32ToObject(Int32 value) {
        // caches improves pystone by ~5-10% on MS .Net 1.1, this is a very integer intense app
        // TODO: investigate if this still helps perf. There's evidence that it's harmful on
        // .NET 3.5 and 4.0

        if (value < MAX_CACHE && value >= MIN_CACHE) {
            return cache[value - MIN_CACHE];
        }
        return (object)value;
    }
For integers in the range -100 and 1,000 we maintain a cached array of objects already converted from Int32. This is called a lot even in relatively innocuous looking programs - to see this we can a simple Console.Out.WriteLine("Int32ToObject") in there and count the number of times it occurs in a program that simply prints "Hello, world!" we can see this:
    $ cat hello.py
    print "Hello, world!"
    $ ./bin/Debug/ipy.exe hello.py | grep -c "Int32ToObject"
    1317

The code itself specifically references the pystone benchmark (which I found here) in a comment suggesting that we could see a performance improvement on pystone with versions of .NET newer than 3.5 - which appears to be the minimum version later versions of IronPython supports.

I built the Release configuration of IronPython both before and after removing this cache functionality, and tested the default pystone benchmark on my work computer (a pretty hefty 8-core Xeon E3-1275 @ 3.6 GHz, with 32GB RAM) - the results are below where I ran the test 10 times and took the average. The values output by the benchmark are "pystones per second" - where one "pystone" is an iteration through the main loop inside the Proc0() function which performs a number of integer operations and function calls:


Before  After 
1. 238262 234663
2. 239115 234595
3. 245149 245931
4. 237845 302562
5. 228906 295027
6. 248294 275535
7. 258694 297271
8. 246650 282791
9. 235741 296104
10. 233604 274396
Average 241226 273887

So with the fix we see 32,661 more of these iterations per-second, which is roughly a 13.5% improvement. This makes sense - presumably casting each int to object has been improved so that it's nearly free, leaving the overhead being a simple function call.