Performance

Last night before heading to bed, I left my Mac running two series of performance tests. One with my wx.GraphicsContext changes turned off, and another with them turned on. The command line was:

date
cp -f parcels/osaf/framework/blocks/calendar/CollectionCanvas{-off,}.py \
&& perftest off \
&& cp -f parcels/osaf/framework/blocks/calendar/CollectionCanvas{-on,}.py \
&& perftest on
date

The tests started at Thu Apr 26 02:59:25 EDT 2007 and finished at Thu Apr 26 05:28:06 EDT 2007. So they ran for about 2½ hours.

You don’t want to do this kind of thing during the day! This ended up being a long post with code, etc, so click to see more.

As a side note, this entry looks a lot better if you do a print preview on it. The current “screen” style I’m using has a fixed width (*sigh*), while the “print” style does not.

Well, to be fair, I was being a bit lazy. I just used the straight-up performance test given by “tools/rt.py --perf”, which loads a huge data set every time it’s run, and that takes a little while. If I were being smart about it and listening to what Andi said, I would load the data set once and preload all the items.

When I run these tests I use a little csh script called “perftest”. I tend to use csh instead of bash. I used to write everything in ksh, which was pretty much what bash is now. But that is a topic for another time..

Here’s my script, minus the #! line:

cd ~/work/osaf/chandler/chandler

set count=10

if($#argv > 0) then
    set suffix="-$argv[1]"
else
    set suffix=""
endif

set i=0
while($i < $count)
    @ i++
    echo ''; echo " Running perf test #$i"; echo ''
    tools/rt.py --perf |& tee ~/Desktop/perf/`date +%F-%H%M`-perf-${suffix}${i}.txt
end

This basically runs the performance test 10 times, saving the output of each run in a file stamped with the date and, optionally, a suffix. As you can see from my usage above, I use the suffixes “on” and “off” to mark runs with and without my changes.

This leaves me with a bunch of performance output, which I like to load into Excel. I use a little Python script called perf2csv for this job:

#!/usr/bin/env python

from sys import *
from os import *
from re import *

if len(argv) < 2:
    print "Usage: %s file [file..]" % argv[0]
    exit(1)

printedHeader = False

def printHeader(in_file):
    try:
        f = file(in_file)
    except:
        return False
    print 'filename,',
    for line in f:
        d = None
        if line.count('- + - + -'):
            f.close()
            break
        m = match(r'^(?P<testName>\S+)', line)
        if m is not None:
            d = m.groupdict()
            print "%s, " % d['testName'],
            if d['testName'] == 'Startup' or d['testName'] == 'Startup_with_large_calendar':
                print "%s, %s, %s, %s, " % ('', '', '', ''),
    print "\n",
    return True

for arg in argv[1:]:
    if not printedHeader:
        printedHeader = printHeader(arg)
    try:
        f = file(arg)
    except:
        print "Can't read \"%s\"" % arg
        continue
    starts = { }
    print "%s, " % arg,
    for line in f:
        d = None
        if line.count('- + - + -'):
            f.close()
            break
        m = match(r'^(?P<testName>\S+)\s+(?P<time1>[\d\.]+)\s+\|\s+(?P<time2>[\d\.]+)', line)
        if m is not None:
            d = m.groupdict()
        else:
            m = match(r'^(?P<testName>\S+)\s+(?P<time1>[\d\.]+)\s+'
                r'(?P<time2>[\d\.]+)\s+(?P<time3>[\d\.]+)\s+'
                r'\|\s+(?P<time4>[\d\.]+)\s+\S+\s+(?P<time5>[\d\.]+)\s+', line)
            if m is not None:
                d = m.groupdict()
            else:
                continue
        if d is not None:
            if len(d) == 3:
                print "%s," % d["time1"],
                #times[d["testName"]] = d["time1"]
            else:
                starts[d['testName']] = "%s, %s, %s, %s, %s" % (d["time1"], d["time2"], d["time3"], d["time4"], d["time5"])
    # print the startup timings last; the final print ends the row
    try:
        print "%s, " % starts['Startup'],
    except:
        print "<%s> 0.0," % ','.join(starts.keys()),
    try:
        print starts['Startup_with_large_calendar']
    except:
        print "0.0"
    # finished parsing the file

This creates a csv file suitable for loading into Excel.

Aside:
I would use an open source spreadsheet program, but Open Office et al are too heavy. I wish there was something like AbiWord for spreadsheets. AbiData? AbiNumbers? Alas..

The script potentially passes through a file twice (the first one, once for the header and once for the data). The whole printHeader function exists just to parse a file for the names of the tests at the top, so as to have column headers. Then lines are read in until ‘- + - + -’ is seen, which is as far as it needs to go.
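
As a rough illustration of the two line shapes the script matches, here is a small sketch. The sample lines and the parse_line helper are made up for the example (the actual rt.py output isn't reproduced in this post); only the two regular expressions are taken from perf2csv.

```python
import re

# The two patterns from perf2csv: a plain test line carries two times,
# a startup line carries five.
SIMPLE = re.compile(
    r'^(?P<testName>\S+)\s+(?P<time1>[\d\.]+)\s+\|\s+(?P<time2>[\d\.]+)')
STARTUP = re.compile(
    r'^(?P<testName>\S+)\s+(?P<time1>[\d\.]+)\s+'
    r'(?P<time2>[\d\.]+)\s+(?P<time3>[\d\.]+)\s+'
    r'\|\s+(?P<time4>[\d\.]+)\s+\S+\s+(?P<time5>[\d\.]+)\s+')


def parse_line(line):
    """Return the named groups for a matching line, or None (hypothetical helper)."""
    m = SIMPLE.match(line) or STARTUP.match(line)
    return m.groupdict() if m else None


# Hypothetical sample lines, shaped only to satisfy the patterns above.
simple = parse_line("PerfLargeDataScrollCalendar   1.50 | 1.25\n")
startup = parse_line("Startup   1.0 2.0 3.0 | 4.0 +- 5.0 \n")

# A plain line yields 3 groups and a startup line yields 6, which is what
# the len(d) == 3 check in perf2csv relies on to tell them apart.
print("%d %d" % (len(simple), len(startup)))
```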

I wasn’t a happy camper when I found out that the way to stop Python’s print statement from appending a newline to its output is to add a trailing comma to the statement. What an ugly, ugly hack.
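
For what it's worth, one way around the trailing comma is to build each row as a list and write it out in one go. This is only a sketch of that alternative, not what perf2csv does; format_row and the sample file name are made up for the example.

```python
import sys


def format_row(filename, times):
    """Join a file name and its timings into one CSV line (hypothetical helper)."""
    return ",".join([filename] + [str(t) for t in times]) + "\n"


# sys.stdout.write emits exactly the string it is given: no implicit
# newline, and no separating space between successive writes.
sys.stdout.write(format_row("example-perf-on1.txt", ["1.50", "1.25"]))
```

The newline is then an explicit part of the row rather than something print has to be talked out of.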

Some may object to my “from xxx import *” style at the top, but this makes Python behave more like other scripting languages for me. Maybe it’s from almost 20 years of Perl experience.

Once I have a csv file, I read it into Excel and average the numbers across the 10 runs. I use this formula to throw out the highest and lowest values, always a good idea when collecting statistical samples:

=(SUM(B57:B66)-LARGE(B57:B66,1)-SMALL(B57:B66,1))/(COUNT(B57:B66)-2)

The LARGE and SMALL functions return the nth largest and nth smallest value in a range, so with n = 1 they give the maximum and the minimum. Very useful in this case.
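
The same calculation can be sketched in Python, in case the spreadsheet step ever gets folded into the script. The trimmed_mean name and the sample numbers are made up for illustration; they are not timings from these runs.

```python
def trimmed_mean(values):
    """Average the values after dropping one highest and one lowest."""
    if len(values) < 3:
        raise ValueError("need at least three samples")
    # Same shape as the Excel formula: (SUM - LARGE - SMALL) / (COUNT - 2)
    return (sum(values) - max(values) - min(values)) / float(len(values) - 2)


# Made-up samples: the 100.0 outlier and the 1.0 are both discarded.
samples = [1.0, 2.0, 3.0, 4.0, 100.0]
print(trimmed_mean(samples))
```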

So what do I end up with? Well, not as good a result as I had been hoping, alas. PerfLargeDataScrollCalendar is actually 17.95% *slower*, while PerfLargeDataNewEventFileMenu is 17.77% *faster*. Some tradeoffs happening, obviously. I’m still working on it though.
