Performance
Filed in: work, development, software, optimization
Last night before heading to bed, I left my Mac running two series of performance tests.
One with my wx.GraphicsContext changes turned off, and another with them turned on.
The command line was:
date
cp -f parcels/osaf/framework/blocks/calendar/CollectionCanvas{-off,}.py \
&& perftest off \
&& cp -f parcels/osaf/framework/blocks/calendar/CollectionCanvas{-on,}.py \
&& perftest on
date
The tests started at Thu Apr 26 02:59:25 EDT 2007 and finished at Thu Apr 26 05:28:06 EDT 2007.
So they ran for about 2½ hours.
You don’t want to do this kind of thing during the day!
This ended up being a long post with code, etc, so click to see more.
As a side note, this entry looks a lot better if you do a print preview on it. The current “screen” style I’m using has a fixed width (*sigh*), while the “print” style does not.
Well, to be fair, I was being a bit lazy. I just used the straight-up performance test given by “tools/rt.py --perf”, which loads in a huge data set every time it’s run, and that takes a little while.
If I were being smart about it and listening to what Andi said, I would load the data set once and preload all the items.
When I run these tests I use a little csh script called “perftest”.
I tend to use csh instead of bash.
I used to write everything in ksh, which was pretty much what bash is now.
But that is a topic for another time..
Here’s my script, minus the #! line:
cd ~/work/osaf/chandler/chandler
set count=10
if($#argv > 0) then
    set suffix="-$argv[1]"
else
    set suffix=""
endif
set i=0
while($i < $count)
    @ i++
    echo ''; echo " Running perf test #$i"; echo ''
    tools/rt.py --perf |& tee ~/Desktop/perf/`date +%F-%H%M`-perf-${suffix}${i}.txt
end
This basically runs the performance test 10 times, saving each run’s output in a date-stamped and optionally suffix-stamped file.
As you can see by my usage above, I use the suffixes “on” and “off” to represent using and not using my changes.
This leaves me with a bunch of performance output, which I like to load into Excel.
I use a little Python script called perf2csv for this job:
#!/usr/bin/env python
from sys import *
from os import *
from re import *

if len(argv) < 2:
    print "Usage: %s file [file..]" % argv[0]
    exit(1)

printedHeader = False

def printHeader(in_file):
    try:
        f = file(in_file)
    except:
        return False
    print 'filename,',
    for line in f:
        d = None
        if line.count('- + - + -'):
            f.close()
            break
        m = match(r'^(?P<testName>\S+)', line)
        if m is not None:
            d = m.groupdict()
            print "%s, " % d['testName'],
            if d['testName'] == 'Startup' or d['testName'] == 'Startup_with_large_calendar':
                print "%s, %s, %s, %s, " % ('', '', '', ''),
    print "\n",
    return True

for arg in argv[1:]:
    if not printedHeader:
        printedHeader = printHeader(arg)
    try:
        f = file(arg)
    except:
        print "Can't read \"%s\"" % arg
        continue
    starts = { }
    print "%s, " % arg,
    for line in f:
        d = None
        if line.count('- + - + -'):
            f.close()
            break
        m = match(r'^(?P<testName>\S+)\s+(?P<time1>[\d\.]+)\s+\|\s+(?P<time2>[\d\.]+)', line)
        if m is not None:
            d = m.groupdict()
        else:
            m = match(r'^(?P<testName>\S+)\s+(?P<time1>[\d\.]+)\s+'
                      r'(?P<time2>[\d\.]+)\s+(?P<time3>[\d\.]+)\s+'
                      r'\|\s+(?P<time4>[\d\.]+)\s+\S+\s+(?P<time5>[\d\.]+)\s+', line)
            if m is not None:
                d = m.groupdict()
            else:
                continue
        if d is not None:
            if len(d) == 3:
                print "%s," % d["time1"],
                #times[d["testName"]] = d["time1"]
            else:
                starts[d['testName']] = "%s, %s, %s, %s, %s" % (d["time1"], d["time2"], d["time3"], d["time4"], d["time5"])
    # print a newline
    try:
        print "%s, " % starts['Startup'],
    except:
        print "<%s> 0.0," % ','.join(starts.keys()),
    try:
        print starts['Startup_with_large_calendar']
    except:
        print "0.0"
    # finished parsing the file
This creates a csv file suitable for loading into Excel.
Aside: I would use an open source spreadsheet program, but Open Office et al. are too heavy. I wish there was something like AbiWord for spreadsheets. AbiData? AbiNumbers? Alas..
The script potentially passes through a file twice. The whole printHeader function exists just to parse a file and pull out the names of the tests at the top, so as to have column headers.
Then lines are read in until ‘- + - + -’ is seen, which is as far as it needs to go.
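To make the parsing step concrete, here is a small Python 3 sketch of the single-timing match and the stop-at-separator rule. The regular expression is copied from perf2csv, but the helper name parse_timings and the sample lines are made up for illustration; the real rt.py output may be laid out differently.

```python
import re

# Pattern copied from perf2csv: test name, one timing, a "|" and a second timing.
SINGLE_TIMING = re.compile(
    r'^(?P<testName>\S+)\s+(?P<time1>[\d\.]+)\s+\|\s+(?P<time2>[\d\.]+)')

SEPARATOR = '- + - + -'


def parse_timings(lines):
    """Return (testName, time1) pairs, stopping at the separator line."""
    results = []
    for line in lines:
        if SEPARATOR in line:
            break  # everything of interest comes before this marker
        m = SINGLE_TIMING.match(line)
        if m is not None:
            results.append((m.group('testName'), m.group('time1')))
    return results


# Made-up input lines (hypothetical, not real test output).
sample = [
    "PerfLargeDataScrollCalendar 1.23 | 0.45",
    "some line that is not a timing",
    "- + - + - + - + -",
    "IgnoredAfterSeparator 9.99 | 9.99",
]
parsed = parse_timings(sample)
```

Lines that don't match are simply skipped, and nothing after the separator is looked at.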
I wasn’t a happy camper when I found out that the way you prevent Python’s print statement from appending a newline to its output is to add a trailing comma to the statement.
What an ugly, ugly hack.
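One way to sidestep the trailing comma, sketched here on the assumption that all the fields for a row are collected before anything is printed, is to build a list and join it once. The helper name format_row and the sample values are hypothetical, and this is not how perf2csv is written.

```python
def format_row(fields):
    """Join already-collected fields into one CSV-style line (no trailing newline)."""
    return ", ".join(str(field) for field in fields)


# Hypothetical file name and timings, for illustration only.
row = format_row(["perf-off1.txt", 1.25, 0.98])
print(row)
```

The tradeoff is that the whole row has to be held in memory before it is written, which is no problem for files this small.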
Some may object to my “from xxx import *” style at the top, but this makes Python behave more like other scripting languages for me.
Maybe it’s from almost 20 years of Perl experience.
Once I have a csv file, I read it into Excel and average the numbers across the 10 runs.
I use this formula to throw out the highest and lowest values, always a good idea when collecting statistical samples:
=(SUM(B57:B66)-LARGE(B57:B66,1)-SMALL(B57:B66,1))/(COUNT(B57:B66)-2)
The LARGE and SMALL functions return the nth largest and nth smallest value in a range, so with n=1 they give the maximum and the minimum.
Very useful in this case.
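For anyone who would rather skip the spreadsheet, the same trimmed average can be sketched in Python 3. The function name trimmed_mean and the sample timings are invented for illustration; they are not numbers from these runs.

```python
def trimmed_mean(values):
    """Average the values after dropping one highest and one lowest sample."""
    if len(values) < 3:
        raise ValueError("need at least three samples to trim both ends")
    # Same arithmetic as the Excel formula:
    # (SUM - LARGE(range, 1) - SMALL(range, 1)) / (COUNT - 2)
    return (sum(values) - max(values) - min(values)) / (len(values) - 2)


# Hypothetical timings in seconds for ten runs of one test (not real data).
samples = [1.0, 1.2, 1.1, 5.0, 1.3, 1.2, 0.2, 1.1, 1.2, 1.3]
result = trimmed_mean(samples)
```

With the 5.0 and 0.2 outliers dropped, the remaining eight samples are averaged, just as the formula does with COUNT-2 in the denominator.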
So what do I end up with?
Well, not as good a result as I had been hoping, alas.
PerfLargeDataScrollCalendar is actually 17.95% *slower*, while
PerfLargeDataNewEventFileMenu is 17.77% *faster*.
Some tradeoffs happening, obviously.
I’m still working on it though.