the parsing process is not inherently slow
Eric Wilhelm
scratchcomputing at gmail.com
Tue Oct 2 21:31:42 BST 2007
# from Andy Armstrong
# on Tuesday 02 October 2007 10:10:
>On 2 Oct 2007, at 17:49, Eric Wilhelm wrote:
>> Even if an improved parser speed becomes a bottleneck (at around 900
>> cores), we can deal with that in better ways than adding
>> conditionals to the harness.
>
>For that to be true the parser would have to be capable of consuming
>TAP 900 times faster than the tests produced it. Do you believe
>that's possible?
Of course. The dotReader test suite has 7326 subtests and takes 83
seconds of CPU time to run. That's 88 subtests output per second.
(for comparison, Scalar::Util outputs 380/1.8=211 subtests per second
but has only two modules, so won't spend as much time compiling the way
large test suites tend to do (and some of the dotReader tests are
loading Wx.))
`parses_tap.pl` time: 0.08s => 91,575 subtests/s
`runtests --exec cat` time: 1.72s => 4,259 subtests/s
900 cores * 88 subtests / s => 79,200 subtests/s
So, my stupid parser would start blocking 900 cores pretty soon, since its
91,575 subtests/s leaves little headroom over 79,200 and I still need to
add correctness, but we're talking about 900 cores here!
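For anyone who wants to reproduce this kind of measurement, here is a rough sketch of timing a deliberately naive parser. It is not the `parses_tap.pl` script quoted above (that script is not shown here); `count_subtests` is a hypothetical stand-in, and the TAP stream is synthetic, so the rate it prints will not match the numbers above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# count_subtests() is a hypothetical stand-in for a minimal parser: it
# only recognizes "ok" / "not ok" lines and ignores plans, directives,
# diagnostics and everything else a correct TAP parser must handle.
sub count_subtests {
    my ($lines) = @_;
    my ($pass, $fail) = (0, 0);
    for my $line (@$lines) {
        if    ($line =~ /^ok\b/)     { $pass++ }
        elsif ($line =~ /^not ok\b/) { $fail++ }
    }
    return ($pass, $fail);
}

# Synthetic TAP stream (made-up data, not one of the suites above):
# every tenth subtest fails.
my @tap = ('1..100000');
for my $n (1 .. 100_000) {
    my $status = $n % 10 ? 'ok' : 'not ok';
    push @tap, "$status $n - test $n";
}

my $start = time;
my ($pass, $fail) = count_subtests(\@tap);
my $elapsed = time - $start;

# Guard against a zero interval on very fast machines.
printf "%d passed, %d failed in %.3fs => %.0f subtests/s\n",
    $pass, $fail, $elapsed, ($pass + $fail) / ($elapsed || 1e-9);
```

The point is only that a loop like this has very little work to do per line, which is why the parse rate sits so far above the rate at which real tests emit output.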
Let's pretend I can make a proper parser which hits the requisite 80k
subtests/s. At Scalar::Util's 211 subtests/s, I can still keep up with
about 380 cores.
Aside: running my stupid parser on an archive like that of
Regexp::Common gives slightly better numbers because it is analyzing
more tests in one process: 223259 subtests/0.49s = 455,630!
( But it spits out 7,442 subtests/s, which means it is doing about this
much calculation between outputs:
  my $q = 5*10+6 for(1..500); print "foo $q\n";
That is, this takes 1s on my machine:
  time perl -e '
  for(1..8_000) {my $q = 5*10+6 for(1..500); print "foo $q\n";}
  ' > /dev/null
)
How many cores can my stupid parser handle at that rate? Using the slow
80k st/s, rather than the actual 455k st/s, we get 11 cores, but part
of the reason that it parses that faster than dotReader is because each
test file is dumping a huge load of subtests (~4k st/test), so the
parser is in the "next ok" loop a lot, which implies that it might be
fair to apply the 455k st/s number, which gives us 455000/7442 = 61
cores. Anyway, we only had 56 tests, so some cores are bored ;-)
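The core counts above all come from the same division, parse rate over per-core output rate. A minimal sketch of that arithmetic follows, using only the figures quoted in this message; `cores_fed` is a hypothetical helper name, and the model assumes output rates simply add up across cores with no other overhead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Per-core output rates quoted in this message (subtests per second).
my %produce_rate = (
    'dotReader'      => 7326 / 83,    # ~88 st/s
    'Scalar::Util'   => 380 / 1.8,    # ~211 st/s
    'Regexp::Common' => 7442,         # pathological: very chatty tests
);

# Parse rates discussed in this message (subtests per second).
my %parse_rate = (
    'target 80k'    => 80_000,
    'measured 455k' => 455_000,
);

# cores_fed() is a hypothetical helper: how many producing cores a
# single parser can keep up with under the simple additive model.
sub cores_fed {
    my ($parse, $produce) = @_;
    return $parse / $produce;
}

for my $parser (sort keys %parse_rate) {
    for my $suite (sort keys %produce_rate) {
        printf "%-14s %-15s ~%.0f cores\n", $parser, $suite,
            cores_fed($parse_rate{$parser}, $produce_rate{$suite});
    }
}
```

The unrounded rates give slightly different values from the rounded ones in the text (for example about 379 rather than 380 cores for Scalar::Util), which does not change the conclusion.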
>Insisting that all the parsing happen on a single core is always
>going to hit a brick wall at some point.
My observations imply that the wall is way over there
------------------------------------------------------------------------------------------------->.
And I'm not *insisting* on a single-threaded parser. I'm suggesting
that cleaner parallel architecture is better even with the currently
dirt-slow parser (because the parse time is already negligible on most
real-world tests.)
These numbers are ridiculous only in that 900 cores would present a
ridiculous amount of other challenges. However, taking the 80k st/s
parse rate as an upper bound on speed, I think we can easily shoot
for filling 40 cores with "normal" suites (40 cores * 500 st/s = 20k
st/s parse rate) and (conservatively) 5 with pathological ones like
Regexp::Common.
So, a forking parser optimizes for *pathological* test suites (7k st/s)
if you have about 8 cores. Given that a vast install-base of more than
two cores is a year or more away (the quads currently being the pricier
"flagship" cpu), I say messing with a forking parser this year is not
worth it.
The brick wall I see on the not-so-distant horizon is the one where
TAP::Harness is not as extensible or flexible as promised and
subclasses or plugins (and users) will be dealing with API
compatibility troubles while it tries to adapt.
BTW, a simple distributed harness subclass was only a short hop away
from TAP::Harness::Parallel, requiring a bit of one-time Unionfs+NFS
setup and sticking ssh $node before $^X (or roughly so.) It might have
even been a quite simple subclass of T::H::P.
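To illustrate the "ssh $node before $^X" idea, here is a minimal sketch that only builds the command list. `remote_command` and the node names are hypothetical, nothing here is TAP::Harness::Parallel's actual API, and it assumes the same perl path and test files are visible on every node (which is what the Unionfs+NFS setup would provide).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# remote_command() is a hypothetical helper: it prefixes the usual
# "perl test-file" command with "ssh <node>", so the test runs on
# another machine while its TAP output comes back over the ssh pipe.
sub remote_command {
    my ($node, @args) = @_;
    return ('ssh', $node, $^X, @args);
}

# Made-up node names and test files, purely for illustration.
my @nodes = qw(node1 node2);
my @tests = qw(t/00-load.t t/01-basic.t t/02-more.t);

# Round-robin the test files across the nodes and show each command.
for my $i (0 .. $#tests) {
    my @cmd = remote_command($nodes[$i % @nodes], $tests[$i]);
    print join(' ', @cmd), "\n";
}
```

A harness would then run each command list instead of the local one and read TAP from its stdout as usual.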
--Eric
--
Turns out the optimal technique is to put it in reverse and gun it.
--Steven Squyres (on challenges in interplanetary robot navigation)
---------------------------------------------------
http://scratchcomputing.com
---------------------------------------------------