the parsing process is not inherently slow

Eric Wilhelm scratchcomputing at gmail.com
Tue Oct 2 21:31:42 BST 2007


# from Andy Armstrong
# on Tuesday 02 October 2007 10:10:

>On 2 Oct 2007, at 17:49, Eric Wilhelm wrote:
>> Even if an improved parser speed becomes a bottleneck (at around 900
>> cores), we can deal with that in better ways than adding
>> conditionals to the harness.
>
>For that to be true the parser would have to be capable of consuming  
>TAP 900 times faster than the tests produced it. Do you believe  
>that's possible?

Of course.  The dotReader test suite has 7326 subtests and takes 83 
seconds of CPU time to run.  That's 88 subtests output per second.  
(for comparison, Scalar::Util outputs 380/1.8=211 subtests per second 
but has only two modules, so won't spend as much time compiling the way 
large test suites tend to do (and some of the dotReader tests are 
loading Wx.))

`parses_tap.pl` time:       0.08s  => 91,575 subtests/s
`runtests --exec cat` time: 1.72s  =>  4,259 subtests/s

900 cores * 88 subtests / s        => 79,200 subtests/s

So, my stupid parser would start blocking at around 900 cores pretty 
soon, because it will slow down once I add correctness, but we're 
talking about 900 cores here!
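
To give a feel for what a "stupid" parser means here, below is a minimal
sketch.  It is not the actual `parses_tap.pl`; the sub name count_tap
and the sample TAP are made up for illustration, and it deliberately
ignores plans, directives, and diagnostics.  It only counts "ok" and
"not ok" lines and times the loop.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for a naive TAP reader: count result lines only.
# No plan checking, no TODO/SKIP handling, no diagnostics.
sub count_tap {
    my ($fh) = @_;
    my ($ok, $not_ok) = (0, 0);
    while (my $line = <$fh>) {
        if    ($line =~ /^ok\b/)     { $ok++ }
        elsif ($line =~ /^not ok\b/) { $not_ok++ }
    }
    return ($ok, $not_ok);
}

# Demo on an in-memory TAP stream (made-up sample data).
my $tap = "1..3\nok 1 - first\nnot ok 2 - second\n# a diagnostic\nok 3\n";
open my $fh, '<', \$tap or die "cannot open in-memory handle: $!";
my $start = time;
my ($ok, $not_ok) = count_tap($fh);
my $elapsed = time - $start;
close $fh;
printf "%d ok, %d not ok in %.6fs\n", $ok, $not_ok, $elapsed;
```

Anything that adds correctness (plan validation, directives, and so on)
costs time on top of this loop, which is why the raw number is only an
optimistic bound.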

Let's pretend I can make a proper parser which hits the requisite 80k 
subtests/s.  At Scalar::Util's 211 subtests/s, I can still keep up with 
about 380 cores.

Aside:  running my stupid parser on an archive like that of 
Regexp::Common gives slightly better numbers because it is analyzing 
more tests in one process:  223259 subtests/0.49s = 455,630 subtests/s!

( But the suite itself spits out only 7,442 subtests/s, which means it
  is doing about this much calculation between outputs:

    my $q = 5*10+6 for(1..500); print "foo $q\n";

  That is, this takes 1s on my machine:

    time perl -e '
     for(1..8_000) {my $q = 5*10+6 for(1..500); print "foo $q\n";}
    ' > /dev/null
)

How many cores can my stupid parser handle at that rate?  Using the slow 
80k st/s, rather than the actual 455k st/s, we get 11 cores 
(80000/7442).  But part of the reason that it parses this suite faster 
than dotReader's is that each test file is dumping a huge load of 
subtests (~4k st/test), so the parser is in the "next ok" loop a lot, 
which implies that it might be fair to apply the 455k st/s number, 
which gives us 455000/7442 = 61 cores.  Anyway, we only had 56 tests, 
so some cores are bored ;-)
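
The cores arithmetic used throughout is just parser rate divided by the
rate at which one core emits subtests.  A small sketch that redoes it
with the figures quoted above (the sub name cores_supported and the
scenario labels are mine, for illustration only):

```perl
use strict;
use warnings;

# Cores one parser can keep up with = parse rate / per-core output rate.
sub cores_supported {
    my ($parse_rate, $per_core_rate) = @_;
    die "per-core rate must be positive\n" unless $per_core_rate > 0;
    return $parse_rate / $per_core_rate;
}

# All rates are in subtests/s and come from the measurements in this mail.
my @scenarios = (
    # [ label, parser rate, rate one core emits ]
    [ 'dotReader      @ 80k',  80_000,         7326 / 83 ],
    [ 'Scalar::Util   @ 80k',  80_000,         380 / 1.8 ],
    [ 'Regexp::Common @ 80k',  80_000,         7442      ],
    [ 'Regexp::Common @ 455k', 223_259 / 0.49, 7442      ],
);

for my $s (@scenarios) {
    my ($label, $parse, $emit) = @$s;
    printf "%-22s %5.0f cores\n", $label, cores_supported($parse, $emit);
}
```

The results should land close to the rounded figures in the text (about
900, 380, 11, and 61 cores); small differences come from rounding the
per-core rates.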

>Insisting that all the parsing happen on a single core is always  
>going to hit a brick wall at some point.

My observations imply that the wall is way over there
------------------------------------------------------------------------------------------------->.

And I'm not *insisting* on a single-threaded parser.  I'm suggesting 
that a cleaner parallel architecture is better even with the currently 
dirt-slow parser (because the parse time is already negligible on most 
real-world tests.)

These numbers are ridiculous only in that 900 cores would present a 
ridiculous amount of other challenges.  However, taking the 80k st/s 
parse rate as an upper bound on speed, I think we can easily shoot 
for filling 40 cores with "normal" suites (40 cores * 500 st/s = 20k 
st/s parse rate) and (conservatively) 5 with pathological ones like 
Regexp::Common.

So, a forking parser optimizes for *pathological* test suites (7k st/s) 
if you have about 8 cores.  Given that a vast install-base of more than 
two cores is a year or more away (the quads currently being the pricier
"flagship" CPU), I say messing with a forking parser this year is not 
worth it.

The brick wall I see on the not-so-distant horizon is the one where 
TAP::Harness is not as extensible or flexible as promised and 
subclasses or plugins (and users) will be dealing with API 
compatibility troubles while it tries to adapt.

BTW, a simple distributed harness subclass was only a short hop away 
from TAP::Harness::Parallel, requiring a bit of one-time Unionfs+NFS 
setup and sticking ssh $node before $^X (or roughly so.)  It might have 
even been a quite simple subclass of T::H::P.
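
Roughly, the "ssh $node before $^X" idea could look like the sketch
below.  This is not code from TAP::Harness::Parallel; the sub name
test_command, the node names, and the test file names are hypothetical.
It also assumes the same perl path exists on every node and that the
shared filesystem setup makes the test files visible there under the
same relative path.

```perl
use strict;
use warnings;

# Build the command list for one test file.  With a node name, the same
# command is simply prefixed with "ssh $node" (assumption: identical perl
# path and identical working tree on the remote side).
sub test_command {
    my ($test_file, $node) = @_;
    my @cmd = ($^X, $test_file);
    unshift @cmd, 'ssh', $node if defined $node;
    return @cmd;
}

# Hypothetical nodes and tests, assigned round-robin.
my @nodes = qw(node1 node2);
my @tests = qw(t/01-load.t t/02-basic.t t/03-more.t);

for my $i (0 .. $#tests) {
    my $node = $nodes[ $i % @nodes ];
    print join(' ', test_command($tests[$i], $node)), "\n";
}
```

The harness side would then read TAP from each command's STDOUT exactly
as it does for a local process, which is why the parser can stay in one
place.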

--Eric
-- 
Turns out the optimal technique is to put it in reverse and gun it.
--Steven Squyres (on challenges in interplanetary robot navigation)
---------------------------------------------------
    http://scratchcomputing.com
---------------------------------------------------

