Results of profiling parallel tests (was Re: [1194] grep in scalar context doesn't short-circuit (it can' t - side effects) so the)

Nicholas Clark nick at ccl4.org
Sat Aug 30 21:48:13 UTC 2008


On Sat, Aug 30, 2008 at 09:03:56PM +0000, nicholas at hexten.net wrote:
> Revision: 1194
> Author:   nicholas
> Date:     2008-08-30 21:03:56 +0000 (Sat, 30 Aug 2008)
> 
> Log Message:
> -----------
> grep in scalar context doesn't short-circuit (it can't - side effects) so the
> implicit loop with a recursive call in the grep in _not_empty() was expensive.
> Replace it with a foreach, that short-circuits by returning early. This takes
> the time in this method down from 52 seconds to 8. Win!

Which basically gets us from

http://www.wu-wien.ac.at/usr/ma/nclark/blead_harness_34237_r1189/

Calls  	 P  F  Inclusive  Exclusive  Subroutine
9515804  2  1  621.50612  175.03842  TAP::Parser::Scheduler::_gather
7525708  2  1  316.94195  129.37304  TAP::Parser::Scheduler::_not_empty
  70049  2  2   27.81537   20.28561  IO::Select::can_read
  92717  2  1  195.24093   17.34687  TAP::Parser::Scheduler::_find_next_job
   1549  1  1   16.98888   16.98888  IPC::Open3::xfork
 209137  1  1   54.29413   16.58686  TAP::Parser::__ANON__[../lib/TAP/Parser.pm:1234]
 961494 42  6   15.23568   15.23568  TAP::Parser::__ANON__[../lib/TAP/Parser.pm:72]
 235565  1  1   91.79267   15.19162  TAP::Parser::Grammar::tokenize
to

http://www.wu-wien.ac.at/usr/ma/nclark/blead_harness_34237_r1195/

Calls  	 P  F  Inclusive  Exclusive  Subroutine
  77782  2  2   39.29334   30.89907  IO::Select::can_read
   1549  1  1   21.23763   21.23763  IPC::Open3::xfork
 209137  1  1   55.51446   16.78572  TAP::Parser::__ANON__[../lib/TAP/Parser.pm:1234]
 962506 42  6   15.91717   15.91717  TAP::Parser::__ANON__[../lib/TAP/Parser.pm:72]
 235556  1  1   95.39925   15.67889  TAP::Parser::Grammar::tokenize

ie time spent in the gubbins of the parallel test job scheduler has dropped
from "lots" to "bugger all".

I'm not entirely convinced that the timing slowdown of Devel::NYTProf is
linear, because I don't see anywhere near as large a CPU second drop when
running on an optimised build without profiling.

1: Does NYTProf slow down subroutine calls way more than most other statements?
2: Is NYTProf getting confused about inclusive times in the first profile,
   with the inefficient recursive methods _gather() and _not_empty()?


Anyone else can play along with these simple steps:

$ rsync -za rsync://ftp.linux.activestate.com/perl-current/ bleadperl
$ cd bleadperl
$ rm -rf ext/Test/Harness
$ svn co https://svn.hexten.net/tapx/trunk ext/Test/Harness
$ svn co http://perl-devel-nytprof.googlecode.com/svn/trunk ext/Devel/NYTProf
$ ./Configure -des -Dusedevel && make -j3 test_prep
$ cd t
$ export TEST_JOBS=3
$ ./perl -I../lib -d:NYTProf harness

and then generate your HTML with something like

$ ./perl -I../lib ../ext/Devel/NYTProf/bin/nytprofhtml --out ~/public_html/blead_harness_34237_r1195

Nicholas Clark


More information about the tapx-dev mailing list