[farm-report] Failed to build python-SunOS-5.8-sun4d-fafner
Per Cederqvist
2002-07-23 07:39:24 UTC
Build test failed.
--
[...]
find ../python/dist/src/Lib -name '*.py[co]' -print | xargs rm -f
./python -E -tt ../python/dist/src/Lib/test/regrtest.py -l
sh: cannot fork: no swap space
sh: cannot fork: no swap space
make: *** [test] Segmentation Fault (core dumped)
./python -E -tt ../python/dist/src/Lib/test/regrtest.py -l
make: *** [test] Bus Error (core dumped)
Something is very wrong with the sfarmer scripts. There are currently
106 processes owned by sfarmer running on fafner, many of them running
identical commands. There are 21 each of the following four commands:

./python -E -tt ../python/dist/src/Lib/test/regrtest.py -l
/bin/sh ./buildit.sh
make test
sh -c (PATH=/sw/local/bin:/usr/bin:/bin:/usr/ccs/bin; export PATH; cd $HO

(The last command is probably truncated by ps).

A new test run should never be started before the previous test run
has finished. You could send a warning if a test run doesn't finish
when expected.
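
A minimal sketch of that rule, assuming a POSIX sh. LOCKDIR, acquire_run_lock and release_run_lock are hypothetical names, not part of the sfarmer scripts. The lock is a directory because mkdir either creates it or fails in one atomic step. The PID stored inside lets a new run tell a run that is still alive from one that died:

```shell
#!/bin/sh
# Hypothetical lock location; not part of the existing farm scripts.
LOCKDIR=${LOCKDIR:-/tmp/buildfarm-run.lock}

# acquire_run_lock: succeed if no other test run is active, otherwise
# print a warning and return 1 so the caller can skip this run.
acquire_run_lock() {
    # mkdir either creates the directory or fails, atomically, so two
    # runs started at the same moment cannot both get past this point.
    if mkdir "$LOCKDIR" 2>/dev/null; then
        echo $$ > "$LOCKDIR/pid"
        return 0
    fi
    oldpid=$(cat "$LOCKDIR/pid" 2>/dev/null)
    # kill -0 sends no signal; it only tests whether the PID exists.
    if [ -n "$oldpid" ] && kill -0 "$oldpid" 2>/dev/null; then
        echo "WARNING: previous test run (pid $oldpid) has not finished" >&2
        return 1
    fi
    # The recorded run no longer exists (it crashed or was killed), so
    # take over its lock. Two runs doing this at the same moment could
    # both succeed; a real script would need a tighter check here.
    echo $$ > "$LOCKDIR/pid"
    return 0
}

# release_run_lock: drop the lock at the end of a run.
release_run_lock() {
    rm -rf "$LOCKDIR"
}
```

A driver would call acquire_run_lock || exit 0 before starting buildit.sh, and install trap release_run_lock 0 so the lock goes away when the run ends. The kill -0 test only works because every run belongs to the same user (sfarmer).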

I will start to kill a few of the processes now to reclaim some memory
on fafner, but will leave a few of them for your debugging.

/ceder
Anders Qvist
2002-07-23 18:43:34 UTC
Post by Per Cederqvist
A new test run should never be started before the previous test run
has finished. You could send a warning if a test run doesn't finish
when expected.
I will start to kill a few of the processes now to reclaim some memory
on fafner, but will leave a few of them for your debugging.
This is a known problem. Something in the tests bogs down on SunOS
5.8 (at least on proton and fafner).

I try to keep these from doing too much harm with ulimit, but I have not
found a viable way of deciding when the build script has executed long
enough. Any suggestions would be very welcome. buildit.sh, line 146:

(make $target 9>&1 1>&2 2>&9 | tee $makeerrlog) > $makelog 2>&1

If the script is still on this line after some predetermined time has
passed, all of its children should be killed.
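
A sketch of such a watchdog, assuming a POSIX sh with ps and awk that accept the options used below. run_with_timeout, kill_tree and the timeout values are hypothetical. The command runs in the background while a second background loop checks on it once a second. Since make and regrtest.py start children of their own, the watchdog kills the whole process tree, not just the top PID:

```shell
#!/bin/sh
# kill_tree PID: send SIGTERM to PID and all of its descendants, deepest
# first, so PID itself (the process the caller waits for) dies last.
kill_tree() {
    kids=$(ps -e -o pid= -o ppid= 2>/dev/null |
           awk -v p="$1" '$2 == p { print $1 }')
    for kid in $kids; do
        kill_tree "$kid"
    done
    kill -TERM "$1" 2>/dev/null
}

# run_with_timeout SECS CMD [ARGS...]: run CMD, and kill it together with
# everything it started if it is still running after roughly SECS seconds.
# Returns CMD's exit status (non-zero if the watchdog had to kill it).
run_with_timeout() {
    secs=$1
    shift
    "$@" &
    cmdpid=$!
    # Watchdog: poll once a second until CMD is gone or time is up.
    (
        elapsed=0
        while kill -0 "$cmdpid" 2>/dev/null; do
            if [ "$elapsed" -ge "$secs" ]; then
                echo "timeout: killing pid $cmdpid after ${secs}s" >&2
                kill_tree "$cmdpid"
                break
            fi
            sleep 1
            elapsed=$((elapsed + 1))
        done
    ) &
    dogpid=$!
    wait "$cmdpid"
    status=$?
    # Once CMD has been reaped, kill -0 fails and the watchdog exits within
    # about a second; wait for it so nothing is left running.
    wait "$dogpid"
    return "$status"
}
```

To apply it to line 146, the existing pipeline could be wrapped in a shell function, e.g. run_make() { (make $target 9>&1 1>&2 2>&9 | tee $makeerrlog) > $makelog 2>&1; }, and started with run_with_timeout 7200 run_make (the two-hour limit is an arbitrary example). If hung processes should be kept for debugging, the watchdog can print its warning and skip the kill_tree call.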

Hmm. Maybe I should just create a stamp file somewhere before we start
building targets in a particular module, and check for it on startup.
If that file is still present at the next execution, we report that the
build got stuck?
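
A minimal sketch of the stamp-file idea, assuming a POSIX sh. STAMPDIR, begin_module and end_module are hypothetical names. The stamp is removed only when a module's build returns, so a run that hangs, or is killed, leaves it behind for the next run to find:

```shell
#!/bin/sh
# Hypothetical location for the per-module stamp files.
STAMPDIR=${STAMPDIR:-$HOME/.buildfarm-stamps}

# begin_module MODULE: call before building MODULE's targets.
# If the stamp from an earlier run is still present, that run never got
# to end_module, so report it as stuck and return 1 instead of starting
# another build on top of it.
begin_module() {
    stamp="$STAMPDIR/$1.building"
    mkdir -p "$STAMPDIR"
    if [ -f "$stamp" ]; then
        echo "build of $1 appears stuck (stamp written $(cat "$stamp"))" >&2
        return 1
    fi
    # Record when this build started, for the report above.
    date > "$stamp"
}

# end_module MODULE: call when MODULE's build has returned, whatever its
# exit status. After a stuck build has been investigated, the stamp is
# removed by hand (or with this function) to re-enable the module.
end_module() {
    rm -f "$STAMPDIR/$1.building"
}
```

A run would call begin_module python before building that module's targets, skip the module if it returns non-zero, and call end_module python afterwards. Unlike a timeout, this never kills anything, so the stuck processes stay available for inspection.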
--
Anders "Quest" Qvist

"We've all heard that a million monkeys banging on a million typewriters
will eventually reproduce the entire works of Shakespeare. Now, thanks
to the Internet, we know this is not true." -- Robert Wilensky
Per Cederqvist
2002-07-23 19:12:56 UTC
Post by Anders Qvist
Post by Per Cederqvist
A new test run should never be started before the previous test run
has finished. You could send a warning if a test run doesn't finish
when expected.
This is a known problem. Something in the tests bogs down on SunOS
5.8 (at least on proton and fafner).
I try to keep these from doing too much harm with ulimit, but I have not
found a viable way of deciding when the build script has executed long
(make $target 9>&1 1>&2 2>&9 | tee $makeerrlog) > $makelog 2>&1
Do you use "ulimit -t" to set the maximum CPU time? However, that
would not have helped in this case, since the processes used very
little CPU. They seemed to be stuck waiting for something.
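
A small experiment illustrates the point. This is a sketch, assuming a shell whose ulimit builtin accepts -t and -c, as ksh, bash and dash do; POSIX itself only specifies -f. Both subshells get a one-second CPU limit, but only the CPU-bound one is stopped by it:

```shell
#!/bin/sh
# A blocked or sleeping process consumes almost no CPU time, so a
# one-second CPU limit never fires: this subshell runs for two seconds
# of wall-clock time and exits normally.
( ulimit -t 1 || exit 99; sleep 2 )
sleep_status=$?
echo "sleeping subshell: exit status $sleep_status"

# A CPU-bound loop hits the same limit after about one CPU second and is
# terminated by SIGXCPU, so its exit status is non-zero. ulimit -c 0
# keeps the terminated subshell from writing a core file.
( ulimit -c 0; ulimit -t 1 || exit 99; while :; do :; done )
busy_status=$?
echo "busy subshell: exit status $busy_status"
```

A hang like the one on fafner therefore needs a wall-clock check rather than a CPU-time limit.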
Post by Anders Qvist
If the script is still on this line after some predetermined time has
passed, all of its children should be killed.
Hmm. Maybe I should just create a stamp file somewhere before we start
building targets in a particular module, and check for it on startup.
If that file is still present at the next execution, we report that the
build got stuck?
Yes, that would be better, and would allow the computers to continue
with other tasks even when you are not babysitting them. :-)

Killing the processes makes it hard to find out what went wrong. If
you let them keep running and report an "already running" error to
the list, somebody can attach a debugger or strace (truss on Solaris)
to the hung process and see what it is doing.

/ceder
