Discussion:
Can we reproduce this?
Laura Creighton
2003-01-04 11:41:36 UTC
--------- Forwarded Message

Return-Path: python-list-***@python.org
Delivery-Date: Sat Jan 4 00:10:32 2003
From: Inyeol Lee <***@siimage.com>
To: python-***@python.org
Subject: 2.3a1 Solaris build trouble - make test failure
When I try 'make test' while building python2.3a1 with gcc2.95.3 on
Solaris8, it stops at 'test_signal' - it stays there until the process
is killed via kill -9.
When I run test_signal by itself, the output is:

$ python test_signal.py
+ sleep 2
starting pause() loop...
call pause()...
+ kill -5 6825
+ sleep 2
handlerA (5, <frame object at 0x173f18>)
pause() returned
call pause()...
+ kill -2 6825
+ sleep 2
handlerB (2, <frame object at 0x173f18>)
HandlerBCalled exception caught
call pause()...
+ kill -3 6825
KeyboardInterrupt (assume the alarm() went off)
$

My build system is:

$ uname -a
SunOS abbey 5.8 Generic_108528-07 sun4u sparc SUNW,Sun-Blade-1000

$ python
Python 2.3a1 (#1, Jan 2 2003, 22:36:29)
[GCC 2.95.3 20010315 (release)] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
Is it reproducible with other Solaris installations? If it is, I'll
submit a bug report.

Inyeol Lee...

- - --
http://mail.python.org/mailman/listinfo/python-list

- ------- End of Forwarded Message
Anders Qvist
2003-01-04 19:39:29 UTC
Post by Laura Creighton
--------- Forwarded Message
Delivery-Date: Sat Jan 4 00:10:32 2003
Subject: 2.3a1 Solaris build trouble - make test failure
When I try 'make test' during building python2.3a1 with gcc2.95.3 in
Solaris8, it stops at 'test_signal' - it stays there until the process
is killed via kill -9.
This is a known problem, which occurs in the farm as well. We have as
yet no good leads. It has been suggested that this is a Solaris bug.
Neal recommended the following patches:

http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fpatches%2F108528&zone_32=signal+%20hang%20%22Solaris%208%22&wholewords=on

http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fpatches%2F108827&zone_32=signal+%20hang%20%22Solaris%208%22&wholewords=on

Patches can be obtained here:

http://sunsolve.sun.com/pub-cgi/show.pl?target=patches/patch-access

We have not yet had time to test these. Any results from applying these
patches would be appreciated.
_______________________________________________
snake-farm mailing list
http://lists.lysator.liu.se/mailman/listinfo/snake-farm
--
Anders "Quest" Qvist

"We've all heard that a million monkeys banging on a million typewriters
will eventually reproduce the entire works of Shakespeare. Now, thanks
to the Internet, we know this is not true." -- Robert Wilensky
Neal Norwitz
2003-01-05 00:51:36 UTC
Post by Anders Qvist
Post by Laura Creighton
When I try 'make test' during building python2.3a1 with gcc2.95.3 in
Solaris8, it stops at 'test_signal' - it stays there until the process
is killed via kill -9.
This is a known problem, which occurs in the farm as well. We have as
yet no good leads.
I just ran test_socket a bunch of times on our sun without problems.
I believe Martin von Loewis and Andrew Koenig also use a solaris box
fairly regularly. (Hopefully Martin is reading this list and can
confirm.)

So the problem may be limited to the snake farm and a few other boxes.
This may also apply to the AIX problem, since I've found no one else
who has the same problems. I've worked a fair amount on the AIX
problem. I have spent no time as of yet on Solaris. I'll try to
do some testing to see if I can find out more.

Neal
Guido van Rossum
2003-01-05 01:02:53 UTC
Post by Neal Norwitz
Post by Anders Qvist
Post by Laura Creighton
When I try 'make test' during building python2.3a1 with gcc2.95.3 in
Solaris8, it stops at 'test_signal' - it stays there until the process
is killed via kill -9.
This is a known problem, which occurs in the farm as well. We have as
yet no good leads.
I just ran test_socket a bunch of times on our sun without problems.
I believe Martin von Loewis and Andrew Koenig also use a solaris box
fairly regularly. (Hopefully Martin is reading this list and can
confirm.)
So the problem may be limited to the snake farm and a few other boxes.
Most likely it's dependent on a particular combination of software
versions. It's worth trying to find out which versions the snake-farm
and inyeol's box have in common.
Post by Neal Norwitz
This may also apply to the AIX problem, since I've found no one else
who has the same problems. I've worked a fair amount on the AIX
problem. I have spent no time as of yet on Solaris. I'll try to
do some testing to see if I can find out more.
One problem with AIX is that almost all AIX users have little working
Unix knowledge. :-( (Perhaps because people with working Unix
knowledge know to avoid it. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)
Neal Norwitz
2003-01-05 18:00:10 UTC
Post by Guido van Rossum
Post by Neal Norwitz
Post by Anders Qvist
Post by Laura Creighton
When I try 'make test' during building python2.3a1 with gcc2.95.3 in
Solaris8, it stops at 'test_signal' - it stays there until the process
is killed via kill -9.
This is a known problem, which occurs in the farm as well. We have as
yet no good leads.
I just ran test_socket a bunch of times on our sun without problems.
I believe Martin von Loewis and Andrew Koenig also use a solaris box
fairly regularly. (Hopefully Martin is reading this list and can
confirm.)
So the problem may be limited to the snake farm and a few other boxes.
Most likely it's dependent on a particular combination of software
versions. It's worth trying to find out which versions the snake-farm
and inyeol's box have in common.
Looks like I goofed. I thought this problem affected 2.2.2+ as well,
so I didn't spend any time hunting down the problem. However, the
problem does not affect 2.2.2.

2.2.2+ runs all tests to completion. There is a problem that
_socket.so isn't built due to libssl issues. I don't think this
changes anything although test_asynchat and test_socket fail.
test_queue succeeds. More on the significance of these tests in a
bit.

2.3 runs test_signal by itself ok. test_signal only hangs when run
with any one of these 4 tests: test_asynchat, test_logging,
test_queue, and test_socket.

When I run the 2.3 tests like:

./python -E -tt ./Lib/test/regrtest.py \
-x test_asynchat test_logging test_queue test_socket

everything works.

One thing these 4 tests share is threads. I arrived at this
conclusion by the following:

1. found one test that caused test_signal to hang (test_asynchat)
2. ran all tests -x test_asynchat, test_signal still hung
3. looked for possible issues, determined the likely problem
to be threading
4. ran all tests -x all modules which import thread*
(test_asynchat test_fork1 test_logging test_queue test_socket)
5. tried removing each of those, only test_fork1 runs ok with
test_signal

I built 2.3 on proton, 2.2 on fafner. I tried running the 2.2 and 2.3
tests on both machines. So it's possibly a build environment issue on
proton, but it's not a runtime environment issue, since the problem is
the same on both boxes.

Anybody have any clues? I'll continue to investigate.

Neal
Neal Norwitz
2003-01-05 19:48:28 UTC
On Sun, Jan 05, 2003 at 01:00:10PM -0500, Neal Norwitz wrote:

[lots of stuff about test_signal hanging on Solaris]

This problem was introduced by the following revision:

cvs log Python/thread_pthread.h

revision 2.39
date: 2002/03/17 09:53:51; author: loewis; state: Exp; lines: +119 -0
Patch #525532: Add support for POSIX semaphores.

If I disallow the use of semaphores by adding:

#undef USE_SEMAPHORES

at line 113 (after it is set if _POSIX_SEMAPHORES is set), it seems
to fix the problem. In those patches mentioned earlier, I remember
mention of semaphore functions. So maybe the reason why it works
on some boxes is that it depends on whether certain patches are
installed.

Of course, it's possible the semaphore code is wrong. I'll post
a new bug and we can discuss there.

Neal
Neal Norwitz
2003-01-05 20:00:34 UTC
Post by Neal Norwitz
[lots of stuff about test_signal hanging on Solaris]
Of course, it's possible the semaphore code is wrong. I'll post
a new bug and we can discuss there.
I've tried to put all the important and useful information in:

http://python.org/sf/662787

Neal
Guido van Rossum
2003-01-05 20:20:52 UTC
[Inyeol]
Post by Neal Norwitz
Post by Guido van Rossum
Post by Neal Norwitz
Post by Anders Qvist
Post by Laura Creighton
When I try 'make test' during building python2.3a1 with
gcc2.95.3 in Solaris8, it stops at 'test_signal' - it stays
there until the process is killed via kill -9.
[Others]
Post by Neal Norwitz
Post by Guido van Rossum
Post by Neal Norwitz
Post by Anders Qvist
This is a known problem, which occurs in the farm as well. We
have as yet no good leads.
[Neal]
Post by Neal Norwitz
Post by Guido van Rossum
Post by Neal Norwitz
I just ran test_socket a bunch of times on our sun without
problems. I believe Martin von Loewis and Andrew Koenig also
use a solaris box fairly regularly. (Hopefully Martin is
reading this list and can confirm.)
So the problem may be limited to the snake farm and a few other boxes.
[Guido]
Post by Neal Norwitz
Post by Guido van Rossum
Most likely it's dependent on a particular combination of software
versions. It's worth trying to find out which versions the snake-farm
and inyeol's box have in common.
[Neal]
Post by Neal Norwitz
Looks like I goofed. I thought this problem affected 2.2.2+ as well,
so I didn't spend any time hunting down the problem. However, the
problem does not affect 2.2.2.
2.2.2+ runs all tests to completion. There is a problem that
_socket.so isn't built due to libssl issues. I don't think this
changes anything although test_asynchat and test_socket fail.
test_queue succeeds. More on the significance of these tests in a
bit.
2.3 runs test_signal by itself ok. test_signal only hangs when run
with any one of these 4 tests: test_asynchat, test_logging,
test_queue, and test_socket.
When I run the 2.3 tests like:
./python -E -tt ./Lib/test/regrtest.py \
-x test_asynchat test_logging test_queue test_socket
everything works.
One thing these 4 tests share is threads. I arrived at this
conclusion by the following:
1. found one test that caused test_signal to hang (test_asynchat)
2. ran all tests -x test_asynchat, test_signal still hung
3. looked for possible issues, determined the likely problem
to be threading
4. ran all tests -x all modules which import thread*
(test_asynchat test_fork1 test_logging test_queue test_socket)
5. tried removing each of those, only test_fork1 runs ok with
test_signal
I built 2.3 on proton, 2.2 on fafner. I tried running the 2.2 and 2.3
tests on both machines. So it's possibly a build environment issue on
proton, but it's not a runtime environment issue, since the problem is
the same on both boxes.
Anybody have any clues? I'll continue to investigate.
I was going to suggest threads as well -- threads and signals mix
notoriously poorly.
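
For readers unfamiliar with how the two interact, here is a minimal
sketch. It assumes a current Python 3 on a Unix-like system, not the
2.3 code under discussion, and it does not reproduce the Solaris hang.
The names handler, demo and handled_in are made up for the
illustration. It only shows that CPython runs Python-level signal
handlers in the main thread, even while another thread is alive.

```python
import os
import signal
import threading
import time

handled_in = []  # names of the threads that ran the Python-level handler


def handler(signum, frame):
    # Record which thread the Python-level handler is running in.
    handled_in.append(threading.current_thread().name)


def demo(timeout=2.0):
    """Send SIGUSR1 to this process while a worker thread is alive."""
    handled_in.clear()
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, name="worker")
    worker.start()
    # signal.signal() may only be called from the main thread.
    signal.signal(signal.SIGUSR1, handler)
    os.kill(os.getpid(), signal.SIGUSR1)
    # The handler runs between bytecodes in the main thread, so poll
    # briefly instead of assuming it has already run.
    deadline = time.monotonic() + timeout
    while not handled_in and time.monotonic() < deadline:
        time.sleep(0.01)
    stop.set()
    worker.join()
    return list(handled_in)


if __name__ == "__main__":
    print(demo())
```

The operating system may deliver the signal to any thread that has not
blocked it; the interpreter then defers the Python handler to the main
thread. That extra layer is one place where platform differences in
the thread implementation can show up.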

But what could be the difference between 2.2.2 and 2.3 here? Scan the
revisions of the pthread and thread code -- I think there were changes
related to threads.

Maybe dummy_thread has something to do with this? (I doubt it, but
who knows, maybe there are in fact no threads on that platform.)

Does Solaris have something like strace? That might give a clue
(compare a 2.2.2 and a 2.3 run).

I don't think changes in the Queue module or its unit tests are that
significant, so I think we have to look for changes in the thread
implementation.

That's all I can think of for now.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Neal Norwitz
2003-01-05 20:35:17 UTC
Post by Guido van Rossum
Does Solaris have something like strace? That might give a clue
(compare a 2.2.2 and a 2.3 run).
Yes, it's called truss on Solaris. I thought I tried running it and
it failed, but that must have been HPUX.

I'll attach the truss output to the bug report, but it doesn't give
me any more information. Perhaps it will help someone else.

Neal
Inyeol Lee
2003-01-06 18:11:16 UTC
Post by Guido van Rossum
Post by Neal Norwitz
Post by Anders Qvist
Post by Laura Creighton
When I try 'make test' during building python2.3a1 with gcc2.95.3 in
Solaris8, it stops at 'test_signal' - it stays there until the process
is killed via kill -9.
This is a known problem, which occurs in the farm as well. We have as
yet no good leads.
I just ran test_socket a bunch of times on our sun without problems.
I believe Martin von Loewis and Andrew Koenig also use a solaris box
fairly regularly. (Hopefully Martin is reading this list and can
confirm.)
So the problem may be limited to the snake farm and a few other boxes.
Most likely it's dependent on a particular combination of software
versions. It's worth trying to find out which versions the snake-farm
and inyeol's box have in common.
I built 2.3a1 with gcc 2.95 (gcc version 2.95.3 20010315 (release)) and
the Solaris8 linker. I didn't use GNU binutils or glibc.

For comparison, I built another one with Solaris compiler (WorkShop
Compilers 5.0 98/12/15 C 5.0) instead of gcc, but the symptom is the
same.

Inyeol...
Neal Norwitz
2003-01-06 22:49:35 UTC
Post by Inyeol Lee
Post by Guido van Rossum
Post by Neal Norwitz
Post by Anders Qvist
Post by Laura Creighton
When I try 'make test' during building python2.3a1 with gcc2.95.3 in
Solaris8, it stops at 'test_signal' - it stays there until the process
is killed via kill -9.
This is a known problem, which occurs in the farm as well. We have as
yet no good leads.
I just ran test_socket a bunch of times on our sun without problems.
I believe Martin von Loewis and Andrew Koenig also use a solaris box
fairly regularly. (Hopefully Martin is reading this list and can
confirm.)
So the problem may be limited to the snake farm and a few other boxes.
Most likely it's dependent on a particular combination of software
versions. It's worth trying to find out which versions the snake-farm
and inyeol's box have in common.
I built 2.3a1 with gcc 2.95 (gcc version 2.95.3 20010315 (release)) and
the Solaris8 linker. I didn't use GNU binutils or glibc.
For comparison, I built another one with Solaris compiler (WorkShop
Compilers 5.0 98/12/15 C 5.0) instead of gcc, but the symptom is the
same.
Inyeol:

Can you report the output of:

showrev -p | cut -d' ' -f1-2 | egrep '(108528|108827)' | sort | uniq
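
For anyone who wants to see what that pipeline does, here is a rough
Python equivalent. It is a sketch under assumptions: matching_patches
is a hypothetical helper, and the SAMPLE text is invented to resemble
showrev -p output, not taken from a real machine.

```python
import re

# The two patch IDs named in the pipeline above.
PATCH_IDS = re.compile(r"108528|108827")

# Invented sample input (assumption), shaped like "showrev -p" lines.
SAMPLE = """\
Patch: 108528-07 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcar
Patch: 108528-07 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsr
Patch: 108827-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsl
Patch: 109147-09 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
"""


def matching_patches(showrev_output):
    """Mimic: cut -d' ' -f1-2 | egrep '(108528|108827)' | sort | uniq"""
    found = set()
    for line in showrev_output.splitlines():
        # cut -d' ' -f1-2: keep the first two space-separated fields.
        fields = " ".join(line.split(" ")[:2])
        # egrep: keep only lines that mention either patch ID.
        if PATCH_IDS.search(fields):
            found.add(fields)
    # sort | uniq: unique lines in sorted order (plain code-point order
    # here, which may differ from the shell's locale-aware sort).
    return sorted(found)


if __name__ == "__main__":
    for entry in matching_patches(SAMPLE):
        print(entry)
```

The result is one line per installed revision of each patch, which is
enough to compare patch levels between machines.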

You can post the information in the bug report here:

http://python.org/sf/662787

Thanks,
Neal
