Core dumps building Python on AIX
Guido van Rossum
2002-08-14 15:05:28 UTC
The AIX builds fail in the phase where it's building the extensions:
case $MAKEFLAGS in \
*-s*) CC='gcc' LDSHARED='../python/dist/src/Modules/ld_so_aix gcc -bI:Modules/python.exp' OPT='-DNDEBUG -g -O3 -Wall -Wstrict-prototypes' ./python -E ../python/dist/src/setup.py -q build;; \
*) CC='gcc' LDSHARED='../python/dist/src/Modules/ld_so_aix gcc -bI:Modules/python.exp' OPT='-DNDEBUG -g -O3 -Wall -Wstrict-prototypes' ./python -E ../python/dist/src/setup.py build;; \
esac
make: *** [sharedmods] Segmentation fault (core dumped)
Can someone with access to the snake farm try to find out what's going
on here?

One suggestion I have is to try to rebuild without -O3. It seems
Python is dumping core, and a broken optimizer is a likely cause.

(Thanks for the HP fix, BTW! Again the snake farm has proved useful --
I've decided that the test had no business trying to open 1000 files
simultaneously, and changed it to 100.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
Kalle Svensson
2002-08-14 17:50:30 UTC
[Guido van Rossum]
Post by Guido van Rossum
The AIX builds fail in the phase where it's building the extensions
...
Post by Guido van Rossum
Can someone with access to the snake farm try to find out what's going
on here?
It dies trying to link the first dynamic module, in this case
struct.so.

running build
running build_ext
building 'struct' extension
creating build
creating build/temp.aix-4.2-2.3
gcc -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I. -I/tmp_mnt/mp/slaskdisk/tmp/kalle/python/dist/src/./Include -I/usr/local/include -I/tmp_mnt/mp/slaskdisk/tmp/kalle/python/dist/src/Include -I/tmp_mnt/mp/slaskdisk/tmp/kalle/python/dist/src -c /tmp_mnt/mp/slaskdisk/tmp/kalle/python/dist/src/Modules/structmodule.c -o build/temp.aix-4.2-2.3/structmodule.o
creating build/lib.aix-4.2-2.3
./Modules/ld_so_aix gcc -bI:Modules/python.exp build/temp.aix-4.2-2.3/structmodule.o -L/usr/local/lib -o build/lib.aix-4.2-2.3/struct.so
make: *** [sharedmods] Segmentation fault (core dumped)

I would have tried my poor gdb skills on the core file if it weren't
for the fact that we have no gdb on the platform. I'll try to build
one later.
Post by Guido van Rossum
One suggestion I have is to try to rebuild without -O3. It seems
Python is dumping core, and a broken optimizer is a likely cause.
No change with OPT="-g".

I uncommented the echo lines in ld_so_aix and got this:

./Modules/ld_so_aix gcc -bI:Modules/python.exp build/temp.aix-4.2-2.3/structmodule.o -L/usr/local/lib -o build/lib.aix-4.2-2.3/struct.so
ld_so_aix: Debug info section
-> output file : build/lib.aix-4.2-2.3/struct.so
-> import file : Modules/python.exp
-> export file : struct.exp
-> entry point : initstruct
-> object files: build/temp.aix-4.2-2.3/structmodule.o
-> CC arguments: build/temp.aix-4.2-2.3/structmodule.o -L/usr/local/lib
./Modules/makexp_aix struct.exp build/lib.aix-4.2-2.3/struct.so build/temp.aix-4.2-2.3/structmodule.o
gcc -Wl,-einitstruct -Wl,-bE:struct.exp -Wl,-bI:Modules/python.exp -Wl,-bhalt:4 -Wl,-bM:SRE -Wl,-T512 -Wl,-H512 -lm -o build/lib.aix-4.2-2.3/struct.so build/temp.aix-4.2-2.3/structmodule.o -L/usr/local/lib
make: *** [sharedmods] Segmentation fault (core dumped)

Peace,
Kalle
- --
Kalle Svensson, http://www.juckapan.org/~kalle/
Student, root and saint in the Church of Emacs.
Guido van Rossum
2002-08-14 18:30:55 UTC
Post by Kalle Svensson
Post by Guido van Rossum
The AIX builds fail in the phase where it's building the extensions
...
Post by Guido van Rossum
Can someone with access to the snake farm try to find out what's going
on here?
It dies trying to link the first dynamic module, in this case
struct.so.
...
make: *** [sharedmods] Segmentation fault (core dumped)
Hmm... What does "file core" say? (On Linux this tells us the name
of the program that dumps core. It seems that it could be gcc. :-( )

I don't see a '-fPIC' option in the gcc invocation that created
structmodule.o, to create position-independent code. Maybe you can
try again after adding this? (If that works, we know we have to fix
configure.in to add this to CCSHARED when the platform is AIX.)
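
One way to check which flags a given build recorded for shared extensions is to ask the interpreter itself. The following is a minimal sketch, assuming a modern Python where the sysconfig module is available (it did not exist at the time of this thread); the helper name shared_build_flags is made up for illustration, and some values may be None on platforms that do not define them.

```python
import sysconfig


def shared_build_flags():
    """Return the build settings that matter when compiling shared extensions.

    The values come from the Makefile variables recorded when the
    interpreter was configured; a variable that is not defined on this
    platform is reported as None.
    """
    names = ("CC", "OPT", "CCSHARED", "LDSHARED")
    return {name: sysconfig.get_config_var(name) for name in names}


if __name__ == "__main__":
    for name, value in shared_build_flags().items():
        print(f"{name} = {value!r}")
```

If CCSHARED comes back empty on a platform that needs position-independent code, that would point at configure rather than at the module source.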

--Guido van Rossum (home page: http://www.python.org/~guido/)
Kalle Svensson
2002-08-14 20:18:22 UTC
[me]
Post by Guido van Rossum
make: *** [sharedmods] Segmentation fault (core dumped)
[Guido van Rossum]
Post by Guido van Rossum
Hmm... What does "file core" say? (On Linux this tells us the name
of the program that dumps core. It seems that it could be gcc. :-( )
core: data or International Language text

I tried looking around a bit with hexl-mode, but found nothing I
understood. I found the string 'python' at 0x06BC, but I guess that
could mean anything. I'll work on getting a gdb.
Post by Guido van Rossum
I don't see a '-fPIC' option in the gcc invocation that created
structmodule.o, to create position-independent code. Maybe you can
try again after adding this? (If that works, we know we have to fix
configure.in to add this to CCSHARED when the platform is AIX.)
Sadly, it made no difference.

Peace,
Kalle
- --
Kalle Svensson, http://www.juckapan.org/~kalle/
Student, root and saint in the Church of Emacs.
Kalle Svensson
2002-08-15 01:23:59 UTC
[me, core file from build process on AIX]
Post by Kalle Svensson
I tried looking around a bit with hexl-mode, but found nothing I
understood. I found the string 'python' at 0x06BC, but I guess that
could mean anything. I'll work on getting a gdb.
Okay, I've built a gdb now. It seems to be a python core file.

Here are the first ten levels from 'bt':
#0 0xd01918f0 in Py_InitModule4 () at modsupport.c:35
#1 0xd0190870 in initstruct () at structmodule.c:1508
#2 0x100f2e90 in _PyImport_LoadDynamicModule (name=0x0, pathname=0x0,
    fp=0x2ff20f18) at importdl.c:53
#3 0x10067160 in imp_load_dynamic (self=0x0, args=0x203acecc) at import.c:2377
#4 0x100a101c in PyCFunction_Call (func=0x202be46c, arg=0x203acecc, kw=0x0)
    at methodobject.c:80
#5 0x100462a4 in eval_frame (f=0x203e8244) at ceval.c:1971
#6 0x10048644 in PyEval_EvalCodeEx (co=0x202bc760, globals=0x202632d4,
    locals=0x0, args=0x203e1acc, argcount=2, kws=0x203e1ad4, kwcount=0,
    defs=0x0, defcount=0, closure=0x0) at ceval.c:2552
#7 0x1004b110 in fast_function (func=0x202e4b8c, pp_stack=0x2ff211d8, n=2,
    na=2, nk=0) at ceval.c:3137
#8 0x10046490 in eval_frame (f=0x203e1974) at ceval.c:1991
#9 0x10048644 in PyEval_EvalCodeEx (co=0x202d8c60, globals=0x202d3a44,
    locals=0x0, args=0x203177d8, argcount=1, kws=0x0, kwcount=0, defs=0x0,
    defcount=0, closure=0x0) at ceval.c:2552

What am I looking for?

Peace,
Kalle
- --
Kalle Svensson, http://www.juckapan.org/~kalle/
Student, root and saint in the Church of Emacs.
Guido van Rossum
2002-08-15 02:37:37 UTC
Post by Kalle Svensson
Okay, I've built a gdb now. It seems to be a python core file.
#0 0xd01918f0 in Py_InitModule4 () at modsupport.c:35
#1 0xd0190870 in initstruct () at structmodule.c:1508
#2 0x100f2e90 in _PyImport_LoadDynamicModule (name=0x0, pathname=0x0,
fp=0x2ff20f18) at importdl.c:53
#3 0x10067160 in imp_load_dynamic (self=0x0, args=0x203acecc) at import.c:2377
#4 0x100a101c in PyCFunction_Call (func=0x202be46c, arg=0x203acecc, kw=0x0)
at methodobject.c:80
#5 0x100462a4 in eval_frame (f=0x203e8244) at ceval.c:1971
#6 0x10048644 in PyEval_EvalCodeEx (co=0x202bc760, globals=0x202632d4,
locals=0x0, args=0x203e1acc, argcount=2, kws=0x203e1ad4, kwcount=0,
defs=0x0, defcount=0, closure=0x0) at ceval.c:2552
#7 0x1004b110 in fast_function (func=0x202e4b8c, pp_stack=0x2ff211d8, n=2,
na=2, nk=0) at ceval.c:3137
#8 0x10046490 in eval_frame (f=0x203e1974) at ceval.c:1991
#9 0x10048644 in PyEval_EvalCodeEx (co=0x202d8c60, globals=0x202d3a44,
locals=0x0, args=0x203177d8, argcount=1, kws=0x0, kwcount=0, defs=0x0,
defcount=0, closure=0x0) at ceval.c:2552
What am I looking for?
The line numbers and function names make sense, but the argument
values don't. Can you rebuild without -O3?

But I'm confused. Earlier it looked like the makexp_aix script was
failing, but apparently that's not the case here, because it doesn't
invoke Python.

I'd like to see more precisely what the Makefile is doing at this
point. I'm *guessing* that since it's calling Python, that must mean
it's running setup.py. The struct module is the first module that's
built, so that makes sense as well: after a successful build, setup.py
tries to import the module it just built, and apparently it dies in
the module initialization of the just-loaded module.

So it sounds like there's a problem with dynamically loaded modules...
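
The guess above (the crash happens when the freshly built module is imported, not while it is linked) can be checked without taking the whole build down by doing the import in a child interpreter. This is only a sketch in modern Python, not what setup.py does; the function name probe_import is hypothetical, and the signal reporting assumes a POSIX system, where subprocess reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def probe_import(module_name, python=sys.executable):
    """Import module_name in a child interpreter and describe how it exited."""
    result = subprocess.run(
        [python, "-c", f"import {module_name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    code = result.returncode
    if code == 0:
        return "ok"
    if code < 0:
        # On POSIX, a negative return code means the child was killed by
        # the signal with that number (SIGSEGV for a segmentation fault).
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = f"signal {-code}"
        return f"killed by {name}"
    return f"failed with exit status {code}"


if __name__ == "__main__":
    print(probe_import("struct"))
```

A result of "killed by SIGSEGV" for a module that compiled and linked cleanly would support the theory that module initialization, rather than gcc or the linker scripts, is what dumps core.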

Have you scanned Misc/AIX-NOTES for clues? Have you got any
understanding of AIX? Maybe you understand what makexp_aix is trying
to do, and how it could fail when using GCC? (I think that when it
was written, the AIX compiler was generally used.)

Alternatively, you may try to enable most modules statically in the
Modules/Setup file, and forget about dynamic loading of
extensions. But that seems a shame, and should be done as a last
resort only.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Neal Norwitz
2002-08-15 02:54:15 UTC
Post by Guido van Rossum
Alternatively, you may try to enable most modules statically in the
Modules/Setup file, and forget about dynamic loading of
extensions. But that seems a shame, and should be done as a last
resort only.
From what I've seen so far, AIX uses special functions to do
the dynamic loading, in Python/dynload_aix.c. This was 'necessary'
prior to AIX 4.2 (http://www.faqs.org/faqs/aix-faq/part4/section-21.html).

I *think* we should try to get rid of this mechanism and use the generic
versions of dlopen, dlsym, dlclose from dynload_shlib. I don't use AIX
much any more, so this may not be the best approach.
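
For readers unfamiliar with the generic interface, the dlopen/dlsym pattern can be illustrated from Python with ctypes, which on POSIX systems loads libraries through dlopen and resolves names through dlsym. This is a sketch of the pattern only, not the C code in dynload_shlib; the helper name load_symbol is made up, and the example assumes a POSIX system whose C library exports strlen.

```python
import ctypes
import ctypes.util


def load_symbol(library_name, symbol_name):
    """Open a shared library and look up one symbol in it."""
    # find_library may return None; CDLL(None) then opens the running
    # program itself, which on POSIX still exposes the C library's symbols.
    path = ctypes.util.find_library(library_name)
    library = ctypes.CDLL(path)  # corresponds to dlopen()
    # Attribute lookup corresponds to dlsym(); a missing symbol raises
    # AttributeError instead of returning NULL.
    return getattr(library, symbol_name)


strlen = load_symbol("c", "strlen")
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

print(strlen(b"initstruct"))
```

An extension loader follows the same two steps: open the module's shared object, then look up its init function by name (initstruct in the case discussed here) and call it.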

To support earlier versions of AIX (3.2.5 & 4.1.x), there is a dl library
which I believe works. The dl library is also mentioned in the FAQ
above.

To make this happen, the special casing for AIX in configure.in would
have to be removed. We would probably also need some other magic in
configure to support the third-party dl library.

AIX 4.2 is pretty old and I don't think IBM supports it anymore.
It looks like IBM is supporting 5.1 and 4.3. I think 4.2 came
out around 1997 (based on memory).

Neal
