Further adventures in the land of 64-bit linux

Anders Qvist

2002-09-13 22:02:37 UTC

The next problem in the land of 64bit linux seems to be test_grp:

[ 23:15 ] - ./python -u ../python/dist/src/Lib/test/regrtest.py -v -s
test_grp
test_getgrgid (test.test_grp.GroupDatabaseTestCase) ... zsh: segmentation fault ./python -u
../python/dist/src/Lib/test/regrtest.py -v -s

or, shorter:

[ 23:18 ] - ./python -u
Python 2.3a0 (#3, Sep 12 2002, 19:11:35)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import grp
>>> g = grp.getgrall()
zsh: segmentation fault ./python -u

or, in gdb:

>>> import grp
>>> grp.getgrall()

Program received signal SIGSEGV, Segmentation fault.
strlen () at ../sysdeps/alpha/strlen.S:45
45 ../sysdeps/alpha/strlen.S: No such file or directory.
Current language: auto; currently asm
(gdb) bt
#0 strlen () at ../sysdeps/alpha/strlen.S:45
#1 0x12004624c in PyString_FromString (str=0x0)
at ../python/dist/src/Objects/stringobject.c:111
#2 0x20000d652ac in mkgrent (p=0x200008b31b0)
at /mp/slaskdisk/tmp/quest/python/dist/src/Modules/grpmodule.c:61
#3 0x20000d653f8 in grp_getgrall (self=0x0, args=0x20000097325)
at /mp/slaskdisk/tmp/quest/python/dist/src/Modules/grpmodule.c:115
#4 0x1200d9328 in PyCFunction_Call (func=0x20000093fd0, arg=0x20000025048,
kw=0x0) at ../python/dist/src/Objects/methodobject.c:101
#5 0x120082cb4 in call_function (pp_stack=0x11ffff530, oparg=619301)
at ../python/dist/src/Python/ceval.c:3228
[...]

(gdb) frame 2
#2 0x20000d652ac in mkgrent (p=0x200008b31b0)
at /mp/slaskdisk/tmp/quest/python/dist/src/Modules/grpmodule.c:61
61 SET(setIndex++, PyString_FromString(p->gr_passwd));
Current language: auto; currently c
(gdb) print member
$1 = (char **) 0x1202921b8
(gdb) print p
$2 = (struct group *) 0x200008b31b0
(gdb) print p->gr_passwd
$3 = 0x0
(gdb) ptype p
type = struct group {
char *gr_name;
char *gr_passwd;
__gid_t gr_gid;
char **gr_mem;
} *
(gdb) frame 1
#1 0x12004624c in PyString_FromString (str=0x0)
at ../python/dist/src/Objects/stringobject.c:111
111 size = strlen(str);
(gdb) list
106 {
107 register size_t size;
108 register PyStringObject *op;
109
110 assert(str != NULL);
111 size = strlen(str);
112 if (size > INT_MAX) {
113 PyErr_SetString(PyExc_OverflowError,
114 "string is too long for a Python string");
115 return NULL;

If I understand things correctly, this means that the previous line in
stringobject.c (110) somehow fails to do its job, or we would have
gotten an assert error. Is 0x0 NULL on linux/alpha?

--
Anders "Quest" Qvist

"We've all heard that a million monkeys banging on a million typewriters
will eventually reproduce the entire works of Shakespeare. Now, thanks
to the Internet, we know this is not true." -- Robert Wilensky

Guido van Rossum

2002-09-14 00:42:03 UTC

I'm cc'ing Martin von Loewis, since I believe he checked in this
version of grpmodule.c.

[Anders Qvist]
> The next problem in the land of 64bit linux seems to be test_grp:
>
> [ 23:15 ] - ./python -u ../python/dist/src/Lib/test/regrtest.py -v -s
> test_grp
> test_getgrgid (test.test_grp.GroupDatabaseTestCase) ... zsh: segmentation fault ./python -u
> ../python/dist/src/Lib/test/regrtest.py -v -s
>
> or, shorter:
>
> [ 23:18 ] - ./python -u
> Python 2.3a0 (#3, Sep 12 2002, 19:11:35)
> [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import grp
> >>> g = grp.getgrall()
> zsh: segmentation fault ./python -u
>
> or, in gdb:
>
> >>> import grp
> >>> grp.getgrall()
>
> Program received signal SIGSEGV, Segmentation fault.
> strlen () at ../sysdeps/alpha/strlen.S:45
> 45 ../sysdeps/alpha/strlen.S: No such file or directory.
> Current language: auto; currently asm
> (gdb) bt
> #0 strlen () at ../sysdeps/alpha/strlen.S:45
> #1 0x12004624c in PyString_FromString (str=0x0)
> at ../python/dist/src/Objects/stringobject.c:111
> #2 0x20000d652ac in mkgrent (p=0x200008b31b0)
> at /mp/slaskdisk/tmp/quest/python/dist/src/Modules/grpmodule.c:61
> #3 0x20000d653f8 in grp_getgrall (self=0x0, args=0x20000097325)
> at /mp/slaskdisk/tmp/quest/python/dist/src/Modules/grpmodule.c:115
> #4 0x1200d9328 in PyCFunction_Call (func=0x20000093fd0, arg=0x20000025048,
> kw=0x0) at ../python/dist/src/Objects/methodobject.c:101
> #5 0x120082cb4 in call_function (pp_stack=0x11ffff530, oparg=619301)
> at ../python/dist/src/Python/ceval.c:3228
> [...]
>
> (gdb) frame 2
> #2 0x20000d652ac in mkgrent (p=0x200008b31b0)
> at /mp/slaskdisk/tmp/quest/python/dist/src/Modules/grpmodule.c:61
> 61 SET(setIndex++, PyString_FromString(p->gr_passwd));
> Current language: auto; currently c
> (gdb) print member
> $1 = (char **) 0x1202921b8
> (gdb) print p
> $2 = (struct group *) 0x200008b31b0
> (gdb) print p->gr_passwd
> $3 = 0x0
> (gdb) ptype p
> type = struct group {
> char *gr_name;
> char *gr_passwd;
> __gid_t gr_gid;
> char **gr_mem;
> } *
> (gdb) frame 1
> #1 0x12004624c in PyString_FromString (str=0x0)
> at ../python/dist/src/Objects/stringobject.c:111
> 111 size = strlen(str);
> (gdb) list
> 106 {
> 107 register size_t size;
> 108 register PyStringObject *op;
> 109
> 110 assert(str != NULL);
> 111 size = strlen(str);
> 112 if (size > INT_MAX) {
> 113 PyErr_SetString(PyExc_OverflowError,
> 114 "string is too long for a Python string");
> 115 return NULL;
>
> If I understand things correctly, this means that the previous line in
> stringobject.c (110) somehow fails to do its job, or we would have
> gotten an assert error. Is 0x0 NULL on linux/alpha?

Yes, NULL is 0x0 even on Linux/Alpha.

But Python is compiled with -DNDEBUG (unless you request a debug build
with "configure --with-pydebug") so the assert() is not executed.
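
The effect can be sketched in Python, purely as an analogy: the optimize
level passed to compile() strips assert statements much as -DNDEBUG strips
assert() in C. The function name first_char is hypothetical and is not
part of the code under discussion.

```python
# Analogy only: compile(..., optimize=1) removes asserts, in the same way
# that -DNDEBUG removes assert() from a C build.
SOURCE = '''
def first_char(s):
    assert s is not None, "s must not be None"
    return s[0]
'''


def build(optimize):
    """Compile SOURCE at the given optimization level and return first_char."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["first_char"]


def failure_kind(func):
    """Call func(None) and report the name of the exception it raises."""
    try:
        func(None)
    except Exception as exc:
        return type(exc).__name__
    return "no error"


checked = build(optimize=0)    # asserts kept, comparable to a debug build
unchecked = build(optimize=1)  # asserts stripped, comparable to -DNDEBUG

print(failure_kind(checked))    # the assert fires first
print(failure_kind(unchecked))  # the failure surfaces later, at s[0]
```

With the check compiled out, the bad value travels one step further before
anything complains, which matches the strlen() crash in the backtrace.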

I'm guessing that it's possible that the gr_passwd slot can be NULL on
that platform; the code doesn't expect this.

Martin?

--Guido van Rossum (home page: http://www.python.org/~guido/)

Anders Qvist

2002-09-14 07:30:22 UTC

On Fri, Sep 13, 2002 at 08:42:03PM -0400, Guido van Rossum wrote:
> But Python is compiled with -DNDEBUG (unless you request a debug build
> with "configure --with-pydebug") so the assert() is not executed.

Maybe the farm should compile with pydebugging?
--
Anders "Quest" Qvist

Guido van Rossum

2002-09-14 17:06:50 UTC

> On Fri, Sep 13, 2002 at 08:42:03PM -0400, Guido van Rossum wrote:
> > But Python is compiled with -DNDEBUG (unless you request a debug build
> > with "configure --with-pydebug") so the assert() is not executed.
>
> Maybe the farm should compile with pydebugging?

Ideally, there should be separate compilation runs with and without
debugging. Some things only break in a non-debug run, some bugs are
only detected (or detected earlier or diagnosed more clearly) in a
debugging run.

It might be worth it to try more variations too, e.g. the 2-byte and
4-byte Unicode options.

--Guido van Rossum (home page: http://www.python.org/~guido/)

Anders Qvist

2002-09-14 18:55:27 UTC

On Sat, Sep 14, 2002 at 01:06:50PM -0400, Guido van Rossum wrote:
> > On Fri, Sep 13, 2002 at 08:42:03PM -0400, Guido van Rossum wrote:
> > > But Python is compiled with -DNDEBUG (unless you request a debug build
> > > with "configure --with-pydebug") so the assert() is not executed.
> >
> > Maybe the farm should compile with pydebugging?
>
> Ideally, there should be separate compilation runs with and without
> debugging. Some things only break in a non-debug run, some bugs are
> only detected (or detected earlier or diagnosed more clearly) in a
> debugging run.
>
> It might be worth it to try more variations too, e.g. the 2-byte and
> 4-byte Unicode options.

I've been thinking about this. We might want to have a separate
project in xenofarm that cycles through "odd" builds. On one pass, we
build with debugging, on the next we build with UCS4, and so on.
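
A minimal sketch of such a rotation, assuming a pass counter is available.
Only --with-pydebug comes from this thread; the variant names, the UCS4
configure flag and the function variant_for_pass are assumptions for
illustration.

```python
# Hypothetical variant table for a rotating "odd builds" project.
VARIANTS = [
    ("default", []),
    ("pydebug", ["--with-pydebug"]),
    ("ucs4", ["--enable-unicode=ucs4"]),  # assumed flag spelling
]


def variant_for_pass(pass_number):
    """Return the (name, configure_args) pair for a zero-based pass number."""
    return VARIANTS[pass_number % len(VARIANTS)]


for n in range(4):
    name, args = variant_for_pass(n)
    print(n, name, " ".join(["./configure"] + args))
```

Each pass costs one build, so the total time per pass stays constant no
matter how many variants are in the table.
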
--
Anders "Quest" Qvist

Per Cederqvist

2002-09-14 19:06:48 UTC

Anders Qvist <***@lysator.liu.se> writes:

> On Sat, Sep 14, 2002 at 01:06:50PM -0400, Guido van Rossum wrote:
> > > On Fri, Sep 13, 2002 at 08:42:03PM -0400, Guido van Rossum wrote:
> > > > But Python is compiled with -DNDEBUG (unless you request a debug build
> > > > with "configure --with-pydebug") so the assert() is not executed.
> > >
> > > Maybe the farm should compile with pydebugging?
> >
> > Ideally, there should be separate compilation runs with and without
> > debugging. Some things only break in a non-debug run, some bugs are
> > only detected (or detected earlier or diagnosed more clearly) in a
> > debugging run.
> >
> > It might be worth it to try more variations too, e.g. the 2-byte and
> > 4-byte Unicode options.
>
> I've been thinking about this. We might want to have a separate
> project in xenofarm that cycles through "odd" builds. On one pass, we
> build with debugging, on the next we build with UCS4, and so on.
> --
> Anders "Quest" Qvist

Why complicate things like that? You can build the same version in
several different ways, like this:

test: default ./create-response.sh
test: coverage ./create-response.sh --cfg "--with-debug-calls"
test: valgrind-cov ./create-response.sh --cfg "CC='gcc -V 3.0.4' --without-optimization --with-valgrind --with-debug-calls --disable-malloc-guards"
test: valgrind-std ./create-response.sh --cfg "CC='gcc -V 3.0.4' --without-optimization --with-valgrind"

(Of course, you have to modify create-response so that it handles the
--cfg option;
http://www.lysator.liu.se/xenofarm/lyskom-server/builds/latest
contains the example I currently use.)

If I'm not misinformed, you can now have test lines that are run on a
specific host only:

test: default ./create-response.sh
test-moria: coverage ./create-response.sh --cfg "--with-debug-calls"

This would run the default test on all hosts, and the coverage version
on only moria. This way the fast and/or dedicated computers could
build many variants, while the computers that are also used for other
purposes can run only the default build.

/ceder

Anders Qvist

2002-09-14 19:28:26 UTC

On Sat, Sep 14, 2002 at 09:06:48PM +0200, Per Cederqvist wrote:
> Anders Qvist <***@lysator.liu.se> writes:
>
> > I've been thinking about this. We might want to have a separate
> > project in xenofarm that cycles through "odd" builds. On one pass, we
> > build with debugging, on the next we build with UCS4, and so on.
> > --
> > Anders "Quest" Qvist
>
> Why complicate things like that? You can build the same version in
> several different ways, like this:
>
> test: default ./create-response.sh
> test: coverage ./create-response.sh --cfg "--with-debug-calls"
> test: valgrind-cov ./create-response.sh --cfg "CC='gcc -V 3.0.4' --without-optimization --with-valgrind --with-debug-calls --disable-malloc-guards"
> test: valgrind-std ./create-response.sh --cfg "CC='gcc -V 3.0.4' --without-optimization --with-valgrind"
>
> (Of course, you have to modify create-response so that it handles the
> --cfg option;
> http://www.lysator.liu.se/xenofarm/lyskom-server/builds/latest
> contains the example I currently use.)
>
> If I'm not misinformed, you can now have test lines that are run on a
> specific host only:

Yes, but the problem is that if I add, say 10 such test lines, it'll
take too long to run that project, and the other projects will be
delayed too long (might be more than a day on slower machines).

> test: default ./create-response.sh
> test-moria: coverage ./create-response.sh --cfg "--with-debug-calls"
>
> This would run the default test on all hosts, and the coverage version
> on only moria. This way the fast and/or dedicated computers could
> build many variants, while the computers that are also used for other
> purposes can run only the default build.

Selective builds are another way to do it, but that's risky. It may
mean that those options are never tested on a particular arch.
--
Anders "Quest" Qvist

Michael Hudson

2002-09-23 11:17:10 UTC

Guido van Rossum <***@python.org> writes:

> It might be worth it to try more variations too, e.g. the 2-byte and
> 4-byte Unicode options.

And then someone might finally get around to working out why
test_unicode fails in 4-byte mode.

Also, building some release22-maints before a 2.2.2 gets released
would seem to be a good idea.

Cheers,
M.

--
And then the character-only displays went away (leading to
increasingly silly graphical effects and finally to ads on
web pages). -- John W. Baxter, comp.lang.python

Guido van Rossum

2002-09-23 13:30:51 UTC

> Also, building some release22-maints before a 2.2.2 gets released
> would seem to be a good idea.

That would be an *extremely* good idea.

Anders, can I make a special pleading to ask you to try Python 2.2.x
on the snake farm? All you need to do is check out the
release22-maint branch instead of the trunk. If you can only do one
version, 2.2.x would be the version of choice. See

http://mail.python.org/pipermail/python-dev/2002-September/028806.html

--Guido van Rossum (home page: http://www.python.org/~guido/)

Mats Wichmann

2002-09-23 14:36:52 UTC

At 09:30 AM 9/23/2002 -0400, Guido van Rossum wrote:
>> Also, building some release22-maints before a 2.2.2 gets released
>> would seem to be a good idea.
>
>That would be an *extremely* good idea.
>
>Anders, can I make a special pleading to ask you to try Python 2.2.x
>on the snake farm? All you need to do is check out the
>release22-maint branch instead of the trunk. If you can only do one
>version, 2.2.x would be the version of choice. See
>
> http://mail.python.org/pipermail/python-dev/2002-September/028806.html

I have a *limited* amount of time this week to help look
at checkins, if someone can prepare the tedious list.

I will do an ia64 build today, as a crosscheck to what the
snake farm can do.

Mats

Tim Peters

2002-09-14 20:34:04 UTC

[Anders Qvist]
> Maybe the farm should compile with pydebugging?

Testing a release build is crucial, because that's what end users use, and
it's the only way to provoke platform compiler optimization bugs.

Of all the build variations beyond that, --with-pydebug is by far the most
valuable to try. In the 2.3 line, it's especially valuable to do a debug
build because it triggers new memory management code in Python that can
catch many kinds of insanity (reading/writing out of bounds, reading
uninitialized heap memory, reading heap memory that's already been free()ed,
...) early.

Which build variations beyond that to try may depend on how much resource
the PBF is willing to throw at them.

Laura Creighton

2002-09-15 07:40:08 UTC

> [Anders Qvist]
> > Maybe the farm should compile with pydebugging?
>
> Testing a release build is crucial, because that's what end users use, and
> it's the only way to provoke platform compiler optimization bugs.
>
> Of all the build variations beyond that, --with-pydebug is by far the most
> valuable to try. In the 2.3 line, it's especially valuable to do a debug
> build because it triggers new memory management code in Python that can
> catch many kinds of insanity (reading/writing out of bounds, reading
> uninitialized heap memory, reading heap memory that's already been free()ed,
> ...) early.
>
> Which build variations beyond that to try may depend on how much resource
> the PBF is willing to throw at them.

Ah, they are Lysator's resources, which means that Lysator and not the
PBF gets to decide how the machines get used. We now have a procedure
for adding machines to run test builds outside of Lysator, so perhaps
we will get more machines that way. But the bottleneck is the time of
Lysator volunteers, and the cpu cycles of machines that Lysator uses
for other purposes.

Laura

Anders Qvist

2002-09-14 12:08:43 UTC

On Fri, Sep 13, 2002 at 08:42:03PM -0400, Guido van Rossum wrote:
> I'm cc'ing Martin von Loewis, since I believe he checked in this
> version of grpmodule.c.
>
> [Anders Qvist]
[snip]
> > [ 23:18 ] - ./python -u
> > Python 2.3a0 (#3, Sep 12 2002, 19:11:35)
> > [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import grp
> > >>> g = grp.getgrall()
> > zsh: segmentation fault ./python -u

> > If I understand things correctly, this means that the previous line in
> > stringobject.c (110) somehow fails to do its job, or we would have
> > gotten an assert error. Is 0x0 NULL on linux/alpha?
>
> Yes, NULL is 0x0 even on Linux/Alpha.
>
> But Python is compiled with -DNDEBUG (unless you request a debug build
> with "configure --with-pydebug") so the assert() is not executed.
>
> I'm guessing that it's possible that the gr_passwd slot can be NULL on
> that platform; the code doesn't expect this.

With --with-pydebug

test_glob
test_global
test_grp
python: ../python/dist/src/Objects/stringobject.c:110: PyString_FromString: Assertion `str != ((void *)0)' failed.
make: *** [test] Aborted (SIGABRT)

I think we may want pydebug, so I'll turn it on on all hosts.

--
Anders "Quest" Qvist

Martin v. Löwis

2002-09-16 14:01:31 UTC

Anders Qvist <***@lysator.liu.se> writes:

> > I'm guessing that it's possible that the gr_passwd slot can be NULL on
> > that platform; the code doesn't expect this.

Can you please try the attached patch?

Thanks,
Martin

Index: grpmodule.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/grpmodule.c,v
retrieving revision 2.19
diff -u -r2.19 grpmodule.c
--- grpmodule.c 2 Aug 2002 02:27:13 -0000 2.19
+++ grpmodule.c 16 Sep 2002 13:58:56 -0000
@@ -58,7 +58,12 @@

#define SET(i,val) PyStructSequence_SET_ITEM(v, i, val)
SET(setIndex++, PyString_FromString(p->gr_name));
- SET(setIndex++, PyString_FromString(p->gr_passwd));
+ if (p->gr_passwd)
+ SET(setIndex++, PyString_FromString(p->gr_passwd));
+ else {
+ SET(setIndex++, Py_None);
+ Py_INCREF(Py_None);
+ }
SET(setIndex++, PyInt_FromLong((long) p->gr_gid));
SET(setIndex++, w);
#undef SET
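
Seen from Python, the patch means gr_passwd can be None instead of a
string, so callers may want to check for it. A minimal sketch, using a
namedtuple as a stand-in for the records grp.getgrall() returns; the
helper names and the sample data are made up for illustration.

```python
from collections import namedtuple

# Stand-in record type, so the sketch runs without a group database.
Group = namedtuple("Group", "gr_name gr_passwd gr_gid gr_mem")


def password_field(entry):
    """Return the group's password field, mapping a missing one to ''."""
    return entry.gr_passwd if entry.gr_passwd is not None else ""


def groups_without_password_field(entries):
    """Names of the entries whose gr_passwd came back as None."""
    return [e.gr_name for e in entries if e.gr_passwd is None]


# Invented sample data; the second entry mimics a NULL gr_passwd.
sample = [Group("wheel", "x", 0, ["root"]), Group("+", None, 0, [])]
print(groups_without_password_field(sample))
print([password_field(e) for e in sample])
```

On a Unix system the same helpers can be fed grp.getgrall() directly.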

Anders Qvist

2002-09-16 21:02:56 UTC

On Mon, Sep 16, 2002 at 04:01:31PM +0200, Martin v. Löwis wrote:
> Anders Qvist <***@lysator.liu.se> writes:
>
> > > I'm guessing that it's possible that the gr_passwd slot can be NULL on
> > > that platform; the code doesn't expect this.
>
> Can you please try the attached patch?
> RCS file: /cvsroot/python/python/dist/src/Modules/grpmodule.c,v
> retrieving revision 2.19
> diff -u -r2.19 grpmodule.c
> --- grpmodule.c 2 Aug 2002 02:27:13 -0000 2.19
> +++ grpmodule.c 16 Sep 2002 13:58:56 -0000
> @@ -58,7 +58,12 @@
[snip]

Like a charm.

Next one to "fail" is test_longexp, which bloats the interpreter
massively. Simplified, it looks like this:

>>> l = eval ("[" + "2," * 65580 + "]")

./python ../python/dist/src/Lib/test/test_longexp.py & ; pid=$\! ;
while /bin/true ; do ps aux | grep $pid ; sleep 1 ; done

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
quest 30018 0.0 0.3 3792 224 pts/5 DN 22:14 0:00 ./python ../pytho
quest 30018 10.3 4.1 10224 2536 pts/5 DN 22:14 0:00 ./python ../pytho
quest 30018 43.8 15.7 17032 9640 pts/5 RN 22:14 0:00 ./python ../pytho
quest 30018 63.7 35.1 29392 21488 pts/5 RN 22:14 0:01 ./python ../pytho
quest 30018 59.2 55.4 42000 33912 pts/5 RN 22:14 0:02 ./python ../pytho
quest 30018 32.5 65.9 64840 40376 pts/5 DN 22:14 0:04 ./python ../pytho

The behaviour seems exponential, for nothing untoward is apparent with
small number of elements in the list. As numbers grow into tens of
thousands, the memory usage starts to balloon. Technically, the
machine could prolly pass the test, had it not been for its shortage
of memory.
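
One way to check whether the growth is really exponential is to measure
peak memory at a few sizes and compare. The sketch below does that with
tracemalloc on a current CPython, whose parser differs from the 2.3 one
discussed here, so the numbers are not comparable to the ps output above.
The function name is hypothetical.

```python
import ast
import tracemalloc


def peak_bytes_parsing(n):
    """Peak traced memory while parsing a list literal with n elements."""
    source = "[" + "2," * n + "]"  # built before tracing starts
    tracemalloc.start()
    try:
        ast.parse(source, mode="eval")
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


for n in (1000, 2000, 4000, 8000):
    print(n, peak_bytes_parsing(n))
```

If doubling n roughly doubles the peak, the growth is linear with a large
constant rather than exponential.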

[ 22:40 ] - ./python
Python 2.3a0 (#2, Sep 16 2002, 17:37:08)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> l = eval ("[" + "2," * 30000 + "]")
[47434 refs]

PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
30153 quest 5 0 24952 24M 2464 S 23M 0.0 20.3 0:04 python

It's late, so I'll have to continue the hunt later. Here's the
backtrace from gdb, but it doesn't seem to point to what's wrong.

#0 0x12011e3ec in PyNode_AddChild (n1=0x200026166c8, type=301, str=0x0,
lineno=1) at ../python/dist/src/Parser/node.c:95
#1 0x12011ea9c in push (s=0x1202eb798, type=301, d=0x12026daa0, newstate=1,
lineno=1) at ../python/dist/src/Parser/parser.c:126
#2 0x12011eeb4 in PyParser_AddToken (ps=0x1202eb798, type=2,
str=0x2000260cc98 "2", lineno=1, expected_ret=0x11ffff2cc)
at ../python/dist/src/Parser/parser.c:234
#3 0x120010e08 in parsetok (tok=0x1202e9b98, g=0x12026e788, start=258,
err_ret=0x11ffff2a8, flags=0) at ../python/dist/src/Parser/parsetok.c:157
#4 0x120010a10 in PyParser_ParseStringFlagsFilename (
s=0x200008b604c "[2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2"...,
filename=0x0, g=0x12026e788, start=258, err_ret=0x11ffff2a8, flags=0)
at ../python/dist/src/Parser/parsetok.c:56
#5 0x1200108d8 in PyParser_ParseStringFlags (
s=0x200008b604c "[2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2"...,
g=0x12026e788, start=258, err_ret=0x11ffff2a8, flags=0)
at ../python/dist/src/Parser/parsetok.c:31
#6 0x1200f7e94 in PyParser_SimpleParseStringFlags (
str=0x200008b604c "[2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2"...,
start=258, flags=0) at ../python/dist/src/Python/pythonrun.c:1183
#7 0x1200f7508 in PyRun_StringFlags (
str=0x200008b604c "[2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2"...,
start=258, globals=0x120285780, locals=0x120285780, flags=0x11ffff360)
at ../python/dist/src/Python/pythonrun.c:1040
#8 0x1200a3c60 in builtin_eval (self=0x0, args=0x20000048998)
at ../python/dist/src/Python/bltinmodule.c:467
---Type <return> to continue, or q <return> to quit---
#9 0x12012ef5c in PyCFunction_Call (func=0x2000002ef18, arg=0x20000048998,
kw=0x0) at ../python/dist/src/Objects/methodobject.c:80
#10 0x1200c0ab8 in call_function (pp_stack=0x11ffff468, oparg=1)
at ../python/dist/src/Python/ceval.c:3228
#11 0x1200bab40 in eval_frame (f=0x1202908a0)
at ../python/dist/src/Python/ceval.c:1993
#12 0x1200bda30 in PyEval_EvalCodeEx (co=0x20000026818, globals=0x120285780,
locals=0x120285780, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0,
defcount=0, closure=0x0) at ../python/dist/src/Python/ceval.c:2538
#13 0x1200afdf4 in PyEval_EvalCode (co=0x20000026818, globals=0x120285780,
locals=0x120285780) at ../python/dist/src/Python/ceval.c:478
#14 0x1200f77c8 in run_node (n=0x2000003f038,
filename=0x11ffffb6f "../python/dist/src/Lib/test/test_longexp.py",
globals=0x120285780, locals=0x120285780, flags=0x11ffff898)
at ../python/dist/src/Python/pythonrun.c:1083
#15 0x1200f7714 in run_err_node (n=0x2000003f038,
filename=0x11ffffb6f "../python/dist/src/Lib/test/test_longexp.py",
globals=0x120285780, locals=0x120285780, flags=0x11ffff898)
at ../python/dist/src/Python/pythonrun.c:1070
#16 0x1200f7698 in PyRun_FileExFlags (fp=0x12027c560,
filename=0x11ffffb6f "../python/dist/src/Lib/test/test_longexp.py",
start=257, globals=0x120285780, locals=0x120285780, closeit=1,
flags=0x11ffff898) at ../python/dist/src/Python/pythonrun.c:1061
#17 0x1200f57e0 in PyRun_SimpleFileExFlags (fp=0x12027c560,
filename=0x11ffffb6f "../python/dist/src/Lib/test/test_longexp.py",
closeit=1, flags=0x11ffff898) at ../python/dist/src/Python/pythonrun.c:692
#18 0x1200f4bc8 in PyRun_AnyFileExFlags (fp=0x12027c560,
filename=0x11ffffb6f "../python/dist/src/Lib/test/test_longexp.py",
closeit=1, flags=

Could it be these lines in parsetok.c (138-139)?

len = b - a; /* XXX this may compute NULL - NULL */
str = (char *) PyObject_MALLOC(len + 1);

CVS/Entries says:

/parsetok.c/2.33/Sun Aug 4 17:29:52 2002//
--
Anders "Quest" Qvist

Guido van Rossum

2002-09-16 21:52:22 UTC

> > Can you please try the attached patch?
> [snip]
>
> Like a charm.

Great! Martin will check it in.

> Next one to "fail" is test_longexp, which bloats the interpreter
> massively. Simplified, it looks like this:
>
> >>> l = eval ("[" + "2," * 65580 + "]")

Could it be that the parse tree uses twice the memory on a 64-bit
machine, and that that is simply too much? Give it a try with smaller
values of 65580. (:-)
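
A back-of-the-envelope check of the "twice the memory" guess: the sketch
below computes the size of a C struct under natural alignment for 4-byte
and 8-byte pointers. The field list (short, pointer, int, int, pointer) is
an assumed layout, loosely modelled on a parse-tree node, and is not taken
from the thread.

```python
def struct_size(fields):
    """Size of a C struct given (size, alignment) pairs, with padding."""
    offset = 0
    max_align = 1
    for size, align in fields:
        offset = (offset + align - 1) // align * align  # pad to alignment
        offset += size
        max_align = max(max_align, align)
    return (offset + max_align - 1) // max_align * max_align  # tail padding


def node_fields(pointer_size):
    """Assumed node layout: short, pointer, int, int, pointer."""
    ptr = (pointer_size, pointer_size)
    return [(2, 2), ptr, (4, 4), (4, 4), ptr]


print(struct_size(node_fields(4)), struct_size(node_fields(8)))  # 20 32
```

For this assumed layout the per-node cost grows by a factor of 1.6 rather
than 2, before any allocator overhead is counted.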

--Guido van Rossum (home page: http://www.python.org/~guido/)

Anders Qvist

2002-09-17 04:56:24 UTC

On Mon, Sep 16, 2002 at 05:52:22PM -0400, Guido van Rossum wrote:
> > Next one to "fail" is test_longexp, which bloats the interpreter
> > massively. Simplified, it looks like this:
> >
> > >>> l = eval ("[" + "2," * 65580 + "]")
>
> Could it be that the parse tree uses twice the memory on a 64-bit
> machine, and that that is simply too much? Give it a try with smaller
> values of 65580. (:-)

Alright. Is this the largest test? If so, it defines the amount of
memory a machine needs to complete the tests. We've been talking about
adding memory to those machines (asmodean and moghedian), which should
solve the problem.

Is there any point to having such a large number?
--
Anders "Quest" Qvist

Guido van Rossum

2002-09-17 04:59:52 UTC

> Is there any point to having such a large number?

Yes, it tests a specific failure when there were more than 2**16 parse
tree subnodes at a given level. Some generated code really needed
this, and it was fixed (if you have enough memory).
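
The shape of the test can be sketched as below: a single list literal
whose element count is just above 2**16. This is a simplified version
based on the one-liner quoted earlier in the thread, not the test file
itself, and the helper name is hypothetical.

```python
def long_list_source(n):
    """Source text for a list literal holding n copies of 2."""
    return "[" + "2," * n + "]"


n = 65580  # the count used in the thread; 2**16 is 65536
values = eval(long_list_source(n))
print(len(values), n > 2**16)
```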

--Guido van Rossum (home page: http://www.python.org/~guido/)

Anders Qvist

2002-09-17 08:41:28 UTC

On Mon, Sep 16, 2002 at 05:52:22PM -0400, Guido van Rossum wrote:
> > > Can you please try the attached patch?
> > [snip]
> >
> > Like a charm.
>
> Great! Martin will check it in.

Surprising no one in particular, test_pwd broke in the same way.

(gdb) f 5
#5 0x200009ff284 in mkpwent (p=0x200008b3308)
at /mp/slaskdisk/tmp/quest/python/dist/src/Modules/pwdmodule.c:59
59 SETS(setIndex++, p->pw_passwd);
(gdb) print p->pw_passwd
$1 = 0x0

Aha! Lookie here:

(gdb) print p->pw_name
$4 = 0x1202faec0 "+"

It's the "NIS-thingie" in /etc/passwd that is the culprit. We may want
to handle that some way?
--
Anders "Quest" Qvist

Martin v. Löwis

2002-09-17 09:36:35 UTC

Anders Qvist <***@lysator.liu.se> writes:

> Surprising no one in particular, test_pwd broke in the same way.

Fixed in CVS as well.

> $4 = 0x1202faec0 "+"
>
> It's the "NIS-thingie" in /etc/passwd that is the culprit. We may want
> to handle that some way?

It seems that this is both a misconfiguration of your system, and a
bug of it:
- If you use NIS, the "+" entry should be transparently resolved to
list all users known to the system.
- If you don't use NIS, the "+" entry should not be present.
- If a field is omitted in /etc/passwd, getpwent(3) should return
"", not NULL, for this field.

So I think processing it by returning those fields as None (as done in
previous Python versions) should be sufficient.
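
To spot such entries on a given machine, a passwd-style file can be
scanned for names that begin with "+" or "-". The sketch below assumes
that this is how the compat markers are written; the function name and
the sample text are made up for illustration.

```python
def compat_marker_lines(passwd_text):
    """Return passwd-style lines whose name field starts with '+' or '-'."""
    markers = []
    for line in passwd_text.splitlines():
        name = line.split(":", 1)[0]
        if name.startswith(("+", "-")):
            markers.append(line)
    return markers


# Invented sample; a real check would read the text of /etc/passwd.
sample = "root:x:0:0:root:/root:/bin/sh\n+::::::\n"
print(compat_marker_lines(sample))
```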

Regards,
Martin

Guido van Rossum

2002-09-17 13:33:39 UTC

> Surprising no one in particular, test_pwd broke in the same way.
>
> (gdb) f 5
> #5 0x200009ff284 in mkpwent (p=0x200008b3308)
> at /mp/slaskdisk/tmp/quest/python/dist/src/Modules/pwdmodule.c:59
> 59 SETS(setIndex++, p->pw_passwd);
> (gdb) print p->pw_passwd
> $1 = 0x0
>
> Aha! Lookie here:
>
> (gdb) print p->pw_name
> $4 = 0x1202faec0 "+"
>
> It's the "NIS-thingie" in /etc/passwd that is the culprit. We may want
> to handle that some way?

Hm. AFAIK this code is not reading and parsing /etc/passwd directly,
it's using getpwent() for that. Why doesn't getpwent() process that
cookie??? I think it's supposed to! What does the manual on your
system say? (I don't have NIS set up anywhere here so I can't check.)

--Guido van Rossum (home page: http://www.python.org/~guido/)

Martin v. Löwis

2002-09-17 16:48:51 UTC

Guido van Rossum <***@python.org> writes:

> Hm. AFAIK this code is not reading and parsing /etc/passwd directly,
> it's using getpwent() for that. Why doesn't getpwent() process that
> cookie??? I think it's supposed to! What does the manual on your
> system say? (I don't have NIS set up anywhere here so I can't check.)

It could be that the entry is still in /etc/passwd, but NIS is
disabled in /etc/nsswitch.conf. More likely, there is a bug in getpwent.

Regards,
Martin

Mats Wichmann

2002-09-17 16:57:16 UTC

Interestingly, I'm not seeing this recent spate
of problems on my Itanium Linux build.

It's down to one build warning, which I don't
care about (it's the tempnam stuff), and one
fail and one warning during the regression tests:

./python -E -tt ./Lib/test/regrtest.py -l
<string>:0: FutureWarning: hex/oct constants > sys.maxint will return
positive values in Python 2.4 and up
<string>:0: FutureWarning: hex/oct constants > sys.maxint will return
positive values in Python 2.4 and up
<string>:0: FutureWarning: hex/oct constants > sys.maxint will return
positive values in Python 2.4 and up
test_opcodes
*** These are due to three large constants in the test module
*** I don't see where they're a problem, although in 2.4
*** this test will no longer make sense since they compare to -1

test test_dl crashed -- exceptions.SystemError: module dl requires
sizeof(int) == sizeof(long) == sizeof(char*)

Guido van Rossum

2002-09-17 17:05:10 UTC

> Interestingly, I'm not seeing this recent spate
> of problems on my Itanium Linux build.
>
> It's down to one build warning, which I don't
> care about (it's the tempnam stuff), and one
> fail and one warning during the regression tests:
>
> ./python -E -tt ./Lib/test/regrtest.py -l
> <string>:0: FutureWarning: hex/oct constants > sys.maxint will return
> positive values in Python 2.4 and up
> <string>:0: FutureWarning: hex/oct constants > sys.maxint will return
> positive values in Python 2.4 and up
> <string>:0: FutureWarning: hex/oct constants > sys.maxint will return
> positive values in Python 2.4 and up
> test_opcodes
> *** These are due to three large constants in the test module
> *** I don't see where they're a problem, although in 2.4
> *** this test will no longer make sense since they compare to -1

I'll look at these (eventually).

> test test_dl crashed -- exceptions.SystemError: module dl requires
> sizeof(int) == sizeof(long) == sizeof(char*)

Try removing dlmodule.so from your build subdir. The dl module should
no longer be built on proper 64-bit platforms (where long is 64-bit),
but I think that if a copy was left behind by an earlier build (before
I fixed this in setup.py), the test suite will still try to run the test.

--Guido van Rossum (home page: http://www.python.org/~guido/)

Mats Wichmann

2002-09-17 17:35:08 UTC

>> test test_dl crashed -- exceptions.SystemError: module dl requires
>> sizeof(int) == sizeof(long) == sizeof(char*)
>
>Try removing dlmodule.so from your build subdir. The dl module should
>no longer be built on proper 64-bit platforms (where long is 64-bit),
>but I think that if a copy was left behind by an earlier build (before
>I fixed this in setup.py), the test suite will still try to run the test.

Yup, all gone after cleaning out and rebuilding.

(skipped, no such module)

Tim Peters

2002-09-17 02:06:53 UTC

[Anders Qvist, on test_longexp]
> ...
> The behaviour seems exponential, for nothing untoward is apparent with
> small number of elements in the list.

The parse tree built for this list is huge, though. test_longexp is a pain
in the ass that's failed to work on various platforms since it was first
introduced. That got fixed for all known platforms after 2.2.1 was
released, via a combination of switching to pymalloc for the small
allocations, and doing aggressive overallocation for monstrously large (many
children) parse nodes.

> As numbers grow into tens of thousands, the memory usage starts to
> balloon.

This is so on all platforms -- the test is extreme.

> Technically, the machine could prolly pass the test, had it not been
> for its shortage of memory.
>
> [ 22:40 ] - ./python
> Python 2.3a0 (#2, Sep 16 2002, 17:37:08)
> [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> l = eval ("[" + "2," * 30000 + "]")
> [47434 refs]
>
> PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
> 30153 quest 5 0 24952 24M 2464 S 23M 0.0 20.3 0:04 python
>
> It's late, so I'll have to continue the hunt later. Here's the
> backtrace from gdb, but it doesn't seem to point to what's wrong.

To the contrary, I expect it pointed directly at the culprit <wink>:

> #0 0x12011e3ec in PyNode_AddChild (n1=0x200026166c8, type=301, str=0x0,
> lineno=1) at ../python/dist/src/Parser/node.c:95

and line 95 is a realloc() call. As a parse node grows large,
PyNode_AddChild doubles the amount of memory it asks for every time it runs
out of room.

[Guido]
> Could it be that the parse tree uses twice the memory on a 64-bit
> machine, and that that is simply too much? Give it a try with smaller
> values of 65580. (:-)

test_longexp consumes about 25MB of data space on my 32-bit box. A parse
node contains two pointers so at least those two fields are twice as big on
a 64-bit box. It may (or may not) help to rearrange the struct decl to
order the fields from widest to narrowest:

Current:

typedef struct _node {
short n_type;
char *n_str;
int n_lineno;
int n_nchildren;
struct _node *n_child;
} node;

Possibly more space-efficient:

typedef struct _node {
char *n_str;
struct _node *n_child;
int n_lineno;
int n_nchildren;
short n_type;
} node;

At least one n_child vector grows very large in test_longexp.

Guido van Rossum

2002-09-17 03:25:11 UTC

> test_longexp consumes about 25MB of data space on my 32-bit box. A parse
> node contains two pointers so at least those two fields are twice as big on
> a 64-bit box. It may (or may not) help to rearrange the struct decl to
> order the fields from widest to narrowest:
>
> Current:
>
> typedef struct _node {
> short n_type;
> char *n_str;
> int n_lineno;
> int n_nchildren;
> struct _node *n_child;
> } node;
>
> Possibly more space-efficient:
>
> typedef struct _node {
> char *n_str;
> struct _node *n_child;
> int n_lineno;
> int n_nchildren;
> short n_type;
> } node;
>
> At least one n_child vector grows very large in test_longexp.

Alas, I think both structs would have the same size. The old one
wastes 6 bytes after n_type. So does the new one. Both would be 32
bytes long.

(I checked this in, not realizing this. I'll back it out.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
