Perl 5.8.[5-7] Regular Expression Bug
Written by Geoff Mottram (geoff at minaret dot biz).
Placed in the public domain on September 14, 2004 by the author.
Last updated: September 17, 2005.
This document is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with this document or the use or other dealings in this document.
Contents
Introduction
Summary
Details
Introduction
This document describes a bug when using the "-d" (debug) option of Perl versions 5.8.5 through 5.8.7 where a regular expression will either cause the program to crash or to emit one of the following error messages:Out of memory during "large" request for...Out of memory during ridiculously large requestSummary
If you are receiving a segmentation fault or an out of memory condition when executing a Perl regular expression while debugging a Perl script with the Perl debug ("-d") option, you should either apply the following fix or revert back to version 5.8.4 of Perl. The problem is caused when "()" operators are used in a regular expression. In my case, the regular expression was the following:if ($a =~ m/^([A-Za-z]+)(.*)/)The second time this expression is executed, the previous values of $1 and $2 are replaced with new values. When the replacement occurs, the old values are freed. If the old values are objects (which seems to happen with UTF-8 values), their DESTROY method is called. When running under the Perl debugger (-d option), the "sub" method of the "DB.pm" module is called which includes a regular expression. This regular expression causes the state of original regular expression to get clobbered.
The fix is applied to Perl source file "sv.c". It requires Perl to save its regular expression context on its stack prior to the call to the DESTROY method of the regular expression object being freed. A patch has been provided at the end of this document. It is a simple fix that can be applied by hand. Insert the following line of code to function Perl_sv_clear in file "sv.c" at line 5105 (in Perl 5.8.7, your line number may vary for previous versions):
save_re_context();The following snippet shows the addition:
if (destructor) { SV* tmpref = newRV(sv); SvREADONLY_on(tmpref); /* DESTROY() could be naughty */ ENTER; save_re_context(); PUSHSTACKi(PERLSI_DESTROY); EXTEND(SP, 2); PUSHMARK(SP); PUSHs(tmpref); PUTBACK; call_sv((SV*)destructor, G_DISCARD|G_EVAL|G_KEEPERR|G_VOID); POPSTACK; SPAGAIN; LEAVE; if(SvREFCNT(tmpref) < 2) { /* tmpref is not kept alive! */ SvREFCNT(sv)--; SvRV(tmpref) = 0; SvROK_off(tmpref); } SvREFCNT_dec(tmpref); }The call to "save_re_context()" is placed between the "ENTER;" and "PUSHSTACKi(PERLSI_DESTROY);" lines.
Once you have added this line, recompile Perl and you are all set. If you are using mod_perl, you will need to re-make that program after you have installed the fixed version of Perl.
Details
The following information was submitted to perlbug@perl.org and was assigned an ID of [perl #37102]:-------- Original Message -------- Subject: Perl regexp state gets clobbered (includes a fix) Date: Wed, 07 Sep 2005 20:54:29 -0400 From: Geoff MottramTo: perlbug at perl.org The following problem has been observed under versions of Perl after 5.8.4 on Linux and Windows including the most recent stable version (5.8.7). The script that experienced the problem is too complex to include and requires a connection to a database server. A patch with a fix is provided at the end of this email. The following regex was causing either an Out of Memory condition or a segmentation fault after two iterations: if ($a =~ m/^([A-Za-z]+)(.*)/) It was caused by the PL_regeol variable of regexec.c getting overwritten when in the "av_store()" call on line 4370 of regexec.c (see gdb backtrace below): if (a) sw = *a; else if (si && doinit) { sw = swash_init("utf8", "", si, 1, 0); (void)av_store(av, 1, sw); } When av_store is called, the previous positional parameter is de-referenced and its DESTROY method is called. I could not figure out what object was being destroyed but the gdb backtrace will probably ring a few bells for you. The script was working with UTF-8 content and wide characters, which I suspect was related to the problem. In any case, the DESTROY method that was called included a regular expression that would trash the state of the regular expression engine a few methods up the stack. My solution is given in a patch at the very end of this email. Please excuse the comment with my name but it will allow me to find my fix in my copy of Perl should I need to in the future. I added the following call to sv_clear() in sv.c: save_re_context(); just prior to the call to the object's DESTROY method. This allows regular expressions to be used in the DESTROY method of whatever object is being used as a positional variable in a regular expression (a Unicode string?). I agree with the comments above the save_re_context() function that this is a really kludgey way for the regular expression engine to be written. The engine should store its state in a stack or allocated structure that is passed to its subroutines and not use global variables. I am afraid that there may be other sections in Perl that will also require calls to save_re_context() that have not been discovered yet. Thanks for getting my fix in to the Perl distribution. Sincerely, Geoff Mottram geoff@gate.net --------------Output of perl -V ------------ Summary of my perl5 (revision 5 version 8 subversion 7) configuration: Platform: osname=linux, osvers=2.4.21-27.0.2.elsmp, archname=i686-linux uname='linux classificationweb.net 2.4.21-27.0.2.elsmp #1 smp wed jan 12 23:35:44 est 2005 i686 i686 i386 gnulinux ' config_args='-de' hint=previous, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING', optimize='-g', cppflags='-fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING' ccversion='', gccversion='3.2.3 20030502 (Red Hat Linux 3.2.3-49)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.3.2' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib' Characteristics of this binary (from libperl): Compile-time options: DEBUGGING USE_LARGE_FILES Built under linux Compiled at Sep 7 2005 06:10:19 @INC: /usr/local/lib/perl5/5.8.7/i686-linux /usr/local/lib/perl5/5.8.7 /usr/local/lib/perl5/site_perl/5.8.7/i686-linux /usr/local/lib/perl5/site_perl/5.8.7 /usr/local/lib/perl5/site_perl . -------------------------------------------- ------------Output of perbug -d ------------- --- Flags: category= severity= --- Site configuration information for perl v5.8.7: Configured by minaret at Wed Sep 7 06:08:36 EDT 2005. Summary of my perl5 (revision 5 version 8 subversion 7) configuration: Platform: osname=linux, osvers=2.4.21-27.0.2.elsmp, archname=i686-linux uname='linux classificationweb.net 2.4.21-27.0.2.elsmp #1 smp wed jan 12 23:35:44 est 2005 i686 i686 i386 gnulinux ' config_args='-de' hint=previous, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING', optimize='-g', cppflags='-fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING' ccversion='', gccversion='3.2.3 20030502 (Red Hat Linux 3.2.3-49)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.3.2' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib' Locally applied patches: --- @INC for perl v5.8.7: /usr/local/lib/perl5/5.8.7/i686-linux /usr/local/lib/perl5/5.8.7 /usr/local/lib/perl5/site_perl/5.8.7/i686-linux /usr/local/lib/perl5/site_perl/5.8.7 /usr/local/lib/perl5/site_perl . --- Environment for perl v5.8.7: HOME=/minaret LANG=en_US.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin/bin:/usr/local/java/jdk1.5.0_04/bin:/usr/local/java/jdk1.5.0_04/jre/bin:/minaret/m/bin:/minaret/m/util:/minaret/bin PERL_BADLANG (unset) SHELL=/bin/bash ---------------------------------------------- ----- gdb backtrace showing how PL_regeol gets clobbered by Perl_re_intuit_start------------------------- #0 Perl_re_intuit_start (prog=0x90e5070, sv=0x8fda2a0, strpos=0x9727d28 "utf8::DESTROY", strend=0x9727d35 "", flags=2, data=0x0) at regexec.c:454 #1 0x080ca09f in Perl_pp_match () at pp_hot.c:1328 #2 0x080b288d in Perl_runops_debug () at dump.c:1452 #3 0x08063509 in S_call_body (myop=0xbfffa050, is_eval=0) at perl.c:2364 #4 0x0806318f in Perl_call_sv (sv=0x95d17b0, flags=150) at perl.c:2282 #5 0x080d9de2 in Perl_sv_clear (sv=0x972d270) at sv.c:5110 #6 0x080da66c in Perl_sv_free (sv=0x972d270) at sv.c:5356 #7 0x080da24c in Perl_sv_clear (sv=0x978cf10) at sv.c:5208 #8 0x080da66c in Perl_sv_free (sv=0x978cf10) at sv.c:5356 #9 0x080c3800 in Perl_av_store (av=0x95509a0, key=1, val=0x9702280) at av.c:335 #10 0x0812e348 in Perl_regclass_swash (node=0x9552c84, doinit=1 '\001', listsvp=0x0, altsvp=0xbfffa27c) at regexec.c:4370 #11 0x0812e551 in S_reginclass (n=0x9552c84, p=0x9760c01 "3279.K86", lenp=0x0, do_utf8=1 '\001') at regexec.c:4420 #12 0x0812d438 in S_regrepeat (p=0x9552c84, max=2147483647) at regexec.c:4112 #13 0x0812c4ae in S_regmatch (prog=0x9552c74) at regexec.c:3774 #14 0x08126f7c in S_regtry (prog=0x9552c30, startpos=0x9760c00 "B3279.K86") at regexec.c:2204 #15 0x08125a61 in Perl_regexec_flags (prog=0x9552c30, stringarg=0x9760c00 "B3279.K86", strend=0x9760c09 "", strbeg=0x9760c00 "B3279.K86", minend=0, sv=0x95507fc, data=0x0, flags=3) at regexec.c:1753 #16 0x080ca120 in Perl_pp_match () at pp_hot.c:1340 #17 0x080b288d in Perl_runops_debug () at dump.c:1452 #18 0x08062878 in S_run_body (oldscope=1) at perl.c:1995 #19 0x08062431 in perl_run (my_perl=0x8fda008) at perl.c:1919 #20 0x0805e81b in main (argc=20, argv=0xbfffa914, env=0xbfffa968) at perlmain.c:98 (gdb) p *prog $111 = {startp = 0x90dfb48, endp = 0x90e4e88, regstclass = 0x0, substrs = 0x90e1c88, precomp = 0x90e5018 "^threads::new$", data = 0x0, subbeg = 0x0, offsets = 0x90e5030, sublen = 0, refcnt = 1, minlen = 12, prelen = 14, nparens = 0, lastparen = 0, lastcloseparen = 0, reganch = 137363457, program = {{flags = 156 '\234', type = 0 '\0', next_off = 0}}} ---------------- Patch for sv.c to fix problem --------------- --- sv.c.orig 2005-05-27 06:38:11.000000000 -0400 +++ sv.c 2005-09-07 19:31:19.000000000 -0400 @@ -5102,6 +5102,8 @@ SV* tmpref = newRV(sv); SvREADONLY_on(tmpref); /* DESTROY() could be naughty */ ENTER; + /* Next line added by Geoff Mottram 9/7/05. */ + save_re_context(); PUSHSTACKi(PERLSI_DESTROY); EXTEND(SP, 2); PUSHMARK(SP); ---------------------------------------------------------------