Date: Fri, 08 Feb 2008 12:41:44 +0000
From: Alex Zbyslaw <xfb52@dial.pipex.com>
To: lachlan@lkla.org
Cc: freebsd-questions@freebsd.org, mark@msapiro.net
Subject: Re: Memory Error using Mailman on FreeBSD. How to debug?
Message-ID: <47AC4E08.1060801@dial.pipex.com>
In-Reply-To: <26921.137.153.0.25.1202463164.squirrel@sm.lkla.org>
References: <1153.137.153.0.37.1202210274.squirrel@sm.lkla.org>
 <69739C80-0639-4808-B5EB-0D9553826559@dpcsys.com>
 <30396.137.153.0.36.1202264253.squirrel@sm.lkla.org>
 <47A99B4E.1080707@dial.pipex.com>
 <28742.137.153.0.25.1202301936.squirrel@sm.lkla.org>
 <47A9BCB0.8020309@dial.pipex.com>
 <26921.137.153.0.25.1202463164.squirrel@sm.lkla.org>
Lachlan Michael wrote:

>>Real puzzler. I'm surprised not to have at least one process growing,
>>though. Maybe it's not using much CPU and you're not spotting it.
>>
>Following your advice, as far as I can tell, the mailman qrunner process
>
> /usr/local/bin/python2.5 /usr/local/mailman/bin/qrunner
>--runner=IncomingRunner:0:1 -s
>
>is the one that crashes: all other mailman processes are unaffected. I
>couldn't see it increase much in size (maybe it went from 8.5M to 12.5M),
>then it just bombed and a new process was spawned (easy to tell by the
>large increase in PID).

All I can think is that qrunner asks for such a large amount of memory in
one go that it bombs out without ever growing. That fits with the ktrace
output as well. Regrettably, I don't think you can tell *how* much memory
was asked for. (The normal pattern with out-of-memory errors is for the
process to grow and grow and grow and die, but it's not the only one.)

>>Other things to try: Up the stack size
>>    ulimit -s 262144
>>
>>inside the mailman startup. Again, I've had processes in the past which
>>needed this.
>>
>Ok, I am going to gradually try different limits. It seems as though setting
>kern.maxssiz="256M"
>and so on in /boot/loader.conf will allow me to increase the limits.
>Having to reboot is a pain, though. How far can I go? 512M? (Physical
>memory is 1GB)

Certainly not more than physical memory :-)

To be honest, if 256M doesn't do it then this probably isn't the problem.
I'm not particularly hopeful that this will do it, but in your circumstance
I would try it. At the same time, you could also increase the data size
(maxdsiz?) to 1GB (yours looks like 0.5GB, half your physical memory).

My limit settings (also 1GB) look like:

    datasize     1048576 kbytes
    stacksize    262144 kbytes

which come from trying to set 256MB and 1024MB in the kernel config (old
FreeBSD - no sysctls).

Keep the ulimit -a in the mailman startup script so you can confirm that
you really get these numbers.
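If editing the startup script is awkward, the same check can be made from
inside Python itself. What follows is only a rough sketch using the standard
`resource` module, not anything Mailman provides: the function names
`describe_limit` and `limit_report` are made up for illustration, and where
you would call them from (say, near the top of the qrunner script) is an
assumption on my part.

```python
import resource


def describe_limit(name, which):
    """Return one line describing the soft and hard value of a limit."""
    soft, hard = resource.getrlimit(which)

    def fmt(value):
        # getrlimit reports "no limit" as RLIM_INFINITY rather than a size.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return "%d kbytes" % (value // 1024)

    return "%-10s soft=%s hard=%s" % (name, fmt(soft), fmt(hard))


def limit_report():
    """Report the two limits discussed here: data size and stack size."""
    return [
        describe_limit("datasize", resource.RLIMIT_DATA),
        describe_limit("stacksize", resource.RLIMIT_STACK),
    ]


if __name__ == "__main__":
    for line in limit_report():
        print(line)
```

The numbers should agree with what ulimit -a prints in the same
environment. If they don't, the process is probably being started with
different limits than you think.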
>>Can you email a file of the size you are
>>trying not through mailman? Maybe your MTA (sendmail/postfix etc) has a
>>limit that somehow causes mailman to get this error.
>>
>
>This is definitely not the case. Users can receive (and send) similar
>sized large attachments individually, so the MTA (sendmail in this case)
>is not the cause.

OK - rule that out. The ktrace showing qrunner failing a break pretty much
does that too.

>>The final suggestion is to try to trace (ktrace, strace from ports) the
>>process that is dying,
>>
>I'll admit it is my first time to try a ktrace, but after noting which
>process it was that crashed I could identify the newly spawned PID, and
>obtained a ktrace.out (binary) and a kdump (called
>mailman_process_log.txt) when the problem occurs by sending another large
>mail attachment. I'll leave the files up for a couple of days. (Both
>files are about 2MB in size)
>
>http://lachlan.lkla.org/tmp/mailman_memory_error/
>
>Not that I can properly interpret the results, but it seems the mail file
>is completely read, but whatever happens next causes the memory error.
>
> 52506 python2.5 RET read 354/0x162
> 52506 python2.5 CALL break(0x8add000)
> 52506 python2.5 RET break 0
> 52506 python2.5 CALL break(0x8cc3000)
> 52506 python2.5 RET break -1 errno 12 Cannot allocate memory

The kdump output is the only useful bit, really. Your analysis seems
correct to me.

You are also getting a stack trace from python when it exits with the "out
of memory" error. ktrace is just showing python printing the stuff. It may
be that the error also ends up in a log file somewhere, but I don't know
where mailman logs, sorry. From that stack trace it should be possible to
figure out which line of the python is actually causing that memory
request. My bet is on one of the cPickle lines, but it would be nice to see
the stack trace "raw", so to speak. Maybe that stack trace would help
someone on the mailman list suggest something else.
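With a 2MB kdump it is easy to miss failed calls by eye. Here is a small
sketch that scans kdump text for break calls that did not return 0. It
assumes the lines look like the ones quoted above (PID, command, CALL or
RET, then the syscall); the name `failed_breaks` and the regular
expressions are mine, not part of any tool. The delta it reports is only
how far that one call tried to move the break past the last successful one,
which is not necessarily the total amount python wanted.

```python
import re

# Patterns modelled on the kdump lines quoted in this thread (an assumption
# about the format, not a full kdump parser).
CALL_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+CALL\s+break\((0x[0-9a-fA-F]+)\)")
RET_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+RET\s+break\s+(-?\d+)")


def failed_breaks(lines):
    """Return (pid, requested_address, delta_from_last_success) tuples.

    delta_from_last_success is None when no earlier successful break
    was seen for that PID.
    """
    pending = {}   # pid -> address asked for by the latest break CALL
    last_ok = {}   # pid -> address of the latest break that returned 0
    failures = []
    for line in lines:
        call = CALL_RE.match(line)
        if call:
            pending[call.group(1)] = int(call.group(3), 16)
            continue
        ret = RET_RE.match(line)
        if ret and ret.group(1) in pending:
            pid = ret.group(1)
            addr = pending.pop(pid)
            if ret.group(3) == "0":
                last_ok[pid] = addr
            else:
                prev = last_ok.get(pid)
                delta = None if prev is None else addr - prev
                failures.append((int(pid), addr, delta))
    return failures


if __name__ == "__main__":
    import sys
    # Usage: python this_script.py mailman_process_log.txt
    with open(sys.argv[1]) as fh:
        for pid, addr, delta in failed_breaks(fh):
            print("pid %d: break(0x%x) failed, delta %s bytes" % (pid, addr, delta))
```

Run against the five lines quoted above, it flags the second break call for
PID 52506.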
Did you already try sending a different kind of attachment that's the same
kind of size (a bit bigger would be better)? Maybe it's something about the
attachment itself that's causing the issue.

As a final resort, if none of the above resolves or leads to clues, I would
try uninstalling python2.5 and installing python2.4 *just in case*. I'm
assuming that you only have python for mailman. (If you have real python
users then it's trickier. You can install multiple versions of python but
possibly not from ports. But python always compiled cleanly from tarball on
FreeBSD for me. I can offer some help with that process if you really need
it.)

I can't help thinking that 500KB is a very small attachment and I can't
really see why it would legitimately cause a request for so much memory
that your settings aren't handling it.

A quick look at the mailman web site shows that you can run qrunner from
the command line - couldn't immediately find the man page though. If you
could somehow queue up the email with Mailman switched off, you could run
qrunner by hand and then you'd definitely get the python backtrace. Maybe
the mailman list, or a mailman admin here, can help with that, if you need
it.

Running out of ideas.

--Alex