Date: Thu, 20 Aug 2009 12:04:13 +0930
From: "Daniel O'Connor" <doconnor@gsoft.com.au>
To: freebsd-stable@freebsd.org
Subject: Blocked process
Message-ID: <200908201204.24914.doconnor@gsoft.com.au>

Hi,

We have several systems doing data acquisition and I had originally thought we were seeing the interrupt handler for our PCI card not being called quickly enough, however I misread the diagnostics :)

The digitised data is fed into a FIFO and when it is part full (32 kbytes) an interrupt is generated. The IRQ routine reads 32 kbyte chunks into a kernel buffer (4 Mbyte) until the part-full condition goes away. If the FIFO full flag is seen (it is latched by the hardware) then acquisition is halted.

The problem now appears to be that the userland process that reads data out of the kernel is being stalled for over 4 seconds. This process reads from the kernel, does some minor processing and then writes the data out to a child process which does some more work on it.

I ran 'ps -xaulwww' in a loop every second to see what ELSE was using the CPU when it was stalled and found that my script itself stalled for 7 seconds.

I tried increasing the buffer inside the kernel (to 8 Mbyte) which seemed to have no effect, however renice'ing the process from -5 to -20 has greatly reduced the frequency of occurrence. WRT the buffer size - I would expect that increasing it further would reduce the problem, but since I have only increased it to ~4 seconds worth and the stall is longer than that, I see no effect.

Given that renice'ing has an effect it seems to be a scheduler problem. I don't see how it can be something to do with the motherboard stalling the whole system, otherwise the FIFO full error would occur, however I only see the 4 Mbyte kernel buffer filling up.

One other possibility would be something holding a lock for too long that blocks both the DAQ readout process and ps, however I am not sure how I would find out what.
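
As an illustration of the once-per-second loop described above, here is a minimal Python sketch of how such stalls could be detected from timestamps. It is not the script that was actually used: the names `find_stalls` and `monitor` and the 2 second threshold are assumptions made for this example, and the call to `ps -xaulwww` is left out so that only the timing logic is shown.

```python
import time


def find_stalls(timestamps, threshold=2.0):
    """Return (start, gap) pairs for consecutive samples more than
    `threshold` seconds apart. `threshold` is an illustrative value."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold:
            stalls.append((prev, gap))
    return stalls


def monitor(iterations, interval=1.0, threshold=2.0,
            clock=time.monotonic, sleep=time.sleep):
    """Take one timestamp per loop iteration, sleeping `interval` seconds
    in between, then report the gaps that exceeded `threshold`.

    `clock` and `sleep` are parameters only so the loop can be exercised
    without real waiting; the defaults use the monotonic clock, which is
    not affected by wall-clock adjustments.
    """
    stamps = []
    for _ in range(iterations):
        stamps.append(clock())
        sleep(interval)
    return find_stalls(stamps, threshold)
```

Calling `monitor(60)` would sample for roughly a minute; a 7 second stall of the sampling process would show up as a single entry with a gap of about 7.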

Unfortunately the system is in Finland and I'm in Australia so I can't sit at the console :(

I am hoping to be able to replicate the HW & SW locally at some stage but haven't been able to yet.

Any help appreciated, thanks!

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C