Date:      Thu, 20 Aug 2009 12:04:13 +0930
From:      "Daniel O'Connor" <doconnor@gsoft.com.au>
To:        freebsd-stable@freebsd.org
Subject:   Blocked process
Message-ID:  <200908201204.24914.doconnor@gsoft.com.au>

Hi,
We have several systems doing data acquisition and I had originally
thought we were seeing the interrupt handler for our PCI card not being
called quickly enough, however I misread the diagnostics :)

The digitised data is fed into a FIFO and when it is part full
(32 kbytes) an interrupt is generated. The IRQ routine reads 32 kbyte
chunks into a kernel buffer (4 Mbytes) until part full goes away. If the
FIFO full flag is seen (it is latched by the hardware) then acquisition
is halted.

The problem now appears to be that the userland process that reads data
out of the kernel is being stalled for over 4 seconds. This process
reads from the kernel, does some minor processing and then writes it
out to a child process to do some more work on it.

I ran 'ps -xaulwww' in a loop every second to see what ELSE was using
the CPU when it was stalled and found that my script stalled for 7
seconds.
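
A rough sketch of how such stalls could be timed more directly than by
eyeballing ps output: sleep for a fixed interval, record the monotonic
clock on every wake-up, and report any wake-up that arrives late. The
function names and the 0.5 second threshold are made up for illustration;
this is not the script that was actually run.

```python
import time


def find_stalls(timestamps, interval=1.0, threshold=0.5):
    """Return (index, lateness) for every wake-up that was more than
    `threshold` seconds later than the expected `interval`."""
    stalls = []
    for i in range(1, len(timestamps)):
        lateness = (timestamps[i] - timestamps[i - 1]) - interval
        if lateness > threshold:
            stalls.append((i, lateness))
    return stalls


def sample(count, interval=1.0):
    """Sleep `interval` seconds `count` times and record each wake-up time."""
    stamps = [time.monotonic()]
    for _ in range(count):
        time.sleep(interval)
        stamps.append(time.monotonic())
    return stamps


if __name__ == "__main__":
    # One minute of samples; a 7 second stall shows up as ~6 s of lateness.
    for index, late in find_stalls(sample(60)):
        print(f"wake-up {index} was {late:.2f} s late")
```

Running this at the same priority as the readout process, and again at a
different priority, would show whether the stall depends on priority.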

I tried increasing the buffer inside the kernel (to 8 Mbytes), which
seemed to have no effect, however renice'ing the process from -5 to -20
has greatly reduced the frequency of occurrence. WRT the buffer size - I
would expect that increasing it further would reduce the problem, but
since I have only increased it to ~4 seconds worth of data and the
stalls are longer than that, I see no effect.
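
The buffer arithmetic can be sketched as follows. It assumes the
8 Mbyte buffer really does hold about 4 seconds of data (so roughly
2 Mbytes/s) and that the stall to ride out is the 7 seconds seen with
the ps loop; the 1.5x safety margin and the function names are
arbitrary, illustrative choices.

```python
MIB = 1024 * 1024


def data_rate(buffer_bytes, seconds_of_data):
    """Data rate in bytes/s implied by how long a buffer takes to fill."""
    return buffer_bytes / seconds_of_data


def buffer_for_stall(rate_bytes_per_s, stall_s, margin=1.5):
    """Bytes of buffering needed to absorb a reader stall of `stall_s`."""
    return int(rate_bytes_per_s * stall_s * margin)


if __name__ == "__main__":
    rate = data_rate(8 * MIB, 4.0)        # assumed: 8 Mbytes is ~4 s of data
    needed = buffer_for_stall(rate, 7.0)  # assumed: 7 s worst-case stall
    print(f"rate   ~ {rate / MIB:.1f} Mbytes/s")
    print(f"needed ~ {needed / MIB:.1f} Mbytes")
```

Under those assumptions a 7 second stall needs 14 Mbytes before any
margin is added, which is consistent with 8 Mbytes making no visible
difference.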

Given that renice'ing has an effect it seems to be a scheduler problem.
I don't see how it can be the motherboard stalling the whole system,
otherwise the FIFO full error would occur, however I only see the
4 Mbyte kernel buffer filling up.
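
To illustrate that reasoning, here is a toy model of the two buffering
stages in 1 ms steps. It is only a sketch: the 64 kbyte FIFO depth and
the ~2 Mbytes/s data rate are assumptions (neither figure is stated
above), and the real driver is certainly more involved. If only the
reader stalls, the kernel buffer overflows; if interrupt servicing
stalls as well, the FIFO full flag trips first.

```python
KIB = 1024
MIB = 1024 * KIB


def in_window(t, window):
    """True if millisecond t falls inside the half-open window (start, end)."""
    start, end = window
    return start <= t < end


def simulate(total_ms, bytes_per_ms=2048, fifo_size=64 * KIB,
             chunk=32 * KIB, kernel_size=4 * MIB,
             reader_stall=(0, 0), irq_stall=(0, 0)):
    """Return which buffer overflows first, or "ok" if none does."""
    fifo = kernel = 0
    for t in range(total_ms):
        fifo += bytes_per_ms                  # hardware keeps digitising
        if fifo > fifo_size:
            return "fifo full"
        if not in_window(t, irq_stall):
            while fifo >= chunk:              # IRQ routine drains part-full FIFO
                if kernel + chunk > kernel_size:
                    return "kernel buffer full"
                fifo -= chunk
                kernel += chunk
        if not in_window(t, reader_stall):
            kernel = 0                        # userland reader empties the buffer
    return "ok"


if __name__ == "__main__":
    print(simulate(10_000))                              # no stall
    print(simulate(10_000, reader_stall=(1000, 8000)))   # reader stalled 7 s
    print(simulate(10_000, irq_stall=(1000, 8000)))      # IRQ stalled 7 s
```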

One other possibility would be something holding a lock for too long
that blocks both the DAQ readout process and ps, however I am not sure
how I would find out what.

Unfortunately the system is in Finland and I'm in Australia so I can't
sit at the console :(

I am hoping to be able to replicate the HW & SW locally at some stage
but haven't been able to yet.

Any help appreciated, thanks!

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
