Date: Tue, 28 May 2002 12:37:36 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: "Robert Blayzor" <rblayzor@inoc.net>
Cc: <freebsd-stable@FreeBSD.ORG>
Subject: Re: Swap_pager error
Message-ID: <200205281937.g4SJbalA024380@apollo.backplane.com>
References: <008201c20673$37ac9c60$6f00000a@z0.inoc.net>

This message:

:swap_pager: indefinite wait buffer: #amrd/0x20001, blkno:272, size:4096

occurs when the kernel tries to write a page of memory to swap and
the write is still not complete after 20 seconds.

This type of error typically occurs if the hard drive has gone all
flaky or if hard errors exist in the swap partition.  If so, the
'dmesg' output should show a hard disk I/O warning or error message.

fsck only checks filesystems, and then only for corrupt data (it
doesn't check for bad blocks).  fsck does not check swap.

I recommend that you scan your hard drive partitions for errors
using dd.  'pstat -s' will tell you which device your swap is on.
For example:

    apollo:/usr/src/sys> pstat -s
    Device          1K-blocks     Used    Avail Capacity  Type
    /dev/rda0s1b      1048448      244  1048204     0%    Interleaved

To read every block on a partition, use 'dd' on the partition:

    dd if=/dev/da0s1b of=/dev/null bs=32k

(Expect a long wait.  The drive light should be saturated.  Run
'iostat da0 1' in another window to observe the disk transfer
activity.)

You can do this on any partition, including filesystem partitions,
and the system can be live when you do it (since all you are doing
is reading the raw blocks off the disk).  You can also run 'dd' on
the entire disk (e.g. /dev/da0 rather than /dev/da0s<X><Y>), but then
if you get errors you may not be able to figure out which logical
partition they occurred in.

In any case, if the machine is otherwise idle you should see a fairly
uniform data transfer rate in the iostat output while the dd is
running.  For example, on one of my machines I get:

    iostat da0 1
    ...
     tin tout  KB/t  tps  MB/s  us ni sy in id
       9  765  0.00    0  0.00   0  0  0  0100
       9  461  8.00    1  0.01   2  0  0  0 98
       3   49  0.00    0  0.00   0  0  0  0100
       0   43  0.00    0  0.00   0  0  0  0100
          tty            da0            cpu
     tin tout  KB/t  tps  MB/s  us ni sy in id
       4  169 31.69  388 12.01   0  0  2  0 98  <<< start dd test
       0   42 31.88  999 31.11   0  0  1  0 99
       0   43 32.00 1043 32.58   2  0  1  2 96
       0   44 32.00 1006 31.43   0  0  6  0 94
    ...
       1   75 32.00 1050 32.83   1  0  2  1 96
       0   43 32.00 1051 32.86   0  0  2  1 97
       2   44 32.00 1042 32.55   0  0  3  2 95
       6  223 32.00 1053 32.92   0  0  1  0 99
       0   44 32.00 1051 32.86   1  0  2  1 96
       0   43 31.98 1033 32.25   0  0  2  1 98
       0  174 32.00  906 28.31   0  0  2  1 97  <<< dd finishes
       0   43  0.00    0  0.00   1  0  0  0 99

If the transfer rate suddenly drops in the middle of the dd operation
and then picks up again, the hard drive may have soft errors
internally but still manages to retrieve the block eventually.  If
the kernel ('dmesg' program and '/var/log/messages' log file) reports
disk errors during your dd, then you may have a problem with one or
more drives.

                                        -Matt

: ( from "Robert Blayzor" <rblayzor@inoc.net> )
:We have a Dell PowerEdge 2550 server.  It's running FreeBSD4-stable
:(up'd just a couple of weeks ago).  It's an SMP box, 1GB of RAM, two
:3com Tigon2 Gigabit NIC cards and a PERC3/QC controller.
:
:We have two logical drives.  One is a RAID1 set of two 9GB drives which
:holds the operating system only.  The other is a 300GB RAID10 array.
:
:The box had been running fine for months when suddenly the box got hosed
:as we received tons of these errors on the console.  (nothing logged to
:/var/log/messages)
:
:swap_pager: indefinite wait buffer: #amrd/0x20001, blkno:272, size:4096
:
:The box only runs as an NFS/Samba server and nothing else.  It
:eventually just became useless and we had to reset the box hard.
:
:We ran FSCK and it reported no errors and the box came up normally.  We
:were considering running scanning on the OS disk containing the swap,
:but feel there really is no need to as the RAID controller is reporting
:no problems as well.
:
:Anyone have any suggestions on where to start looking for this problem?
:We've had this unit in service almost six months and this is the first
:time we've seen this.  Is there a way to "test" swap space in production
:other than writing something to gobble up memory and forcing the box to
:swap?

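
As a minimal sketch of the dd scan recommended in the reply above, the
read test could be wrapped in a small shell function.  The name
scan_partition is hypothetical and not part of the original message.
The function only looks at dd's exit status, so it can catch hard read
errors at best; soft errors that the drive recovers from would still
have to be spotted as rate dips in iostat, and 'dmesg' remains the
place to look for the kernel's own I/O error reports.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not from the original message):
# read every block of a partition with dd and report dd's verdict.
scan_partition() {
    dev="$1"
    # bs=32k matches the block size used in the message; the data read
    # is discarded, so this only exercises the read path of the device.
    if dd if="$dev" of=/dev/null bs=32k 2>/dev/null; then
        echo "OK: $dev read without errors"
        return 0
    else
        echo "FAILED: dd reported an error reading $dev" >&2
        return 1
    fi
}
```

Assuming the swap device from the pstat example, it might be called as
'scan_partition /dev/da0s1b' (as root), followed by a look at 'dmesg'
regardless of the result.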
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-stable" in the body of the message