Date: Sat, 12 Jun 2010 16:26:29 +0200
From: Rolf Grossmann <rg@xamine.com>
To: Daniel Braniss <danny@cs.huji.ac.il>
Cc: freebsd-scsi@freebsd.org
Subject: Re: ZFS reports problem on iscsi target
Message-ID: <4C139915.8040308@xamine.com>
In-Reply-To: <E1ONLjb-00086F-C8@kabab.cs.huji.ac.il>
References: <4C12538C.9000400@xamine.com> <E1ONLjb-00086F-C8@kabab.cs.huji.ac.il>

On 12.06.2010 10:06, Daniel Braniss wrote:
>> Hi,
>>
>> I'm having some trouble with iscsi on FreeBSD 8. My current setup is a
>> stock FreeBSD 8.1-PRERELEASE (as of 2 days ago), GENERIC kernel with
>> some modules loaded, running on a Dell PowerEdge R905 with 64GB RAM, 4
>> quad-core CPUs. Attached is an EqualLogic PS6500 storage array with some
>> configured volumes, one of which is for testing. It is configured in
>> /etc/iscsi.conf like this:
>>
>> test2 {
>>
>> TargetName=iqn.2001-05.com.equallogic:0-8a0906-7a4bb9f06-038000000304c0d1-test2
>> TargetAddress=10.26.17.10:3260,1
>> tags = 256
>> }
>>
>> Now I'm running the following sequence of commands (shown with output):
>>
>> # iscontrol -n test2
>> iscontrol[56255]: running
>> iscontrol[56255]: (pass2:iscsi0:0:0:0): tagged openings now 256
>> iscontrol[56255]: cam_open_btl: no passthrough device found at 2:0:1
>> iscontrol[56255]: cam_open_btl: no passthrough device found at 2:0:2
>> iscontrol[56255]: cam_open_btl: no passthrough device found at 2:0:3
>> iscontrol: supervise starting main loop
>> # zpool create test2 da2
>> # zpool scrub test2
>> # zpool status test2
>>   pool: test2
>>  state: ONLINE
>>  scrub: scrub completed after 0h0m with 0 errors on Fri Jun 11 16:56:33 2010
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         test2       ONLINE       0     0     0
>>           da2       ONLINE       0     0     0
>>
>> errors: No known data errors
>> # cp -Rp /export/system /test2/
>> # zpool scrub test2
>> # zpool status test2
>>   pool: test2
>>  state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>         corruption. Applications may be affected.
>> action: Restore the file in question if possible. Otherwise restore the
>>         entire pool from backup.
>>    see: http://www.sun.com/msg/ZFS-8000-8A
>>  scrub: scrub completed after 0h0m with 19 errors on Fri Jun 11 17:00:38 2010
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         test2       ONLINE       0     0    19
>>           da2       ONLINE       0     0    38
>>
>> errors: 19 data errors, use '-v' for a list
>> #
>>
>> /export/system is a FreeBSD distribution (make install
>> DESTDIR=/export/system). Note how zfs thinks there are 19 files broken
>> after the copy. If I repeat the process, the files vary, but there are
>> always some reported as broken. In this case, they don't seem to be (as
>> checked with md5 and rsync --checksum), but I've had files only giving
>> me an i/o error. Also, if I repeat the same steps on a local disk, zfs
>> is reporting no errors.
>>
>> What I would like to know is:
>> - Is there anything I'm doing wrong? Is there a known problem?
>> - Are there any tools to debug or more reliably reproduce (and narrow
>> down) the problem? I've tried fsx (from /usr/src/tools/regression), but
>> I couldn't find any usage suggestions (other than the usage when run
>> without options) and it doesn't complain when run.
>> - On a different system I've tried using a newer iscsi version from
>> http://www.cs.huji.ac.il/~danny/ftp/freebsd/ but it didn't make any
>> difference. Is that still preferable?
>>
>> Some help would be appreciated.
>
> Hi Rolf,
> I just ran a bunch of tests, like yours, without any problem.
> my setup:
> the target is a NetApp, the host running the initiator is an
> AMD Phenom(tm) II X6 1090T Processor, running a very recent 8.1-PRERELEASE
> with 4GB of RAM so that "vfs.zfs.prefetch_disable" is true, so maybe
> you can try disabling it?
> apart from that, maybe you can check EqualLogic's logs.
> HTH,
> danny
> PS: you should use the latest iscsi-2.2.4.tar.gz
>

Hi Danny,

thanks for your reply. I've just tried again with
vfs.zfs.prefetch_disable=1, but it makes no difference.
I also don't expect zfs to be my problem, so I've just had the idea to
try ufs with the following result (still on stock 8.1-PRERELEASE):

# newfs /dev/da2
/dev/da2: 20490.0MB (41963520 sectors) block size 16384, fragment size 2048
        using 112 cylinder groups of 183.72MB, 11758 blks, 23552 inodes.
super-block backups (for fsck -b #) at:
 160, 376416, 752672, 1128928, 1505184, 1881440, 2257696, 2633952,
[...]
 41388320, 41764576
# fsck -t ufs /dev/da2
** /dev/da2
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
PARTIALLY ALLOCATED INODE I=94272
CLEAR? [yn]

*ouch*

What is interesting is that this time it seems to be very repeatable. I
can even have fsck fix the problem (and subsequent fsck runs are fine),
but after a newfs, fsck complains about this inode.

There is nothing in the EqualLogic's logs except for connect and
disconnect entries. Also, I'm using a different volume on the same
EqualLogic from a different machine running Ubuntu Linux with open-iscsi
and fuse-zfs with no problems (except less performance ;P), so I don't
suspect a hardware problem.

I guess I'll spend some time looking at a tcpdump of the newfs/fsck
test, but it will be a while until I understand all the protocols
involved. Any other suggestions would be very welcome.

Thanks, Rolf.
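
One way to narrow down a problem like the one described above, independent of any filesystem, might be a plain write/read-back test: write blocks with known content, read them back, and note which offsets differ. The following is only a minimal sketch in Python. The function names and the block size are assumptions made for illustration and are not part of any FreeBSD or iscsi tool. Note also that a read-back shortly after the write may be served from a cache rather than from the target, so a clean result does not prove the data reached the array intact.

```python
import hashlib
import os


def make_block(index: int, size: int) -> bytes:
    """Build a deterministic block whose content depends on its index."""
    seed = hashlib.sha256(index.to_bytes(8, "big")).digest()  # 32 bytes
    reps = size // len(seed) + 1
    return (seed * reps)[:size]


def write_pattern(path: str, blocks: int, size: int = 65536) -> None:
    """Write `blocks` deterministic blocks to `path` and flush them out."""
    # Open an existing file or device without truncating it; create otherwise.
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        for i in range(blocks):
            f.write(make_block(i, size))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path: str, blocks: int, size: int = 65536) -> list:
    """Re-read `path` and return the byte offsets of blocks that differ."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(size) != make_block(i, size):
                bad.append(i * size)
    return bad
```

A call such as write_pattern("/tmp/pattern.bin", 1024) followed by verify_pattern("/tmp/pattern.bin", 1024) exercises it on an ordinary file (the path is hypothetical). Pointing it at a disk device overwrites whatever is on that device, so it would only make sense on a scratch volume.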