Date: Mon, 29 Oct 2007 09:46:59 +0300
From: "Artem Kuchin" <matrix@itlegion.ru>
To: <freebsd-current@freebsd.org>
Subject: Problems with gjournal or something else.
Message-ID: <00f101c819f7$833d5370$0c00a8c0@Artem>
I am experiencing a very weird filesystem problem, and it seems to be
related to gjournal.

System:

    OS:              FreeBSD 7-BETA1
    RAID controller: 3WARE 7500x
    Device driver:   twe
    SMP:             enabled (Pentium D)
    Array:           mirror RAID

I have created the following partitions:

    twed1s1a  <none>   1100MB  *
    twed1s1b  swap     1024MB  SWAP
    twed1s1d  <none>   5120MB  *
    twed1s1e  <none>  30720MB  *
    twed1s1f  <none>    261GB  *

I rebooted, just in case something was cached. Then I did:

    newfs -J -b 8192 -f 1024 -g 50000 -h 20 -i 40960 /dev/twed1s1f
    gjournal load
    gjournal label -f /dev/twed1s1f
    tunefs -J enable -n disable /dev/twed1s1f
    mount -o noatime /dev/twed1s1f.journal /NEW/suit

    osiris# tunefs -p /dev/twed1s1f
    tunefs: ACLs: (-a) disabled
    tunefs: MAC multilabel: (-l) disabled
    tunefs: soft updates: (-n) disabled
    tunefs: gjournal: (-J) enabled
    tunefs: maximum blocks per file in a cylinder group: (-e) 1024
    tunefs: average file size: (-f) 50000
    tunefs: average number of files in a directory: (-s) 20
    tunefs: minimum percentage of free space: (-m) 8%
    tunefs: optimization preference: (-o) time
    tunefs: volume label: (-L)

    # newfs command for /dev/twed1s1f (/dev/twed1s1f)
    newfs -O 2 -a 16 -b 8192 -d 8192 -e 1024 -f 1024 -g 50000 -h 20 -m 8 -o time -s 273771329 /dev/twed1s1f

Then I started a huge, long-running copy from the old RAID 5 array
(about 200GB of data).
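The -s value in the newfs line printed above can be turned into a byte count and compared with the 261GB partition. This is only a rough sketch: the unit of -s in this output is an assumption (fragments of -f bytes versus 512-byte sectors), so both readings are computed side by side, and the constant names are made up for illustration.

```python
# Values copied from the newfs line printed by tunefs -p above.
FRAGMENT_SIZE = 1024      # -f
SIZE_ARG = 273771329      # -s
SECTOR_SIZE = 512         # conventional disk sector size (assumption)
GIB = 1024 ** 3


def to_bytes(count: int, unit: int) -> int:
    """Convert a block count into bytes for a given unit size."""
    return count * unit


# Reading 1 (assumption): -s counts fragments of FRAGMENT_SIZE bytes.
as_fragments = to_bytes(SIZE_ARG, FRAGMENT_SIZE)
# Reading 2 (assumption): -s counts 512-byte sectors.
as_sectors = to_bytes(SIZE_ARG, SECTOR_SIZE)

print(f"-s as fragments: {as_fragments} bytes, {as_fragments / GIB:.2f} GiB")
print(f"-s as sectors:   {as_sectors} bytes, {as_sectors / GIB:.2f} GiB")
```

Only the fragment reading comes out near 261 GiB, which agrees with the size shown for twed1s1f, so that reading is probably the right one for this output.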
Some time later I found the machine practically frozen because the log
file was filling up with errors:

    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275085824, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279278362624, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279272857600, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279278493696, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275216896, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279278624768, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279272988672, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275347968, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279278755840, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279273119744, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279278886912, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275479040, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279279017984, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279273250816, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279279149056, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275610112, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279279280128, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279273381888, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279279411200, length=131072)]error = 5
    Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275741184, length=131072)]error = 5

Since error 5 is EIO, I ran a verify on the controller: everything is OK.

I then ran

    cat /dev/random > /NEW/suit/aaa.dat

filling the whole fs with one huge file: OK.

I also ran

    dd if=/dev/twed1s1f of=/dev/null bs=1M

which was OK as well.

Then I re-newfs-ed this fs without -J, unloaded gjournal and did the
same copy. It took several hours and went just fine.

So it is not a hardware problem, and it seems to be related to gjournal.

One more weird thing happened here: gjournal complained that BIO_FLUSH
is not supported by the driver. However, AFAIK twe works via the SCSI
subsystem, and the author of gjournal said somewhere that he had
implemented BIO_FLUSH for SCSI. He specifically mentioned that he had
tested twe and twa and that both support BIO_FLUSH.

Also, I think the offset value in the error messages is out of range
for this filesystem.

The controller has 64MB of cache on board, and the author of gjournal
said in some discussion that if BIO_FLUSH support is missing and the
controller cache is larger than gjournal's cache, there might be
problems. I did not find any specific value for the gjournal cache.
So the problem may be related to this issue (something gets messed up),
but I am not sure.

Any ideas, anyone?

--
Regards,
Artem
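The suspicion that the offsets are out of range can be checked with a little arithmetic on the log lines. What follows is a minimal sketch, not a diagnosis. It assumes that -s 273771329 in the newfs line counts 1024-byte fragments, and that `gjournal label` on a single provider sets aside a 1 GiB journal (believed to be the default, but not confirmed for this system). `PROVIDER_END` is therefore a hypothetical estimate that ignores any metadata sector, and the helper names are made up for illustration.

```python
import re

# Four of the logged errors, copied verbatim (first, lowest offset,
# highest offset, last).
LOG = """\
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275085824, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279272857600, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279279411200, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275741184, length=131072)]error = 5
"""

FS_BYTES = 273771329 * 1024              # assumption: -s is in 1024-byte fragments
JOURNAL_BYTES = 1024 ** 3                # assumption: 1 GiB journal
PROVIDER_END = FS_BYTES - JOURNAL_BYTES  # hypothetical size of twed1s1f.journal

WRITE_RE = re.compile(r"\[WRITE\(offset=(\d+), length=(\d+)\)\]error = (\d+)")


def parse_writes(log: str) -> list[tuple[int, int, int]]:
    """Return (offset, length, errno) for every failed WRITE in the log."""
    return [(int(o), int(n), int(e)) for o, n, e in WRITE_RE.findall(log)]


def classify(offset: int, length: int) -> str:
    """Say where the end of a write falls relative to the two size estimates."""
    end = offset + length
    if end > FS_BYTES:
        return "past end of filesystem"
    if end > PROVIDER_END:
        return "inside filesystem, past estimated journaled provider"
    return "inside both"


for offset, length, errno in parse_writes(LOG):
    print(offset, length, errno, classify(offset, length))
```

Under these assumptions, every sampled write ends inside the filesystem as newfs sized it but past the estimated end of the .journal provider. Comparing the real provider size (for example from `diskinfo` on /dev/twed1s1f.journal) against these offsets would confirm or rule this out.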