Date: Mon, 12 Mar 2012 20:33:04 +0200
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: Phil Regnauld <regnauld@x0.dk>
Cc: freebsd-stable@freebsd.org
Subject: Re: Issue with hast replication
Message-ID: <86k42pu0tb.fsf@kopusha.home.net>
In-Reply-To: <20120312143127.GM12975@macbook.bluepipe.net> (Phil Regnauld's message of "Mon, 12 Mar 2012 15:31:27 +0100")
References: <20120311185457.GB1684@macbook.bluepipe.net> <861uoyvpzh.fsf@kopusha.home.net> <20120311220911.GD1684@macbook.bluepipe.net> <20120312143127.GM12975@macbook.bluepipe.net>

On Mon, 12 Mar 2012 15:31:27 +0100 Phil Regnauld wrote:

 PR> Phil Regnauld (regnauld) writes:
 >>
 >> 7) ktrace on the destination dd:
 >>
 >> fstat(0,{ mode=p--------- ,inode=5,size=16384,blksize=4096 }) = 0 (0x0)
 >> lseek(0,0x0,SEEK_CUR) ERR#29 'Illegal seek'
 PR> [...]
 >> Illegal seek, eh ? Any clues ?
 >>
 >> The boxes are identical (HP DL380 G6), though the RAM config is different.
 >>
 >> Summary:
 >>
 >> - ssh works fine
 >> - h1 zvol to h2 zvol over ssh fails
 >> - h1 zvol to h2 /tmp/x over ssh is fine
 >> - h2 /dev/zero locally to h2 zvol is fine
 >> - h2 /tmp/x locally to h2 zvol fails at first, but works afterwards...

 PR> A few more data points: dd from a local zvol to a local zvol on either
 PR> machine works fine.

 PR> Using nc instead of ssh, this time it's the sender nc dying:

 PR> ktrace on the sender:

 PR> 47704 nc CALL write(0x3,0x7fffffff5450,0x800)
 PR> 47704 nc RET write -1 errno 32 Broken pipe
 PR> 47704 nc PSIG SIGPIPE SIG_DFL code=0x10006

 PR> truss on the sender:

 PR> poll({3/POLLIN 0/POLLIN},2,-1) = 2 (0x2)
 PR> read(3,0x7fffffff5450,2048) ERR#54 'Connection reset by peer'
 PR> close(3) = 0 (0x0)

 PR> On tcpdump, I do see the receiver send a FIN when using nc.
 PR> When using ssh, the sender is sending the FIN.

 PR> Anything else I can look for ?

It looks like in the case of hastd it was send(2) that returned ENOMEM, but
it would be good to check. Could you please start synchronization again,
ktrace the primary worker process when ENOMEM errors are observed, and show
the output here?

If it is send(2) that fails, then monitoring netstat and network driver
statistics might be helpful. Something like:

netstat -nax
netstat -naT
netstat -m
netstat -nid
sysctl -a dev.<nic>

And maybe:

vmstat -m
vmstat -z

-- 
Mikolaj Golub
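
The statistics commands listed in the message could be captured in one pass while the ENOMEM errors are occurring, so that the outputs can be compared later. The following is only a minimal sketch, assuming a POSIX sh; `OUTDIR`, `NIC`, `snapshot` and `collect` are names made up for illustration and are not part of hastd or of the thread. `NIC` stands in for the `<nic>` placeholder, which the message leaves unspecified.

```sh
#!/bin/sh
# Sketch: save each suggested statistics command to its own timestamped file.
# OUTDIR and NIC are illustrative assumptions, not values from the thread.
OUTDIR="${OUTDIR:-/tmp/hast-diag}"
NIC="${NIC:-}"   # stands in for <nic>; when empty, the sysctl step is skipped

# snapshot LABEL COMMAND [ARGS...]
# Runs COMMAND and writes its stdout and stderr to $OUTDIR/<timestamp>-LABEL.txt.
# A failing command is noted in the file instead of aborting the run.
snapshot() {
    label="$1"; shift
    mkdir -p "$OUTDIR"
    out="$OUTDIR/$(date +%Y%m%d-%H%M%S)-$label.txt"
    {
        echo "# $*"
        "$@" 2>&1 || echo "# command failed with status $?"
    } > "$out"
}

# Take one snapshot of every command named in the message.
collect() {
    snapshot netstat-nax netstat -nax
    snapshot netstat-naT netstat -naT
    snapshot netstat-m   netstat -m
    snapshot netstat-nid netstat -nid
    if [ -n "$NIC" ]; then
        snapshot sysctl-dev sysctl -a "dev.$NIC"
    fi
    snapshot vmstat-m vmstat -m
    snapshot vmstat-z vmstat -z
}

# Collect only when asked to, so the file can also be sourced for its functions.
if [ "${1:-}" = "collect" ]; then
    collect
fi
```

Saved under a hypothetical name such as `hast-diag.sh`, it would be run as `sh hast-diag.sh collect`, once before starting synchronization and again when the errors appear, with `NIC` set in the environment to whatever replaces `<nic>` on the machine in question.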