Date:      Mon, 12 Mar 2012 20:33:04 +0200
From:      Mikolaj Golub <to.my.trociny@gmail.com>
To:        Phil Regnauld <regnauld@x0.dk>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Issue with hast replication
Message-ID:  <86k42pu0tb.fsf@kopusha.home.net>
In-Reply-To: <20120312143127.GM12975@macbook.bluepipe.net> (Phil Regnauld's message of "Mon, 12 Mar 2012 15:31:27 +0100")
References:  <20120311185457.GB1684@macbook.bluepipe.net> <861uoyvpzh.fsf@kopusha.home.net> <20120311220911.GD1684@macbook.bluepipe.net> <20120312143127.GM12975@macbook.bluepipe.net>


On Mon, 12 Mar 2012 15:31:27 +0100 Phil Regnauld wrote:

 PR> Phil Regnauld (regnauld) writes:
 >> 
 >> 7) ktrace on the destination dd:
 >> 
 >>     fstat(0,{ mode=p--------- ,inode=5,size=16384,blksize=4096 }) = 0 (0x0)
 >>     lseek(0,0x0,SEEK_CUR)                            ERR#29 'Illegal seek'

 PR>     [...]

 >>     Illegal seek, eh ? Any clues ?
 >> 
 >>     The boxes are identical (HP DL380 G6), though the RAM config is different.
 >> 
 >>     Summary:
 >> 
 >>     - ssh works fine
 >>     - h1 zvol to h2 zvol over ssh fails
 >>     - h1 zvol to h2 /tmp/x over ssh is fine
 >>     - h2 /dev/zero locally to h2 zvol is fine
 >>     - h2 /tmp/x locally to h2 zvol fails at first, but works afterwards...

 PR>     A few more data points: dd from a local zvol to a local zvol on either
 PR>     machine works fine.

 PR>     Using nc instead of ssh, this time it's the sender nc dying:

 PR>     ktrace on the sender:

 PR>     47704 nc       CALL  write(0x3,0x7fffffff5450,0x800)
 PR>     47704 nc       RET   write -1 errno 32 Broken pipe
 PR>     47704 nc       PSIG  SIGPIPE SIG_DFL code=0x10006

 PR>     truss on the sender:

 PR>     poll({3/POLLIN 0/POLLIN},2,-1)                   = 2 (0x2)
 PR>     read(3,0x7fffffff5450,2048)                      ERR#54 'Connection reset by peer'
 PR>     close(3)                                         = 0 (0x0)


 PR>     On tcpdump, I do see the receiver send a FIN when using nc.
 PR>     When using ssh, the sender is sending the FIN.

 PR>     Anything else I can look for ?

It looks like in the case of hastd it was send(2) that returned ENOMEM, but
it would be good to check. Could you please start synchronization again,
ktrace the primary worker process while the ENOMEM errors are being observed,
and show the output here?
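
A rough sketch of one way to capture that trace, in case it helps. The
trace file path and both function names are made up for illustration; the
only tools used are ktrace(1), kdump(1) and grep(1). The filter relies on
ENOMEM being errno 12 on FreeBSD, which kdump prints as "errno 12".

```shell
# Sketch only: TRACE_FILE and both function names are illustrative.
TRACE_FILE="${TRACE_FILE:-/tmp/hastd-worker.ktrace}"

# Attach ktrace to an already running process, wait, then stop tracing it.
# $1 = pid of the primary worker, $2 = seconds to trace (default 30).
trace_worker() {
    pid="$1"
    secs="${2:-30}"
    ktrace -f "$TRACE_FILE" -p "$pid" || return 1
    sleep "$secs"
    ktrace -c -p "$pid"
}

# Read kdump output on stdin and keep every return with errno 12 (ENOMEM)
# together with the line just before it, which is normally the matching CALL.
filter_enomem() {
    grep -B1 'errno 12 '
}
```

Possible usage, with the pid taken from ps(1) for the worker of the
affected resource:

    trace_worker <pid> 30
    kdump -f /tmp/hastd-worker.ktrace | filter_enomem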

If it is send(2) that fails, then monitoring netstat and network driver
statistics might be helpful. Something like:

netstat -nax
netstat -naT
netstat -m
netstat -nid

sysctl -a dev.<nic>

And maybe

vmstat -m
vmstat -z
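
To compare the counters before and while the errors appear, the commands
above could be wrapped in a small helper that appends one timestamped
snapshot to a log file. This is only a sketch: the function name, the log
path and the way the NIC sysctl node is passed in are assumptions, and the
node name itself has to be filled in for the actual driver.

```shell
# Sketch only: "snapshot" is an illustrative name.
# $1 = log file to append to, $2 = sysctl node of the NIC (dev.<nic>).
snapshot() {
    out="$1"
    nic="$2"
    {
        printf '%s\n' "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        for cmd in "netstat -nax" "netstat -naT" "netstat -m" "netstat -nid" \
                   "sysctl -a $nic" "vmstat -m" "vmstat -z"; do
            printf '%s\n' "--- $cmd"
            # Word splitting of $cmd is intentional; a failing command is
            # noted in the log and does not stop the snapshot.
            $cmd 2>&1 || printf '%s\n' "(command failed)"
        done
    } >> "$out"
}
```

Run in a loop while synchronization is in progress, for example:

    while :; do snapshot /var/tmp/hast-stats.log dev.<nic>; sleep 10; done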

-- 
Mikolaj Golub


