From owner-freebsd-hackers@FreeBSD.ORG  Sun Sep 28 01:07:43 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 26239990
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 01:07:43 +0000 (UTC)
Received: from mail-ig0-f181.google.com (mail-ig0-f181.google.com
 [209.85.213.181])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id E691EA21
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 01:07:42 +0000 (UTC)
Received: by mail-ig0-f181.google.com with SMTP id h18so1516243igc.8
 for <freebsd-hackers@freebsd.org>; Sat, 27 Sep 2014 18:07:36 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:from:date:message-id:subject:to
 :content-type;
 bh=dw+uLjsl8zv5Dm0fuABFKZlhGvwtipCVFUDSHnlH00I=;
 b=abx5fhT8MSMWooPZTQAYG/FeDIahOvFGkWdSL1RmNOzhchL4wZTC7lR+SuEXxyMy5I
 /iPGm0qtlvkp5CpkK9iQAa1xxwQiQ2cC4w9QZXd3tHMp7rMnDPumFhSfLBi7FqTCwdJu
 8xOI09fhwK3KhLYDV1nzm6wI/TeG/4BTDOnX3uJcm7x6u5K8HLvnuzIbybxTPukbS1mP
 ddjgLPyrcSWz7bTa/RLAKju1dU7NLDy9IIZDi774GIt94fCLSV8Pilcu0jcWKsp8pl2t
 XXXs49uTpJLPmluHjziB74Yo76ncq3ffEtIq3o2fcXqwExPtAXYSZ/8ZvFe0CbliImD1
 4dtg==
X-Gm-Message-State: ALoCoQmQLvHjmz5E+HbBv9unznq58BbPFknus8WOzAb/ls8FxyMbUmU8emARJNy/wjnpFYzqzC0g
X-Received: by 10.50.62.50 with SMTP id v18mr24503647igr.21.1411866007491;
 Sat, 27 Sep 2014 18:00:07 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.107.9.67 with HTTP; Sat, 27 Sep 2014 17:59:47 -0700 (PDT)
X-Originating-IP: [67.198.113.68]
From: Bryan Venteicher <bryanv@daemoninthecloset.org>
Date: Sat, 27 Sep 2014 19:59:47 -0500
Message-ID: <CAMo0n6Q=P5H3+Cqr8KjFRVLvWHZfJYnROVe4xF3DmKw95D+5zQ@mail.gmail.com>
Subject: Change uma_mtx to rwlock
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Content-Type: multipart/mixed; boundary=047d7bdc0854d973ec050415ab2b
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 28 Sep 2014 01:07:43 -0000

--047d7bdc0854d973ec050415ab2b
Content-Type: text/plain; charset=UTF-8

Hi,

I'd appreciate some comments on the attached patch that changes the
uma_mtx to a rwlock.

At $JOB, we have machines with ~400GB RAM, with much of that being
allocated through UMA zones. We've observed that timeouts were sometimes
unexpectedly delayed by a half second or more. We tracked one of the
reasons for this down to when the page daemon was running, calling
uma_reclaim() -> zone_foreach(). zone_foreach() holds the uma_mtx while
zone_drain()'ing each zone. If uma_timeout() fires, it will block on the
uma_mtx when it tries to zone_timeout() each zone.

--047d7bdc0854d973ec050415ab2b
Content-Type: application/octet-stream; 
	name="0001-Make-the-UMA-lock-a-rwlock-instead-of-a-mutex.patch"
Content-Disposition: attachment; 
	filename="0001-Make-the-UMA-lock-a-rwlock-instead-of-a-mutex.patch"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_i0lofryd0

RnJvbSA5NTYwYzMwODIyOWZlODhjOTAxMTRhZmIyMDdmZDJhOTZlZDYzMjRmIE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBCcnlhbiBWZW50ZWljaGVyIDxicnlhbnZAZGFlbW9uaW50aGVj
bG9zZXQub3JnPgpEYXRlOiBUdWUsIDEgSnVsIDIwMTQgMTY6MDQ6MjMgLTA1MDAKU3ViamVjdDog
W1BBVENIXSBNYWtlIHRoZSBVTUEgbG9jayBhIHJ3bG9jayBpbnN0ZWFkIG9mIGEgbXV0ZXgKClRo
ZSB6b25lX2ZvcmVhY2goKSBjYWxsIHRoYXQgaXMgZG9uZSBpbiB1bWFfdGltZW91dCgpIG1heSBi
bG9jayBvbiB0aGUKVU1BIG11dGV4IHdoaWxlIHRoZSBlYWNoIHpvbmUgaXMgZHJhaW5lZCBpbiB1
bWFfcmVjbGFpbSgpLiBUaGlzIG1heQpzdGFsbCB0aW1lb3V0cyBmb3IgYW4gdW5hY2NlcHRhYmxl
IGFtb3VudCBvZiB0aW1lIGlmIHRoZSBkcmFpbmluZyB0YWtlcwphIGxvbmcgdGltZS4KLS0tCiBz
eXMvdm0vdW1hX2NvcmUuYyB8IDQ5ICsrKysrKysrKysrKysrKysrKysrKysrKysrKy0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0KIDEgZmlsZSBjaGFuZ2VkLCAyNyBpbnNlcnRpb25zKCspLCAyMiBkZWxl
dGlvbnMoLSkKCmRpZmYgLS1naXQgYS9zeXMvdm0vdW1hX2NvcmUuYyBiL3N5cy92bS91bWFfY29y
ZS5jCmluZGV4IDgxYjcxNGEuLjMwYmE5MGEgMTAwNjQ0Ci0tLSBhL3N5cy92bS91bWFfY29yZS5j
CisrKyBiL3N5cy92bS91bWFfY29yZS5jCkBAIC0xMzUsOCArMTM1LDggQEAgc3RhdGljIExJU1Rf
SEVBRCgsdW1hX2tlZykgdW1hX2tlZ3MgPSBMSVNUX0hFQURfSU5JVElBTElaRVIodW1hX2tlZ3Mp
Owogc3RhdGljIExJU1RfSEVBRCgsdW1hX3pvbmUpIHVtYV9jYWNoZXpvbmVzID0KICAgICBMSVNU
X0hFQURfSU5JVElBTElaRVIodW1hX2NhY2hlem9uZXMpOwogCi0vKiBUaGlzIG11dGV4IHByb3Rl
Y3RzIHRoZSBrZWcgbGlzdCAqLwotc3RhdGljIHN0cnVjdCBtdHhfcGFkYWxpZ24gdW1hX210eDsK
Ky8qIFRoaXMgUlcgbG9jayBwcm90ZWN0cyB0aGUga2VnIGxpc3QgKi8KK3N0YXRpYyBzdHJ1Y3Qg
cndsb2NrX3BhZGFsaWduIHVtYV9yd2xvY2s7CiAKIC8qIExpbmtlZCBsaXN0IG9mIGJvb3QgdGlt
ZSBwYWdlcyAqLwogc3RhdGljIExJU1RfSEVBRCgsdW1hX3NsYWIpIHVtYV9ib290X3BhZ2VzID0K
QEAgLTg4Niw2ICs4ODYsNyBAQCBmaW5pc2hlZDoKIHN0YXRpYyB2b2lkCiB6b25lX2RyYWluX3dh
aXQodW1hX3pvbmVfdCB6b25lLCBpbnQgd2FpdG9rKQogeworCWludCB3bG9jazsKIAogCS8qCiAJ
ICogU2V0IGRyYWluaW5nIHRvIGludGVybG9jayB3aXRoIHpvbmVfZHRvcigpIHNvIHdlIGNhbiBy
ZWxlYXNlIG91cgpAQCAtODk3LDE2ICs4OTgsMjAgQEAgem9uZV9kcmFpbl93YWl0KHVtYV96b25l
X3Qgem9uZSwgaW50IHdhaXRvaykKIAl3aGlsZSAoem9uZS0+dXpfZmxhZ3MgJiBVTUFfWkZMQUdf
RFJBSU5JTkcpIHsKIAkJaWYgKHdhaXRvayA9PSBNX05PV0FJVCkKIAkJCWdvdG8gb3V0OwotCQlt
dHhfdW5sb2NrKCZ1bWFfbXR4KTsKKwkJd2xvY2sgPSByd193b3duZWQoJnVtYV9yd2xvY2spOwor
CQlyd191bmxvY2soJnVtYV9yd2xvY2spOwogCQltc2xlZXAoem9uZSwgem9uZS0+dXpfbG9ja3B0
ciwgUFZNLCAiem9uZWRyYWluIiwgMSk7Ci0JCW10eF9sb2NrKCZ1bWFfbXR4KTsKKwkJaWYgKHds
b2NrICE9IDApCisJCQlyd193bG9jaygmdW1hX3J3bG9jayk7CisJCWVsc2UKKwkJCXJ3X3Jsb2Nr
KCZ1bWFfcndsb2NrKTsKIAl9CiAJem9uZS0+dXpfZmxhZ3MgfD0gVU1BX1pGTEFHX0RSQUlOSU5H
OwogCWJ1Y2tldF9jYWNoZV9kcmFpbih6b25lKTsKIAlaT05FX1VOTE9DSyh6b25lKTsKIAkvKgog
CSAqIFRoZSBEUkFJTklORyBmbGFnIHByb3RlY3RzIHVzIGZyb20gYmVpbmcgZnJlZWQgd2hpbGUK
LQkgKiB3ZSdyZSBydW5uaW5nLiAgTm9ybWFsbHkgdGhlIHVtYV9tdHggd291bGQgcHJvdGVjdCB1
cyBidXQgd2UKKwkgKiB3ZSdyZSBydW5uaW5nLiAgTm9ybWFsbHkgdGhlIHVtYV9yd2xvY2sgd291
bGQgcHJvdGVjdCB1cyBidXQgd2UKIAkgKiBtdXN0IGJlIGFibGUgdG8gcmVsZWFzZSBhbmQgYWNx
dWlyZSB0aGUgcmlnaHQgbG9jayBmb3IgZWFjaCBrZWcuCiAJICovCiAJem9uZV9mb3JlYWNoX2tl
Zyh6b25lLCAma2VnX2RyYWluKTsKQEAgLTE1NDIsOSArMTU0Nyw5IEBAIGtlZ19jdG9yKHZvaWQg
Km1lbSwgaW50IHNpemUsIHZvaWQgKnVkYXRhLCBpbnQgZmxhZ3MpCiAKIAlMSVNUX0lOU0VSVF9I
RUFEKCZrZWctPnVrX3pvbmVzLCB6b25lLCB1el9saW5rKTsKIAotCW10eF9sb2NrKCZ1bWFfbXR4
KTsKKwlyd193bG9jaygmdW1hX3J3bG9jayk7CiAJTElTVF9JTlNFUlRfSEVBRCgmdW1hX2tlZ3Ms
IGtlZywgdWtfbGluayk7Ci0JbXR4X3VubG9jaygmdW1hX210eCk7CisJcndfd3VubG9jaygmdW1h
X3J3bG9jayk7CiAJcmV0dXJuICgwKTsKIH0KIApAQCAtMTU5NCw5ICsxNTk5LDkgQEAgem9uZV9j
dG9yKHZvaWQgKm1lbSwgaW50IHNpemUsIHZvaWQgKnVkYXRhLCBpbnQgZmxhZ3MpCiAJCXpvbmUt
PnV6X3JlbGVhc2UgPSBhcmctPnJlbGVhc2U7CiAJCXpvbmUtPnV6X2FyZyA9IGFyZy0+YXJnOwog
CQl6b25lLT51el9sb2NrcHRyID0gJnpvbmUtPnV6X2xvY2s7Ci0JCW10eF9sb2NrKCZ1bWFfbXR4
KTsKKwkJcndfd2xvY2soJnVtYV9yd2xvY2spOwogCQlMSVNUX0lOU0VSVF9IRUFEKCZ1bWFfY2Fj
aGV6b25lcywgem9uZSwgdXpfbGluayk7Ci0JCW10eF91bmxvY2soJnVtYV9tdHgpOworCQlyd193
dW5sb2NrKCZ1bWFfcndsb2NrKTsKIAkJZ290byBvdXQ7CiAJfQogCkBAIC0xNjEzLDcgKzE2MTgs
NyBAQCB6b25lX2N0b3Iodm9pZCAqbWVtLCBpbnQgc2l6ZSwgdm9pZCAqdWRhdGEsIGludCBmbGFn
cykKIAkJem9uZS0+dXpfZmluaSA9IGFyZy0+ZmluaTsKIAkJem9uZS0+dXpfbG9ja3B0ciA9ICZr
ZWctPnVrX2xvY2s7CiAJCXpvbmUtPnV6X2ZsYWdzIHw9IFVNQV9aT05FX1NFQ09OREFSWTsKLQkJ
bXR4X2xvY2soJnVtYV9tdHgpOworCQlyd193bG9jaygmdW1hX3J3bG9jayk7CiAJCVpPTkVfTE9D
Syh6b25lKTsKIAkJTElTVF9GT1JFQUNIKHosICZrZWctPnVrX3pvbmVzLCB1el9saW5rKSB7CiAJ
CQlpZiAoTElTVF9ORVhUKHosIHV6X2xpbmspID09IE5VTEwpIHsKQEAgLTE2MjIsNyArMTYyNyw3
IEBAIHpvbmVfY3Rvcih2b2lkICptZW0sIGludCBzaXplLCB2b2lkICp1ZGF0YSwgaW50IGZsYWdz
KQogCQkJfQogCQl9CiAJCVpPTkVfVU5MT0NLKHpvbmUpOwotCQltdHhfdW5sb2NrKCZ1bWFfbXR4
KTsKKwkJcndfd3VubG9jaygmdW1hX3J3bG9jayk7CiAJfSBlbHNlIGlmIChrZWcgPT0gTlVMTCkg
ewogCQlpZiAoKGtlZyA9IHVtYV9rY3JlYXRlKHpvbmUsIGFyZy0+c2l6ZSwgYXJnLT51bWluaXQs
IGFyZy0+ZmluaSwKIAkJICAgIGFyZy0+YWxpZ24sIGFyZy0+ZmxhZ3MpKSA9PSBOVUxMKQpAQCAt
MTcyMCw5ICsxNzI1LDkgQEAgem9uZV9kdG9yKHZvaWQgKmFyZywgaW50IHNpemUsIHZvaWQgKnVk
YXRhKQogCWlmICghKHpvbmUtPnV6X2ZsYWdzICYgVU1BX1pGTEFHX0lOVEVSTkFMKSkKIAkJY2Fj
aGVfZHJhaW4oem9uZSk7CiAKLQltdHhfbG9jaygmdW1hX210eCk7CisJcndfd2xvY2soJnVtYV9y
d2xvY2spOwogCUxJU1RfUkVNT1ZFKHpvbmUsIHV6X2xpbmspOwotCW10eF91bmxvY2soJnVtYV9t
dHgpOworCXJ3X3d1bmxvY2soJnVtYV9yd2xvY2spOwogCS8qCiAJICogWFhYIHRoZXJlIGFyZSBz
b21lIHJhY2VzIGhlcmUgd2hlcmUKIAkgKiB0aGUgem9uZSBjYW4gYmUgZHJhaW5lZCBidXQgem9u
ZSBsb2NrCkBAIC0xNzQ0LDkgKzE3NDksOSBAQCB6b25lX2R0b3Iodm9pZCAqYXJnLCBpbnQgc2l6
ZSwgdm9pZCAqdWRhdGEpCiAJICogV2Ugb25seSBkZXN0cm95IGtlZ3MgZnJvbSBub24gc2Vjb25k
YXJ5IHpvbmVzLgogCSAqLwogCWlmIChrZWcgIT0gTlVMTCAmJiAoem9uZS0+dXpfZmxhZ3MgJiBV
TUFfWk9ORV9TRUNPTkRBUlkpID09IDApICB7Ci0JCW10eF9sb2NrKCZ1bWFfbXR4KTsKKwkJcndf
d2xvY2soJnVtYV9yd2xvY2spOwogCQlMSVNUX1JFTU9WRShrZWcsIHVrX2xpbmspOwotCQltdHhf
dW5sb2NrKCZ1bWFfbXR4KTsKKwkJcndfd3VubG9jaygmdW1hX3J3bG9jayk7CiAJCXpvbmVfZnJl
ZV9pdGVtKGtlZ3MsIGtlZywgTlVMTCwgU0tJUF9OT05FKTsKIAl9CiAJWk9ORV9MT0NLX0ZJTkko
em9uZSk7CkBAIC0xNzY4LDEyICsxNzczLDEyIEBAIHpvbmVfZm9yZWFjaCh2b2lkICgqemZ1bmMp
KHVtYV96b25lX3QpKQogCXVtYV9rZWdfdCBrZWc7CiAJdW1hX3pvbmVfdCB6b25lOwogCi0JbXR4
X2xvY2soJnVtYV9tdHgpOworCXJ3X3Jsb2NrKCZ1bWFfcndsb2NrKTsKIAlMSVNUX0ZPUkVBQ0go
a2VnLCAmdW1hX2tlZ3MsIHVrX2xpbmspIHsKIAkJTElTVF9GT1JFQUNIKHpvbmUsICZrZWctPnVr
X3pvbmVzLCB1el9saW5rKQogCQkJemZ1bmMoem9uZSk7CiAJfQotCW10eF91bmxvY2soJnVtYV9t
dHgpOworCXJ3X3J1bmxvY2soJnVtYV9yd2xvY2spOwogfQogCiAvKiBQdWJsaWMgZnVuY3Rpb25z
ICovCkBAIC0xNzg5LDcgKzE3OTQsNyBAQCB1bWFfc3RhcnR1cCh2b2lkICpib290bWVtLCBpbnQg
Ym9vdF9wYWdlcykKICNpZmRlZiBVTUFfREVCVUcKIAlwcmludGYoIkNyZWF0aW5nIHVtYSBrZWcg
aGVhZGVycyB6b25lIGFuZCBrZWcuXG4iKTsKICNlbmRpZgotCW10eF9pbml0KCZ1bWFfbXR4LCAi
VU1BIGxvY2siLCBOVUxMLCBNVFhfREVGKTsKKwlyd19pbml0KCZ1bWFfcndsb2NrLCAiVU1BIGxv
Y2siKTsKIAogCS8qICJtYW51YWxseSIgY3JlYXRlIHRoZSBpbml0aWFsIHpvbmUgKi8KIAltZW1z
ZXQoJmFyZ3MsIDAsIHNpemVvZihhcmdzKSk7CkBAIC0zMzY0LDEyICszMzY5LDEyIEBAIHN5c2N0
bF92bV96b25lX2NvdW50KFNZU0NUTF9IQU5ETEVSX0FSR1MpCiAJaW50IGNvdW50OwogCiAJY291
bnQgPSAwOwotCW10eF9sb2NrKCZ1bWFfbXR4KTsKKwlyd19ybG9jaygmdW1hX3J3bG9jayk7CiAJ
TElTVF9GT1JFQUNIKGt6LCAmdW1hX2tlZ3MsIHVrX2xpbmspIHsKIAkJTElTVF9GT1JFQUNIKHos
ICZrei0+dWtfem9uZXMsIHV6X2xpbmspCiAJCQljb3VudCsrOwogCX0KLQltdHhfdW5sb2NrKCZ1
bWFfbXR4KTsKKwlyd19ydW5sb2NrKCZ1bWFfcndsb2NrKTsKIAlyZXR1cm4gKHN5c2N0bF9oYW5k
bGVfaW50KG9pZHAsICZjb3VudCwgMCwgcmVxKSk7CiB9CiAKQEAgLTMzOTQsNyArMzM5OSw3IEBA
IHN5c2N0bF92bV96b25lX3N0YXRzKFNZU0NUTF9IQU5ETEVSX0FSR1MpCiAJc2J1Zl9uZXdfZm9y
X3N5c2N0bCgmc2J1ZiwgTlVMTCwgMTI4LCByZXEpOwogCiAJY291bnQgPSAwOwotCW10eF9sb2Nr
KCZ1bWFfbXR4KTsKKwlyd19ybG9jaygmdW1hX3J3bG9jayk7CiAJTElTVF9GT1JFQUNIKGt6LCAm
dW1hX2tlZ3MsIHVrX2xpbmspIHsKIAkJTElTVF9GT1JFQUNIKHosICZrei0+dWtfem9uZXMsIHV6
X2xpbmspCiAJCQljb3VudCsrOwpAQCAtMzQ3MCw3ICszNDc1LDcgQEAgc2tpcDoKIAkJCVpPTkVf
VU5MT0NLKHopOwogCQl9CiAJfQotCW10eF91bmxvY2soJnVtYV9tdHgpOworCXJ3X3J1bmxvY2so
JnVtYV9yd2xvY2spOwogCWVycm9yID0gc2J1Zl9maW5pc2goJnNidWYpOwogCXNidWZfZGVsZXRl
KCZzYnVmKTsKIAlyZXR1cm4gKGVycm9yKTsKLS0gCjEuOC41LjQKCg==
--047d7bdc0854d973ec050415ab2b--

From owner-freebsd-hackers@FreeBSD.ORG  Sun Sep 28 01:42:40 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 54410C83
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 01:42:40 +0000 (UTC)
Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35])
 by mx1.freebsd.org (Postfix) with ESMTP id 193DCD1F
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 01:42:39 +0000 (UTC)
Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534)
 id 26A8720E7088F; Sun, 28 Sep 2014 01:42:32 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 smtp1.multiplay.co.uk
X-Spam-Level: **
X-Spam-Status: No, score=2.2 required=8.0 tests=AWL,BAYES_00,DOS_OE_TO_MX,
 FSL_HELO_NON_FQDN_1,RDNS_DYNAMIC,STOX_REPLY_TYPE autolearn=no version=3.3.1
Received: from r2d2 (82-69-141-170.dsl.in-addr.zen.co.uk [82.69.141.170])
 by smtp1.multiplay.co.uk (Postfix) with ESMTPS id 42E1820E7088B;
 Sun, 28 Sep 2014 01:42:30 +0000 (UTC)
Message-ID: <CF61374E89D8461093CEDB42D4FCBABE@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Bryan Venteicher" <bryanv@daemoninthecloset.org>,
 <freebsd-hackers@freebsd.org>
References: <CAMo0n6Q=P5H3+Cqr8KjFRVLvWHZfJYnROVe4xF3DmKw95D+5zQ@mail.gmail.com>
Subject: Re: Change uma_mtx to rwlock
Date: Sun, 28 Sep 2014 02:42:25 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 28 Sep 2014 01:42:40 -0000

Out of interest, does that include ZFS and its UMA zones? We're currently
investigating issues around this.

    Regards
    Steve

----- Original Message ----- 
From: "Bryan Venteicher" <bryanv@daemoninthecloset.org>
To: <freebsd-hackers@freebsd.org>
Sent: Sunday, September 28, 2014 1:59 AM
Subject: Change uma_mtx to rwlock


> Hi,
> 
> I'd appreciate some comments on the attached patch that changes the
> uma_mtx to a rwlock.
> 
> At $JOB, we have machines with ~400GB RAM, with much of that being
> allocated through UMA zones. We've observed that timeouts were sometimes
> unexpectedly delayed by a half second or more. We tracked one of the
> reasons for this down to when the page daemon was running, calling
> uma_reclaim() -> zone_foreach(). zone_foreach() holds the uma_mtx while
> zone_drain()'ing each zone. If uma_timeout() fires, it will block on the
> uma_mtx when it tries to zone_timeout() each zone.
>


--------------------------------------------------------------------------------


> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@FreeBSD.ORG  Sun Sep 28 01:56:13 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 976841BD
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 01:56:13 +0000 (UTC)
Received: from mail-ig0-f181.google.com (mail-ig0-f181.google.com
 [209.85.213.181])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 5F15DE05
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 01:56:13 +0000 (UTC)
Received: by mail-ig0-f181.google.com with SMTP id h18so1549071igc.2
 for <freebsd-hackers@freebsd.org>; Sat, 27 Sep 2014 18:56:12 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc:content-type;
 bh=7zJx+v75//Lz8DK3p8hbBa8Bk3TdSky9JhOlgQSrSfs=;
 b=UG4WnVhEpA99zBjxbXXiSs6zSoy+wCHgXJ9CEqaYCozAuO4tSi2aWEbmoHK1n4BUQc
 27UIJpoMM2+EL/M7ae2D386Sov60c/6/bxiPsE4obzDr7ETDph32qSnL74EST7LmQW5s
 E2osw7DUS3SWld1GZACriWsK8U4BLtbPGGT5MdkHvC4GIp7MmXQs+vAz39//C5p8Iihf
 lt6caGO7R0YV7gJK5p+1LLRgB+L4VcDhfHbhvdobYF9aTveUzLTExAg9emjt8gsluNiI
 rzy/QI/c4t44QW/g/9x1FlP8yCetgOz3RbF0DlVa9k3+tBSWb/S1NUr4BNkA1y7iPDFn
 QlXg==
X-Gm-Message-State: ALoCoQmKb1t0HjE+3UNRfI1ALmo6zn+Mbd/MmkSiC8KjUyauTm3ZVDyGvU0EtYlLCpgG0X8/xYWz
X-Received: by 10.50.20.4 with SMTP id j4mr42755112ige.13.1411869372468; Sat,
 27 Sep 2014 18:56:12 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.107.9.67 with HTTP; Sat, 27 Sep 2014 18:55:52 -0700 (PDT)
X-Originating-IP: [67.198.113.68]
In-Reply-To: <CF61374E89D8461093CEDB42D4FCBABE@multiplay.co.uk>
References: <CAMo0n6Q=P5H3+Cqr8KjFRVLvWHZfJYnROVe4xF3DmKw95D+5zQ@mail.gmail.com>
 <CF61374E89D8461093CEDB42D4FCBABE@multiplay.co.uk>
From: Bryan Venteicher <bryanv@daemoninthecloset.org>
Date: Sat, 27 Sep 2014 20:55:52 -0500
Message-ID: <CAMo0n6QMogoNS-X0exL-LHc_pymRGDA1kn408KF59XgJ1qRQtw@mail.gmail.com>
Subject: Re: Change uma_mtx to rwlock
To: Steven Hartland <killing@multiplay.co.uk>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 28 Sep 2014 01:56:13 -0000

On Sat, Sep 27, 2014 at 8:42 PM, Steven Hartland <killing@multiplay.co.uk>
wrote:

> Out of interest does that include ZFS and its UMA zones, as we're
> currently investigating issues around this.
>
>
Yes, I believe this would include ZFS's zones too.


>    Regards
>    Steve
>
> ----- Original Message -----
> From: "Bryan Venteicher" <bryanv@daemoninthecloset.org>
> To: <freebsd-hackers@freebsd.org>
> Sent: Sunday, September 28, 2014 1:59 AM
> Subject: Change uma_mtx to rwlock
>
>
>
>> Hi,
>>
>> I'd appreciate some comments on the attached patch that changes the
>> uma_mtx to a rwlock.
>>
>> At $JOB, we have machines with ~400GB RAM, with much of that being
>> allocated through UMA zones. We've observed that timeouts were sometimes
>> unexpectedly delayed by a half second or more. We tracked one of the
>> reasons for this down to when the page daemon was running, calling
>> uma_reclaim() -> zone_foreach(). zone_foreach() holds the uma_mtx while
>> zone_drain()'ing each zone. If uma_timeout() fires, it will block on the
>> uma_mtx when it tries to zone_timeout() each zone.
>>
>>
>

From owner-freebsd-hackers@FreeBSD.ORG  Sun Sep 28 02:30:24 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 518F96BA
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 02:30:24 +0000 (UTC)
Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35])
 by mx1.freebsd.org (Postfix) with ESMTP id 1559010C
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 02:30:23 +0000 (UTC)
Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534)
 id ECF9D20E7088F; Sun, 28 Sep 2014 02:30:22 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 smtp1.multiplay.co.uk
X-Spam-Level: ***
X-Spam-Status: No, score=3.1 required=8.0 tests=AWL,BAYES_00,DOS_OE_TO_MX,
 FSL_HELO_NON_FQDN_1,RDNS_DYNAMIC,STOX_REPLY_TYPE,URIBL_BLACK autolearn=no
 version=3.3.1
Received: from r2d2 (82-69-141-170.dsl.in-addr.zen.co.uk [82.69.141.170])
 by smtp1.multiplay.co.uk (Postfix) with ESMTPS id 612FE20E7088B;
 Sun, 28 Sep 2014 02:30:21 +0000 (UTC)
Message-ID: <0EAA931C1DCF4897A45D86C7CBDCA11C@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Bryan Venteicher" <bryanv@daemoninthecloset.org>
References: <CAMo0n6Q=P5H3+Cqr8KjFRVLvWHZfJYnROVe4xF3DmKw95D+5zQ@mail.gmail.com>
 <CF61374E89D8461093CEDB42D4FCBABE@multiplay.co.uk>
 <CAMo0n6QMogoNS-X0exL-LHc_pymRGDA1kn408KF59XgJ1qRQtw@mail.gmail.com>
Subject: Re: Change uma_mtx to rwlock
Date: Sun, 28 Sep 2014 03:30:16 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 28 Sep 2014 02:30:24 -0000

----- Original Message ----- 
From: "Bryan Venteicher" <bryanv@daemoninthecloset.org>
> On Sat, Sep 27, 2014 at 8:42 PM, Steven Hartland <killing@multiplay.co.uk>
> wrote:
> 
>> Out of interest does that include ZFS and its UMA zones, as we're currently
>> investigating issues around this.
>>
>>
> Yes, I believe this would include ZFS's zones too

It would, but I was more curious as to whether you had seen the delay
specifically on the ZFS zone, or whether other zones triggered the issue
for you.

    Regards
    Steve

From owner-freebsd-hackers@FreeBSD.ORG  Sun Sep 28 03:55:36 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 5CBCF51C
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 03:55:36 +0000 (UTC)
Received: from mail-ig0-f173.google.com (mail-ig0-f173.google.com
 [209.85.213.173])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 27F1DB3E
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 03:55:35 +0000 (UTC)
Received: by mail-ig0-f173.google.com with SMTP id uq10so491469igb.6
 for <freebsd-hackers@freebsd.org>; Sat, 27 Sep 2014 20:55:28 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc:content-type;
 bh=J72CyKVfY3Z4G/L31piNy4Q6ODQG790oE6CqB5/pPOY=;
 b=c75NgR9Af+tUdp0PTjAMg+brYqa6fuiopS9x45jTIgU0ziuCE/M4LLef0k8ciefK8T
 foK29ieYm2gHRoQN8UblxnuQSdFQUx9N+rkV8T1ss7MbECiHVBKF5HZuPoZR/4FEOqrT
 cff8LSkQhrE1jZvmeOcQ9f0i+qHJ9YXJkK6TvhYw31ZTMvHAz0cjC0KZi5FkanIMjEzN
 i4vKD+Zlp9VurC3XwuhLl8/grgEXv4QfxvZ2xRfmo2H7BHNElN6MVUt2kDIoFM9xjxGb
 tGz01PmHnXy5wgKpxR7FV6o4lByCIJItZWLHiMnRCtLxUdPtfUhhnD7WJmj5Pp0dU6gn
 dxjg==
X-Gm-Message-State: ALoCoQmtzNx6Azjdfyj/QfxE0WDj+B1vgpP1Gi+GMn3yjKJYxM2Q5C74thF2NXbiFYiMPndAIL5x
X-Received: by 10.42.180.5 with SMTP id bs5mr287002icb.70.1411874716370; Sat,
 27 Sep 2014 20:25:16 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.107.9.67 with HTTP; Sat, 27 Sep 2014 20:24:56 -0700 (PDT)
X-Originating-IP: [67.198.113.68]
In-Reply-To: <0EAA931C1DCF4897A45D86C7CBDCA11C@multiplay.co.uk>
References: <CAMo0n6Q=P5H3+Cqr8KjFRVLvWHZfJYnROVe4xF3DmKw95D+5zQ@mail.gmail.com>
 <CF61374E89D8461093CEDB42D4FCBABE@multiplay.co.uk>
 <CAMo0n6QMogoNS-X0exL-LHc_pymRGDA1kn408KF59XgJ1qRQtw@mail.gmail.com>
 <0EAA931C1DCF4897A45D86C7CBDCA11C@multiplay.co.uk>
From: Bryan Venteicher <bryanv@daemoninthecloset.org>
Date: Sat, 27 Sep 2014 22:24:56 -0500
Message-ID: <CAMo0n6Sfqxh0vx8ZfwzHLqiwdvR7okrQ9ogK6PzvQ0aaq6iD6Q@mail.gmail.com>
Subject: Re: Change uma_mtx to rwlock
To: Steven Hartland <killing@multiplay.co.uk>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 28 Sep 2014 03:55:36 -0000

On Sat, Sep 27, 2014 at 9:30 PM, Steven Hartland <killing@multiplay.co.uk>
wrote:

> ----- Original Message -----
> From: "Bryan Venteicher" <bryanv@daemoninthecloset.org>
>
>> On Sat, Sep 27, 2014 at 8:42 PM, Steven Hartland
>> <killing@multiplay.co.uk> wrote:
>>
>>> Out of interest does that include ZFS and its UMA zones, as we're
>>> currently investigating issues around this.
>>>
>> Yes, I believe this would include ZFS's zones too
>>
>
> It would, but I was more curious as to whether you had seen the delay
> specifically on the ZFS zone, or whether other zones triggered the
> issue for you.
>
>
We are using an old version of FreeBSD/ZFS, so our ZFS allocations go
through malloc instead of directly to UMA.

For us, it was the cumulative effect of the number of UMA zones (buckets
really) that led to long hold times of the UMA mutex. ZFS-related
allocations were a significant part of that, but not typically the largest.


>    Regards
>    Steve
>

From owner-freebsd-hackers@FreeBSD.ORG  Sun Sep 28 14:04:20 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8D88D24C
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 14:04:20 +0000 (UTC)
Received: from astart2.astart.com
 (108-248-95-193.lightspeed.sndgca.sbcglobal.net [108.248.95.193])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 5805965E
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 14:04:19 +0000 (UTC)
Received: from laptop_93.private (localhost [127.0.0.1])
 by astart2.astart.com (8.14.4/8.14.4) with ESMTP id s8SE4CYJ067023
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 07:04:12 -0700 (PDT)
 (envelope-from papowell@astart.com)
Message-ID: <5428155C.5000404@astart.com>
Date: Sun, 28 Sep 2014 07:04:12 -0700
From: Patrick Powell <papowell@astart.com>
Reply-To: papowell@astart.com
Organization: Astart Technologies
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.0
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Subject: Re: Inproper ada# assignment in 10-BETA2
References: <1411851225.9364.YahooMailNeo@web180902.mail.ne1.yahoo.com>
 <CAOgwaMu71N0697+DUOJC7cy-Z3XenEGxwKLY+-q_LoMZLgPY6w@mail.gmail.com>
In-Reply-To: <CAOgwaMu71N0697+DUOJC7cy-Z3XenEGxwKLY+-q_LoMZLgPY6w@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 28 Sep 2014 14:04:20 -0000

On 09/27/14 15:15, Mehmet Erol Sanliturk wrote:
> On Sat, Sep 27, 2014 at 1:53 PM, Jin Guojun <jguojun@sbcglobal.net> wrote:
>
>> I installed 10-BETA2 on SATA port 4 (ad8) and then added another drive on
>> SATA port 3 (ad6). The system does not correctly enumerate the ada# for
>> the boot device. On the original boot (without the second SATA drive),
>> ad8 is enumerated as ada0, the boot drive:
>>
>> Sep 24 22:51:30 R10-B2 kernel: ada0 at ahcich2 bus 0 scbus2 target 0 lun 0
>> Sep 24 22:51:30 R10-B2 kernel: ada0: <Hitachi HDP725050GLA360 GM4OA50E>
>> ATA-8 SATA 2.x device
>> ...
>> Sep 24 22:51:30 R10-B2 kernel: ada0: Previously was known as ad8
>>
>>
>> However, after adding another SATA drive (ad6), the new drive is assigned
>> to ada0 and ad8 changes to ada1. This is incorrect dynamic device
>> assignment. FreeBSD has kept using fixed disk ID assignment because of
>> the same problem, introduced around 4-R (or maybe slightly later); after
>> a brief debate, a decision was made to use fixed drive IDs to avoid such
>> hassle.
>>
>> If we now want to use dynamic enumeration for drive ID assignment, it
>> has to be done correctly: the boot drive MUST be assigned 0, or whatever
>> number the installation assigned to it; otherwise, adding a new drive
>> will leave the system unbootable, or make other existing drives
>> unmountable because their enumeration numbers change.
>>
>> Has this been reported as a known problem for 10-R, or shall I open a bug
>> to track?
>>
>> -Jin
>>
>
>
>
> One point should be checked:
>
> On mainboards, SATA ports are numbered from 0 or 1 upward.
> The BIOS always uses the first SATA drive for boot. This is NOT related to
> the operating system.
> Therefore, it is necessary to check the port numbers of the existing
> drives, and the bootable SATA drive should be connected
> to the lowest-numbered SATA port among the existing drives.
>
>
> For example, assume the bootable drive is connected to SATA port 2.
> The new drive should be connected to a higher-numbered SATA port.
> If there are only two SATA ports, then the bootable drive should be
> connected to the first SATA port.
>
> If the mainboard BIOS allows any SATA port to be selected for boot, and
> the bootable SATA port and drive are specified there, then it may boot
> from that drive. Up to now, I have not seen any BIOS that supplies such an
> ordering among SATA ports. Please check your BIOS for such a feature. If
> it is present you may use it; otherwise it is necessary to reconnect the
> SATA cables.
>
>
> Thank you very much.
>
> Mehmet Erol Sanliturk
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>
Try the Dell Precision M6500 laptop, which has three SATA ports (two
internal, one external); its BIOS lets you select the boot drive.
On FreeBSD 9.3 the drives appear to be enumerated the same way,
independent of which one is the current boot drive.  Interesting...

From owner-freebsd-hackers@FreeBSD.ORG  Sun Sep 28 16:58:55 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3E8B33AA;
 Sun, 28 Sep 2014 16:58:55 +0000 (UTC)
Received: from mail-ob0-x236.google.com (mail-ob0-x236.google.com
 [IPv6:2607:f8b0:4003:c01::236])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id EE8EDBD6;
 Sun, 28 Sep 2014 16:58:54 +0000 (UTC)
Received: by mail-ob0-f182.google.com with SMTP id wo20so12040901obc.41
 for <multiple recipients>; Sun, 28 Sep 2014 09:58:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=kcrj/Xy9yspD6DDojEZ2azOdTrCs/izFtpixSBvH/kI=;
 b=c/abJIpUpLQrxBNe69qLgCkhuR/2jSLb9CD2vSRPaaNDryN+otRq3qVChKrZRzHATS
 qn1WT9FUmowcOlHaWXZ23zTzOVYRhbw+XFRaEf6y1UgTDjb222NEdSFfOknXWytFYMLp
 XaY53kL2Q0+rrCLwsk7atfOiL+iP3aCK/e0rEG74GWJGL3LY99sEQS4+bhEtUOQ1PKHa
 Rjd5Y0zzM+I8jeSPbtOF6th6o9AXWGNv/RVMgN6QgHSkgrI+Y0tb3HNq1USgfO3CDcbL
 7YWoswNqTJqcmSiiXROaiPFrGvt46Q8sYyEulqjobiqUv7kMApvYusRs93TpUNXa0cX2
 ALoQ==
MIME-Version: 1.0
X-Received: by 10.182.191.39 with SMTP id gv7mr34383538obc.14.1411923534203;
 Sun, 28 Sep 2014 09:58:54 -0700 (PDT)
Received: by 10.202.188.84 with HTTP; Sun, 28 Sep 2014 09:58:54 -0700 (PDT)
In-Reply-To: <CALCNsJRSEWF2q=YaEc=f-BvwaANHV7F7L2WsMWNaHV=33KY_Qg@mail.gmail.com>
References: <CALCNsJRkCEtRvJL1MMNpmeizjgqmkFCFQTvpnTLXRBxOBQHyJA@mail.gmail.com>
 <5408938E.5020005@yandex.ru>
 <CALCpEUH=HgTHojEfFT+iAgN5oN+517wKdVdT+jmKc29wNM6Jcw@mail.gmail.com>
 <CAJP=Hc9NF3p-M_sOts2rkczsoZW8D_gsQDJQNSGGyDLym0rOxg@mail.gmail.com>
 <CALCNsJRSEWF2q=YaEc=f-BvwaANHV7F7L2WsMWNaHV=33KY_Qg@mail.gmail.com>
Date: Sun, 28 Sep 2014 09:58:54 -0700
Message-ID: <CAJP=Hc9UtaNZKjiV_dOCA6Pp-xU6omuGSvHJUZNG0M=bO4a6RQ@mail.gmail.com>
Subject: Re: IOAT driver for FreeBSD
From: Jim Harris <jim.harris@gmail.com>
To: Vijay Singh <vijju.singh@gmail.com>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: "freebsd-hackers@freebsd.org" <hackers@freebsd.org>,
 "Andrey V. Elsukov" <bu7cher@yandex.ru>, hiren panchasara <hiren@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 28 Sep 2014 16:58:55 -0000

On Fri, Sep 26, 2014 at 10:10 AM, Vijay Singh <vijju.singh@gmail.com> wrote:

> Jim, since the device IDs were changed, were there any changes to the
> descriptors for the DMA part?
>
>
Hi Vijay,

No changes.  The descriptor formats are the same.

-Jim


> =vijay
>
> On Thu, Sep 25, 2014 at 4:41 PM, Jim Harris <jim.harris@gmail.com> wrote:
>
>>
>>
>> On Tue, Sep 23, 2014 at 5:38 PM, hiren panchasara <hiren@freebsd.org>
>> wrote:
>>
>>> + Jim
>>>
>>> On Thu, Sep 4, 2014 at 9:30 AM, Andrey V. Elsukov <bu7cher@yandex.ru>
>>> wrote:
>>> > On 03.09.2014 20:59, Vijay Singh wrote:
>>> >> Hi All, I found some discussion in the past about this. Is there a
>>> version
>>> >> of such a driver that I can test, and hopefully help get committed?
>>> >
>>> > There was some work in
>>> >
>>> http://svnweb.freebsd.org/base/user/jimharris/ioat/sys/dev/ioat/
>>>
>>> Hi Jim,
>>>
>>> Whats the status of this user branch?
>>>
>>> cheers,
>>> Hiren
>>>
>>
>> This user branch is a couple of years old, but should not be too
>> difficult to bring forward to HEAD.  It only includes E5 v1 (Sandy Bridge
>> Xeon) device IDs so would need to be updated to include E5 v2 (Ivy Bridge)
>> and v3 (Haswell) device IDs.
>>
>> Note this driver only does DMA operations currently and is not plumbed
>> for other opcodes (XOR/P+Q, CRC, etc.)  But the general framework is there
>> to add code for the other opcodes.
>>
>> E5 v2 and v3 device IDs are pasted below.
>>
>> -Jim
>>
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB0 0x0e20
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB1 0x0e21
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB2 0x0e22
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB3 0x0e23
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB4 0x0e24
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB5 0x0e25
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB6 0x0e26
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB7 0x0e27
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB8 0x0e2e
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB9 0x0e2f
>>
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW0 0x2f20
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW1 0x2f21
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW2 0x2f22
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW3 0x2f23
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW4 0x2f24
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW5 0x2f25
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW6 0x2f26
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW7 0x2f27
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW8 0x2f2e
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW9 0x2f2f
>>
>>
>
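
For anyone bringing the branch forward, the ID update quoted above might
look roughly like the table lookup below.  This is only a user-space
sketch, not code from the user branch: ioat_device_ids[] and
ioat_id_match() are hypothetical names, and the only facts assumed
beyond the quoted list are Intel's PCI vendor ID (0x8086) and that a
driver probe routine compares vendor/device pairs in some similar way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_INTEL	0x8086	/* Intel's PCI vendor ID */

/* E5 v2 (Ivy Bridge) and E5 v3 (Haswell) IDs, copied from the quoted list. */
static const uint16_t ioat_device_ids[] = {
	0x0e20, 0x0e21, 0x0e22, 0x0e23, 0x0e24,
	0x0e25, 0x0e26, 0x0e27, 0x0e2e, 0x0e2f,
	0x2f20, 0x2f21, 0x2f22, 0x2f23, 0x2f24,
	0x2f25, 0x2f26, 0x2f27, 0x2f2e, 0x2f2f,
};

/* Compile-time check that all 20 quoted IDs made it into the table. */
static_assert(sizeof(ioat_device_ids) / sizeof(ioat_device_ids[0]) == 20,
    "expected 10 Ivy Bridge and 10 Haswell device IDs");

/*
 * Hypothetical helper: return 1 if the vendor/device pair is one of the
 * I/OAT channels listed above, 0 otherwise.
 */
static int
ioat_id_match(uint16_t vendor, uint16_t device)
{
	size_t i;

	if (vendor != PCI_VENDOR_ID_INTEL)
		return (0);
	for (i = 0; i < sizeof(ioat_device_ids) / sizeof(ioat_device_ids[0]); i++) {
		if (ioat_device_ids[i] == device)
			return (1);
	}
	return (0);
}
```

A flat table keeps the Sandy Bridge, Ivy Bridge and Haswell entries in
one place, so adding a later generation is a one-line change.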

From owner-freebsd-hackers@FreeBSD.ORG  Sun Sep 28 18:26:54 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id EE8B31E1
 for <hackers@freebsd.org>; Sun, 28 Sep 2014 18:26:53 +0000 (UTC)
Received: from nm27-vm5.access.bullet.mail.bf1.yahoo.com
 (nm27-vm5.access.bullet.mail.bf1.yahoo.com [216.109.115.228])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 9B72A63D
 for <hackers@freebsd.org>; Sun, 28 Sep 2014 18:26:53 +0000 (UTC)
Received: from [66.196.81.163] by nm27.access.bullet.mail.bf1.yahoo.com with
 NNFMP; 28 Sep 2014 18:24:23 -0000
Received: from [66.196.81.140] by tm9.access.bullet.mail.bf1.yahoo.com with
 NNFMP; 28 Sep 2014 18:24:23 -0000
Received: from [127.0.0.1] by omp1016.access.mail.bf1.yahoo.com with NNFMP;
 28 Sep 2014 18:24:23 -0000
X-Yahoo-Newman-Property: ymail-3
X-Yahoo-Newman-Id: 641230.67594.bm@omp1016.access.mail.bf1.yahoo.com
Received: (qmail 94540 invoked by uid 60001); 28 Sep 2014 18:24:23 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sbcglobal.net; s=s1024;
 t=1411928663; bh=q6rdIwwkZdgvEJSJFvrJ6+c2BjabNPeC37i2lrmLDaE=;
 h=References:Message-ID:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type;
 b=MiDOH3Ky2lbiqCctYGARtf2oNdGuAkttDkRp3JTa7tWCexjHig3R0gE+BLH4UsGQ3Mdqs9LH6+tyLIgJVOGwZgwwsBgvuOGy26pdy8amPyamayPVPtTL3O5Vxf1f5VQPvpTLET1znSoXSuXyqjY40U4Ql1Io/DejIPigYVaUGvM=
X-YMail-OSG: QkPYOewVM1nKrIQ4eIb3hMiSYzqrf9k0ROXFlkCvsTZT9lP
 lKYhko2TuTDIhGwoq3FOnsOdHV8ZVg7boX2nRi7eEerUMjM9bIkOiSZYff6O
 lVNQAu2MALUTzoN8gbMR529KK9oyfU9lWnFLkxkxrIzuusEIv7X5N8Xc5Vmk
 iSs1ku9MJORECCE2tC1yDOLLYA9sVlU1CvrvOKUtp.3uzpTmBccug6I8g17j
 Isc.metmw9yPhHcA5Ks8jXIO1rB3903UmRMl.pQz0h_XwBwvtxYsJHDuxJ5G
 awznfZEexQfx3Bmoa3CxM_eTmnNq6fAotqJYLimoI75490ZDomS6v_cM5cyn
 5E3FZ_qZZmedJdIdAl67ZA26t3VQkZbU1CZWDseJdXmE3RI8CJBgUTdz821B
 58PcO5GJa0sUqRGl8njR2IvK5ZkaJuf_jnoHpXS3cOGIWygtO_J32woRDXdN
 ogrfD_5RS5Ibz.sxupQDcT1uN1HnQq0blGk2nlEelUXUOeZjaftviMhy28zw
 Tqa3MPPxe1XQV5TzlhTmYX3n.GU.OghzchOFEZuCHLJEn8YWKthOxmOR1N7L
 8kJvxhQbb9A--
Received: from [162.239.0.170] by web180901.mail.ne1.yahoo.com via HTTP;
 Sun, 28 Sep 2014 11:24:22 PDT
X-Rocket-MIMEInfo: 002.001,
 Tm8sIEJJT1MgYm9vdCBpcyBjb21wbGV0ZWx5IGlycmVsZXZhbnQgdG8gdGhpcyBwcm9ibGVtLiAKCkl0IGxvb2tzIGxpa2UgSSBoYXZlIHRvIHJlLWFkZHJlc3MgdGhpcyBpc3N1ZSBkZWJhdGVkIDE1LTIwIHllYXJzIGFnby4KCkZpcnN0IG9mIGFsbCwgbGV0J3MgY2xlYXIgQklPUyBxdWVzdGlvbiBzbyB5b3Ugd2lsbCBub3QgYXJndWUgaXQgYWdhaW4uCkJJT1MgYm9vdCBzZXF1ZW5jZSBjYW4gYmUgc3BlY2lmaWVkIGluIGFueSBvcmRlciwgYW5kIGl0IGJvb3RzIGRlc2lyZWQgIGRyaXZlIGNvcnJlY3RseS4BMAEBAQE-
X-Mailer: YahooMailWebService/0.8.203.696
References: <1411851225.9364.YahooMailNeo@web180902.mail.ne1.yahoo.com>
 <CAOgwaMu71N0697+DUOJC7cy-Z3XenEGxwKLY+-q_LoMZLgPY6w@mail.gmail.com>
Message-ID: <1411928662.22540.YahooMailNeo@web180901.mail.ne1.yahoo.com>
Date: Sun, 28 Sep 2014 11:24:22 -0700
From: Jin Guojun <jguojun@sbcglobal.net>
Reply-To: Jin Guojun <jguojun@sbcglobal.net>
Subject: Re: Inproper ada# assignment in 10-BETA2
To: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
In-Reply-To: <CAOgwaMu71N0697+DUOJC7cy-Z3XenEGxwKLY+-q_LoMZLgPY6w@mail.gmail.com>
MIME-Version: 1.0
X-Mailman-Approved-At: Sun, 28 Sep 2014 18:53:10 +0000
Content-Type: text/plain; charset=us-ascii
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: "hackers@freebsd.org" <hackers@freebsd.org>,
 questions freebsd <questions@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 28 Sep 2014 18:26:54 -0000

No, BIOS boot is completely irrelevant to this problem.

It looks like I have to revisit an issue that was debated 15-20 years ago.

First of all, let's settle the BIOS question so it does not come up again.
The BIOS boot sequence can be specified in any order, and it boots the desired drive correctly.

If it did not, 10.1-BETA2 would not be loaded and we would not see this problem, period.

After 10.1-BETA2 is loaded and booted, it enumerates the drive(s) dynamically to assign an ID to each ada or da node.

The kernel clearly knows which drive is ad1, 2, 3, ..., but it does not assign the proper ID to the existing drive(s) for the ada and da nodes.

That is, the ad node IDs are still correct, but the ada and da node IDs are wrong.

Dynamic enumeration is presumably meant to allow moving a boot drive from one system to another,
or from one bus to another, without manually modifying fstab entries.
That is, the mechanism should ensure that, no matter where the drive is plugged in,
the boot drive is always enumerated as ID 0 or as the ID assigned at installation.

How can this be ensured? For the boot drive it is relatively easy: the boot drive is generally the first one,
so it should always be enumerated as ada0 or da0.
If the installation somehow assigned a non-zero drive ID, then generic enumeration applies.
Generic enumeration is a drive serial number (S#) based mechanism, which has been in use for at least two decades.
For example, if two drives are installed with S# AAAA and XXXXX (the boot drive),
then regardless of which SATA ports they reside on (say AAAA at ad9 and XXXXX at ad5),
we know the installation will likely name drive XXXXX ada0 and AAAA ada1.
With fixed assignment, drive XXXXX is ad5 and AAAA is ad9; when a new drive is inserted as ad0,
we know drive XXXXX will still be ad5 and boot should not fail.
But in the current 10.1-BETA2, the new drive will likely be ada0, drive XXXXX will be ada1, and AAAA will be ada2, so boot will fail.
If the new drive is inserted as ad8, drive XXXXX will remain ada0 but AAAA will become ada2.
Boot will succeed in this case, but mounting drive AAAA will fail.


S#-based enumeration records the S# for each device ID in a dynamic boot configuration file, which is used at boot time to determine which device ID should be assigned to each drive.
After the existing drives' IDs have been assigned, any new drive(s) are given unused IDs sequentially.
This ensures that existing drive(s) always get the device ID originally assigned to them, so mounting never fails no matter where a disk drive (with FreeBSD already installed) is plugged in.
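
To make the scheme concrete, here is a minimal user-space sketch of the
S#-to-unit mapping described above.  It is an illustration under stated
assumptions, not existing FreeBSD code: struct unit_map stands in for
the persistent boot configuration file, and unit_for_serial(),
MAX_DRIVES and SERIAL_LEN are hypothetical names and limits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DRIVES	8	/* hypothetical table size */
#define SERIAL_LEN	32	/* hypothetical maximum serial length + NUL */

/* Stand-in for the saved configuration: slot i holds the S# bound to unit i. */
struct unit_map {
	char serial[MAX_DRIVES][SERIAL_LEN];	/* "" means the unit is unused */
};

/*
 * Return the unit number for a drive serial.  A serial that was seen
 * before keeps its old unit; a new serial takes the lowest unused unit.
 * Returns -1 if the serial is empty or too long, or the table is full.
 */
static int
unit_for_serial(struct unit_map *map, const char *serial)
{
	int i, free_slot = -1;

	assert(map != NULL && serial != NULL);
	if (serial[0] == '\0' || strlen(serial) >= SERIAL_LEN)
		return (-1);
	for (i = 0; i < MAX_DRIVES; i++) {
		if (strcmp(map->serial[i], serial) == 0)
			return (i);		/* known drive: same unit as before */
		if (map->serial[i][0] == '\0' && free_slot == -1)
			free_slot = i;		/* remember the first unused unit */
	}
	if (free_slot != -1)
		strcpy(map->serial[free_slot], serial);	/* bind the new drive */
	return (free_slot);
}
```

With a zeroed map, looking up XXXXX and then AAAA yields units 0 and 1.
If a third drive is later probed first, it gets unit 2, while XXXXX and
AAAA still resolve to 0 and 1, whatever port they are attached to.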


Hopefully this explains how dynamic enumeration should operate.
In the old days we had a mechanism to switch from dynamic enumeration to fixed assignment, but I do not know whether this mechanism still exists in 10.x-R.

Because it looks like the current developers are not aware of this concept,
I am going to open a bug to track this problem again soon, unless I hear that there is a workaround.


From owner-freebsd-hackers@FreeBSD.ORG  Sun Sep 28 21:05:35 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 49B97A0B
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 21:05:35 +0000 (UTC)
Received: from mx1.scaleengine.net (beauharnois2.bhs1.scaleengine.net
 [142.4.218.15]) by mx1.freebsd.org (Postfix) with ESMTP id 244A780F
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 21:05:34 +0000 (UTC)
Received: from [172.16.0.55] (unknown [92.247.20.226])
 (Authenticated sender: allanjude.freebsd@scaleengine.com)
 by mx1.scaleengine.net (Postfix) with ESMTPSA id 1DBB3545ED
 for <freebsd-hackers@freebsd.org>; Sun, 28 Sep 2014 21:05:26 +0000 (UTC)
Message-ID: <54287814.9000207@freebsd.org>
Date: Sun, 28 Sep 2014 17:05:24 -0400
From: Allan Jude <allanjude@freebsd.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.1.0
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Subject: Re: Inproper ada# assignment in 10-BETA2
References: <1411851225.9364.YahooMailNeo@web180902.mail.ne1.yahoo.com>
 <CAOgwaMu71N0697+DUOJC7cy-Z3XenEGxwKLY+-q_LoMZLgPY6w@mail.gmail.com>
 <1411928662.22540.YahooMailNeo@web180901.mail.ne1.yahoo.com>
In-Reply-To: <1411928662.22540.YahooMailNeo@web180901.mail.ne1.yahoo.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 28 Sep 2014 21:05:35 -0000

On 09/28/2014 14:24, Jin Guojun wrote:
> No, BIOS boot is completely irrelevant to this problem. 
> 
> It looks like I have to re-address this issue debated 15-20 years ago.
> 
> First of all, let's clear BIOS question so you will not argue it again.
> BIOS boot sequence can be specified in any order, and it boots desired  drive correctly. 
> 
> If not, 10.1-BETA2 will not be loaded, and we will not see this problem, period.
> 
> After 10.1-BETA2 is loaded and booted, it enumerates drive(s) dynamically to assign ID to ada or da node. 
> 
> Kernel clearly knows which drive is ad1, 2, 3, ..., but it does not assigned proper ID to existing drive(s) for ada and da nodes. 
> 
> That is, ad note IDs are still correct, but ada and da node IDs are wrong.
> 
> The dynamic enumeration is likely used for moving a boot drive from on system to another
> or from one bus to another without manually modifying fstab entries.
> That is, this mechanism wants to ensure no matter where this drive is plugged in, 
> 
> Boot drive should be always enumerated as ID 0 or the ID installation assigned to.
> 
> How to ensure this? For boot drive, this is relatively easy -- the boot drive is always this first one in general, 
> 
> so this drive should always enumerated as ada0 or da0.
> If installation has assigned drive ID to not 0 somehow, then generic enumeration apply.
> Generic enumeration is drive serial number (S#) based enumeration mechanism, which has been used for at least two decades. 
> For example, if two drives installed and their S# are AAAA and XXXXX (boot drive),
> regardless what SATA port they resides at -- AAAA at ad9 and XXXXX at ad5,
> We knew installation will likely name drive XXXXX as ada0 and AAAA as ada1. 
> In fixed fashion, drive XXXXX is ad5 and AAAA is ad9, when a new drive is inserted as ad0, 
> 
> we knew drive XXXXX will be still ad5 and boot should not fail.
> But in current 10.1-BETA2, the new drive is likely will be ada0, drive XXXXX will be ada1, and AAAA will be ada2, then boot will fail.
> In case if new drive is inserted as ad8, drive XXXXX will remain as ada0 but AAAA will be ada21.
> Even though boot will succeed in this case, but mounting drive AAAA will fail.
> 
> 
> The S#-based enumeration will record the S# for corresponding device ID in a dynamic boot configuration file, which is used in boot time to determine what device ID should be assigned to each drive.
> After existing drive ID has been enumerated, any new drive(s) will be given a unused ID sequentially.
> This ensures that existing drive(s) will always get device ID originally assigned to, so the disk mounting operation will never fail no matter where a disk drive (has FreeBSD already installed) is plugged in.
> 
> 
> Hopefully, this explains what is correct the dynamic enumeration operation.
> In old time, we have a mechanisms to alter the dynamic enumeration to fixed one, but I do not know if this mechanism is still in 10.x-R.
> 
> Because it looks like that current developers have no knowledge about this concept,
> I am going to open a bug to track this problem again soon, unless I will hear if we have a work around for this problem.
> 

The correct solution is probably to use the diskid (/dev/diskid/<blah>)
or a label (gpt label, ufs label, glabel) so the device name doesn't
matter so much.

My understanding is that with the new 'ada' device names, the devices are
named sequentially as they are found.

-- 
Allan Jude

From owner-freebsd-hackers@FreeBSD.ORG  Mon Sep 29 15:27:55 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 10684EB8;
 Mon, 29 Sep 2014 15:27:55 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id DE899AA3;
 Mon, 29 Sep 2014 15:27:54 +0000 (UTC)
Received: from ralph.baldwin.cx (pool-173-70-85-31.nwrknj.fios.verizon.net
 [173.70.85.31])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 86F91B921;
 Mon, 29 Sep 2014 11:27:53 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Subject: Re: Change uma_mtx to rwlock
Date: Mon, 29 Sep 2014 11:27:16 -0400
Message-ID: <1458140.gGPpU3NGiG@ralph.baldwin.cx>
User-Agent: KMail/4.12.5 (FreeBSD/10.1-BETA2; KDE/4.12.5; amd64; ; )
In-Reply-To: <CAMo0n6Q=P5H3+Cqr8KjFRVLvWHZfJYnROVe4xF3DmKw95D+5zQ@mail.gmail.com>
References: <CAMo0n6Q=P5H3+Cqr8KjFRVLvWHZfJYnROVe4xF3DmKw95D+5zQ@mail.gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Mon, 29 Sep 2014 11:27:53 -0400 (EDT)
Cc: jeff@freebsd.org, Bryan Venteicher <bryanv@daemoninthecloset.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Sep 2014 15:27:55 -0000

On Saturday, September 27, 2014 07:59:47 PM Bryan Venteicher wrote:
> Hi,
> 
> I'd appreciate some comments attached patch that changes the uma_mtx to a
> rwlock.
> 
> At $JOB, we have machines with ~400GB RAM, with much of that being
> allocated through UMA zones. We've observed that timeouts were sometimes
> unexpectedly delayed by a half second or more. We tracked one of the
> reasons for this down to when the page daemon was running, calling
> uma_reclaim() -> zone_foreach(). zone_foreach() holds the uma_mtx while
> zone_drain()'ing each zone. If uma_timeout() fires, it will block on the
> uma_mtx when it tries to zone_timeout() each zone.

The only nit I see is in zone_drain_wait().  It would be nice to not need the 
hack of checking for a read or write lock and just require the one it actually 
needs depending on the callers.

However, checking the code in HEAD, this appears to just be broken.  
Specifically, zone_drain_wait() is called in two places:

void
zone_drain(uma_zone_t zone)
{

	zone_drain_wait(zone, M_NOWAIT);
}

...


static void
zone_dtor(void *arg, int size, void *udata)
{
	...
	mtx_lock(&uma_mtx);
	LIST_REMOVE(zone, uz_link);
	mtx_unlock(&uma_mtx);
	/*
	 * XXX there are some races here where
	 * the zone can be drained but zone lock
	 * released and then refilled before we
	 * remove it... we dont care for now
	 */
	zone_drain_wait(zone, M_WAITOK);
	...
}

Neither one calls it with the uma_mtx locked!  This appears to have been 
broken since that function was introduced in r187681.

I think it might be best to first remove the unlock/lock of uma_mtx from 
zone_drain_wait() (so it can be MFC'd).  That then simplifies that one part of 
your patch (which I think is otherwise fine).

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Mon Sep 29 16:02:46 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id BE03BD8E
 for <freebsd-hackers@freebsd.org>; Mon, 29 Sep 2014 16:02:46 +0000 (UTC)
Received: from mail-ie0-f175.google.com (mail-ie0-f175.google.com
 [209.85.223.175])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 89DE9E85
 for <freebsd-hackers@freebsd.org>; Mon, 29 Sep 2014 16:02:46 +0000 (UTC)
Received: by mail-ie0-f175.google.com with SMTP id y20so5245960ier.6
 for <freebsd-hackers@freebsd.org>; Mon, 29 Sep 2014 09:02:40 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc:content-type;
 bh=yvjtphGDvZfVVOsmICWNIaPe3hXvF3kMutmC97jup6A=;
 b=dOgTcHDJVhWh+Gm459tDUhA93CixaMuISHaveaTcPilCQFZn1dL0vCtCAGrPg6yz3c
 sqHYdZKTYM0mdqbf0w1PrJ5HPB+vgJytzsFVxNDRRWwALkyohZ+RQhlB/MihDqTpiOGm
 HakS0B/cUckVVouE8UDFfITYzKfYjq+oCF+H6LD6ngcM2L9/FRXRfhksG8C3bP63P9Qf
 pGV6IHG8g6uCEx1ocp0d06VlW7sKXtK/gPMjInux4QjdSmNwjIuVjt9x7vpFdOKja6QZ
 qCdX834H1sCXQe+n/ucfaryyck9vjdbKT8A3+CThvPI3wz6H9c3Pf7hyE+Cn/K64El5f
 z5wA==
X-Gm-Message-State: ALoCoQlho7pTdnd49PbMcKmG0cA5hrBvx108aToyl2cjvBtGTjwyVlSw+nTAtBfmmGrxdEw6euMU
X-Received: by 10.50.109.228 with SMTP id hv4mr36301701igb.13.1412006560276;
 Mon, 29 Sep 2014 09:02:40 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.107.9.67 with HTTP; Mon, 29 Sep 2014 09:02:20 -0700 (PDT)
X-Originating-IP: [216.240.30.23]
In-Reply-To: <1458140.gGPpU3NGiG@ralph.baldwin.cx>
References: <CAMo0n6Q=P5H3+Cqr8KjFRVLvWHZfJYnROVe4xF3DmKw95D+5zQ@mail.gmail.com>
 <1458140.gGPpU3NGiG@ralph.baldwin.cx>
From: Bryan Venteicher <bryanv@daemoninthecloset.org>
Date: Mon, 29 Sep 2014 11:02:20 -0500
Message-ID: <CAMo0n6SF5KUefTigP=QxEaCKeeMf6Mav_S2pEeqob3xmmNn=1w@mail.gmail.com>
Subject: Re: Change uma_mtx to rwlock
To: John Baldwin <jhb@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 jeff@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Sep 2014 16:02:46 -0000

On Mon, Sep 29, 2014 at 10:27 AM, John Baldwin <jhb@freebsd.org> wrote:

> On Saturday, September 27, 2014 07:59:47 PM Bryan Venteicher wrote:
> > Hi,
> >
> > I'd appreciate some comments attached patch that changes the uma_mtx to a
> > rwlock.
> >
> > At $JOB, we have machines with ~400GB RAM, with much of that being
> > allocated through UMA zones. We've observed that timeouts were sometimes
> > unexpectedly delayed by a half second or more. We tracked one of the
> > reasons for this down to when the page daemon was running, calling
> > uma_reclaim() -> zone_foreach(). zone_foreach() holds the uma_mtx while
> > zone_drain()'ing each zone. If uma_timeout() fires, it will block on the
> > uma_mtx when it tries to zone_timeout() each zone.
>
> The only nit I see is in zone_drain_wait().  It would be nice to not need
> the hack of checking for a read or write lock and just require the one it
> actually needs depending on the callers.
>
> However, checking the code in HEAD, this appears to just be broken.
> Specifically, zone_drain_wait() is called in two places:
>
> void
> zone_drain(uma_zone_t zone)
> {
>
>         zone_drain_wait(zone, M_NOWAIT);
> }
>
> ...
>
>
> static void
> zone_dtor(void *arg, int size, void *udata)
> {
>         ...
>         mtx_lock(&uma_mtx);
>         LIST_REMOVE(zone, uz_link);
>         mtx_unlock(&uma_mtx);
>         /*
>          * XXX there are some races here where
>          * the zone can be drained but zone lock
>          * released and then refilled before we
>          * remove it... we dont care for now
>          */
>         zone_drain_wait(zone, M_WAITOK);
>         ...
> }
>
> Neither one calls it with the uma_mtx locked!  This appears to have been
> broken since that function was introduced in r187681.
>
>
Indeed. I had noticed and mentioned that when I sent this patch to jeff@ a
few months ago:

      When zone_dtor() calls zone_drain_wait(), should it hold the
      uma_{mtx,rwlock}? Can the zone not be in the DRAINING state at this
      point? Similarly, does the while draining loop in zone_drain_wait()
      then take the uma_mtx and the zone lock out of order after the
      msleep().

But I was just trying to clear out my queue a bit and hadn't looked at the
HEAD UMA in a while, so I was going to double-check that later.

> I think it might be best to first remove the unlock/lock of uma_mtx from
> zone_drain_wait() (so it can be MFC'd).  That then simplifies that one
> part of your patch (which I think is otherwise fine).
>
>
I'll try to get a review started in Phabricator soon.


> --
> John Baldwin
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>

From owner-freebsd-hackers@FreeBSD.ORG  Mon Sep 29 21:53:48 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id D20EDEFC;
 Mon, 29 Sep 2014 21:53:48 +0000 (UTC)
Received: from dmz-mailsec-scanner-4.mit.edu (dmz-mailsec-scanner-4.mit.edu
 [18.9.25.15])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 4797AD19;
 Mon, 29 Sep 2014 21:53:47 +0000 (UTC)
X-AuditID: 1209190f-f79aa6d000005b45-d8-5429d3b72ffe
Received: from mailhub-auth-3.mit.edu ( [18.9.21.43])
 (using TLS with cipher AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by dmz-mailsec-scanner-4.mit.edu (Symantec Messaging Gateway) with SMTP id
 B4.C9.23365.7B3D9245; Mon, 29 Sep 2014 17:48:39 -0400 (EDT)
Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11])
 by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id s8TLmcxG002935;
 Mon, 29 Sep 2014 17:48:39 -0400
Received: from multics.mit.edu (system-low-sipb.mit.edu [18.187.2.37])
 (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU)
 by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id s8TLmaVo031029
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
 Mon, 29 Sep 2014 17:48:38 -0400
Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308)
 id s8TLmarR012072; Mon, 29 Sep 2014 17:48:36 -0400 (EDT)
Date: Mon, 29 Sep 2014 17:48:36 -0400 (EDT)
From: Benjamin Kaduk <bjk@freebsd.org>
X-X-Sender: kaduk@multics.mit.edu
To: FreeBSD Current <freebsd-current@freebsd.org>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: Call for FreeBSD 2014Q3 (July-September) Status Reports
In-Reply-To: <CAPyFy2DaVMqgcMGax+cJQhM1R-4+GoBv6Ms2qviG-Os0-VPgow@mail.gmail.com>
Message-ID: <alpine.GSO.1.10.1409291745430.17516@multics.mit.edu>
References: <CAPyFy2DaVMqgcMGax+cJQhM1R-4+GoBv6Ms2qviG-Os0-VPgow@mail.gmail.com>
User-Agent: Alpine 1.10 (GSO 962 2008-03-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrJIsWRmVeSWpSXmKPExsUixCmqrbv9smaIwYcL6ha7rp1mt5jz5gOT
 xfbN/xgdmD1mfJrPEsAYxWWTkpqTWZZapG+XwJWxteUZc8EMnorn21+wNTD+5Oxi5OSQEDCR
 ePz5GDuELSZx4d56ti5GLg4hgdlMEh82HGWHcDYySuy9socZpEpI4BCTRO8nZYhEA6PEuo09
 jCAJFgFtiaUXP4KNYhNQk3i8t5kVYqyixOZTk8CaRQTKJb42ngGrFxZwkbj2vZcFxOYUCJRY
 t2MqWJxXwFFibkcnG8SyAImnUyeCzREV0JFYvX8KC0SNoMTJmU/AbGYBLYnl07exTGAUnIUk
 NQtJagEj0ypG2ZTcKt3cxMyc4tRk3eLkxLy81CJdE73czBK91JTSTYzgUJXk38H47aDSIUYB
 DkYlHl6OFRohQqyJZcWVuYcYJTmYlER5353QDBHiS8pPqcxILM6ILyrNSS0+xCjBwawkwiu3
 AyjHm5JYWZValA+TkuZgURLn3fSDL0RIID2xJDU7NbUgtQgmK8PBoSTB++ESUKNgUWp6akVa
 Zk4JQpqJgxNkOA/Q8MUgNbzFBYm5xZnpEPlTjLoc6zq/9TMJseTl56VKifM+uwhUJABSlFGa
 BzcHlmJeMYoDvSXMuxtkFA8wPcFNegW0hAloSdoGdZAlJYkIKakGRv6J2dfuyW3sUJXx1Et+
 GtN3ZDrTL7OTh7dE3DCdOmVRv8KJEwcmrtfy3MOffdD+y36eaXsETm9oE74Y7Oh0wL4+JF36
 9ja5wmUHo/fMjr4iNV9ki/n9yhdKM7qyAtedMJL9FhX3wO+TSGbxTatAsYKpG1R7TtzgneGf
 sc7f/tHfWNHNJ/wZNyuxFGckGmoxFxUnAgDGX/ZLDAMAAA==
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Sep 2014 21:53:49 -0000

Reminder: the deadline for 2014Q3 status reports is just over a week away!

Thanks,
Ben (on behalf of monthly@)

On Tue, 2 Sep 2014, Ed Maste wrote:

> Dear FreeBSD Community,
>
> The deadline for the next FreeBSD Quarterly Status update is October 7,
> for work done in July through September.
>
> Status report submissions do not have to be very long.  They may be
> about anything happening in the FreeBSD project and community, and
> provide a great way to inform FreeBSD users and developers about what
> you're working on.  Submission of reports is not restricted to
> committers.  Anyone doing anything interesting and FreeBSD-related
> can -- and should -- write one!
>
> The preferred and easiest submission method is to use the XML
> generator [1] with the results emailed to the status report team at
> monthly@freebsd.org .  There is also an XML template [2] which can be
> filled out manually and attached if preferred.  For the expected
> content and style, please study our guidelines on how to write a good
> status report [3].  You can also review previous issues [4][5] for
> ideas on the style and format.
>
> We are looking forward to all of your 2014Q3 reports!
>
> Thanks,
> Ed (on behalf of monthly@)
>
>
> [1] http://www.freebsd.org/cgi/monthly.cgi
> [2] http://www.freebsd.org/news/status/report-sample.xml
> [3] http://www.freebsd.org/news/status/howto.html
> [4] http://www.freebsd.org/news/status/report-2014-01-2014-03.html
> [5] http://www.freebsd.org/news/status/report-2014-04-2014-06.html

From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep 30 08:44:13 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id E8618385
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 08:44:13 +0000 (UTC)
Received: from mail-ob0-x232.google.com (mail-ob0-x232.google.com
 [IPv6:2607:f8b0:4003:c01::232])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id B89982EA
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 08:44:13 +0000 (UTC)
Received: by mail-ob0-f178.google.com with SMTP id uy5so3600392obc.23
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 01:44:13 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:date:message-id:subject:from:to:content-type;
 bh=3lP4QPGe4MFYD2frX4yzVH2LxS2eFwY+oJYM7XacmWY=;
 b=ecvx/AR+jwJz2H6WxJa3vgTF/MKGvKa5Y8aGoTMR2dmWmMRabHuyUNkAOijhhAuGYi
 8S/QipAkEAwzZ46gtzx+xyrdFR41SqiAiizJGiDh+b9QDqLYQ7UwHvF4Me+bFC2vRfO8
 bRuOlmRl5JDrZBLhmDGkCy8PvB4K1WqSwI0yyg9zj73Ezi1nIbXAKaqspYm0XNZpBsmh
 E/vFEU/Pd3+ZVRwWr0ffL799NnM2e4hxUmNmPOjz2vj3zveQOS/lPEGu3mc+rlz6XHUs
 WRvciChDXTVa+brcLlZLvJU/GGE0SyfEHvqpZF9IGUP18vbQsnnEfZ0KN2VLflqfYYVf
 /6dA==
MIME-Version: 1.0
X-Received: by 10.60.133.228 with SMTP id pf4mr16917649oeb.38.1412066652951;
 Tue, 30 Sep 2014 01:44:12 -0700 (PDT)
Received: by 10.76.167.65 with HTTP; Tue, 30 Sep 2014 01:44:12 -0700 (PDT)
Date: Tue, 30 Sep 2014 01:44:12 -0700
Message-ID: <CAEOAkMV9WFZPwN=X4n_3rC2sc-YyZxkOEnbw-jdT5rWH_XvB+g@mail.gmail.com>
Subject: Textdump capture not generating "ddb.txt" when scripted via ddb
 utility
From: Shrikanth Kamath <shrikanth07@gmail.com>
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Sep 2014 08:44:14 -0000

I am experimenting with text dumps and using the ddb utility to script
the capture of debugging output when a panic is triggered. The problem
I am seeing is that ddb.txt is not generated; capture does not appear
to be turned on when the script is installed via the ddb utility.

I am doing the following:

% /sbin/ddb script kdb.enter.panic="textdump set; capture on; show pcpu; bt; ps; alltrace; capture off; reset"
% sysctl debug.ddb.textdump.pending=1
debug.ddb.textdump.pending: 0 -> 1

I drop to the debugger and trigger a panic, which promptly generates
the text dump, but the dump contains only the following files:

% tar -xvf textdump.tar.1
x msgbuf.txt
x panic.txt
x version.txt

The ddb.txt is not generated. But if I drop to the debugger and do the
following after doing the above scripting,

db> capture on
db> show allpcpu
db> capture off

I am able to see the ddb.txt after triggering a panic.

The question is why the script installed with /sbin/ddb does not put
"capture on" into effect when set from the command line. Am I missing
any steps? Here are my settings:

% sysctl -a | grep ddb

debug.ddb.capture.data:
debug.ddb.capture.bufsize: 49152
debug.ddb.capture.inprogress: 0
debug.ddb.capture.maxbufsize: 5242880
debug.ddb.capture.bufoff: 20523
debug.ddb.scripting.unscript:
debug.ddb.scripting.scripts: kdb.enter.panic=textdump set; capture on;
show pcpu; bt; ps; alltrace; capture off; reset

debug.ddb.textdump.do_version: 1
debug.ddb.textdump.do_panic: 1
debug.ddb.textdump.do_msgbuf: 1
debug.ddb.textdump.do_ddb: 1
debug.ddb.textdump.pending: 1
debug.ddb_use_printf: 0
debug.kdb.current: ddb
debug.kdb.available: ddb gdb ndb


This is in a FreeBSD 10 environment.

--
Shrikanth R K

From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep 30 08:46:00 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 5C8EB525
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 08:46:00 +0000 (UTC)
Received: from eu1sys200aog122.obsmtp.com (eu1sys200aog122.obsmtp.com
 [207.126.144.153])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id AD90A313
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 08:45:59 +0000 (UTC)
Received: from mail-wg0-f45.google.com ([74.125.82.45]) (using TLSv1) by
 eu1sys200aob122.postini.com ([207.126.147.11]) with SMTP
 ID DSNKVCptrOGz+YiV7QLEYC64AlgitTLiBo3F@postini.com;
 Tue, 30 Sep 2014 08:45:59 UTC
Received: by mail-wg0-f45.google.com with SMTP id m15so1894770wgh.4
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 01:45:32 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:message-id:to:subject:reply-to;
 bh=aIAgGXCBOfQPwcoIfRIcXOlf7W9LSb9IHmKVtMTqgb8=;
 b=XqRIp5AvZZzeljaL89PaA6uU2nVtgudIWu/XNFHKZp0i7BINrlhg36Bv859t4N9rNR
 AyLwom8Ss6ZrjT1RwpJyzDpzMk0l+AhW69gJ+/kxWqOIA3CT7SEzAqm4FcKv3RdHxzjZ
 ycoiIqgtrt8UsGBtIIQ1MaJwfUdPIFCA/vEB+OPmyWDIr/+X48SsN1oMR7Hi9RTUmL8Y
 f1bPt8XvGJTdTiz+Vr2BJXILRVujov02HKspMpyh9OHdRy4HNxJr08OqrZx4rqYMu3Ne
 hTlrM71RKDGQEl/x88TfZCq87Y6vEBG/RHb9xvfmjIMHUVpWilPtG44yEA2IljNvRJHI
 2fgw==
X-Gm-Message-State: ALoCoQlaL6/zoXYMf13xL5CFLYSNYsxNR9w0jawPXJRW9Cw6w9KQFpS71KAEavtr3iUSU2oDriNjsXFtT97LYlRoumJI/yYBf4aJaG/2WVt70FIXl/SGtbYHCH1TVloJi6e+RFXmAQTdD9G1NsnvzoZVh4Tn62ZZ9g==
X-Received: by 10.180.97.98 with SMTP id dz2mr4058270wib.26.1412066732621;
 Tue, 30 Sep 2014 01:45:32 -0700 (PDT)
X-Received: by 10.180.97.98 with SMTP id dz2mr4058253wib.26.1412066732511;
 Tue, 30 Sep 2014 01:45:32 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk.
 [137.222.187.221])
 by mx.google.com with ESMTPSA id t1sm14384135wiy.8.2014.09.30.01.45.31
 for <freebsd-hackers@freebsd.org>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 30 Sep 2014 01:45:32 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1])
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s8U8jUYU079242
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO)
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 09:45:30 +0100 (BST)
 (envelope-from mexas@mech-as221.men.bris.ac.uk)
Received: (from mexas@localhost)
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s8U8jUTa079241
 for freebsd-hackers@freebsd.org; Tue, 30 Sep 2014 09:45:30 +0100 (BST)
 (envelope-from mexas)
Date: Tue, 30 Sep 2014 09:45:30 +0100 (BST)
From: Anton Shterenlikht <mexas@bris.ac.uk>
Message-Id: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk>
To: freebsd-hackers@freebsd.org
Subject: cluster FS?
Reply-To: mexas@bristol.ac.uk
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Sep 2014 08:46:00 -0000

Hello

Not sure if this is the right list...
I wanted to ask about a cluster file system.
Is there something like this on FreeBSD?

It seems to me (just from reading the handbook)
that none of NFS, HAST or iSCSI provide this.

My specific needs are as follows.
I have multiple nodes and a disk array.
Each node is connected by fibre to the disk array.
I want each node to have read/write access
to all disks on the disk array.
So that if any node fails, the
data is still accessible
via the remaining nodes.

I want to have all nodes equal, i.e. no master/slave
or server/client model. Also, the disk array
provides adequate RAID already, so that is not
needed either.

In the archives I see that demand for
cluster FS support on FreeBSD has been expressed
periodically over a very long time, but it seems
there has never been any resolution.
Some people mention GFS, but I've no idea
if this is what I'm trying to describe.

So is what I'm describing a cluster FS at all?
Is there something like this on FreeBSD already?
Is there something in ports that can be used
to achieve this?

Thanks

Anton


From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep 30 11:19:26 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id CB544D16;
 Tue, 30 Sep 2014 11:19:26 +0000 (UTC)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id AB3B195B;
 Tue, 30 Sep 2014 11:19:25 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA21479;
 Tue, 30 Sep 2014 14:19:23 +0300 (EEST)
 (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1XYvSl-000G91-7i; Tue, 30 Sep 2014 14:19:23 +0300
Message-ID: <542A916A.2060703@FreeBSD.org>
Date: Tue, 30 Sep 2014 14:18:02 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.1.2
MIME-Version: 1.0
To: freebsd-hackers@FreeBSD.org
Subject: uk_slabsize, uk_ppera, uk_ipers, uk_pages
Content-Type: text/plain; charset=X-VIET-VPS
Content-Transfer-Encoding: 8bit
Cc: Gleb Smirnoff <glebius@FreeBSD.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Sep 2014 11:19:27 -0000


I have a hard time understanding how to use uk_slabsize, uk_ppera, uk_ipers,
uk_pages to derive other useful characteristics of UMA kegs.  This is despite
the good descriptions of the fields and multiple examples of their usage in the
code.  Unfortunately, I find those examples to be at least inconsistent and
possibly contradictory.

The first problem is quite obvious.  uk_slabsize has too narrow a type.  For
example, ZFS creates many zones with item sizes larger than 64KB.  So,
obviously, uk_slabsize overflows.  Not sure how that affects further
calculation, if any, but probably not in a good way.
On the other hand, there is probably no harm at all, because as far as I can see
uk_slabsize is actually used only within keg_small_init().  It is set but not
used in keg_large_init() and keg_cachespread_init().  It does not seem to be
used after initialization.  So, maybe this field could be just converted to a
local variable in keg_small_init() ?
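
To make the truncation concrete, here is a minimal user-space sketch, not
kernel code.  It assumes the field is 16 bits wide, which is what the 64KB
remark above implies; struct keg_model and stored_slabsize() are hypothetical
names used only for illustration.

```c
#include <stdint.h>

/*
 * Assumption: the slab-size field is 16 bits wide.  This is a
 * stand-alone model, not the kernel's keg structure.
 */
struct keg_model {
	uint16_t slabsize;	/* stand-in for uk_slabsize */
};

/* Store a byte count in the narrow field and return what survives. */
static uint32_t
stored_slabsize(uint32_t bytes)
{
	struct keg_model k;

	k.slabsize = (uint16_t)bytes;	/* keeps only the low 16 bits */
	return (k.slabsize);
}
```

Under that assumption, stored_slabsize(4096) comes back unchanged, while
stored_slabsize(65536) comes back as 0 and a 128KB + 512 byte size comes
back as 512.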

Now a second problem.  Even the names uk_ppera (pages per allocation) and
uk_ipers (items per slab) leave some room for ambiguity.  What is the relation
between the allocation and the slab?  It seems that some code assumes that the
slab takes the whole allocation (that is, one slab per allocation), other code
places multiple slabs into a single allocation, while other code looks
inconsistent in this respect.

For instance:
static void
keg_drain(uma_keg_t keg)
{
...
                LIST_REMOVE(slab, us_link);
                keg->uk_pages -= keg->uk_ppera;
                keg->uk_free -= keg->uk_ipers;

A slab is freed.  There is no question about uk_free.  But it is clear that the
code assumes the slab takes a whole allocation.  keg_alloc_slab() is symmetric
with these stats.

int
uma_zone_set_max(uma_zone_t zone, int nitems)
{
        uma_keg_t keg;

        keg = zone_first_keg(zone);
        if (keg == NULL)
                return (0);
        KEG_LOCK(keg);
        keg->uk_maxpages = (nitems / keg->uk_ipers) * keg->uk_ppera;
        if (keg->uk_maxpages * keg->uk_ipers < nitems)
                keg->uk_maxpages += keg->uk_ppera;
        nitems = keg->uk_maxpages * keg->uk_ipers;
        KEG_UNLOCK(keg);

        return (nitems);
}

The uk_maxpages calculation seems to assume that the allocation and the slab are
the same.  We first calculate the number of slabs needed to hold nitems and then
multiply that number by the number of pages per allocation.  But when we
calculate nitems to be returned we simply multiply uk_maxpages by uk_ipers as if
we assume that the slab size is 1 page regardless of uk_ppera.
uma_zone_get_max() calculates nitems in the same way without taking uk_ppera
into account.
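
The mismatch can be reproduced with a small stand-alone model of the
arithmetic quoted above.  This is only a sketch: struct keg_model,
model_set_max() and model_items_by_slab() are hypothetical names, locking is
omitted, and the sample values are made up.  The second helper counts items
under the reading that one slab occupies one whole allocation.

```c
/* Hypothetical stand-in for the keg fields used in the calculation. */
struct keg_model {
	int ipers;	/* items per slab, like uk_ipers */
	int ppera;	/* pages per allocation, like uk_ppera */
	int maxpages;	/* page limit, like uk_maxpages */
};

/* Same arithmetic as the quoted uma_zone_set_max(), minus locking. */
static int
model_set_max(struct keg_model *keg, int nitems)
{

	keg->maxpages = (nitems / keg->ipers) * keg->ppera;
	if (keg->maxpages * keg->ipers < nitems)
		keg->maxpages += keg->ppera;
	return (keg->maxpages * keg->ipers);
}

/* Item capacity if every slab takes one whole allocation of ppera pages. */
static int
model_items_by_slab(const struct keg_model *keg)
{

	return ((keg->maxpages / keg->ppera) * keg->ipers);
}
```

For a keg with ipers = 4 and ppera = 2, model_set_max(&keg, 10) sets maxpages
to 4 and returns 16, while model_items_by_slab(&keg) gives only 8, fewer than
the 10 items requested.  With ppera = 1 both give 12.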

uma_print_keg(): out is calculated as
	(keg->uk_ipers * keg->uk_pages) - keg->uk_free
while limit is calculated as:
	(keg->uk_maxpages / keg->uk_ppera) * keg->uk_ipers
In one case we simply multiply the number of pages by uk_ipers, but in the other
case we first divide by uk_ppera.
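
A stand-alone sketch of the two formulas, with made-up numbers, shows how
they can disagree when uk_ppera is greater than 1.  struct keg_stats,
keg_out() and keg_limit() are hypothetical helpers, not functions from the
UMA source.

```c
/* Hypothetical snapshot of the keg counters used by the two formulas. */
struct keg_stats {
	int ipers;	/* items per slab, like uk_ipers */
	int ppera;	/* pages per allocation, like uk_ppera */
	int pages;	/* pages currently held, like uk_pages */
	int maxpages;	/* page limit, like uk_maxpages */
	int nfree;	/* free items, like uk_free */
};

/* "out" as quoted: pages are multiplied by items per slab directly. */
static int
keg_out(const struct keg_stats *k)
{

	return (k->ipers * k->pages - k->nfree);
}

/* "limit" as quoted: pages are first converted to allocations. */
static int
keg_limit(const struct keg_stats *k)
{

	return ((k->maxpages / k->ppera) * k->ipers);
}
```

With ipers = 4, ppera = 2, pages = 8, maxpages = 8 and nfree = 6, keg_out()
yields 26 while keg_limit() yields 16, so the reported number of items in use
exceeds the reported limit.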


My personal opinion is that we should settle on the rule that the slab and the
allocation map 1:1 and fix the code that does not conform to that.
It seems that this is how the code that allocates and frees slabs actually
works.  I do not see any good reason to support multiple slabs per allocation.


P.S.
By the way, we have some wonderful things in UMA code that are not used anymore
(if ever?) and are scarcely documented.  Perhaps some of those could be removed
to simplify the code:
- UMA_ZONE_CACHESPREAD
- uma_zsecond_add()
More generally it looks like the support for multiple zones using the same keg
is quite useful.  On the other hand the support for a zone using multiple kegs
is of questionable utility and at present that capability is not used.

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep 30 11:23:26 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 426FEE72
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 11:23:26 +0000 (UTC)
Received: from puchar.net (puchar.net [188.252.31.250])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "puchar.net", Issuer "puchar.net" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id C3DD9A15
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 11:23:25 +0000 (UTC)
Received: Received: from 127.0.0.1 (localhost [127.0.0.1])
 by puchar.net (8.14.9/8.14.9) with ESMTP id s8UB4XBN001014;
 Tue, 30 Sep 2014 13:04:35 +0200 (CEST)
 (envelope-from wojtek@puchar.net)
Date: Tue, 30 Sep 2014 13:04:34 +0200 (CEST)
From: Wojciech Puchar <wojtek@puchar.net>
X-X-Sender: wojtek@laptop
To: mexas@bristol.ac.uk
Subject: Re: cluster FS?
In-Reply-To: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk>
Message-ID: <alpine.BSF.2.00.1409301300350.864@laptop>
References: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (puchar.net [10.0.1.1]); Tue, 30 Sep 2014 13:04:35 +0200 (CEST)
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Sep 2014 11:23:26 -0000

>
> It seems to me (just from reading the handbook)
> that none of NFS, HAST or iSCSI provide this.

None of those are filesystems at all. NFS is remote access to a
filesystem; the rest present raw block devices.

> My specific needs are as follows.
> I have multiple nodes and a disk array.
> Each node is connected by fibre to the disk array.
> I want to have each node read/write access
> to all disks on disk array.
> So that if any node fails, the
> data is still accessible
> via the remaining nodes.

As the disk array presents block devices, not files, it is not possible to
have filesystem read/write access from more than one computer to the same
block device.
AFAIK there are no filesystems that can communicate between nodes to
synchronize state after writes and prevent conflicts.

> I want to have all nodes equal, i.e. no master/slave
> or server/client model. Also, the disk array
> provides adequate RAID already, so that is not

Instead of using disk arrays (expensive), it's better to run FreeBSD as a
file server with a good number of disks and good connectivity, and export
filesystems using e.g. NFS.

You may use any RAID type and any filesystem, not only more cheaply but with
extra security: the on-disk format is known and open, and you may access these
disks with any other computer running FreeBSD.

From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep 30 11:44:27 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id CAB6841E;
 Tue, 30 Sep 2014 11:44:27 +0000 (UTC)
Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "cell.glebius.int.ru", Issuer "cell.glebius.int.ru" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 47E2FC9E;
 Tue, 30 Sep 2014 11:44:26 +0000 (UTC)
Received: from cell.glebius.int.ru (localhost [127.0.0.1])
 by cell.glebius.int.ru (8.14.9/8.14.9) with ESMTP id s8UBiOOe074177
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Tue, 30 Sep 2014 15:44:24 +0400 (MSK)
 (envelope-from glebius@FreeBSD.org)
Received: (from glebius@localhost)
 by cell.glebius.int.ru (8.14.9/8.14.9/Submit) id s8UBiOfH074176;
 Tue, 30 Sep 2014 15:44:24 +0400 (MSK)
 (envelope-from glebius@FreeBSD.org)
X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to
 glebius@FreeBSD.org using -f
Date: Tue, 30 Sep 2014 15:44:24 +0400
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: Andriy Gapon <avg@FreeBSD.org>
Subject: Re: uk_slabsize, uk_ppera, uk_ipers, uk_pages
Message-ID: <20140930114424.GD73266@glebius.int.ru>
References: <542A916A.2060703@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <542A916A.2060703@FreeBSD.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: freebsd-hackers@FreeBSD.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Sep 2014 11:44:28 -0000

  Andriy,

On Tue, Sep 30, 2014 at 02:18:02PM +0300, Andriy Gapon wrote:
A> I have a hard time understanding how to use uk_slabsize, uk_ppera, uk_ipers,
A> uk_pages to derive other useful characteristics of UMA kegs.  This is despite
A> the good descriptions of the fields and multiple examples of their usage in the
A> code.  Unfortunately, I find those examples to be at least inconsistent and
A> possibly contradictory.
A> 
A> First problem is quite obvious.  uk_slabsize has a too narrow type.  For
A> example, ZFS creates many zones with item sizes larger than 64KB.  So,
A> obviously, uk_slabsize overflows.  Not sure how that affects further
A> calculation, if any, but probably not in a good way.
A> On the other hand, there is probably no harm at all, because as far as I can see
A> uk_slabsize is actually used only within keg_small_init().  It is set but not
A> used in keg_large_init() and keg_cachespread_init().  It does not seem to be
A> used after initialization.  So, maybe this field could be just converted to a
A> local variable in keg_small_init() ?

Nice observation. I bet that when I developed UMA_ZONE_PCPU this field was used
outside of keg_small_init(). It looks like uk_ipers and uk_pages are now all
that needs to be known outside of keg_small_init().

A> Now a second problem.  Even the names uk_ppera (pages per allocation) and
A> uk_ipers (items per slab) leave some room for ambiguity.  What is a relation
A> between the allocation and the slab?  It seems that some code assumes that the
A> slab takes the whole allocation (that is, one slab per allocation), other code
A> places multiple slabs into a single allocation, while other code looks
A> inconsistent in this respect.
<skip>
A> My personal opinion is that we should settle on the rule that the slab and the
A> allocation map 1:1 and fix the code that does not conform to that.
A> It seems that this is how the code that allocates and frees slabs actually
A> works.  I do not see any good reason to support multiple slabs per an allocation.

In case of UMA_ZONE_PCPU, the slab is ncpu times smaller than the allocation.

BUT, whenever you do uma_zalloc() you allocate not a single item, but ncpu items
at a time. That's why all statistics that you quoted work correctly. And that's
why we do not have a 1:1 mapping of slab and allocation, and why we need both
uk_ppera and uk_ipers.

A> P.S.
A> By the way, we have some wonderful things in UMA code that are not used anymore
A> (if ever?) and are scarcely documented.  Perhaps some of those could be removed
A> to simplify the code:
A> - UMA_ZONE_CACHESPREAD
A> - uma_zsecond_add()
A> More generally it looks like the support for multiple zones using the same keg
A> is quite useful.  On the other hand the support for a zone using multiple kegs
A> is of questionable utility and at present that capability is not used.

I second that.

-- 
Totus tuus, Glebius.

From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep 30 12:16:42 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 758A3166;
 Tue, 30 Sep 2014 12:16:42 +0000 (UTC)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 7EB25FF7;
 Tue, 30 Sep 2014 12:16:41 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA22062;
 Tue, 30 Sep 2014 15:16:40 +0300 (EEST)
 (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1XYwMB-000GCH-N7; Tue, 30 Sep 2014 15:16:39 +0300
Message-ID: <542A9EF0.3050405@FreeBSD.org>
Date: Tue, 30 Sep 2014 15:15:44 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.1.2
MIME-Version: 1.0
To: Gleb Smirnoff <glebius@FreeBSD.org>
Subject: Re: uk_slabsize, uk_ppera, uk_ipers, uk_pages
References: <542A916A.2060703@FreeBSD.org>
 <20140930114424.GD73266@glebius.int.ru>
In-Reply-To: <20140930114424.GD73266@glebius.int.ru>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Cc: freebsd-hackers@FreeBSD.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Sep 2014 12:16:42 -0000

On 30/09/2014 14:44, Gleb Smirnoff wrote:
>   Andriy,
> 
> On Tue, Sep 30, 2014 at 02:18:02PM +0300, Andriy Gapon wrote:
> A> Now a second problem.  Even the names uk_ppera (pages per allocation) and
> A> uk_ipers (items per slab) leave some room for ambiguity.  What is a relation
> A> between the allocation and the slab?  It seems that some code assumes that the
> A> slab takes the whole allocation (that is, one slab per allocation), other code
> A> places multiple slabs into a single allocation, while other code looks
> A> inconsistent in this respect.
> <skip>
> A> My personal opinion is that we should settle on the rule that the slab and the
> A> allocation map 1:1 and fix the code that does not conform to that.
> A> It seems that this is how the code that allocates and frees slabs actually
> A> works.  I do not see any good reason to support multiple slabs per an allocation.
> 
> In case of UMA_ZONE_PCPU, the slab is ncpu times smaller than the allocation.
> 
> BUT, whenever you do uma_zalloc() you allocate not a single item, but ncpu items
> at a time. That's why all statistics that you quoted work correctly.

This is not true for kegs with multi-page slabs.  Consider a zone with 8KB items
on a system with 4KB pages.  Its keg uses slabs that are two pages in size, so
uk_ppera is 2.  There is only one item per slab, so uk_ipers is 1.  Let's say
two slabs are allocated.  Then uk_pages is 4.  So uk_ipers * uk_pages would give
4, but in reality there are only two items.  The correct calculation must be
(uk_pages / uk_ppera) * uk_ipers.

If you have enough CPUs for a pcpu zone to use multi-page slabs / allocations,
then the above will also be applicable. Consider "64 pcpu" and 8 CPUs.  You have
uk_ppera = 2, uk_ipers = 128.  If there is only 1 "real" slab allocated that's 2
pages, so uk_pages * uk_ipers = 256, but in reality the correct number of
provided items is (uk_pages / uk_ppera) * uk_ipers = 128.
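
The two examples above can be checked with a small standalone sketch.  It is not
the UMA code: keg_items() and keg_items_naive() are hypothetical helpers that
take uk_pages, uk_ppera and uk_ipers as plain integers, and only the field names
come from the discussion.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of UMA: number of items provided by a keg,
 * given the values of uk_pages, uk_ppera and uk_ipers.
 */
static uint64_t
keg_items(uint64_t pages, uint64_t ppera, uint64_t ipers)
{
	assert(ppera > 0);
	/* pages / ppera is the number of slabs; each slab holds ipers items. */
	return ((pages / ppera) * ipers);
}

/* The naive product, which over-counts whenever uk_ppera is greater than 1. */
static uint64_t
keg_items_naive(uint64_t pages, uint64_t ipers)
{
	return (pages * ipers);
}
```

With the values from the two examples, keg_items(4, 2, 1) gives 2 and
keg_items(2, 2, 128) gives 128, while the naive product gives 4 and 256.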

BTW, it's a pity that you omitted the code that demonstrated the problem from
the quote.

> And that's
> why we do not have 1:1 mapping of slab and allocation and we need to have
> uk_ppera and uk_ipers.

We do have a 1:1 mapping between allocations and "real" slabs.  The imaginary
"slabs" specific to pcpu zones do not affect how the keg code works.

We do need both uk_ppera and uk_ipers, of course, because one converts between
a number of pages and a number of slabs and the other converts between a number
of items and a number of slabs.

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep 30 12:27:02 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 491987E1
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 12:27:02 +0000 (UTC)
Received: from eu1sys200aog104.obsmtp.com (eu1sys200aog104.obsmtp.com
 [207.126.144.117])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 990B219F
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 12:27:00 +0000 (UTC)
Received: from mail-we0-f177.google.com ([74.125.82.177]) (using TLSv1) by
 eu1sys200aob104.postini.com ([207.126.147.11]) with SMTP
 ID DSNKVCqhjVqti65x8Jti6vtMegtU9eqHEklH@postini.com;
 Tue, 30 Sep 2014 12:27:01 UTC
Received: by mail-we0-f177.google.com with SMTP id k48so86720wev.36
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 05:26:53 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:message-id:to:subject:cc:reply-to
 :in-reply-to;
 bh=QHVwLQcm8vaQtesTYRfUewrr+/GJ5jrqDq0OEJ6qZMM=;
 b=kS5kAzo0UV91FYnnn35yLVONN1hMoGJSA8mXb4REp+DU6+oVrIoz8PN35K8IN4bXiI
 3vgxXAACP2ZI+7hUTMVDKWa8xgH2ejoaO4itRGgdF/DduY4GhJgeLGLXu77WD9JfrCUx
 AcCU6VLX/1FeF8Vsh++/uKOPKeFNnXPylCfqvpBBM+lhvPZBsoOWWkb4nyTalNUEYuVG
 q0bfLpTRsjsgqSg7/uF973QILNBxYqDjOwi0aqrtwUZ5zSwWtoFoImS/T7ew1dYiOI4z
 qhzl7dYeZcvh6DHVQZK12SWWaUK46KJAYE+OhS7xamye1nv4p7dratXlLpqKJdRGbTYn
 83Cw==
X-Received: by 10.180.76.100 with SMTP id j4mr5069383wiw.51.1412078276097;
 Tue, 30 Sep 2014 04:57:56 -0700 (PDT)
X-Gm-Message-State: ALoCoQnn/j1SlF70/8wtiB1P30wHK7w074Ee7wgVu2oMjTBapgpsxccw0cXUdjvJj4Y7S7gxI/wQQ2fYUW3pUS+s/s71r4/9jUng9fWblgGypBYXy6BtbhagHON7GC4uZMjGwmyPmBmOS8yw68Jv/CPE3ult/86IOQ==
X-Received: by 10.180.76.100 with SMTP id j4mr5069370wiw.51.1412078275989;
 Tue, 30 Sep 2014 04:57:55 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk.
 [137.222.187.221])
 by mx.google.com with ESMTPSA id cy10sm18993813wjb.21.2014.09.30.04.57.54
 for <multiple recipients>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 30 Sep 2014 04:57:55 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1])
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s8UBvrLT079813
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Tue, 30 Sep 2014 12:57:54 +0100 (BST)
 (envelope-from mexas@mech-as221.men.bris.ac.uk)
Received: (from mexas@localhost)
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s8UBvr8f079812;
 Tue, 30 Sep 2014 12:57:53 +0100 (BST) (envelope-from mexas)
Date: Tue, 30 Sep 2014 12:57:53 +0100 (BST)
From: Anton Shterenlikht <mexas@bris.ac.uk>
Message-Id: <201409301157.s8UBvr8f079812@mech-as221.men.bris.ac.uk>
To: mexas@bristol.ac.uk, wojtek@puchar.net
Subject: Re: cluster FS?
Reply-To: mexas@bristol.ac.uk
In-Reply-To: <alpine.BSF.2.00.1409301300350.864@laptop>
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Sep 2014 12:27:02 -0000

>From wojtek@puchar.net Tue Sep 30 12:14:35 2014
>
>as disk array presents block devices, not files it is not possible to have 
>filesystem read write access with more than one computer to the same block 
>device.
>There is no AFAIK filesystems that can communicate between nodes to 
>synchronize state after writes and prevent conflict.

The hardware is inherited from a VMS cluster,
which did precisely that. I don't remember now
what FS VMS used. I guess I'm trying
to replicate a VMS cluster using FreeBSD.

>> I want to have all nodes equal, i.e. no master/slave
>> or server/client model. Also, the disk array
>> provides adequate RAID already, so that is not
>
>instead of using disk arrays (expensive) it's better to run FreeBSD as 

Well, I have a populated array already,
so no extra costs are involved.

>file server with good deal of disks and connectivity and export 
>filesystems using eg. NFS.

But again, what if the NFS server dies?
Then the data is no longer available.

Thanks

Anton


From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep 30 12:36:11 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2276DBC3;
 Tue, 30 Sep 2014 12:36:11 +0000 (UTC)
Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "cell.glebius.int.ru", Issuer "cell.glebius.int.ru" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 94D382F5;
 Tue, 30 Sep 2014 12:36:09 +0000 (UTC)
Received: from cell.glebius.int.ru (localhost [127.0.0.1])
 by cell.glebius.int.ru (8.14.9/8.14.9) with ESMTP id s8UCa7Kg074382
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Tue, 30 Sep 2014 16:36:07 +0400 (MSK)
 (envelope-from glebius@FreeBSD.org)
Received: (from glebius@localhost)
 by cell.glebius.int.ru (8.14.9/8.14.9/Submit) id s8UCa7pW074381;
 Tue, 30 Sep 2014 16:36:07 +0400 (MSK)
 (envelope-from glebius@FreeBSD.org)
X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to
 glebius@FreeBSD.org using -f
Date: Tue, 30 Sep 2014 16:36:07 +0400
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: Andriy Gapon <avg@FreeBSD.org>
Subject: Re: uk_slabsize, uk_ppera, uk_ipers, uk_pages
Message-ID: <20140930123607.GE73266@glebius.int.ru>
References: <542A916A.2060703@FreeBSD.org>
 <20140930114424.GD73266@glebius.int.ru>
 <542A9EF0.3050405@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <542A9EF0.3050405@FreeBSD.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: freebsd-hackers@FreeBSD.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Sep 2014 12:36:11 -0000

On Tue, Sep 30, 2014 at 03:15:44PM +0300, Andriy Gapon wrote:
A> This is not true for kegs with multi-page slabs.  Consider a zone with 8KB items
A> on a system with 4KB pages. Its keg uses slabs with the size of two pages, uk_ppera
A> is 2.  There is only one item per slab, uk_ipers is 1. Let's say there are two
A> slabs allocated. Then uk_pages is 4.  So, uk_ipers * uk_pages would give 4, but
A> in reality there are only two items.  The correct calculation must be (uk_pages
A> / uk_ppera) * uk_ipers.
A> 
A> If you have enough CPUs for a pcpu zone to use multi-page slabs / allocations,
A> then the above will also be applicable. Consider "64 pcpu" and 8 CPUs.  You have
A> uk_ppera = 2, uk_ipers = 128.  If there is only 1 "real" slab allocated that's 2
A> pages, so uk_pages * uk_ipers = 256, but in reality the correct number of
A> provided items is (uk_pages / uk_ppera) * uk_ipers = 128.

You are right.

-- 
Totus tuus, Glebius.

From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep 30 22:53:41 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 6CF9D1A5
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 22:53:41 +0000 (UTC)
Received: from na01-bn1-obe.outbound.protection.outlook.com
 (mail-bn1bon0074.outbound.protection.outlook.com [157.56.111.74])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "mail.protection.outlook.com",
 Issuer "MSIT Machine Auth CA 2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 1C8E9FAA
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 22:53:39 +0000 (UTC)
Received: from DM2PR0801MB0944.namprd08.prod.outlook.com (25.160.131.27) by
 DM2PR0801MB0943.namprd08.prod.outlook.com (25.160.131.26) with Microsoft SMTP
 Server (TLS) id 15.0.1039.15; Tue, 30 Sep 2014 22:53:59 +0000
Received: from DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) by
 DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) with
 mapi id 15.00.1039.011; Tue, 30 Sep 2014 22:53:40 +0000
From: "Pokala, Ravi" <rpokala@panasas.com>
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: dumpsys/savecore on AF-4Kn drives?
Thread-Topic: dumpsys/savecore on AF-4Kn drives?
Thread-Index: AQHP3QFgvwLpNOmcHUaIK0Ete9ljBw==
Date: Tue, 30 Sep 2014 22:53:40 +0000
Message-ID: <D050827F.121A5C%rpokala@panasas.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
user-agent: Microsoft-MacOutlook/14.4.4.140807
x-ms-exchange-transport-fromentityheader: Hosted
x-originating-ip: [64.80.217.3]
x-microsoft-antispam: BCL:0;PCL:0;RULEID:;SRVR:DM2PR0801MB0943;
x-forefront-prvs: 0350D7A55D
x-forefront-antispam-report: SFV:NSPM;
 SFS:(10009020)(6009001)(164054003)(199003)(189002)(2351001)(229853001)(107886001)(110136001)(54356999)(107046002)(4396001)(120916001)(76482002)(97736003)(99286002)(101416001)(87936001)(105586002)(64706001)(106116001)(2656002)(77096002)(31966008)(83506001)(66066001)(80022003)(106356001)(85306004)(95666004)(20776003)(21056001)(50986999)(85852003)(99396003)(92566001)(10300001)(36756003)(46102003)(86362001)(92726001);
 DIR:OUT; SFP:1101; SCL:1; SRVR:DM2PR0801MB0943;
 H:DM2PR0801MB0944.namprd08.prod.outlook.com; FPR:; MLV:sfv; PTR:InfoNoRecords;
 A:1; MX:1; LANG:en; 
Content-Type: text/plain; charset="us-ascii"
Content-ID: <1C6A75B497BE8A49B67831132974C60C@namprd08.prod.outlook.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: panasas.com
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Sep 2014 22:53:41 -0000

Hi folks,

Does anyone out there have AF-4Kn drives (both logical and physical sector
size is 4KB)? Have you been able to drop a core to one, and successfully
save the core on the way back up?

I'm working on adding AF-4Kn support to an older version of FreeBSD (based
on 7 - yeah, I know... :-P), using -CURRENT as a reference. Things look
good at the GEOM level and higher; the GEOM utils report correct sizes,
UFS runs fine, etc. If I manually break into the debugger and 'call
doadump', it appears to work; at least, it does not report any errors. But
when I reboot, `savecore' complains:

    error reading dump header at offset 0 in /dev/mirror/gm1: Invalid argument

(Yes, it's dumping to a mirror; no, that's not the problem: the mirror is
configured using the 'prefer' balancing algorithm, as described in
gmirror(8), and we've been doing this without issue for years.)

I'm trying to figure out if the problem is on the dumpsys side, the
savecore side, or if they're both broken for AF-4Kn. In particular,
'struct kerneldumpheader' is 512 bytes, and it looks like most calls to
dump_write() in the full-dump context (not minidumps) pass either the size
of the structure, or an explicit 512, for the 'length' argument. That's
the case in both the 7-ish version I'm porting to, and in -CURRENT.
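
The concern can be stated as a simple alignment check. The sketch below is not
taken from dumpsys or savecore; both helper names are hypothetical, and padding
the header out to a whole sector is only one assumed way of handling it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: nonzero if a transfer length is a whole number of sectors. */
static int
len_is_sector_aligned(size_t length, size_t sectorsize)
{
	assert(sectorsize > 0);
	return (length % sectorsize == 0);
}

/* Hypothetical: round a length up to the next multiple of the sector size. */
static size_t
round_up_to_sector(size_t length, size_t sectorsize)
{
	assert(sectorsize > 0);
	return (((length + sectorsize - 1) / sectorsize) * sectorsize);
}
```

A 512-byte length passes the check for 512-byte sectors but fails it for
4096-byte sectors, where it would round up to 4096. That would be consistent
with an "Invalid argument" error from a device that only accepts whole-sector
I/O, though I have not confirmed that this is the actual cause.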

There's no AF-4Kn-aware bootstrap in the version we're using (emaste@ -
does the new UEFI bootstrap in 10-STABLE work w/ AF-4Kn drives?), so one
of the drives is 512n, and I could probably find some space on there to
save the core to. But that device is small, and we have other uses for it,
so I'd like to avoid reserving a large chunk of it.

Any thoughts?

Thanks,

Ravi


From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 01:10:21 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2382AAF6
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 01:10:21 +0000 (UTC)
Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 074F1F1D
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 01:10:20 +0000 (UTC)
Received: from [192.168.1.172] (pool-173-52-87-124.nycmny.fios.verizon.net
 [173.52.87.124])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested) (Authenticated sender: ryao)
 by smtp.gentoo.org (Postfix) with ESMTPSA id 3214334003C;
 Wed,  1 Oct 2014 01:10:09 +0000 (UTC)
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (1.0)
Subject: Re: cluster FS?
From: Richard Yao <ryao@gentoo.org>
X-Mailer: iPad Mail (11D257)
In-Reply-To: <alpine.BSF.2.00.1409301300350.864@laptop>
Date: Tue, 30 Sep 2014 21:10:07 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <A42D6469-5A59-4AC7-9C43-690AF7AC4736@gentoo.org>
References: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk>
 <alpine.BSF.2.00.1409301300350.864@laptop>
To: Wojciech Puchar <wojtek@puchar.net>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 "mexas@bristol.ac.uk" <mexas@bristol.ac.uk>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 01:10:21 -0000

On Sep 30, 2014, at 7:04 AM, Wojciech Puchar <wojtek@puchar.net> wrote:

>>
>> It seems to me (just from reading the handbook)
>> that none of NFS, HAST or iSCSI provide this.
>
> none of following are filesystems at all. NFS is remote access to
> filesystem, the rest presents raw block device.
>
>> My specific needs are as follows.
>> I have multiple nodes and a disk array.
>> Each node is connected by fibre to the disk array.
>> I want to have each node read/write access
>> to all disks on disk array.
>> So that if any node fails, the
>> data is still accessible
>> via the remaining nodes.
>
> as disk array presents block devices, not files it is not possible to have
> filesystem read write access with more than one computer to the same block
> device.
> There is no AFAIK filesystems that can communicate between nodes to
> synchronize state after writes and prevent conflict.

Linux tends to have most of the work in this area, specifically Lustre, Ceph
and Gluster. Gluster is FUSE-based and the server will run on FreeBSD:

https://wiki.freebsd.org/GlusterFS

The client likely can run on FreeBSD too, but it might be that no one has
tested it because the FreeBSD support was done before FreeBSD supported FUSE.
From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 01:14:30 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 36A97C03
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 01:14:30 +0000 (UTC)
Received: from mx1.scaleengine.net (beauharnois2.bhs1.scaleengine.net
 [142.4.218.15]) by mx1.freebsd.org (Postfix) with ESMTP id EBB8BFD0
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 01:14:29 +0000 (UTC)
Received: from [192.168.1.2] (Seawolf.HML3.ScaleEngine.net [209.51.186.28])
 (Authenticated sender: allanjude.freebsd@scaleengine.com)
 by mx1.scaleengine.net (Postfix) with ESMTPSA id BB30F57F65
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 01:14:28 +0000 (UTC)
Message-ID: <542B557F.4050603@freebsd.org>
Date: Tue, 30 Sep 2014 21:14:39 -0400
From: Allan Jude <allanjude@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:31.0) Gecko/20100101 Thunderbird/31.1.2
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Subject: Re: cluster FS?
References: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk>
In-Reply-To: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk>
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="uwrg0P5ODEVL921K6DrL39Cl0LCCjugsE"
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 01:14:30 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--uwrg0P5ODEVL921K6DrL39Cl0LCCjugsE
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 2014-09-30 04:45, Anton Shterenlikht wrote:
> Hello
>=20
> Not sure if this is the right list...
> I wanted to ask about a cluster file system.
> Is there something like this on FreeBSD?
>=20
> It seems to me (just from reading the handbook)
> that none of NFS, HAST or iSCSI provide this.
>=20
> My specific needs are as follows.
> I have multiple nodes and a disk array.
> Each node is connected by fibre to the disk array.
> I want to have each node read/write access
> to all disks on disk array.
> So that if any node fails, the
> data is still accessible
> via the remaining nodes.
>=20
> I want to have all nodes equal, i.e. no master/slave
> or server/client model. Also, the disk array
> provides adequate RAID already, so that is not
> needed either.
>=20
> In the archives I see that the demands for
> a cluster FS support on FreeBSD have been expressed
> periodically over a very long time, but seems
> there's never been any resolution.
> Some people mention GFS, but I've no idea
> if this what I'm trying to describe.
>=20
> So is what I'm describing a cluster FS at all?
> Is there something like this on FreeBSD already?
> Is there someting in ports that can be used
> to achive this?
>=20
> Thanks
>=20
> Anton
>=20
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.o=
rg"
>=20

What you are describing doesn't really seem to be a 'cluster' FS.

In a cluster, the disks would reside in multiple machines, and the 'file
system' would withstand any one of those machines going down. That is
quite a bit different than just wanting a bunch of clients to have
concurrent access to a single disk array.

If you explain your use-case in more detail, we may be able to guide you
in the right direction.

--=20
Allan Jude


--uwrg0P5ODEVL921K6DrL39Cl0LCCjugsE
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (MingW32)

iQIcBAEBAgAGBQJUK1WCAAoJEJrBFpNRJZKfY6YP/1I225shSB9C0Vnkw9oNBLxy
JfJ7nxyghsFcCseUe+N7ggYCMxr6DO+z4RTG4XuCJ2v7ntlcBkdO8LwnuUL+blkY
noGpPzJHuAsX/iujTsNe2XPukuCK3guEFKyO1MMbG1WQnNbRCq3F5zPwOoVUYhHC
urFZeoSLYnZWFL6deFJcrTxDVuXh0gc/O/d9lOIieWUVtgFDnBLLmo6LZ6t9qc0P
EBOsM28dINchOgOoN2fLyhkvISN1ZwbIN6p8HNM2vYsWZg7toOlWjKr4KUMtyDvp
KEpFpBMTOI058qxsNDqjgpboj5izhp7N2o7rtFp1ks2JR0ar1Y2iHzm8lTTuXpqQ
Rp9xfhDt8D7yxy3zDI49+mMRCHWnqqg8GGaC5qCs0urd4SGto4ZzV6X91qdnrmSo
/adc4PZtJYHECiSB14D0tMDiDJW/w40F/j1oXlA87OwbQtrbDgomq/Kj4JnPAN/3
8vxF3oFoQO5fMfgjmIV2MfUsbt7F9zYFoiZlG0Yyw5rqyNz00nKiTj1p4eFtpaNR
isEsIjBCjF3Th8mqtwfWeWMLX99UeWN6XMrMVKhxnT89jwgvp9cQ710Q+ZX/teZR
UjBR7I1HFa01wmCcwy/4lTsoI8eaD+KGGAwUwjXXdHlOpjqJttPxdP6JNfHWR0Gh
kA/buRftqIGfw+Y88EFx
=qAKB
-----END PGP SIGNATURE-----

--uwrg0P5ODEVL921K6DrL39Cl0LCCjugsE--

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 01:44:57 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A211A419;
 Wed,  1 Oct 2014 01:44:57 +0000 (UTC)
Received: from mail-yh0-x22e.google.com (mail-yh0-x22e.google.com
 [IPv6:2607:f8b0:4002:c01::22e])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 575FE32A;
 Wed,  1 Oct 2014 01:44:57 +0000 (UTC)
Received: by mail-yh0-f46.google.com with SMTP id f73so66122yha.19
 for <multiple recipients>; Tue, 30 Sep 2014 18:44:56 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=xgvuuBGKIKC6Xe/fgpV03bJrGcOCBPG1QiO3xHcvkL0=;
 b=v3HTGx10lw/rImfiYABAK6ybADOTRL1iNLnRtv5W0rUX7kmZdGBCpsfpocGFrjqjIC
 V5OPWq2YK9ZdaGls2z7kxfPWp6KqObJvfiKEHjmgj6BHLGAZ51uoB26kTh8e992IxxRU
 a/CH1cmnAPteZtYn5XItR+8WaXul24XHIyEbXUZcw4xpRgekim3OTByrP7DTH1KiZ36t
 fCJARcXvzj9e9Zzz7xeVnuP7s1t0iBVutZdB9mIXIYNU75XTtH3jw5xgOSLxI0l5yJZd
 Pw9J1njmlHF06JI7NeVpT1eehDFJHM9tLyErwsK9V9lqG9PI+eXrn23biLsguuFOHlNm
 Qjbw==
MIME-Version: 1.0
X-Received: by 10.236.127.140 with SMTP id d12mr75723501yhi.37.1412127896572; 
 Tue, 30 Sep 2014 18:44:56 -0700 (PDT)
Received: by 10.170.206.10 with HTTP; Tue, 30 Sep 2014 18:44:56 -0700 (PDT)
In-Reply-To: <542B557F.4050603@freebsd.org>
References: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk>
 <542B557F.4050603@freebsd.org>
Date: Tue, 30 Sep 2014 21:44:56 -0400
Message-ID: <CAOgwaMsAj5+oiUZEUtuc8uLHtyOm8oVdv0TUYbATm1bqdhHbWQ@mail.gmail.com>
Subject: Re: cluster FS?
From: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To: Allan Jude <allanjude@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 01:44:57 -0000

On Tue, Sep 30, 2014 at 9:14 PM, Allan Jude <allanjude@freebsd.org> wrote:

> On 2014-09-30 04:45, Anton Shterenlikht wrote:
> > Hello
> >
> > Not sure if this is the right list...
> > I wanted to ask about a cluster file system.
> > Is there something like this on FreeBSD?
> >
> > It seems to me (just from reading the handbook)
> > that none of NFS, HAST or iSCSI provide this.
> >
> > My specific needs are as follows.
> > I have multiple nodes and a disk array.
> > Each node is connected by fibre to the disk array.
> > I want to have each node read/write access
> > to all disks on disk array.
> > So that if any node fails, the
> > data is still accessible
> > via the remaining nodes.
> >
> > I want to have all nodes equal, i.e. no master/slave
> > or server/client model. Also, the disk array
> > provides adequate RAID already, so that is not
> > needed either.
> >
> > In the archives I see that the demands for
> > a cluster FS support on FreeBSD have been expressed
> > periodically over a very long time, but seems
> > there's never been any resolution.
> > Some people mention GFS, but I've no idea
> > if this is what I'm trying to describe.
> >
> > So is what I'm describing a cluster FS at all?
> > Is there something like this on FreeBSD already?
> > Is there something in ports that can be used
> > to achieve this?
> >
> > Thanks
> >
> > Anton
> >
> > _______________________________________________
> > freebsd-hackers@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> > To unsubscribe, send any mail to "
> freebsd-hackers-unsubscribe@freebsd.org"
> >
>
> What you are describing doesn't really seem to be a 'cluster' FS.
>
> In a cluster, the disks would reside in multiple machines, and the 'file
> system' would withstand any one of those machines going down. That is
> quite a bit different than just wanting a bunch of clients to have
> concurrent access to a single disk array.
>
> If you explain your use-case in more detail, we may be able to guide you
> in the right direction.
>
> --
> Allan Jude
>
>

The following pages and their associated pages may be useful for
definitions of terms and available capabilities:

http://en.wikipedia.org/wiki/Parallel_Virtual_Machine
http://en.wikipedia.org/wiki/Linda_%28coordination_language%29

http://en.wikipedia.org/wiki/Category:Parallel_computing
http://en.wikipedia.org/wiki/Category:Concurrent_computing
http://en.wikipedia.org/wiki/Category:Distributed_computing


http://en.wikipedia.org/wiki/Network-attached_storage
http://en.wikipedia.org/wiki/Clustered_file_system
http://en.wikipedia.org/wiki/Category:Shared_disk_file_systems
http://en.wikipedia.org/wiki/Category:Network_file_systems


http://en.wikipedia.org/wiki/Ceph_%28software%29
http://en.wikipedia.org/wiki/XtreemFS


The above problem seems to be "Network-attached storage".


Thank you very much.


Mehmet Erol Sanliturk

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 03:22:36 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id D1436438
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 03:22:36 +0000 (UTC)
Received: from mailgate.gta.com (mailgate.gta.com [199.120.225.23])
 by mx1.freebsd.org (Postfix) with ESMTP id 906A4EEC
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 03:22:36 +0000 (UTC)
Received: (qmail 15878 invoked by uid 1000); 1 Oct 2014 03:15:53 -0000
Date: Tue, 30 Sep 2014 23:15:53 -0400
From: Larry Baird <lab@gta.com>
To: freebsd-hackers@freebsd.org
Subject: Kernel/Compiler bug
Message-ID: <20141001031553.GA14360@gta.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.23 (2014-03-12)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 03:22:36 -0000

I have run into a compiler optimization bug with clang version 3.4.1 and
"-O0" when compiling a 10.1 i386 kernel. When debugging kernels using kgdb I
like to disable compiler optimization.  I have been fighting a kernel double
fault bug for a while.  I thought it was a modification I had made.  Today I
finally stumbled upon the fact that it is a compiler lack-of-optimization
bug. (-:

It is easy to duplicate the issue with a GENERIC kernel and 10.1-BETA3.
Edit /sys/conf/kern.pre.mk, changing the first _MINUS_O to '-O0'.

--- /sys/conf/kern.pre.mk       2014-09-26 06:33:38.000000000 -0400
+++ kern.pre.mk 2014-09-30 22:59:51.000000000 -0400
@@ -26,7 +26,7 @@
 SIZE?=         size

 .if defined(DEBUG)
-_MINUS_O=      -O
+_MINUS_O=      -O0
 CTFFLAGS+=     -g
 .else
 .if ${MACHINE_CPUARCH} == "powerpc"

Build GENERIC as usual and you will get a double-faulting kernel.
Should this be reported as a FreeBSD kernel bug or as a clang optimization bug?

To get a backtrace I created a kernel conf file called GDB containing:

include GENERIC
options KDB
options KDB_TRACE
options DDB
options GDB
options ALT_BREAK_TO_DEBUGGER # break is CR ~ ^b

This resulted in the following panic:

/boot/kernel/kernel text=0x1890d80 data=0xebdf0+0x163d60 syms=[0x4+0x126190+0x4+0x18bb01]
Booting...
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.1-BETA3 #0: Tue Sep 30 22:40:18 EDT 2014
    lab@test2.gta.com:/usr/obj/usr/src/sys/GDB i386
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: AMD FX(tm)-8150 Eight-Core Processor            (3573.27-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x600f12  Family = 0x15  Model = 0x1  Stepping = 2
  Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x201<SSE3,SSSE3>
  AMD Features=0x2a100800<SYSCALL,NX,FFXSR,RDTSCP,LM>
  AMD Features2=0x13<LAHF,CMP,CR8>
real memory  = 2147418112 (2047 MB)
avail memory = 2072879104 (1976 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <VBOX   VBOXAPIC>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
pnpbios: Bad PnP BIOS data checksum
random device not loaded; using insecure entropy
ioapic0 <Version 1.1> irqs 0-23 on motherboard
random: <Software, Yarrow> initialized
kbd1 at kbdmux0
acpi0: <VBOX VBOXXSDT> on motherboard
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 UDMA33 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xd000-0xd00f at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
vgapci0: <VGA-compatible display> mem 0xe0000000-0xe0ffffff irq 18 at device 2.0 on pci0
vgapci0: Boot video device
em0: <Intel(R) PRO/1000 Legacy Network Connection 1.0.6> port 0xd010-0xd017 mem 0xf0000000-0xf001ffff irq 19 at device 3.0 on pci0
em0: Ethernet address: 08:00:27:32:5e:fe
pcm0: <Intel ICH (82801AA)> port 0xd100-0xd1ff,0xd200-0xd23f irq 21 at device 5.0 on pci0
pcm0: <SigmaTel STAC9700/83/84 AC97 Codec>
ohci0: <OHCI (generic) USB controller> mem 0xf0804000-0xf0804fff irq 22 at device 6.0 on pci0
usbus0 on ohci0
pci0: <bridge> at device 7.0 (no driver attached)
ehci0: <Intel 82801FB (ICH6) USB 2.0 controller> mem 0xf0805000-0xf0805fff irq 19 at device 11.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
acpi_acad0: <AC Adapter> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xe2000-0xe2fff pnpid ORM0000 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
atrtc0: <AT realtime clock> at port 0x70 irq 8 on isa0
Event timer "RTC" frequency 32768 Hz quality 0
ppc0: parallel port not found.
Timecounters tick every 10.000 msec
pcm0: ac97 link rate calibration timed out after 1998076 us
em0: link state changed to UP
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 480Mbps High Speed USB v2.0
ugen0.1: <Apple> at usbus0
uhub0: <Apple OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <VBOX HARDDISK 1.0> ATA-6
Fatal double fault:
eip = 0xc10dbf34
esp = 0xe27f1000
ebp = 0xe27f1004
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper(c1ad615d,c1e7090c,5,16,0,...) at db_trace_self_wrapper+0x38/frame 0xc1e708d8
kdb_backtrace(c1c81330,0,c1c81eaf,c1e709e4,a,...) at kdb_backtrace+0x49/frame 0xc1e70940
vpanic(c1c81eaf,c1e709e4,c1e709e4,c1c81eaf,c1e70a50,...) at vpanic+0x209/frame 0xc1e709c0
panic(c1c81eaf,0,0,d,b,...) at panic+0x26/frame 0xc1e709d8
dblfault_handler() at dblfault_handler+0x14b/frame 0xc1e709d8
--- trap 0x17, eip = 0xc10dbf34, esp = 0xe27f1000, ebp = 0xe27f1004 ---
critical_enter(0,c76a3c40) at critical_enter+0x4/frame 0xe27f1004
spinlock_enter(0,0,0,0,0,...) at spinlock_enter+0x61/frame 0xe27f1014
sched_setcpu(c782b000,0,0,0,0,...) at sched_setcpu+0x7d/frame 0xe27f1068
sched_add(c782b000,0,0,0,c1e56abc,e5,c782b2e0,c782b000) at sched_add+0x10d/frame 0xe27f10c4
sched_wakeup(c782b000,0,0,0,0,...) at sched_wakeup+0xe6/frame 0xe27f10ec
setrunnable(c782b000,0,0,0,0,...) at setrunnable+0x145/frame 0xe27f111c
sleepq_resume_thread(c757d2c0,c782b000,0,37d,0,...) at sleepq_resume_thread+0x2b4/frame 0xe27f1164
sleepq_timeout(c782b000,4,e6,eeea40f0,e27f126c,...) at sleepq_timeout+0xf3/frame 0xe27f11d0
softclock_call_cc(c782b264,c1eb4700,1,ac,1f,...) at softclock_call_cc+0x3d0/frame 0xe27f1318
callout_process(50170178,3,fffffffc,16a3c40,0,...) at callout_process+0x4d5/frame 0xe27f1430
handleevents(50170178,3,0,0,0,...) at handleevents+0x4fc/frame 0xe27f1558
timercb(c1e75d78,0,0,0,0,...) at timercb+0x70c/frame 0xe27f1630
lapic_handle_timer(e27f1680) at lapic_handle_timer+0x10b/frame 0xe27f1674
Xtimerint() at Xtimerint+0x20/frame 0xe27f1674
--- interrupt, eip = 0xc1936fcf, esp = 0xe27f16c0, ebp = 0xe27f16c4 ---
write_eflags(80246,80246) at write_eflags+0xf/frame 0xe27f16c4
intr_restore(80246,80246,c76a3c40) at intr_restore+0x17/frame 0xe27f16d4
spinlock_exit(c1e377b4,4,c76a3c40,c113f1a0,c248ffc8,...) at spinlock_exit+0x52/frame 0xe27f16e8
cnputs(e27f1754,ffffffff,1,a,e27f1874,...) at cnputs+0x16e/frame 0xe27f1720
_vprintf(ffffffff,5,c19a5b0c,e27f1874,5,...) at _vprintf+0x182/frame 0xe27f181c
vprintf(c19a5b0c,e27f1874,6,e27f1874,c19a5b0c,...) at vprintf+0x45/frame 0xe27f184c
printf(c19a5b0c,e27f18d4,e27f18c4,c19d6aff,6,...) at printf+0x21/frame 0xe27f1868
ata_print_ident(c7ad699c,c19af72b,0,c19d6aac,0,...) at ata_print_ident+0x121/frame 0xe27f1914
xpt_announce_periph(c76a0100,e27f1b1c,c19af9bf,19000,0,...) at xpt_announce_periph+0x13a/frame 0xe27f1990
adaregister(c76a0100,e27f2340,0,0,0,...) at adaregister+0x1212/frame 0xe27f1d14
cam_periph_alloc(c0506b40,c05080d0,c0508190,c0508360,c19af72b,...) at cam_periph_alloc+0x510/frame 0xe27f1dc0
adaasync(0,80,e27f27c0,e27f2340,0,...) at adaasync+0x1d8/frame 0xe27f2308
xptsetasyncfunc(c7ad6800,e27f2a50,c7828800,e27f29e8,c04bea45,...) at xptsetasyncfunc+0x13e/frame 0xe27f27ec
xptdefdevicefunc(c7ad6800,e27f29e0,c76a3c40,0,0,...) at xptdefdevicefunc+0x46/frame 0xe27f2820
xptdevicetraverse(c769fd00,0,c04c7970,e27f29e0,0,...) at xptdevicetraverse+0x2c5/frame 0xe27f28b8
xptdeftargetfunc(c769fd00,e27f29e0,4,c1d7cf08,16a3c40,...) at xptdeftargetfunc+0x7a/frame 0xe27f28ec
xpttargettraverse(c7858700,0,c04c7410,e27f29e0,0,...) at xpttargettraverse+0x222/frame 0xe27f2968
xptdefbusfunc(c7858700,e27f29e0,1,c1c933b8,c7858700,...) at xptdefbusfunc+0x7a/frame 0xe27f299c
xptbustraverse(0,c04c6fe0,e27f29e0,0,2,...) at xptbustraverse+0x99/frame 0xe27f29c8
xpt_for_all_devices(c04c69f0,e27f2a50,4,ffffffff,ffffffff,...) at xpt_for_all_devices+0x5b/frame 0xe27f2a00
xpt_register_async(80,c05041a0,0,0,0,...) at xpt_register_async+0x2b4/frame 0xe27f2af4
adainit(1,2,2,0,2,...) at adainit+0x3d/frame 0xe27f2b48
periphdriver_init(2,c769f2a8,1000000,4,2,...) at periphdriver_init+0x7f/frame 0xe27f2b64
xpt_finishconfig_task(c7837780,1,4,0,0,...) at xpt_finishconfig_task+0x26/frame 0xe27f2b88
taskqueue_run_locked(c769f280,4,c76a3c40,0,0,...) at taskqueue_run_locked+0x1c7/frame 0xe27f2bec
taskqueue_thread_loop(c1eb6928,e27f2d08,0,0,0,...) at taskqueue_thread_loop+0x1cb/frame 0xe27f2c80
fork_exit(c1151cd0,c1eb6928,e27f2d08) at fork_exit+0x179/frame 0xe27f2cf4
fork_trampoline() at fork_trampoline+0x8/frame 0xe27f2cf4
--- trap 0, eip = 0, esp = 0xe27f2d40, ebp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100025 ]
Stopped at      breakpoint+0x4: popl    %ebp
db>


-- 
------------------------------------------------------------------------
Larry Baird
Global Technology Associates, Inc. 1992-2012 	| http://www.gta.com
Celebrating Twenty Years of Software Innovation | Orlando, FL
Email: lab@gta.com                 		| TEL 407-380-0220

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 04:46:37 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id E9DDEDAC
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 04:46:37 +0000 (UTC)
Received: from mail-lb0-x231.google.com (mail-lb0-x231.google.com
 [IPv6:2a00:1450:4010:c04::231])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 766478FC
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 04:46:37 +0000 (UTC)
Received: by mail-lb0-f177.google.com with SMTP id w7so45649lbi.8
 for <freebsd-hackers@freebsd.org>; Tue, 30 Sep 2014 21:46:35 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=U+sYvnEghbeJANflAMUJ6tmJ1Fp8PKXu1tpGfNQqqa0=;
 b=vsMT+NhDDhYj83Bc4FiRZZFZTU4F4+jkOyMaabTzIg68S7yxyLgLU2MWUNulO78PLZ
 /6MKAMWUr944odYBF2h0GP4pw6dM8pIL53qaOF/db1/GtqL4BVvrbpovUf6nmCcT8aDg
 SNHWicj9W/iTC9jgUpwq+QAPvspRre2CGVvXWvi2y/fhc/S+JY/tTEjRWBzMOwD4639/
 ieFHfeDq+nzNDD/c4/LXKo3UorPJrVQYZkYD8pSydgxHmFIgkzhEVD+fffhjAQpOiIMY
 aiszY+SLd4CL62ZI4leaEj85avIh88e7mgsVWuVTVLLncQPevbjtRUce607Pd9p7zaqG
 kiEQ==
MIME-Version: 1.0
X-Received: by 10.112.13.132 with SMTP id h4mr49310987lbc.45.1412138795339;
 Tue, 30 Sep 2014 21:46:35 -0700 (PDT)
Received: by 10.25.21.197 with HTTP; Tue, 30 Sep 2014 21:46:35 -0700 (PDT)
In-Reply-To: <20141001031553.GA14360@gta.com>
References: <20141001031553.GA14360@gta.com>
Date: Wed, 1 Oct 2014 00:46:35 -0400
Message-ID: <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
Subject: Re: Kernel/Compiler bug
From: Ryan Stone <rysto32@gmail.com>
To: Larry Baird <lab@gta.com>
Content-Type: text/plain; charset=UTF-8
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 04:46:38 -0000

This may not be a compiler bug.  A quick look at the esp values
provided in that backtrace shows that at least 7KB has been used on
the stack.  The stack for kernel threads is only 8KB, and a stack
overflow can cause a double fault like that.
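
The arithmetic behind that estimate can be sketched as follows. This is only an illustration: the two esp values are copied from the backtrace, while the 4 KB page size and the default of 2 stack pages on i386 are assumptions not stated in the thread, and the variable names are made up for the example.

```python
PAGE_SIZE = 4096     # assumed i386 page size
KSTACK_PAGES = 2     # assumed default, giving the 8KB stack mentioned above

# esp values copied from the backtrace
ESP_OLDEST_FRAME = 0xe27f2d40   # "--- trap 0 ..." line below fork_trampoline
ESP_AT_FAULT = 0xe27f1000       # "Fatal double fault" / trap 0x17 line

# The stack grows downward, so the difference is the number of bytes in use.
used = ESP_OLDEST_FRAME - ESP_AT_FAULT
stack_size = PAGE_SIZE * KSTACK_PAGES

print(f"stack in use: {used} bytes ({used / 1024:.1f} KB) of {stack_size}")
print("esp page-aligned at fault:", ESP_AT_FAULT % PAGE_SIZE == 0)
```

If those assumptions hold, this works out to 7488 bytes, a little over 7KB, and the faulting esp sits exactly on a page boundary, which would be consistent with the next push running off the end of the stack.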

My suspicion would be that with optimizations off, clang uses a lot
more stack space and you push over the limit.  There's a kernel build
option for the stack size that you could change to confirm.  I believe
that it's called KSTACK_PAGES.  Try increasing it to 4.
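
To see which functions account for most of that space, the frame addresses in the backtrace can be subtracted pairwise. A rough sketch, assuming the distance between consecutive frame addresses approximates each function's stack footprint; the five lines are copied from the backtrace with their arguments elided, and the helper name frame_sizes is hypothetical.

```python
import re

# Lines copied from the backtrace, innermost call first; arguments elided.
TRACE = """\
xpt_announce_periph(...) at xpt_announce_periph+0x13a/frame 0xe27f1990
adaregister(...) at adaregister+0x1212/frame 0xe27f1d14
cam_periph_alloc(...) at cam_periph_alloc+0x510/frame 0xe27f1dc0
adaasync(...) at adaasync+0x1d8/frame 0xe27f2308
xptsetasyncfunc(...) at xptsetasyncfunc+0x13e/frame 0xe27f27ec
"""

FRAME_RE = re.compile(r"at (\w+)\+0x[0-9a-f]+/frame 0x([0-9a-f]+)")


def frame_sizes(trace):
    """Return (function, bytes) pairs from consecutive frame addresses."""
    frames = [(m.group(1), int(m.group(2), 16)) for m in FRAME_RE.finditer(trace)]
    # The stack grows downward, so a caller's frame address is higher than
    # its callee's; the gap approximates the caller's stack footprint.
    return [(name, addr - frames[i - 1][1])
            for i, (name, addr) in enumerate(frames) if i > 0]


for name, size in sorted(frame_sizes(TRACE), key=lambda p: p[1], reverse=True):
    print(f"{name:20s} {size:5d} bytes")
```

On these lines adaasync and xptsetasyncfunc each come out above 1200 bytes, which may be worth comparing against the same frames in an optimized build.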

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 08:13:57 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 6CB4F6A7
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 08:13:57 +0000 (UTC)
Received: from eu1sys200aog104.obsmtp.com (eu1sys200aog104.obsmtp.com
 [207.126.144.117])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id BD95CB10
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 08:13:56 +0000 (UTC)
Received: from mail-wi0-f180.google.com ([209.85.212.180]) (using TLSv1) by
 eu1sys200aob104.postini.com ([207.126.147.11]) with SMTP
 ID DSNKVCu3wpPgNln48l5JfLhTGG76+P0hCBqY@postini.com;
 Wed, 01 Oct 2014 08:13:56 UTC
Received: by mail-wi0-f180.google.com with SMTP id em10so947979wid.1
 for <freebsd-hackers@freebsd.org>; Wed, 01 Oct 2014 01:13:54 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:message-id:to:subject:cc:reply-to
 :in-reply-to;
 bh=7vg4c7WR0eKID4OszivvJskEQvYPYDMHTGRwgtJTTEE=;
 b=WHp3Vet45NR7wXpGHRgZZ9H92GpaItLFFbmgS6b8LaqyCulNAAYla0VxYYd1iQpLM4
 gqwbfU8fg3IgfbUrz95JAe5vSPmHmGpG72l4fg8+/nVVTm8+gGmRRjUfLHtZwQFlM1Uq
 5+gywVyOUfiA9OZBR40A+Ho5XjNyHOp2wDe5ABBHTIy6iBh4JHVfwwr+FZZqkuTVJQdj
 xKAEcDVFys4Xkf3l/tllshwxmDQk6EhTRXXCnb6Gfv/xFFZN7IsJWsoZd9h4t3sUmWCT
 nMsmNUBhUfPkG0JlJf7uN0m2aABuNJ5AyC0eHGPle2opcyI7EhKapX7YWx7roZRG0iCR
 kPxQ==
X-Received: by 10.180.83.103 with SMTP id p7mr12010340wiy.67.1412150902463;
 Wed, 01 Oct 2014 01:08:22 -0700 (PDT)
X-Gm-Message-State: ALoCoQnCpkBeGo8E+XtE+2kzgmueepPCvkUjU27Hj84aWxy7kAEMNwkK5QL3ul2sxZMblej5X2wo1kcrrgDJuepkuVDjmEnvzMdHhCBj+RVyLkokpIG5f8OKzzknoK1hoJI3lHjeXIYcnAPL8cNTWBXVEMeYNNchZw==
X-Received: by 10.180.83.103 with SMTP id p7mr12010322wiy.67.1412150902335;
 Wed, 01 Oct 2014 01:08:22 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk.
 [137.222.187.221])
 by mx.google.com with ESMTPSA id ny6sm17678031wic.22.2014.10.01.01.08.21
 for <multiple recipients>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 01 Oct 2014 01:08:21 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1])
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s9188KNW083914
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Wed, 1 Oct 2014 09:08:20 +0100 (BST)
 (envelope-from mexas@mech-as221.men.bris.ac.uk)
Received: (from mexas@localhost)
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s9188KVc083913;
 Wed, 1 Oct 2014 09:08:20 +0100 (BST) (envelope-from mexas)
Date: Wed, 1 Oct 2014 09:08:20 +0100 (BST)
From: Anton Shterenlikht <mexas@bris.ac.uk>
Message-Id: <201410010808.s9188KVc083913@mech-as221.men.bris.ac.uk>
To: allanjude@freebsd.org, m.e.sanliturk@gmail.com
Subject: Re: cluster FS?
Reply-To: mexas@bristol.ac.uk
In-Reply-To: <CAOgwaMsAj5+oiUZEUtuc8uLHtyOm8oVdv0TUYbATm1bqdhHbWQ@mail.gmail.com>
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 08:13:57 -0000

>From owner-freebsd-hackers@freebsd.org Wed Oct  1 03:25:08 2014
>
>On Tue, Sep 30, 2014 at 9:14 PM, Allan Jude <allanjude@freebsd.org> wrote:
>
>> On 2014-09-30 04:45, Anton Shterenlikht wrote:
>> > Hello
>> >
>> > Not sure if this is the right list...
>> > I wanted to ask about a cluster file system.
>> > Is there something like this on FreeBSD?
>> >
>> > It seems to me (just from reading the handbook)
>> > that none of NFS, HAST or iSCSI provide this.
>> >
>> > My specific needs are as follows.
>> > I have multiple nodes and a disk array.
>> > Each node is connected by fibre to the disk array.
>> > I want to have each node read/write access
>> > to all disks on disk array.
>> > So that if any node fails, the
>> > data is still accessible
>> > via the remaining nodes.
>> >
>> > I want to have all nodes equal, i.e. no master/slave
>> > or server/client model. Also, the disk array
>> > provides adequate RAID already, so that is not
>> > needed either.
>> >
>> > In the archives I see that the demands for
>> > a cluster FS support on FreeBSD have been expressed
>> > periodically over a very long time, but seems
>> > there's never been any resolution.
>> > Some people mention GFS, but I've no idea
>> > if this is what I'm trying to describe.
>> >
>> > So is what I'm describing a cluster FS at all?
>> > Is there something like this on FreeBSD already?
>> > Is there something in ports that can be used
>> > to achieve this?
>> >
>> > Thanks
>> >
>> > Anton
>> >
>> >
>>
>> What you are describing doesn't really seem to be a 'cluster' FS.
>>
>> In a cluster, the disks would reside in multiple machines, and the 'file
>> system' would withstand any one of those machines going down. That is
>> quite a bit different than just wanting a bunch of clients to have
>> concurrent access to a single disk array.
>>
>> If you explain your use-case in more detail, we may be able to guide you
>> in the right direction.
>>
>> --
>> Allan Jude
>>
>>
>
>The following pages and their associated pages may be useful for
>definitions of terms and available capabilities:
>
>http://en.wikipedia.org/wiki/Parallel_Virtual_Machine
>http://en.wikipedia.org/wiki/Linda_%28coordination_language%29
>
>http://en.wikipedia.org/wiki/Category:Parallel_computing
>http://en.wikipedia.org/wiki/Category:Concurrent_computing
>http://en.wikipedia.org/wiki/Category:Distributed_computing
>
>
>http://en.wikipedia.org/wiki/Network-attached_storage
>http://en.wikipedia.org/wiki/Clustered_file_system
>http://en.wikipedia.org/wiki/Category:Shared_disk_file_systems
>http://en.wikipedia.org/wiki/Category:Network_file_systems
>
>
>http://en.wikipedia.org/wiki/Ceph_%28software%29
>http://en.wikipedia.org/wiki/XtreemFS
>
>
>
>The above problem seems to be "Network-attached storage".

Now I'm even more confused.

I think what I have is called a SAN.
The disk array is an HP MSA1000:
 http://www8.hp.com/h20195/v2/GetDocument.aspx?docname=c04324510

*quote*
The HP StorageWorks Modular Smart Array 1000 (MSA1000)
is a 2 Gb Fibre Channel storage system
designed for the entry-level to mid-range Storage Area Network (SAN).
*end quote*

The disk array has an 8-port 2 Gb Fibre Channel Fabric Switch.
At present I connect 3 FreeBSD 10 nodes to the disk array via fibre.
However, only one node at a time is able to mount the disks.
What I'm looking for is a way to mount the disks on the disk array
for read/write access from all nodes, up to 8.
So that if a node fails, the data is still accessible
via the other nodes.
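
A toy model may help show why a conventional, non-cluster file system cannot simply be mounted read/write on several nodes at once, even though every node can reach the same disks. It is only a sketch under stated assumptions: each node is assumed to cache the allocation bitmap at mount time and to know nothing about the other nodes; the classes SharedDisk and Node are invented for the illustration and do not correspond to any FreeBSD interface.

```python
class SharedDisk:
    """The disk array: one set of blocks visible to every node."""

    def __init__(self, nblocks):
        self.bitmap = [False] * nblocks   # on-disk allocation bitmap
        self.data = [None] * nblocks      # block contents


class Node:
    """A host that mounts the disk with its own private metadata cache."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.cached_bitmap = None

    def mount(self):
        # Read the bitmap once; later changes by other nodes are not seen.
        self.cached_bitmap = list(self.disk.bitmap)

    def write_file(self, payload):
        # Allocate the first block that looks free in the *cached* bitmap.
        block = self.cached_bitmap.index(False)
        self.cached_bitmap[block] = True
        self.disk.bitmap[block] = True
        self.disk.data[block] = payload
        return block


disk = SharedDisk(4)
a, b = Node("a", disk), Node("b", disk)
a.mount()
b.mount()

blk_a = a.write_file("from a")
blk_b = b.write_file("from b")   # b's stale cache still shows block 0 as free

print("a wrote block", blk_a, "and b wrote block", blk_b)
print("block", blk_a, "now holds:", disk.data[blk_a])
```

Both nodes pick block 0, and the second write silently replaces the first. A cluster file system adds the coordination between nodes (typically some form of distributed locking) that this model lacks.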

The model that I'm describing is a VMS cluster model.
I'm not sure if it makes sense for FreeBSD.

Thanks

Anton

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 08:20:40 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9E408903;
 Wed,  1 Oct 2014 08:20:40 +0000 (UTC)
Received: from mail.iXsystems.com (mail.ixsystems.com [12.229.62.4])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client CN "*.ixsystems.com",
 Issuer "Go Daddy Secure Certification Authority" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 8483AB7A;
 Wed,  1 Oct 2014 08:20:40 +0000 (UTC)
Received: from localhost (mail.ixsystems.com [10.2.55.1])
 by mail.iXsystems.com (Postfix) with ESMTP id 72DE780D96;
 Wed,  1 Oct 2014 01:20:39 -0700 (PDT)
Received: from mail.iXsystems.com ([10.2.55.1])
 by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP
 id 02411-09; Wed,  1 Oct 2014 01:20:39 -0700 (PDT)
Received: from [10.8.0.26] (unknown [10.8.0.26])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mail.iXsystems.com (Postfix) with ESMTPSA id F134880D90;
 Wed,  1 Oct 2014 01:20:37 -0700 (PDT)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 8.0 \(1988\))
Subject: Re: cluster FS?
From: Jordan Hubbard <jkh@mail.turbofuzz.com>
In-Reply-To: <201410010808.s9188KVc083913@mech-as221.men.bris.ac.uk>
Date: Wed, 1 Oct 2014 11:20:33 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <A70BA2F6-5EC6-490C-B012-D35A54D5FA9D@mail.turbofuzz.com>
References: <201410010808.s9188KVc083913@mech-as221.men.bris.ac.uk>
To: mexas@bristol.ac.uk
X-Mailer: Apple Mail (2.1988)
Cc: freebsd-hackers@freebsd.org, allanjude@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 08:20:40 -0000


> On Oct 1, 2014, at 11:08 AM, Anton Shterenlikht <mexas@bris.ac.uk> wrote:
>
> The model that I'm describing is a VMS cluster model.
> I'm not sure if it makes sense for FreeBSD.

It does not.  FreeBSD does not currently offer any form of clustered
filesystem support, nor would a SAN provide this in any case since
it’s just a shared fabric for a single set of storage devices.
You could front-end your SAN with a NAS, but that would simply re-export
the SAN as a shared file system such as NFS, which would let multiple
clients see it but still provide no fail-over if the NAS or primary SAN
controller died.

You are trying to create an active/active fail-over system with multiple
nodes.  You cannot get there from where you are starting.  This is
basically a “start over” proposition, and why folks like
NetApp and EMC sell a lot of fileservers to replace existing SAN
solutions.

- Jordan


From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 09:02:33 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 0513DACA
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 09:02:33 +0000 (UTC)
Received: from eu1sys200aog124.obsmtp.com (eu1sys200aog124.obsmtp.com
 [207.126.144.157])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 53A8E17A
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 09:02:31 +0000 (UTC)
Received: from mail-wi0-f178.google.com ([209.85.212.178]) (using TLSv1) by
 eu1sys200aob124.postini.com ([207.126.147.11]) with SMTP
 ID DSNKVCvDH+Ra4b6MoMec82Wsdw7pdHjBNe+5@postini.com;
 Wed, 01 Oct 2014 09:02:32 UTC
Received: by mail-wi0-f178.google.com with SMTP id cc10so1072574wib.5
 for <freebsd-hackers@freebsd.org>; Wed, 01 Oct 2014 02:02:23 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:message-id:to:subject:cc:reply-to
 :in-reply-to;
 bh=aIuwTz+5Yt8whj6pUmpm5BIqAWp1e5WY6UTMvKepGmo=;
 b=Srgmv72spFjOpbigYHp7RpFA9OMYeQFb+HJPOth0CDAoO+ZkDE5FSIEHP3DqAoEudT
 dph1S70UdsN/L0lDSQLDtdDM+uGM5VwMwGzBPQpUebKIGtwbuDeImNgb3+vxoFKD4YZR
 PWg36Y01lr6LINnir+oeONFy6l+FSA4siN5luDQ20xiSId0oKB2oWsyECtn80uKeXrh0
 /kE6wyopU55ZUlcOilrywQ+NgT7XoU7YmOSC2ASHTU9mBpKeKes+M9m7vXOx5KGQtPu0
 nj/a9LBFcHI22IKiaBYnoQFbi1wqVUPgExs4K8XwshjfHYGaH44h5ZM09x6D/v/G78J7
 YgTw==
X-Gm-Message-State: ALoCoQnq3sD16mZsuwlH0oPp9+tez8q5B4ZH61MCxwuPbN1Ok8F0e8J20FYmY31KCcxEUSK1UQdEmzTN+rs8MBZK+mStNuT+9qyTt9jUQxlZq86kGJdCUnAt99ipb1bsb0W30TIuQlRetBh9BYpu/ygbEgHjdEOLuQ==
X-Received: by 10.180.95.66 with SMTP id di2mr12603972wib.60.1412154143633;
 Wed, 01 Oct 2014 02:02:23 -0700 (PDT)
X-Received: by 10.180.95.66 with SMTP id di2mr12603955wib.60.1412154143533;
 Wed, 01 Oct 2014 02:02:23 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk.
 [137.222.187.221])
 by mx.google.com with ESMTPSA id k2sm422846wjy.34.2014.10.01.02.02.22
 for <multiple recipients>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 01 Oct 2014 02:02:22 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1])
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s9192L50084233
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Wed, 1 Oct 2014 10:02:21 +0100 (BST)
 (envelope-from mexas@mech-as221.men.bris.ac.uk)
Received: (from mexas@localhost)
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s9192Lhb084232;
 Wed, 1 Oct 2014 10:02:21 +0100 (BST) (envelope-from mexas)
Date: Wed, 1 Oct 2014 10:02:21 +0100 (BST)
From: Anton Shterenlikht <mexas@bris.ac.uk>
Message-Id: <201410010902.s9192Lhb084232@mech-as221.men.bris.ac.uk>
To: jkh@mail.turbofuzz.com, mexas@bristol.ac.uk
Subject: Re: cluster FS?
Reply-To: mexas@bristol.ac.uk
In-Reply-To: <A70BA2F6-5EC6-490C-B012-D35A54D5FA9D@mail.turbofuzz.com>
Cc: freebsd-hackers@freebsd.org, allanjude@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 09:02:33 -0000

>From jkh@mail.turbofuzz.com Wed Oct  1 09:26:57 2014
>
>You are trying to create an active/active fail-over system with multiple modes.  You cannot get there from where you are starting.  This is basically a “start over” proposition, and why folks like NetApp and EMC sell a lot of fileservers to replace existing SAN solutions.

So are you saying that the SAN model
is not good for active/active failover
with multiple nodes?

Clearly if the SAN itself fails, then the data
is not accessible. From what I understand,
in really mission critical systems people
use multiple SANs with multiple nodes, with
some extra data synchronisation mechanisms
between those multiple SANs.

Are you saying there are better solutions
for high availability?

Thanks

Anton


From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 09:12:59 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 22251DF0;
 Wed,  1 Oct 2014 09:12:59 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id A431D300;
 Wed,  1 Oct 2014 09:12:58 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s919CqT7008194
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Wed, 1 Oct 2014 12:12:52 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s919CqT7008194
Received: (from kostik@localhost)
 by tom.home (8.14.9/8.14.9/Submit) id s919Cqtl008193;
 Wed, 1 Oct 2014 12:12:52 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 1 Oct 2014 12:12:52 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: leon zadorin <leonleon77@gmail.com>
Subject: Re: does linsysfs support mmap on pci resources (e.g. pci device's
 registers etc.)
Message-ID: <20141001091252.GP26076@kib.kiev.ua>
References: <CAPpySAaJ2CbsOMrvsrrstD_EQxt3d015T6HgF-Y9zPC7WAh4vA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAPpySAaJ2CbsOMrvsrrstD_EQxt3d015T6HgF-Y9zPC7WAh4vA@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: freebsd-emulation@freebsd.org, hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 09:12:59 -0000

Your choice of the list to ask the question is weird.  I added hackers@
as a more suitable ML.

On Wed, Oct 01, 2014 at 03:44:48PM +1000, leon zadorin wrote:
> Hello everyone,
> Sorry if this is a bit of a noob question -- I'm just starting on this
> topic... does FreeBSD's emulation of sysfs (from linux world) support
> "mmap" on pci resources?
> 
> Something similar to the following in the linux environment:
> 
> fd = open("/sys/devices/pci0001:00/0001:00:07.0/resource0", O_RDWR | O_SYNC);
> ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> printf("PCI BAR0 0x0000 = 0x%4x\n",  *((unsigned short *) ptr));
> 
> (above taken from
> http://billfarrow.blogspot.com.au/2010/09/userspace-access-to-pci-memory.html)
> 
> The reason I am asking is because I would like to map pci device
> registers/memory in user space (and read/write some of the device's
> registers from userspace). The reasons are auxiliary to this post
> (e.g. kernel-bypass, system call bypass, etc.) At this stage it would
> suffice to simply accept that user space pci-register access is needed
> without paying the price of any system/ioctl/etc. call on every
> access-instance to device's config/control register(s).
> 
> I would prefer to avoid writing additional explicit (albeit generic)
> pci related kernel module in order to provide "mmapping" of the given
> pci resources to userspace  if there is already such a generic way to
> do it via sysfs "syntax" (I would like to reduce any of the
> specific/additional code re-writing at the kernel level as much as
> possible).

AFAIK, there are no facilities in the FreeBSD kernel that allow you to get
the configuration registers or memory BARs mmapped into userspace.
linsysfs is out of the question for this sort of hack. The native
FreeBSD /dev/pci does not support mmapping either.

It should not be too hard to extend /dev/pci to do what you described.
Start by looking at sys/dev/pci/pci_user.c.

The PCIe configuration window is active, so you could access it by hand by
mmapping /dev/mem. Also, the window is mapped into KVA, so you could
access it through /dev/kmem as well. /dev/mem would be easier, I think,
because it needs the physical address, which can be learned from the ACPI
MCFG table much more easily than the value of the static symbol pcie_base.
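
To illustrate the /dev/mem route, here is a minimal C sketch. It assumes
you have already read the ECAM base address from the ACPI MCFG table; the
mcfg_base parameter and both function names are made up for this example
and are not an existing FreeBSD API. It also assumes the MCFG segment
starts at bus 0 (otherwise subtract the start bus number first). Each
function gets 4 KB of configuration space at offset
bus << 20 | dev << 15 | func << 12 from the base.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Byte offset of one function's 4 KB config space inside the ECAM window. */
static uint64_t
ecam_offset(unsigned bus, unsigned dev, unsigned func)
{
	assert(bus < 256 && dev < 32 && func < 8);
	return ((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
	    ((uint64_t)func << 12);
}

/*
 * Map one function's config space read-only through /dev/mem.
 * mcfg_base is the physical base address taken from ACPI MCFG
 * (an input the caller must supply; hypothetical parameter name).
 * Returns NULL on failure.
 */
static volatile uint32_t *
map_pcie_config(uint64_t mcfg_base, unsigned bus, unsigned dev, unsigned func)
{
	int fd = open("/dev/mem", O_RDONLY);
	if (fd < 0)
		return NULL;

	off_t phys = (off_t)(mcfg_base + ecam_offset(bus, dev, func));
	void *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, phys);
	close(fd);	/* the mapping stays valid after close */

	return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}
```

If the mapping succeeds, the first 32-bit word should hold the vendor ID
in the low 16 bits and the device ID in the high 16 bits. This needs root
and only reads; treat it as a starting point, not as tested code.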

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 09:38:38 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 0A459441;
 Wed,  1 Oct 2014 09:38:38 +0000 (UTC)
Received: from mail.iXsystems.com (mail.ixsystems.com [12.229.62.4])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client CN "*.ixsystems.com",
 Issuer "Go Daddy Secure Certification Authority" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id E49B474E;
 Wed,  1 Oct 2014 09:38:37 +0000 (UTC)
Received: from localhost (mail.ixsystems.com [10.2.55.1])
 by mail.iXsystems.com (Postfix) with ESMTP id 769FB7AC86;
 Wed,  1 Oct 2014 02:38:36 -0700 (PDT)
Received: from mail.iXsystems.com ([10.2.55.1])
 by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP
 id 07002-03; Wed,  1 Oct 2014 02:38:36 -0700 (PDT)
Received: from [10.8.0.26] (unknown [10.8.0.26])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mail.iXsystems.com (Postfix) with ESMTPSA id 3F2E27AC82;
 Wed,  1 Oct 2014 02:38:35 -0700 (PDT)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 8.0 \(1988\))
Subject: Re: cluster FS?
From: Jordan Hubbard <jkh@mail.turbofuzz.com>
In-Reply-To: <201410010902.s9192Lhb084232@mech-as221.men.bris.ac.uk>
Date: Wed, 1 Oct 2014 12:38:32 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <201E3A2E-B33D-4C63-AD81-8FFD5C2E0ED7@mail.turbofuzz.com>
References: <201410010902.s9192Lhb084232@mech-as221.men.bris.ac.uk>
To: mexas@bristol.ac.uk
X-Mailer: Apple Mail (2.1988)
Cc: freebsd-hackers@freebsd.org, allanjude@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 09:38:38 -0000


> On Oct 1, 2014, at 12:02 PM, Anton Shterenlikht <mexas@bris.ac.uk> =
wrote:
>=20
> So are you saying that the SAN model
> is not good for active/active failover
> with multiple nodes?

Correct.  SAN is active/passive.

For more information on high availability solutions, I suggest you check =
out the big file server vendors - there=E2=80=99s far more pertinent =
information in their various whitepapers than you=E2=80=99ll ever get on =
freebsd-hackers. :)

- Jordan


From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 10:17:28 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B79461EB
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 10:17:28 +0000 (UTC)
Received: from eu1sys200aog106.obsmtp.com (eu1sys200aog106.obsmtp.com
 [207.126.144.121])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 15910BAC
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 10:17:27 +0000 (UTC)
Received: from mail-wi0-f177.google.com ([209.85.212.177]) (using TLSv1) by
 eu1sys200aob106.postini.com ([207.126.147.11]) with SMTP
 ID DSNKVCvUnz558kOhXJ4zAHlAAANK9pjTgsUs@postini.com;
 Wed, 01 Oct 2014 10:17:28 UTC
Received: by mail-wi0-f177.google.com with SMTP id ho1so40717wib.4
 for <freebsd-hackers@freebsd.org>; Wed, 01 Oct 2014 03:17:03 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:message-id:to:subject:cc:reply-to
 :in-reply-to;
 bh=N7r4MxrrXIrcekcGQkG+MlfBVexufpd2gO/cBXqt2PU=;
 b=KcIlxVOWPe5l1GvmfLmxwoRoAHWlGXnP3u/7Jdjf9/Z0xJDuoYOorPi/0PE+3Dq/aw
 +LMn5nIT3Da3H3rP7JFQTEqDcOnuuVVy/T1ux3qvARTw7MB5DV/tuwesEtVPSmJIafHH
 raqjm+mX2NT1Krg18v50FADlgYNZn3giqqbHCDik7WXahINknhQDluz4zwBjLTCYDiod
 koAHLYkTmhxY7KRaj1fgVGyauNIfaTFjhZ15JqiquFPivT0/Ck+yT3fpmIizwZ3F4Qjd
 R0cjE1cUd7b/RLgjeFTRVGLW8uQHflWDOkvwhMSp1jOLOVRs+4nkCfDSB77wxF9oVbU6
 VFeA==
X-Gm-Message-State: ALoCoQms+eMIlG2C8+AbkByiRAncr7S1Qao3bXryqGepd6ad7LwVeyZvGmlj7717ZUoSDrfz68K9/tQjcmid5+eOJIEdn0ukqp8mca2xrHFtc3LnDXMAqrq3Gz+JYmf4teiWAnByGdJMGS/+4jHcwStx/EeAA8SWyw==
X-Received: by 10.180.80.198 with SMTP id t6mr13323767wix.6.1412158623695;
 Wed, 01 Oct 2014 03:17:03 -0700 (PDT)
X-Received: by 10.180.80.198 with SMTP id t6mr13323744wix.6.1412158623519;
 Wed, 01 Oct 2014 03:17:03 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk.
 [137.222.187.221])
 by mx.google.com with ESMTPSA id au4sm655264wjc.15.2014.10.01.03.17.02
 for <multiple recipients>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 01 Oct 2014 03:17:02 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1])
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s91AH1K0084405
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Wed, 1 Oct 2014 11:17:01 +0100 (BST)
 (envelope-from mexas@mech-as221.men.bris.ac.uk)
Received: (from mexas@localhost)
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s91AH1Lo084404;
 Wed, 1 Oct 2014 11:17:01 +0100 (BST) (envelope-from mexas)
Date: Wed, 1 Oct 2014 11:17:01 +0100 (BST)
From: Anton Shterenlikht <mexas@bris.ac.uk>
Message-Id: <201410011017.s91AH1Lo084404@mech-as221.men.bris.ac.uk>
To: jkh@mail.turbofuzz.com, mexas@bristol.ac.uk
Subject: Re: cluster FS?
Reply-To: mexas@bristol.ac.uk
In-Reply-To: <201E3A2E-B33D-4C63-AD81-8FFD5C2E0ED7@mail.turbofuzz.com>
Cc: freebsd-hackers@freebsd.org, allanjude@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 10:17:28 -0000

>From jkh@mail.turbofuzz.com Wed Oct  1 10:42:50 2014
>
>
>> On Oct 1, 2014, at 12:02 PM, Anton Shterenlikht <mexas@bris.ac.uk> wrote:
>> 
>> So are you saying that the SAN model
>> is not good for active/active failover
>> with multiple nodes?
>
>Correct.  SAN is active/passive.
>
>For more information on high availability solutions, I suggest you check out the big file server vendors - there’s far more pertinent information in their various whitepapers than you’ll ever get on freebsd-hackers. :)

I thought HP was the "big fileserver vendor"...

Also, the SAN array I'm using has supported
the active/active model since 2006:

http://eis.bris.ac.uk/~mexas/aa.pdf

*quote*
HP StorageWorks 1000 Modular Smart Array
Announcing active/active support

A recent web release of alternative MSA controller
firmware includes important new features,
including active/active controllers
*end quote*

Or am I confusing the issues again?

Many thanks for your time.
I do appreciate your replies.

Anton


From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 10:16:30 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 96817195;
 Wed,  1 Oct 2014 10:16:30 +0000 (UTC)
Received: from mail-ig0-x229.google.com (mail-ig0-x229.google.com
 [IPv6:2607:f8b0:4001:c05::229])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 5E701BA4;
 Wed,  1 Oct 2014 10:16:30 +0000 (UTC)
Received: by mail-ig0-f169.google.com with SMTP id uq10so6058igb.2
 for <multiple recipients>; Wed, 01 Oct 2014 03:16:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=UKWJQCZfy2zGhm12ph4huStaXHkz3lVgaU3eUEKBYaQ=;
 b=MLBjSu6Ylhwbyfr2XgE7lSPs7opw32oCOtCTiJzolb+BUjR40GqIVIUp0QzRkeB7lZ
 MW8G46i1bA28Kk20wzOh8Q0BLFY1pb/Cq0Q6s+HvkkzfRUvo+Ee4sHV3Nwuk0xB75fxd
 kMnrW1igjmcvJnilG1t2xHnZbiQxlvppne+kZ81BJTJPQB3AQ/eh3RjCKt0VWLjF+3ug
 hCN9SDEk8ocMWy02EyTdu+nT7NmcAICjh2L6g3R+dSbcdnnnXkWUCFhmFP8O++l6sTRL
 RCt9M9DhDxFMfijJ52UIkH78DGZ3x26Cf7U7q5anpTyPnXm7ZgeFuzQyWhpZmlDRoUuy
 0FLA==
MIME-Version: 1.0
X-Received: by 10.50.73.130 with SMTP id l2mr17423611igv.42.1412158589726;
 Wed, 01 Oct 2014 03:16:29 -0700 (PDT)
Received: by 10.50.2.69 with HTTP; Wed, 1 Oct 2014 03:16:29 -0700 (PDT)
In-Reply-To: <20141001091252.GP26076@kib.kiev.ua>
References: <CAPpySAaJ2CbsOMrvsrrstD_EQxt3d015T6HgF-Y9zPC7WAh4vA@mail.gmail.com>
 <20141001091252.GP26076@kib.kiev.ua>
Date: Wed, 1 Oct 2014 20:16:29 +1000
Message-ID: <CAPpySAbeajga3ciPOD=X3yVfJPjZV+WuGM3YGt2DhXjyW-z_jA@mail.gmail.com>
Subject: Re: does linsysfs support mmap on pci resources (e.g. pci device's
 registers etc.)
From: leon zadorin <leonleon77@gmail.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Content-Type: text/plain; charset=UTF-8
X-Mailman-Approved-At: Wed, 01 Oct 2014 11:13:50 +0000
Cc: freebsd-emulation@freebsd.org, hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 10:16:30 -0000

On Wed, Oct 1, 2014 at 7:12 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> Your choice of the list to ask the question is weird.  I added hackers@
> as a more suitable ML.

I see,

Sure thing about adding the post to another list -- sorry about doing
the original post to this list (freebsd-emulation).

I had read the list's description:
"A list for the Development of Emulators of other operating systems
and enviroments for FreeBSD. These include: BSDI, Linux, and some
microsoft products.
" https://lists.freebsd.org/mailman/listinfo/freebsd-emulation

and given that my post was essentially about whether FreeBSD emulated
linux OS "sysfs" feature(s) I thought it might be of some relevance to
the list... but being a noob to the list I have possibly not made the
best choice :)

> On Wed, Oct 01, 2014 at 03:44:48PM +1000, leon zadorin wrote:
>> Hello everyone,
>> Sorry if this is a bit of a noob question -- I'm just starting on this
>> topic... does FreeBSD's emulation of sysfs (from linux world) support
>> "mmap" on pci resources?
>>
>> Something similar to the following in the linux environment:
>>
>> fd = open("/sys/devices/pci0001:00/0001:00:07.0/resource0", O_RDWR | O_SYNC);
>> ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>> printf("PCI BAR0 0x0000 = 0x%4x\n",  *((unsigned short *) ptr));
>>
>> (above taken from
>> http://billfarrow.blogspot.com.au/2010/09/userspace-access-to-pci-memory.html)
[...]
> AFAIK, there are no facilities in the FreeBSD kernel that allow you to get
> the configuration registers or memory BARs mmapped into userspace.
> linsysfs is out of the question for this sort of hack. The native
> FreeBSD /dev/pci does not support mmapping either.
>
> It should not be too hard to extend /dev/pci to do what you described.
> Start by looking at sys/dev/pci/pci_user.c.
>
> The PCIe configuration window is active, so you could access it by hand by
> mmapping /dev/mem. Also, the window is mapped into KVA, so you could
> access it through /dev/kmem as well. /dev/mem would be easier, I think,
> because it needs the physical address, which can be learned from the ACPI
> MCFG table much more easily than the value of the static symbol pcie_base.

I see -- thanks for the pointers! I shall consider my options :)

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 11:03:23 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 806DA129
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 11:03:23 +0000 (UTC)
Received: from mail-yk0-x22d.google.com (mail-yk0-x22d.google.com
 [IPv6:2607:f8b0:4002:c07::22d])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 4442DF6
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 11:03:23 +0000 (UTC)
Received: by mail-yk0-f173.google.com with SMTP id 200so22098ykr.32
 for <freebsd-hackers@freebsd.org>; Wed, 01 Oct 2014 04:03:22 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=0zuYDM9ISq6JEAr6aV02oqJu8Sw1mk/saBPQLcwmqbk=;
 b=WDXQP4N2bhWGjTrSo5LKqzYiYnxzsyvbZRhT6bImkPbTmR1ZFxDtOG/lI8s/2lAr9I
 q+vP1nxnrvoQDHEWuuK4DoYgZ7NkiRL5tMTriqsubr4Ne6CnVjkrPiUaMf5GTSCA//Vf
 oPk061GmbR65Csdfz7wAPlczHS8g2IZxbASbyoNBmEliPqRR+saNPjG9CQN+cc8LsRgw
 sFtGB16EjRTBWj1oq9cTFCukH5HwYm10/3l2aS4yZQJ9Zdk6GAgbO927nE4EUHA6AvTc
 TvQ34SMzG13+QQmdCIwE5WhOv3bJjjmZKR157hENCUgr1NPhhxyfjthfX80L3kqLdxSV
 2/BQ==
MIME-Version: 1.0
X-Received: by 10.236.128.33 with SMTP id e21mr2618520yhi.187.1412161402277;
 Wed, 01 Oct 2014 04:03:22 -0700 (PDT)
Received: by 10.170.156.139 with HTTP; Wed, 1 Oct 2014 04:03:22 -0700 (PDT)
In-Reply-To: <201E3A2E-B33D-4C63-AD81-8FFD5C2E0ED7@mail.turbofuzz.com>
References: <201410010902.s9192Lhb084232@mech-as221.men.bris.ac.uk>
 <201E3A2E-B33D-4C63-AD81-8FFD5C2E0ED7@mail.turbofuzz.com>
Date: Wed, 1 Oct 2014 12:03:22 +0100
Message-ID: <CALfReydSkufU84UftsQoJd9RrkTj0FEzSDOQVcQJ_c7HBg9Jbw@mail.gmail.com>
Subject: Re: cluster FS?
From: krad <kraduk@gmail.com>
To: mexas@bristol.ac.uk
X-Mailman-Approved-At: Wed, 01 Oct 2014 11:14:44 +0000
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 11:03:23 -0000

These are my definitions; hopefully they make things a little clearer.

Cluster file system: a file system that resides on a block device to which
multiple machines have rw access, with consistency guaranteed. A
good real-world example is a VMware ESX datastore, i.e. a LUN is
presented to all the ESXi hosts in the cluster, all of which can access it
simultaneously. The key thing here is the guarantee of consistency.

Distributed file system: a network file system built out of
multiple nodes working together to provide a fault-tolerant service.
Examples are Lustre, GlusterFS, MooseFS, pNFS and OpenAFS. One of the
key things to understand here is that these file systems generally sit on
top of the normal OS file systems and each node has its own discrete
storage. All replication is done via the network.

Looking at your setup, if you want to provide a fault-tolerant setup with
your existing SAN, there are two main paths I can think of. I am assuming
the SAN itself is fault tolerant enough for your requirements.

Option one:
1. Create a set of LUNs and present them to your file server nodes.
2. On one node, create the file systems of your choice (probably ZFS).
3. Set up CARP in a master/slave configuration with a VIP, and
   import/export functions for the file systems.
4. Export your file systems via NFS/CIFS.

If you are dead set on using FreeBSD for this, it will be trickier, as
you will have to do a lot of the work yourself. The main thing is
making sure you don't have the fs mounted on both nodes at once in a
split-brain scenario.  If you can use another OS, something like Sun
Cluster, Veritas Cluster or Red Hat Cluster can do all of this for you. The
advantage of this architecture is that if you go for one of the commercial
solutions you will have support, and there are plenty of people out there
with experience in it.

Option two: use a distributed file system.

Basically, here you would create two sets of LUNs and present one set to
each node, and only that node. Format and mount the LUNs to your
preferences on each system.
Install and configure the distributed file system of your choice and use
the newly mounted file systems on each node as your datastores.

You should probably look at MooseFS and GlusterFS first, and then maybe
OpenAFS if you are going to use FreeBSD as the host system, but if you went
for Linux you would have a bigger choice at present.

On the two nodes of the distributed file system you would then want a
relatively simple CARP setup to float a VIP between the boxes. All clients
would use this VIP as their connection point. Also make sure the
distributed FS is mounted back onto each node as a normal mount point. This
allows you to re-export it via CIFS and NFS.


Finally, for the clients: they have three basic ways of connecting to the
VIP. These should cover most eventualities:

1. Native distributed fs client.
2. NFS.
3. CIFS.

The advantage of this over option one is that it scales very well,
depending on your distributed fs of choice. It also means you can easily
break away from your SAN over time if you want to: all you need to do is
add more nodes that are not on the SAN, replicate the storage to them,
then drop out the SAN nodes.

I hope this helps a little.


On 1 October 2014 10:38, Jordan Hubbard <jkh@mail.turbofuzz.com> wrote:

>
> > On Oct 1, 2014, at 12:02 PM, Anton Shterenlikht <mexas@bris.ac.uk>
> wrote:
> >
> > So are you saying that the SAN model
> > is not good for active/active failover
> > with multiple nodes?
>
> Correct.  SAN is active/passive.
>
> For more information on high availability solutions, I suggest you check
> out the big file server vendors - there=E2=80=99s far more pertinent info=
rmation in
> their various whitepapers than you=E2=80=99ll ever get on freebsd-hackers=
. :)
>
> - Jordan
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org=
"
>

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 12:40:21 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9EB6A25E
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 12:40:21 +0000 (UTC)
Received: from mail.iXsystems.com (mail.ixsystems.com [12.229.62.4])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client CN "*.ixsystems.com",
 Issuer "Go Daddy Secure Certification Authority" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 83698E1D
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 12:40:21 +0000 (UTC)
Received: from localhost (mail.ixsystems.com [10.2.55.1])
 by mail.iXsystems.com (Postfix) with ESMTP id B99287FA59;
 Wed,  1 Oct 2014 05:40:20 -0700 (PDT)
Received: from mail.iXsystems.com ([10.2.55.1])
 by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP
 id 26186-06; Wed,  1 Oct 2014 05:40:20 -0700 (PDT)
Received: from [10.8.0.38] (unknown [10.8.0.38])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mail.iXsystems.com (Postfix) with ESMTPSA id A96437FA56;
 Wed,  1 Oct 2014 05:40:19 -0700 (PDT)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 8.0 \(1988\))
Subject: Re: cluster FS?
From: Jordan Hubbard <jkh@mail.turbofuzz.com>
In-Reply-To: <CALfReydSkufU84UftsQoJd9RrkTj0FEzSDOQVcQJ_c7HBg9Jbw@mail.gmail.com>
Date: Wed, 1 Oct 2014 15:40:16 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <D4F2247F-0FFD-4D32-A61B-FDFF39A2E2E5@mail.turbofuzz.com>
References: <201410010902.s9192Lhb084232@mech-as221.men.bris.ac.uk>
 <201E3A2E-B33D-4C63-AD81-8FFD5C2E0ED7@mail.turbofuzz.com>
 <CALfReydSkufU84UftsQoJd9RrkTj0FEzSDOQVcQJ_c7HBg9Jbw@mail.gmail.com>
To: krad <kraduk@gmail.com>
X-Mailer: Apple Mail (2.1988)
Cc: freebsd-hackers@freebsd.org, mexas@bristol.ac.uk
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 12:40:21 -0000


> On Oct 1, 2014, at 2:03 PM, krad <kraduk@gmail.com> wrote:
>=20
> These are my definitions, hopefully it makes some stuff a little =
clearer

Thanks for the exposition - if that doesn=E2=80=99t help Anton, I =
don=E2=80=99t know what will. :-)

To answer Anton=E2=80=99s previous question, he just needs to read the =
PDF he cited a little more closely.  HP has obviously provided some sort =
of concurrent access mode to their SAN, but it's only active/active if =
you have one of the supported operating systems.  Presumably, HP also =
provides drivers for those OSes which provide some sort of interlock =
support, though again, it=E2=80=99s not clear just what sort of =
filesystems you can put on the SAN and still keep the active/active =
concurrency.  It=E2=80=99s very tricky, and the penalty for getting it =
wrong is corrupted data, so I=E2=80=99d tend to put my money on an =
actual filesystem-level solution which provides concurrent access, like =
glusterfs.  That just went BETA with FreeBSD support, so who knows, =
maybe it=E2=80=99s becoming a viable solution.  I have zero experience =
with deploying glusterfs, however, so I cannot speak to that.

- Jordan


From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 13:40:46 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 25C18848
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 13:40:46 +0000 (UTC)
Received: from mailgate.gta.com (mailgate.gta.com [199.120.225.23])
 by mx1.freebsd.org (Postfix) with ESMTP id CE9C9816
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 13:40:45 +0000 (UTC)
Received: (qmail 57398 invoked by uid 1000); 1 Oct 2014 13:40:44 -0000
Date: Wed, 1 Oct 2014 09:40:44 -0400
From: Larry Baird <lab@gta.com>
To: Ryan Stone <rysto32@gmail.com>
Subject: Re: Kernel/Compiler bug
Message-ID: <20141001134044.GA57022@gta.com>
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 13:40:46 -0000

Ryan,

On Wed, Oct 01, 2014 at 12:46:35AM -0400, Ryan Stone wrote:
> This may not be a compiler bug.  A quick look at the esp values
> provided in that backtrace shows that at least 7KB has been used on
> the stack.  The stack for kernel threads is only 8KB, and a stack
> overflow can cause a double fault like that.
> 
> My suspicion would be that without optimizations on clang uses a lot
> more stack space and you push over the limit.  There's a kernel build
> option for the stack size that you could change to confirm.  I believe
> that it's called KSTACK_PAGES.  Try increasing it to 4.
Good catch.  Increasing KSTACK_PAGES does fix the issue.  I wonder with
optimization, how close to stack overflow does the kernel get during boot?
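
A rough sketch of the arithmetic behind Ryan's estimate, assuming 4 KB
pages (so the 8 KB stack he mentions corresponds to KSTACK_PAGES=2).  The
esp values below are invented for illustration; they are not the ones from
the backtrace in this thread.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def kstack_bytes(kstack_pages, page_size=PAGE_SIZE):
    """Size of one kernel thread stack in bytes."""
    return kstack_pages * page_size


def stack_span(esp_values):
    """Lower bound on stack use: distance between the shallowest and the
    deepest stack pointer seen in a backtrace (the stack grows down)."""
    return max(esp_values) - min(esp_values)


# Hypothetical esp values, outermost frame first.
esps = [0xC2A5FD38, 0xC2A5F8A0, 0xC2A5F1C4, 0xC2A5E9F0, 0xC2A5E0B8]

used = stack_span(esps)
for pages in (2, 4):
    total = kstack_bytes(pages)
    print(f"KSTACK_PAGES={pages}: {total} bytes, "
          f"at least {used} used, {total - used} left")
```

The span is only a lower bound: frames below the deepest one shown in the
backtrace (for example the fault handler itself) are not counted.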

Thank you,
Larry

-- 
------------------------------------------------------------------------
Larry Baird
Global Technology Associates, Inc. 1992-2012 	| http://www.gta.com
Celebrating Twenty Years of Software Innovation | Orlando, FL
Email: lab@gta.com                 		| TEL 407-380-0220

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 13:48:32 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A2249A48
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 13:48:32 +0000 (UTC)
Received: from eu1sys200aog113.obsmtp.com (eu1sys200aog113.obsmtp.com
 [207.126.144.135])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id F275F8F8
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 13:48:31 +0000 (UTC)
Received: from mail-wi0-f172.google.com ([209.85.212.172]) (using TLSv1) by
 eu1sys200aob113.postini.com ([207.126.147.11]) with SMTP
 ID DSNKVCwGFL+8d+LKzl0cuTyRTDQqw+GpPyFT@postini.com;
 Wed, 01 Oct 2014 13:48:32 UTC
Received: by mail-wi0-f172.google.com with SMTP id n3so618506wiv.11
 for <freebsd-hackers@freebsd.org>; Wed, 01 Oct 2014 06:48:04 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:message-id:to:subject:cc:reply-to
 :in-reply-to;
 bh=GDaz6fv1JfHIyZ/Me8JYDxLcSB3R0DOIXHmANdG+f6Q=;
 b=FYu453AkOfZf6rz2kUZymBqAt3HAwyxOG1ut6h8V294X++a+CobY3SKAU1bJ1B21zC
 zruuaO4tc2YZqLRNwO3+jWrFSMsTcXjWrhBjEQU6EcHhjCvCaUnGnSLJUCN/SOMf87F/
 0CkD0mMsTpmbrmtpXe4aWeH5XrIMBbNLsqivU2xMZE2WWYMzBz2rQbJq5SLO6MpjpCkX
 nFbHBV+swSSiAJ00xGYFvkDfv6Grj5iNw1tw6PoeMUZ+I0FSQK/XGngsfw8bLLcHTUz/
 Yu+f36tCZUSUG1bmmxrjWmAStFXYXVz+cMHzb81+6lwOo7VtOrQu5QkzvCCqyMAuYGKy
 kWQQ==
X-Gm-Message-State: ALoCoQm6b4W0EtiteSSqRV8SehQC+5IVAfOKnHwB+5dwnz+46SNDmyY1E9JL4AQeuKndOZs6yImRGQzspNDtkLql+NtfYdVdG1WbtZ51N/9dgVcVgPniuXyXdYacNiQaaD4Kr6KOQmfvxCe6pu/su/237f7rk5fcCg==
X-Received: by 10.194.204.232 with SMTP id lb8mr64768831wjc.0.1412171284067;
 Wed, 01 Oct 2014 06:48:04 -0700 (PDT)
X-Received: by 10.194.204.232 with SMTP id lb8mr64768749wjc.0.1412171283531;
 Wed, 01 Oct 2014 06:48:03 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk.
 [137.222.187.221])
 by mx.google.com with ESMTPSA id cz3sm1245432wjb.23.2014.10.01.06.48.02
 for <multiple recipients>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 01 Oct 2014 06:48:02 -0700 (PDT)
Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1])
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s91Dm19V084972
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Wed, 1 Oct 2014 14:48:01 +0100 (BST)
 (envelope-from mexas@mech-as221.men.bris.ac.uk)
Received: (from mexas@localhost)
 by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s91Dm1n3084971;
 Wed, 1 Oct 2014 14:48:01 +0100 (BST) (envelope-from mexas)
Date: Wed, 1 Oct 2014 14:48:01 +0100 (BST)
From: Anton Shterenlikht <mexas@bris.ac.uk>
Message-Id: <201410011348.s91Dm1n3084971@mech-as221.men.bris.ac.uk>
To: jkh@mail.turbofuzz.com, kraduk@gmail.com
Subject: Re: cluster FS?
Reply-To: mexas@bristol.ac.uk
In-Reply-To: <D4F2247F-0FFD-4D32-A61B-FDFF39A2E2E5@mail.turbofuzz.com>
Cc: freebsd-hackers@freebsd.org, mexas@bristol.ac.uk
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 13:48:32 -0000

>From jkh@mail.turbofuzz.com Wed Oct  1 14:22:36 2014
>
>> On Oct 1, 2014, at 2:03 PM, krad <kraduk@gmail.com> wrote:
>> 
>> These are my definitions, hopefully it makes some stuff a little clearer
>
>Thanks for the exposition - if that doesn’t help Anton, I don’t know what will. :-)

It did. Thanks a lot.

>To answer Anton’s previous question, he just needs to read the PDF he cited a little more closely.  HP has obviously provided some sort of concurrent access mode to their SAN, but it's only active/active if you have one of the supported operating systems.  Presumably, HP also provides drivers for those OSes which provide some sort of interlock support, though again, it’s not clear just what sort of filesystems you can put on the SAN and still keep the active/active concurrency.  It’s very tricky, and the penalty for getting it wrong is corrupted data, so I’d tend to put my money on an actual filesystem-level solution which provides concurrent access, like glusterfs.  That just went BETA with FreeBSD support, so who knows, maybe it’s becoming a viable solution.  I have zero experience with deploying glusterfs, however, so I cannot speak to that.
>

ok, I get it - my plans are way beyond my IT expertise (amateur).

Thanks again

Anton

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 17:27:55 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C88DBF10
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 17:27:55 +0000 (UTC)
Received: from na01-bn1-obe.outbound.protection.outlook.com
 (mail-bn1on0083.outbound.protection.outlook.com [157.56.110.83])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "mail.protection.outlook.com",
 Issuer "MSIT Machine Auth CA 2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 723E898E
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 17:27:54 +0000 (UTC)
Received: from DM2PR0801MB0944.namprd08.prod.outlook.com (25.160.131.27) by
 DM2PR0801MB0944.namprd08.prod.outlook.com (25.160.131.27) with Microsoft SMTP
 Server (TLS) id 15.0.1039.15; Wed, 1 Oct 2014 17:12:19 +0000
Received: from DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) by
 DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) with
 mapi id 15.00.1039.011; Wed, 1 Oct 2014 17:12:19 +0000
From: "Pokala, Ravi" <rpokala@panasas.com>
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: dumpsys/savecore on AF-4Kn drives?
Thread-Topic: dumpsys/savecore on AF-4Kn drives?
Thread-Index: AQHP3QFgvwLpNOmcHUaIK0Ete9ljB5wbBlIA
Date: Wed, 1 Oct 2014 17:12:19 +0000
Message-ID: <D0518336.121BA9%rpokala@panasas.com>
References: <D050827F.121A5C%rpokala@panasas.com>
In-Reply-To: <D050827F.121A5C%rpokala@panasas.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
user-agent: Microsoft-MacOutlook/14.4.4.140807
x-ms-exchange-transport-fromentityheader: Hosted
x-originating-ip: [24.6.178.251]
x-microsoft-antispam: BCL:0;PCL:0;RULEID:;SRVR:DM2PR0801MB0944;
x-forefront-prvs: 0351D213B3
x-forefront-antispam-report: SFV:NSPM;
 SFS:(10009020)(6009001)(13464003)(189002)(164054003)(377454003)(51704005)(199003)(20776003)(83506001)(101416001)(2351001)(107886001)(99396003)(80022003)(46102003)(85852003)(54356999)(76176999)(21056001)(50986999)(86362001)(64706001)(2656002)(19580395003)(92566001)(92726001)(19580405001)(66066001)(31966008)(85306004)(87936001)(105586002)(120916001)(95666004)(4396001)(10300001)(76482002)(107046002)(77096002)(110136001)(36756003)(99286002)(97736003)(106356001)(106116001);
 DIR:OUT; SFP:1101; SCL:1; SRVR:DM2PR0801MB0944;
 H:DM2PR0801MB0944.namprd08.prod.outlook.com; FPR:; MLV:sfv; PTR:InfoNoRecords;
 MX:1; A:1; LANG:en; 
Content-Type: text/plain; charset="us-ascii"
Content-ID: <056BA62C25CA864480C517738B4FCDB0@namprd08.prod.outlook.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: panasas.com
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 17:27:55 -0000

Re-posting something that was sent to me off-list, so the thread stays
up-to-date:

From: <Meyer>, Conrad <conrad.meyer@isilon.com>
Date: Wednesday, October 1, 2014 at 6:05 AM
To: Ravi Pokala <rpokala@panasas.com>
Subject: RE: dumpsys/savecore on AF-4Kn drives?

Ravi,

Skipping freebsd-hackers@ as I can't get Outlook to reply in the right way.

savecore(1) uses raw read(2) calls to the passed device. FreeBSD DevFS
doesn't support non-native block sizes, so that's probably where EINVAL is
coming from. To support 4Kn from that end you could probably convert
savecore(1) from read(2) and friends to fread(3) (assuming libc does the
right thing re: native sector size).
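
A minimal sketch of the sector-aligned read idea, written in Python rather
than as a savecore(1) patch.  The helper name read_aligned and the explicit
sector_size parameter are made up for this example; on FreeBSD the sector
size would presumably come from the device (e.g. the DIOCGSECTORSIZE
ioctl), which is not shown here.

```python
import os


def read_aligned(fd, offset, length, sector_size):
    """Return `length` bytes at `offset`, issuing a single read that
    starts on a sector boundary and covers a whole number of sectors."""
    start = (offset // sector_size) * sector_size
    end = -(-(offset + length) // sector_size) * sector_size  # round up
    buf = os.pread(fd, end - start, start)
    skip = offset - start
    return buf[skip:skip + length]
```

With this, fetching a 512-byte dump header from a 4Kn device becomes
read_aligned(fd, 0, 512, 4096): the device only ever sees a 4096-byte read,
and the caller still gets the 512 bytes it asked for.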

The kernel dump code is all DEV_BSIZE (512), but the backing dump device
is free to do Read-Modify-Write to satisfy those 512 byte writes during
dump. I don't know if gmirror does this, but if you were able to create a
dump without error/panic, it probably does.
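
The Read-Modify-Write step can be sketched as follows.  This is only an
illustration of the general technique; a bytearray stands in for the disk,
the function name rmw_write is invented, and nothing here claims to
describe what gmirror actually does.

```python
def rmw_write(device, offset, data, sector_size=4096):
    """Apply a small write to `device` (a bytearray standing in for a
    disk) using only whole-sector reads and writes.  Assumes the write
    lies entirely within the device."""
    start = (offset // sector_size) * sector_size
    end = -(-(offset + len(data)) // sector_size) * sector_size  # round up
    sectors = bytearray(device[start:end])   # read the covering sectors
    rel = offset - start
    sectors[rel:rel + len(data)] = data      # modify the 512-byte slice
    device[start:end] = sectors              # write whole sectors back


disk = bytearray(8192)                 # two 4 KB sectors, zero-filled
rmw_write(disk, 512, b"\xAA" * 512)    # one DEV_BSIZE-sized write
```

Only bytes 512..1023 change; the rest of the first 4 KB sector is read and
written back unmodified.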

Most of this code hasn't been touched since 2002 or so; 4Kn is a project
:). Good luck!

Hope that helps,
Conrad


-----Original Message-----
From: <Pokala>, Ravi Pokala <rpokala@panasas.com>
Date: Tuesday, September 30, 2014 at 3:53 PM
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: dumpsys/savecore on AF-4Kn drives?

>Hi folks,
>
>Does anyone out there have AF-4Kn drives (both logical and physical sector
>size is 4KB)? Have you been able to drop a core to one, and successfully
>save the core on the way back up?
>
>I'm working on adding AF-4Kn support to an older version of FreeBSD (based
>on 7 - yeah, I know... :-P), using -CURRENT as a reference. Things look
>good at the GEOM level and higher; the GEOM utils report correct sizes,
>UFS runs fine, etc. If I manually break into the debugger and 'call
>doadump', it appears to work; at least, it does not report any errors. But
>when I reboot, `savecore' complains:
>
>    error reading dump header at offset 0 in /dev/mirror/gm1: Invalid
>argument
>
>(Yes, it's dumping to a mirror; no, that's not the problem: the mirror is
>configured using the 'prefer' balancing algorithm, as described in
>gmirror(8), and we've been doing this without issue for years.)
>
>I'm trying to figure out if the problem is on the dumpsys side, the
>savecore side, or if they're both broken for AF-4Kn. In particular,
>'struct kerneldumpheader' is 512 bytes, and it looks like most calls to
>dump_write() in the full-dump context (not minidumps) pass either the size
>of the structure, or an explicit 512, for the 'length' argument. That's
>the case in both the 7-ish version I'm porting to, and in -CURRENT.
>
>There's no AF-4Kn-aware bootstrap in the version we're using (emaste@ -
>does the new UEFI bootstrap in 10-STABLE work w/ AF-4Kn drives?), so one
>of the drives is 512n, and I could probably find some space on there to
>save the core to. But that device is small, and we have other uses for it,
>so I'd like to avoid reserving a large chunk of it.
>
>Any thoughts?
>
>Thanks,
>
>Ravi
>


From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 19:38:17 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 1AF2EC68
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 19:38:17 +0000 (UTC)
Received: from tensor.andric.com (tensor.andric.com [87.251.56.140])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client CN "tensor.andric.com", Issuer "CAcert Class 3 Root" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id CAB8EBB0
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 19:38:16 +0000 (UTC)
Received: from [IPv6:2001:7b8:3a7::e57d:9fd2:d3a8:dc94] (unknown
 [IPv6:2001:7b8:3a7:0:e57d:9fd2:d3a8:dc94])
 (using TLSv1 with cipher AES128-SHA (128/128 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 1255EB80A;
 Wed,  1 Oct 2014 21:38:07 +0200 (CEST)
Content-Type: multipart/signed;
 boundary="Apple-Mail=_FE3828C5-4D92-4392-AE93-72655D74AB90";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
Subject: Re: Kernel/Compiler bug
From: Dimitry Andric <dim@FreeBSD.org>
In-Reply-To: <20141001134044.GA57022@gta.com>
Date: Wed, 1 Oct 2014 21:37:54 +0200
Message-Id: <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
To: Larry Baird <lab@gta.com>
X-Mailer: Apple Mail (2.1878.6)
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 19:38:17 -0000


--Apple-Mail=_FE3828C5-4D92-4392-AE93-72655D74AB90
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset=us-ascii

On 01 Oct 2014, at 15:40, Larry Baird <lab@gta.com> wrote:
> Ryan,
> 
> On Wed, Oct 01, 2014 at 12:46:35AM -0400, Ryan Stone wrote:
>> This may not be a compiler bug.  A quick look at the esp values
>> provided in that backtrace shows that at least 7KB has been used on
>> the stack.  The stack for kernel threads is only 8KB, and a stack
>> overflow can cause a double fault like that.
>> 
>> My suspicion would be that without optimizations on clang uses a lot
>> more stack space and you push over the limit.  There's a kernel build
>> option for the stack size that you could change to confirm.  I believe
>> that it's called KSTACK_PAGES.  Try increasing it to 4.
> Good catch.  Increasing KSTACK_PAGES does fix the issue.  I wonder with
> optimization, how close to stack overflow does the kernel get during boot?

It obviously depends on which optimization flags you use, which drivers
you include, and so on.  There was a thread some time ago about somebody
banging into the limit when mounting certain ZFS filesystems, here:

https://lists.freebsd.org/pipermail/freebsd-current/2012-December/038208.html

This is why Kostik added printing of the frame addresses to the panic
backtrace output, so you can easily see if you hit the stack limit.

That said, 8k is not much these days, especially not with fairly
complicated code like ZFS, combined with high optimization, which can
inline a lot of functions, causing even more stack usage.  I would just
bump KSTACK_PAGES.

-Dimitry


--Apple-Mail=_FE3828C5-4D92-4392-AE93-72655D74AB90
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)

iEYEARECAAYFAlQsWBoACgkQsF6jCi4glqN8zgCeNe0ZiuINVUj9/pZCd3fUiu0R
2uEAoJc3rkdOrAgsYfXSuqrzltEVscAQ
=uHtI
-----END PGP SIGNATURE-----

--Apple-Mail=_FE3828C5-4D92-4392-AE93-72655D74AB90--

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct  1 23:21:37 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 927DFD36
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 23:21:37 +0000 (UTC)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 73378CF8
 for <freebsd-hackers@freebsd.org>; Wed,  1 Oct 2014 23:21:37 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.9/8.14.9) with ESMTP id s91NLb6i037175
 for <freebsd-hackers@freebsd.org>; Wed, 1 Oct 2014 23:21:37 GMT
 (envelope-from bdrewery@freefall.freebsd.org)
Received: (from bdrewery@localhost)
 by freefall.freebsd.org (8.14.9/8.14.9/Submit) id s91NLbRu037172
 for freebsd-hackers@freebsd.org; Wed, 1 Oct 2014 23:21:37 GMT
 (envelope-from bdrewery)
Received: (qmail 42342 invoked from network); 1 Oct 2014 18:21:32 -0500
Received: from unknown (HELO ?10.10.0.24?) (freebsd@shatow.net@10.10.0.24)
 by sweb.xzibition.com with ESMTPA; 1 Oct 2014 18:21:32 -0500
Message-ID: <542C8C75.30007@FreeBSD.org>
Date: Wed, 01 Oct 2014 18:21:25 -0500
From: Bryan Drewery <bdrewery@FreeBSD.org>
Organization: FreeBSD
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:31.0) Gecko/20100101 Thunderbird/31.1.2
MIME-Version: 1.0
To: Dimitry Andric <dim@FreeBSD.org>, Larry Baird <lab@gta.com>
Subject: Re: Kernel/Compiler bug
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
In-Reply-To: <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
OpenPGP: id=6E4697CF;
	url=http://www.shatow.net/bryan/bryan2.asc
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="1UPenIEbfu6kooDcV3CLQPGkblXxSIh6L"
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Oct 2014 23:21:37 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--1UPenIEbfu6kooDcV3CLQPGkblXxSIh6L
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 10/1/2014 2:37 PM, Dimitry Andric wrote:
> On 01 Oct 2014, at 15:40, Larry Baird <lab@gta.com> wrote:
>> Ryan,
>>
>> On Wed, Oct 01, 2014 at 12:46:35AM -0400, Ryan Stone wrote:
>>> This may not be a compiler bug.  A quick look at the esp values
>>> provided in that backtrace shows that at least 7KB has been used on
>>> the stack.  The stack for kernel threads is only 8KB, and a stack
>>> overflow can cause a double fault like that.
>>>
>>> My suspicion would be that without optimizations on clang uses a lot
>>> more stack space and you push over the limit.  There's a kernel build
>>> option for the stack size that you could change to confirm.  I believe
>>> that it's called KSTACK_PAGES.  Try increasing it to 4.
>> Good catch.  Increasing KSTACK_PAGES does fix the issue.  I wonder with
>> optimization, how close to stack overflow does the kernel get during boot?
>
> It obviously depends on which optimization flags you use, which drivers
> you include, and so on.  There was a thread some time ago about somebody
> banging into the limit when mounting certain ZFS filesystems, here:
>
> https://lists.freebsd.org/pipermail/freebsd-current/2012-December/038208.html
>
> This is why Kostik added printing of the frame addresses to the panic
> backtrace output, so you can easily see if you hit the stack limit.
>
> That said, 8k is not much these days, especially not with fairly
> complicated code like ZFS, combined with high optimization, which can
> inline a lot of functions, causing even more stack usage.  I would just
> bump KSTACK_PAGES.
>
> -Dimitry
>

Is this something that can be bumped in the tree for GENERIC?

-- 
Regards,
Bryan Drewery


--1UPenIEbfu6kooDcV3CLQPGkblXxSIh6L
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)

iQEcBAEBAgAGBQJULIx1AAoJEDXXcbtuRpfPHvkH/09cR3hY2SktVv5v4QjRlgMO
07+o6Dc/FGwHLpvwuq9XZXyAlr40j2We3la6sXPFnBcx1uQnLz9TNmEinohmLqlg
zVMSUJd97OJRbEEwHsl/jmnSrVAJa+KIO748C0Lu9hgcPQc4eDY86N/nzTTpK4Vm
99+tEAGeIAnsUGaxg7sQNt6GsydcfAngp/UZ7NKPiQoMTJVW/F7cFT9iCIGWurnh
udyhNMVmwQDOWuwD+QmWgmCuXGAPHiVME9F/DmTKBXPtFlEpx3XQdy1LCOybL2wM
oaAjKO4EMzgl6Z1X6JTOrA2ZpgZb1EPheBzmc8z/2rgrJpJEeIS+FEdXrEfh/AE=
=JsN4
-----END PGP SIGNATURE-----

--1UPenIEbfu6kooDcV3CLQPGkblXxSIh6L--

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 07:55:45 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 0B62E827;
 Thu,  2 Oct 2014 07:55:45 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id A12E987B;
 Thu,  2 Oct 2014 07:55:44 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s927tcGT030413
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Thu, 2 Oct 2014 10:55:38 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s927tcGT030413
Received: (from kostik@localhost)
 by tom.home (8.14.9/8.14.9/Submit) id s927tbCm030410;
 Thu, 2 Oct 2014 10:55:37 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Thu, 2 Oct 2014 10:55:37 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bryan Drewery <bdrewery@FreeBSD.org>
Subject: Re: Kernel/Compiler bug
Message-ID: <20141002075537.GU26076@kib.kiev.ua>
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
 <542C8C75.30007@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <542C8C75.30007@FreeBSD.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>, Dimitry Andric <dim@FreeBSD.org>,
 Larry Baird <lab@gta.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 07:55:45 -0000

On Wed, Oct 01, 2014 at 06:21:25PM -0500, Bryan Drewery wrote:
> On 10/1/2014 2:37 PM, Dimitry Andric wrote:
> > On 01 Oct 2014, at 15:40, Larry Baird <lab@gta.com> wrote:
> >> Ryan,
> >>
> >> On Wed, Oct 01, 2014 at 12:46:35AM -0400, Ryan Stone wrote:
> >>> This may not be a compiler bug.  A quick look at the esp values
> >>> provided in that backtrace shows that at least 7KB has been used on
> >>> the stack.  The stack for kernel threads is only 8KB, and a stack
> >>> overflow can cause a double fault like that.
> >>>
> >>> My suspicion would be that without optimizations on clang uses a lot
> >>> more stack space and you push over the limit.  There's a kernel build
> >>> option for the stack size that you could change to confirm.  I believe
> >>> that it's called KSTACK_PAGES.  Try increasing it to 4.
> >> Good catch.  Increasing KSTACK_PAGES does fix the issue.  I wonder with
> >> optimization, how close to stack overflow does the kernel get during boot?
> > 
> > It obviously depends on which optimization flags you use, which drivers
> > you include, and so on.  There was a thread some time ago about somebody
> > banging into the limit when mounting certain ZFS filesystems, here:
> > 
> > https://lists.freebsd.org/pipermail/freebsd-current/2012-December/038208.html
> > 
> > This is why Kostik added printing of the frame addresses to the panic
> > backtrace output, so you can easily see if you hit the stack limit.
> > 
> > That said, 8k is not much these days, especially not with fairly
> > complicated code like ZFS, combined with high optimization, which can
> > inline a lot of functions, causing even more stack usage.  I would just
> > bump KSTACK_PAGES.
> > 
> > -Dimitry
> > 
> 
> Is this something that can be bumped in the tree for GENERIC?

The cost of the increased size for kernel stack is significant, even
on architectures with ample KVA.  This must not be done just because
some non-default kernel settings cause stack overflow.  If somebody
feels qualified enough to tune compiler options, they must
understand the consequences and do other required adjustments,
including kernel stack size tuning.

FWIW, there was an old reason why -O0 did not work for the kernel.
The cpufunc.h inlines are not provided in a non-inline version, and
at least gcc at -O0 sometimes generated a call to the nonexistent
function, leading to a link failure.  It is curious that clang always
inlines at -O0, but it is possible, although unlikely, that the kernel
source was changed to be immune.

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 14:02:39 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8D72DD65
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 14:02:39 +0000 (UTC)
Received: from mailgate.gta.com (mailgate.gta.com [199.120.225.23])
 by mx1.freebsd.org (Postfix) with ESMTP id 4051384A
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 14:02:38 +0000 (UTC)
Received: (qmail 59208 invoked by uid 1000); 2 Oct 2014 14:02:32 -0000
Date: Thu, 2 Oct 2014 10:02:32 -0400
From: Larry Baird <lab@gta.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Kernel/Compiler bug
Message-ID: <20141002140232.GA52387@gta.com>
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
 <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141002075537.GU26076@kib.kiev.ua>
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>, Dimitry Andric <dim@FreeBSD.org>,
 Bryan Drewery <bdrewery@FreeBSD.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 14:02:39 -0000

> > Is this something that can be bumped in the tree for GENERIC?
> 
> The cost of the increased size for kernel stack is significant, even
> on architectures with ample KVA.  This must not be done just because
> some non-default kernel settings cause stack overflow.  If somebody
> feels qualified enough to tune compiler options, they must
> understand the consequences and do other required adjustments,
> including kernel stack size tuning.
>
> FWIW, there was an old reason why -O0 did not work for the kernel.
> The cpufunc.h inlines are not provided in a non-inline version, and
> at least gcc at -O0 sometimes generated a call to the nonexistent
> function, leading to a link failure.  It is curious that clang always
> inlines at -O0, but it is possible, although unlikely, that the kernel
> source was changed to be immune.

Overall I agree with your comments.  The fact is that I have been using
-O0 and -O1 on custom kernels for years.  It makes using kgdb much more
effective.  Both optimization levels work for a custom kernel I have for
FreeBSD 10.0 but do not work for FreeBSD 10.1.  I just tried turning off
optimization for a FreeBSD 10.0 release GENERIC kernel.  Same issue.  My
concern is that optimized kernels may be close to the edge as well.  Since
people have been running 10.0 for a while without issue, maybe my concern
is unfounded.  Does anybody have thoughts on how to instrument a kernel,
perhaps via a build option, to check the maximum used stack depth?  It
would be nice to prove that my concern is unfounded.

Larry

-- 
------------------------------------------------------------------------
Larry Baird
Global Technology Associates, Inc. 1992-2012 	| http://www.gta.com
Celebrating Twenty Years of Software Innovation | Orlando, FL
Email: lab@gta.com                 		| TEL 407-380-0220

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 14:33:54 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 30C5D75A;
 Thu,  2 Oct 2014 14:33:54 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id C7BA1BBE;
 Thu,  2 Oct 2014 14:33:53 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s92EXlaF037366
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Thu, 2 Oct 2014 17:33:47 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s92EXlaF037366
Received: (from kostik@localhost)
 by tom.home (8.14.9/8.14.9/Submit) id s92EXjoP037365;
 Thu, 2 Oct 2014 17:33:45 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Thu, 2 Oct 2014 17:33:45 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Larry Baird <lab@gta.com>
Subject: Re: Kernel/Compiler bug
Message-ID: <20141002143345.GY26076@kib.kiev.ua>
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
 <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua>
 <20141002140232.GA52387@gta.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141002140232.GA52387@gta.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>, Dimitry Andric <dim@FreeBSD.org>,
 Bryan Drewery <bdrewery@FreeBSD.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 14:33:54 -0000

On Thu, Oct 02, 2014 at 10:02:32AM -0400, Larry Baird wrote:
> > > Is this something that can be bumped in the tree for GENERIC?
> > 
> > The cost of the increased size for kernel stack is significant, even
> > on architectures with ample KVA.  This must not be done just because 
> > some non-default kernel settings cause stack overflow.  If somebody
> > feels himself qualified enough to tune compiler options, it must
> > understand the consequences and do other required adjustments,
> > including kernel stack size tuning.
> > 
> > FWIW, there was old reason why -O0 did not worked for the kernel.
> > The cpufunc.h inlines are not provided in non-inline version, and
> > at least gcc at -O0 level sometimes generated the call to nonexisting
> > function, leading to linking failure.  It is curious that clang always
> > inlines at -O0, but it is possible, although unlikely, that kernel
> > source was changed to be immune.
> 
> Overall I agree with your comments.  The fact is that I have been using
> -O0 and -O1 on custom kernels for years.  It makes using kgdb much more
> effective.  Both optimization levels work for a custom kernel I have for
> FreeBSD 10.0 but do not work for FreeBSD 10.1.  I just tried turning off
> optimization for a FreeBSD 10.0 release GENERIC kernel.  Same issue.  My
> concern is that optimized kernels may be close to the edge as well.  Since
> people have been running 10.0 for a while without issue, maybe my concern
> is unfounded.  Does anybody have thoughts on how to instrument a kernel,
> perhaps via a build option, to check the maximum used stack depth?  It
> would be nice to prove that my concern is unfounded.

The easiest thing to do is to record the kernel-mode stack depth on
entry into an interrupt.  Interrupt handlers are usually well written
and do not consume a lot of stack.

Look at intr_event_handle(), which is the entry point.  The mode can
be deduced from the trapframe passed in.  The kernel stack for the thread
is described by td->td_kstack (base, i.e. bottom) and td->td_kstack_pages
(size in pages), so the top of the stack is at
td_kstack + td_kstack_pages * PAGE_SIZE [*].  The current stack
consumption can be derived from the %rsp register, or from the address
of any local variable.

* - there are the pcb and the usermode fpu save area at the top of the
stack, and the actual kernel stack top is right below the fpu save area.
This should not be important for your measurements, since you are looking
at how close %rsp gets to the bottom.
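
A minimal sketch of that bookkeeping, written as plain userland C so it
can be compiled outside the kernel.  Only td_kstack and td_kstack_pages
come from the description above; the struct, every sketch_-prefixed
name, and the 4096-byte page size are assumptions made for illustration
(kernel code would use struct thread and PAGE_SIZE, and would read the
stack pointer itself).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch; the kernel provides PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for the two struct thread fields named above. */
struct sketch_thread {
	uintptr_t td_kstack;		/* base (lowest address) of the stack */
	int	  td_kstack_pages;	/* stack size, in pages */
};

/* Hypothetical high-water mark: deepest stack usage recorded so far. */
static size_t sketch_max_used;

/* Bytes consumed: distance from the nominal stack top down to sp. */
static size_t
sketch_stack_used(const struct sketch_thread *td, uintptr_t sp)
{
	uintptr_t top = td->td_kstack +
	    (uintptr_t)td->td_kstack_pages * SKETCH_PAGE_SIZE;

	assert(sp >= td->td_kstack && sp <= top);
	return ((size_t)(top - sp));
}

/* Bytes still free: distance from sp down to the stack base. */
static size_t
sketch_stack_left(const struct sketch_thread *td, uintptr_t sp)
{
	assert(sp >= td->td_kstack);
	return ((size_t)(sp - td->td_kstack));
}

/* Record one sample, e.g. taken on interrupt entry from kernel mode. */
static void
sketch_record(const struct sketch_thread *td, uintptr_t sp)
{
	size_t used = sketch_stack_used(td, sp);

	if (used > sketch_max_used)
		sketch_max_used = used;
}
```

In a real kernel the sample would presumably be taken only when the
trapframe shows the interrupt arrived in kernel mode, and the nominal
top overstates usage slightly because of the pcb and fpu save area
noted above.  That offset is constant, so it does not affect how close
the stack pointer gets to the base.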

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 15:08:31 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2D40156E
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 15:08:31 +0000 (UTC)
Received: from exchange.glccom.com (exchange.glccom.com [209.152.99.146])
 (using TLSv1 with cipher AES128-SHA (128/128 bits))
 (Client CN "exchange.glccom.com",
 Issuer "Network Solutions DV Server CA" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id EEC87F73
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 15:08:30 +0000 (UTC)
Received: from karen-pc.local.glccom.com (192.168.10.71) by
 exchange.glccom.com (209.152.99.146) with Microsoft SMTP Server (TLS) id
 8.3.83.0; Thu, 2 Oct 2014 10:07:06 -0500
From: Paul Albrecht <palbrecht@glccom.com>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
Subject: freebsd 10 kqueue timer regression
Message-ID: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
Date: Thu, 2 Oct 2014 10:07:04 -0500
To: <freebsd-hackers@freebsd.org>
MIME-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
X-Mailer: Apple Mail (2.1878.6)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 15:08:31 -0000


Hi,

What's up with freebsd 10? I'm testing some code that uses the
kqueue timer for timing and it doesn't work because the precision of
the timer is off.

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 15:10:41 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 90DB0866
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 15:10:41 +0000 (UTC)
Received: from mail-ig0-x22b.google.com (mail-ig0-x22b.google.com
 [IPv6:2607:f8b0:4001:c05::22b])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 5F5B1B5
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 15:10:41 +0000 (UTC)
Received: by mail-ig0-f171.google.com with SMTP id h15so2175934igd.4
 for <freebsd-hackers@freebsd.org>; Thu, 02 Oct 2014 08:10:40 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:from:date:message-id
 :subject:to:content-type;
 bh=rQuiAGSD8ZoajlbtoQHJCQ69z3n47Z6lxvuNveFxn+Q=;
 b=YE1IMbTlSaKi4iSEfh6DNEipkNW+dAlCjAcbfL/RmcuSSYTx5jmfIws8Eby3Dmd2/0
 dKFuIoiTYm4yBCQYv2fxCFUawo1eTEmMVzp2I8ASIz6e6MOLtgP8/lpKKE3IV/Pt3Vii
 hOpwIBKsn64yAvNEXb6AA5guRaMhwxmlg1sataB2j0MLk4vnzZBmePP6NWgP11HSUA3H
 Y+W3NW6aVlL01M6eHoBbxlZJZFEHa9KAoTiI0jjfxSinNPnvOR4SLsA71GwhCccYRlvo
 gFDcs+REAvQUfy3a/Wd9lJLSXFy+uR/Po9d0lUvaqbwyBkGSSNKvpmgqWpTxrNaz7OQ3
 5zfA==
X-Received: by 10.42.233.75 with SMTP id jx11mr5990979icb.22.1412262640695;
 Thu, 02 Oct 2014 08:10:40 -0700 (PDT)
MIME-Version: 1.0
Sender: carpeddiem@gmail.com
Received: by 10.107.44.196 with HTTP; Thu, 2 Oct 2014 08:10:20 -0700 (PDT)
In-Reply-To: <20141002075537.GU26076@kib.kiev.ua>
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
 <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua>
From: Ed Maste <emaste@freebsd.org>
Date: Thu, 2 Oct 2014 11:10:20 -0400
X-Google-Sender-Auth: sW-PFH2ZYQAdAilBzlLv48p6vxY
Message-ID: <CAPyFy2CCViaUifsO-1xTZ8k22Y5SH5n2Hauo5nU5hyQdHVx=og@mail.gmail.com>
Subject: Re: Kernel/Compiler bug
To: Konstantin Belousov <kostikbel@gmail.com>, 
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 15:10:41 -0000

On 2 October 2014 03:55, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> The cost of the increased size for kernel stack is significant, even
> on architectures with ample KVA.  This must not be done just because
> some non-default kernel settings cause stack overflow.  If somebody
> feels himself qualified enough to tune compiler options, it must
> understand the consequences and do other required adjustments,
> including kernel stack size tuning.

I wonder if we should have a comment in kern.pre.mk, even if it's just
an explicit notice that changing -O can have adverse effects. For
better or worse it's a fairly common desire to try changing the
kernel's -O.

Of course, kern.pre.mk is not intended to accommodate user-facing
changes. I suspect it's reasonably common for developers to grep '-O2'
in sys/conf and discover where it's getting set though.

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 15:48:51 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id D5F38640
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 15:48:51 +0000 (UTC)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id B5E7E681
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 15:48:51 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.9/8.14.9) with ESMTP id s92Fmp8k068946
 for <freebsd-hackers@freebsd.org>; Thu, 2 Oct 2014 15:48:51 GMT
 (envelope-from bdrewery@freefall.freebsd.org)
Received: (from bdrewery@localhost)
 by freefall.freebsd.org (8.14.9/8.14.9/Submit) id s92FmpVo068945
 for freebsd-hackers@freebsd.org; Thu, 2 Oct 2014 15:48:51 GMT
 (envelope-from bdrewery)
Received: (qmail 5099 invoked from network); 2 Oct 2014 10:48:46 -0500
Received: from unknown (HELO ?10.10.0.24?) (freebsd@shatow.net@10.10.0.24)
 by sweb.xzibition.com with ESMTPA; 2 Oct 2014 10:48:46 -0500
Message-ID: <542D73D3.9040109@FreeBSD.org>
Date: Thu, 02 Oct 2014 10:48:35 -0500
From: Bryan Drewery <bdrewery@FreeBSD.org>
Organization: FreeBSD
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:31.0) Gecko/20100101 Thunderbird/31.1.2
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Subject: Re: Kernel/Compiler bug
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
 <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua>
 <20141002140232.GA52387@gta.com>
In-Reply-To: <20141002140232.GA52387@gta.com>
OpenPGP: id=6E4697CF;
	url=http://www.shatow.net/bryan/bryan2.asc
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="DlOo11nMxAEEGbmuWhmuxvn8p6c5Ufx60"
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 15:48:52 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--DlOo11nMxAEEGbmuWhmuxvn8p6c5Ufx60
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 10/2/2014 9:02 AM, Larry Baird wrote:
>>> Is this something that can be bumped in the tree for GENERIC?
>>
>> The cost of the increased size for kernel stack is significant, even
>> on architectures with ample KVA.  This must not be done just because=20
>> some non-default kernel settings cause stack overflow.  If somebody
>> feels himself qualified enough to tune compiler options, it must
>> understand the consequences and do other required adjustments,
>> including kernel stack size tuning.
>>
>> FWIW, there was old reason why -O0 did not worked for the kernel.
>> The cpufunc.h inlines are not provided in non-inline version, and
>> at least gcc at -O0 level sometimes generated the call to nonexisting
>> function, leading to linking failure.  It is curious that clang always=

>> inlines at -O0, but it is possible, although unlikely, that kernel
>> source was changed to be immune.
>=20
> Overall I aggree with  your comments.  The fact is that I have been usi=
ng
> -O0 and -O1 on custom kernels for years. It makes using kgdb much more
> effective.  Both optimization levels work for a custom kernel I have fo=
r
> FreeBSD 10.0 but do not work for FreeBSD 10.1. I just tried turning off=

> optimization for a FreeBSD 10.0 release GENERIC kernel. Same issue.  My=

> concern is that opimized kernels may be close to the edge as well.  Sin=
ce
> people have been runing 10.0 for a while without issue, maybe me concer=
n
> is unfounded.  Anybody have any thoughts on how to instrument a kernel
> build option to check for maximum used stack depth? It would be nice to=

> prove that my concern is unfounded.
>=20
> Larry
>=20

I think at the very least we should have a DISABLE_OPTIMIZATION option
that sets -O0 and increases stack size.

I built a kernel with -O0 some months ago and hit a panic on boot and
did not look into why. It makes sense now though. It would have been
nice if it were more obviously documented or automatically handled.

--=20
Regards,
Bryan Drewery


--DlOo11nMxAEEGbmuWhmuxvn8p6c5Ufx60
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)

iQEcBAEBAgAGBQJULXPXAAoJEDXXcbtuRpfPMOwH/1zk51rg4TtQO3hfH0fQiAp6
eb9JC98X/PFukhkUd3byag2juvuP4hZ1loy6VFn8vE5t0+CriH1QhZk8R67bxDpC
uY23gko7PnH+1YL6nsbw7PFJjqQIirtMklTnxecSgiURfKD8btoY+dH4EDjgjJsq
6ButnRgPo8Lz0K5H5JHyatBdUg3dhHk8O0k98HYgVtmcIGhioewW82XsB+2iWdNi
mKOBvtD1NSObRByn/4GLNP6VSOPKU6Zh+BdfRofuMTynSQwdRpT+PjwgznQyLNRE
cYz9UOCPvHQPa08kVlq6ssJSIH19vKOHIVbW4b/dZlF8kiQLlFr6VbILejYpaF0=
=wNoz
-----END PGP SIGNATURE-----

--DlOo11nMxAEEGbmuWhmuxvn8p6c5Ufx60--

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 16:38:07 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B2F11F03
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 16:38:07 +0000 (UTC)
Received: from exchange.glccom.com (exchange.glccom.com [209.152.99.146])
 (using TLSv1 with cipher AES128-SHA (128/128 bits))
 (Client CN "exchange.glccom.com",
 Issuer "Network Solutions DV Server CA" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 7F9F7D04
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 16:38:07 +0000 (UTC)
Received: from karen-pc.local.glccom.com (192.168.10.71) by
 exchange.glccom.com (209.152.99.146) with Microsoft SMTP Server (TLS) id
 8.3.83.0; Thu, 2 Oct 2014 11:35:59 -0500
From: Paul Albrecht <palbrecht@glccom.com>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
Subject: What happened to the kqueue timer fix?
Message-ID: <80825686-58E8-4042-96C8-B86818F1E138@glccom.com>
Date: Thu, 2 Oct 2014 11:35:58 -0500
To: <freebsd-hackers@freebsd.org>
MIME-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
X-Mailer: Apple Mail (2.1878.6)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 16:38:07 -0000


I asked about this problem a while back and got a fix. Here's a link to
the relevant freebsd-hackers list thread:
http://lists.freebsd.org/pipermail/freebsd-hackers/2012-July/039907.html

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 17:18:45 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 88C7D250
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 17:18:45 +0000 (UTC)
Received: from mail-wg0-x22e.google.com (mail-wg0-x22e.google.com
 [IPv6:2a00:1450:400c:c00::22e])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 23009230
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 17:18:44 +0000 (UTC)
Received: by mail-wg0-f46.google.com with SMTP id l18so929145wgh.29
 for <freebsd-hackers@freebsd.org>; Thu, 02 Oct 2014 10:18:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type:content-transfer-encoding;
 bh=A0aeVpkE/JUSi89V/LNMJlYXGf8tdWFzjoGLZE/wUcI=;
 b=x15e9Av+4tij0skVDuYDcGRAD6AQSnhdYEHMnN5bCZlJmiTdB4Hw+CIAMiSv47MSQa
 hZ/9mNWU7RZiu+jdYlHC3Xt7/nqXEjPWahGJt/Gt/YXxLA1yCW8VgWzlLLK2o4HQ4D7M
 4bXJ46sNz+DNbXHJ0nYUzNxXGwBIEgnJJ25qpjU7j0Ew8Hp/jD6AaAduWcQWKS2p/7e5
 FfKHmpneCwHrAZ9xGZQhzihJ5atbs8lWDkX6ygNq37+6Oj6aDwKDO5sZOIOaPLEH8eJX
 o4LezUWNMXUffJLojvvxxR/bObo9CnjlGgF/KI3NaewsXYjEdcEDE/R/kv3xDwd83g1m
 EiOA==
MIME-Version: 1.0
X-Received: by 10.194.177.226 with SMTP id ct2mr443250wjc.20.1412270323410;
 Thu, 02 Oct 2014 10:18:43 -0700 (PDT)
Sender: adrian.chadd@gmail.com
Received: by 10.216.106.136 with HTTP; Thu, 2 Oct 2014 10:18:43 -0700 (PDT)
In-Reply-To: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
Date: Thu, 2 Oct 2014 10:18:43 -0700
X-Google-Sender-Auth: -KkcerpE5_vTv5dqcJl_P5KClsM
Message-ID: <CAJ-VmonJQKWeW7K6+jY6=FpmZrm+6HQOuBmhhjJEapyVpwNFdQ@mail.gmail.com>
Subject: Re: freebsd 10 kqueue timer regression
From: Adrian Chadd <adrian@freebsd.org>
To: Paul Albrecht <palbrecht@glccom.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 17:18:45 -0000

On 2 October 2014 08:07, Paul Albrecht <palbrecht@glccom.com> wrote:
>
> Hi,
>
> What's up with freebsd 10? I'm testing some code that uses the
> kqueue timer for timing and it doesn't work because the precision of
> the timer is off.

Can you provide a test case for it?

I just chased down one of those recently; maybe it's the same thing
(callout() API changes.)


-a

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 17:28:50 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C2AED922;
 Thu,  2 Oct 2014 17:28:50 +0000 (UTC)
Received: from mho-01-ewr.mailhop.org (mho-03-ewr.mailhop.org [204.13.248.66])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 952283A5;
 Thu,  2 Oct 2014 17:28:49 +0000 (UTC)
Received: from [73.34.117.227] (helo=ilsoft.org)
 by mho-01-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
 (Exim 4.72) (envelope-from <ian@FreeBSD.org>)
 id 1XZkBG-000COj-Ly; Thu, 02 Oct 2014 17:28:42 +0000
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by ilsoft.org (8.14.9/8.14.9) with ESMTP id s92HSfks019546;
 Thu, 2 Oct 2014 11:28:41 -0600 (MDT) (envelope-from ian@FreeBSD.org)
X-Mail-Handler: Dyn Standard SMTP by Dyn
X-Originating-IP: 73.34.117.227
X-Report-Abuse-To: abuse@dyndns.com (see
 http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse
 reporting information)
X-MHO-User: U2FsdGVkX184q0nZxJ/ydENIXYKy40Pa
X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan
 [172.22.42.240] claimed to be [172.22.42.240]
Subject: Re: freebsd 10 kqueue timer regression
From: Ian Lepore <ian@FreeBSD.org>
To: Adrian Chadd <adrian@freebsd.org>
In-Reply-To: <CAJ-VmonJQKWeW7K6+jY6=FpmZrm+6HQOuBmhhjJEapyVpwNFdQ@mail.gmail.com>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <CAJ-VmonJQKWeW7K6+jY6=FpmZrm+6HQOuBmhhjJEapyVpwNFdQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-7"
Date: Thu, 02 Oct 2014 11:28:40 -0600
Message-ID: <1412270920.12052.3.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by ilsoft.org id
 s92HSfks019546
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Paul Albrecht <palbrecht@glccom.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 17:28:50 -0000

On Thu, 2014-10-02 at 10:18 -0700, Adrian Chadd wrote:
> On 2 October 2014 08:07, Paul Albrecht <palbrecht@glccom.com> wrote:
> >
> > Hi,
> >
> > What's up with freebsd 10? I'm testing some code that uses the
> > kqueue timer for timing and it doesn't work because the precision of
> > the timer is off.
>
> Can you provide a test case for it?
>
> I just chased down one of those recently; maybe it's the same thing
> (callout() API changes.)
>

The old mail thread he cited contains test code:

http://lists.freebsd.org/pipermail/freebsd-hackers/2012-July/039907.html

-- Ian


From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 17:51:58 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C0E872D4;
 Thu,  2 Oct 2014 17:51:58 +0000 (UTC)
Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "funkthat.com", Issuer "funkthat.com" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 95E168CD;
 Thu,  2 Oct 2014 17:51:58 +0000 (UTC)
Received: from h2.funkthat.com (localhost [127.0.0.1])
 by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s92HpuTd078739
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Thu, 2 Oct 2014 10:51:57 -0700 (PDT)
 (envelope-from jmg@h2.funkthat.com)
Received: (from jmg@localhost)
 by h2.funkthat.com (8.14.3/8.14.3/Submit) id s92HpuKv078738;
 Thu, 2 Oct 2014 10:51:56 -0700 (PDT) (envelope-from jmg)
Date: Thu, 2 Oct 2014 10:51:56 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Kernel/Compiler bug
Message-ID: <20141002175156.GM43300@funkthat.com>
Mail-Followup-To: Konstantin Belousov <kostikbel@gmail.com>,
 Larry Baird <lab@gta.com>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>, Dimitry Andric <dim@freebsd.org>,
 Bryan Drewery <bdrewery@freebsd.org>
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
 <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua>
 <20141002140232.GA52387@gta.com> <20141002143345.GY26076@kib.kiev.ua>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141002143345.GY26076@kib.kiev.ua>
User-Agent: Mutt/1.4.2.3i
X-Operating-System: FreeBSD 7.2-RELEASE i386
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88  9322 9CB1 8F74 6D3F A396
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
 (h2.funkthat.com [127.0.0.1]); Thu, 02 Oct 2014 10:51:57 -0700 (PDT)
Cc: Dimitry Andric <dim@freebsd.org>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>, Larry Baird <lab@gta.com>,
 Bryan Drewery <bdrewery@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 17:51:59 -0000

Konstantin Belousov wrote this message on Thu, Oct 02, 2014 at 17:33 +0300:
> On Thu, Oct 02, 2014 at 10:02:32AM -0400, Larry Baird wrote:
> > > > Is this something that can be bumped in the tree for GENERIC?
> > > 
> > > The cost of the increased size for kernel stack is significant, even
> > > on architectures with ample KVA.  This must not be done just because 
> > > some non-default kernel settings cause stack overflow.  If somebody
> > > feels qualified enough to tune compiler options, they must
> > > understand the consequences and make the other required adjustments,
> > > including kernel stack size tuning.
> > > 
> > > FWIW, there was an old reason why -O0 did not work for the kernel.
> > > The cpufunc.h inlines are not provided in a non-inline version, and
> > > at least gcc at -O0 sometimes generated a call to a nonexistent
> > > function, leading to a link failure.  It is curious that clang always
> > > inlines at -O0, but it is possible, although unlikely, that the kernel
> > > source was changed to be immune.
> > 
> > Overall I agree with your comments.  The fact is that I have been using
> > -O0 and -O1 on custom kernels for years. It makes using kgdb much more
> > effective.  Both optimization levels work for a custom kernel I have for
> > FreeBSD 10.0 but do not work for FreeBSD 10.1. I just tried turning off
> > optimization for a FreeBSD 10.0 release GENERIC kernel. Same issue.  My
> > concern is that optimized kernels may be close to the edge as well.  Since
> > people have been running 10.0 for a while without issue, maybe my concern
> > is unfounded.  Does anybody have any thoughts on how to instrument a kernel
> > build option to check for maximum used stack depth? It would be nice to
> > prove that my concern is unfounded.
> 
> The easiest thing to do is to record the stack depth for kernel mode
> on entry into interrupt.  Interrupt handlers are usually well written
> and do not consume a lot of stack.
> 
> Look at intr_event_handle(), which is the entry point. The mode can
> be deduced from the trapframe passed. The kernel stack for the thread is
> described by td->td_kstack (base, i.e. bottom) and td->td_kstack_pages
> (size in pages), so the top of the stack is at
> td_kstack + td_kstack_pages * PAGE_SIZE [*].
> The current stack consumption could be taken from reading the %rsp
> register, or you may take the address of any local variable as well.
> 
> * - there is a pcb and a usermode fpu save area at the top of the stack,
> and the actual kernel stack top is right below the fpu save area.  This
> should not be important for your measurements, since you are looking at
> how close the %rsp gets to the bottom.

There once was a script that would print out stack usage for each
function in the kernel...  This could help identify functions that
use too much stack...  I poked around in tools, but couldn't find it..

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 18:13:52 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id AA0A8491;
 Thu,  2 Oct 2014 18:13:52 +0000 (UTC)
Received: from exchange.glccom.com (exchange.glccom.com [209.152.99.146])
 (using TLSv1 with cipher AES128-SHA (128/128 bits))
 (Client CN "exchange.glccom.com",
 Issuer "Network Solutions DV Server CA" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 5FB02B6B;
 Thu,  2 Oct 2014 18:13:51 +0000 (UTC)
Received: from karen-pc.local.glccom.com (192.168.10.71) by
 exchange.glccom.com (209.152.99.146) with Microsoft SMTP Server (TLS) id
 8.3.83.0; Thu, 2 Oct 2014 13:13:38 -0500
MIME-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
Subject: Re: freebsd 10 kqueue timer regression
From: Paul Albrecht <palbrecht@glccom.com>
In-Reply-To: <CAJ-VmonJQKWeW7K6+jY6=FpmZrm+6HQOuBmhhjJEapyVpwNFdQ@mail.gmail.com>
Date: Thu, 2 Oct 2014 13:13:36 -0500
Message-ID: <8587D819-AA2F-4387-A4E9-523014384672@glccom.com>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <CAJ-VmonJQKWeW7K6+jY6=FpmZrm+6HQOuBmhhjJEapyVpwNFdQ@mail.gmail.com>
To: Adrian Chadd <adrian@freebsd.org>
X-Mailer: Apple Mail (2.1878.6)
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 18:13:52 -0000


On Oct 2, 2014, at 12:18 PM, Adrian Chadd <adrian@freebsd.org> wrote:

> On 2 October 2014 08:07, Paul Albrecht <palbrecht@glccom.com> wrote:
>>
>> Hi,
>>
>> What's up with freebsd 10? I'm testing some code that uses the kqueue
>> timer for timing and it doesn't work because the precision of the
>> timer is off.
>
> Can you provide a test case for it?

Here's the code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

int
main(void)
{
        int i,msec;
        int kq,nev;
        struct kevent inqueue;
        struct kevent outqueue;
        struct timeval start,end;

        if ((kq = kqueue()) == -1) {
                fprintf(stderr, "kqueue error!? errno = %s",
                    strerror(errno));
                exit(EXIT_FAILURE);
        }
        EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);

        gettimeofday(&start, 0);
        for (i = 0; i < 50; i++) {
                if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1,
                    NULL)) == -1) {
                        fprintf(stderr, "kevent error!? errno = %s",
                            strerror(errno));
                        exit(EXIT_FAILURE);
                } else if (outqueue.flags & EV_ERROR) {
                        fprintf(stderr, "EV_ERROR: %s\n",
                            strerror(outqueue.data));
                        exit(EXIT_FAILURE);
                }
        }
        gettimeofday(&end, 0);

        msec = ((end.tv_sec - start.tv_sec) * 1000) +
            (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);

        printf("msec = %d\n", msec);

        close(kq);
        return EXIT_SUCCESS;
}

When I run it on my system I get these results:

./a.out
msec = 1072
./a.out
msec = 1071
./a.out
msec = 1071

Each run overshoots the expected 1000 ms by about 70 ms, which is roughly
3.5 extra 20 ms timer periods per second.


>
> I just chased down one of those recently; maybe it's the same thing
> (callout() API changes.)
>
>
> -a


From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 19:42:41 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 1AA02A3
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 19:42:41 +0000 (UTC)
Received: from mail-wg0-x22e.google.com (mail-wg0-x22e.google.com
 [IPv6:2a00:1450:400c:c00::22e])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id A93AF827
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 19:42:40 +0000 (UTC)
Received: by mail-wg0-f46.google.com with SMTP id l18so1245952wgh.29
 for <freebsd-hackers@freebsd.org>; Thu, 02 Oct 2014 12:42:39 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type:content-transfer-encoding;
 bh=z8Gjy26b2CKg3+TWhoREzfBDFEGGVL+SUgvqozwiyhE=;
 b=RDhbVZNUovkijEBLo1LT4ZaKZjrr2DXYkBqvq0NDDrRu28f2fy9U66bs+56LGNJYdZ
 prtdhPh9Jd7/UTLeBMj9NQsaA2mzpEUFHmE8es21Q4hBJSOFzHMVZaeYi/T8Y28HdGCQ
 NyjjlzJiEGVASJaiyN2F7f/ysXElwcvb9gY08ZwbaFnZlt6Jnk2m+Nt7ZevL044tsbD8
 c6xp8Bg2bgUybOrHxeRoVAnEgOlJlKAuLa4Go07SHvWm6Y5nbhbc0mwBN0mOB6wJjLud
 KyQhZTJRAMZvk88Wi76GGjbyVZFdvPnNbntdlAauXJpHllZCs1Uw/2vVvJFBR61+dV+w
 sjiA==
MIME-Version: 1.0
X-Received: by 10.194.202.138 with SMTP id ki10mr1331291wjc.68.1412278959001; 
 Thu, 02 Oct 2014 12:42:39 -0700 (PDT)
Sender: adrian.chadd@gmail.com
Received: by 10.216.106.136 with HTTP; Thu, 2 Oct 2014 12:42:38 -0700 (PDT)
In-Reply-To: <8587D819-AA2F-4387-A4E9-523014384672@glccom.com>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <CAJ-VmonJQKWeW7K6+jY6=FpmZrm+6HQOuBmhhjJEapyVpwNFdQ@mail.gmail.com>
 <8587D819-AA2F-4387-A4E9-523014384672@glccom.com>
Date: Thu, 2 Oct 2014 12:42:38 -0700
X-Google-Sender-Auth: 0bIwtvTWfQ-Olm2GjOTW7wmlS5Y
Message-ID: <CAJ-VmomKFd_oTXp1bVhU22MgHG6U1V7mr6iwWmoyUGKpSqPy1Q@mail.gmail.com>
Subject: Re: freebsd 10 kqueue timer regression
From: Adrian Chadd <adrian@freebsd.org>
To: Paul Albrecht <palbrecht@glccom.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 19:42:41 -0000

Right, and jhb@ mentioned callout() and related stuff.

Let me take a look. I bet it's not doing things "right".


-a

On 2 October 2014 11:13, Paul Albrecht <palbrecht@glccom.com> wrote:
>
> On Oct 2, 2014, at 12:18 PM, Adrian Chadd <adrian@freebsd.org> wrote:
>
> On 2 October 2014 08:07, Paul Albrecht <palbrecht@glccom.com> wrote:
>
>
> Hi,
>
> What's up with freebsd 10? I'm testing some code that uses the kqueue timer
> for timing and it doesn't work because the precision of the timer is off.
>
>
> Can you provide a test case for it?
>
>
> Here's the code:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <errno.h>
> #include <sys/types.h>
> #include <sys/event.h>
> #include <sys/time.h>
>
> int
> main(void)
> {
>         int i,msec;
>         int kq,nev;
>         struct kevent inqueue;
>         struct kevent outqueue;
>         struct timeval start,end;
>
>         if ((kq = kqueue()) == -1) {
>                 fprintf(stderr, "kqueue error!? errno = %s",
>                     strerror(errno));
>                 exit(EXIT_FAILURE);
>         }
>         EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
>
>         gettimeofday(&start, 0);
>         for (i = 0; i < 50; i++) {
>                 if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1,
>                     NULL)) == -1) {
>                         fprintf(stderr, "kevent error!? errno = %s",
>                             strerror(errno));
>                         exit(EXIT_FAILURE);
>                 } else if (outqueue.flags & EV_ERROR) {
>                         fprintf(stderr, "EV_ERROR: %s\n",
>                             strerror(outqueue.data));
>                         exit(EXIT_FAILURE);
>                 }
>         }
>         gettimeofday(&end, 0);
>
>         msec = ((end.tv_sec - start.tv_sec) * 1000) +
>             (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);
>
>         printf("msec = %d\n", msec);
>
>         close(kq);
>         return EXIT_SUCCESS;
> }
>
> When I run it on my system I get these results:
>
> ./a.out
> msec = 1072
> ./a.out
> msec = 1071
> ./a.out
> msec = 1071
>
> Which is over about 3.5 times the wait time per second.
>
>
>
> I just chased down one of those recently; maybe it's the same thing
> (callout() API changes.)
>
>
> -a
>
>

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 19:47:30 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 157E621A
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 19:47:30 +0000 (UTC)
Received: from mail-wg0-x22c.google.com (mail-wg0-x22c.google.com
 [IPv6:2a00:1450:400c:c00::22c])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id A40FB86D
 for <freebsd-hackers@freebsd.org>; Thu,  2 Oct 2014 19:47:29 +0000 (UTC)
Received: by mail-wg0-f44.google.com with SMTP id y10so4072329wgg.3
 for <freebsd-hackers@freebsd.org>; Thu, 02 Oct 2014 12:47:28 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=9jDOrJzOz5mQQdPz44lMLmbvyRzG05ua41sSHeW4nN4=;
 b=eHgBvQQI6mcmt5/TlHDxeE59lTWbnvTvN6L66zWWbx8v8nWu/vLjw7s69OGZXQ7H8l
 NWI7frkjrW1o5pKM4f6p1hmvMD0NuJxEJOAdQdBuNZNLNsrCY54AmdR/d9xQ6cOejmv6
 PiOaxTjuRhSMMee+ebWa2Or8OhtiJZKYvctU2TjLGiY+nJpQNC/1ADkM5YkvGMFlIjY7
 gn4HlD+sHyJez57oWQkdcr8GAGkwZPDIR0QIj5ghWDN5tMAtWzW9tkY47ReQoriXWm1i
 i4REj4mIYJ3+Hwk8aUy1N1pTPQiU/4giQcCuzYQeviYQvkkJii+jxR+p89gos2ev/IeT
 u3Gw==
MIME-Version: 1.0
X-Received: by 10.180.74.203 with SMTP id w11mr6953561wiv.26.1412279248013;
 Thu, 02 Oct 2014 12:47:28 -0700 (PDT)
Sender: adrian.chadd@gmail.com
Received: by 10.216.106.136 with HTTP; Thu, 2 Oct 2014 12:47:27 -0700 (PDT)
In-Reply-To: <CAJ-VmomKFd_oTXp1bVhU22MgHG6U1V7mr6iwWmoyUGKpSqPy1Q@mail.gmail.com>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <CAJ-VmonJQKWeW7K6+jY6=FpmZrm+6HQOuBmhhjJEapyVpwNFdQ@mail.gmail.com>
 <8587D819-AA2F-4387-A4E9-523014384672@glccom.com>
 <CAJ-VmomKFd_oTXp1bVhU22MgHG6U1V7mr6iwWmoyUGKpSqPy1Q@mail.gmail.com>
Date: Thu, 2 Oct 2014 12:47:27 -0700
X-Google-Sender-Auth: fUnUrAZBtFheyBVcIIbks7rIbtQ
Message-ID: <CAJ-VmokPNgckHiR0znp6p4u2NRO0aOR_eaOVaBWe7cWDp2_o5g@mail.gmail.com>
Subject: Re: freebsd 10 kqueue timer regression
From: Adrian Chadd <adrian@freebsd.org>
To: Paul Albrecht <palbrecht@glccom.com>
Content-Type: text/plain; charset=UTF-8
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 19:47:30 -0000

I'm confused; it's doing 50 loops of a 20msec timer, right? So that's 1000ms.


-a

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 19:53:36 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3F65A49B;
 Thu,  2 Oct 2014 19:53:36 +0000 (UTC)
Received: from mho-02-ewr.mailhop.org (mho-02-ewr.mailhop.org [204.13.248.72])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 11181945;
 Thu,  2 Oct 2014 19:53:35 +0000 (UTC)
Received: from [73.34.117.227] (helo=ilsoft.org)
 by mho-02-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
 (Exim 4.72) (envelope-from <ian@FreeBSD.org>)
 id 1XZmRN-000IkX-G5; Thu, 02 Oct 2014 19:53:29 +0000
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by ilsoft.org (8.14.9/8.14.9) with ESMTP id s92JrSJY019832;
 Thu, 2 Oct 2014 13:53:28 -0600 (MDT) (envelope-from ian@FreeBSD.org)
X-Mail-Handler: Dyn Standard SMTP by Dyn
X-Originating-IP: 73.34.117.227
X-Report-Abuse-To: abuse@dyndns.com (see
 http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse
 reporting information)
X-MHO-User: U2FsdGVkX19t8LOfVzClEo9w+8sK1wFp
X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan
 [172.22.42.240] claimed to be [172.22.42.240]
Subject: Re: freebsd 10 kqueue timer regression
From: Ian Lepore <ian@FreeBSD.org>
To: Adrian Chadd <adrian@freebsd.org>
In-Reply-To: <CAJ-VmokPNgckHiR0znp6p4u2NRO0aOR_eaOVaBWe7cWDp2_o5g@mail.gmail.com>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <CAJ-VmonJQKWeW7K6+jY6=FpmZrm+6HQOuBmhhjJEapyVpwNFdQ@mail.gmail.com>
 <8587D819-AA2F-4387-A4E9-523014384672@glccom.com>
 <CAJ-VmomKFd_oTXp1bVhU22MgHG6U1V7mr6iwWmoyUGKpSqPy1Q@mail.gmail.com>
 <CAJ-VmokPNgckHiR0znp6p4u2NRO0aOR_eaOVaBWe7cWDp2_o5g@mail.gmail.com>
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 02 Oct 2014 13:53:28 -0600
Message-ID: <1412279608.12052.24.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Paul Albrecht <palbrecht@glccom.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 19:53:36 -0000

On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
> I'm confused; it's doing 50 loops of a 20msec timer, right? So that's 1000ms.

Yes, so the entire loop should take 1000ms maybe + 1ms.  Instead it
takes 1070.  When I run it on an armv6 system running -current it takes
1050.  When I run it on my 8.4 desktop (pre-eventtimers) it takes 1013.

-- Ian


From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 21:21:35 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B5B55C2;
 Thu,  2 Oct 2014 21:21:35 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 8E24D35C;
 Thu,  2 Oct 2014 21:21:35 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 2870BB91F;
 Thu,  2 Oct 2014 17:21:34 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Subject: Re: freebsd 10 kqueue timer regression
Date: Thu, 2 Oct 2014 16:00:17 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <CAJ-VmokPNgckHiR0znp6p4u2NRO0aOR_eaOVaBWe7cWDp2_o5g@mail.gmail.com>
 <1412279608.12052.24.camel@revolution.hippie.lan>
In-Reply-To: <1412279608.12052.24.camel@revolution.hippie.lan>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201410021600.17740.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Thu, 02 Oct 2014 17:21:34 -0400 (EDT)
Cc: Adrian Chadd <adrian@freebsd.org>, Paul Albrecht <palbrecht@glccom.com>,
 Ian Lepore <ian@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 21:21:35 -0000

On Thursday, October 02, 2014 3:53:28 pm Ian Lepore wrote:
> On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
> > I'm confused; it's doing 50 loops of a 20msec timer, right? So that's 1000ms.
> 
> Yes, so the entire loop should take 1000ms maybe + 1ms.  Instead it
> takes 1070.  When I run it on an armv6 system running -current it takes
> 1050.  When I run it on my 8.4 desktop (pre-eventtimers) it takes 1013.
> 
> -- Ian

What if you set kern.eventtimer.periodic=1?

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct  2 22:15:10 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A4222E06;
 Thu,  2 Oct 2014 22:15:10 +0000 (UTC)
Received: from mho-01-ewr.mailhop.org (mho-03-ewr.mailhop.org [204.13.248.66])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 75DCCAB5;
 Thu,  2 Oct 2014 22:15:09 +0000 (UTC)
Received: from [73.34.117.227] (helo=ilsoft.org)
 by mho-01-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
 (Exim 4.72) (envelope-from <ian@FreeBSD.org>)
 id 1XZoeR-0004W8-SO; Thu, 02 Oct 2014 22:15:08 +0000
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by ilsoft.org (8.14.9/8.14.9) with ESMTP id s92MF634020039;
 Thu, 2 Oct 2014 16:15:06 -0600 (MDT) (envelope-from ian@FreeBSD.org)
X-Mail-Handler: Dyn Standard SMTP by Dyn
X-Originating-IP: 73.34.117.227
X-Report-Abuse-To: abuse@dyndns.com (see
 http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse
 reporting information)
X-MHO-User: U2FsdGVkX192pQr6xDvLmqfkPHo4BYGb
X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan
 [172.22.42.240] claimed to be [172.22.42.240]
Subject: Re: freebsd 10 kqueue timer regression
From: Ian Lepore <ian@FreeBSD.org>
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <201410021600.17740.jhb@freebsd.org>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <CAJ-VmokPNgckHiR0znp6p4u2NRO0aOR_eaOVaBWe7cWDp2_o5g@mail.gmail.com>
 <1412279608.12052.24.camel@revolution.hippie.lan>
 <201410021600.17740.jhb@freebsd.org>
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 02 Oct 2014 16:15:06 -0600
Message-ID: <1412288106.12052.39.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org, Adrian Chadd <adrian@freebsd.org>,
 Paul Albrecht <palbrecht@glccom.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Oct 2014 22:15:10 -0000

On Thu, 2014-10-02 at 16:00 -0400, John Baldwin wrote:
> On Thursday, October 02, 2014 3:53:28 pm Ian Lepore wrote:
> > On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
> > > I'm confused; it's doing 50 loops of a 20msec timer, right? So that's 1000ms.
> > 
> > Yes, so the entire loop should take 1000ms maybe + 1ms.  Instead it
> > takes 1070.  When I run it on an armv6 system running -current it takes
> > 1050.  When I run it on my 8.4 desktop (pre-eventtimers) it takes 1013.
> > 
> > -- Ian
> 
> What if you set kern.eventtimer.periodic=1?
> 

Some interesting results...

           HZ   100   500    1000
---------------------------------
periodic=0     1050  1050    1080
periodic=1     1110  1012    1049


The 1080 number was +/- 3ms, all the other numbers were +/- 1ms (except
for one outlier of 24363 at 100Hz non-periodic which I'm going to
pretend didn't happen).

The 1050 numbers are probably each 20ms sleep actually taking 21ms, but
the old tvtohz code with -1 adjustments from the old email thread isn't
in play anymore.  I don't know how to account for the other numbers at
all.  There's all kinds of stuff I don't understand in the new code
involving tick thresholds and such.

-- Ian


From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct  3 00:49:53 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 88E6BEBF;
 Fri,  3 Oct 2014 00:49:53 +0000 (UTC)
Received: from mho-01-ewr.mailhop.org (mho-03-ewr.mailhop.org [204.13.248.66])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 45670AE2;
 Fri,  3 Oct 2014 00:49:52 +0000 (UTC)
Received: from [73.34.117.227] (helo=ilsoft.org)
 by mho-01-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
 (Exim 4.72) (envelope-from <ian@FreeBSD.org>)
 id 1XZr4B-0007Lv-KM; Fri, 03 Oct 2014 00:49:51 +0000
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by ilsoft.org (8.14.9/8.14.9) with ESMTP id s930nofc020256;
 Thu, 2 Oct 2014 18:49:50 -0600 (MDT) (envelope-from ian@FreeBSD.org)
X-Mail-Handler: Dyn Standard SMTP by Dyn
X-Originating-IP: 73.34.117.227
X-Report-Abuse-To: abuse@dyndns.com (see
 http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse
 reporting information)
X-MHO-User: U2FsdGVkX19n/udeWeX6r3o/g/NlB5ed
X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan
 [172.22.42.240] claimed to be [172.22.42.240]
Subject: Re: freebsd 10 kqueue timer regression
From: Ian Lepore <ian@FreeBSD.org>
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <1412288106.12052.39.camel@revolution.hippie.lan>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <CAJ-VmokPNgckHiR0znp6p4u2NRO0aOR_eaOVaBWe7cWDp2_o5g@mail.gmail.com>
 <1412279608.12052.24.camel@revolution.hippie.lan>
 <201410021600.17740.jhb@freebsd.org>
 <1412288106.12052.39.camel@revolution.hippie.lan>
Content-Type: multipart/mixed; boundary="=-5ZH0QgKRoICHPsi7fRHH"
Date: Thu, 02 Oct 2014 18:49:49 -0600
Message-ID: <1412297389.12052.46.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Cc: freebsd-hackers@freebsd.org, Adrian Chadd <adrian@freebsd.org>,
 Paul Albrecht <palbrecht@glccom.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Oct 2014 00:49:53 -0000


--=-5ZH0QgKRoICHPsi7fRHH
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Thu, 2014-10-02 at 16:15 -0600, Ian Lepore wrote:
> On Thu, 2014-10-02 at 16:00 -0400, John Baldwin wrote:
> > On Thursday, October 02, 2014 3:53:28 pm Ian Lepore wrote:
> > > On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
> > > > I'm confused; it's doing 50 loops of a 20msec timer, right? So that's 1000ms.
> > > 
> > > Yes, so the entire loop should take 1000ms maybe + 1ms.  Instead it
> > > takes 1070.  When I run it on an armv6 system running -current it takes
> > > 1050.  When I run it on my 8.4 desktop (pre-eventtimers) it takes 1013.
> > > 
> > > -- Ian
> > 
> > What if you set kern.eventtimer.periodic=1?
> > 
> 
> Some interesting results...
> 
>            HZ   100   500    1000
> ---------------------------------
> periodic=0     1050  1050    1080
> periodic=1     1110  1012    1049
> 
> 
> The 1080 number was +/- 3ms, all the other numbers were +/- 1ms (except
> for one outlier of 24363 at 100Hz non-periodic which I'm going to
> pretend didn't happen).
> 
> The 1050 numbers are probably each 20ms sleep actually taking 21ms, but
> the old tvtohz code with -1 adjustments from the old email thread isn't
> in play anymore.  I don't know how to account for the other numbers at
> all.  There's all kinds of stuff I don't understand in the new code
> involving tick thresholds and such.
> 
> -- Ian
> 

The attached patch seems to fix the problem in what I think is the most
correct way: scheduling the callout with absolute times based on the
time the current event was scheduled for plus the requested interval.
The net effect should be metronomic events that do not drift (or phase
shift if you prefer) over time, regardless of any latency involved in
processing the events.

This makes all the numbers in the tests I ran above come out 1000.

It doesn't make me understand the strange results from the prior tests
any better.

-- Ian


--=-5ZH0QgKRoICHPsi7fRHH
Content-Disposition: inline; filename="kevent_timer_fix.diff"
Content-Type: text/x-patch; name="kevent_timer_fix.diff"; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Index: sys/sys/event.h
===================================================================
--- sys/sys/event.h	(revision 272181)
+++ sys/sys/event.h	(working copy)
@@ -221,6 +221,7 @@ struct knote {
 		struct		proc *p_proc;	/* proc pointer */
 		struct		aiocblist *p_aio;	/* AIO job pointer */
 		struct		aioliojob *p_lio;	/* LIO job pointer */
+		sbintime_t	*p_nexttime;	/* next timer event fires at */
 		void		*p_v;		/* generic other pointer */
 	} kn_ptr;
 	struct			filterops *kn_fop;
Index: sys/kern/kern_event.c
===================================================================
--- sys/kern/kern_event.c	(revision 272181)
+++ sys/kern/kern_event.c	(working copy)
@@ -569,9 +569,10 @@ filt_timerexpire(void *knx)
 
 	if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) {
 		calloutp = (struct callout *)kn->kn_hook;
-		callout_reset_sbt_on(calloutp,
-		    timer2sbintime(kn->kn_sdata, kn->kn_sfflags), 0,
-		    filt_timerexpire, kn, PCPU_GET(cpuid), 0);
+		*kn->kn_ptr.p_nexttime += timer2sbintime(kn->kn_sdata, 
+		    kn->kn_sfflags);
+		callout_reset_sbt_on(calloutp, *kn->kn_ptr.p_nexttime, 0,
+		    filt_timerexpire, kn, PCPU_GET(cpuid), C_ABSOLUTE);
 	}
 }
 
@@ -607,11 +608,13 @@ filt_timerattach(struct knote *kn)
 
 	kn->kn_flags |= EV_CLEAR;		/* automatically set */
 	kn->kn_status &= ~KN_DETACHED;		/* knlist_add clears it */
+	kn->kn_ptr.p_nexttime = malloc(sizeof(sbintime_t), M_KQUEUE, M_WAITOK);
 	calloutp = malloc(sizeof(*calloutp), M_KQUEUE, M_WAITOK);
 	callout_init(calloutp, CALLOUT_MPSAFE);
 	kn->kn_hook = calloutp;
-	callout_reset_sbt_on(calloutp, to, 0,
-	    filt_timerexpire, kn, PCPU_GET(cpuid), 0);
+	*kn->kn_ptr.p_nexttime = to + sbinuptime();
+	callout_reset_sbt_on(calloutp, *kn->kn_ptr.p_nexttime, 0,
+	    filt_timerexpire, kn, PCPU_GET(cpuid), C_ABSOLUTE);
 
 	return (0);
 }
@@ -625,6 +628,7 @@ filt_timerdetach(struct knote *kn)
 	calloutp = (struct callout *)kn->kn_hook;
 	callout_drain(calloutp);
 	free(calloutp, M_KQUEUE);
+	free(kn->kn_ptr.p_nexttime, M_KQUEUE);
 	old = atomic_fetch_sub_explicit(&kq_ncallouts, 1, memory_order_relaxed);
 	KASSERT(old > 0, ("Number of callouts cannot become negative"));
 	kn->kn_status |= KN_DETACHED;	/* knlist_remove sets it */

--=-5ZH0QgKRoICHPsi7fRHH--


From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct  3 01:54:58 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id ED065E58
 for <freebsd-hackers@freebsd.org>; Fri,  3 Oct 2014 01:54:58 +0000 (UTC)
Received: from mailgate.gta.com (mailgate.gta.com [199.120.225.23])
 by mx1.freebsd.org (Postfix) with ESMTP id 9E7C411E
 for <freebsd-hackers@freebsd.org>; Fri,  3 Oct 2014 01:54:58 +0000 (UTC)
Received: (qmail 27664 invoked by uid 1000); 3 Oct 2014 01:54:56 -0000
Date: Thu, 2 Oct 2014 21:54:56 -0400
From: Larry Baird <lab@gta.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Kernel/Compiler bug
Message-ID: <20141003015456.GA27080@gta.com>
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
 <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua>
 <20141002140232.GA52387@gta.com>
 <20141002143345.GY26076@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141002143345.GY26076@kib.kiev.ua>
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>, Dimitry Andric <dim@FreeBSD.org>,
 Bryan Drewery <bdrewery@FreeBSD.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Oct 2014 01:54:59 -0000

> The easiest thing to do is to record the stack depth for kernel mode
> on entry into interrupt.  Interrupt handlers are usually well written
> and do not consume a lot of stack.
> 
> Look at the intr_event_handle(), which is the entry point. The mode can
> be deduced from trapframe passed. The kernel stack for the thread is
> described by td->td_kstack (base, i.e. bottom) and td->td_kstack_pages
> (size), so the top of the stack is at td_kstack + td_kstack_size [*].
> The current stack consumption could be taken from reading %rsp register,
> or you may take the address of any local variable as well.
> 
> * - there are pcb and usermode fpu save area at the top of the stack, and
> actual kernel stack top is right below fpu save area.  This should not
> be important for your measurements, since you are looking at how close
> the %rsp gets to the bottom.

This idea worked very well.  Booting a GENERIC 10.1-BETA3 kernel I get a
maximum stack used of 5247 bytes.  This was with a minimal VirtualBox
configuration.  It would be interesting to hear from users with more exotic
hardware and/or configurations.  Not sure if I have the KASSERT correct.


Index: kern_intr.c
===================================================================
--- kern_intr.c	(revision 44897)
+++ kern_intr.c	(working copy)
@@ -1386,6 +1386,12 @@
 	}
 }
 
+static int max_kern_thread_stack;
+
+SYSCTL_INT(_kern, OID_AUTO, max_kern_thread_stack, CTLFLAG_RD,
+    &max_kern_thread_stack, 0,
+    "Maximum stack used by a kernel thread");
+
 /*
  * Main interrupt handling body.
  *
@@ -1407,6 +1413,22 @@
 
 	td = curthread;
 
+	/*
+	 * Check for maximum stack used by a kernel thread.
+	 */
+	if (!TRAPF_USERMODE(frame)) {
+	    char *top = (char *)(td->td_kstack + td->td_kstack_pages *
+		PAGE_SIZE - 1);
+	    char *current = (char *)&ih;
+	    int used = top - current;
+
+	    if (used > max_kern_thread_stack) {
+		max_kern_thread_stack = used;
+		KASSERT(max_kern_thread_stack < KSTACK_PAGES * PAGE_SIZE,
+		    ("Maximum kernel thread stack exceeded"));
+	    }
+	}
+
 	/* An interrupt with no event or handlers is a stray interrupt. */
 	if (ie == NULL || TAILQ_EMPTY(&ie->ie_handlers))
 		return (EINVAL);

-- 
------------------------------------------------------------------------
Larry Baird
Global Technology Associates, Inc. 1992-2012 	| http://www.gta.com
Celebrating Twenty Years of Software Innovation | Orlando, FL
Email: lab@gta.com                 		| TEL 407-380-0220

From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct  3 07:35:23 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 6DA0A14C;
 Fri,  3 Oct 2014 07:35:23 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id EC4B5263;
 Fri,  3 Oct 2014 07:35:22 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s937ZH5w065146
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Fri, 3 Oct 2014 10:35:17 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s937ZH5w065146
Received: (from kostik@localhost)
 by tom.home (8.14.9/8.14.9/Submit) id s937ZHWC065143;
 Fri, 3 Oct 2014 10:35:17 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Fri, 3 Oct 2014 10:35:17 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Larry Baird <lab@gta.com>
Subject: Re: Kernel/Compiler bug
Message-ID: <20141003073517.GC26076@kib.kiev.ua>
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
 <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua>
 <20141002140232.GA52387@gta.com>
 <20141002143345.GY26076@kib.kiev.ua>
 <20141003015456.GA27080@gta.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141003015456.GA27080@gta.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>, Dimitry Andric <dim@FreeBSD.org>,
 Bryan Drewery <bdrewery@FreeBSD.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Oct 2014 07:35:23 -0000

On Thu, Oct 02, 2014 at 09:54:56PM -0400, Larry Baird wrote:
> > The easiest thing to do is to record the stack depth for kernel mode
> > on entry into interrupt.  Interrupt handlers are usually well written
> > and do not consume a lot of stack.
> > 
> > Look at the intr_event_handle(), which is the entry point. The mode can
> > be deduced from trapframe passed. The kernel stack for the thread is
> > described by td->td_kstack (base, i.e. bottom) and td->td_kstack_pages
> > (size), so the top of the stack is at td_kstack + td_kstack_size [*].
> > The current stack consumption could be taken from reading %rsp register,
> > or you may take the address of any local variable as well.
> > 
> > * - there are pcb and usermode fpu save area at the top of the stack, and
> > actual kernel stack top is right below fpu save area.  This should not
> > be important for your measurements, since you are looking at how close
> > the %rsp gets to the bottom.
> 
> This idea worked very well.  Booting a GENERIC 10.1-BETA3 kernel I get a
> maximum stack used of 5247 bytes. This was with a minimal virtual box
> configuration. It would be interesting to hear about users with more exotic
> hardware and or configurations. Not sure if I have the KASSERT correct.
> 
I have several notes, mostly driven by my desire to make the patch
committable.

A global one is that the profiling of the stack use should be hidden
behind a kernel config option.
> 
> 
> Index: kern_intr.c
> ===================================================================
> --- kern_intr.c	(revision 44897)
> +++ kern_intr.c	(working copy)
> @@ -1386,6 +1386,12 @@
>  	}
>  }
>  
> +static int max_kern_thread_stack;
Add 'usage' somewhere in the name of the var and sysctl ?

> +
> +SYSCTL_INT(_kern, OID_AUTO, max_kern_thread_stack, CTLFLAG_RD,
> +    &max_kern_thread_stack, 0,
> +    "Maxiumum stack used by a kernel thread");
> +
>  /*
>   * Main interrupt handling body.
>   *
> @@ -1407,6 +1413,22 @@
>  
>  	td = curthread;
>  
> +	/*
> +	 * Check for maximum stack used bya kernel thread.
> +	 */
> +	if (!TRAPF_USERMODE(frame)) {
Just a note: the test for an interrupt from usermode is not strictly
needed; it only optimizes the execution, since an interrupt from usermode
would have only the trap frame on the stack.  Perhaps this should be
noted in a comment.

> +	    char *top = (char *)(td->td_kstack + td->td_kstack_pages *
> +		PAGE_SIZE - 1);
> +	    char *current = (char *)&ih;
Use the address of top?  It should be deeper in the stack, and so account
for the current function's normal stack use, assuming the compiler did not
flatten out the frame.  Anyway, this assumes that the stack grows down.

Also, there are situations where the hardware might switch to a
dedicated stack for interrupt handling.  That is impossible right now
for hw interrupts on amd64, but it might be used in the future, or on
other arches.  It makes sense to check that the current value falls into
the td stack region before using it.
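
A minimal userland sketch of that range check follows.  The helper name
and its signature are hypothetical; in the kernel the base would be
td->td_kstack and the size td->td_kstack_pages * PAGE_SIZE, as described
earlier in the thread.  It assumes a stack that grows down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes of stack in use for a downward-growing stack
 * occupying [base, base + size), or -1 if sp does not point into that
 * region (for example, because the handler runs on a dedicated stack).
 */
long
kstack_used(uintptr_t base, size_t size, uintptr_t sp)
{
	uintptr_t top = base + size;	/* one past the highest stack byte */

	assert(size > 0);
	if (sp < base || sp > top)
		return (-1);		/* not on this thread's stack */
	return ((long)(top - sp));
}
```

A caller would simply skip the bookkeeping whenever the helper returns -1.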

> +	    int used = top - current;
> +
> +	    if (used > max_kern_thread_stack) {
> +		max_kern_thread_stack = used;
This should be a loop with atomic cas, so as not to lose an update from
another thread.
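
Such a loop could look roughly like the sketch below.  It is written with
C11 <stdatomic.h> so that it can be tried in userland; the variable and
function names are made up for illustration, and kernel code might use
the atomic(9) primitives instead.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical high-water mark shared by all threads. */
static atomic_int max_stack_used;

/* Raise max_stack_used to `used` if `used` is larger. */
void
record_stack_used(int used)
{
	int old;

	assert(used >= 0);
	old = atomic_load_explicit(&max_stack_used, memory_order_relaxed);
	/*
	 * On failure the compare-exchange stores the current value into
	 * `old`, so each retry re-checks whether our sample is still larger.
	 */
	while (used > old &&
	    !atomic_compare_exchange_weak_explicit(&max_stack_used, &old, used,
	    memory_order_relaxed, memory_order_relaxed))
		;
}

/* Read the current high-water mark. */
int
get_max_stack_used(void)
{
	return (atomic_load_explicit(&max_stack_used, memory_order_relaxed));
}
```

Relaxed ordering is used here on the assumption that the value is only a
statistic and does not order any other memory accesses.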

> +		KASSERT(max_kern_thread_stack < KSTACK_PAGES * PAGE_SIZE,
> +		    "Maximum kernel thread stack exxceeded");
The assert is not needed; we put a guard page below the thread stack to
catch the overflow.  You have already seen it with the double fault on x86.

> +	    }
> +	}
> +
>  	/* An interrupt with no event or handlers is a stray interrupt. */
>  	if (ie == NULL || TAILQ_EMPTY(&ie->ie_handlers))
>  		return (EINVAL);

From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct  3 14:04:17 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id BE9A8AB9;
 Fri,  3 Oct 2014 14:04:17 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 94E33118;
 Fri,  3 Oct 2014 14:04:17 +0000 (UTC)
Received: from ralph.baldwin.cx (pool-173-70-85-31.nwrknj.fios.verizon.net
 [173.70.85.31])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 7E2BCB9B0;
 Fri,  3 Oct 2014 10:04:16 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Ian Lepore <ian@freebsd.org>
Subject: Re: freebsd 10 kqueue timer regression
Date: Fri, 03 Oct 2014 08:50:12 -0400
Message-ID: <2499075.KMdpQjyIZI@ralph.baldwin.cx>
User-Agent: KMail/4.12.5 (FreeBSD/10.1-BETA2; KDE/4.12.5; amd64; ; )
In-Reply-To: <1412297389.12052.46.camel@revolution.hippie.lan>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <1412288106.12052.39.camel@revolution.hippie.lan>
 <1412297389.12052.46.camel@revolution.hippie.lan>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Fri, 03 Oct 2014 10:04:16 -0400 (EDT)
Cc: freebsd-hackers@freebsd.org, Adrian Chadd <adrian@freebsd.org>,
 Paul Albrecht <palbrecht@glccom.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Oct 2014 14:04:17 -0000

On Thursday, October 02, 2014 06:49:49 PM Ian Lepore wrote:
> On Thu, 2014-10-02 at 16:15 -0600, Ian Lepore wrote:
> > On Thu, 2014-10-02 at 16:00 -0400, John Baldwin wrote:
> > > On Thursday, October 02, 2014 3:53:28 pm Ian Lepore wrote:
> > > > On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
> > > > > I'm confused; it's doing 50 loops of a 20msec timer, right? So
> > > > > that's
> > > 
> > > 1000ms.
> > > 
> > > > Yes, so the entire loop should take 1000ms maybe + 1ms.  Instead it
> > > > takes 1070.  When I run it on an armv6 system running -current it
> > > > takes
> > > > 1050.  When I run it on my 8.4 desktop (pre-eventtimers) it takes
> > > > 1013.
> > > > 
> > > > -- Ian
> > > 
> > > What if you set kern.eventtimer.periodic=1?
> > 
> > Some interesting results...
> > 
> >            HZ   100   500    1000
> > 
> > ---------------------------------
> > periodic=0     1050  1050    1080
> > periodic=1     1110  1012    1049
> > 
> > 
> > The 1080 number was +/- 3ms, all the other numbers were +/- 1ms (except
> > for one outlier of 24363 at 100Hz non-periodic which I'm going to
> > pretend didn't happen).
> > 
> > The 1050 numbers are probably each 20ms sleep actually taking 21ms, but
> > the old tvtohz code with -1 adjustments from the old email thread isn't
> > in play anymore.  I don't know how to account for the other numbers at
> > all.  There's all kinds of stuff I don't understand in the new code
> > involving tick thresholds and such.
> > 
> > -- Ian
> 
> The attached patch seems to fix the problem in what I think is the most
> correct way: scheduling the callout with absolute times based on the
> time the current event was scheduled for plus the requested interval.
> The net effect should be metronomic events that do not drift (or phase
> shift if you prefer) over time, regardless of any latency involved in
> processing the events.
> 
> This makes all the numbers in the tests I ran above come out 1000.
> 
> It doesn't make me understand the strange results from the prior tests
> any better.
> 
> -- Ian

Are you running ntpd or ptpd?  If so, perhaps try the original tests without 
the patch.

That said, I think one of the reasons the old code worked was that the 
previous callout had the equivalent of the C_HARDCLOCK flag set.  Thus, when 
the timer interrupt fired and we rescheduled for N ticks, it was actually N 
ticks - <time since the timer interrupt that triggered this callout> (if that 
makes sense?).  Now I think what is happening is that the 
callout_reset_sbt() is doing 'delta time t + <time since the timer interrupt>' 
and that that accounts for your numbers.  In that case, I wonder if the reason 
you are seeing such large gaps is that you have Cx states enabled (including 
C1E) and that those are making that second factor large enough to account for 
the skew.  Try setting machdep.idle=spin as a test (this is a better test than 
ntpd/ptpd I think).

The reason I don't like using C_ABSOLUTE is that in theory that will result in 
longer or shorter intervals if time is adjusted via ntp/ptp, whereas 
historically EVFILT_TIMER has been CLOCK_UPTIME based instead of 
CLOCK_REALTIME.

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct  3 14:12:54 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2B9FAFCA
 for <freebsd-hackers@freebsd.org>; Fri,  3 Oct 2014 14:12:54 +0000 (UTC)
Received: from mailgate.gta.com (mailgate.gta.com [199.120.225.23])
 by mx1.freebsd.org (Postfix) with ESMTP id CD269210
 for <freebsd-hackers@freebsd.org>; Fri,  3 Oct 2014 14:12:53 +0000 (UTC)
Received: (qmail 69966 invoked by uid 1000); 3 Oct 2014 14:12:47 -0000
Date: Fri, 3 Oct 2014 10:12:47 -0400
From: Larry Baird <lab@gta.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Kernel/Compiler bug
Message-ID: <20141003141247.GA68109@gta.com>
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
 <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua>
 <20141002140232.GA52387@gta.com>
 <20141002143345.GY26076@kib.kiev.ua>
 <20141003015456.GA27080@gta.com>
 <20141003073517.GC26076@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141003073517.GC26076@kib.kiev.ua>
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>, Dimitry Andric <dim@FreeBSD.org>,
 Bryan Drewery <bdrewery@FreeBSD.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Oct 2014 14:12:54 -0000

On Fri, Oct 03, 2014 at 10:35:17AM +0300, Konstantin Belousov wrote:
> On Thu, Oct 02, 2014 at 09:54:56PM -0400, Larry Baird wrote:
> > > The easiest thing to do is to record the stack depth for kernel mode
> > > on entry into interrupt.  Interrupt handlers are usually well written
> > > and do not consume a lot of stack.
> > > 
> > > Look at the intr_event_handle(), which is the entry point. The mode can
> > > be deduced from trapframe passed. The kernel stack for the thread is
> > > described by td->td_kstack (base, i.e. bottom) and td->td_kstack_pages
> > > (size), so the top of the stack is at td_kstack + td_kstack_size [*].
> > > The current stack consumption could be taken from reading %rsp register,
> > > or you may take the address of any local variable as well.
> > > 
> > > * - there are pcb and usermode fpu save area at the top of the stack, and
> > > actual kernel stack top is right below fpu save area.  This should not
> > > be important for your measurements, since you are looking at how close
> > > the %rsp gets to the bottom.
> > 
> > This idea worked very well.  Booting a GENERIC 10.1-BETA3 kernel I get a
> > maximum stack used of 5247 bytes. This was with a minimal virtual box
> > configuration. It would be interesting to hear about users with more exotic
> > hardware and or configurations. Not sure if I have the KASSERT correct.
> > 
> I have several notes.  Mostly, it comes from my desire to make the patch
> committable.
>
> A global one is that the profiling of the stack use should be hidden
> under some kernel config option.
My initial interest last night, when I whipped up the code, was to prove that
kernel threads have some headroom before 10.1 is released.  Now that I have
a better feeling about that, let's see about getting the code cleaned up.

Last night I briefly thought about a kernel config option.  My first thought
was DIAGNOSTIC, but DIAGNOSTIC brings in too many other bits.  How does
DEBUG_KERNEL_THREAD_STACK sound for the option?

> > 
> > Index: kern_intr.c
> > ===================================================================
> > --- kern_intr.c	(revision 44897)
> > +++ kern_intr.c	(working copy)
> > @@ -1386,6 +1386,12 @@
> >  	}
> >  }
> >  
> > +static int max_kern_thread_stack;
> Add 'usage' somewhere in the name of the var and sysctl ?
I was initially confused by this comment, but now I think I see what you are
asking.  How about changing the variable name to
"max_kernel_thread_stack_used"?

> > +
> > +SYSCTL_INT(_kern, OID_AUTO, max_kern_thread_stack, CTLFLAG_RD,
> > +    &max_kern_thread_stack, 0,
> > +    "Maxiumum stack used by a kernel thread");
> > +
> >  /*
> >   * Main interrupt handling body.
> >   *
> > @@ -1407,6 +1413,22 @@
> >  
> >  	td = curthread;
> >  
> > +	/*
> > +	 * Check for maximum stack used bya kernel thread.
> > +	 */
> > +	if (!TRAPF_USERMODE(frame)) {
> Just a note, the test for interruption of the usermode is not strictly
> needed, it only optimizes the execution, since interrupt from usermode
> would have only the trap frame on the stack.  Might be, this should
> be commented.
> 
> > +	    char *top = (char *)(td->td_kstack + td->td_kstack_pages *
> > +		PAGE_SIZE - 1);
> > +	    char *current = (char *)&ih;
> Use the address of top ?  It should be deeper in the stack, and account
> for the normal current function stack use, assuming compiler did not
> flatten out the frame.  Anyway, it assumes that the stack grows down.
I was thinking of it as the top of the memory used by the stack versus the
bottom of the stack.

> Also, there are some situations, where the hardware might switch to
> dedicated stack for interrupt handling.  It is impossible right now
> on amd64 and hw interrupts, but might become used in future, or on
> other arches.  It makes sense to check that current value falls into
> the td stack region, before using it.
> 
> > +	    int used = top - current;
> > +
> > +	    if (used > max_kern_thread_stack) {
> > +		max_kern_thread_stack = used;
> This should be a loop with atomic cas, to not loose update from other
> thread.
I briefly thought about using an atomic update, but it was getting late and I
wanted to get something out the door.  I'll update the logic.

> > +		KASSERT(max_kern_thread_stack < KSTACK_PAGES * PAGE_SIZE,
> > +		    "Maximum kernel thread stack exxceeded");
> Assert is not needed, we put a guard page below the thread stack to
> catch the overflow.  You have seen it already with double fault on x86.
The double fault didn't really show where the issue was.  I was hoping an
assert would slap a developer upside their head when their code exceeded the
stack. (-:

> > +	    }
> > +	}
> > +
> >  	/* An interrupt with no event or handlers is a stray interrupt. */
> >  	if (ie == NULL || TAILQ_EMPTY(&ie->ie_handlers))
> >  		return (EINVAL);

I'll clean up the code over the weekend and repost.

Larry

-- 
------------------------------------------------------------------------
Larry Baird
Global Technology Associates, Inc. 1992-2012 	| http://www.gta.com
Celebrating Twenty Years of Software Innovation | Orlando, FL
Email: lab@gta.com                 		| TEL 407-380-0220

From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct  3 16:17:01 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 1A2F242A;
 Fri,  3 Oct 2014 16:17:01 +0000 (UTC)
Received: from mho-02-ewr.mailhop.org (mho-02-ewr.mailhop.org [204.13.248.72])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id CDE451F6;
 Fri,  3 Oct 2014 16:17:00 +0000 (UTC)
Received: from [73.34.117.227] (helo=ilsoft.org)
 by mho-02-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
 (Exim 4.72) (envelope-from <ian@FreeBSD.org>)
 id 1Xa5XO-000175-Ua; Fri, 03 Oct 2014 16:16:59 +0000
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by ilsoft.org (8.14.9/8.14.9) with ESMTP id s93GGv94021728;
 Fri, 3 Oct 2014 10:16:57 -0600 (MDT) (envelope-from ian@FreeBSD.org)
X-Mail-Handler: Dyn Standard SMTP by Dyn
X-Originating-IP: 73.34.117.227
X-Report-Abuse-To: abuse@dyndns.com (see
 http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse
 reporting information)
X-MHO-User: U2FsdGVkX194pVtioz1Omup/2aJae0vI
X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan
 [172.22.42.240] claimed to be [172.22.42.240]
Subject: Re: freebsd 10 kqueue timer regression
From: Ian Lepore <ian@FreeBSD.org>
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <2499075.KMdpQjyIZI@ralph.baldwin.cx>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <1412288106.12052.39.camel@revolution.hippie.lan>
 <1412297389.12052.46.camel@revolution.hippie.lan>
 <2499075.KMdpQjyIZI@ralph.baldwin.cx>
Content-Type: text/plain; charset="us-ascii"
Date: Fri, 03 Oct 2014 10:16:56 -0600
Message-ID: <1412353016.12052.82.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org, Adrian Chadd <adrian@freebsd.org>,
 Paul Albrecht <palbrecht@glccom.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Oct 2014 16:17:01 -0000

On Fri, 2014-10-03 at 08:50 -0400, John Baldwin wrote:
> On Thursday, October 02, 2014 06:49:49 PM Ian Lepore wrote:
> > On Thu, 2014-10-02 at 16:15 -0600, Ian Lepore wrote:
> > > On Thu, 2014-10-02 at 16:00 -0400, John Baldwin wrote:
> > > > On Thursday, October 02, 2014 3:53:28 pm Ian Lepore wrote:
> > > > > On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
> > > > > > I'm confused; it's doing 50 loops of a 20msec timer, right? So
> > > > > > that's
> > > > 
> > > > 1000ms.
> > > > 
> > > > > Yes, so the entire loop should take 1000ms maybe + 1ms.  Instead it
> > > > > takes 1070.  When I run it on an armv6 system running -current it
> > > > > takes
> > > > > 1050.  When I run it on my 8.4 desktop (pre-eventtimers) it takes
> > > > > 1013.
> > > > > 
> > > > > -- Ian
> > > > 
> > > > What if you set kern.eventtimer.periodic=1?
> > > 
> > > Some interesting results...
> > > 
> > >            HZ   100   500    1000
> > > 
> > > ---------------------------------
> > > periodic=0     1050  1050    1080
> > > periodic=1     1110  1012    1049
> > > 
> > > 
> > > The 1080 number was +/- 3ms, all the other numbers were +/- 1ms (except
> > > for one outlier of 24363 at 100Hz non-periodic which I'm going to
> > > pretend didn't happen).
> > > 
> > > The 1050 numbers are probably each 20ms sleep actually taking 21ms, but
> > > the old tvtohz code with -1 adjustments from the old email thread isn't
> > > in play anymore.  I don't know how to account for the other numbers at
> > > all.  There's all kinds of stuff I don't understand in the new code
> > > involving tick thresholds and such.
> > > 
> > > -- Ian
> > 
> > The attached patch seems to fix the problem in what I think is the most
> > correct way: scheduling the callout with absolute times based on the
> > time the current event was scheduled for plus the requested interval.
> > The net effect should be metronomic events that do not drift (or phase
> > shift if you prefer) over time, regardless of any latency involved in
> > processing the events.
> > 
> > This makes all the numbers in the tests I ran above come out 1000.
> > 
> > It doesn't make me understand the strange results from the prior tests
> > any better.
> > 
> > -- Ian
> 
> Are you running ntpd or ptpd?  If so, perhaps try the original tests without 
> the patch.
> 

Yep, ntpd is running, and I was repeatedly running the test right after
rebooting to change kern.hz.  I realized that's the explanation for the
24-second outlier (which didn't take 24 seconds in reality) -- the
hardware I'm testing on has no RTC, and ntpd must have stepped the clock
24 seconds during the 1 second test loop.

I think ntpd probably explains another anomaly I saw in more testing
after that post last night.  I tested with a 20uS period instead of 20mS
and I kept seeing the loop take ~999810uS.  Initially I thought 190ppm
of clock error was a bit much to explain away easily, then I realized
it's quite likely that ntpd is steering hard to get on-time.  (The
normal drift rate for the timecounter clock on the system is ~18ppm.)
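
For reference, the 190ppm figure can be reproduced with the small sketch
below.  The helper name is hypothetical, and the nominal loop time of
1000000uS is an assumption inferred from the numbers above rather than
something stated explicitly.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: rate error in parts per million.  A positive
 * result means the measured interval came out shorter than expected.
 */
int64_t
rate_error_ppm(int64_t expected_us, int64_t measured_us)
{
	assert(expected_us > 0);
	return ((expected_us - measured_us) * 1000000 / expected_us);
}
```

Calling rate_error_ppm(1000000, 999810) gives 190.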

Taking ntpd out of the picture will remove artifacts from its clock
steering, but really then what the test program loop is doing is
measuring a pair of unsteered clocks against each other.  Depending on
the hardware, those clocks may derive from a common oscillator or not
(in which case they are most assuredly keeping time at slightly
different rates, for such is the nature of clocks).

With the hardware I'm testing on, I think I can arrange to have
eventtimers and timecounters both running from the same underlying
oscillator, then we should be truly measuring only the software
behavior.

> That said, I think one of the reasons the old code worked was that the 
> previous callout had the equivalent of the C_HARDCLOCK flag set.  Thus, when 
> the timer interrupt fired and we rescheduled for N ticks, it was actually N
> ticks - <time since the timer interrupt that triggered this callout> (if that
> makes sense?).  Now I think what is happening is that the
> callout_reset_sbt() is doing 'delta time t + <time since the timer interrupt>' 
> and that that accounts for your numbers.  

I'm lost with this paragraph.  I don't know what you mean by "old
code" (pre-eventtimer?  pre-my-changes?).

I do see stuff in the current code that, under some conditions, mixes in
data involving the prior timecounter tick and also stuff involving
precision values, but I just can't make sense of what it's doing.

> In that case, I wonder if the reason 
> you are seeing such large gaps is that you have Cx states enabled (including 
> C1E) and that those are making that second factor large enough to account for 
> the skew.  Try setting machdep.idle=spin as a test (this is a better test than 
> ntpd/ptpd I think).
> 

I'm testing on ARM hardware with no power management enabled other than
the idle loop executing a wait-for-interrupt instruction that does not
stop the eventtimer and timecounter clocks, and the latency for coming
out of that state is on the order of 1uS.

> The reason I don't like using C_ABSOLUTE is that in theory that will result in 
> longer or shorter intervals if time is adjusted via ntp/ptp, whereas 
> historically EVFILT_TIMER has been CLOCK_UPTIME based instead of 
> CLOCK_REALTIME.
> 

I don't see anything in the code that implies C_ABSOLUTE uses a
different clock.  In fact, it's even documented as being absolute time
since boot.  In callout_reset_sbt_on() if the absolute flag is set the
sbt you pass in is used as-is.  If the flag is not set, the sbt you pass
in is added to either the current uptime or the last tc tick uptime.

For getting a truly periodic timer event, this absolute time technique
is required.  Adding a delta to something that isn't the time the prior
event was *scheduled* to occur will result in a phase shift in the event
occurrences over time (phase relative to uptime top-of-second).
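
The difference can be modelled in a few lines of userland C.  This is
only a sketch under stated assumptions: the function names are
hypothetical, times are plain microsecond counters, and the latency is
a fixed per-event delay rather than anything measured from the callout
code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Relative rearm: each event is rescheduled at (time the handler ran)
 * plus the period, so the handling latency is added to every interval.
 * Returns the time, in microseconds, at which the last handler runs.
 */
int64_t
elapsed_relative(int count, int64_t period_us, int64_t latency_us)
{
	int64_t now = 0;
	int i;

	assert(count >= 0 && period_us > 0 && latency_us >= 0);
	for (i = 0; i < count; i++)
		now = now + period_us + latency_us;	/* rearm from "now" */
	return (now);
}

/*
 * Absolute rearm: each event is rescheduled at (time it was scheduled
 * for) plus the period.  Each handler still runs late by the latency,
 * but the lateness never accumulates into the next deadline.
 */
int64_t
elapsed_absolute(int count, int64_t period_us, int64_t latency_us)
{
	int64_t scheduled = 0, now = 0;
	int i;

	assert(count >= 0 && period_us > 0 && latency_us >= 0);
	for (i = 0; i < count; i++) {
		scheduled += period_us;		/* deadline has no latency term */
		now = scheduled + latency_us;	/* handler runs slightly late */
	}
	return (now);
}
```

With 50 events, a 20000 us period and 1000 us of latency, the relative
model returns 1050000 and the absolute model returns 1001000.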

-- Ian


From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct  3 16:46:15 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2E4E492D
 for <freebsd-hackers@freebsd.org>; Fri,  3 Oct 2014 16:46:15 +0000 (UTC)
Received: from mailgate.gta.com (mailgate.gta.com [199.120.225.23])
 by mx1.freebsd.org (Postfix) with ESMTP id B48326ED
 for <freebsd-hackers@freebsd.org>; Fri,  3 Oct 2014 16:46:14 +0000 (UTC)
Received: (qmail 79250 invoked by uid 1000); 3 Oct 2014 16:46:13 -0000
Date: Fri, 3 Oct 2014 12:46:13 -0400
From: Larry Baird <lab@gta.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Kernel/Compiler bug
Message-ID: <20141003164613.GA78971@gta.com>
References: <20141001031553.GA14360@gta.com>
 <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
 <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua>
 <20141002140232.GA52387@gta.com>
 <20141002143345.GY26076@kib.kiev.ua>
 <20141003015456.GA27080@gta.com>
 <20141003073517.GC26076@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141003073517.GC26076@kib.kiev.ua>
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>, Dimitry Andric <dim@FreeBSD.org>,
 Bryan Drewery <bdrewery@FreeBSD.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Oct 2014 16:46:15 -0000

On Fri, Oct 03, 2014 at 10:35:17AM +0300, Konstantin Belousov wrote:
> I have several notes.  Mostly, it comes from my desire to make the patch
> committable.
I went ahead and made the changes over lunch.  Hopefully the patch below
addresses all of the issues. Are you able to shepherd this into head or
should I open a PR?

Index: kern_intr.c
===================================================================
--- kern_intr.c	(revision 44897)
+++ kern_intr.c	(working copy)
@@ -1386,6 +1386,14 @@
 	}
 }
 
+#ifdef DEBUG_KERNEL_THREAD_STACK
+static int max_kern_thread_stack_used;
+
+SYSCTL_INT(_kern, OID_AUTO, max_kern_thread_stack_used, CTLFLAG_RD,
+    &max_kern_thread_stack_used, 0,
+    "Maximum stack depth used by a kernel thread");
+#endif /* DEBUG_KERNEL_THREAD_STACK */
+
 /*
  * Main interrupt handling body.
  *
@@ -1407,6 +1415,40 @@
 
 	td = curthread;
 
+#ifdef DEBUG_KERNEL_THREAD_STACK
+	/*
+	 * Track maximum stack used by a kernel thread.
+	 *
+	 * Testing for kernel thread isn't strictly needed. It optimizes the
+	 * execution, since interrupts from usermode will have only the trap
+	 * frame on the stack.
+	 */
+	char *bottom_of_stack;
+	char *current;
+	int used;
+
+	if (!TRAPF_USERMODE(frame)) {
+		bottom_of_stack = (char *)(td->td_kstack + td->td_kstack_pages *
+		    PAGE_SIZE - 1);
+		current = (char *)&ih;
+
+		/*
+		 * Try to detect if interrupt is using kernel thread stack.
+		 * Hardware could use a dedicated stack for interrupt handling.
+		 */
+		if (bottom_of_stack > current &&
+		    current > (char *)(td->td_kstack - PAGE_SIZE)) {
+			used = bottom_of_stack - current;
+
+			while (atomic_load_acq_int(&max_kern_thread_stack_used)
+			    < used) {
+				atomic_store_rel_int(&max_kern_thread_stack_used,
+				    used);
+			}
+		}
+	}
+#endif /* DEBUG_KERNEL_THREAD_STACK */
+
 	/* An interrupt with no event or handlers is a stray interrupt. */
 	if (ie == NULL || TAILQ_EMPTY(&ie->ie_handlers))
 		return (EINVAL);

-- 
------------------------------------------------------------------------
Larry Baird
Global Technology Associates, Inc. 1992-2012 	| http://www.gta.com
Celebrating Twenty Years of Software Innovation | Orlando, FL
Email: lab@gta.com                 		| TEL 407-380-0220

From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct  3 17:46:05 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2008DF68;
 Fri,  3 Oct 2014 17:46:05 +0000 (UTC)
Received: from mail-wi0-x236.google.com (mail-wi0-x236.google.com
 [IPv6:2a00:1450:400c:c05::236])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 4DF60DB1;
 Fri,  3 Oct 2014 17:46:04 +0000 (UTC)
Received: by mail-wi0-f182.google.com with SMTP id n3so2867738wiv.9
 for <multiple recipients>; Fri, 03 Oct 2014 10:46:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=f6N5mVYeKcMB575ZygYSumWjMXlzTtDPCTU2vtQyugQ=;
 b=oj3r3twB+TpiX1NW4zmzKJTheHxe0oH0NIgmpnh69yD14MSHpdNdy0FEqGVmJdb328
 yDtwPuFHIBENPWD9C94NK5w26OyL5mU5nGhgKuuYqFovN+DyW/z1dufrIdQqf8UqB5pi
 n02WdgTTP4Ip8JLnXCYnjH2cieBZGpYX7WT6iOeD+bDf8oMg9ec4ejIRTuH9WMf3keB6
 0wZt/ijL3xafbaF6crOcrW7VlvTsXzRdUnyzNaw36cTj9uJTEjlbrA+HXVGNFzP7S6QC
 6Cwrgo6MDJ0nIJLKnZKP3Wzi+BCJGdgurl3tt9l6dkEOQddp5ifCUhxy2NeM8E6D9qYh
 igYQ==
MIME-Version: 1.0
X-Received: by 10.194.177.226 with SMTP id ct2mr9663735wjc.20.1412358362621;
 Fri, 03 Oct 2014 10:46:02 -0700 (PDT)
Sender: adrian.chadd@gmail.com
Received: by 10.216.106.136 with HTTP; Fri, 3 Oct 2014 10:46:02 -0700 (PDT)
In-Reply-To: <1412297389.12052.46.camel@revolution.hippie.lan>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <CAJ-VmokPNgckHiR0znp6p4u2NRO0aOR_eaOVaBWe7cWDp2_o5g@mail.gmail.com>
 <1412279608.12052.24.camel@revolution.hippie.lan>
 <201410021600.17740.jhb@freebsd.org>
 <1412288106.12052.39.camel@revolution.hippie.lan>
 <1412297389.12052.46.camel@revolution.hippie.lan>
Date: Fri, 3 Oct 2014 10:46:02 -0700
X-Google-Sender-Auth: vFmIDKZ133v5fmGxd_e9mBJH7vQ
Message-ID: <CAJ-Vmo=ApRriVD_tJ0Y6_p1RVkGtrinaHaxGX0Mb0GeXBGgKAQ@mail.gmail.com>
Subject: Re: freebsd 10 kqueue timer regression
From: Adrian Chadd <adrian@freebsd.org>
To: Ian Lepore <ian@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Paul Albrecht <palbrecht@glccom.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Oct 2014 17:46:05 -0000

Hi,

You could compile in KTR and get a KTR trace of the individual callouts firing.

(Or, dtrace, if you prefer and it's working on your platform.)

That'll tell you what's being queued at what time and what interval,
then when it's firing. It should then explain why the tests are
failing.

I'll see about doing that today - I have a -HEAD box I've been using
for testing timers/callouts on and it should still have the relevant
KTR bits enabled.


-a

On 2 October 2014 17:49, Ian Lepore <ian@freebsd.org> wrote:
> On Thu, 2014-10-02 at 16:15 -0600, Ian Lepore wrote:
>> On Thu, 2014-10-02 at 16:00 -0400, John Baldwin wrote:
>> > On Thursday, October 02, 2014 3:53:28 pm Ian Lepore wrote:
>> > > On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
>> > > > I'm confused; it's doing 50 loops of a 20msec timer, right? So that's
>> > 1000ms.
>> > >
>> > > Yes, so the entire loop should take 1000ms maybe + 1ms.  Instead it
>> > > takes 1070.  When I run it on an armv6 system running -current it takes
>> > > 1050.  When I run it on my 8.4 desktop (pre-eventtimers) it takes 1013.
>> > >
>> > > -- Ian
>> >
>> > What if you set kern.eventtimer.periodic=1?
>> >
>>
>> Some interesting results...
>>
>>            HZ   100   500    1000
>> ---------------------------------
>> periodic=0     1050  1050    1080
>> periodic=1     1110  1012    1049
>>
>>
>> The 1080 number was +/- 3ms, all the other numbers were +/- 1ms (except
>> for one outlier of 24363 at 100Hz non-periodic which I'm going to
>> pretend didn't happen).
>>
>> The 1050 numbers are probably each 20ms sleep actually taking 21ms, but
>> the old tvtohz code with -1 adjustments from the old email thread isn't
>> in play anymore.  I don't know how to account for the other numbers at
>> all.  There's all kinds of stuff I don't understand in the new code
>> involving tick thresholds and such.
>>
>> -- Ian
>>
>
> The attached patch seems to fix the problem in what I think is the most
> correct way: scheduling the callout with absolute times based on the
> time the current event was scheduled for plus the requested interval.
> The net effect should be metronomic events that do not drift (or phase
> shift if you prefer) over time, regardless of any latency involved in
> processing the events.
>
> This makes all the numbers in the tests I ran above come out 1000.
>
> It doesn't make me understand the strange results from the prior tests
> any better.
>
> -- Ian
>
>
> Index: sys/sys/event.h
> ===================================================================
> --- sys/sys/event.h     (revision 272181)
> +++ sys/sys/event.h     (working copy)
> @@ -221,6 +221,7 @@ struct knote {
>                 struct          proc *p_proc;   /* proc pointer */
>                 struct          aiocblist *p_aio;       /* AIO job pointer */
>                 struct          aioliojob *p_lio;       /* LIO job pointer */
> +               sbintime_t      *p_nexttime;    /* next timer event fires at */
>                 void            *p_v;           /* generic other pointer */
>         } kn_ptr;
>         struct                  filterops *kn_fop;
> Index: sys/kern/kern_event.c
> ===================================================================
> --- sys/kern/kern_event.c       (revision 272181)
> +++ sys/kern/kern_event.c       (working copy)
> @@ -569,9 +569,10 @@ filt_timerexpire(void *knx)
>
>         if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) {
>                 calloutp = (struct callout *)kn->kn_hook;
> -               callout_reset_sbt_on(calloutp,
> -                   timer2sbintime(kn->kn_sdata, kn->kn_sfflags), 0,
> -                   filt_timerexpire, kn, PCPU_GET(cpuid), 0);
> +               *kn->kn_ptr.p_nexttime += timer2sbintime(kn->kn_sdata,
> +                   kn->kn_sfflags);
> +               callout_reset_sbt_on(calloutp, *kn->kn_ptr.p_nexttime, 0,
> +                   filt_timerexpire, kn, PCPU_GET(cpuid), C_ABSOLUTE);
>         }
>  }
>
> @@ -607,11 +608,13 @@ filt_timerattach(struct knote *kn)
>
>         kn->kn_flags |= EV_CLEAR;               /* automatically set */
>         kn->kn_status &= ~KN_DETACHED;          /* knlist_add clears it */
> +       kn->kn_ptr.p_nexttime = malloc(sizeof(sbintime_t), M_KQUEUE, M_WAITOK);
>         calloutp = malloc(sizeof(*calloutp), M_KQUEUE, M_WAITOK);
>         callout_init(calloutp, CALLOUT_MPSAFE);
>         kn->kn_hook = calloutp;
> -       callout_reset_sbt_on(calloutp, to, 0,
> -           filt_timerexpire, kn, PCPU_GET(cpuid), 0);
> +       *kn->kn_ptr.p_nexttime = to + sbinuptime();
> +       callout_reset_sbt_on(calloutp, *kn->kn_ptr.p_nexttime, 0,
> +           filt_timerexpire, kn, PCPU_GET(cpuid), C_ABSOLUTE);
>
>         return (0);
>  }
> @@ -625,6 +628,7 @@ filt_timerdetach(struct knote *kn)
>         calloutp = (struct callout *)kn->kn_hook;
>         callout_drain(calloutp);
>         free(calloutp, M_KQUEUE);
> +       free(kn->kn_ptr.p_nexttime, M_KQUEUE);
>         old = atomic_fetch_sub_explicit(&kq_ncallouts, 1, memory_order_relaxed);
>         KASSERT(old > 0, ("Number of callouts cannot become negative"));
>         kn->kn_status |= KN_DETACHED;   /* knlist_remove sets it */
>

From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct  3 19:56:22 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 966FC305;
 Fri,  3 Oct 2014 19:56:22 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 1FB7BDB4;
 Fri,  3 Oct 2014 19:56:21 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s93JuFQM060556
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Fri, 3 Oct 2014 22:56:15 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s93JuFQM060556
Received: (from kostik@localhost)
 by tom.home (8.14.9/8.14.9/Submit) id s93JuFWI060555;
 Fri, 3 Oct 2014 22:56:15 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Fri, 3 Oct 2014 22:56:15 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Larry Baird <lab@gta.com>
Subject: Re: Kernel/Compiler bug
Message-ID: <20141003195615.GI26076@kib.kiev.ua>
References: <CAFMmRNxAYcr8eEY0SJsX3zkRadjT29-mfsGcSTmG_Yx-Hidi6w@mail.gmail.com>
 <20141001134044.GA57022@gta.com>
 <FBB9E4C3-55B9-4917-9953-F8BC9AE43619@FreeBSD.org>
 <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua>
 <20141002140232.GA52387@gta.com>
 <20141002143345.GY26076@kib.kiev.ua>
 <20141003015456.GA27080@gta.com>
 <20141003073517.GC26076@kib.kiev.ua>
 <20141003164613.GA78971@gta.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141003164613.GA78971@gta.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Ryan Stone <rysto32@gmail.com>, Dimitry Andric <dim@FreeBSD.org>,
 Bryan Drewery <bdrewery@FreeBSD.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Oct 2014 19:56:22 -0000

On Fri, Oct 03, 2014 at 12:46:13PM -0400, Larry Baird wrote:
> On Fri, Oct 03, 2014 at 10:35:17AM +0300, Konstantin Belousov wrote:
> > I have several notes.  Mostly, it comes from my desire to make the patch
> > committable.
> I went ahead and made the changes over lunch.  Hopefully the patch below
> addresses all of the issues. Are you able to shepherd this into head or
> should I open a PR?
> 
> Index: kern_intr.c
> ===================================================================
> --- kern_intr.c	(revision 44897)
> +++ kern_intr.c	(working copy)
> @@ -1386,6 +1386,14 @@
>  	}
>  }
>  
> +#ifdef DEBUG_KERNEL_THREAD_STACK
> +static int max_kern_thread_stack_used;
> +
> +SYSCTL_INT(_kern, OID_AUTO, max_kern_thread_stack_used, CTLFLAG_RD,
> +    &max_kern_thread_stack_used, 0,
> +    "Maximum stack depth used by a kernel thread");
> +#endif /* DEBUG_KERNEL_THREAD_STACK */
> +
>  /*
>   * Main interrupt handling body.
>   *
> @@ -1407,6 +1415,40 @@
>  
>  	td = curthread;
>  
> +#ifdef DEBUG_KERNEL_THREAD_STACK
> +	/*
> +	 * Track maximum stack used by a kernel thread.
> +	 *
> +	 * Testing for kernel thread isn't strictly needed. It optimizes the
> +	 * execution, since interrupts from usermode will have only the trap
> +	 * frame on the stack.
> +	 */
> +	char *bottom_of_stack;
> +	char *current;
> +	int used;
> +
> +	if (!TRAPF_USERMODE(frame)) {
> +		bottom_of_stack = (char *)(td->td_kstack + td->td_kstack_pages *
> +		    PAGE_SIZE - 1);
> +		current = (char *)&ih;
> +
> +		/*
> +		 * Try to detect if interrupt is using kernel thread stack.
> +		 * Hardware could use a dedicated stack for interrupt handling.
> +		 */
> +		if (bottom_of_stack > current &&
> +		    current > (char *)(td->td_kstack - PAGE_SIZE)) {
Why do you subtract PAGE_SIZE on the right side of the second condition?

> +			used = bottom_of_stack - current;
> +
> +			while (atomic_load_acq_int(&max_kern_thread_stack_used)
> +			    < used) {
> +				atomic_store_rel_int(&max_kern_thread_stack_used,
> +				    used);
> +			}
> +		}
> +	}
> +#endif /* DEBUG_KERNEL_THREAD_STACK */
> +
>  	/* An interrupt with no event or handlers is a stray interrupt. */
>  	if (ie == NULL || TAILQ_EMPTY(&ie->ie_handlers))
>  		return (EINVAL);
> 

I rewrote the patch to my liking.  Please test.
You should add
options KSTACK_USAGE_PROF
to your kernel config to get this activated.

I dislike the location of the intr_prof_stack_use() declaration, but
I was unable to find any other reasonable place for it.

commit 356881144088e850e725da2bb6f28dd52c4334b9
Author: Konstantin Belousov <kib@freebsd.org>
Date:   Fri Oct 3 22:52:48 2014 +0300

    KSTACK_USAGE_PROF

diff --git a/sys/conf/NOTES b/sys/conf/NOTES
index 5baa306..5cc146e 100644
--- a/sys/conf/NOTES
+++ b/sys/conf/NOTES
@@ -2958,6 +2958,7 @@ options 	SC_RENDER_DEBUG	# syscons rendering debugging
 options 	VFS_BIO_DEBUG	# VFS buffer I/O debugging
 
 options 	KSTACK_MAX_PAGES=32 # Maximum pages to give the kernel stack
+options 	KSTACK_USAGE_PROF
 
 # Adaptec Array Controller driver options
 options 	AAC_DEBUG	# Debugging levels:
diff --git a/sys/conf/options b/sys/conf/options
index 42113c3..8337521 100644
--- a/sys/conf/options
+++ b/sys/conf/options
@@ -136,6 +136,7 @@ KDTRACE_FRAME	opt_kdtrace.h
 KN_HASHSIZE	opt_kqueue.h
 KSTACK_MAX_PAGES
 KSTACK_PAGES
+KSTACK_USAGE_PROF
 KTRACE
 KTRACE_REQUEST_POOL	opt_ktrace.h
 LIBICONV
diff --git a/sys/kern/kern_intr.c b/sys/kern/kern_intr.c
index 6e9a4e8..d6de611 100644
--- a/sys/kern/kern_intr.c
+++ b/sys/kern/kern_intr.c
@@ -28,6 +28,7 @@
 __FBSDID("$FreeBSD$");
 
 #include "opt_ddb.h"
+#include "opt_kstack_usage_prof.h"
 
 #include <sys/param.h>
 #include <sys/bus.h>
@@ -1396,6 +1397,10 @@ intr_event_handle(struct intr_event *ie, struct trapframe *frame)
 
 	td = curthread;
 
+#ifdef KSTACK_USAGE_PROF
+	intr_prof_stack_use(td, frame);
+#endif
+
 	/* An interrupt with no event or handlers is a stray interrupt. */
 	if (ie == NULL || TAILQ_EMPTY(&ie->ie_handlers))
 		return (EINVAL);
diff --git a/sys/sys/systm.h b/sys/sys/systm.h
index 0f2732c..c484b7b 100644
--- a/sys/sys/systm.h
+++ b/sys/sys/systm.h
@@ -443,4 +443,6 @@ bitcount16(uint32_t x)
 	return (x);
 }
 
+void	intr_prof_stack_use(struct thread *td, struct trapframe *frame);
+
 #endif /* !_SYS_SYSTM_H_ */
diff --git a/sys/vm/vm_glue.c b/sys/vm/vm_glue.c
index 61c003b..c9ee890 100644
--- a/sys/vm/vm_glue.c
+++ b/sys/vm/vm_glue.c
@@ -62,6 +62,7 @@ __FBSDID("$FreeBSD$");
 #include "opt_vm.h"
 #include "opt_kstack_pages.h"
 #include "opt_kstack_max_pages.h"
+#include "opt_kstack_usage_prof.h"
 
 #include <sys/param.h>
 #include <sys/systm.h>
@@ -98,6 +99,8 @@ __FBSDID("$FreeBSD$");
 #include <vm/vm_pager.h>
 #include <vm/swap_pager.h>
 
+#include <machine/cpu.h>
+
 #ifndef NO_SWAPPING
 static int swapout(struct proc *);
 static void swapclear(struct proc *);
@@ -486,6 +489,52 @@ kstack_cache_init(void *nulll)
 
 SYSINIT(vm_kstacks, SI_SUB_KTHREAD_INIT, SI_ORDER_ANY, kstack_cache_init, NULL);
 
+#ifdef KSTACK_USAGE_PROF
+/*
+ * Track maximum stack used by a thread in kernel.
+ */
+static int max_kstack_used;
+
+SYSCTL_INT(_debug, OID_AUTO, max_kstack_used, CTLFLAG_RD,
+    &max_kstack_used, 0,
+    "Maximum stack depth used by a thread in kernel");
+
+void
+intr_prof_stack_use(struct thread *td, struct trapframe *frame)
+{
+	vm_offset_t stack_top;
+	vm_offset_t current;
+	int used, prev_used;
+
+	/*
+	 * Testing for interrupted kernel mode isn't strictly
+	 * needed. It optimizes the execution, since interrupts from
+	 * usermode will have only the trap frame on the stack.
+	 */
+	if (TRAPF_USERMODE(frame))
+		return;
+
+	stack_top = td->td_kstack + td->td_kstack_pages * PAGE_SIZE;
+	current = (vm_offset_t)(uintptr_t)&stack_top;
+
+	/*
+	 * Try to detect if interrupt is using kernel thread stack.
+	 * Hardware could use a dedicated stack for interrupt handling.
+	 */
+	if (stack_top <= current || current < td->td_kstack)
+		return;
+
+	used = stack_top - current;
+	for (;;) {
+		prev_used = max_kstack_used;
+		if (prev_used >= used)
+			break;
+		if (atomic_cmpset_int(&max_kstack_used, prev_used, used))
+			break;
+	}
+}
+#endif /* KSTACK_USAGE_PROF */
+
 #ifndef NO_SWAPPING
 /*
  * Allow a thread's kernel stack to be paged out.
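
For anyone who wants to experiment with the bounds check and the
compare-and-set loop from intr_prof_stack_use() outside the kernel, here
is a minimal userland sketch.  It assumes C11 <stdatomic.h> in place of
the kernel's atomic_cmpset_int(), and the names stack_bytes_used(),
record_max() and current_max() are hypothetical stand-ins, not kernel
interfaces.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static atomic_int max_used;	/* stand-in for max_kstack_used */

/*
 * Bytes in use on a downward-growing stack occupying [base, base + size),
 * or -1 if `current` lies outside that range (for example because the
 * hardware switched to a dedicated interrupt stack).
 */
int
stack_bytes_used(uintptr_t base, uintptr_t size, uintptr_t current)
{
	uintptr_t top = base + size;

	assert(size > 0);
	if (top <= current || current < base)
		return (-1);
	return ((int)(top - current));
}

/* Raise max_used to `used` unless a larger value is already stored. */
void
record_max(int used)
{
	int prev = atomic_load(&max_used);

	/* A failed exchange reloads prev with the current value; retry. */
	while (prev < used &&
	    !atomic_compare_exchange_weak(&max_used, &prev, used))
		;
}

int
current_max(void)
{
	return (atomic_load(&max_used));
}
```

The point of the compare-and-set is that a concurrent updater with a
larger value can never be overwritten by a smaller one, which a separate
load followed by an unconditional store cannot guarantee.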

From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct  3 20:36:13 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 97B06D78;
 Fri,  3 Oct 2014 20:36:13 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 7058E223;
 Fri,  3 Oct 2014 20:36:13 +0000 (UTC)
Received: from ralph.baldwin.cx (pool-173-70-85-31.nwrknj.fios.verizon.net
 [173.70.85.31])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id EA3BAB995;
 Fri,  3 Oct 2014 16:36:11 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Ian Lepore <ian@freebsd.org>
Subject: Re: freebsd 10 kqueue timer regression
Date: Fri, 03 Oct 2014 16:35:39 -0400
Message-ID: <3432803.0HkSRzvHj6@ralph.baldwin.cx>
User-Agent: KMail/4.12.5 (FreeBSD/10.1-BETA2; KDE/4.12.5; amd64; ; )
In-Reply-To: <1412353016.12052.82.camel@revolution.hippie.lan>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <2499075.KMdpQjyIZI@ralph.baldwin.cx>
 <1412353016.12052.82.camel@revolution.hippie.lan>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Fri, 03 Oct 2014 16:36:12 -0400 (EDT)
Cc: freebsd-hackers@freebsd.org, Adrian Chadd <adrian@freebsd.org>,
 Paul Albrecht <palbrecht@glccom.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Oct 2014 20:36:13 -0000

On Friday, October 03, 2014 10:16:56 AM Ian Lepore wrote:
> On Fri, 2014-10-03 at 08:50 -0400, John Baldwin wrote:
> > That said, I think one of the reasons the old code worked was that the
> > previous callout had the equivalent of the C_HARDCLOCK flag set.  Thus,
> > when the timer interrupt fired and we rescheduled for N ticks, it was
> > actually N ticks - <time since the timer interrupt that triggered this
> > callout> (if that makes sense?).  Now I think what is happening is that
> > the callout_reset_sbt() is doing 'delta time t + <time since the timer
> > interrupt>' and that that accounts for your numbers.
> 
> I'm lost with this paragraph.  I don't know what you mean by "old
> code" (pre-eventtimer?  pre-my-changes?).

Pre-eventtimer.  My suggestion here is that because the old code fired at 
hardclock intervals and because the second value (<time since the timer 
interrupt fired (or was scheduled)>) is much smaller than 1/hz, the old code 
was not affected by the phase shift.

> > In that case, I wonder if the reason
> > you are seeing such large gaps is that you have Cx states enabled
> > (including C1E) and that those are making that second factor large enough
> > to account for the skew.  Try setting machdep.idle=spin as a test (this
> > is a better test than ntpd/ptpd I think).
> 
> I'm testing on ARM hardware with no power management enabled other than
> the idle loop executing a wait-for-interrupt instruction that does not
> stop the eventtimer and timecounter clocks, and the latency for coming
> out of that state is on the order of 1uS.

Ok.  Certainly on x86 boxes the latency for coming out of Cx states could be 
much larger (80-100 us IIRC) and that would give a phase shift comparable to 
what you see.

> > The reason I don't like using C_ABSOLUTE is that in theory that will
> > result in longer or shorter intervals if time is adjusted via ntp/ptp,
> > whereas historically EVFILT_TIMER has been CLOCK_UPTIME based instead of
> > CLOCK_REALTIME.
> 
> I don't see anything in the code that implies C_ABSOLUTE uses a
> different clock.  In fact, it's even documented as being absolute time
> since boot.

Ah, I'm used to absolute timeouts being walltime for some reason.

> In callout_reset_sbt_on() if the absolute flag is set the
> sbt you pass in is used as-is.  If the flag is not set, the sbt you pass
> in is added to either the current uptime or the last tc tick uptime.
> 
> For getting a truly periodic timer event, this absolute time technique
> is required.  Adding a delta to something that isn't the time the prior
> event was *scheduled* to occur will result in a phase shift in the event
> occurrences over time (phase relative to uptime top-of-second).

I agree, I was only worrying about ntp time stepping somehow skewing this.
However, since C_ABSOLUTE shouldn't be affected by that, I think your
patch is correct.

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct  4 16:02:33 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 12FD0CE7;
 Sat,  4 Oct 2014 16:02:33 +0000 (UTC)
Received: from mho-02-ewr.mailhop.org (mho-02-ewr.mailhop.org [204.13.248.72])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id D8B2A3BA;
 Sat,  4 Oct 2014 16:02:32 +0000 (UTC)
Received: from [73.34.117.227] (helo=ilsoft.org)
 by mho-02-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
 (Exim 4.72) (envelope-from <ian@FreeBSD.org>)
 id 1XaRmr-0009fx-Jh; Sat, 04 Oct 2014 16:02:25 +0000
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by ilsoft.org (8.14.9/8.14.9) with ESMTP id s94G2O2H024120;
 Sat, 4 Oct 2014 10:02:24 -0600 (MDT) (envelope-from ian@FreeBSD.org)
X-Mail-Handler: Dyn Standard SMTP by Dyn
X-Originating-IP: 73.34.117.227
X-Report-Abuse-To: abuse@dyndns.com (see
 http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse
 reporting information)
X-MHO-User: U2FsdGVkX18RO1IMZKw53K0h4Fh6wy51
X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan
 [172.22.42.240] claimed to be [172.22.42.240]
Subject: Re: freebsd 10 kqueue timer regression
From: Ian Lepore <ian@FreeBSD.org>
To: Paul Albrecht <palbrecht@glccom.com>
In-Reply-To: <8587D819-AA2F-4387-A4E9-523014384672@glccom.com>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com>
 <CAJ-VmonJQKWeW7K6+jY6=FpmZrm+6HQOuBmhhjJEapyVpwNFdQ@mail.gmail.com>
 <8587D819-AA2F-4387-A4E9-523014384672@glccom.com>
Content-Type: text/plain; charset="iso-8859-13"
Date: Sat, 04 Oct 2014 10:02:23 -0600
Message-ID: <1412438543.12052.107.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by ilsoft.org id
 s94G2O2H024120
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Adrian Chadd <adrian@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 04 Oct 2014 16:02:33 -0000

On Thu, 2014-10-02 at 13:13 -0500, Paul Albrecht wrote:
> On Oct 2, 2014, at 12:18 PM, Adrian Chadd <adrian@freebsd.org> wrote:
>
> > On 2 October 2014 08:07, Paul Albrecht <palbrecht@glccom.com> wrote:
> >>
> >> Hi,
> >>
> >> What's up with freebsd 10? I'm testing some code that uses the kqueue
> >> timer for timing and it doesn't work because the precision of the
> >> timer is off.
> >
> > Can you provide a test case for it?
>
> Here's the code:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <errno.h>
> #include <sys/types.h>
> #include <sys/event.h>
> #include <sys/time.h>
>
> int
> main(void)
> {
>         int i,msec;
>         int kq,nev;
>         struct kevent inqueue;
>         struct kevent outqueue;
>         struct timeval start,end;
>
>         if ((kq = kqueue()) == -1) {
>                 fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
>                 exit(EXIT_FAILURE);
>         }
>         EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
>
>         gettimeofday(&start, 0);
>         for (i = 0; i < 50; i++) {
>                 if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
>                         fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
>                         exit(EXIT_FAILURE);
>                 } else if (outqueue.flags & EV_ERROR) {
>                         fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
>                         exit(EXIT_FAILURE);
>                 }
>         }
>         gettimeofday(&end, 0);
>
>         msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);
>
>         printf("msec = %d\n", msec);
>
>         close(kq);
>         return EXIT_SUCCESS;
> }
>
> When I run it on my system I get these results:
>
> ./a.out
> msec = 1072
> ./a.out
> msec = 1071
> ./a.out
> msec = 1071
>
> Which is over by about 3.5 times the wait time (roughly 70 ms) per second.
>
>
> >
> > I just chased down one of those recently; maybe it's the same thing
> > (callout() API changes.)
> >
> >
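
The timing arithmetic in the quoted test case can be sketched a bit more
directly.  The helper names below are hypothetical and not part of the
original program; elapsed_msec() is intended to give the same result as
the msec expression in the test, and overshoot_usec_per_event() turns the
reported totals into an average per-event error.

```c
#include <assert.h>
#include <sys/time.h>

/* Elapsed time from *start to *end in whole milliseconds (truncated). */
long
elapsed_msec(const struct timeval *start, const struct timeval *end)
{
        long sec = (long)(end->tv_sec - start->tv_sec);
        long usec = (long)(end->tv_usec - start->tv_usec);

        /* Borrow one second if the microsecond difference went negative. */
        if (usec < 0) {
                sec--;
                usec += 1000000;
        }
        return (sec * 1000 + usec / 1000);
}

/* Average overshoot per timer event, in microseconds. */
long
overshoot_usec_per_event(long measured_msec, long period_msec, int events)
{
        assert(events > 0);
        return ((measured_msec - (long)events * period_msec) * 1000 / events);
}
```

For the numbers reported above (50 events, 20 ms period, 1072 ms
measured), overshoot_usec_per_event(1072, 20, 50) works out to 1440 us
per event, i.e. about 72 ms late over the nominal one-second run.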

FYI, I just committed the fix to -current as r272528.  I'll MFC it to
10-stable in 3 days, and then we'll see if it can get into the 10.1
release cycle.

-- Ian