From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 28 01:07:43 2014
From: Bryan Venteicher
Date: Sat, 27 Sep 2014 19:59:47 -0500
Subject: Change uma_mtx to rwlock
To: "freebsd-hackers@freebsd.org"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=047d7bdc0854d973ec050415ab2b
List-Id: Technical Discussions relating to FreeBSD

--047d7bdc0854d973ec050415ab2b
Content-Type: text/plain; charset=UTF-8

Hi,

I'd appreciate some comments on the attached patch that changes the uma_mtx
to a rwlock.

At $JOB, we have machines with ~400GB RAM, with much of that being
allocated through UMA zones. We've observed that timeouts were sometimes
unexpectedly delayed by a half second or more. We tracked one of the
reasons for this down to when the page daemon was running, calling
uma_reclaim() -> zone_foreach(). zone_foreach() holds the uma_mtx while
zone_drain()'ing each zone. If uma_timeout() fires, it will block on the
uma_mtx when it tries to zone_timeout() each zone.

--047d7bdc0854d973ec050415ab2b
Content-Type: application/octet-stream; name="0001-Make-the-UMA-lock-a-rwlock-instead-of-a-mutex.patch"
Content-Disposition: attachment; filename="0001-Make-the-UMA-lock-a-rwlock-instead-of-a-mutex.patch"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_i0lofryd0

RnJvbSA5NTYwYzMwODIyOWZlODhjOTAxMTRhZmIyMDdmZDJhOTZlZDYzMjRmIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBCcnlhbiBWZW50ZWljaGVyIDxicnlhbnZAZGFlbW9uaW50aGVj bG9zZXQub3JnPgpEYXRlOiBUdWUsIDEgSnVsIDIwMTQgMTY6MDQ6MjMgLTA1MDAKU3ViamVjdDog W1BBVENIXSBNYWtlIHRoZSBVTUEgbG9jayBhIHJ3bG9jayBpbnN0ZWFkIG9mIGEgbXV0ZXgKClRo ZSB6b25lX2ZvcmVhY2goKSBjYWxsIHRoYXQgaXMgZG9uZSBpbiB1bWFfdGltZW91dCgpIG1heSBi bG9jayBvbiB0aGUKVU1BIG11dGV4IHdoaWxlIHRoZSBlYWNoIHpvbmUgaXMgZHJhaW5lZCBpbiB1 bWFfcmVjbGFpbSgpLiBUaGlzIG1heQpzdGFsbCB0aW1lb3V0cyBmb3IgYW4gdW5hY2NlcHRhYmxl IGFtb3VudCBvZiB0aW1lIGlmIHRoZSBkcmFpbmluZyB0YWtlcwphIGxvbmcgdGltZS4KLS0tCiBz eXMvdm0vdW1hX2NvcmUuYyB8IDQ5ICsrKysrKysrKysrKysrKysrKysrKysrKysrKy0tLS0tLS0t LS0tLS0tLS0tLS0tLS0KIDEgZmlsZSBjaGFuZ2VkLCAyNyBpbnNlcnRpb25zKCspLCAyMiBkZWxl
dGlvbnMoLSkKCmRpZmYgLS1naXQgYS9zeXMvdm0vdW1hX2NvcmUuYyBiL3N5cy92bS91bWFfY29y ZS5jCmluZGV4IDgxYjcxNGEuLjMwYmE5MGEgMTAwNjQ0Ci0tLSBhL3N5cy92bS91bWFfY29yZS5j CisrKyBiL3N5cy92bS91bWFfY29yZS5jCkBAIC0xMzUsOCArMTM1LDggQEAgc3RhdGljIExJU1Rf SEVBRCgsdW1hX2tlZykgdW1hX2tlZ3MgPSBMSVNUX0hFQURfSU5JVElBTElaRVIodW1hX2tlZ3Mp Owogc3RhdGljIExJU1RfSEVBRCgsdW1hX3pvbmUpIHVtYV9jYWNoZXpvbmVzID0KICAgICBMSVNU X0hFQURfSU5JVElBTElaRVIodW1hX2NhY2hlem9uZXMpOwogCi0vKiBUaGlzIG11dGV4IHByb3Rl Y3RzIHRoZSBrZWcgbGlzdCAqLwotc3RhdGljIHN0cnVjdCBtdHhfcGFkYWxpZ24gdW1hX210eDsK Ky8qIFRoaXMgUlcgbG9jayBwcm90ZWN0cyB0aGUga2VnIGxpc3QgKi8KK3N0YXRpYyBzdHJ1Y3Qg cndsb2NrX3BhZGFsaWduIHVtYV9yd2xvY2s7CiAKIC8qIExpbmtlZCBsaXN0IG9mIGJvb3QgdGlt ZSBwYWdlcyAqLwogc3RhdGljIExJU1RfSEVBRCgsdW1hX3NsYWIpIHVtYV9ib290X3BhZ2VzID0K QEAgLTg4Niw2ICs4ODYsNyBAQCBmaW5pc2hlZDoKIHN0YXRpYyB2b2lkCiB6b25lX2RyYWluX3dh aXQodW1hX3pvbmVfdCB6b25lLCBpbnQgd2FpdG9rKQogeworCWludCB3bG9jazsKIAogCS8qCiAJ ICogU2V0IGRyYWluaW5nIHRvIGludGVybG9jayB3aXRoIHpvbmVfZHRvcigpIHNvIHdlIGNhbiBy ZWxlYXNlIG91cgpAQCAtODk3LDE2ICs4OTgsMjAgQEAgem9uZV9kcmFpbl93YWl0KHVtYV96b25l X3Qgem9uZSwgaW50IHdhaXRvaykKIAl3aGlsZSAoem9uZS0+dXpfZmxhZ3MgJiBVTUFfWkZMQUdf RFJBSU5JTkcpIHsKIAkJaWYgKHdhaXRvayA9PSBNX05PV0FJVCkKIAkJCWdvdG8gb3V0OwotCQlt dHhfdW5sb2NrKCZ1bWFfbXR4KTsKKwkJd2xvY2sgPSByd193b3duZWQoJnVtYV9yd2xvY2spOwor CQlyd191bmxvY2soJnVtYV9yd2xvY2spOwogCQltc2xlZXAoem9uZSwgem9uZS0+dXpfbG9ja3B0 ciwgUFZNLCAiem9uZWRyYWluIiwgMSk7Ci0JCW10eF9sb2NrKCZ1bWFfbXR4KTsKKwkJaWYgKHds b2NrICE9IDApCisJCQlyd193bG9jaygmdW1hX3J3bG9jayk7CisJCWVsc2UKKwkJCXJ3X3Jsb2Nr KCZ1bWFfcndsb2NrKTsKIAl9CiAJem9uZS0+dXpfZmxhZ3MgfD0gVU1BX1pGTEFHX0RSQUlOSU5H OwogCWJ1Y2tldF9jYWNoZV9kcmFpbih6b25lKTsKIAlaT05FX1VOTE9DSyh6b25lKTsKIAkvKgog CSAqIFRoZSBEUkFJTklORyBmbGFnIHByb3RlY3RzIHVzIGZyb20gYmVpbmcgZnJlZWQgd2hpbGUK LQkgKiB3ZSdyZSBydW5uaW5nLiAgTm9ybWFsbHkgdGhlIHVtYV9tdHggd291bGQgcHJvdGVjdCB1 cyBidXQgd2UKKwkgKiB3ZSdyZSBydW5uaW5nLiAgTm9ybWFsbHkgdGhlIHVtYV9yd2xvY2sgd291 
bGQgcHJvdGVjdCB1cyBidXQgd2UKIAkgKiBtdXN0IGJlIGFibGUgdG8gcmVsZWFzZSBhbmQgYWNx dWlyZSB0aGUgcmlnaHQgbG9jayBmb3IgZWFjaCBrZWcuCiAJICovCiAJem9uZV9mb3JlYWNoX2tl Zyh6b25lLCAma2VnX2RyYWluKTsKQEAgLTE1NDIsOSArMTU0Nyw5IEBAIGtlZ19jdG9yKHZvaWQg Km1lbSwgaW50IHNpemUsIHZvaWQgKnVkYXRhLCBpbnQgZmxhZ3MpCiAKIAlMSVNUX0lOU0VSVF9I RUFEKCZrZWctPnVrX3pvbmVzLCB6b25lLCB1el9saW5rKTsKIAotCW10eF9sb2NrKCZ1bWFfbXR4 KTsKKwlyd193bG9jaygmdW1hX3J3bG9jayk7CiAJTElTVF9JTlNFUlRfSEVBRCgmdW1hX2tlZ3Ms IGtlZywgdWtfbGluayk7Ci0JbXR4X3VubG9jaygmdW1hX210eCk7CisJcndfd3VubG9jaygmdW1h X3J3bG9jayk7CiAJcmV0dXJuICgwKTsKIH0KIApAQCAtMTU5NCw5ICsxNTk5LDkgQEAgem9uZV9j dG9yKHZvaWQgKm1lbSwgaW50IHNpemUsIHZvaWQgKnVkYXRhLCBpbnQgZmxhZ3MpCiAJCXpvbmUt PnV6X3JlbGVhc2UgPSBhcmctPnJlbGVhc2U7CiAJCXpvbmUtPnV6X2FyZyA9IGFyZy0+YXJnOwog CQl6b25lLT51el9sb2NrcHRyID0gJnpvbmUtPnV6X2xvY2s7Ci0JCW10eF9sb2NrKCZ1bWFfbXR4 KTsKKwkJcndfd2xvY2soJnVtYV9yd2xvY2spOwogCQlMSVNUX0lOU0VSVF9IRUFEKCZ1bWFfY2Fj aGV6b25lcywgem9uZSwgdXpfbGluayk7Ci0JCW10eF91bmxvY2soJnVtYV9tdHgpOworCQlyd193 dW5sb2NrKCZ1bWFfcndsb2NrKTsKIAkJZ290byBvdXQ7CiAJfQogCkBAIC0xNjEzLDcgKzE2MTgs NyBAQCB6b25lX2N0b3Iodm9pZCAqbWVtLCBpbnQgc2l6ZSwgdm9pZCAqdWRhdGEsIGludCBmbGFn cykKIAkJem9uZS0+dXpfZmluaSA9IGFyZy0+ZmluaTsKIAkJem9uZS0+dXpfbG9ja3B0ciA9ICZr ZWctPnVrX2xvY2s7CiAJCXpvbmUtPnV6X2ZsYWdzIHw9IFVNQV9aT05FX1NFQ09OREFSWTsKLQkJ bXR4X2xvY2soJnVtYV9tdHgpOworCQlyd193bG9jaygmdW1hX3J3bG9jayk7CiAJCVpPTkVfTE9D Syh6b25lKTsKIAkJTElTVF9GT1JFQUNIKHosICZrZWctPnVrX3pvbmVzLCB1el9saW5rKSB7CiAJ CQlpZiAoTElTVF9ORVhUKHosIHV6X2xpbmspID09IE5VTEwpIHsKQEAgLTE2MjIsNyArMTYyNyw3 IEBAIHpvbmVfY3Rvcih2b2lkICptZW0sIGludCBzaXplLCB2b2lkICp1ZGF0YSwgaW50IGZsYWdz KQogCQkJfQogCQl9CiAJCVpPTkVfVU5MT0NLKHpvbmUpOwotCQltdHhfdW5sb2NrKCZ1bWFfbXR4 KTsKKwkJcndfd3VubG9jaygmdW1hX3J3bG9jayk7CiAJfSBlbHNlIGlmIChrZWcgPT0gTlVMTCkg ewogCQlpZiAoKGtlZyA9IHVtYV9rY3JlYXRlKHpvbmUsIGFyZy0+c2l6ZSwgYXJnLT51bWluaXQs IGFyZy0+ZmluaSwKIAkJICAgIGFyZy0+YWxpZ24sIGFyZy0+ZmxhZ3MpKSA9PSBOVUxMKQpAQCAt 
MTcyMCw5ICsxNzI1LDkgQEAgem9uZV9kdG9yKHZvaWQgKmFyZywgaW50IHNpemUsIHZvaWQgKnVk YXRhKQogCWlmICghKHpvbmUtPnV6X2ZsYWdzICYgVU1BX1pGTEFHX0lOVEVSTkFMKSkKIAkJY2Fj aGVfZHJhaW4oem9uZSk7CiAKLQltdHhfbG9jaygmdW1hX210eCk7CisJcndfd2xvY2soJnVtYV9y d2xvY2spOwogCUxJU1RfUkVNT1ZFKHpvbmUsIHV6X2xpbmspOwotCW10eF91bmxvY2soJnVtYV9t dHgpOworCXJ3X3d1bmxvY2soJnVtYV9yd2xvY2spOwogCS8qCiAJICogWFhYIHRoZXJlIGFyZSBz b21lIHJhY2VzIGhlcmUgd2hlcmUKIAkgKiB0aGUgem9uZSBjYW4gYmUgZHJhaW5lZCBidXQgem9u ZSBsb2NrCkBAIC0xNzQ0LDkgKzE3NDksOSBAQCB6b25lX2R0b3Iodm9pZCAqYXJnLCBpbnQgc2l6 ZSwgdm9pZCAqdWRhdGEpCiAJICogV2Ugb25seSBkZXN0cm95IGtlZ3MgZnJvbSBub24gc2Vjb25k YXJ5IHpvbmVzLgogCSAqLwogCWlmIChrZWcgIT0gTlVMTCAmJiAoem9uZS0+dXpfZmxhZ3MgJiBV TUFfWk9ORV9TRUNPTkRBUlkpID09IDApICB7Ci0JCW10eF9sb2NrKCZ1bWFfbXR4KTsKKwkJcndf d2xvY2soJnVtYV9yd2xvY2spOwogCQlMSVNUX1JFTU9WRShrZWcsIHVrX2xpbmspOwotCQltdHhf dW5sb2NrKCZ1bWFfbXR4KTsKKwkJcndfd3VubG9jaygmdW1hX3J3bG9jayk7CiAJCXpvbmVfZnJl ZV9pdGVtKGtlZ3MsIGtlZywgTlVMTCwgU0tJUF9OT05FKTsKIAl9CiAJWk9ORV9MT0NLX0ZJTkko em9uZSk7CkBAIC0xNzY4LDEyICsxNzczLDEyIEBAIHpvbmVfZm9yZWFjaCh2b2lkICgqemZ1bmMp KHVtYV96b25lX3QpKQogCXVtYV9rZWdfdCBrZWc7CiAJdW1hX3pvbmVfdCB6b25lOwogCi0JbXR4 X2xvY2soJnVtYV9tdHgpOworCXJ3X3Jsb2NrKCZ1bWFfcndsb2NrKTsKIAlMSVNUX0ZPUkVBQ0go a2VnLCAmdW1hX2tlZ3MsIHVrX2xpbmspIHsKIAkJTElTVF9GT1JFQUNIKHpvbmUsICZrZWctPnVr X3pvbmVzLCB1el9saW5rKQogCQkJemZ1bmMoem9uZSk7CiAJfQotCW10eF91bmxvY2soJnVtYV9t dHgpOworCXJ3X3J1bmxvY2soJnVtYV9yd2xvY2spOwogfQogCiAvKiBQdWJsaWMgZnVuY3Rpb25z ICovCkBAIC0xNzg5LDcgKzE3OTQsNyBAQCB1bWFfc3RhcnR1cCh2b2lkICpib290bWVtLCBpbnQg Ym9vdF9wYWdlcykKICNpZmRlZiBVTUFfREVCVUcKIAlwcmludGYoIkNyZWF0aW5nIHVtYSBrZWcg aGVhZGVycyB6b25lIGFuZCBrZWcuXG4iKTsKICNlbmRpZgotCW10eF9pbml0KCZ1bWFfbXR4LCAi VU1BIGxvY2siLCBOVUxMLCBNVFhfREVGKTsKKwlyd19pbml0KCZ1bWFfcndsb2NrLCAiVU1BIGxv Y2siKTsKIAogCS8qICJtYW51YWxseSIgY3JlYXRlIHRoZSBpbml0aWFsIHpvbmUgKi8KIAltZW1z ZXQoJmFyZ3MsIDAsIHNpemVvZihhcmdzKSk7CkBAIC0zMzY0LDEyICszMzY5LDEyIEBAIHN5c2N0 
bF92bV96b25lX2NvdW50KFNZU0NUTF9IQU5ETEVSX0FSR1MpCiAJaW50IGNvdW50OwogCiAJY291 bnQgPSAwOwotCW10eF9sb2NrKCZ1bWFfbXR4KTsKKwlyd19ybG9jaygmdW1hX3J3bG9jayk7CiAJ TElTVF9GT1JFQUNIKGt6LCAmdW1hX2tlZ3MsIHVrX2xpbmspIHsKIAkJTElTVF9GT1JFQUNIKHos ICZrei0+dWtfem9uZXMsIHV6X2xpbmspCiAJCQljb3VudCsrOwogCX0KLQltdHhfdW5sb2NrKCZ1 bWFfbXR4KTsKKwlyd19ydW5sb2NrKCZ1bWFfcndsb2NrKTsKIAlyZXR1cm4gKHN5c2N0bF9oYW5k bGVfaW50KG9pZHAsICZjb3VudCwgMCwgcmVxKSk7CiB9CiAKQEAgLTMzOTQsNyArMzM5OSw3IEBA IHN5c2N0bF92bV96b25lX3N0YXRzKFNZU0NUTF9IQU5ETEVSX0FSR1MpCiAJc2J1Zl9uZXdfZm9y X3N5c2N0bCgmc2J1ZiwgTlVMTCwgMTI4LCByZXEpOwogCiAJY291bnQgPSAwOwotCW10eF9sb2Nr KCZ1bWFfbXR4KTsKKwlyd19ybG9jaygmdW1hX3J3bG9jayk7CiAJTElTVF9GT1JFQUNIKGt6LCAm dW1hX2tlZ3MsIHVrX2xpbmspIHsKIAkJTElTVF9GT1JFQUNIKHosICZrei0+dWtfem9uZXMsIHV6 X2xpbmspCiAJCQljb3VudCsrOwpAQCAtMzQ3MCw3ICszNDc1LDcgQEAgc2tpcDoKIAkJCVpPTkVf VU5MT0NLKHopOwogCQl9CiAJfQotCW10eF91bmxvY2soJnVtYV9tdHgpOworCXJ3X3J1bmxvY2so JnVtYV9yd2xvY2spOwogCWVycm9yID0gc2J1Zl9maW5pc2goJnNidWYpOwogCXNidWZfZGVsZXRl KCZzYnVmKTsKIAlyZXR1cm4gKGVycm9yKTsKLS0gCjEuOC41LjQKCg==
--047d7bdc0854d973ec050415ab2b--

From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 28 01:42:40 2014
From: "Steven Hartland"
To: "Bryan Venteicher"
Subject: Re: Change uma_mtx to rwlock
Date: Sun, 28 Sep 2014 02:42:25 +0100

Out of interest, does that include ZFS and its UMA zones? We're currently
investigating issues around this.

Regards
Steve

----- Original Message -----
From: "Bryan Venteicher"
To:
Sent: Sunday, September 28, 2014 1:59 AM
Subject: Change uma_mtx to rwlock

> Hi,
>
> I'd appreciate some comments on the attached patch that changes the uma_mtx
> to a rwlock.
>
> At $JOB, we have machines with ~400GB RAM, with much of that being
> allocated through UMA zones. We've observed that timeouts were sometimes
> unexpectedly delayed by a half second or more. We tracked one of the
> reasons for this down to when the page daemon was running, calling
> uma_reclaim() -> zone_foreach(). zone_foreach() holds the uma_mtx while
> zone_drain()'ing each zone. If uma_timeout() fires, it will block on the
> uma_mtx when it tries to zone_timeout() each zone.

From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 28 01:56:13 2014
From: Bryan Venteicher
Date: Sat, 27 Sep 2014 20:55:52 -0500
Subject: Re: Change uma_mtx to rwlock
To: Steven Hartland
Cc: "freebsd-hackers@freebsd.org"

On Sat, Sep 27, 2014 at 8:42 PM, Steven Hartland wrote:

> Out of interest does that include ZFS and its UMA zones, as we're currently
> investigating issues around this.
>

Yes, I believe this would include ZFS's zones too.

> Regards
> Steve
>
> ----- Original Message ----- From: "Bryan Venteicher" <bryanv@daemoninthecloset.org>
> To:
> Sent: Sunday, September 28, 2014 1:59 AM
> Subject: Change uma_mtx to rwlock
>
>> Hi,
>>
>> I'd appreciate some comments on the attached patch that changes the uma_mtx
>> to a rwlock.
>>
>> At $JOB, we have machines with ~400GB RAM, with much of that being
>> allocated through UMA zones. We've observed that timeouts were sometimes
>> unexpectedly delayed by a half second or more. We tracked one of the
>> reasons for this down to when the page daemon was running, calling
>> uma_reclaim() -> zone_foreach(). zone_foreach() holds the uma_mtx while
>> zone_drain()'ing each zone. If uma_timeout() fires, it will block on the
>> uma_mtx when it tries to zone_timeout() each zone.

From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 28 02:30:24 2014
Message-ID: <0EAA931C1DCF4897A45D86C7CBDCA11C@multiplay.co.uk>
From: "Steven Hartland"
To: "Bryan Venteicher"
Cc: freebsd-hackers@freebsd.org
Subject: Re: Change uma_mtx to rwlock
Date: Sun, 28 Sep 2014 03:30:16 +0100

----- Original Message -----
From: "Bryan Venteicher"

> On Sat, Sep 27, 2014 at 8:42 PM, Steven Hartland wrote:
>
>> Out of interest does that include ZFS and its UMA zones, as we're currently
>> investigating issues around this.
>
> Yes, I believe this would include ZFS's zones too

It would, but I was more curious as to whether you had seen the delay
specifically on the ZFS zone, or whether it was other zones which triggered
the issue for you?

Regards
Steve

From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 28 03:55:36 2014
In-Reply-To: <0EAA931C1DCF4897A45D86C7CBDCA11C@multiplay.co.uk>
References: <0EAA931C1DCF4897A45D86C7CBDCA11C@multiplay.co.uk>
From: Bryan Venteicher
Date: Sat, 27 Sep 2014 22:24:56 -0500
Subject: Re: Change uma_mtx to rwlock
To: Steven Hartland
Cc: "freebsd-hackers@freebsd.org"

On Sat, Sep 27, 2014 at 9:30 PM, Steven Hartland wrote:

> ----- Original Message ----- From: "Bryan Venteicher" <bryanv@daemoninthecloset.org>
>
>> On Sat, Sep 27, 2014 at 8:42 PM, Steven Hartland wrote:
>>
>>> Out of interest does that include ZFS and its UMA zones, as we're
>>> currently investigating issues around this.
>>
>> Yes, I believe this would include ZFS's zones too
>
> It would but I was more curious as to whether you had seen the delay
> specifically on the ZFS zone or if it was other zones which triggered
> the issue for you?

We are using an old version of FreeBSD/ZFS, so our ZFS allocations go
through malloc instead of directly to UMA. For us, it was the cumulative
effect of the number of UMA zones (buckets really) that led to long hold
times of the UMA mutex.
ZFS related allocations were a significant part of that, but not typically
the largest.

> Regards
> Steve

From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 28 14:04:20 2014
Message-ID: <5428155C.5000404@astart.com>
Date: Sun, 28 Sep 2014 07:04:12 -0700
From: Patrick Powell
Organization: Astart Technologies
To: freebsd-hackers@freebsd.org
Subject: Re: Inproper ada# assignment in 10-BETA2
References: <1411851225.9364.YahooMailNeo@web180902.mail.ne1.yahoo.com>

On 09/27/14 15:15, Mehmet Erol Sanliturk wrote:
> On Sat, Sep 27, 2014 at 1:53 PM, Jin Guojun wrote:
>
>> Installed 10-BETA2 on SATA port 4 (ad8) and then added another drive on
>> SATA port 3 (ad6); the system did not correctly enumerate the ada # for
>> the boot device. On the original boot (without the second SATA drive),
>> ad8 was enumerated as ada0, the boot drive:
>>
>> Sep 24 22:51:30 R10-B2 kernel: ada0 at ahcich2 bus 0 scbus2 target 0 lun 0
>> Sep 24 22:51:30 R10-B2 kernel: ada0: ATA-8 SATA 2.x device
>> ...
>> Sep 24 22:51:30 R10-B2 kernel: ada0: Previously was known as ad8
>>
>> However, after adding another SATA drive (ad6), the new drive is assigned
>> to ada0, and ad8 has changed to ada1. This is incorrect dynamic device
>> assignment. FreeBSD has kept using fixed disk ID assignment due to the
>> same problem, introduced around 4-R (or maybe slightly later); after a
>> simple debate, a decision was made to use fixed drive IDs to avoid such
>> hassle.
>>
>> If we now want to use dynamic enumeration for drive ID assignment, it
>> has to be done correctly: the boot drive MUST be assigned 0, or whatever
>> number the installation assigned it. Otherwise, adding a new drive will
>> make the system unbootable, or make other existing drives unmountable,
>> because the enumeration numbers change.
>>
>> Has this been reported as a known problem for 10-R, or shall I open a bug
>> to track it?
>>
>> -Jin
>>
>
> One point should be checked:
>
> On mainboards, SATA ports are numbered from 0 or 1 upward.
> The BIOS always uses the first SATA drive for boot. This is NOT related
> to the operating system.
> Therefore, it is necessary to check the port numbers of the existing
> drives, and the bootable SATA drive should be connected to the
> lowest-numbered SATA port among the existing drives.
>
> For example, assume the bootable drive is connected to SATA port 2.
> The new drive should be connected to a higher-numbered SATA port.
> If there are only two SATA ports, then the bootable drive should be
> connected to the first SATA port.
>
> If the mainboard BIOS allows any SATA port to be selected for boot, and
> the bootable SATA port and drive are specified there, then it may boot
> from that drive. Up to now, I have not seen any BIOS which supplies such
> an ordering among SATA ports. Please check your BIOS for such a feature.
> If it is present you may use it; otherwise it is necessary to reconnect
> the SATA cables.
>
> Thank you very much.
>
> Mehmet Erol Sanliturk

Try the Dell Precision M6500 laptop, which has three SATA ports (two
internal, one external) and lets you select the boot drive via the BIOS.
It appears that for FreeBSD 9.3 the drives are all enumerated the same,
independent of which is the current boot drive.

Interesting...

From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 28 16:58:55 2014
References: <5408938E.5020005@yandex.ru>
Date: Sun, 28 Sep 2014 09:58:54 -0700
Subject: Re: IOAT driver for FreeBSD
From: Jim Harris
To: Vijay Singh
Cc: "freebsd-hackers@freebsd.org", "Andrey V. Elsukov", hiren panchasara

On Fri, Sep 26, 2014 at 10:10 AM, Vijay Singh wrote:

> Jim, since the device IDs were changed, were there any changes to the
> descriptors for the DMA part?
>

Hi Vijay,

No changes. The descriptor formats are the same.

-Jim

> =vijay
>
> On Thu, Sep 25, 2014 at 4:41 PM, Jim Harris wrote:
>
>> On Tue, Sep 23, 2014 at 5:38 PM, hiren panchasara wrote:
>>
>>> + Jim
>>>
>>> On Thu, Sep 4, 2014 at 9:30 AM, Andrey V. Elsukov wrote:
>>> > On 03.09.2014 20:59, Vijay Singh wrote:
>>> >> Hi All, I found some discussion in the past about this. Is there a
>>> >> version of such a driver that I can test, and hopefully help get
>>> >> committed?
>>> >
>>> > There was some work in
>>> > http://svnweb.freebsd.org/base/user/jimharris/ioat/sys/dev/ioat/
>>>
>>> Hi Jim,
>>>
>>> What's the status of this user branch?
>>>
>>> cheers,
>>> Hiren
>>>
>>
>> This user branch is a couple of years old, but should not be too
>> difficult to bring forward to HEAD. It only includes E5 v1 (Sandy Bridge
>> Xeon) device IDs so would need to be updated to include E5 v2 (Ivy Bridge)
>> and v3 (Haswell) device IDs.
>>
>> Note this driver only does DMA operations currently and is not plumbed
>> for other opcodes (XOR/P+Q, CRC, etc.). But the general framework is there
>> to add code for the other opcodes.
>>
>> E5 v2 and v3 device IDs are pasted below.
>>
>> -Jim
>>
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB0 0x0e20
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB1 0x0e21
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB2 0x0e22
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB3 0x0e23
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB4 0x0e24
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB5 0x0e25
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB6 0x0e26
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB7 0x0e27
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB8 0x0e2e
>> #define PCI_DEVICE_ID_INTEL_IOAT_IVB9 0x0e2f
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW0 0x2f20
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW1 0x2f21
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW2 0x2f22
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW3 0x2f23
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW4 0x2f24
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW5 0x2f25
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW6 0x2f26
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW7 0x2f27
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW8 0x2f2e
>> #define PCI_DEVICE_ID_INTEL_IOAT_HSW9 0x2f2f
>>
>

From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 28 18:26:54 2014
References: <1411851225.9364.YahooMailNeo@web180902.mail.ne1.yahoo.com>
Message-ID: <1411928662.22540.YahooMailNeo@web180901.mail.ne1.yahoo.com>
Date: Sun, 28 Sep 2014 11:24:22 -0700
From: Jin Guojun
Subject: Re: Inproper ada# assignment in 10-BETA2
To: Mehmet Erol Sanliturk
Cc: "hackers@freebsd.org", questions freebsd

No, BIOS boot is completely irrelevant to this problem.

It looks like I have to re-address this issue, which was debated 15-20
years ago.

First of all, let's clear up the BIOS question so you will not argue it
again. The BIOS boot sequence can be specified in any order, and it boots
the desired drive correctly. If it did not, 10.1-BETA2 would not be loaded,
and we would not see this problem, period.

After 10.1-BETA2 is loaded and booted, it enumerates the drive(s)
dynamically to assign an ID to each ada or da node. The kernel clearly
knows which drive is ad1, 2, 3, ..., but it does not assign the proper ID
to the existing drive(s) for the ada and da nodes. That is, the ad node IDs
are still correct, but the ada and da node IDs are wrong.
Dynamic enumeration is presumably meant to allow moving a boot drive from one system to another, or from one bus to another, without manually modifying fstab entries. That is, the mechanism should ensure that no matter where the drive is plugged in, the boot drive is always enumerated as ID 0 or as the ID the installation assigned to it.

How to ensure this? For the boot drive this is relatively easy -- the boot drive is generally the first one, so it should always be enumerated as ada0 or da0. If the installation has somehow assigned it an ID other than 0, then generic enumeration applies.

Generic enumeration is a drive serial number (S#) based enumeration mechanism, which has been in use for at least two decades. For example, suppose two drives are installed with S# AAAA and XXXXX (the boot drive), with AAAA at ad9 and XXXXX at ad5. Regardless of which SATA ports they reside on, we know the installation will likely name drive XXXXX ada0 and AAAA ada1.

With fixed assignment, drive XXXXX is ad5 and AAAA is ad9; when a new drive is inserted as ad0, we know drive XXXXX will still be ad5 and boot will not fail. But in the current 10.1-BETA2, the new drive will likely become ada0, drive XXXXX will become ada1, and AAAA will become ada2, so boot will fail. If the new drive is instead inserted as ad8, drive XXXXX will remain ada0 but AAAA will become ada2. Boot will succeed in this case, but mounting drive AAAA will fail.

S#-based enumeration records the S# for each device ID in a dynamic boot configuration file, which is consulted at boot time to determine which device ID to assign to each drive. After the existing drives have been enumerated, any new drive is given an unused ID sequentially. This ensures that existing drives always get the device ID originally assigned to them, so mounting never fails no matter where a disk drive (with FreeBSD already installed) is plugged in.

Hopefully this explains what the correct dynamic enumeration operation is.
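
For readers following the thread, the serial-number scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from FreeBSD: the names unit_table, unit_record, unit_lookup, unit_in_use and unit_assign are hypothetical, and the "dynamic boot configuration file" is modelled as an in-memory table rather than a file on disk.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DRIVES 16
#define SERIAL_LEN 32

/* One persisted record: drive serial number -> unit (the N in adaN). */
struct unit_record {
	char serial[SERIAL_LEN];
	int  unit;
};

/* Hypothetical stand-in for the "dynamic boot configuration file". */
struct unit_table {
	struct unit_record rec[MAX_DRIVES];
	int                count;
};

/* Return the recorded unit for a serial, or -1 if the drive is unknown. */
static int
unit_lookup(const struct unit_table *t, const char *serial)
{
	for (int i = 0; i < t->count; i++)
		if (strcmp(t->rec[i].serial, serial) == 0)
			return (t->rec[i].unit);
	return (-1);
}

/* Return 1 if some recorded drive already owns this unit number. */
static int
unit_in_use(const struct unit_table *t, int unit)
{
	for (int i = 0; i < t->count; i++)
		if (t->rec[i].unit == unit)
			return (1);
	return (0);
}

/*
 * Assign a unit to a probed drive. A known serial keeps its recorded
 * unit; a new serial gets the lowest unit that no recorded drive owns,
 * and the pairing is recorded. Returns -1 if the table is full.
 */
int
unit_assign(struct unit_table *t, const char *serial)
{
	int unit = unit_lookup(t, serial);

	if (unit >= 0)
		return (unit);
	if (t->count >= MAX_DRIVES)
		return (-1);
	for (unit = 0; unit_in_use(t, unit); unit++)
		;
	snprintf(t->rec[t->count].serial, SERIAL_LEN, "%s", serial);
	t->rec[t->count].unit = unit;
	t->count++;
	return (unit);
}
```

With a table that already records XXXXX as unit 0 and AAAA as unit 1, a new drive gets unit 2 even when it is probed first, because the units of recorded drives stay reserved whether or not those drives have been seen yet on this boot.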
In the old days we had a mechanism to switch from dynamic enumeration to fixed enumeration, but I do not know if this mechanism still exists in 10.x-R.

Because it looks like the current developers are not aware of this concept, I am going to open a bug to track this problem again soon, unless I hear that we have a workaround for it.

On Saturday, September 27, 2014 3:15 PM, Mehmet Erol Sanliturk wrote:

On Sat, Sep 27, 2014 at 1:53 PM, Jin Guojun wrote:

>Installed 10-BETA2 on SATA port 4 (ad8) and then added another SATA port 3 (ad6), the system has not correctly enumerate the ada # for the boot device.
>As original boot (without the second SATA drive), the ad8 is enumerated as ada0 -- the boot drive:
>
>Sep 24 22:51:30 R10-B2 kernel: ada0 at ahcich2 bus 0 scbus2 target 0 lun 0
>Sep 24 22:51:30 R10-B2 kernel: ada0: ATA-8 SATA 2.x device
>...
>Sep 24 22:51:30 R10-B2 kernel: ada0: Previously was known as ad8
>
>
>However, after added another SATA drive (ad6), this new drive is assigned to ada0, but ad8 has changed to ada1. This is incorrect dynamic device assignment. FreeBSD has kept using fixed disk ID assignment due to the same problem introduced in around 4-R (or may be slightly later), and after a simple debate, a decision was made to use fixed drive ID to avoid such hassle.
>
>If now we want to use dynamic enumeration for drive ID# assignment, this has to be done correctly -- boot drive MUST assigned to 0 or whatever the # as installation assigned to; otherwise, adding a new drive will cause system not bootable, or make other existing drive not mountable due to enumeration # changes.
>
>Has this been reported as a known problem for 10-R, or shall I open a bug to track?
>
>-Jin
>

One point should be checked :

On mainboards SATA ports are numbered from 0 or 1 to upward .
BIOS always uses first SATA drive for boot . This is NOT related to the operating system .
Therefore , it is necessary to check port numbers of existing drives and the bootable SATA drive should be connected to the smallest numbered SATA port among existent drives . For example , assume bootable drive is connected to SATA port 2 . New drive should be connected to a higher numbered SATA port . If there are only two SATA ports , then bootable drive should be connected to the first SATA port . If mainboard BIOS allows definition of any SATA port for boot , and bootable SATA port and drive is specified in there , again it may boot from that drive . Up to now , I did not see any BIOS which supplies such an ordering among SATA ports . Please check your BIOS for such a feature . If it is present you may use it , otherwise it is necessary to reconnect SATA cables . Thank you very much . Mehmet Erol Sanliturk From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 28 21:05:35 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 49B97A0B for ; Sun, 28 Sep 2014 21:05:35 +0000 (UTC) Received: from mx1.scaleengine.net (beauharnois2.bhs1.scaleengine.net [142.4.218.15]) by mx1.freebsd.org (Postfix) with ESMTP id 244A780F for ; Sun, 28 Sep 2014 21:05:34 +0000 (UTC) Received: from [172.16.0.55] (unknown [92.247.20.226]) (Authenticated sender: allanjude.freebsd@scaleengine.com) by mx1.scaleengine.net (Postfix) with ESMTPSA id 1DBB3545ED for ; Sun, 28 Sep 2014 21:05:26 +0000 (UTC) Message-ID: <54287814.9000207@freebsd.org> Date: Sun, 28 Sep 2014 17:05:24 -0400 From: Allan Jude User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.1.0 MIME-Version: 1.0 To: freebsd-hackers@freebsd.org Subject: Re: Inproper ada# assignment in 10-BETA2 References: <1411851225.9364.YahooMailNeo@web180902.mail.ne1.yahoo.com> 
<1411928662.22540.YahooMailNeo@web180901.mail.ne1.yahoo.com> In-Reply-To: <1411928662.22540.YahooMailNeo@web180901.mail.ne1.yahoo.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 28 Sep 2014 21:05:35 -0000 On 09/28/2014 14:24, Jin Guojun wrote: > No, BIOS boot is completely irrelevant to this problem. > > It looks like I have to re-address this issue debated 15-20 years ago. > > First of all, let's clear BIOS question so you will not argue it again. > BIOS boot sequence can be specified in any order, and it boots desired drive correctly. > > If not, 10.1-BETA2 will not be loaded, and we will not see this problem, period. > > After 10.1-BETA2 is loaded and booted, it enumerates drive(s) dynamically to assign ID to ada or da node. > > Kernel clearly knows which drive is ad1, 2, 3, ..., but it does not assigned proper ID to existing drive(s) for ada and da nodes. > > That is, ad note IDs are still correct, but ada and da node IDs are wrong. > > The dynamic enumeration is likely used for moving a boot drive from on system to another > or from one bus to another without manually modifying fstab entries. > That is, this mechanism wants to ensure no matter where this drive is plugged in, > > Boot drive should be always enumerated as ID 0 or the ID installation assigned to. > > How to ensure this? For boot drive, this is relatively easy -- the boot drive is always this first one in general, > > so this drive should always enumerated as ada0 or da0. > If installation has assigned drive ID to not 0 somehow, then generic enumeration apply. > Generic enumeration is drive serial number (S#) based enumeration mechanism, which has been used for at least two decades. 
> For example, if two drives installed and their S# are AAAA and XXXXX (boot drive), > regardless what SATA port they resides at -- AAAA at ad9 and XXXXX at ad5, > We knew installation will likely name drive XXXXX as ada0 and AAAA as ada1. > In fixed fashion, drive XXXXX is ad5 and AAAA is ad9, when a new drive is inserted as ad0, > > we knew drive XXXXX will be still ad5 and boot should not fail. > But in current 10.1-BETA2, the new drive is likely will be ada0, drive XXXXX will be ada1, and AAAA will be ada2, then boot will fail. > In case if new drive is inserted as ad8, drive XXXXX will remain as ada0 but AAAA will be ada21. > Even though boot will succeed in this case, but mounting drive AAAA will fail. > > > The S#-based enumeration will record the S# for corresponding device ID in a dynamic boot configuration file, which is used in boot time to determine what device ID should be assigned to each drive. > After existing drive ID has been enumerated, any new drive(s) will be given a unused ID sequentially. > This ensures that existing drive(s) will always get device ID originally assigned to, so the disk mounting operation will never fail no matter where a disk drive (has FreeBSD already installed) is plugged in. > > > Hopefully, this explains what is correct the dynamic enumeration operation. > In old time, we have a mechanisms to alter the dynamic enumeration to fixed one, but I do not know if this mechanism is still in 10.x-R. > > Because it looks like that current developers have no knowledge about this concept, > I am going to open a bug to track this problem again soon, unless I will hear if we have a work around for this problem. > > > > On Saturday, September 27, 2014 3:15 PM, Mehmet Erol Sanliturk wrote: > > > > > > > > On Sat, Sep 27, 2014 at 1:53 PM, Jin Guojun wrote: > > Installed 10-BETA2 on SATA port 4 (ad8) and then added another SATA port 3 (ad6), the system has not correctly enumerate the ada # for the boot device. 
>> As original boot (without the second SATA drive), the ad8 is enumerated as ada0 -- the boot drive: >> >> Sep 24 22:51:30 R10-B2 kernel: ada0 at ahcich2 bus 0 scbus2 target 0 lun 0 >> Sep 24 22:51:30 R10-B2 kernel: ada0: ATA-8 SATA 2.x device >> ... >> Sep 24 22:51:30 R10-B2 kernel: ada0: Previously was known as ad8 >> >> >> However, after added another SATA drive (ad6), this new drive is assigned to ada0, but ad8 has changed to ada1. This is incorrect dynamic device assignment. FreeBSD has kept using fixed disk ID assignment due to the same problem introduced in around 4-R (or may be slightly later), and after a simple debate, a decision was made to use fixed drive ID to avoid such hassle. >> >> If now we want to use dynamic enumeration for drive ID# assignment, this has to be done correctly -- boot drive MUST assigned to 0 or whatever the # as installation assigned to; otherwise, adding a new drive will cause system not bootable, or make other existing drive not mountable due to enumeration # changes. >> >> Has this been reported as a known problem for 10-R, or shall I open a bug to track? >> >> -Jin >> > > > > > One point should be checked : > > > On mainboards SATA ports are numbered from 0 or 1 to upward . > > BIOS always uses first SATA drive for boot . This is NOT related to the operating system . > > Therefore , it is necessary to check port numbers of existing drives and the bootable SATA drive should be connected > > to the smallest numbered SATA port among existent drives . > > > > For example , assume bootable drive is connected to SATA port 2 . > > New drive should be connected to a higher numbered SATA port . > > If there are only two SATA ports , then bootable drive should be connected to the first SATA port . > > > If mainboard BIOS allows definition of any SATA port for boot , and bootable SATA port and drive is specified in there , again it may boot from that drive . 
Up to now , I did not see any BIOS which supplies such an ordering among SATA ports . Please check your BIOS for such a feature . If it is present you may use it , otherwise it is necessary to reconnect SATA cables . > > > > Thank you very much . > > > Mehmet Erol Sanliturk > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > The Correct solution is probably to use the diskid (/dev/diskid/) or a label (gpt label, ufs label, glabel) so the device name doesn't matter so much. My understanding is that the new 'ada' device names, the devices are named linearly. -- Allan Jude From owner-freebsd-hackers@FreeBSD.ORG Mon Sep 29 15:27:55 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 10684EB8; Mon, 29 Sep 2014 15:27:55 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DE899AA3; Mon, 29 Sep 2014 15:27:54 +0000 (UTC) Received: from ralph.baldwin.cx (pool-173-70-85-31.nwrknj.fios.verizon.net [173.70.85.31]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 86F91B921; Mon, 29 Sep 2014 11:27:53 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org Subject: Re: Change uma_mtx to rwlock Date: Mon, 29 Sep 2014 11:27:16 -0400 Message-ID: <1458140.gGPpU3NGiG@ralph.baldwin.cx> User-Agent: KMail/4.12.5 (FreeBSD/10.1-BETA2; KDE/4.12.5; amd64; ; ) In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender 
succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 29 Sep 2014 11:27:53 -0400 (EDT)
Cc: jeff@freebsd.org, Bryan Venteicher
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 29 Sep 2014 15:27:55 -0000

On Saturday, September 27, 2014 07:59:47 PM Bryan Venteicher wrote:
> Hi,
>
> I'd appreciate some comments attached patch that changes the uma_mtx to a
> rwlock.
>
> At $JOB, we have machines with ~400GB RAM, with much of that being
> allocated through UMA zones. We've observed that timeouts were sometimes
> unexpectedly delayed by a half second or more. We tracked one of the
> reasons for this down to when the page daemon was running, calling
> uma_reclaim() -> zone_foreach(). zone_foreach() holds the uma_mtx while
> zone_drain()'ing each zone. If uma_timeout() fires, it will block on the
> uma_mtx when it tries to zone_timeout() each zone.

The only nit I see is in zone_drain_wait(). It would be nice to not need the
hack of checking for a read or write lock and just require the one it actually
needs depending on the callers.

However, checking the code in HEAD, this appears to just be broken.
Specifically, zone_drain_wait() is called in two places:

void
zone_drain(uma_zone_t zone)
{

	zone_drain_wait(zone, M_NOWAIT);
}

...

static void
zone_dtor(void *arg, int size, void *udata)
{
...
	mtx_lock(&uma_mtx);
	LIST_REMOVE(zone, uz_link);
	mtx_unlock(&uma_mtx);
	/*
	 * XXX there are some races here where
	 * the zone can be drained but zone lock
	 * released and then refilled before we
	 * remove it... we dont care for now
	 */
	zone_drain_wait(zone, M_WAITOK);
...
}

Neither one calls it with the uma_mtx locked! This appears to have been
broken since that function was introduced in r187681.
I think it might be best to first remove the unlock/lock of uma_mtx from zone_drain_wait() (so it can be MFC'd). That then simplifies that one part of your patch (which I think is otherwise fine). -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Mon Sep 29 16:02:46 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BE03BD8E for ; Mon, 29 Sep 2014 16:02:46 +0000 (UTC) Received: from mail-ie0-f175.google.com (mail-ie0-f175.google.com [209.85.223.175]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 89DE9E85 for ; Mon, 29 Sep 2014 16:02:46 +0000 (UTC) Received: by mail-ie0-f175.google.com with SMTP id y20so5245960ier.6 for ; Mon, 29 Sep 2014 09:02:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-type; bh=yvjtphGDvZfVVOsmICWNIaPe3hXvF3kMutmC97jup6A=; b=dOgTcHDJVhWh+Gm459tDUhA93CixaMuISHaveaTcPilCQFZn1dL0vCtCAGrPg6yz3c sqHYdZKTYM0mdqbf0w1PrJ5HPB+vgJytzsFVxNDRRWwALkyohZ+RQhlB/MihDqTpiOGm HakS0B/cUckVVouE8UDFfITYzKfYjq+oCF+H6LD6ngcM2L9/FRXRfhksG8C3bP63P9Qf pGV6IHG8g6uCEx1ocp0d06VlW7sKXtK/gPMjInux4QjdSmNwjIuVjt9x7vpFdOKja6QZ qCdX834H1sCXQe+n/ucfaryyck9vjdbKT8A3+CThvPI3wz6H9c3Pf7hyE+Cn/K64El5f z5wA== X-Gm-Message-State: ALoCoQlho7pTdnd49PbMcKmG0cA5hrBvx108aToyl2cjvBtGTjwyVlSw+nTAtBfmmGrxdEw6euMU X-Received: by 10.50.109.228 with SMTP id hv4mr36301701igb.13.1412006560276; Mon, 29 Sep 2014 09:02:40 -0700 (PDT) MIME-Version: 1.0 Received: by 10.107.9.67 with HTTP; Mon, 29 Sep 2014 09:02:20 -0700 (PDT) X-Originating-IP: [216.240.30.23] In-Reply-To: 
<1458140.gGPpU3NGiG@ralph.baldwin.cx>
References: <1458140.gGPpU3NGiG@ralph.baldwin.cx>
From: Bryan Venteicher
Date: Mon, 29 Sep 2014 11:02:20 -0500
Message-ID:
Subject: Re: Change uma_mtx to rwlock
To: John Baldwin
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: "freebsd-hackers@freebsd.org" , jeff@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 29 Sep 2014 16:02:46 -0000

On Mon, Sep 29, 2014 at 10:27 AM, John Baldwin wrote:

> On Saturday, September 27, 2014 07:59:47 PM Bryan Venteicher wrote:
> > Hi,
> >
> > I'd appreciate some comments attached patch that changes the uma_mtx to a
> > rwlock.
> >
> > At $JOB, we have machines with ~400GB RAM, with much of that being
> > allocated through UMA zones. We've observed that timeouts were sometimes
> > unexpectedly delayed by a half second or more. We tracked one of the
> > reasons for this down to when the page daemon was running, calling
> > uma_reclaim() -> zone_foreach(). zone_foreach() holds the uma_mtx while
> > zone_drain()'ing each zone. If uma_timeout() fires, it will block on the
> > uma_mtx when it tries to zone_timeout() each zone.
>
> The only nit I see is in zone_drain_wait(). It would be nice to not need the
> hack of checking for a read or write lock and just require the one it actually
> needs depending on the callers.
> However, checking the code in HEAD, this appears to just be broken.
> Specifically, zone_drain_wait() is called in two places:
>
> void
> zone_drain(uma_zone_t zone)
> {
>
> 	zone_drain_wait(zone, M_NOWAIT);
> }
>
> ...
>
>
> static void
> zone_dtor(void *arg, int size, void *udata)
> {
> ...
> 	mtx_lock(&uma_mtx);
> 	LIST_REMOVE(zone, uz_link);
> 	mtx_unlock(&uma_mtx);
> 	/*
> 	 * XXX there are some races here where
> 	 * the zone can be drained but zone lock
> 	 * released and then refilled before we
> 	 * remove it... we dont care for now
> 	 */
> 	zone_drain_wait(zone, M_WAITOK);
> ...
> }
>
> Neither one calls it with the uma_mtx locked! This appears to have been
> broken since that function was introduced in r187681.
>
>

Indeed. I had noticed and mentioned that when I sent this patch to jeff@ a few months ago:

    When zone_dtor() calls zone_drain_wait(), should it hold the uma_{mtx, rwlock}? Can the zone not be in the DRAINING state at this point? Similarly, does the while draining loop in zone_drain_wait() then take the uma_mtx and the zone lock out of order after the msleep()?

But I was just trying to clear out my queue a bit, and hadn't looked at the HEAD UMA in a while, so I was going to double check that later.

> I think it might be best to first remove the unlock/lock of uma_mtx from
> zone_drain_wait() (so it can be MFC'd). That then simplifies that one part of
> your patch (which I think is otherwise fine).
>

I'll try to get a review started in Phabricator soon.
> -- > John Baldwin > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org= " > From owner-freebsd-hackers@FreeBSD.ORG Mon Sep 29 21:53:48 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D20EDEFC; Mon, 29 Sep 2014 21:53:48 +0000 (UTC) Received: from dmz-mailsec-scanner-4.mit.edu (dmz-mailsec-scanner-4.mit.edu [18.9.25.15]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4797AD19; Mon, 29 Sep 2014 21:53:47 +0000 (UTC) X-AuditID: 1209190f-f79aa6d000005b45-d8-5429d3b72ffe Received: from mailhub-auth-3.mit.edu ( [18.9.21.43]) (using TLS with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by dmz-mailsec-scanner-4.mit.edu (Symantec Messaging Gateway) with SMTP id B4.C9.23365.7B3D9245; Mon, 29 Sep 2014 17:48:39 -0400 (EDT) Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id s8TLmcxG002935; Mon, 29 Sep 2014 17:48:39 -0400 Received: from multics.mit.edu (system-low-sipb.mit.edu [18.187.2.37]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id s8TLmaVo031029 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Mon, 29 Sep 2014 17:48:38 -0400 Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id s8TLmarR012072; Mon, 29 Sep 2014 17:48:36 -0400 (EDT) Date: Mon, 29 Sep 2014 17:48:36 -0400 (EDT) From: Benjamin Kaduk X-X-Sender: kaduk@multics.mit.edu To: FreeBSD Current , "freebsd-hackers@freebsd.org" 
Subject: Re: Call for FreeBSD 2014Q3 (July-September) Status Reports In-Reply-To: Message-ID: References: User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrJIsWRmVeSWpSXmKPExsUixCmqrbv9smaIwYcL6ha7rp1mt5jz5gOT xfbN/xgdmD1mfJrPEsAYxWWTkpqTWZZapG+XwJWxteUZc8EMnorn21+wNTD+5Oxi5OSQEDCR ePz5GDuELSZx4d56ti5GLg4hgdlMEh82HGWHcDYySuy9socZpEpI4BCTRO8nZYhEA6PEuo09 jCAJFgFtiaUXP4KNYhNQk3i8t5kVYqyixOZTk8CaRQTKJb42ngGrFxZwkbj2vZcFxOYUCJRY t2MqWJxXwFFibkcnG8SyAImnUyeCzREV0JFYvX8KC0SNoMTJmU/AbGYBLYnl07exTGAUnIUk NQtJagEj0ypG2ZTcKt3cxMyc4tRk3eLkxLy81CJdE73czBK91JTSTYzgUJXk38H47aDSIUYB DkYlHl6OFRohQqyJZcWVuYcYJTmYlER5353QDBHiS8pPqcxILM6ILyrNSS0+xCjBwawkwiu3 AyjHm5JYWZValA+TkuZgURLn3fSDL0RIID2xJDU7NbUgtQgmK8PBoSTB++ESUKNgUWp6akVa Zk4JQpqJgxNkOA/Q8MUgNbzFBYm5xZnpEPlTjLoc6zq/9TMJseTl56VKifM+uwhUJABSlFGa BzcHlmJeMYoDvSXMuxtkFA8wPcFNegW0hAloSdoGdZAlJYkIKakGRv6J2dfuyW3sUJXx1Et+ GtN3ZDrTL7OTh7dE3DCdOmVRv8KJEwcmrtfy3MOffdD+y36eaXsETm9oE74Y7Oh0wL4+JF36 9ja5wmUHo/fMjr4iNV9ki/n9yhdKM7qyAtedMJL9FhX3wO+TSGbxTatAsYKpG1R7TtzgneGf sc7f/tHfWNHNJ/wZNyuxFGckGmoxFxUnAgDGX/ZLDAMAAA== X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Sep 2014 21:53:49 -0000 Reminder: the deadline for 2014Q3 status reports is just over a week away! Thanks, Ben (on behalf of monthly@) On Tue, 2 Sep 2014, Ed Maste wrote: > Dear FreeBSD Community, > > The deadline for the next FreeBSD Quarterly Status update is October 7, > for work done in July through September. > > Status report submissions do not have to be very long. They may be > about anything happening in the FreeBSD project and community, and > provide a great way to inform FreeBSD users and developers about what > you're working on. 
Submission of reports is not restricted to
> committers. Anyone doing anything interesting and FreeBSD-related
> can -- and should -- write one!
>
> The preferred and easiest submission method is to use the XML
> generator [1] with the results emailed to the status report team at
> monthly@freebsd.org . There is also an XML template [2] which can be
> filled out manually and attached if preferred. For the expected
> content and style, please study our guidelines on how to write a good
> status report [3]. You can also review previous issues [4][5] for
> ideas on the style and format.
>
> We are looking forward to all of your 2014Q3 reports!
>
> Thanks,
> Ed (on behalf of monthly@)
>
>
> [1] http://www.freebsd.org/cgi/monthly.cgi
> [2] http://www.freebsd.org/news/status/report-sample.xml
> [3] http://www.freebsd.org/news/status/howto.html
> [4] http://www.freebsd.org/news/status/report-2014-01-2014-03.html
> [5] http://www.freebsd.org/news/status/report-2014-04-2014-06.html

From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 30 08:44:13 2014
Return-Path:
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E8618385 for ; Tue, 30 Sep 2014 08:44:13 +0000 (UTC)
Received: from mail-ob0-x232.google.com (mail-ob0-x232.google.com [IPv6:2607:f8b0:4003:c01::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B89982EA for ; Tue, 30 Sep 2014 08:44:13 +0000 (UTC)
Received: by mail-ob0-f178.google.com with SMTP id uy5so3600392obc.23 for ; Tue, 30 Sep 2014 01:44:13 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=3lP4QPGe4MFYD2frX4yzVH2LxS2eFwY+oJYM7XacmWY=;
b=ecvx/AR+jwJz2H6WxJa3vgTF/MKGvKa5Y8aGoTMR2dmWmMRabHuyUNkAOijhhAuGYi 8S/QipAkEAwzZ46gtzx+xyrdFR41SqiAiizJGiDh+b9QDqLYQ7UwHvF4Me+bFC2vRfO8 bRuOlmRl5JDrZBLhmDGkCy8PvB4K1WqSwI0yyg9zj73Ezi1nIbXAKaqspYm0XNZpBsmh E/vFEU/Pd3+ZVRwWr0ffL799NnM2e4hxUmNmPOjz2vj3zveQOS/lPEGu3mc+rlz6XHUs WRvciChDXTVa+brcLlZLvJU/GGE0SyfEHvqpZF9IGUP18vbQsnnEfZ0KN2VLflqfYYVf /6dA==
MIME-Version: 1.0
X-Received: by 10.60.133.228 with SMTP id pf4mr16917649oeb.38.1412066652951; Tue, 30 Sep 2014 01:44:12 -0700 (PDT)
Received: by 10.76.167.65 with HTTP; Tue, 30 Sep 2014 01:44:12 -0700 (PDT)
Date: Tue, 30 Sep 2014 01:44:12 -0700
Message-ID:
Subject: Textdump capture not generating "ddb.txt" when scripted via ddb utility
From: Shrikanth Kamath
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 30 Sep 2014 08:44:14 -0000

I am trying to experiment with text dumps, and using the ddb utility to script the necessary capture information when a panic is triggered. The problem I am seeing is that ddb.txt is not getting generated as the ddb capture is not set on when invoked via ddb utility.

I am doing the following

% /sbin/ddb script kdb.enter.panic="textdump set; capture on; show pcpu; bt; ps; alltrace; capture off; reset"

% sysctl debug.ddb.textdump.pending=1
debug.ddb.textdump.pending: 0 -> 1

I drop to the debugger and trigger a panic, which promptly generates the text dump but is creating only the following text files

% tar -xvf textdump.tar.1
x msgbuf.txt
x panic.txt
x version.txt

The ddb.txt is not generated. But if I drop to the debugger and do the following after doing the above scripting,

db> capture on
db> show allpcpu
db> capture off

I am able to see the ddb.txt after triggering panic.
Question is why is /sbin/ddb script not effecting "capture on" when done from command line? Am I missing any steps. Here are my settings %sysctl -a | grep ddb debug.ddb.capture.data: debug.ddb.capture.bufsize: 49152 debug.ddb.capture.inprogress: 0 debug.ddb.capture.maxbufsize: 5242880 debug.ddb.capture.bufoff: 20523 debug.ddb.scripting.unscript: debug.ddb.scripting.scripts: kdb.enter.panic=textdump set; capture on; show pcpu; bt; ps; alltrace; capture off; reset debug.ddb.textdump.do_version: 1 debug.ddb.textdump.do_panic: 1 debug.ddb.textdump.do_msgbuf: 1 debug.ddb.textdump.do_ddb: 1 debug.ddb.textdump.pending: 1 debug.ddb_use_printf: 0 debug.kdb.current: ddb debug.kdb.available: ddb gdb ndb This is in a FreeBSD 10 environment. -- Shrikanth R K From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 30 08:46:00 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5C8EB525 for ; Tue, 30 Sep 2014 08:46:00 +0000 (UTC) Received: from eu1sys200aog122.obsmtp.com (eu1sys200aog122.obsmtp.com [207.126.144.153]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AD90A313 for ; Tue, 30 Sep 2014 08:45:59 +0000 (UTC) Received: from mail-wg0-f45.google.com ([74.125.82.45]) (using TLSv1) by eu1sys200aob122.postini.com ([207.126.147.11]) with SMTP ID DSNKVCptrOGz+YiV7QLEYC64AlgitTLiBo3F@postini.com; Tue, 30 Sep 2014 08:45:59 UTC Received: by mail-wg0-f45.google.com with SMTP id m15so1894770wgh.4 for ; Tue, 30 Sep 2014 01:45:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:message-id:to:subject:reply-to; bh=aIAgGXCBOfQPwcoIfRIcXOlf7W9LSb9IHmKVtMTqgb8=; 
b=XqRIp5AvZZzeljaL89PaA6uU2nVtgudIWu/XNFHKZp0i7BINrlhg36Bv859t4N9rNR AyLwom8Ss6ZrjT1RwpJyzDpzMk0l+AhW69gJ+/kxWqOIA3CT7SEzAqm4FcKv3RdHxzjZ ycoiIqgtrt8UsGBtIIQ1MaJwfUdPIFCA/vEB+OPmyWDIr/+X48SsN1oMR7Hi9RTUmL8Y f1bPt8XvGJTdTiz+Vr2BJXILRVujov02HKspMpyh9OHdRy4HNxJr08OqrZx4rqYMu3Ne hTlrM71RKDGQEl/x88TfZCq87Y6vEBG/RHb9xvfmjIMHUVpWilPtG44yEA2IljNvRJHI 2fgw== X-Gm-Message-State: ALoCoQlaL6/zoXYMf13xL5CFLYSNYsxNR9w0jawPXJRW9Cw6w9KQFpS71KAEavtr3iUSU2oDriNjsXFtT97LYlRoumJI/yYBf4aJaG/2WVt70FIXl/SGtbYHCH1TVloJi6e+RFXmAQTdD9G1NsnvzoZVh4Tn62ZZ9g== X-Received: by 10.180.97.98 with SMTP id dz2mr4058270wib.26.1412066732621; Tue, 30 Sep 2014 01:45:32 -0700 (PDT) X-Received: by 10.180.97.98 with SMTP id dz2mr4058253wib.26.1412066732511; Tue, 30 Sep 2014 01:45:32 -0700 (PDT) Received: from mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk. [137.222.187.221]) by mx.google.com with ESMTPSA id t1sm14384135wiy.8.2014.09.30.01.45.31 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 Sep 2014 01:45:32 -0700 (PDT) Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1]) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s8U8jUYU079242 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Tue, 30 Sep 2014 09:45:30 +0100 (BST) (envelope-from mexas@mech-as221.men.bris.ac.uk) Received: (from mexas@localhost) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s8U8jUTa079241 for freebsd-hackers@freebsd.org; Tue, 30 Sep 2014 09:45:30 +0100 (BST) (envelope-from mexas) Date: Tue, 30 Sep 2014 09:45:30 +0100 (BST) From: Anton Shterenlikht Message-Id: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk> To: freebsd-hackers@freebsd.org Subject: cluster FS? 
Reply-To: mexas@bristol.ac.uk X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Sep 2014 08:46:00 -0000

Hello

Not sure if this is the right list...

I wanted to ask about a cluster file system. Is there something like this on FreeBSD? It seems to me (just from reading the handbook) that none of NFS, HAST or iSCSI provide this.

My specific needs are as follows. I have multiple nodes and a disk array. Each node is connected by fibre to the disk array. I want each node to have read/write access to all disks on the disk array, so that if any node fails, the data is still accessible via the remaining nodes. I want to have all nodes equal, i.e. no master/slave or server/client model. Also, the disk array provides adequate RAID already, so that is not needed either.

In the archives I see that demand for cluster FS support on FreeBSD has been expressed periodically over a very long time, but it seems there's never been any resolution. Some people mention GFS, but I've no idea if this is what I'm trying to describe.

So is what I'm describing a cluster FS at all? Is there something like this on FreeBSD already? Is there something in ports that can be used to achieve this?

Thanks Anton From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 30 11:19:26 2014 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CB544D16; Tue, 30 Sep 2014 11:19:26 +0000 (UTC) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id AB3B195B; Tue, 30 Sep 2014 11:19:25 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA21479; Tue, 30 Sep 2014 14:19:23 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1XYvSl-000G91-7i; Tue, 30 Sep 2014 14:19:23 +0300 Message-ID: <542A916A.2060703@FreeBSD.org> Date: Tue, 30 Sep 2014 14:18:02 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2 MIME-Version: 1.0 To: freebsd-hackers@FreeBSD.org Subject: uk_slabsize, uk_ppera, uk_ipers, uk_pages Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 8bit Cc: Gleb Smirnoff X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Sep 2014 11:19:27 -0000 I have a hard time understanding how to use uk_slabsize, uk_ppera, uk_ipers, uk_pages to derive other useful characteristics of UMA kegs. This is despite the good descriptions of the fields and multiple examples of their usage in the code. Unfortunately, I find those examples to be at least inconsistent and possibly contradictory. First problem is quite obvious. uk_slabsize has a too narrow type. 
For example, ZFS creates many zones with item sizes larger than 64KB. So, obviously, uk_slabsize overflows. I am not sure how that affects further calculations, if at all, but probably not in a good way.
On the other hand, there is probably no harm at all, because as far as I can see uk_slabsize is actually used only within keg_small_init(). It is set but not used in keg_large_init() and keg_cachespread_init(). It does not seem to be used after initialization. So, maybe this field could just be converted to a local variable in keg_small_init()?

Now a second problem. Even the names uk_ppera (pages per allocation) and uk_ipers (items per slab) leave some room for ambiguity. What is the relation between the allocation and the slab? It seems that some code assumes that the slab takes the whole allocation (that is, one slab per allocation), other code places multiple slabs into a single allocation, while other code looks inconsistent in this respect.

For instance:

static void
keg_drain(uma_keg_t keg)
{
...
	LIST_REMOVE(slab, us_link);
	keg->uk_pages -= keg->uk_ppera;
	keg->uk_free -= keg->uk_ipers;

A slab is freed. There is no question about uk_free. But it is clear that the code assumes the slab takes a whole allocation. keg_alloc_slab() is symmetric with these stats.

int
uma_zone_set_max(uma_zone_t zone, int nitems)
{
	uma_keg_t keg;

	keg = zone_first_keg(zone);
	if (keg == NULL)
		return (0);
	KEG_LOCK(keg);
	keg->uk_maxpages = (nitems / keg->uk_ipers) * keg->uk_ppera;
	if (keg->uk_maxpages * keg->uk_ipers < nitems)
		keg->uk_maxpages += keg->uk_ppera;
	nitems = keg->uk_maxpages * keg->uk_ipers;
	KEG_UNLOCK(keg);

	return (nitems);
}

The uk_maxpages calculation seems to assume that the allocation and the slab are the same. We first calculate the number of slabs needed to hold nitems and then multiply that number by the number of pages per allocation. But when we calculate the nitems value to be returned we simply multiply uk_maxpages by uk_ipers, as if we assume that the slab size is 1 page regardless of uk_ppera.
uma_zone_get_max() calculates nitems in the same way, without taking uk_ppera into account.

uma_print_keg():
out is calculated as
	(keg->uk_ipers * keg->uk_pages) - keg->uk_free
while limit is calculated as:
	(keg->uk_maxpages / keg->uk_ppera) * keg->uk_ipers
In one case we simply multiply the number of pages by uk_ipers, but in the other case we first divide by uk_ppera.

My personal opinion is that we should settle on the rule that the slab and the allocation map 1:1 and fix the code that does not conform to that. It seems that this is how the code that allocates and frees slabs actually works. I do not see any good reason to support multiple slabs per allocation.

P.S.
By the way, we have some wonderful things in UMA code that are not used anymore (if ever?) and are scarcely documented. Perhaps some of those could be removed to simplify the code:
- UMA_ZONE_CACHESPREAD
- uma_zsecond_add()

More generally, it looks like the support for multiple zones using the same keg is quite useful. On the other hand, the support for a zone using multiple kegs is of questionable utility, and at present that capability is not used.

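The inconsistency described above can be illustrated with a minimal standalone sketch. This is not the kernel code: the struct keg_counts and both function names are hypothetical, only the meanings of the three fields are taken from the discussion, and one slab per allocation is assumed.

```c
#include <assert.h>

/* Hypothetical stand-in for the relevant uma_keg fields; not the kernel struct. */
struct keg_counts {
	unsigned int pages;	/* uk_pages: total pages held by the keg */
	unsigned int ppera;	/* uk_ppera: pages per allocation */
	unsigned int ipers;	/* uk_ipers: items per slab */
};

/* Multiplies pages by items per slab, as the "out" calculation does. */
unsigned int
items_by_pages(const struct keg_counts *k)
{
	return (k->pages * k->ipers);
}

/* Converts pages to slabs first, as the "limit" calculation does. */
unsigned int
items_by_slabs(const struct keg_counts *k)
{
	assert(k->ppera > 0);	/* guard the division */
	return ((k->pages / k->ppera) * k->ipers);
}
```

The two functions agree when ppera is 1 and differ by a factor of ppera otherwise. For pages = 4, ppera = 2, ipers = 1, the first returns 4 and the second returns 2.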
-- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 30 11:23:26 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 426FEE72 for ; Tue, 30 Sep 2014 11:23:26 +0000 (UTC) Received: from puchar.net (puchar.net [188.252.31.250]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "puchar.net", Issuer "puchar.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id C3DD9A15 for ; Tue, 30 Sep 2014 11:23:25 +0000 (UTC) Received: Received: from 127.0.0.1 (localhost [127.0.0.1]) by puchar.net (8.14.9/8.14.9) with ESMTP id s8UB4XBN001014; Tue, 30 Sep 2014 13:04:35 +0200 (CEST) (envelope-from wojtek@puchar.net) Date: Tue, 30 Sep 2014 13:04:34 +0200 (CEST) From: Wojciech Puchar X-X-Sender: wojtek@laptop To: mexas@bristol.ac.uk Subject: Re: cluster FS? In-Reply-To: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk> Message-ID: References: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (puchar.net [10.0.1.1]); Tue, 30 Sep 2014 13:04:35 +0200 (CEST) Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Sep 2014 11:23:26 -0000 > > It seems to me (just from reading the handbook) > that none of NFS, HAST or iSCSI provide this. none of following are filesystems at all. NFS is remote access to filesystem, the rest presents raw block device. > My specific needs are as follows. 
> I have multiple nodes and a disk array.
> Each node is connected by fibre to the disk array.
> I want to have each node read/write access
> to all disks on disk array.
> So that if any node fails, the
> data is still accessible
> via the remaining nodes.

As the disk array presents block devices, not files, it is not possible to have read/write filesystem access from more than one computer to the same block device. AFAIK there are no filesystems that can communicate between nodes to synchronize state after writes and prevent conflicts.

> I want to have all nodes equal, i.e. no master/slave
> or server/client model. Also, the disk array
> provides adequate RAID already, so that is not

Instead of using disk arrays (expensive), it's better to run FreeBSD as a file server with a good number of disks and good connectivity, and export filesystems using e.g. NFS. You may use any RAID type and any filesystem, not only more cheaply but with extra security: the on-disk format is known and open, and you may access these disks with any other computer running FreeBSD.

From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 30 11:44:27 2014 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CAB6841E; Tue, 30 Sep 2014 11:44:27 +0000 (UTC) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "cell.glebius.int.ru", Issuer "cell.glebius.int.ru" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 47E2FC9E; Tue, 30 Sep 2014 11:44:26 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.9/8.14.9) with ESMTP id s8UBiOOe074177 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 30 Sep 2014 15:44:24 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.9/8.14.9/Submit) id s8UBiOfH074176; Tue, 30 Sep 2014 15:44:24 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Tue, 30 Sep 2014 15:44:24 +0400 From: Gleb Smirnoff To: Andriy Gapon Subject: Re: uk_slabsize, uk_ppera, uk_ipers, uk_pages Message-ID: <20140930114424.GD73266@glebius.int.ru> References: <542A916A.2060703@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <542A916A.2060703@FreeBSD.org> User-Agent: Mutt/1.5.23 (2014-03-12) Cc: freebsd-hackers@FreeBSD.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Sep 2014 11:44:28 -0000 Andriy, On Tue, Sep 30, 2014 at 02:18:02PM +0300, Andriy Gapon wrote: A> I have a 
hard time understanding how to use uk_slabsize, uk_ppera, uk_ipers, A> uk_pages to derive other useful characteristics of UMA kegs. This is despite A> the good descriptions of the fields and multiple examples of their usage in the A> code. Unfortunately, I find those examples to be at least inconsistent and A> possibly contradictory. A> A> First problem is quite obvious. uk_slabsize has a too narrow type. For A> example, ZFS creates many zones with item sizes larger than 64KB. So, A> obviously, uk_slabsize overflows. Not sure how that affects further A> calculation, if any, but probably not in a good way. A> On the other hand, there is probably no harm at all, because as far as I can see A> uk_slabsize is actually used only within keg_small_init(). It is set but not A> used in keg_large_init() and keg_cachespread_init(). It does not seem to be A> used after initialization. So, maybe this field could be just converted to a A> local variable in keg_small_init() ? Nice observation. I bet, that when I developed UMA_ZONE_PCPU this field was used outside of keg_small_init(). It looks like now, uk_ipers and uk_pages are enough to know outside of keg_small_init(). A> Now a second problem. Even the names uk_ppera (pages per allocation) and A> uk_ipers (items per slab) leave some room for ambiguity. What is a relation A> between the allocation and the slab? It seems that some code assumes that the A> slab takes the whole allocation (that is, one slab per allocation), other code A> places multiple slabs into a single allocation, while other code looks A> inconsistent in this respect. A> My personal opinion is that we should settle on the rule that the slab and the A> allocation map 1:1 and fix the code that does not conform to that. A> It seems that this is how the code that allocates and frees slabs actually A> works. I do not see any good reason to support multiple slabs per an allocation. In case of UMA_ZONE_PCPU, the slab is ncpu times smaller than the allocation. 
BUT, whenever you do uma_zalloc() you allocate not a single item, but ncpu items at a time. That's why all statistics that you quoted work correctly. And that's why we do not have 1:1 mapping of slab and allocation and we need to have uk_ppera, and uk_ipers. A> P.S. A> By the way, we have some wonderful things in UMA code that are not used anymore A> (if ever?) and are scarcely documented. Perhaps some of those could be removed A> to simplify the code: A> - UMA_ZONE_CACHESPREAD A> - uma_zsecond_add() A> More generally it looks like the support for multiple zones using the same keg A> is quite useful. On the other hand the support for a zone using multiple kegs A> is of questionable utility and at present that capability is not used. I second on that. -- Totus tuus, Glebius. From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 30 12:16:42 2014 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 758A3166; Tue, 30 Sep 2014 12:16:42 +0000 (UTC) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 7EB25FF7; Tue, 30 Sep 2014 12:16:41 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA22062; Tue, 30 Sep 2014 15:16:40 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1XYwMB-000GCH-N7; Tue, 30 Sep 2014 15:16:39 +0300 Message-ID: <542A9EF0.3050405@FreeBSD.org> Date: Tue, 30 Sep 2014 15:15:44 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2 MIME-Version: 1.0 To: Gleb Smirnoff Subject: Re: uk_slabsize, uk_ppera, uk_ipers, uk_pages References: 
<542A916A.2060703@FreeBSD.org> <20140930114424.GD73266@glebius.int.ru> In-Reply-To: <20140930114424.GD73266@glebius.int.ru> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Cc: freebsd-hackers@FreeBSD.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Sep 2014 12:16:42 -0000

On 30/09/2014 14:44, Gleb Smirnoff wrote:
> Andriy,
>
> On Tue, Sep 30, 2014 at 02:18:02PM +0300, Andriy Gapon wrote:
> A> Now a second problem. Even the names uk_ppera (pages per allocation) and
> A> uk_ipers (items per slab) leave some room for ambiguity. What is a relation
> A> between the allocation and the slab? It seems that some code assumes that the
> A> slab takes the whole allocation (that is, one slab per allocation), other code
> A> places multiple slabs into a single allocation, while other code looks
> A> inconsistent in this respect.
>
> A> My personal opinion is that we should settle on the rule that the slab and the
> A> allocation map 1:1 and fix the code that does not conform to that.
> A> It seems that this is how the code that allocates and frees slabs actually
> A> works. I do not see any good reason to support multiple slabs per an allocation.
>
> In case of UMA_ZONE_PCPU, the slab is ncpu times smaller than the allocation.
>
> BUT, whenever you do uma_zalloc() you allocate not a single item, but ncpu items
> at a time. That's why all statistics that you quoted work correctly.

This is not true for kegs with multi-page slabs. Consider a zone with 8KB items on a system with 4KB pages. Its keg uses slabs with the size of two pages, so uk_ppera is 2. There is only one item per slab, so uk_ipers is 1. Let's say there are two slabs allocated. Then uk_pages is 4. So, uk_ipers * uk_pages would give 4, but in reality there are only two items.
The correct calculation must be (uk_pages / uk_ppera) * uk_ipers. If you have enough CPUs for a pcpu zone to use multi-page slabs / allocations, then the above will also be applicable. Consider "64 pcpu" and 8 CPUs. You have uk_ppera = 2, uk_ipers = 128. If there is only 1 "real" slab allocated that's 2 pages, so uk_pages * uk_ipers = 256, but in reality the correct number of provided items is (uk_pages / uk_ppera) * uk_ipers = 128. BTW, it's a pity that you omitted the code that demonstrated the problem from the quote. > And that's > why we do not have 1:1 mapping of slab and allocation and we need to have > uk_ppera and uk_ipers. We do have 1:1 mapping between allocations and "real" slabs. The imaginary "slabs" specific to pcpu zones do not affect how the keg code works. We do need both uk_ppera and uk_ipers, of course, because one allows to convert between a number of pages and a number of slabs and the other allows to convert between a number of items and the number of slabs. -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 30 12:27:02 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 491987E1 for ; Tue, 30 Sep 2014 12:27:02 +0000 (UTC) Received: from eu1sys200aog104.obsmtp.com (eu1sys200aog104.obsmtp.com [207.126.144.117]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 990B219F for ; Tue, 30 Sep 2014 12:27:00 +0000 (UTC) Received: from mail-we0-f177.google.com ([74.125.82.177]) (using TLSv1) by eu1sys200aob104.postini.com ([207.126.147.11]) with SMTP ID DSNKVCqhjVqti65x8Jti6vtMegtU9eqHEklH@postini.com; Tue, 30 Sep 2014 12:27:01 UTC Received: by mail-we0-f177.google.com with SMTP id k48so86720wev.36 for ; Tue, 30 Sep 2014 05:26:53 -0700 (PDT) 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:message-id:to:subject:cc:reply-to :in-reply-to; bh=QHVwLQcm8vaQtesTYRfUewrr+/GJ5jrqDq0OEJ6qZMM=; b=kS5kAzo0UV91FYnnn35yLVONN1hMoGJSA8mXb4REp+DU6+oVrIoz8PN35K8IN4bXiI 3vgxXAACP2ZI+7hUTMVDKWa8xgH2ejoaO4itRGgdF/DduY4GhJgeLGLXu77WD9JfrCUx AcCU6VLX/1FeF8Vsh++/uKOPKeFNnXPylCfqvpBBM+lhvPZBsoOWWkb4nyTalNUEYuVG q0bfLpTRsjsgqSg7/uF973QILNBxYqDjOwi0aqrtwUZ5zSwWtoFoImS/T7ew1dYiOI4z qhzl7dYeZcvh6DHVQZK12SWWaUK46KJAYE+OhS7xamye1nv4p7dratXlLpqKJdRGbTYn 83Cw== X-Received: by 10.180.76.100 with SMTP id j4mr5069383wiw.51.1412078276097; Tue, 30 Sep 2014 04:57:56 -0700 (PDT) X-Gm-Message-State: ALoCoQnn/j1SlF70/8wtiB1P30wHK7w074Ee7wgVu2oMjTBapgpsxccw0cXUdjvJj4Y7S7gxI/wQQ2fYUW3pUS+s/s71r4/9jUng9fWblgGypBYXy6BtbhagHON7GC4uZMjGwmyPmBmOS8yw68Jv/CPE3ult/86IOQ== X-Received: by 10.180.76.100 with SMTP id j4mr5069370wiw.51.1412078275989; Tue, 30 Sep 2014 04:57:55 -0700 (PDT) Received: from mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk. [137.222.187.221]) by mx.google.com with ESMTPSA id cy10sm18993813wjb.21.2014.09.30.04.57.54 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 Sep 2014 04:57:55 -0700 (PDT) Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1]) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s8UBvrLT079813 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 30 Sep 2014 12:57:54 +0100 (BST) (envelope-from mexas@mech-as221.men.bris.ac.uk) Received: (from mexas@localhost) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s8UBvr8f079812; Tue, 30 Sep 2014 12:57:53 +0100 (BST) (envelope-from mexas) Date: Tue, 30 Sep 2014 12:57:53 +0100 (BST) From: Anton Shterenlikht Message-Id: <201409301157.s8UBvr8f079812@mech-as221.men.bris.ac.uk> To: mexas@bristol.ac.uk, wojtek@puchar.net Subject: Re: cluster FS? 
Reply-To: mexas@bristol.ac.uk In-Reply-To: Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Sep 2014 12:27:02 -0000 >From wojtek@puchar.net Tue Sep 30 12:14:35 2014 > >as disk array presents block devices, not files it is not possible to have >filesystem read write access with more than one computer to the same block >device. >There is no AFAIK filesystems that can communicate between nodes to >synchronize state after writes and prevent conflict. The hardware is inherited from a VMS cluster, which did precisely that. I don't remember now what FS VMS used. I guess I'm trying to replicate a VMS cluster with FreeBSD means. >> I want to have all nodes equal, i.e. no master/slave >> or server/client model. Also, the disk array >> provides adequate RAID already, so that is not > >instead of using disk arrays (expensive) it's better to run FreeBSD as well.. I have a populated array already, so no extra costs are involved. >file server with good deal of disks and connectivity and export >filesystems using eg. NFS. but again, what if the NFS server dies? The data is no longer available. 
Thanks Anton From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 30 12:36:11 2014 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2276DBC3; Tue, 30 Sep 2014 12:36:11 +0000 (UTC) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "cell.glebius.int.ru", Issuer "cell.glebius.int.ru" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 94D382F5; Tue, 30 Sep 2014 12:36:09 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.9/8.14.9) with ESMTP id s8UCa7Kg074382 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 30 Sep 2014 16:36:07 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.9/8.14.9/Submit) id s8UCa7pW074381; Tue, 30 Sep 2014 16:36:07 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Tue, 30 Sep 2014 16:36:07 +0400 From: Gleb Smirnoff To: Andriy Gapon Subject: Re: uk_slabsize, uk_ppera, uk_ipers, uk_pages Message-ID: <20140930123607.GE73266@glebius.int.ru> References: <542A916A.2060703@FreeBSD.org> <20140930114424.GD73266@glebius.int.ru> <542A9EF0.3050405@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <542A9EF0.3050405@FreeBSD.org> User-Agent: Mutt/1.5.23 (2014-03-12) Cc: freebsd-hackers@FreeBSD.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Sep 2014 12:36:11 -0000 
On Tue, Sep 30, 2014 at 03:15:44PM +0300, Andriy Gapon wrote: A> This is not true for kegs with multi-page slabs. Consider a zone with 8KB items A> on a system 4KB pages. Its keg uses slabs with the size of two pages, uk_ppera A> is 2. There is only one item per slab, uk_ipers is 1. Let's say there are two A> slabs allocated. Then uk_pages is 4. So, uk_ipers * uk_pages would give 4, but A> in reality there are only two items. The correct calculation must be (uk_pages A> / uk_ppera) * uk_ipers. A> A> If you have enough CPUs for a pcpu zone to use multi-page slabs / allocations, A> then the above will also be applicable. Consider "64 pcpu" and 8 CPUs. You have A> uk_ppera = 2, uk_ipers = 128. If there is only 1 "real" slab allocated that's 2 A> pages, so uk_pages * uk_ipers = 256, but in reality the correct number of A> provided items is (uk_pages / uk_ppera) * uk_ipers = 128. You are right. -- Totus tuus, Glebius. From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 30 22:53:41 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6CF9D1A5 for ; Tue, 30 Sep 2014 22:53:41 +0000 (UTC) Received: from na01-bn1-obe.outbound.protection.outlook.com (mail-bn1bon0074.outbound.protection.outlook.com [157.56.111.74]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 1C8E9FAA for ; Tue, 30 Sep 2014 22:53:39 +0000 (UTC) Received: from DM2PR0801MB0944.namprd08.prod.outlook.com (25.160.131.27) by DM2PR0801MB0943.namprd08.prod.outlook.com (25.160.131.26) with Microsoft SMTP Server (TLS) id 15.0.1039.15; Tue, 30 Sep 2014 22:53:59 +0000 Received: from DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) by 
DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) with mapi id 15.00.1039.011; Tue, 30 Sep 2014 22:53:40 +0000 From: "Pokala, Ravi" To: "freebsd-hackers@freebsd.org" Subject: dumpsys/savecore on AF-4Kn drives? Thread-Topic: dumpsys/savecore on AF-4Kn drives? Thread-Index: AQHP3QFgvwLpNOmcHUaIK0Ete9ljBw== Date: Tue, 30 Sep 2014 22:53:40 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: Microsoft-MacOutlook/14.4.4.140807 x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [64.80.217.3] x-microsoft-antispam: BCL:0;PCL:0;RULEID:;SRVR:DM2PR0801MB0943; x-forefront-prvs: 0350D7A55D x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(6009001)(164054003)(199003)(189002)(2351001)(229853001)(107886001)(110136001)(54356999)(107046002)(4396001)(120916001)(76482002)(97736003)(99286002)(101416001)(87936001)(105586002)(64706001)(106116001)(2656002)(77096002)(31966008)(83506001)(66066001)(80022003)(106356001)(85306004)(95666004)(20776003)(21056001)(50986999)(85852003)(99396003)(92566001)(10300001)(36756003)(46102003)(86362001)(92726001); DIR:OUT; SFP:1101; SCL:1; SRVR:DM2PR0801MB0943; H:DM2PR0801MB0944.namprd08.prod.outlook.com; FPR:; MLV:sfv; PTR:InfoNoRecords; A:1; MX:1; LANG:en; Content-Type: text/plain; charset="us-ascii" Content-ID: <1C6A75B497BE8A49B67831132974C60C@namprd08.prod.outlook.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: panasas.com X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Sep 2014 22:53:41 -0000 Hi folks, Does anyone out there have AF-4Kn drives (both logical and physical sector size is 4KB)? Have you been able to drop a core to one, and successfully save the core on the way back up? 
I'm working on adding AF-4Kn support to an older version of FreeBSD (based on 7 - yeah, I know... :-P), using -CURRENT as a reference. Things look good at the GEOM level and higher; the GEOM utils report correct sizes, UFS runs fine, etc. If I manually break into the debugger and 'call doadump', it appears to work; at least, it does not report any errors. But when I reboot, `savecore' complains: error reading dump header at offset 0 in /dev/mirror/gm1: Invalid argument (Yes, it's dumping to a mirror; no, that's not the problem: the mirror is configured using the 'prefer' balancing algorithm, as described in gmirror(8), and we've been doing this without issue for years.) I'm trying to figure out if the problem is on the dumpsys side, the savecore side, or if they're both broken for AF-4Kn. In particular, 'struct kerneldumpheader' is 512 bytes, and it looks like most calls to dump_write() in the full-dump context (not minidumps) pass either the size of the structure, or an explicit 512, for the 'length' argument. That's the case in both the 7-ish version I'm porting to, and in -CURRENT. There's no AF-4Kn-aware bootstrap in the version we're using (emaste@ - does the new UEFI bootstrap in 10-STABLE work w/ AF-4Kn drives?), so one of the drives is 512n, and I could probably find some space on there to save the core to. But that device is small, and we have other uses for it, so I'd like to avoid reserving a large chunk of it. Any thoughts? 
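A minimal sketch of the alignment concern described above, not code from the FreeBSD tree: on a 4Kn provider both the offset and the length of a transfer generally have to be multiples of the 4096-byte sector size, which a 512-byte header transferred on its own is not. The helper names are hypothetical, and whether this is what actually makes savecore fail with "Invalid argument" here is an assumption, not something established in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when a transfer of `length` bytes
 * starting at byte `offset` is a whole number of sectors and starts on
 * a sector boundary.
 */
static inline bool
io_is_sector_aligned(uint64_t offset, size_t length, uint32_t sectorsize)
{
	if (sectorsize == 0)
		return (false);
	return (offset % sectorsize == 0 && length % sectorsize == 0);
}

/*
 * Hypothetical helper: rounds `length` up to the next multiple of the
 * sector size, e.g. to size a padded buffer that holds a 512-byte
 * header on a 4096-byte-sector device.
 */
static inline size_t
round_up_to_sector(size_t length, uint32_t sectorsize)
{
	return ((length + sectorsize - 1) / sectorsize * sectorsize);
}
```

With these, io_is_sector_aligned(0, 512, 4096) is false while io_is_sector_aligned(0, 512, 512) is true, and round_up_to_sector(512, 4096) gives 4096. That would point towards padding the header into a sector-sized buffer on the dump side and reading a whole sector on the savecore side.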
Thanks, Ravi From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 01:10:21 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2382AAF6 for ; Wed, 1 Oct 2014 01:10:21 +0000 (UTC) Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 074F1F1D for ; Wed, 1 Oct 2014 01:10:20 +0000 (UTC) Received: from [192.168.1.172] (pool-173-52-87-124.nycmny.fios.verizon.net [173.52.87.124]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: ryao) by smtp.gentoo.org (Postfix) with ESMTPSA id 3214334003C; Wed, 1 Oct 2014 01:10:09 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (1.0) Subject: Re: cluster FS? From: Richard Yao X-Mailer: iPad Mail (11D257) In-Reply-To: Date: Tue, 30 Sep 2014 21:10:07 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: References: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk> To: Wojciech Puchar Cc: "freebsd-hackers@freebsd.org" , "mexas@bristol.ac.uk" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 01:10:21 -0000 On Sep 30, 2014, at 7:04 AM, Wojciech Puchar wrote: >> >> It seems to me (just from reading the handbook) >> that none of NFS, HAST or iSCSI provide this. > > none of following are filesystems at all. NFS is remote access to filesystem, the rest presents raw block device. > >> My specific needs are as follows. >> I have multiple nodes and a disk array.
>> Each node is connected by fibre to the disk array. >> I want to have each node read/write access >> to all disks on disk array. >> So that if any node fails, the >> data is still accessible >> via the remaining nodes. > > as disk array presents block devices, not files it is not possible to have filesystem read write access with more than one computer to the same block device. > There is no AFAIK filesystems that can communicate between nodes to synchronize state after writes and prevent conflict. Linux tends to have most of the work in this area. In specific, Lustre, Ceph and Gluster. Gluster is FUSE-based and the server will run on FreeBSD: https://wiki.freebsd.org/GlusterFS The client likely can run on FreeBSD too, but it might be that no one has tested it because the FreeBSD support was done before FreeBSD supported FUSE. From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 01:14:30 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 36A97C03 for ; Wed, 1 Oct 2014 01:14:30 +0000 (UTC) Received: from mx1.scaleengine.net (beauharnois2.bhs1.scaleengine.net [142.4.218.15]) by mx1.freebsd.org (Postfix) with ESMTP id EBB8BFD0 for ; Wed, 1 Oct 2014 01:14:29 +0000 (UTC) Received: from [192.168.1.2] (Seawolf.HML3.ScaleEngine.net [209.51.186.28]) (Authenticated sender: allanjude.freebsd@scaleengine.com) by mx1.scaleengine.net (Postfix) with ESMTPSA id BB30F57F65 for ; Wed, 1 Oct 2014 01:14:28 +0000 (UTC) Message-ID: <542B557F.4050603@freebsd.org> Date: Tue, 30 Sep 2014 21:14:39 -0400 From: Allan Jude User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2 MIME-Version: 1.0 To: freebsd-hackers@freebsd.org Subject: Re: cluster FS?
References: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk> In-Reply-To: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="uwrg0P5ODEVL921K6DrL39Cl0LCCjugsE" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 01:14:30 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --uwrg0P5ODEVL921K6DrL39Cl0LCCjugsE Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 2014-09-30 04:45, Anton Shterenlikht wrote: > Hello > > Not sure if this is the right list... > I wanted to ask about a cluster file system. > Is there something like this on FreeBSD? > > It seems to me (just from reading the handbook) > that none of NFS, HAST or iSCSI provide this. > > My specific needs are as follows. > I have multiple nodes and a disk array. > Each node is connected by fibre to the disk array. > I want to have each node read/write access > to all disks on disk array. > So that if any node fails, the > data is still accessible > via the remaining nodes. > > I want to have all nodes equal, i.e. no master/slave > or server/client model. Also, the disk array > provides adequate RAID already, so that is not > needed either. > > In the archives I see that the demands for > a cluster FS support on FreeBSD have been expressed > periodically over a very long time, but seems > there's never been any resolution. > Some people mention GFS, but I've no idea > if this what I'm trying to describe. > > So is what I'm describing a cluster FS at all? > Is there something like this on FreeBSD already? > Is there someting in ports that can be used > to achive this?
> > Thanks > > Anton > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > What you are describing doesn't really seem to be a 'cluster' FS. In a cluster, the disks would reside in multiple machines, and the 'file system' would withstand any one of those machines going down. That is quite a bit different than just wanting a bunch of clients to have concurrent access to a single disk array. If you explain your use-case in more detail, we may be able to guide you in the right direction. -- Allan Jude --uwrg0P5ODEVL921K6DrL39Cl0LCCjugsE Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (MingW32) iQIcBAEBAgAGBQJUK1WCAAoJEJrBFpNRJZKfY6YP/1I225shSB9C0Vnkw9oNBLxy JfJ7nxyghsFcCseUe+N7ggYCMxr6DO+z4RTG4XuCJ2v7ntlcBkdO8LwnuUL+blkY noGpPzJHuAsX/iujTsNe2XPukuCK3guEFKyO1MMbG1WQnNbRCq3F5zPwOoVUYhHC urFZeoSLYnZWFL6deFJcrTxDVuXh0gc/O/d9lOIieWUVtgFDnBLLmo6LZ6t9qc0P EBOsM28dINchOgOoN2fLyhkvISN1ZwbIN6p8HNM2vYsWZg7toOlWjKr4KUMtyDvp KEpFpBMTOI058qxsNDqjgpboj5izhp7N2o7rtFp1ks2JR0ar1Y2iHzm8lTTuXpqQ Rp9xfhDt8D7yxy3zDI49+mMRCHWnqqg8GGaC5qCs0urd4SGto4ZzV6X91qdnrmSo /adc4PZtJYHECiSB14D0tMDiDJW/w40F/j1oXlA87OwbQtrbDgomq/Kj4JnPAN/3 8vxF3oFoQO5fMfgjmIV2MfUsbt7F9zYFoiZlG0Yyw5rqyNz00nKiTj1p4eFtpaNR isEsIjBCjF3Th8mqtwfWeWMLX99UeWN6XMrMVKhxnT89jwgvp9cQ710Q+ZX/teZR UjBR7I1HFa01wmCcwy/4lTsoI8eaD+KGGAwUwjXXdHlOpjqJttPxdP6JNfHWR0Gh kA/buRftqIGfw+Y88EFx =qAKB -----END PGP SIGNATURE----- --uwrg0P5ODEVL921K6DrL39Cl0LCCjugsE-- From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 01:44:57 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA
(256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A211A419; Wed, 1 Oct 2014 01:44:57 +0000 (UTC) Received: from mail-yh0-x22e.google.com (mail-yh0-x22e.google.com [IPv6:2607:f8b0:4002:c01::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 575FE32A; Wed, 1 Oct 2014 01:44:57 +0000 (UTC) Received: by mail-yh0-f46.google.com with SMTP id f73so66122yha.19 for ; Tue, 30 Sep 2014 18:44:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=xgvuuBGKIKC6Xe/fgpV03bJrGcOCBPG1QiO3xHcvkL0=; b=v3HTGx10lw/rImfiYABAK6ybADOTRL1iNLnRtv5W0rUX7kmZdGBCpsfpocGFrjqjIC V5OPWq2YK9ZdaGls2z7kxfPWp6KqObJvfiKEHjmgj6BHLGAZ51uoB26kTh8e992IxxRU a/CH1cmnAPteZtYn5XItR+8WaXul24XHIyEbXUZcw4xpRgekim3OTByrP7DTH1KiZ36t fCJARcXvzj9e9Zzz7xeVnuP7s1t0iBVutZdB9mIXIYNU75XTtH3jw5xgOSLxI0l5yJZd Pw9J1njmlHF06JI7NeVpT1eehDFJHM9tLyErwsK9V9lqG9PI+eXrn23biLsguuFOHlNm Qjbw== MIME-Version: 1.0 X-Received: by 10.236.127.140 with SMTP id d12mr75723501yhi.37.1412127896572; Tue, 30 Sep 2014 18:44:56 -0700 (PDT) Received: by 10.170.206.10 with HTTP; Tue, 30 Sep 2014 18:44:56 -0700 (PDT) In-Reply-To: <542B557F.4050603@freebsd.org> References: <201409300845.s8U8jUTa079241@mech-as221.men.bris.ac.uk> <542B557F.4050603@freebsd.org> Date: Tue, 30 Sep 2014 21:44:56 -0400 Message-ID: Subject: Re: cluster FS? 
From: Mehmet Erol Sanliturk To: Allan Jude Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: FreeBSD Hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 01:44:57 -0000 On Tue, Sep 30, 2014 at 9:14 PM, Allan Jude wrote: > On 2014-09-30 04:45, Anton Shterenlikht wrote: > > Hello > > > > Not sure if this is the right list... > > I wanted to ask about a cluster file system. > > Is there something like this on FreeBSD? > > > > It seems to me (just from reading the handbook) > > that none of NFS, HAST or iSCSI provide this. > > > > My specific needs are as follows. > > I have multiple nodes and a disk array. > > Each node is connected by fibre to the disk array. > > I want to have each node read/write access > > to all disks on disk array. > > So that if any node fails, the > > data is still accessible > > via the remaining nodes. > > > > I want to have all nodes equal, i.e. no master/slave > > or server/client model. Also, the disk array > > provides adequate RAID already, so that is not > > needed either. > > > > In the archives I see that the demands for > > a cluster FS support on FreeBSD have been expressed > > periodically over a very long time, but seems > > there's never been any resolution. > > Some people mention GFS, but I've no idea > > if this what I'm trying to describe. > > > > So is what I'm describing a cluster FS at all? > > Is there something like this on FreeBSD already? > > Is there someting in ports that can be used > > to achive this? 
> > > > Thanks > > > > Anton > > > > _______________________________________________ > > freebsd-hackers@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > > To unsubscribe, send any mail to " > freebsd-hackers-unsubscribe@freebsd.org" > > > > What you are describing doesn't really seem to be a 'cluster' FS. > > In a cluster, the disks would reside in multiple machines, and the 'file > system' would withstand any one of those machines going down. That is > quite a bit different than just wanting a bunch of clients to have > concurrent access to a single disk array. > > If you explain your use-case in more detail, we may be able to guide you > in the right direction. > > -- > Allan Jude > > The following pages and their associated pages may be useful for definitions of terms and available capabilities : http://en.wikipedia.org/wiki/Parallel_Virtual_Machine http://en.wikipedia.org/wiki/Linda_%28coordination_language%29 http://en.wikipedia.org/wiki/Category:Parallel_computing http://en.wikipedia.org/wiki/Category:Concurrent_computing http://en.wikipedia.org/wiki/Category:Distributed_computing http://en.wikipedia.org/wiki/Network-attached_storage http://en.wikipedia.org/wiki/Clustered_file_system http://en.wikipedia.org/wiki/Category:Shared_disk_file_systems http://en.wikipedia.org/wiki/Category:Network_file_systems http://en.wikipedia.org/wiki/Ceph_%28software%29 http://en.wikipedia.org/wiki/XtreemFS The above problem seems to be "Network-attached_storage" . Thank you very much . 
Mehmet Erol Sanliturk From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 03:22:36 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D1436438 for ; Wed, 1 Oct 2014 03:22:36 +0000 (UTC) Received: from mailgate.gta.com (mailgate.gta.com [199.120.225.23]) by mx1.freebsd.org (Postfix) with ESMTP id 906A4EEC for ; Wed, 1 Oct 2014 03:22:36 +0000 (UTC) Received: (qmail 15878 invoked by uid 1000); 1 Oct 2014 03:15:53 -0000 Date: Tue, 30 Sep 2014 23:15:53 -0400 From: Larry Baird To: freebsd-hackers@freebsd.org Subject: Kernel/Compiler bug Message-ID: <20141001031553.GA14360@gta.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 03:22:36 -0000 I have run into a compiler optimization bug with clang version 3.4.1 and "-O0" when compiling a 10.1 i386 kernel. When debugging kernels using kgdb I like to disable compiler optimization. I have been fighting a kernel double fault bug for a while. I thought it was a modification I had made. Today I finally stumbled upon the fact that it is a compiler lack of optimization bug. (-: It is easy to duplicate the issue with a GENERIC kernel and 10.1-BETA3. Edit /sys/conf/kern.pre.mk, changing the first _MINUS_O to '-O0'.
--- /sys/conf/kern.pre.mk 2014-09-26 06:33:38.000000000 -0400
+++ kern.pre.mk 2014-09-30 22:59:51.000000000 -0400
@@ -26,7 +26,7 @@
 SIZE?= size
 
 .if defined(DEBUG)
-_MINUS_O= -O
+_MINUS_O= -O0
 CTFFLAGS+= -g
 .else
 .if ${MACHINE_CPUARCH} == "powerpc"

Build GENERIC as usual and you will get a double faulting kernel. Should this be reported as a FreeBSD kernel bug or as a clang optimization bug?

To get a backtrace I created a kernel conf file called GDB containing:

include GENERIC
options KDB
options KDB_TRACE
options DDB
options GDB
options ALT_BREAK_TO_DEBUGGER # break is CR ~ ^b

This resulted in the following panic:

/boot/kernel/kernel text=0x1890d80 data=0xebdf0+0x163d60 syms=[0x4+0x126190+0x4+0x18bb01]
Booting...
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.1-BETA3 #0: Tue Sep 30 22:40:18 EDT 2014
lab@test2.gta.com:/usr/obj/usr/src/sys/GDB i386
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: AMD FX(tm)-8150 Eight-Core Processor (3573.27-MHz 686-class CPU)
Origin = "AuthenticAMD" Id = 0x600f12 Family = 0x15 Model = 0x1 Stepping = 2
Features=0x1783fbff
Features2=0x201
AMD Features=0x2a100800
AMD Features2=0x13
real memory = 2147418112 (2047 MB)
avail memory = 2072879104 (1976 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 2
cpu3 (AP): APIC ID: 3
pnpbios: Bad PnP BIOS data checksum
random device not loaded; using insecure entropy
ioapic0 irqs 0-23 on motherboard
random: initialized
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
cpu0: on acpi0
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
attimer0: port 0x40-0x43,0x50-0x53 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
isab0: at device 1.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xd000-0xd00f at device 1.1 on pci0
ata0: at channel 0 on atapci0
ata1: at channel 1 on atapci0
vgapci0: mem 0xe0000000-0xe0ffffff irq 18 at device 2.0 on pci0
vgapci0: Boot video device
em0: port 0xd010-0xd017 mem 0xf0000000-0xf001ffff irq 19 at device 3.0 on pci0
em0: Ethernet address: 08:00:27:32:5e:fe
pcm0: port 0xd100-0xd1ff,0xd200-0xd23f irq 21 at device 5.0 on pci0
pcm0:
ohci0: mem 0xf0804000-0xf0804fff irq 22 at device 6.0 on pci0
usbus0 on ohci0
pci0: at device 7.0 (no driver attached)
ehci0: mem 0xf0805000-0xf0805fff irq 19 at device 11.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
acpi_acad0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff,0xe2000-0xe2fff pnpid ORM0000 on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
atrtc0: at port 0x70 irq 8 on isa0
Event timer "RTC" frequency 32768 Hz quality 0
ppc0: parallel port not found.
Timecounters tick every 10.000 msec
pcm0: ac97 link rate calibration timed out after 1998076 us
em0: link state changed to UP
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 480Mbps High Speed USB v2.0
ugen0.1: at usbus0
uhub0: on usbus0
ugen1.1: at usbus1
uhub1: on usbus1
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: ATA-6

Fatal double fault:
eip = 0xc10dbf34
esp = 0xe27f1000
ebp = 0xe27f1004
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper(c1ad615d,c1e7090c,5,16,0,...) at db_trace_self_wrapper+0x38/frame 0xc1e708d8
kdb_backtrace(c1c81330,0,c1c81eaf,c1e709e4,a,...) at kdb_backtrace+0x49/frame 0xc1e70940
vpanic(c1c81eaf,c1e709e4,c1e709e4,c1c81eaf,c1e70a50,...) at vpanic+0x209/frame 0xc1e709c0
panic(c1c81eaf,0,0,d,b,...) at panic+0x26/frame 0xc1e709d8
dblfault_handler() at dblfault_handler+0x14b/frame 0xc1e709d8
--- trap 0x17, eip = 0xc10dbf34, esp = 0xe27f1000, ebp = 0xe27f1004 ---
critical_enter(0,c76a3c40) at critical_enter+0x4/frame 0xe27f1004
spinlock_enter(0,0,0,0,0,...) at spinlock_enter+0x61/frame 0xe27f1014
sched_setcpu(c782b000,0,0,0,0,...) at sched_setcpu+0x7d/frame 0xe27f1068
sched_add(c782b000,0,0,0,c1e56abc,e5,c782b2e0,c782b000) at sched_add+0x10d/frame 0xe27f10c4
sched_wakeup(c782b000,0,0,0,0,...) at sched_wakeup+0xe6/frame 0xe27f10ec
setrunnable(c782b000,0,0,0,0,...) at setrunnable+0x145/frame 0xe27f111c
sleepq_resume_thread(c757d2c0,c782b000,0,37d,0,...) at sleepq_resume_thread+0x2b4/frame 0xe27f1164
sleepq_timeout(c782b000,4,e6,eeea40f0,e27f126c,...) at sleepq_timeout+0xf3/frame 0xe27f11d0
softclock_call_cc(c782b264,c1eb4700,1,ac,1f,...) at softclock_call_cc+0x3d0/frame 0xe27f1318
callout_process(50170178,3,fffffffc,16a3c40,0,...) at callout_process+0x4d5/frame 0xe27f1430
handleevents(50170178,3,0,0,0,...) at handleevents+0x4fc/frame 0xe27f1558
timercb(c1e75d78,0,0,0,0,...) at timercb+0x70c/frame 0xe27f1630
lapic_handle_timer(e27f1680) at lapic_handle_timer+0x10b/frame 0xe27f1674
Xtimerint() at Xtimerint+0x20/frame 0xe27f1674
--- interrupt, eip = 0xc1936fcf, esp = 0xe27f16c0, ebp = 0xe27f16c4 ---
write_eflags(80246,80246) at write_eflags+0xf/frame 0xe27f16c4
intr_restore(80246,80246,c76a3c40) at intr_restore+0x17/frame 0xe27f16d4
spinlock_exit(c1e377b4,4,c76a3c40,c113f1a0,c248ffc8,...) at spinlock_exit+0x52/frame 0xe27f16e8
cnputs(e27f1754,ffffffff,1,a,e27f1874,...) at cnputs+0x16e/frame 0xe27f1720
_vprintf(ffffffff,5,c19a5b0c,e27f1874,5,...) at _vprintf+0x182/frame 0xe27f181c
vprintf(c19a5b0c,e27f1874,6,e27f1874,c19a5b0c,...) at vprintf+0x45/frame 0xe27f184c
printf(c19a5b0c,e27f18d4,e27f18c4,c19d6aff,6,...) at printf+0x21/frame 0xe27f1868
ata_print_ident(c7ad699c,c19af72b,0,c19d6aac,0,...) at ata_print_ident+0x121/frame 0xe27f1914
xpt_announce_periph(c76a0100,e27f1b1c,c19af9bf,19000,0,...) at xpt_announce_periph+0x13a/frame 0xe27f1990
adaregister(c76a0100,e27f2340,0,0,0,...) at adaregister+0x1212/frame 0xe27f1d14
cam_periph_alloc(c0506b40,c05080d0,c0508190,c0508360,c19af72b,...) at cam_periph_alloc+0x510/frame 0xe27f1dc0
adaasync(0,80,e27f27c0,e27f2340,0,...) at adaasync+0x1d8/frame 0xe27f2308
xptsetasyncfunc(c7ad6800,e27f2a50,c7828800,e27f29e8,c04bea45,...) at xptsetasyncfunc+0x13e/frame 0xe27f27ec
xptdefdevicefunc(c7ad6800,e27f29e0,c76a3c40,0,0,...) at xptdefdevicefunc+0x46/frame 0xe27f2820
xptdevicetraverse(c769fd00,0,c04c7970,e27f29e0,0,...) at xptdevicetraverse+0x2c5/frame 0xe27f28b8
xptdeftargetfunc(c769fd00,e27f29e0,4,c1d7cf08,16a3c40,...) at xptdeftargetfunc+0x7a/frame 0xe27f28ec
xpttargettraverse(c7858700,0,c04c7410,e27f29e0,0,...) at xpttargettraverse+0x222/frame 0xe27f2968
xptdefbusfunc(c7858700,e27f29e0,1,c1c933b8,c7858700,...) at xptdefbusfunc+0x7a/frame 0xe27f299c
xptbustraverse(0,c04c6fe0,e27f29e0,0,2,...) at xptbustraverse+0x99/frame 0xe27f29c8
xpt_for_all_devices(c04c69f0,e27f2a50,4,ffffffff,ffffffff,...) at xpt_for_all_devices+0x5b/frame 0xe27f2a00
xpt_register_async(80,c05041a0,0,0,0,...) at xpt_register_async+0x2b4/frame 0xe27f2af4
adainit(1,2,2,0,2,...) at adainit+0x3d/frame 0xe27f2b48
periphdriver_init(2,c769f2a8,1000000,4,2,...) at periphdriver_init+0x7f/frame 0xe27f2b64
xpt_finishconfig_task(c7837780,1,4,0,0,...) at xpt_finishconfig_task+0x26/frame 0xe27f2b88
taskqueue_run_locked(c769f280,4,c76a3c40,0,0,...) at taskqueue_run_locked+0x1c7/frame 0xe27f2bec
taskqueue_thread_loop(c1eb6928,e27f2d08,0,0,0,...) at taskqueue_thread_loop+0x1cb/frame 0xe27f2c80
fork_exit(c1151cd0,c1eb6928,e27f2d08) at fork_exit+0x179/frame 0xe27f2cf4
fork_trampoline() at fork_trampoline+0x8/frame 0xe27f2cf4
--- trap 0, eip = 0, esp = 0xe27f2d40, ebp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100025 ]
Stopped at breakpoint+0x4: popl %ebp
db>

--
------------------------------------------------------------------------
Larry Baird
Global Technology Associates, Inc.
1992-2012 | http://www.gta.com Celebrating Twenty Years of Software Innovation | Orlando, FL Email: lab@gta.com | TEL 407-380-0220 From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 04:46:37 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E9DDEDAC for ; Wed, 1 Oct 2014 04:46:37 +0000 (UTC) Received: from mail-lb0-x231.google.com (mail-lb0-x231.google.com [IPv6:2a00:1450:4010:c04::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 766478FC for ; Wed, 1 Oct 2014 04:46:37 +0000 (UTC) Received: by mail-lb0-f177.google.com with SMTP id w7so45649lbi.8 for ; Tue, 30 Sep 2014 21:46:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=U+sYvnEghbeJANflAMUJ6tmJ1Fp8PKXu1tpGfNQqqa0=; b=vsMT+NhDDhYj83Bc4FiRZZFZTU4F4+jkOyMaabTzIg68S7yxyLgLU2MWUNulO78PLZ /6MKAMWUr944odYBF2h0GP4pw6dM8pIL53qaOF/db1/GtqL4BVvrbpovUf6nmCcT8aDg SNHWicj9W/iTC9jgUpwq+QAPvspRre2CGVvXWvi2y/fhc/S+JY/tTEjRWBzMOwD4639/ ieFHfeDq+nzNDD/c4/LXKo3UorPJrVQYZkYD8pSydgxHmFIgkzhEVD+fffhjAQpOiIMY aiszY+SLd4CL62ZI4leaEj85avIh88e7mgsVWuVTVLLncQPevbjtRUce607Pd9p7zaqG kiEQ== MIME-Version: 1.0 X-Received: by 10.112.13.132 with SMTP id h4mr49310987lbc.45.1412138795339; Tue, 30 Sep 2014 21:46:35 -0700 (PDT) Received: by 10.25.21.197 with HTTP; Tue, 30 Sep 2014 21:46:35 -0700 (PDT) In-Reply-To: <20141001031553.GA14360@gta.com> References: <20141001031553.GA14360@gta.com> Date: Wed, 1 Oct 2014 00:46:35 -0400 Message-ID: Subject: Re: Kernel/Compiler bug From: Ryan Stone To: Larry Baird Content-Type: text/plain; charset=UTF-8 Cc: 
"freebsd-hackers@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 04:46:38 -0000 This may not be a compiler bug. A quick look at the esp values provided in that backtrace shows that at least 7KB has been used on the stack. The stack for kernel threads is only 8KB, and a stack overflow can cause a double fault like that. My suspicion would be that without optimizations on clang uses a lot more stack space and you push over the limit. There's a kernel build option for the stack size that you could change to confirm. I believe that it's called KSTACK_PAGES. Try increasing it to 4. From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 08:13:57 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6CB4F6A7 for ; Wed, 1 Oct 2014 08:13:57 +0000 (UTC) Received: from eu1sys200aog104.obsmtp.com (eu1sys200aog104.obsmtp.com [207.126.144.117]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BD95CB10 for ; Wed, 1 Oct 2014 08:13:56 +0000 (UTC) Received: from mail-wi0-f180.google.com ([209.85.212.180]) (using TLSv1) by eu1sys200aob104.postini.com ([207.126.147.11]) with SMTP ID DSNKVCu3wpPgNln48l5JfLhTGG76+P0hCBqY@postini.com; Wed, 01 Oct 2014 08:13:56 UTC Received: by mail-wi0-f180.google.com with SMTP id em10so947979wid.1 for ; Wed, 01 Oct 2014 01:13:54 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:message-id:to:subject:cc:reply-to :in-reply-to; bh=7vg4c7WR0eKID4OszivvJskEQvYPYDMHTGRwgtJTTEE=; 
b=WHp3Vet45NR7wXpGHRgZZ9H92GpaItLFFbmgS6b8LaqyCulNAAYla0VxYYd1iQpLM4 gqwbfU8fg3IgfbUrz95JAe5vSPmHmGpG72l4fg8+/nVVTm8+gGmRRjUfLHtZwQFlM1Uq 5+gywVyOUfiA9OZBR40A+Ho5XjNyHOp2wDe5ABBHTIy6iBh4JHVfwwr+FZZqkuTVJQdj xKAEcDVFys4Xkf3l/tllshwxmDQk6EhTRXXCnb6Gfv/xFFZN7IsJWsoZd9h4t3sUmWCT nMsmNUBhUfPkG0JlJf7uN0m2aABuNJ5AyC0eHGPle2opcyI7EhKapX7YWx7roZRG0iCR kPxQ== X-Received: by 10.180.83.103 with SMTP id p7mr12010340wiy.67.1412150902463; Wed, 01 Oct 2014 01:08:22 -0700 (PDT) X-Gm-Message-State: ALoCoQnCpkBeGo8E+XtE+2kzgmueepPCvkUjU27Hj84aWxy7kAEMNwkK5QL3ul2sxZMblej5X2wo1kcrrgDJuepkuVDjmEnvzMdHhCBj+RVyLkokpIG5f8OKzzknoK1hoJI3lHjeXIYcnAPL8cNTWBXVEMeYNNchZw== X-Received: by 10.180.83.103 with SMTP id p7mr12010322wiy.67.1412150902335; Wed, 01 Oct 2014 01:08:22 -0700 (PDT) Received: from mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk. [137.222.187.221]) by mx.google.com with ESMTPSA id ny6sm17678031wic.22.2014.10.01.01.08.21 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 01 Oct 2014 01:08:21 -0700 (PDT) Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1]) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s9188KNW083914 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 1 Oct 2014 09:08:20 +0100 (BST) (envelope-from mexas@mech-as221.men.bris.ac.uk) Received: (from mexas@localhost) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s9188KVc083913; Wed, 1 Oct 2014 09:08:20 +0100 (BST) (envelope-from mexas) Date: Wed, 1 Oct 2014 09:08:20 +0100 (BST) From: Anton Shterenlikht Message-Id: <201410010808.s9188KVc083913@mech-as221.men.bris.ac.uk> To: allanjude@freebsd.org, m.e.sanliturk@gmail.com Subject: Re: cluster FS? 
Reply-To: mexas@bristol.ac.uk In-Reply-To: Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 08:13:57 -0000 >From owner-freebsd-hackers@freebsd.org Wed Oct 1 03:25:08 2014 > >On Tue, Sep 30, 2014 at 9:14 PM, Allan Jude wrote: > >> On 2014-09-30 04:45, Anton Shterenlikht wrote: >> > Hello >> > >> > Not sure if this is the right list... >> > I wanted to ask about a cluster file system. >> > Is there something like this on FreeBSD? >> > >> > It seems to me (just from reading the handbook) >> > that none of NFS, HAST or iSCSI provide this. >> > >> > My specific needs are as follows. >> > I have multiple nodes and a disk array. >> > Each node is connected by fibre to the disk array. >> > I want to have each node read/write access >> > to all disks on disk array. >> > So that if any node fails, the >> > data is still accessible >> > via the remaining nodes. >> > >> > I want to have all nodes equal, i.e. no master/slave >> > or server/client model. Also, the disk array >> > provides adequate RAID already, so that is not >> > needed either. >> > >> > In the archives I see that the demands for >> > a cluster FS support on FreeBSD have been expressed >> > periodically over a very long time, but seems >> > there's never been any resolution. >> > Some people mention GFS, but I've no idea >> > if this what I'm trying to describe. >> > >> > So is what I'm describing a cluster FS at all? >> > Is there something like this on FreeBSD already? >> > Is there someting in ports that can be used >> > to achive this? 
>> > >> > Thanks >> > >> > Anton >> > >> > _______________________________________________ >> > freebsd-hackers@freebsd.org mailing list >> > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >> > To unsubscribe, send any mail to " >> freebsd-hackers-unsubscribe@freebsd.org" >> > >> >> What you are describing doesn't really seem to be a 'cluster' FS. >> >> In a cluster, the disks would reside in multiple machines, and the 'file >> system' would withstand any one of those machines going down. That is >> quite a bit different than just wanting a bunch of clients to have >> concurrent access to a single disk array. >> >> If you explain your use-case in more detail, we may be able to guide you >> in the right direction. >> >> -- >> Allan Jude >> >> > >The following pages and their associated pages may be useful for >definitions of terms and available capabilities : > >http://en.wikipedia.org/wiki/Parallel_Virtual_Machine >http://en.wikipedia.org/wiki/Linda_%28coordination_language%29 > >http://en.wikipedia.org/wiki/Category:Parallel_computing >http://en.wikipedia.org/wiki/Category:Concurrent_computing >http://en.wikipedia.org/wiki/Category:Distributed_computing > > >http://en.wikipedia.org/wiki/Network-attached_storage >http://en.wikipedia.org/wiki/Clustered_file_system >http://en.wikipedia.org/wiki/Category:Shared_disk_file_systems >http://en.wikipedia.org/wiki/Category:Network_file_systems > > >http://en.wikipedia.org/wiki/Ceph_%28software%29 >http://en.wikipedia.org/wiki/XtreemFS > > > >The above problem seems to be "Network-attached_storage" . Now I'm even more confused. I think what I have is called SAN. The disk array is HP MSA1000: http://www8.hp.com/h20195/v2/GetDocument.aspx?docname=c04324510 *quote* The HP StorageWorks Modular Smart Array 1000 (MSA1000) is a 2 Gb Fibre Channel storage system designed for the entry-level to mid-range Storage Area Network (SAN). *end quote* The disk array has 8-port 2 Gb Fibre Channel Fabric Switch. 
At present I connect 3 FreeBSD 10 nodes to the disk array via fibre. However, only one node at a time is able to mount disks. What I'm looking for is the solution to be able to mount the disks on the disk array for read/write access from all nodes, up to 8. So that if a node fails, the data is still accessible via the other nodes. The model that I'm describing is a VMS cluster model. I'm not sure if it makes sense for FreeBSD. Thanks Anton From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 08:20:40 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9E408903; Wed, 1 Oct 2014 08:20:40 +0000 (UTC) Received: from mail.iXsystems.com (mail.ixsystems.com [12.229.62.4]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "*.ixsystems.com", Issuer "Go Daddy Secure Certification Authority" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8483AB7A; Wed, 1 Oct 2014 08:20:40 +0000 (UTC) Received: from localhost (mail.ixsystems.com [10.2.55.1]) by mail.iXsystems.com (Postfix) with ESMTP id 72DE780D96; Wed, 1 Oct 2014 01:20:39 -0700 (PDT) Received: from mail.iXsystems.com ([10.2.55.1]) by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP id 02411-09; Wed, 1 Oct 2014 01:20:39 -0700 (PDT) Received: from [10.8.0.26] (unknown [10.8.0.26]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.iXsystems.com (Postfix) with ESMTPSA id F134880D90; Wed, 1 Oct 2014 01:20:37 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.0 \(1988\)) Subject: Re: cluster FS? 
From: Jordan Hubbard In-Reply-To: <201410010808.s9188KVc083913@mech-as221.men.bris.ac.uk> Date: Wed, 1 Oct 2014 11:20:33 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <201410010808.s9188KVc083913@mech-as221.men.bris.ac.uk> To: mexas@bristol.ac.uk X-Mailer: Apple Mail (2.1988) Cc: freebsd-hackers@freebsd.org, allanjude@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 08:20:40 -0000 > On Oct 1, 2014, at 11:08 AM, Anton Shterenlikht wrote: > > The model that I'm describing is a VMS cluster model. > I'm not sure if it makes sense for FreeBSD. It does not. FreeBSD does not currently offer any form of clustered filesystem support, nor would a SAN provide this in any case since it’s just a shared fabric for a single set of storage devices. You could front-end your SAN with a NAS, but that would simply re-export the SAN as a shared file system such as NFS, which would let multiple clients see it but still provide no fail-over if the NAS or primary SAN controller died. You are trying to create an active/active fail-over system with multiple modes. You cannot get there from where you are starting. This is basically a “start over” proposition, and why folks like NetApp and EMC sell a lot of fileservers to replace existing SAN solutions.
- Jordan From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 09:02:33 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0513DACA for ; Wed, 1 Oct 2014 09:02:33 +0000 (UTC) Received: from eu1sys200aog124.obsmtp.com (eu1sys200aog124.obsmtp.com [207.126.144.157]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 53A8E17A for ; Wed, 1 Oct 2014 09:02:31 +0000 (UTC) Received: from mail-wi0-f178.google.com ([209.85.212.178]) (using TLSv1) by eu1sys200aob124.postini.com ([207.126.147.11]) with SMTP ID DSNKVCvDH+Ra4b6MoMec82Wsdw7pdHjBNe+5@postini.com; Wed, 01 Oct 2014 09:02:32 UTC Received: by mail-wi0-f178.google.com with SMTP id cc10so1072574wib.5 for ; Wed, 01 Oct 2014 02:02:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:message-id:to:subject:cc:reply-to :in-reply-to; bh=aIuwTz+5Yt8whj6pUmpm5BIqAWp1e5WY6UTMvKepGmo=; b=Srgmv72spFjOpbigYHp7RpFA9OMYeQFb+HJPOth0CDAoO+ZkDE5FSIEHP3DqAoEudT dph1S70UdsN/L0lDSQLDtdDM+uGM5VwMwGzBPQpUebKIGtwbuDeImNgb3+vxoFKD4YZR PWg36Y01lr6LINnir+oeONFy6l+FSA4siN5luDQ20xiSId0oKB2oWsyECtn80uKeXrh0 /kE6wyopU55ZUlcOilrywQ+NgT7XoU7YmOSC2ASHTU9mBpKeKes+M9m7vXOx5KGQtPu0 nj/a9LBFcHI22IKiaBYnoQFbi1wqVUPgExs4K8XwshjfHYGaH44h5ZM09x6D/v/G78J7 YgTw== X-Gm-Message-State: ALoCoQnq3sD16mZsuwlH0oPp9+tez8q5B4ZH61MCxwuPbN1Ok8F0e8J20FYmY31KCcxEUSK1UQdEmzTN+rs8MBZK+mStNuT+9qyTt9jUQxlZq86kGJdCUnAt99ipb1bsb0W30TIuQlRetBh9BYpu/ygbEgHjdEOLuQ== X-Received: by 10.180.95.66 with SMTP id di2mr12603972wib.60.1412154143633; Wed, 01 Oct 2014 02:02:23 -0700 (PDT) X-Received: by 10.180.95.66 with SMTP id di2mr12603955wib.60.1412154143533; Wed, 01 Oct 2014 02:02:23 -0700 (PDT) Received: from 
mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk. [137.222.187.221]) by mx.google.com with ESMTPSA id k2sm422846wjy.34.2014.10.01.02.02.22 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 01 Oct 2014 02:02:22 -0700 (PDT) Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1]) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s9192L50084233 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 1 Oct 2014 10:02:21 +0100 (BST) (envelope-from mexas@mech-as221.men.bris.ac.uk) Received: (from mexas@localhost) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s9192Lhb084232; Wed, 1 Oct 2014 10:02:21 +0100 (BST) (envelope-from mexas) Date: Wed, 1 Oct 2014 10:02:21 +0100 (BST) From: Anton Shterenlikht Message-Id: <201410010902.s9192Lhb084232@mech-as221.men.bris.ac.uk> To: jkh@mail.turbofuzz.com, mexas@bristol.ac.uk Subject: Re: cluster FS? Reply-To: mexas@bristol.ac.uk In-Reply-To: Cc: freebsd-hackers@freebsd.org, allanjude@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 09:02:33 -0000 >From jkh@mail.turbofuzz.com Wed Oct 1 09:26:57 2014 > >You are trying to create an active/active fail-over system with multiple modes. You cannot get there from where you are starting. This is basically a “start over” proposition, and why folks like NetApp and EMC sell a lot of fileservers to replace existing SAN solutions. So are you saying that the SAN model is not good for active/active failover with multiple nodes? Clearly if SAN itself fails, then the data is not accessible. From what I understand, in really mission critical systems people use multiple SANs with multiple nodes, with some extra data synchronisation mechanisms between those multiple SANs. 
Are you saying there are better solutions for high availability? Thanks Anton From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 09:12:59 2014 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 22251DF0; Wed, 1 Oct 2014 09:12:59 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A431D300; Wed, 1 Oct 2014 09:12:58 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s919CqT7008194 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 1 Oct 2014 12:12:52 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s919CqT7008194 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s919Cqtl008193; Wed, 1 Oct 2014 12:12:52 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 1 Oct 2014 12:12:52 +0300 From: Konstantin Belousov To: leon zadorin Subject: Re: does linsysfs support mmap on pci resources (e.g. pci device's registers etc.) 
Message-ID: <20141001091252.GP26076@kib.kiev.ua> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: freebsd-emulation@freebsd.org, hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 09:12:59 -0000 Your choice of the list to ask the question is weird. I added hackers@ as a more suitable ML. On Wed, Oct 01, 2014 at 03:44:48PM +1000, leon zadorin wrote: > Hello everyone, > Sorry if this is a bit of a noob question -- I'm just starting on this > topic... does FreeBSD's emulation of sysfs (from linux world) support > "mmap" on pci resources? > > Something similar to the following in the linux environment: > > fd = open("/sys/devices/pci0001:00/0001:00:07.0/resource0", O_RDWR | O_SYNC); > ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); > printf("PCI BAR0 0x0000 = 0x%4x\n", *((unsigned short *) ptr)); > > (above taken from > http://billfarrow.blogspot.com.au/2010/09/userspace-access-to-pci-memory.html) > > The reason I am asking is because I would like to map pci device > registers/memory in user space (and read/write some of the device's > registers from userspace). The reasons are auxiliary to this post > (e.g. kernel-bypass, system call bypass, etc.) At this stage it would > suffice to simply accept that user space pci-register access is needed > without paying the price of any system/ioctl/etc. call on every > access-instance to device's config/control register(s).
> > I would prefer to avoid writing additional explicit (albeit generic) > pci related kernel module in order to provide "mmapping" of the given > pci resources to userspace if there is already such a generic way to > do it via sysfs "syntax" (I would like to reduce any of the > specific/additional code re-writing at the kernel level as much as > possible). AFAIK, there are no facilities in the FreeBSD kernel that allow you to get the configuration registers or memory BARs mmapped into userspace. linsysfs is out of the question for this sort of hack. The native FreeBSD /dev/pci does not support mmapping either. It should not be too hard to extend /dev/pci to do what you described. Start by looking at sys/dev/pci/pci_user.c. The PCIe configuration window is active, so you could access it by hand by mmapping /dev/mem. Also, the window is mapped into KVA, so you could access it through /dev/kmem as well. /dev/mem would be easier, I think, because it needs the physical address, which can be learned from ACPI MCFG much more easily than the value of the static symbol pcie_base.
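For readers of the archive, the /dev/mem suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: map_window() and read_reg16() are hypothetical helper names, the physical address has to come from the ACPI MCFG table (or another trusted source) for the machine in question, opening /dev/mem normally requires root, and nothing here addresses the caching attributes of the resulting mapping. The helper only takes care of the one thing mmap() insists on, namely a page-aligned file offset.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing API): map `len` bytes starting at
 * byte offset `phys` of `path` and return a pointer to the first requested
 * byte, or NULL on failure.  mmap() needs a page-aligned offset, so the
 * mapping really starts at the enclosing page boundary; *base_out and
 * *maplen_out receive the values that must later be passed to munmap().
 */
static volatile void *
map_window(const char *path, off_t phys, size_t len, int writable,
    void **base_out, size_t *maplen_out)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	off_t aligned;
	size_t slack, maplen;
	void *base;
	int fd;

	if (pagesz <= 0)
		return (NULL);
	aligned = phys & ~((off_t)pagesz - 1);	/* round down to a page */
	slack = (size_t)(phys - aligned);	/* bytes skipped in that page */
	maplen = len + slack;

	fd = open(path, writable ? O_RDWR : O_RDONLY);
	if (fd == -1)
		return (NULL);
	base = mmap(NULL, maplen, PROT_READ | (writable ? PROT_WRITE : 0),
	    MAP_SHARED, fd, aligned);
	close(fd);		/* the mapping stays valid after close() */
	if (base == MAP_FAILED)
		return (NULL);

	*base_out = base;
	*maplen_out = maplen;
	return ((volatile char *)base + slack);
}

/* Read one 16-bit value at byte offset `off` inside a mapped window. */
static uint16_t
read_reg16(volatile void *win, size_t off)
{
	return (*(volatile uint16_t *)((volatile char *)win + off));
}
```

A caller running as root would do something like win = map_window("/dev/mem", phys, 4096, 0, &base, &maplen), read registers with read_reg16(win, offset), and finish with munmap(base, maplen). Here phys stands for the physical address taken from ACPI MCFG; that value is not given anywhere in this thread and must be looked up for the actual hardware rather than guessed.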
From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 09:38:38 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0A459441; Wed, 1 Oct 2014 09:38:38 +0000 (UTC) Received: from mail.iXsystems.com (mail.ixsystems.com [12.229.62.4]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "*.ixsystems.com", Issuer "Go Daddy Secure Certification Authority" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E49B474E; Wed, 1 Oct 2014 09:38:37 +0000 (UTC) Received: from localhost (mail.ixsystems.com [10.2.55.1]) by mail.iXsystems.com (Postfix) with ESMTP id 769FB7AC86; Wed, 1 Oct 2014 02:38:36 -0700 (PDT) Received: from mail.iXsystems.com ([10.2.55.1]) by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP id 07002-03; Wed, 1 Oct 2014 02:38:36 -0700 (PDT) Received: from [10.8.0.26] (unknown [10.8.0.26]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.iXsystems.com (Postfix) with ESMTPSA id 3F2E27AC82; Wed, 1 Oct 2014 02:38:35 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.0 \(1988\)) Subject: Re: cluster FS? 
From: Jordan Hubbard In-Reply-To: <201410010902.s9192Lhb084232@mech-as221.men.bris.ac.uk> Date: Wed, 1 Oct 2014 12:38:32 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <201E3A2E-B33D-4C63-AD81-8FFD5C2E0ED7@mail.turbofuzz.com> References: <201410010902.s9192Lhb084232@mech-as221.men.bris.ac.uk> To: mexas@bristol.ac.uk X-Mailer: Apple Mail (2.1988) Cc: freebsd-hackers@freebsd.org, allanjude@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 09:38:38 -0000 > On Oct 1, 2014, at 12:02 PM, Anton Shterenlikht wrote: > > So are you saying that the SAN model > is not good for active/active failover > with multiple nodes? Correct. SAN is active/passive. For more information on high availability solutions, I suggest you check out the big file server vendors - there’s far more pertinent information in their various whitepapers than you’ll ever get on freebsd-hackers.
:) - Jordan From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 10:17:28 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B79461EB for ; Wed, 1 Oct 2014 10:17:28 +0000 (UTC) Received: from eu1sys200aog106.obsmtp.com (eu1sys200aog106.obsmtp.com [207.126.144.121]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 15910BAC for ; Wed, 1 Oct 2014 10:17:27 +0000 (UTC) Received: from mail-wi0-f177.google.com ([209.85.212.177]) (using TLSv1) by eu1sys200aob106.postini.com ([207.126.147.11]) with SMTP ID DSNKVCvUnz558kOhXJ4zAHlAAANK9pjTgsUs@postini.com; Wed, 01 Oct 2014 10:17:28 UTC Received: by mail-wi0-f177.google.com with SMTP id ho1so40717wib.4 for ; Wed, 01 Oct 2014 03:17:03 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:message-id:to:subject:cc:reply-to :in-reply-to; bh=N7r4MxrrXIrcekcGQkG+MlfBVexufpd2gO/cBXqt2PU=; b=KcIlxVOWPe5l1GvmfLmxwoRoAHWlGXnP3u/7Jdjf9/Z0xJDuoYOorPi/0PE+3Dq/aw +LMn5nIT3Da3H3rP7JFQTEqDcOnuuVVy/T1ux3qvARTw7MB5DV/tuwesEtVPSmJIafHH raqjm+mX2NT1Krg18v50FADlgYNZn3giqqbHCDik7WXahINknhQDluz4zwBjLTCYDiod koAHLYkTmhxY7KRaj1fgVGyauNIfaTFjhZ15JqiquFPivT0/Ck+yT3fpmIizwZ3F4Qjd R0cjE1cUd7b/RLgjeFTRVGLW8uQHflWDOkvwhMSp1jOLOVRs+4nkCfDSB77wxF9oVbU6 VFeA== X-Gm-Message-State: ALoCoQms+eMIlG2C8+AbkByiRAncr7S1Qao3bXryqGepd6ad7LwVeyZvGmlj7717ZUoSDrfz68K9/tQjcmid5+eOJIEdn0ukqp8mca2xrHFtc3LnDXMAqrq3Gz+JYmf4teiWAnByGdJMGS/+4jHcwStx/EeAA8SWyw== X-Received: by 10.180.80.198 with SMTP id t6mr13323767wix.6.1412158623695; Wed, 01 Oct 2014 03:17:03 -0700 (PDT) X-Received: by 10.180.80.198 with SMTP id t6mr13323744wix.6.1412158623519; Wed, 01 Oct 2014 03:17:03 -0700 (PDT) Received: from 
mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk. [137.222.187.221]) by mx.google.com with ESMTPSA id au4sm655264wjc.15.2014.10.01.03.17.02 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 01 Oct 2014 03:17:02 -0700 (PDT) Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1]) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s91AH1K0084405 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 1 Oct 2014 11:17:01 +0100 (BST) (envelope-from mexas@mech-as221.men.bris.ac.uk) Received: (from mexas@localhost) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s91AH1Lo084404; Wed, 1 Oct 2014 11:17:01 +0100 (BST) (envelope-from mexas) Date: Wed, 1 Oct 2014 11:17:01 +0100 (BST) From: Anton Shterenlikht Message-Id: <201410011017.s91AH1Lo084404@mech-as221.men.bris.ac.uk> To: jkh@mail.turbofuzz.com, mexas@bristol.ac.uk Subject: Re: cluster FS? Reply-To: mexas@bristol.ac.uk In-Reply-To: <201E3A2E-B33D-4C63-AD81-8FFD5C2E0ED7@mail.turbofuzz.com> Cc: freebsd-hackers@freebsd.org, allanjude@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 10:17:28 -0000 >From jkh@mail.turbofuzz.com Wed Oct 1 10:42:50 2014 > > >> On Oct 1, 2014, at 12:02 PM, Anton Shterenlikht wrote: >> >> So are you saying that the SAN model >> is not good for active/active failover >> with multiple nodes? > >Correct. SAN is active/passive. > >For more information on high availability solutions, I suggest you check out the big file server vendors - there’s far more pertinent information in their various whitepapers then you’ll ever get on freebsd-hackers. :) I thought HP was the "big fileserver vendor"... 
Also, the SAN array I'm using does support active/active model since 2006: http://eis.bris.ac.uk/~mexas/aa.pdf *quote* HP StorageWorks 1000 Modular Smart Array Announcing active/active support A recent web release of alternative MSA controller firmware includes important new features, including active/active controllers *end quote* Or am I confusing the issues again? Many thanks for your time. I do appreciate your replies. Anton From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 10:16:30 2014 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 96817195; Wed, 1 Oct 2014 10:16:30 +0000 (UTC) Received: from mail-ig0-x229.google.com (mail-ig0-x229.google.com [IPv6:2607:f8b0:4001:c05::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5E701BA4; Wed, 1 Oct 2014 10:16:30 +0000 (UTC) Received: by mail-ig0-f169.google.com with SMTP id uq10so6058igb.2 for ; Wed, 01 Oct 2014 03:16:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=UKWJQCZfy2zGhm12ph4huStaXHkz3lVgaU3eUEKBYaQ=; b=MLBjSu6Ylhwbyfr2XgE7lSPs7opw32oCOtCTiJzolb+BUjR40GqIVIUp0QzRkeB7lZ MW8G46i1bA28Kk20wzOh8Q0BLFY1pb/Cq0Q6s+HvkkzfRUvo+Ee4sHV3Nwuk0xB75fxd kMnrW1igjmcvJnilG1t2xHnZbiQxlvppne+kZ81BJTJPQB3AQ/eh3RjCKt0VWLjF+3ug hCN9SDEk8ocMWy02EyTdu+nT7NmcAICjh2L6g3R+dSbcdnnnXkWUCFhmFP8O++l6sTRL RCt9M9DhDxFMfijJ52UIkH78DGZ3x26Cf7U7q5anpTyPnXm7ZgeFuzQyWhpZmlDRoUuy 0FLA== MIME-Version: 1.0 X-Received: by 10.50.73.130 with SMTP id l2mr17423611igv.42.1412158589726; Wed, 01 Oct 2014 03:16:29 -0700 (PDT) Received: by 10.50.2.69 with HTTP; Wed, 1 Oct 
2014 03:16:29 -0700 (PDT) In-Reply-To: <20141001091252.GP26076@kib.kiev.ua> References: <20141001091252.GP26076@kib.kiev.ua> Date: Wed, 1 Oct 2014 20:16:29 +1000 Message-ID: Subject: Re: does linsysfs support mmap on pci resources (e.g. pci device's registers etc.) From: leon zadorin To: Konstantin Belousov Content-Type: text/plain; charset=UTF-8 X-Mailman-Approved-At: Wed, 01 Oct 2014 11:13:50 +0000 Cc: freebsd-emulation@freebsd.org, hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 10:16:30 -0000 On Wed, Oct 1, 2014 at 7:12 PM, Konstantin Belousov wrote: > You choice of the list to ask the question is weird. I added hackers@ > as more suitable ML. I see, Sure thing about adding the post to another list -- sorry about doing the original post to this list (freebsd-emulation). I had read the lists's description: "A list for the Development of Emulators of other operating systems and enviroments for FreeBSD. These include: BSDI, Linux, and some microsoft products. " https://lists.freebsd.org/mailman/listinfo/freebsd-emulation and given that my post was essentially about whether FreeBSD emulated linux OS "sysfs" feature(s) I thought it might be of some relevance to the list... but being a noob to the list I have possibly not made the best choice :) > On Wed, Oct 01, 2014 at 03:44:48PM +1000, leon zadorin wrote: >> Hello everyone, >> Sorry if this is a bit of a noob question -- I'm just starting on this >> topic... does FreeBSD's emulation of sysfs (from linux world) support >> "mmap" on pci resources? 
>> >> Something similar to the following in the linux environment: >> >> fd = open("/sys/devices/pci0001:00/0001:00:07.0/resource0", O_RDWR | O_SYNC); >> ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); >> printf("PCI BAR0 0x0000 = 0x%4x\n", *((unsigned short *) ptr)); >> >> (above taken from >> http://billfarrow.blogspot.com.au/2010/09/userspace-access-to-pci-memory.html) [...] > AFAIK, there are no facilities in the FreeBSD kernel that allow you to get > the configuration registers or memory BARs mmapped into userspace. > linsysfs is out of the question for this sort of hack. The native > FreeBSD /dev/pci does not support mmapping either. > > It should not be too hard to extend /dev/pci to do what you described. > Start by looking at sys/dev/pci/pci_user.c. > > The PCIe configuration window is active, so you could access it by hand by > mmapping /dev/mem. Also, the window is mapped into KVA, so you could > access it through /dev/kmem as well. /dev/mem would be easier, I think, > because it needs the physical address, which can be learned from ACPI > MCFG much more easily than the value of the static symbol pcie_base. I see -- thanks for the pointers!
I shall consider my options :) From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 11:03:23 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 806DA129 for ; Wed, 1 Oct 2014 11:03:23 +0000 (UTC) Received: from mail-yk0-x22d.google.com (mail-yk0-x22d.google.com [IPv6:2607:f8b0:4002:c07::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4442DF6 for ; Wed, 1 Oct 2014 11:03:23 +0000 (UTC) Received: by mail-yk0-f173.google.com with SMTP id 200so22098ykr.32 for ; Wed, 01 Oct 2014 04:03:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=0zuYDM9ISq6JEAr6aV02oqJu8Sw1mk/saBPQLcwmqbk=; b=WDXQP4N2bhWGjTrSo5LKqzYiYnxzsyvbZRhT6bImkPbTmR1ZFxDtOG/lI8s/2lAr9I q+vP1nxnrvoQDHEWuuK4DoYgZ7NkiRL5tMTriqsubr4Ne6CnVjkrPiUaMf5GTSCA//Vf oPk061GmbR65Csdfz7wAPlczHS8g2IZxbASbyoNBmEliPqRR+saNPjG9CQN+cc8LsRgw sFtGB16EjRTBWj1oq9cTFCukH5HwYm10/3l2aS4yZQJ9Zdk6GAgbO927nE4EUHA6AvTc TvQ34SMzG13+QQmdCIwE5WhOv3bJjjmZKR157hENCUgr1NPhhxyfjthfX80L3kqLdxSV 2/BQ== MIME-Version: 1.0 X-Received: by 10.236.128.33 with SMTP id e21mr2618520yhi.187.1412161402277; Wed, 01 Oct 2014 04:03:22 -0700 (PDT) Received: by 10.170.156.139 with HTTP; Wed, 1 Oct 2014 04:03:22 -0700 (PDT) In-Reply-To: <201E3A2E-B33D-4C63-AD81-8FFD5C2E0ED7@mail.turbofuzz.com> References: <201410010902.s9192Lhb084232@mech-as221.men.bris.ac.uk> <201E3A2E-B33D-4C63-AD81-8FFD5C2E0ED7@mail.turbofuzz.com> Date: Wed, 1 Oct 2014 12:03:22 +0100 Message-ID: Subject: Re: cluster FS? 
From: krad To: mexas@bristol.ac.uk X-Mailman-Approved-At: Wed, 01 Oct 2014 11:14:44 +0000 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 11:03:23 -0000

These are my definitions; hopefully they make some stuff a little clearer.

Cluster file system: a file system that resides on a block device that multiple machines have rw access to, with consistency guaranteed. A good real-world example of this is a VMware ESX datastore, i.e. a lun is presented to all the esxi hosts in the cluster, all of which can access it simultaneously. The key thing here is the guarantee of consistency.

Distributed file system: a network file system that is created out of multiple nodes working together to provide a fault tolerant service. Examples of this are lustre, glusterfs, moosefs, p-nfs, openafs. One of the key things to understand here is that these file systems generally sit on top of the normal os file systems, and each node has its own discrete storage. All replication is done via the network.

Looking at your setup, if you want to provide a fault tolerant setup with your existing san there are two main paths I can think of. I am making the assumption the san is fault tolerant to your requirements.

Option one:

1. Create a set of LUNs and present them to your file server nodes.
2. On one node, create the file systems of your choice (prob zfs).
3. Set up carp in a master/slave configuration with a vip, and import/export functions for the file systems.
4. Export your file systems via nfs/cifs.

If you are dead set on using freebsd for this it will be more tricky, as a lot of the work will have to be done by yourself. The main thing is making sure you don't have the fs mounted on both nodes at once in a split brain scenario. If you can use other OSes, something like sun cluster/veritas cluster/red hat cluster can do all of this for you. The advantage of this arch is that if you go for one of the commercial solutions you will have support, and there are plenty of people out there with experience in this.

Option two: use a distributed file system.

Basically here you would create 2x sets of luns and present one set to each node, and only one node. Format and mount up the luns to your preferences on each system. Install and configure the distributed file system of your choice and use the newly mounted file systems on each node as your datastores.

You should probably look at moosefs and glusterfs 1st, and then maybe openafs if you are going to use freebsd as the host system, but if you went for linux you would have a bigger choice at present.

On the two nodes of the distributed file system you would then want a relatively simple carp setup to float a VIP between the boxes. All clients would use this vip for their connection points. Also make sure the distributed FS is mounted back onto the node as a normal mount point. This allows you to re-export it via cifs and NFS.

Finally, for the clients: they have 3 basic ways of connecting to the vip. These should cover most eventualities.

1. Native distributed fs client.
2. NFS.
3. CIFS.

The advantage of this over option one is that it scales very well, depending on your distributed fs of choice. It also means you can easily break away from your san over time if you want to: all you need to do is add more nodes that are not on the san, replicate the storage to them, and then drop the san nodes.

I hope this helps a little.

On 1 October 2014 10:38, Jordan Hubbard wrote: > > > On Oct 1, 2014, at 12:02 PM, Anton Shterenlikht > wrote: > > > > So are you saying that the SAN model > > is not good for active/active failover > > with multiple nodes? > > Correct.
SAN is active/passive. > > For more information on high availability solutions, I suggest you check > out the big file server vendors - there’s far more pertinent information in > their various whitepapers than you’ll ever get on freebsd-hackers. :) > > - Jordan > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" >
From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 12:40:21 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9EB6A25E for ; Wed, 1 Oct 2014 12:40:21 +0000 (UTC) Received: from mail.iXsystems.com (mail.ixsystems.com [12.229.62.4]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "*.ixsystems.com", Issuer "Go Daddy Secure Certification Authority" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 83698E1D for ; Wed, 1 Oct 2014 12:40:21 +0000 (UTC) Received: from localhost (mail.ixsystems.com [10.2.55.1]) by mail.iXsystems.com (Postfix) with ESMTP id B99287FA59; Wed, 1 Oct 2014 05:40:20 -0700 (PDT) Received: from mail.iXsystems.com ([10.2.55.1]) by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP id 26186-06; Wed, 1 Oct 2014 05:40:20 -0700 (PDT) Received: from [10.8.0.38] (unknown [10.8.0.38]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.iXsystems.com (Postfix) with ESMTPSA id A96437FA56; Wed, 1 Oct 2014 05:40:19 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.0 \(1988\)) Subject: Re: cluster FS?
From: Jordan Hubbard In-Reply-To: Date: Wed, 1 Oct 2014 15:40:16 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <201410010902.s9192Lhb084232@mech-as221.men.bris.ac.uk> <201E3A2E-B33D-4C63-AD81-8FFD5C2E0ED7@mail.turbofuzz.com> To: krad X-Mailer: Apple Mail (2.1988) Cc: freebsd-hackers@freebsd.org, mexas@bristol.ac.uk X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 12:40:21 -0000 > On Oct 1, 2014, at 2:03 PM, krad wrote: > > These are my definitions, hopefully it makes some stuff a little clearer Thanks for the exposition - if that doesn’t help Anton, I don’t know what will. :-) To answer Anton’s previous question, he just needs to read the PDF he cited a little more closely. HP has obviously provided some sort of concurrent access mode to their SAN, but it's only active/active if you have one of the supported operating systems. Presumably, HP also provides drivers for those OSes which provide some sort of interlock support, though again, it’s not clear just what sort of filesystems you can put on the SAN and still keep the active/active concurrency. It’s very tricky, and the penalty for getting it wrong is corrupted data, so I’d tend to put my money on an actual filesystem-level solution which provides concurrent access, like glusterfs. That just went BETA with FreeBSD support, so who knows, maybe it’s becoming a viable solution. I have zero experience with deploying glusterfs, however, so I cannot speak to that.
- Jordan From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 13:40:46 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 25C18848 for ; Wed, 1 Oct 2014 13:40:46 +0000 (UTC) Received: from mailgate.gta.com (mailgate.gta.com [199.120.225.23]) by mx1.freebsd.org (Postfix) with ESMTP id CE9C9816 for ; Wed, 1 Oct 2014 13:40:45 +0000 (UTC) Received: (qmail 57398 invoked by uid 1000); 1 Oct 2014 13:40:44 -0000 Date: Wed, 1 Oct 2014 09:40:44 -0400 From: Larry Baird To: Ryan Stone Subject: Re: Kernel/Compiler bug Message-ID: <20141001134044.GA57022@gta.com> References: <20141001031553.GA14360@gta.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Cc: "freebsd-hackers@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 13:40:46 -0000 Ryan, On Wed, Oct 01, 2014 at 12:46:35AM -0400, Ryan Stone wrote: > This may not be a compiler bug. A quick look at the esp values > provided in that backtrace shows that at least 7KB has been used on > the stack. The stack for kernel threads is only 8KB, and a stack > overflow can cause a double fault like that. > > My suspicion would be that without optimizations on clang uses a lot > more stack space and you push over the limit. There's a kernel build > option for the stack size that you could change to confirm. I believe > that it's called KSTACK_PAGES. Try increasing it to 4. Good catch. Increasing KSTACK_PAGES does fix the issue. I wonder with optimization, how close to stack overflow does the kernel get during boot? 
Thank you, Larry -- ------------------------------------------------------------------------ Larry Baird Global Technology Associates, Inc. 1992-2012 | http://www.gta.com Celebrating Twenty Years of Software Innovation | Orlando, FL Email: lab@gta.com | TEL 407-380-0220 From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 13:48:32 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A2249A48 for ; Wed, 1 Oct 2014 13:48:32 +0000 (UTC) Received: from eu1sys200aog113.obsmtp.com (eu1sys200aog113.obsmtp.com [207.126.144.135]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F275F8F8 for ; Wed, 1 Oct 2014 13:48:31 +0000 (UTC) Received: from mail-wi0-f172.google.com ([209.85.212.172]) (using TLSv1) by eu1sys200aob113.postini.com ([207.126.147.11]) with SMTP ID DSNKVCwGFL+8d+LKzl0cuTyRTDQqw+GpPyFT@postini.com; Wed, 01 Oct 2014 13:48:32 UTC Received: by mail-wi0-f172.google.com with SMTP id n3so618506wiv.11 for ; Wed, 01 Oct 2014 06:48:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:message-id:to:subject:cc:reply-to :in-reply-to; bh=GDaz6fv1JfHIyZ/Me8JYDxLcSB3R0DOIXHmANdG+f6Q=; b=FYu453AkOfZf6rz2kUZymBqAt3HAwyxOG1ut6h8V294X++a+CobY3SKAU1bJ1B21zC zruuaO4tc2YZqLRNwO3+jWrFSMsTcXjWrhBjEQU6EcHhjCvCaUnGnSLJUCN/SOMf87F/ 0CkD0mMsTpmbrmtpXe4aWeH5XrIMBbNLsqivU2xMZE2WWYMzBz2rQbJq5SLO6MpjpCkX nFbHBV+swSSiAJ00xGYFvkDfv6Grj5iNw1tw6PoeMUZ+I0FSQK/XGngsfw8bLLcHTUz/ Yu+f36tCZUSUG1bmmxrjWmAStFXYXVz+cMHzb81+6lwOo7VtOrQu5QkzvCCqyMAuYGKy kWQQ== X-Gm-Message-State: 
ALoCoQm6b4W0EtiteSSqRV8SehQC+5IVAfOKnHwB+5dwnz+46SNDmyY1E9JL4AQeuKndOZs6yImRGQzspNDtkLql+NtfYdVdG1WbtZ51N/9dgVcVgPniuXyXdYacNiQaaD4Kr6KOQmfvxCe6pu/su/237f7rk5fcCg== X-Received: by 10.194.204.232 with SMTP id lb8mr64768831wjc.0.1412171284067; Wed, 01 Oct 2014 06:48:04 -0700 (PDT) X-Received: by 10.194.204.232 with SMTP id lb8mr64768749wjc.0.1412171283531; Wed, 01 Oct 2014 06:48:03 -0700 (PDT) Received: from mech-as221.men.bris.ac.uk (mech-as221.men.bris.ac.uk. [137.222.187.221]) by mx.google.com with ESMTPSA id cz3sm1245432wjb.23.2014.10.01.06.48.02 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 01 Oct 2014 06:48:02 -0700 (PDT) Received: from mech-as221.men.bris.ac.uk (localhost [127.0.0.1]) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9) with ESMTP id s91Dm19V084972 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 1 Oct 2014 14:48:01 +0100 (BST) (envelope-from mexas@mech-as221.men.bris.ac.uk) Received: (from mexas@localhost) by mech-as221.men.bris.ac.uk (8.14.9/8.14.9/Submit) id s91Dm1n3084971; Wed, 1 Oct 2014 14:48:01 +0100 (BST) (envelope-from mexas) Date: Wed, 1 Oct 2014 14:48:01 +0100 (BST) From: Anton Shterenlikht Message-Id: <201410011348.s91Dm1n3084971@mech-as221.men.bris.ac.uk> To: jkh@mail.turbofuzz.com, kraduk@gmail.com Subject: Re: cluster FS? Reply-To: mexas@bristol.ac.uk In-Reply-To: Cc: freebsd-hackers@freebsd.org, mexas@bristol.ac.uk X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 13:48:32 -0000 >From jkh@mail.turbofuzz.com Wed Oct 1 14:22:36 2014 > >> On Oct 1, 2014, at 2:03 PM, krad wrote: >> >> These are my definitions, hopefully it makes some stuff a little clearer > >Thanks for the exposition - if that doesn’t help Anton, I don’t know what will. :-) It did. Thanks a lot. 
>To answer Anton’s previous question, he just needs to read the PDF he cited a little more closely. HP has obviously provided some sort of concurrent access mode to their SAN, but it's only active/active if you have one of the supported operating systems. Presumably, HP also provides drivers for those OSes which provide some sort of interlock support, though again, it’s not clear just what sort of filesystems you can put on the SAN and still keep the active/active concurrency. It’s very tricky, and the penalty for getting it wrong is corrupted data, so I’d tend to put my money on an actual filesystem-level solution which provides concurrent access, like glusterfs. That just went BETA with FreeBSD support, so who knows, maybe it’s becoming a viable solution. I have zero experience with deploying glusterfs, however, so I cannot speak to that. > ok, I get it - my plans are way beyond my IT expertise (amateur). Thanks again Anton From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 17:27:55 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C88DBF10 for ; Wed, 1 Oct 2014 17:27:55 +0000 (UTC) Received: from na01-bn1-obe.outbound.protection.outlook.com (mail-bn1on0083.outbound.protection.outlook.com [157.56.110.83]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 723E898E for ; Wed, 1 Oct 2014 17:27:54 +0000 (UTC) Received: from DM2PR0801MB0944.namprd08.prod.outlook.com (25.160.131.27) by DM2PR0801MB0944.namprd08.prod.outlook.com (25.160.131.27) with Microsoft SMTP Server (TLS) id 15.0.1039.15; Wed, 1 Oct 2014 17:12:19 +0000 Received: from DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) by 
DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) with mapi id 15.00.1039.011; Wed, 1 Oct 2014 17:12:19 +0000 From: "Pokala, Ravi" To: "freebsd-hackers@freebsd.org" Subject: Re: dumpsys/savecore on AF-4Kn drives? Thread-Topic: dumpsys/savecore on AF-4Kn drives? Thread-Index: AQHP3QFgvwLpNOmcHUaIK0Ete9ljB5wbBlIA Date: Wed, 1 Oct 2014 17:12:19 +0000 Message-ID: References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: Microsoft-MacOutlook/14.4.4.140807 x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [24.6.178.251] x-microsoft-antispam: BCL:0;PCL:0;RULEID:;SRVR:DM2PR0801MB0944; x-forefront-prvs: 0351D213B3 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(6009001)(13464003)(189002)(164054003)(377454003)(51704005)(199003)(20776003)(83506001)(101416001)(2351001)(107886001)(99396003)(80022003)(46102003)(85852003)(54356999)(76176999)(21056001)(50986999)(86362001)(64706001)(2656002)(19580395003)(92566001)(92726001)(19580405001)(66066001)(31966008)(85306004)(87936001)(105586002)(120916001)(95666004)(4396001)(10300001)(76482002)(107046002)(77096002)(110136001)(36756003)(99286002)(97736003)(106356001)(106116001); DIR:OUT; SFP:1101; SCL:1; SRVR:DM2PR0801MB0944; H:DM2PR0801MB0944.namprd08.prod.outlook.com; FPR:; MLV:sfv; PTR:InfoNoRecords; MX:1; A:1; LANG:en; Content-Type: text/plain; charset="us-ascii" Content-ID: <056BA62C25CA864480C517738B4FCDB0@namprd08.prod.outlook.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: panasas.com X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 17:27:55 -0000 Re-posting something that was sent to me off-list, so the thread stays up-to-date: From: , Conrad Date: Wednesday, October 1, 2014 at 6:05 AM To: Ravi Pokala 
Subject: RE: dumpsys/savecore on AF-4Kn drives? Ravi, Skipping freebsd-hackers@ as I can't get Outlook to reply in the right way. savecore(1) uses raw read(2) calls to the passed device. FreeBSD DevFS doesn't support non-native block sizes, so that's probably where EINVAL is coming from. To support 4Kn from that end you could probably convert savecore(1) from read(2) and friends to fread(3) (assuming libc does the right thing re: native sector size). The kernel dump code is all DEV_BSIZE (512), but the backing dump device is free to do Read-Modify-Write to satisfy those 512 byte writes during dump. I don't know if gmirror does this, but if you were able to create a dump without error/panic, it probably does. Most of this code hasn't been touched since 2002 or so, 4Kn is a project :). Good luck! Hope that helps, Conrad -----Original Message----- From: , Ravi Pokala Date: Tuesday, September 30, 2014 at 3:53 PM To: "freebsd-hackers@freebsd.org" Subject: dumpsys/savecore on AF-4Kn drives? >Hi folks, > >Does anyone out there have AF-4Kn drives (both logical and physical sector >size is 4KB)? Have you been able to drop a core to one, and successfully >save the core on the way back up? > >I'm working on adding AF-4Kn support to an older version of FreeBSD (based >on 7 - yeah, I know... :-P), using -CURRENT as a reference. Things look >good at the GEOM level and higher; the GEOM utils report correct sizes, >UFS runs fine, etc. If I manually break into the debugger and 'call >doadump', it appears to work; at least, it does not report any errors. But >when I reboot, `savecore' complains: > > error reading dump header at offset 0 in /dev/mirror/gm1: Invalid >argument > >(Yes, it's dumping to a mirror; no, that's not the problem: the mirror is >configured using the 'prefer' balancing algorithm, as described in >gmirror(8), and we've been doing this without issue for years.) 
> >I'm trying to figure out if the problem is on the dumpsys side, the >savecore side, or if they're both broken for AF-4Kn. In particular, >'struct kerneldumpheader' is 512 bytes, and it looks like most calls to >dump_write() in the full-dump context (not minidumps) pass either the size >of the structure, or an explicit 512, for the 'length' argument. That's >the case in both the 7-ish version I'm porting to, and in -CURRENT. > >There's no AF-4Kn-aware bootstrap in the version we're using (emaste@ - >does the new UEFI bootstrap in 10-STABLE work w/ AF-4Kn drives?), so one >of the drives is 512n, and I could probably find some space on there to >save the core to. But that device is small, and we have other uses for it, >so I'd like to avoid reserving a large chunk of it. > >Any thoughts? > >Thanks, > >Ravi > From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 19:38:17 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1AF2EC68 for ; Wed, 1 Oct 2014 19:38:17 +0000 (UTC) Received: from tensor.andric.com (tensor.andric.com [87.251.56.140]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "tensor.andric.com", Issuer "CAcert Class 3 Root" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id CAB8EBB0 for ; Wed, 1 Oct 2014 19:38:16 +0000 (UTC) Received: from [IPv6:2001:7b8:3a7::e57d:9fd2:d3a8:dc94] (unknown [IPv6:2001:7b8:3a7:0:e57d:9fd2:d3a8:dc94]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by tensor.andric.com (Postfix) with ESMTPSA id 1255EB80A; Wed, 1 Oct 2014 21:38:07 +0200 (CEST) Content-Type: multipart/signed; boundary="Apple-Mail=_FE3828C5-4D92-4392-AE93-72655D74AB90"; protocol="application/pgp-signature"; micalg=pgp-sha1 Mime-Version: 1.0 (Mac OS X Mail 7.3 
\(1878.6\)) Subject: Re: Kernel/Compiler bug From: Dimitry Andric In-Reply-To: <20141001134044.GA57022@gta.com> Date: Wed, 1 Oct 2014 21:37:54 +0200 Message-Id: References: <20141001031553.GA14360@gta.com> <20141001134044.GA57022@gta.com> To: Larry Baird X-Mailer: Apple Mail (2.1878.6) Cc: "freebsd-hackers@freebsd.org" , Ryan Stone X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 19:38:17 -0000 --Apple-Mail=_FE3828C5-4D92-4392-AE93-72655D74AB90 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii On 01 Oct 2014, at 15:40, Larry Baird wrote: > Ryan, > > On Wed, Oct 01, 2014 at 12:46:35AM -0400, Ryan Stone wrote: >> This may not be a compiler bug. A quick look at the esp values >> provided in that backtrace shows that at least 7KB has been used on >> the stack. The stack for kernel threads is only 8KB, and a stack >> overflow can cause a double fault like that. >> >> My suspicion would be that without optimizations on clang uses a lot >> more stack space and you push over the limit. There's a kernel build >> option for the stack size that you could change to confirm. I believe >> that it's called KSTACK_PAGES. Try increasing it to 4. > Good catch. Increasing KSTACK_PAGES does fix the issue. I wonder with > optimization, how close to stack overflow does the kernel get during boot? It obviously depends on which optimization flags you use, which drivers you include, and so on. There was a thread some time ago about somebody banging into the limit when mounting certain ZFS filesystems, here: https://lists.freebsd.org/pipermail/freebsd-current/2012-December/038208.html This is why Kostik added printing of the frame addresses to the panic backtrace output, so you can easily see if you hit the stack limit. 
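To make the arithmetic concrete, here is a minimal userland sketch of how the frame addresses in a backtrace translate into a lower bound on stack usage. It is an illustration only: the helper names stack_span() and stack_headroom(), the 4096-byte page size, and the page counts passed in are assumptions chosen to match the 8KB figure quoted above, not code from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size */

/*
 * Stacks grow down, so the distance between the highest and the lowest
 * frame address seen in a backtrace is a lower bound on the stack space
 * that was in use when the backtrace was taken.
 */
static size_t
stack_span(const uintptr_t *frames, size_t n)
{
	uintptr_t lo, hi;
	size_t i;

	if (n == 0)
		return (0);
	lo = hi = frames[0];
	for (i = 1; i < n; i++) {
		if (frames[i] < lo)
			lo = frames[i];
		if (frames[i] > hi)
			hi = frames[i];
	}
	return ((size_t)(hi - lo));
}

/*
 * Upper bound on the bytes still free in a stack of `pages` pages,
 * given a span computed by stack_span(). Returns 0 once the span
 * reaches or exceeds the stack size.
 */
static size_t
stack_headroom(size_t span, size_t pages)
{
	size_t total = pages * PAGE_SIZE;

	return (span >= total ? 0 : total - span);
}
```

For four hypothetical frames running from 0xe1a3e3c0 to 0xe1a3fd30, stack_span() gives 0x1970 = 6512 bytes, and stack_headroom(6512, 2) leaves at most 1680 bytes of an 8KB stack.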
That said, 8k is not much these days, especially not with fairly complicated code like ZFS, combined with high optimization, which can inline a lot of functions, causing even more stack usage. I would just bump KSTACK_PAGES. -Dimitry --Apple-Mail=_FE3828C5-4D92-4392-AE93-72655D74AB90 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.22 (Darwin) iEYEARECAAYFAlQsWBoACgkQsF6jCi4glqN8zgCeNe0ZiuINVUj9/pZCd3fUiu0R 2uEAoJc3rkdOrAgsYfXSuqrzltEVscAQ =uHtI -----END PGP SIGNATURE----- --Apple-Mail=_FE3828C5-4D92-4392-AE93-72655D74AB90-- From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 1 23:21:37 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 927DFD36 for ; Wed, 1 Oct 2014 23:21:37 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 73378CF8 for ; Wed, 1 Oct 2014 23:21:37 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.9/8.14.9) with ESMTP id s91NLb6i037175 for ; Wed, 1 Oct 2014 23:21:37 GMT (envelope-from bdrewery@freefall.freebsd.org) Received: (from bdrewery@localhost) by freefall.freebsd.org (8.14.9/8.14.9/Submit) id s91NLbRu037172 for freebsd-hackers@freebsd.org; Wed, 1 Oct 2014 23:21:37 GMT (envelope-from bdrewery) Received: (qmail 42342 invoked from network); 1 Oct 2014 18:21:32 -0500 Received: from unknown (HELO ?10.10.0.24?) 
(freebsd@shatow.net@10.10.0.24) by sweb.xzibition.com with ESMTPA; 1 Oct 2014 18:21:32 -0500 Message-ID: <542C8C75.30007@FreeBSD.org> Date: Wed, 01 Oct 2014 18:21:25 -0500 From: Bryan Drewery Organization: FreeBSD User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2 MIME-Version: 1.0 To: Dimitry Andric , Larry Baird Subject: Re: Kernel/Compiler bug References: <20141001031553.GA14360@gta.com> <20141001134044.GA57022@gta.com> In-Reply-To: OpenPGP: id=6E4697CF; url=http://www.shatow.net/bryan/bryan2.asc Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="1UPenIEbfu6kooDcV3CLQPGkblXxSIh6L" Cc: "freebsd-hackers@freebsd.org" , Ryan Stone X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Oct 2014 23:21:37 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--1UPenIEbfu6kooDcV3CLQPGkblXxSIh6L
Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable

On 10/1/2014 2:37 PM, Dimitry Andric wrote:
> On 01 Oct 2014, at 15:40, Larry Baird wrote:
>> Ryan,
>>
>> On Wed, Oct 01, 2014 at 12:46:35AM -0400, Ryan Stone wrote:
>>> This may not be a compiler bug. A quick look at the esp values
>>> provided in that backtrace shows that at least 7KB has been used on
>>> the stack. The stack for kernel threads is only 8KB, and a stack
>>> overflow can cause a double fault like that.
>>>
>>> My suspicion would be that without optimizations on clang uses a lot
>>> more stack space and you push over the limit. There's a kernel build
>>> option for the stack size that you could change to confirm. I believe
>>> that it's called KSTACK_PAGES. Try increasing it to 4.
>> Good catch. Increasing KSTACK_PAGES does fix the issue.
I wonder with
>> optimization, how close to stack overflow does the kernel get during boot?
>
> It obviously depends on which optimization flags you use, which drivers
> you include, and so on. There was a thread some time ago about somebody
> banging into the limit when mounting certain ZFS filesystems, here:
>
> https://lists.freebsd.org/pipermail/freebsd-current/2012-December/038208.html
>
> This is why Kostik added printing of the frame addresses to the panic
> backtrace output, so you can easily see if you hit the stack limit.
>
> That said, 8k is not much these days, especially not with fairly
> complicated code like ZFS, combined with high optimization, which can
> inline a lot of functions, causing even more stack usage. I would just
> bump KSTACK_PAGES.
>
> -Dimitry
>

Is this something that can be bumped in the tree for GENERIC?

-- 
Regards,
Bryan Drewery

--1UPenIEbfu6kooDcV3CLQPGkblXxSIh6L
Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)

iQEcBAEBAgAGBQJULIx1AAoJEDXXcbtuRpfPHvkH/09cR3hY2SktVv5v4QjRlgMO
07+o6Dc/FGwHLpvwuq9XZXyAlr40j2We3la6sXPFnBcx1uQnLz9TNmEinohmLqlg
zVMSUJd97OJRbEEwHsl/jmnSrVAJa+KIO748C0Lu9hgcPQc4eDY86N/nzTTpK4Vm
99+tEAGeIAnsUGaxg7sQNt6GsydcfAngp/UZ7NKPiQoMTJVW/F7cFT9iCIGWurnh
udyhNMVmwQDOWuwD+QmWgmCuXGAPHiVME9F/DmTKBXPtFlEpx3XQdy1LCOybL2wM
oaAjKO4EMzgl6Z1X6JTOrA2ZpgZb1EPheBzmc8z/2rgrJpJEeIS+FEdXrEfh/AE=
=JsN4
-----END PGP SIGNATURE-----

--1UPenIEbfu6kooDcV3CLQPGkblXxSIh6L--

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 07:55:45 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0B62E827; Thu, 2 Oct 2014 07:55:45 +0000
(UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A12E987B; Thu, 2 Oct 2014 07:55:44 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s927tcGT030413 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 2 Oct 2014 10:55:38 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s927tcGT030413 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s927tbCm030410; Thu, 2 Oct 2014 10:55:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 2 Oct 2014 10:55:37 +0300 From: Konstantin Belousov To: Bryan Drewery Subject: Re: Kernel/Compiler bug Message-ID: <20141002075537.GU26076@kib.kiev.ua> References: <20141001031553.GA14360@gta.com> <20141001134044.GA57022@gta.com> <542C8C75.30007@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <542C8C75.30007@FreeBSD.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: "freebsd-hackers@freebsd.org" , Ryan Stone , Dimitry Andric , Larry Baird X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 07:55:45 -0000 On Wed, Oct 01, 2014 at 06:21:25PM -0500, Bryan Drewery wrote: > On 10/1/2014 2:37 PM, Dimitry Andric wrote: > > On 01 Oct 2014, at 15:40, Larry 
Baird wrote: > >> Ryan, > >> > >> On Wed, Oct 01, 2014 at 12:46:35AM -0400, Ryan Stone wrote: > >>> This may not be a compiler bug. A quick look at the esp values > >>> provided in that backtrace shows that at least 7KB has been used on > >>> the stack. The stack for kernel threads is only 8KB, and a stack > >>> overflow can cause a double fault like that. > >>> > >>> My suspicion would be that without optimizations on clang uses a lot > >>> more stack space and you push over the limit. There's a kernel build > >>> option for the stack size that you could change to confirm. I believe > >>> that it's called KSTACK_PAGES. Try increasing it to 4. > >> Good catch. Increasing KSTACK_PAGES does fix the issue. I wonder with > >> optimization, how close to stack overflow does the kernel get during boot? > > > > It obviously depends on which optimization flags you use, which drivers > > you include, and so on. There was a thread some time ago about somebody > > banging into the limit when mounting certain ZFS filesystems, here: > > > > https://lists.freebsd.org/pipermail/freebsd-current/2012-December/038208.html > > > > This is why Kostik added printing of the frame addresses to the panic > > backtrace output, so you can easily see if you hit the stack limit. > > > > That said, 8k is not much these days, especially not with fairly > > complicated code like ZFS, combined with high optimization, which can > > inline a lot of functions, causing even more stack usage. I would just > > bump KSTACK_PAGES. > > > > -Dimitry > > > > Is this something that can be bumped in the tree for GENERIC? The cost of the increased size for kernel stack is significant, even on architectures with ample KVA. This must not be done just because some non-default kernel settings cause stack overflow. If somebody feels himself qualified enough to tune compiler options, it must understand the consequences and do other required adjustments, including kernel stack size tuning. 
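As a rough illustration of that cost, the sketch below multiplies the per-thread stack size by a thread count. It is a back-of-the-envelope userland calculation, not kernel code: the helper names kstack_bytes() and kstack_growth(), the 4096-byte page size, and any thread count fed to it are assumptions, and guard pages are ignored.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size */

/* Bytes of kernel stack needed for nthreads threads at `pages` pages each. */
static uint64_t
kstack_bytes(uint64_t nthreads, uint64_t pages)
{
	return (nthreads * pages * PAGE_SIZE);
}

/*
 * Extra bytes needed when every thread's kernel stack grows from
 * old_pages to new_pages (new_pages is expected to be >= old_pages).
 */
static uint64_t
kstack_growth(uint64_t nthreads, uint64_t old_pages, uint64_t new_pages)
{
	return (kstack_bytes(nthreads, new_pages) -
	    kstack_bytes(nthreads, old_pages));
}
```

With a hypothetical 10000 threads, going from 2 to 4 pages per stack adds 10000 * 2 * 4096 = 81920000 bytes (about 78 MiB) of kernel stack, whether or not any thread ever uses the extra space.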
FWIW, there was an old reason why -O0 did not work for the kernel. The cpufunc.h inlines are not provided in a non-inline version, and at least gcc at -O0 sometimes generated a call to a nonexistent function, leading to a link failure. It is curious that clang always inlines at -O0, but it is possible, although unlikely, that the kernel source was changed to be immune.

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 14:02:39 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8D72DD65 for ; Thu, 2 Oct 2014 14:02:39 +0000 (UTC) Received: from mailgate.gta.com (mailgate.gta.com [199.120.225.23]) by mx1.freebsd.org (Postfix) with ESMTP id 4051384A for ; Thu, 2 Oct 2014 14:02:38 +0000 (UTC) Received: (qmail 59208 invoked by uid 1000); 2 Oct 2014 14:02:32 -0000 Date: Thu, 2 Oct 2014 10:02:32 -0400 From: Larry Baird To: Konstantin Belousov Subject: Re: Kernel/Compiler bug Message-ID: <20141002140232.GA52387@gta.com> References: <20141001031553.GA14360@gta.com> <20141001134044.GA57022@gta.com> <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141002075537.GU26076@kib.kiev.ua> User-Agent: Mutt/1.5.23 (2014-03-12) Cc: "freebsd-hackers@freebsd.org" , Ryan Stone , Dimitry Andric , Bryan Drewery X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 14:02:39 -0000

> > Is this something that can be bumped in the tree for GENERIC?
>
> The cost of the increased size for kernel stack is significant, even
> on architectures with ample KVA.
This must not be done just because
> some non-default kernel settings cause stack overflow. If somebody
> feels himself qualified enough to tune compiler options, it must
> understand the consequences and do other required adjustments,
> including kernel stack size tuning.
>
> FWIW, there was old reason why -O0 did not worked for the kernel.
> The cpufunc.h inlines are not provided in non-inline version, and
> at least gcc at -O0 level sometimes generated the call to nonexisting
> function, leading to linking failure. It is curious that clang always
> inlines at -O0, but it is possible, although unlikely, that kernel
> source was changed to be immune.

Overall I agree with your comments. The fact is that I have been using -O0 and -O1 on custom kernels for years. It makes using kgdb much more effective. Both optimization levels work for a custom kernel I have for FreeBSD 10.0 but do not work for FreeBSD 10.1. I just tried turning off optimization for a FreeBSD 10.0 release GENERIC kernel. Same issue. My concern is that optimized kernels may be close to the edge as well. Since people have been running 10.0 for a while without issue, maybe my concern is unfounded. Anybody have any thoughts on how to instrument a kernel build option to check for maximum used stack depth? It would be nice to prove that my concern is unfounded.

Larry

-- 
------------------------------------------------------------------------
Larry Baird Global Technology Associates, Inc.
1992-2012 | http://www.gta.com Celebrating Twenty Years of Software Innovation | Orlando, FL Email: lab@gta.com | TEL 407-380-0220 From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 14:33:54 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 30C5D75A; Thu, 2 Oct 2014 14:33:54 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C7BA1BBE; Thu, 2 Oct 2014 14:33:53 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s92EXlaF037366 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 2 Oct 2014 17:33:47 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s92EXlaF037366 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s92EXjoP037365; Thu, 2 Oct 2014 17:33:45 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 2 Oct 2014 17:33:45 +0300 From: Konstantin Belousov To: Larry Baird Subject: Re: Kernel/Compiler bug Message-ID: <20141002143345.GY26076@kib.kiev.ua> References: <20141001031553.GA14360@gta.com> <20141001134044.GA57022@gta.com> <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua> <20141002140232.GA52387@gta.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141002140232.GA52387@gta.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no 
version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: "freebsd-hackers@freebsd.org" , Ryan Stone , Dimitry Andric , Bryan Drewery X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 14:33:54 -0000 On Thu, Oct 02, 2014 at 10:02:32AM -0400, Larry Baird wrote: > > > Is this something that can be bumped in the tree for GENERIC? > > > > The cost of the increased size for kernel stack is significant, even > > on architectures with ample KVA. This must not be done just because > > some non-default kernel settings cause stack overflow. If somebody > > feels himself qualified enough to tune compiler options, it must > > understand the consequences and do other required adjustments, > > including kernel stack size tuning. > > > > FWIW, there was old reason why -O0 did not worked for the kernel. > > The cpufunc.h inlines are not provided in non-inline version, and > > at least gcc at -O0 level sometimes generated the call to nonexisting > > function, leading to linking failure. It is curious that clang always > > inlines at -O0, but it is possible, although unlikely, that kernel > > source was changed to be immune. > > Overall I aggree with your comments. The fact is that I have been using > -O0 and -O1 on custom kernels for years. It makes using kgdb much more > effective. Both optimization levels work for a custom kernel I have for > FreeBSD 10.0 but do not work for FreeBSD 10.1. I just tried turning off > optimization for a FreeBSD 10.0 release GENERIC kernel. Same issue. My > concern is that opimized kernels may be close to the edge as well. Since > people have been runing 10.0 for a while without issue, maybe me concern > is unfounded. Anybody have any thoughts on how to instrument a kernel > build option to check for maximum used stack depth? 
It would be nice to > prove that my concern is unfounded. The easiest thing to do is to record the stack depth for kernel mode on entry into interrupt. Interrupt handlers are usually well written and do not consume a lot of stack. Look at the intr_event_handle(), which is the entry point. The mode can be deduced from trapframe passed. The kernel stack for the thread is described by td->td_kstack (base, i.e. bottom) and td->td_kstack_pages (size), so the top of the stack is at td_kstack + td_kstack_size [*]. The current stack consumption could be taken from reading %rsp register, or you may take the address of any local variable as well. * - there are pcb and usermode fpu save area at the top of the stack, and actual kernel stack top is right below fpu save area. This should not be important for your measurements, since you are looking at how close the %rsp gets to the bottom. From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 15:08:31 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2D40156E for ; Thu, 2 Oct 2014 15:08:31 +0000 (UTC) Received: from exchange.glccom.com (exchange.glccom.com [209.152.99.146]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (Client CN "exchange.glccom.com", Issuer "Network Solutions DV Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EEC87F73 for ; Thu, 2 Oct 2014 15:08:30 +0000 (UTC) Received: from karen-pc.local.glccom.com (192.168.10.71) by exchange.glccom.com (209.152.99.146) with Microsoft SMTP Server (TLS) id 8.3.83.0; Thu, 2 Oct 2014 10:07:06 -0500 From: Paul Albrecht Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: quoted-printable Subject: freebsd 10 kqueue timer regression Message-ID: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> Date: Thu, 2 Oct 2014 10:07:04 
-0500 To: MIME-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) X-Mailer: Apple Mail (2.1878.6) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 15:08:31 -0000

Hi, What’s up with freebsd 10? I’m testing some code that uses the kqueue timer for timing and it doesn’t work because the precision of the timer is off.

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 15:10:41 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 90DB0866 for ; Thu, 2 Oct 2014 15:10:41 +0000 (UTC) Received: from mail-ig0-x22b.google.com (mail-ig0-x22b.google.com [IPv6:2607:f8b0:4001:c05::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5F5B1B5 for ; Thu, 2 Oct 2014 15:10:41 +0000 (UTC) Received: by mail-ig0-f171.google.com with SMTP id h15so2175934igd.4 for ; Thu, 02 Oct 2014 08:10:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:content-type; bh=rQuiAGSD8ZoajlbtoQHJCQ69z3n47Z6lxvuNveFxn+Q=; b=YE1IMbTlSaKi4iSEfh6DNEipkNW+dAlCjAcbfL/RmcuSSYTx5jmfIws8Eby3Dmd2/0 dKFuIoiTYm4yBCQYv2fxCFUawo1eTEmMVzp2I8ASIz6e6MOLtgP8/lpKKE3IV/Pt3Vii hOpwIBKsn64yAvNEXb6AA5guRaMhwxmlg1sataB2j0MLk4vnzZBmePP6NWgP11HSUA3H Y+W3NW6aVlL01M6eHoBbxlZJZFEHa9KAoTiI0jjfxSinNPnvOR4SLsA71GwhCccYRlvo gFDcs+REAvQUfy3a/Wd9lJLSXFy+uR/Po9d0lUvaqbwyBkGSSNKvpmgqWpTxrNaz7OQ3 5zfA== X-Received: by 10.42.233.75 with SMTP id jx11mr5990979icb.22.1412262640695; Thu, 02 Oct
2014 08:10:40 -0700 (PDT) MIME-Version: 1.0 Sender: carpeddiem@gmail.com Received: by 10.107.44.196 with HTTP; Thu, 2 Oct 2014 08:10:20 -0700 (PDT) In-Reply-To: <20141002075537.GU26076@kib.kiev.ua> References: <20141001031553.GA14360@gta.com> <20141001134044.GA57022@gta.com> <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua> From: Ed Maste Date: Thu, 2 Oct 2014 11:10:20 -0400 X-Google-Sender-Auth: sW-PFH2ZYQAdAilBzlLv48p6vxY Message-ID: Subject: Re: Kernel/Compiler bug To: Konstantin Belousov , "freebsd-hackers@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 15:10:41 -0000 On 2 October 2014 03:55, Konstantin Belousov wrote: > > The cost of the increased size for kernel stack is significant, even > on architectures with ample KVA. This must not be done just because > some non-default kernel settings cause stack overflow. If somebody > feels himself qualified enough to tune compiler options, it must > understand the consequences and do other required adjustments, > including kernel stack size tuning. I wonder if we should have a comment in kern.pre.mk, even if it's just an explicit notice that changing -O can have adverse effects. For better or worse it's a fairly common desire to try changing the kernel's -O. Of course, kern.pre.mk is not intended to accommodate user-facing changes. I suspect it's reasonably common for developers to grep '-O2' in sys/conf and discover where it's getting set though. 
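
For readers following the stack-depth question raised earlier in this thread, the measurement Konstantin describes (compare the current stack position against the bottom of the thread's kernel stack) can be sketched in plain C. This is only an illustration under stated assumptions: the `sketch_*` names and `SKETCH_PAGE_SIZE` are hypothetical, and the arguments stand in for `td->td_kstack`, `td->td_kstack_pages`, and the value of `%rsp` (or the address of a local variable) that the message mentions. It is not kernel code and does not hook `intr_event_handle()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on amd64; the kernel would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE	4096u

/* Hypothetical low-water mark: fewest free stack bytes seen so far. */
static size_t sketch_min_free = SIZE_MAX;

/*
 * Bytes left between the current stack position and the bottom of the
 * stack.  The stack grows down, so this is the number that shrinks
 * toward zero as an overflow approaches.
 */
static size_t
sketch_stack_free(uintptr_t kstack_base, uintptr_t sp)
{
	assert(sp >= kstack_base);
	return ((size_t)(sp - kstack_base));
}

/*
 * Bytes consumed below the nominal top (base + pages * page size).
 * This overstates real usage slightly, because the pcb and the FPU
 * save area sit at the top of the stack.
 */
static size_t
sketch_stack_used(uintptr_t kstack_base, unsigned int pages, uintptr_t sp)
{
	uintptr_t top = kstack_base + (uintptr_t)pages * SKETCH_PAGE_SIZE;

	assert(sp >= kstack_base && sp <= top);
	return ((size_t)(top - sp));
}

/* Record one sample (e.g. on interrupt entry) and return the new minimum. */
static size_t
sketch_record_sample(uintptr_t kstack_base, uintptr_t sp)
{
	size_t nfree = sketch_stack_free(kstack_base, sp);

	if (nfree < sketch_min_free)
		sketch_min_free = nfree;
	return (sketch_min_free);
}
```

Tracking the minimum of the free bytes, rather than the maximum of the used bytes, matches the point made in the footnote of that message: what matters is how close the stack pointer gets to the bottom.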
From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 15:48:51 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D5F38640 for ; Thu, 2 Oct 2014 15:48:51 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B5E7E681 for ; Thu, 2 Oct 2014 15:48:51 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.9/8.14.9) with ESMTP id s92Fmp8k068946 for ; Thu, 2 Oct 2014 15:48:51 GMT (envelope-from bdrewery@freefall.freebsd.org) Received: (from bdrewery@localhost) by freefall.freebsd.org (8.14.9/8.14.9/Submit) id s92FmpVo068945 for freebsd-hackers@freebsd.org; Thu, 2 Oct 2014 15:48:51 GMT (envelope-from bdrewery) Received: (qmail 5099 invoked from network); 2 Oct 2014 10:48:46 -0500 Received: from unknown (HELO ?10.10.0.24?) 
(freebsd@shatow.net@10.10.0.24) by sweb.xzibition.com with ESMTPA; 2 Oct 2014 10:48:46 -0500 Message-ID: <542D73D3.9040109@FreeBSD.org> Date: Thu, 02 Oct 2014 10:48:35 -0500 From: Bryan Drewery Organization: FreeBSD User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2 MIME-Version: 1.0 To: freebsd-hackers@freebsd.org Subject: Re: Kernel/Compiler bug References: <20141001031553.GA14360@gta.com> <20141001134044.GA57022@gta.com> <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua> <20141002140232.GA52387@gta.com> In-Reply-To: <20141002140232.GA52387@gta.com> OpenPGP: id=6E4697CF; url=http://www.shatow.net/bryan/bryan2.asc Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="DlOo11nMxAEEGbmuWhmuxvn8p6c5Ufx60" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 15:48:52 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --DlOo11nMxAEEGbmuWhmuxvn8p6c5Ufx60 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable

On 10/2/2014 9:02 AM, Larry Baird wrote: >>> Is this something that can be bumped in the tree for GENERIC? >> >> The cost of the increased size for kernel stack is significant, even >> on architectures with ample KVA. This must not be done just because >> some non-default kernel settings cause stack overflow. If somebody >> feels qualified enough to tune compiler options, they must >> understand the consequences and do other required adjustments, >> including kernel stack size tuning. >> >> FWIW, there was an old reason why -O0 did not work for the kernel. >> The cpufunc.h inlines are not provided in non-inline version, and >> at least gcc at -O0 level sometimes generated the call to a nonexistent >> function, leading to linking failure. It is curious that clang always >> inlines at -O0, but it is possible, although unlikely, that kernel >> source was changed to be immune. > > Overall I agree with your comments. The fact is that I have been using > -O0 and -O1 on custom kernels for years. It makes using kgdb much more > effective. Both optimization levels work for a custom kernel I have for > FreeBSD 10.0 but do not work for FreeBSD 10.1. I just tried turning off > optimization for a FreeBSD 10.0 release GENERIC kernel. Same issue. My > concern is that optimized kernels may be close to the edge as well. Since > people have been running 10.0 for a while without issue, maybe my concern > is unfounded. Anybody have any thoughts on how to instrument a kernel > build option to check for maximum used stack depth? It would be nice to > prove that my concern is unfounded. > > Larry >

I think at the very least we should have a DISABLE_OPTIMIZATION option that sets -O0 and increases stack size. I built a kernel with -O0 some months ago and hit a panic on boot and did not look into why. It makes sense now though. It would have been nice if it were more obviously documented or automatically handled.
-- 
Regards, Bryan Drewery

--DlOo11nMxAEEGbmuWhmuxvn8p6c5Ufx60 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (MingW32) iQEcBAEBAgAGBQJULXPXAAoJEDXXcbtuRpfPMOwH/1zk51rg4TtQO3hfH0fQiAp6 eb9JC98X/PFukhkUd3byag2juvuP4hZ1loy6VFn8vE5t0+CriH1QhZk8R67bxDpC uY23gko7PnH+1YL6nsbw7PFJjqQIirtMklTnxecSgiURfKD8btoY+dH4EDjgjJsq 6ButnRgPo8Lz0K5H5JHyatBdUg3dhHk8O0k98HYgVtmcIGhioewW82XsB+2iWdNi mKOBvtD1NSObRByn/4GLNP6VSOPKU6Zh+BdfRofuMTynSQwdRpT+PjwgznQyLNRE cYz9UOCPvHQPa08kVlq6ssJSIH19vKOHIVbW4b/dZlF8kiQLlFr6VbILejYpaF0= =wNoz -----END PGP SIGNATURE----- --DlOo11nMxAEEGbmuWhmuxvn8p6c5Ufx60--

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 16:38:07 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B2F11F03 for ; Thu, 2 Oct 2014 16:38:07 +0000 (UTC) Received: from exchange.glccom.com (exchange.glccom.com [209.152.99.146]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (Client CN "exchange.glccom.com", Issuer "Network Solutions DV Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7F9F7D04 for ; Thu, 2 Oct 2014 16:38:07 +0000 (UTC) Received: from karen-pc.local.glccom.com (192.168.10.71) by exchange.glccom.com (209.152.99.146) with Microsoft SMTP Server (TLS) id 8.3.83.0; Thu, 2 Oct 2014 11:35:59 -0500 From: Paul Albrecht Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: quoted-printable Subject: What happened to the kqueue timer fix?
Message-ID: <80825686-58E8-4042-96C8-B86818F1E138@glccom.com> Date: Thu, 2 Oct 2014 11:35:58 -0500 To: MIME-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) X-Mailer: Apple Mail (2.1878.6) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 16:38:07 -0000

I asked about this problem a while back and got a fix. Here’s a link to the relevant freebsd-hackers list thread: http://lists.freebsd.org/pipermail/freebsd-hackers/2012-July/039907.html

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 17:18:45 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 88C7D250 for ; Thu, 2 Oct 2014 17:18:45 +0000 (UTC) Received: from mail-wg0-x22e.google.com (mail-wg0-x22e.google.com [IPv6:2a00:1450:400c:c00::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 23009230 for ; Thu, 2 Oct 2014 17:18:44 +0000 (UTC) Received: by mail-wg0-f46.google.com with SMTP id l18so929145wgh.29 for ; Thu, 02 Oct 2014 10:18:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=A0aeVpkE/JUSi89V/LNMJlYXGf8tdWFzjoGLZE/wUcI=; b=x15e9Av+4tij0skVDuYDcGRAD6AQSnhdYEHMnN5bCZlJmiTdB4Hw+CIAMiSv47MSQa hZ/9mNWU7RZiu+jdYlHC3Xt7/nqXEjPWahGJt/Gt/YXxLA1yCW8VgWzlLLK2o4HQ4D7M 4bXJ46sNz+DNbXHJ0nYUzNxXGwBIEgnJJ25qpjU7j0Ew8Hp/jD6AaAduWcQWKS2p/7e5
FfKHmpneCwHrAZ9xGZQhzihJ5atbs8lWDkX6ygNq37+6Oj6aDwKDO5sZOIOaPLEH8eJX o4LezUWNMXUffJLojvvxxR/bObo9CnjlGgF/KI3NaewsXYjEdcEDE/R/kv3xDwd83g1m EiOA== MIME-Version: 1.0 X-Received: by 10.194.177.226 with SMTP id ct2mr443250wjc.20.1412270323410; Thu, 02 Oct 2014 10:18:43 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.216.106.136 with HTTP; Thu, 2 Oct 2014 10:18:43 -0700 (PDT) In-Reply-To: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> Date: Thu, 2 Oct 2014 10:18:43 -0700 X-Google-Sender-Auth: -KkcerpE5_vTv5dqcJl_P5KClsM Message-ID: Subject: Re: freebsd 10 kqueue timer regression From: Adrian Chadd To: Paul Albrecht Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: "freebsd-hackers@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 17:18:45 -0000

On 2 October 2014 08:07, Paul Albrecht wrote: > > Hi, > > What’s up with freebsd 10? I’m testing some code that uses the kqueue timer for timing and it doesn’t work because the precision of the timer is off.

Can you provide a test case for it? I just chased down one of those recently; maybe it's the same thing (callout() API changes.)
-a From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 17:28:50 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C2AED922; Thu, 2 Oct 2014 17:28:50 +0000 (UTC) Received: from mho-01-ewr.mailhop.org (mho-03-ewr.mailhop.org [204.13.248.66]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 952283A5; Thu, 2 Oct 2014 17:28:49 +0000 (UTC) Received: from [73.34.117.227] (helo=ilsoft.org) by mho-01-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.72) (envelope-from ) id 1XZkBG-000COj-Ly; Thu, 02 Oct 2014 17:28:42 +0000 Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by ilsoft.org (8.14.9/8.14.9) with ESMTP id s92HSfks019546; Thu, 2 Oct 2014 11:28:41 -0600 (MDT) (envelope-from ian@FreeBSD.org) X-Mail-Handler: Dyn Standard SMTP by Dyn X-Originating-IP: 73.34.117.227 X-Report-Abuse-To: abuse@dyndns.com (see http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse reporting information) X-MHO-User: U2FsdGVkX184q0nZxJ/ydENIXYKy40Pa X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan [172.22.42.240] claimed to be [172.22.42.240] Subject: Re: freebsd 10 kqueue timer regression From: Ian Lepore To: Adrian Chadd In-Reply-To: References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> Content-Type: text/plain; charset="iso-8859-7" Date: Thu, 02 Oct 2014 11:28:40 -0600 Message-ID: <1412270920.12052.3.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by ilsoft.org id s92HSfks019546 Cc: "freebsd-hackers@freebsd.org" , Paul Albrecht X-BeenThere: 
freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 17:28:50 -0000

On Thu, 2014-10-02 at 10:18 -0700, Adrian Chadd wrote: > On 2 October 2014 08:07, Paul Albrecht wrote: > > > > Hi, > > > > What’s up with freebsd 10? I’m testing some code that uses the kqueue timer for timing and it doesn’t work because the precision of the timer is off. > > Can you provide a test case for it? > > I just chased down one of those recently; maybe it's the same thing > (callout() API changes.) >

The old mail thread he cited contains test code: http://lists.freebsd.org/pipermail/freebsd-hackers/2012-July/039907.html

-- Ian

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 17:51:58 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C0E872D4; Thu, 2 Oct 2014 17:51:58 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "funkthat.com", Issuer "funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 95E168CD; Thu, 2 Oct 2014 17:51:58 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s92HpuTd078739 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 2 Oct 2014 10:51:57 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s92HpuKv078738; Thu, 2 Oct 2014 10:51:56 -0700 (PDT) (envelope-from jmg) Date: Thu, 2 Oct 2014 10:51:56 -0700 From: John-Mark Gurney To: Konstantin Belousov Subject: Re: Kernel/Compiler bug
Message-ID: <20141002175156.GM43300@funkthat.com> Mail-Followup-To: Konstantin Belousov , Larry Baird , "freebsd-hackers@freebsd.org" , Ryan Stone , Dimitry Andric , Bryan Drewery References: <20141001031553.GA14360@gta.com> <20141001134044.GA57022@gta.com> <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua> <20141002140232.GA52387@gta.com> <20141002143345.GY26076@kib.kiev.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141002143345.GY26076@kib.kiev.ua> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Thu, 02 Oct 2014 10:51:57 -0700 (PDT) Cc: Dimitry Andric , "freebsd-hackers@freebsd.org" , Ryan Stone , Larry Baird , Bryan Drewery X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 17:51:59 -0000 Konstantin Belousov wrote this message on Thu, Oct 02, 2014 at 17:33 +0300: > On Thu, Oct 02, 2014 at 10:02:32AM -0400, Larry Baird wrote: > > > > Is this something that can be bumped in the tree for GENERIC? > > > > > > The cost of the increased size for kernel stack is significant, even > > > on architectures with ample KVA. This must not be done just because > > > some non-default kernel settings cause stack overflow. 
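If somebody > > > feels qualified enough to tune compiler options, they must > > > understand the consequences and do other required adjustments, > > > including kernel stack size tuning. > > > > > > FWIW, there was an old reason why -O0 did not work for the kernel. > > > The cpufunc.h inlines are not provided in non-inline version, and > > > at least gcc at -O0 level sometimes generated the call to a nonexistent > > > function, leading to linking failure. It is curious that clang always > > > inlines at -O0, but it is possible, although unlikely, that kernel > > > source was changed to be immune. > > > > Overall I agree with your comments. The fact is that I have been using > > -O0 and -O1 on custom kernels for years. It makes using kgdb much more > > effective. Both optimization levels work for a custom kernel I have for > > FreeBSD 10.0 but do not work for FreeBSD 10.1. I just tried turning off > > optimization for a FreeBSD 10.0 release GENERIC kernel. Same issue. My > > concern is that optimized kernels may be close to the edge as well. Since > > people have been running 10.0 for a while without issue, maybe my concern > > is unfounded. Anybody have any thoughts on how to instrument a kernel > > build option to check for maximum used stack depth? It would be nice to > > prove that my concern is unfounded. > > The easiest thing to do is to record the stack depth for kernel mode > on entry into interrupt. Interrupt handlers are usually well written > and do not consume a lot of stack. > > Look at the intr_event_handle(), which is the entry point. The mode can > be deduced from trapframe passed. The kernel stack for the thread is > described by td->td_kstack (base, i.e. bottom) and td->td_kstack_pages > (size), so the top of the stack is at td_kstack + td_kstack_size [*]. > The current stack consumption could be taken from reading %rsp register, > or you may take the address of any local variable as well. > > * - there are pcb and usermode fpu save area at the top of the stack, and > actual kernel stack top is right below fpu save area. This should not > be important for your measurements, since you are looking at how close > the %rsp gets to the bottom.

There once was a script that would print out stack usage for each function in the kernel... This could help identify functions that use too much stack... I poked around in tools, but couldn't find it.

-- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 18:13:52 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AA0A8491; Thu, 2 Oct 2014 18:13:52 +0000 (UTC) Received: from exchange.glccom.com (exchange.glccom.com [209.152.99.146]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (Client CN "exchange.glccom.com", Issuer "Network Solutions DV Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5FB02B6B; Thu, 2 Oct 2014 18:13:51 +0000 (UTC) Received: from karen-pc.local.glccom.com (192.168.10.71) by exchange.glccom.com (209.152.99.146) with Microsoft SMTP Server (TLS) id 8.3.83.0; Thu, 2 Oct 2014 13:13:38 -0500 MIME-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: freebsd 10 kqueue timer regression From: Paul Albrecht In-Reply-To: Date: Thu, 2 Oct 2014 13:13:36 -0500 Message-ID: <8587D819-AA2F-4387-A4E9-523014384672@glccom.com> References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> To: Adrian Chadd X-Mailer: Apple Mail (2.1878.6) Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: "freebsd-hackers@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org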
> > * - there are pcb and usermode fpu save area at the top of the stack, and > actual kernel stack top is right below fpu save area. This should not > be important for your measurements, since you are looking at how close > the %rsp gets to the bottom. There once was a script that would print out stack usage for each function in the kernel... This could help identify functions that use too much stack... I poked around in tools, but couldn't find it.. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 18:13:52 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AA0A8491; Thu, 2 Oct 2014 18:13:52 +0000 (UTC) Received: from exchange.glccom.com (exchange.glccom.com [209.152.99.146]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (Client CN "exchange.glccom.com", Issuer "Network Solutions DV Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5FB02B6B; Thu, 2 Oct 2014 18:13:51 +0000 (UTC) Received: from karen-pc.local.glccom.com (192.168.10.71) by exchange.glccom.com (209.152.99.146) with Microsoft SMTP Server (TLS) id 8.3.83.0; Thu, 2 Oct 2014 13:13:38 -0500 MIME-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: freebsd 10 kqueue timer regression From: Paul Albrecht In-Reply-To: Date: Thu, 2 Oct 2014 13:13:36 -0500 Message-ID: <8587D819-AA2F-4387-A4E9-523014384672@glccom.com> References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> To: Adrian Chadd X-Mailer: Apple Mail (2.1878.6) Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: "freebsd-hackers@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org 
X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 18:13:52 -0000

On Oct 2, 2014, at 12:18 PM, Adrian Chadd wrote:

> On 2 October 2014 08:07, Paul Albrecht wrote:
>>
>> Hi,
>>
>> What’s up with freebsd 10? I’m testing some code that uses the kqueue timer for timing and it doesn’t work because the precision of the timer is off.
>
> Can you provide a test case for it?

Here’s the code:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
        int i,msec;
        int kq,nev;
        struct kevent inqueue;
        struct kevent outqueue;
        struct timeval start,end;

        if ((kq = kqueue()) == -1) {
                fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
                exit(EXIT_FAILURE);
        }
        EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);

        gettimeofday(&start, 0);
        for (i = 0; i < 50; i++) {
                if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
                        fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
                        exit(EXIT_FAILURE);
                } else if (outqueue.flags & EV_ERROR) {
                        fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
                        exit(EXIT_FAILURE);
                }
        }
        gettimeofday(&end, 0);

        msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);

        printf("msec = %d\n", msec);

        close(kq);
        return EXIT_SUCCESS;
}

When I run it on my system I get these results:

./a.out
msec = 1072
./a.out
msec = 1071
./a.out
msec = 1071

Which is over by about 3.5 times the wait time per second.

>
> I just chased down one of those recently; maybe it's the same thing
> (callout() API changes.)
>
>
> -a

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 19:42:41 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1AA02A3 for ; Thu, 2 Oct 2014 19:42:41 +0000 (UTC) Received: from mail-wg0-x22e.google.com (mail-wg0-x22e.google.com [IPv6:2a00:1450:400c:c00::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A93AF827 for ; Thu, 2 Oct 2014 19:42:40 +0000 (UTC) Received: by mail-wg0-f46.google.com with SMTP id l18so1245952wgh.29 for ; Thu, 02 Oct 2014 12:42:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=z8Gjy26b2CKg3+TWhoREzfBDFEGGVL+SUgvqozwiyhE=; b=RDhbVZNUovkijEBLo1LT4ZaKZjrr2DXYkBqvq0NDDrRu28f2fy9U66bs+56LGNJYdZ prtdhPh9Jd7/UTLeBMj9NQsaA2mzpEUFHmE8es21Q4hBJSOFzHMVZaeYi/T8Y28HdGCQ NyjjlzJiEGVASJaiyN2F7f/ysXElwcvb9gY08ZwbaFnZlt6Jnk2m+Nt7ZevL044tsbD8 c6xp8Bg2bgUybOrHxeRoVAnEgOlJlKAuLa4Go07SHvWm6Y5nbhbc0mwBN0mOB6wJjLud KyQhZTJRAMZvk88Wi76GGjbyVZFdvPnNbntdlAauXJpHllZCs1Uw/2vVvJFBR61+dV+w sjiA== MIME-Version: 1.0 X-Received: by 10.194.202.138 with SMTP id ki10mr1331291wjc.68.1412278959001; Thu, 02 Oct 2014 12:42:39 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.216.106.136 with HTTP; Thu, 2 Oct 2014 12:42:38 -0700 (PDT) In-Reply-To: <8587D819-AA2F-4387-A4E9-523014384672@glccom.com> References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> <8587D819-AA2F-4387-A4E9-523014384672@glccom.com> Date: Thu, 2 Oct 2014 12:42:38 -0700 X-Google-Sender-Auth: 0bIwtvTWfQ-Olm2GjOTW7wmlS5Y Message-ID: Subject: Re: freebsd 10 kqueue timer regression
From: Adrian Chadd To: Paul Albrecht Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: "freebsd-hackers@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 19:42:41 -0000

Right, and jhb@ mentioned callout() and related stuff.

Let me take a look. I bet it's not doing things "right".

-a

On 2 October 2014 11:13, Paul Albrecht wrote:
>
> On Oct 2, 2014, at 12:18 PM, Adrian Chadd wrote:
>
> On 2 October 2014 08:07, Paul Albrecht wrote:
>
>
> Hi,
>
> What’s up with freebsd 10? I’m testing some code that uses the kqueue timer
> for timing and it doesn’t work because the precision of the timer is off.
>
>
> Can you provide a test case for it?
>
>
> Here’s the code:
>
> #include <sys/types.h>
> #include <sys/event.h>
> #include <sys/time.h>
> #include <errno.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> int
> main(void)
> {
>         int i,msec;
>         int kq,nev;
>         struct kevent inqueue;
>         struct kevent outqueue;
>         struct timeval start,end;
>
>         if ((kq = kqueue()) == -1) {
>                 fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
>                 exit(EXIT_FAILURE);
>         }
>         EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
>
>         gettimeofday(&start, 0);
>         for (i = 0; i < 50; i++) {
>                 if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
>                         fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
>                         exit(EXIT_FAILURE);
>                 } else if (outqueue.flags & EV_ERROR) {
>                         fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
>                         exit(EXIT_FAILURE);
>                 }
>         }
>         gettimeofday(&end, 0);
>
>         msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);
>
>         printf("msec = %d\n", msec);
>
>         close(kq);
>         return EXIT_SUCCESS;
> }
>
> When I run it on my system I get these results:
>
> ./a.out
> msec = 1072
> ./a.out
> msec = 1071
> ./a.out
> msec = 1071
>
> Which is over by about 3.5 times the wait time per second.
>
>
>
> I just chased down one of those recently; maybe it's the same thing
> (callout() API changes.)
>
>
> -a
>
>

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 19:47:30 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 157E621A for ; Thu, 2 Oct 2014 19:47:30 +0000 (UTC) Received: from mail-wg0-x22c.google.com (mail-wg0-x22c.google.com [IPv6:2a00:1450:400c:c00::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A40FB86D for ; Thu, 2 Oct 2014 19:47:29 +0000 (UTC) Received: by mail-wg0-f44.google.com with SMTP id y10so4072329wgg.3 for ; Thu, 02 Oct 2014 12:47:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=9jDOrJzOz5mQQdPz44lMLmbvyRzG05ua41sSHeW4nN4=; b=eHgBvQQI6mcmt5/TlHDxeE59lTWbnvTvN6L66zWWbx8v8nWu/vLjw7s69OGZXQ7H8l NWI7frkjrW1o5pKM4f6p1hmvMD0NuJxEJOAdQdBuNZNLNsrCY54AmdR/d9xQ6cOejmv6 PiOaxTjuRhSMMee+ebWa2Or8OhtiJZKYvctU2TjLGiY+nJpQNC/1ADkM5YkvGMFlIjY7
gn4HlD+sHyJez57oWQkdcr8GAGkwZPDIR0QIj5ghWDN5tMAtWzW9tkY47ReQoriXWm1i i4REj4mIYJ3+Hwk8aUy1N1pTPQiU/4giQcCuzYQeviYQvkkJii+jxR+p89gos2ev/IeT u3Gw== MIME-Version: 1.0 X-Received: by 10.180.74.203 with SMTP id w11mr6953561wiv.26.1412279248013; Thu, 02 Oct 2014 12:47:28 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.216.106.136 with HTTP; Thu, 2 Oct 2014 12:47:27 -0700 (PDT) In-Reply-To: References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> <8587D819-AA2F-4387-A4E9-523014384672@glccom.com> Date: Thu, 2 Oct 2014 12:47:27 -0700 X-Google-Sender-Auth: fUnUrAZBtFheyBVcIIbks7rIbtQ Message-ID: Subject: Re: freebsd 10 kqueue timer regression From: Adrian Chadd To: Paul Albrecht Content-Type: text/plain; charset=UTF-8 Cc: "freebsd-hackers@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Oct 2014 19:47:30 -0000 I'm confused; it's doing 50 loops of a 20msec timer, right? So that's 1000ms. 
-a

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 19:53:36 2014
Subject: Re: freebsd 10 kqueue timer regression
From: Ian Lepore
To: Adrian Chadd
Cc: "freebsd-hackers@freebsd.org", Paul Albrecht
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> <8587D819-AA2F-4387-A4E9-523014384672@glccom.com>
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 02 Oct 2014 13:53:28 -0600
Message-ID: <1412279608.12052.24.camel@revolution.hippie.lan>

On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
> I'm confused; it's doing 50 loops of a 20msec timer, right? So that's 1000ms.

Yes, so the entire loop should take 1000ms maybe + 1ms. Instead it
takes 1070. When I run it on an armv6 system running -current it takes
1050. When I run it on my 8.4 desktop (pre-eventtimers) it takes 1013.

-- Ian

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 21:21:35 2014
From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Adrian Chadd, Paul Albrecht, Ian Lepore
Subject: Re: freebsd 10 kqueue timer regression
Date: Thu, 2 Oct 2014 16:00:17 -0400
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> <1412279608.12052.24.camel@revolution.hippie.lan>
In-Reply-To: <1412279608.12052.24.camel@revolution.hippie.lan>
Content-Type: Text/Plain; charset="iso-8859-1"
Message-Id: <201410021600.17740.jhb@freebsd.org>

On Thursday, October 02, 2014 3:53:28 pm Ian Lepore wrote:
> On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
> > I'm confused; it's doing 50 loops of a 20msec timer, right? So that's 1000ms.
>
> Yes, so the entire loop should take 1000ms maybe + 1ms. Instead it
> takes 1070. When I run it on an armv6 system running -current it takes
> 1050. When I run it on my 8.4 desktop (pre-eventtimers) it takes 1013.
>
> -- Ian

What if you set kern.eventtimer.periodic=1?

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 2 22:15:10 2014
Subject: Re: freebsd 10 kqueue timer regression
From: Ian Lepore
To: John Baldwin
Cc: freebsd-hackers@freebsd.org, Adrian Chadd, Paul Albrecht
In-Reply-To: <201410021600.17740.jhb@freebsd.org>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> <1412279608.12052.24.camel@revolution.hippie.lan> <201410021600.17740.jhb@freebsd.org>
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 02 Oct 2014 16:15:06 -0600
Message-ID: <1412288106.12052.39.camel@revolution.hippie.lan>

On Thu, 2014-10-02 at 16:00 -0400, John Baldwin wrote:
> On Thursday, October 02, 2014 3:53:28 pm Ian Lepore wrote:
> > On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
> > > I'm confused; it's doing 50 loops of a 20msec timer, right? So that's 1000ms.
> >
> > Yes, so the entire loop should take 1000ms maybe + 1ms. Instead it
> > takes 1070. When I run it on an armv6 system running -current it takes
> > 1050. When I run it on my 8.4 desktop (pre-eventtimers) it takes 1013.
> >
> > -- Ian
>
> What if you set kern.eventtimer.periodic=1?
>

Some interesting results...

HZ              100     500     1000
---------------------------------
periodic=0      1050    1050    1080
periodic=1      1110    1012    1049

The 1080 number was +/- 3ms, all the other numbers were +/- 1ms (except
for one outlier of 24363 at 100Hz non-periodic which I'm going to
pretend didn't happen).
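
One way to read totals like these, offered only as an assumed toy model and
not as a description of what the kqueue or callout code really does: if each
20ms timer is re-armed relative to the moment the previous event was
serviced, any per-event latency is paid on every one of the 50 events; if it
is re-armed relative to the previous deadline, the latency is paid once. The
function name simulate_total_ms and the 1ms latency used in the note after
the code are made up for illustration.

```c
#include <assert.h>

/*
 * Toy model of a periodic timer.  All names here are illustrative
 * assumptions, not kernel interfaces.
 *
 * Returns the time (in ms) at which the n-th event is serviced, assuming
 * every event is serviced latency_ms after its deadline.  If 'absolute'
 * is non-zero, the next deadline is the previous deadline plus the
 * interval; otherwise it is the service time plus the interval.
 */
long
simulate_total_ms(int n, long interval_ms, long latency_ms, int absolute)
{
	long deadline, serviced;
	int i;

	assert(n > 0 && interval_ms > 0 && latency_ms >= 0);

	serviced = 0;
	deadline = interval_ms;
	for (i = 0; i < n; i++) {
		serviced = deadline + latency_ms;
		if (absolute)
			deadline += interval_ms;
		else
			deadline = serviced + interval_ms;
	}
	return (serviced);
}
```

With 50 events, a 20ms interval and an assumed 1ms latency, the relative
variant returns 1050 and the absolute variant returns 1001. The model says
nothing about where such latency would come from.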

The 1050 numbers are probably each 20ms sleep actually taking 21ms, but
the old tvtohz code with -1 adjustments from the old email thread isn't
in play anymore. I don't know how to account for the other numbers at
all. There's all kinds of stuff I don't understand in the new code
involving tick thresholds and such.

-- Ian

From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 3 00:49:53 2014
Subject: Re: freebsd 10 kqueue timer regression
From: Ian Lepore
To: John Baldwin
Cc: freebsd-hackers@freebsd.org, Adrian Chadd, Paul Albrecht
In-Reply-To: <1412288106.12052.39.camel@revolution.hippie.lan>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> <1412279608.12052.24.camel@revolution.hippie.lan> <201410021600.17740.jhb@freebsd.org> <1412288106.12052.39.camel@revolution.hippie.lan>
Content-Type: multipart/mixed; boundary="=-5ZH0QgKRoICHPsi7fRHH"
Date: Thu, 02 Oct 2014 18:49:49 -0600
Message-ID: <1412297389.12052.46.camel@revolution.hippie.lan>

--=-5ZH0QgKRoICHPsi7fRHH
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Thu, 2014-10-02 at 16:15 -0600, Ian Lepore wrote:
> On Thu, 2014-10-02 at 16:00 -0400, John Baldwin wrote:
> > On Thursday, October 02, 2014 3:53:28 pm Ian Lepore wrote:
> > > On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
> > > > I'm confused; it's doing 50 loops of a 20msec timer, right? So that's 1000ms.
> > >
> > > Yes, so the entire loop should take 1000ms maybe + 1ms. Instead it
> > > takes 1070. When I run it on an armv6 system running -current it takes
> > > 1050. When I run it on my 8.4 desktop (pre-eventtimers) it takes 1013.
> > >
> > > -- Ian
> >
> > What if you set kern.eventtimer.periodic=1?
> >
>
> Some interesting results...
>
> HZ              100     500     1000
> ---------------------------------
> periodic=0      1050    1050    1080
> periodic=1      1110    1012    1049
>
>
> The 1080 number was +/- 3ms, all the other numbers were +/- 1ms (except
> for one outlier of 24363 at 100Hz non-periodic which I'm going to
> pretend didn't happen).
>
> The 1050 numbers are probably each 20ms sleep actually taking 21ms, but
> the old tvtohz code with -1 adjustments from the old email thread isn't
> in play anymore. I don't know how to account for the other numbers at
> all. There's all kinds of stuff I don't understand in the new code
> involving tick thresholds and such.
>
> -- Ian
>

The attached patch seems to fix the problem in what I think is the most
correct way: scheduling the callout with absolute times based on the
time the current event was scheduled for plus the requested interval.
The net effect should be metronomic events that do not drift (or phase
shift if you prefer) over time, regardless of any latency involved in
processing the events.

This makes all the numbers in the tests I ran above come out 1000.

It doesn't make me understand the strange results from the prior tests
any better.

-- Ian

--=-5ZH0QgKRoICHPsi7fRHH
Content-Disposition: inline; filename="kevent_timer_fix.diff"
Content-Type: text/x-patch; name="kevent_timer_fix.diff"; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Index: sys/sys/event.h
===================================================================
--- sys/sys/event.h	(revision 272181)
+++ sys/sys/event.h	(working copy)
@@ -221,6 +221,7 @@ struct knote {
 		struct proc *p_proc;		/* proc pointer */
 		struct aiocblist *p_aio;	/* AIO job pointer */
 		struct aioliojob *p_lio;	/* LIO job pointer */
+		sbintime_t *p_nexttime;		/* next timer event fires at */
 		void *p_v;			/* generic other pointer */
 	} kn_ptr;
 	struct filterops *kn_fop;
Index: sys/kern/kern_event.c
===================================================================
--- sys/kern/kern_event.c	(revision 272181)
+++ sys/kern/kern_event.c	(working copy)
@@ -569,9 +569,10 @@ filt_timerexpire(void *knx)
 
 	if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) {
 		calloutp = (struct callout *)kn->kn_hook;
-		callout_reset_sbt_on(calloutp,
-		    timer2sbintime(kn->kn_sdata, kn->kn_sfflags), 0,
-		    filt_timerexpire, kn, PCPU_GET(cpuid), 0);
+		*kn->kn_ptr.p_nexttime += timer2sbintime(kn->kn_sdata,
+		    kn->kn_sfflags);
+		callout_reset_sbt_on(calloutp, *kn->kn_ptr.p_nexttime, 0,
+		    filt_timerexpire, kn, PCPU_GET(cpuid), C_ABSOLUTE);
 	}
 }
 
@@ -607,11 +608,13 @@ filt_timerattach(struct knote *kn)
 
 	kn->kn_flags |= EV_CLEAR;		/* automatically set */
 	kn->kn_status &= ~KN_DETACHED;		/* knlist_add clears it */
+	kn->kn_ptr.p_nexttime = malloc(sizeof(sbintime_t), M_KQUEUE, M_WAITOK);
 	calloutp = malloc(sizeof(*calloutp), M_KQUEUE, M_WAITOK);
 	callout_init(calloutp, CALLOUT_MPSAFE);
 	kn->kn_hook = calloutp;
-	callout_reset_sbt_on(calloutp, to, 0,
-	    filt_timerexpire, kn, PCPU_GET(cpuid), 0);
+	*kn->kn_ptr.p_nexttime = to + sbinuptime();
+	callout_reset_sbt_on(calloutp, *kn->kn_ptr.p_nexttime, 0,
+	    filt_timerexpire, kn, PCPU_GET(cpuid), C_ABSOLUTE);
 
 	return (0);
 }
@@ -625,6 +628,7 @@ filt_timerdetach(struct knote *kn)
 	calloutp = (struct callout *)kn->kn_hook;
 	callout_drain(calloutp);
 	free(calloutp, M_KQUEUE);
+	free(kn->kn_ptr.p_nexttime, M_KQUEUE);
 	old = atomic_fetch_sub_explicit(&kq_ncallouts, 1, memory_order_relaxed);
 	KASSERT(old > 0, ("Number of callouts cannot become negative"));
 	kn->kn_status |= KN_DETACHED;	/* knlist_remove sets it */

--=-5ZH0QgKRoICHPsi7fRHH--

From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 3 01:54:58 2014
Date: Thu, 2 Oct 2014 21:54:56 -0400
From: Larry Baird
To: Konstantin Belousov
Cc: "freebsd-hackers@freebsd.org", Ryan Stone, Dimitry Andric, Bryan Drewery
Subject: Re: Kernel/Compiler bug
Message-ID: <20141003015456.GA27080@gta.com>
References: <20141001031553.GA14360@gta.com> <20141001134044.GA57022@gta.com> <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua> <20141002140232.GA52387@gta.com> <20141002143345.GY26076@kib.kiev.ua>
In-Reply-To: <20141002143345.GY26076@kib.kiev.ua>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

> The easiest thing to do is to record the stack depth for kernel mode
> on entry into interrupt. Interrupt handlers are usually well written
> and do not consume a lot of stack.
>
> Look at the intr_event_handle(), which is the entry point. The mode can
> be deduced from trapframe passed. The kernel stack for the thread is
> described by td->td_kstack (base, i.e. bottom) and td->td_kstack_pages
> (size), so the top of the stack is at td_kstack + td_kstack_size [*].
> The current stack consumption could be taken from reading %rsp register,
> or you may take the address of any local variable as well.
>
> * - there are pcb and usermode fpu save area at the top of the stack, and
> actual kernel stack top is right below fpu save area. This should not
> be important for your measurements, since you are looking at how close
> the %rsp gets to the bottom.

This idea worked very well. Booting a GENERIC 10.1-BETA3 kernel I get a
maximum stack used of 5247 bytes. This was with a minimal virtual box
configuration. It would be interesting to hear about users with more exotic
hardware and or configurations. Not sure if I have the KASSERT correct.

Index: kern_intr.c
===================================================================
--- kern_intr.c	(revision 44897)
+++ kern_intr.c	(working copy)
@@ -1386,6 +1386,12 @@
 		}
 	}
 
+static int max_kern_thread_stack;
+
+SYSCTL_INT(_kern, OID_AUTO, max_kern_thread_stack, CTLFLAG_RD,
+    &max_kern_thread_stack, 0,
+    "Maximum stack used by a kernel thread");
+
 /*
  * Main interrupt handling body.
  *
@@ -1407,6 +1413,22 @@
 
 	td = curthread;
 
+	/*
+	 * Check for maximum stack used by a kernel thread.
+	 */
+	if (!TRAPF_USERMODE(frame)) {
+		char *top = (char *)(td->td_kstack + td->td_kstack_pages *
+		    PAGE_SIZE - 1);
+		char *current = (char *)&ih;
+		int used = top - current;
+
+		if (used > max_kern_thread_stack) {
+			max_kern_thread_stack = used;
+			KASSERT(max_kern_thread_stack < KSTACK_PAGES * PAGE_SIZE,
+			    "Maximum kernel thread stack exceeded");
+		}
+	}
+
 	/* An interrupt with no event or handlers is a stray interrupt. */
 	if (ie == NULL || TAILQ_EMPTY(&ie->ie_handlers))
 		return (EINVAL);

-- 
------------------------------------------------------------------------
Larry Baird
Global Technology Associates, Inc. 1992-2012    | http://www.gta.com
Celebrating Twenty Years of Software Innovation | Orlando, FL
Email: lab@gta.com                              | TEL 407-380-0220

From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 3 07:35:23 2014
Date: Fri, 3 Oct 2014 10:35:17 +0300
From: Konstantin Belousov
To: Larry Baird
Cc: "freebsd-hackers@freebsd.org", Ryan Stone, Dimitry Andric, Bryan Drewery
Subject: Re: Kernel/Compiler bug
Message-ID: <20141003073517.GC26076@kib.kiev.ua>
References: <20141001031553.GA14360@gta.com> <20141001134044.GA57022@gta.com> <542C8C75.30007@FreeBSD.org> <20141002075537.GU26076@kib.kiev.ua> <20141002140232.GA52387@gta.com> <20141002143345.GY26076@kib.kiev.ua> <20141003015456.GA27080@gta.com>
In-Reply-To: <20141003015456.GA27080@gta.com>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Oct 02, 2014 at 09:54:56PM -0400, Larry Baird wrote:
> > The easiest thing to do is to record the stack depth for kernel mode
> > on entry into interrupt. Interrupt handlers are usually well written
> > and do not consume a lot of stack.
> >
> > Look at the intr_event_handle(), which is the entry point. The mode can
> > be deduced from trapframe passed. The kernel stack for the thread is
> > described by td->td_kstack (base, i.e. bottom) and td->td_kstack_pages
> > (size), so the top of the stack is at td_kstack + td_kstack_size [*].
> > The current stack consumption could be taken from reading %rsp register,
> > or you may take the address of any local variable as well.
> >
> > * - there are pcb and usermode fpu save area at the top of the stack, and
> > actual kernel stack top is right below fpu save area. This should not
> > be important for your measurements, since you are looking at how close
> > the %rsp gets to the bottom.
>
> This idea worked very well. Booting a GENERIC 10.1-BETA3 kernel I get a
> maximum stack used of 5247 bytes. This was with a minimal virtual box
> configuration. It would be interesting to hear about users with more exotic
> hardware and or configurations. Not sure if I have the KASSERT correct.
>
I have several notes. Mostly, they come from my desire to make the patch
committable. A global one is that the profiling of the stack use should
be hidden under some kernel config option.

>
>
> Index: kern_intr.c
> ===================================================================
> --- kern_intr.c	(revision 44897)
> +++ kern_intr.c	(working copy)
> @@ -1386,6 +1386,12 @@
>  		}
>  	}
>  
> +static int max_kern_thread_stack;
Add 'usage' somewhere in the name of the var and sysctl?

> +
> +SYSCTL_INT(_kern, OID_AUTO, max_kern_thread_stack, CTLFLAG_RD,
> +    &max_kern_thread_stack, 0,
> +    "Maximum stack used by a kernel thread");
> +
>  /*
>   * Main interrupt handling body.
>   *
> @@ -1407,6 +1413,22 @@
>  
>  	td = curthread;
>  
> +	/*
> +	 * Check for maximum stack used by a kernel thread.
> +	 */
> +	if (!TRAPF_USERMODE(frame)) {
Just a note: the test for interruption of usermode is not strictly
needed. It only optimizes the execution, since an interrupt from usermode
would have only the trap frame on the stack. Perhaps this should be
noted in a comment.

> +		char *top = (char *)(td->td_kstack + td->td_kstack_pages *
> +		    PAGE_SIZE - 1);
> +		char *current = (char *)&ih;
Use the address of top? It should be deeper in the stack, and account
for the normal stack use of the current function, assuming the compiler
did not flatten out the frame. Anyway, this assumes that the stack
grows down.

Also, there are some situations where the hardware might switch to a
dedicated stack for interrupt handling. That is impossible right now on
amd64 for hw interrupts, but it might be used in the future, or on other
arches. It makes sense to check that the current value falls into the td
stack region before using it.

> +		int used = top - current;
> +
> +		if (used > max_kern_thread_stack) {
> +			max_kern_thread_stack = used;
This should be a loop with an atomic cas, so as not to lose an update
from another thread.

> +			KASSERT(max_kern_thread_stack < KSTACK_PAGES * PAGE_SIZE,
> +			    "Maximum kernel thread stack exceeded");
The assert is not needed; we put a guard page below the thread stack to
catch the overflow. You have already seen it with the double fault on x86.
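
A rough userland sketch of the notes above, for illustration only: check
that the sampled address lies inside the thread's stack region, then publish
a new maximum with a compare-and-swap loop. It uses C11 <stdatomic.h> rather
than the kernel's own atomic primitives, assumes a downward-growing stack,
and the names stack_use_update, stack_use_max and max_stack_use are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global high-water mark, in bytes. */
static _Atomic size_t max_stack_use;

/*
 * Record the stack use implied by 'cur' for a stack that occupies
 * [base, base + size).  The stack is assumed to grow down, so use is
 * measured from the top.  Returns 0 if 'cur' lies outside the region
 * (for example, on a dedicated interrupt stack) and 1 otherwise.
 */
int
stack_use_update(uintptr_t base, size_t size, uintptr_t cur)
{
	size_t used, prev;

	assert(size > 0);
	if (cur < base || cur >= base + size)
		return (0);

	used = (size_t)(base + size - cur);
	prev = atomic_load(&max_stack_use);
	/*
	 * A failed CAS reloads 'prev', so the loop ends once either our
	 * value is stored or another thread has stored a larger one.
	 */
	while (used > prev &&
	    !atomic_compare_exchange_weak(&max_stack_use, &prev, used))
		;
	return (1);
}

/* Read the current high-water mark. */
size_t
stack_use_max(void)
{
	return (atomic_load(&max_stack_use));
}
```

The loop never lowers the stored value, so concurrent callers can only raise
the maximum.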

> +		}
> +	}
> +
>  	/* An interrupt with no event or handlers is a stray interrupt. */
>  	if (ie == NULL || TAILQ_EMPTY(&ie->ie_handlers))
>  		return (EINVAL);

From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 3 14:04:17 2014
From: John Baldwin
To: Ian Lepore
Cc: freebsd-hackers@freebsd.org, Adrian Chadd, Paul Albrecht
Subject: Re: freebsd 10 kqueue timer regression
Date: Fri, 03 Oct 2014 08:50:12 -0400
Message-ID: <2499075.KMdpQjyIZI@ralph.baldwin.cx>
In-Reply-To: <1412297389.12052.46.camel@revolution.hippie.lan>
References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> <1412288106.12052.39.camel@revolution.hippie.lan> <1412297389.12052.46.camel@revolution.hippie.lan>
Content-Type: text/plain; charset="us-ascii"

On Thursday, October 02, 2014 06:49:49 PM Ian Lepore wrote:
> On Thu, 2014-10-02 at 16:15 -0600, Ian Lepore wrote:
> > On Thu, 2014-10-02 at 16:00 -0400, John Baldwin wrote:
> > > On Thursday, October 02, 2014 3:53:28 pm Ian Lepore wrote:
> > > > On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote:
> > > > > I'm confused; it's doing 50 loops of a 20msec timer, right? So
> > > > > that's 1000ms.
> > > >
> > > > Yes, so the entire loop should take 1000ms maybe + 1ms. Instead it
> > > > takes 1070. When I run it on an armv6 system running -current it
> > > > takes 1050. When I run it on my 8.4 desktop (pre-eventtimers) it
> > > > takes 1013.
> > > >
> > > > -- Ian
> > >
> > > What if you set kern.eventtimer.periodic=1?
> >
> > Some interesting results...
> >
> > HZ              100     500     1000
> > ---------------------------------
> > periodic=0      1050    1050    1080
> > periodic=1      1110    1012    1049
> >
> >
> > The 1080 number was +/- 3ms, all the other numbers were +/- 1ms (except
> > for one outlier of 24363 at 100Hz non-periodic which I'm going to
> > pretend didn't happen).
> >
> > The 1050 numbers are probably each 20ms sleep actually taking 21ms, but
> > the old tvtohz code with -1 adjustments from the old email thread isn't
> > in play anymore. I don't know how to account for the other numbers at
> > all. There's all kinds of stuff I don't understand in the new code
> > involving tick thresholds and such.
> >
> > -- Ian
>
> The attached patch seems to fix the problem in what I think is the most
> correct way: scheduling the callout with absolute times based on the
> time the current event was scheduled for plus the requested interval.
> The net effect should be metronomic events that do not drift (or phase
> shift if you prefer) over time, regardless of any latency involved in
> processing the events.
>
> This makes all the numbers in the tests I ran above come out 1000.
>
> It doesn't make me understand the strange results from the prior tests
> any better.
>
> -- Ian

Are you running ntpd or ptpd? If so, perhaps try the original tests without
the patch.

That said, I think one of the reasons the old code worked was that the
previous callout had the equivalent of the C_HARDCLOCK flag set. Thus, when
the timer interrupt fires and we rescheduled for N ticks, it was actually N
ticks -