From: Michael Schuster
Date: Wed, 21 Sep 2016 09:35:03 +0200
Subject: Re: Server gets a high load, but no CPU use, and then later stops respond on the network
To: Ståle Bordal Kristoffersen
Cc: freeBSD Mailing List
In-Reply-To: <20160913232351.GA36091@putsch.kolbu.ws>

Hi,

While I'm not very familiar with FreeBSD internals, I'd like to point out
two things that I think may be relevant:

1) note that '[idle]' seems to be the only thread/process doing significant
work - at a guess, I'd say that's the kernel doing work that cannot be
ascribed to anything else ... housekeeping?
(someone who knows FreeBSD better will have to answer that)

On Wed, Sep 14, 2016 at 1:23 AM, Ståle Bordal Kristoffersen
<chiller@putsch.kolbu.ws> wrote:

>   PID USERNAME THR PRI NICE   SIZE    RES STATE    C   TIME     WCPU COMMAND
>    11 root      24 155 ki31     0K   384K CPU23   23 1206.3 2396.63% [idle]
>     5 root       1 -16    -     0K    16K ipmire  14 100:17    0.00% [ipmi0: kcs]
>     0 root     407  -8    -     0K  6512K -       22  56:20    0.00% [kernel]
>     7 root       2 -16    -     0K    32K umarcl   3   6:21    0.00% [pagedaemon]
>    18 root       1  16    -     0K    16K syncer  14   3:37    0.00% [syncer]
>    12 root      38 -76    -     0K   608K WAIT   255   3:04    0.00% [intr]
>     2 root       6 -16    -     0K    96K -        0   2:41    0.00% [cam]
>    14 root       1 -16    -     0K    16K -       16   1:40    0.00% [rand_harvestq]
>     3 root       9  -8    -     0K   176K tx->tx  20   1:13    0.00% [zfskern]
>    17 root       1 -16    -     0K    16K vlruwt  13   1:10    0.00% [vnlru]
>   762 root       1  20    0 50040K 15212K select  18   0:10    0.00% /usr/local/bin/perl -wT /usr/local/sbin/munin-node
>   620 root       1  20    0 14520K  2044K select  20   0:06    0.00% /usr/sbin/syslogd -s
>    15 root      40 -68    -     0K   640K -        0   0:05    0.00% [usb]
>   686 root       1  20    0 26128K 18044K select  15   0:05    0.00% /usr/sbin/ntpd -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
>   823 root       1  20    0 24156K  5420K select  13   0:02    0.00% sendmail: accepting connections (sendmail)
>     6 root       1 -16    -     0K    16K idle    16   0:02    0.00% [enc_daemon0]
>    16 root       1 -16    -     0K    16K psleep  19   0:00    0.00% [bufdaemon]
>   830 root       1  20    0 16624K   712K nanslp  16   0:00    0.00% /usr/sbin/cron -s
>   826 smmsp      1  20    0 24156K  1056K pause   23   0:00    0.00% sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail)
> 52778 chiller    1  20    0 31060K  5356K pause   23   0:00    0.00% -zsh (zsh)
>   800 root       1  20    0 61316K  5164K select  17   0:00    0.00% /usr/sbin/sshd
>     1 root       1  20    0  9492K   460K wait    22   0:00    0.00% [init]
> 54395 chiller    1  23    0 31060K  5388K pause   23   0:00    0.00% -zsh (zsh)
> 52777 chiller    1  20    0 86584K  7576K select  23   0:00    0.00% sshd: chiller@pts/0 (sshd)
> 54394 chiller    1  20    0 86584K  7616K select  19   0:00    0.00% sshd: chiller@pts/1 (sshd)
>   473 root       1  20    0 13628K  4504K select  22   0:00    0.00% /sbin/devd
> 54441 root       1  20    0 24392K  4064K pause   15   0:00    0.00% -su (zsh)
> 52774 root       1  20    0 86584K  7532K select  19   0:00    0.00% sshd: chiller [priv] (sshd)
> 54050 root       1  20    0 24392K  4064K ttyin   20   0:00    0.00% -su (zsh)
>    13 root       3  -8    -     0K    48K -        4   0:00    0.00% [geom]
> 54389 root       1  20    0 86584K  7568K select  13   0:00    0.00% sshd: chiller [priv] (sshd)
>
> [...]
>

2) look at 'sr' (using a fixed-width font probably helps). In Solaris
(which is where I come from ... a long time ago ;-)) this is "scan rate",
ie the number of pages (per second) the paging mechanism is looking at -
(again on Solaris) this would mean that your system is under some kind of
fairly constant memory pressure - where from I cannot even guess, and given
the "avm" and "fre" columns, this does look very strange ... but that's
what I'd continue my investigation with.

> pusen# vmstat 1
> procs    memory    page                        disks   faults            cpu
> r b w   avm    fre  flt  re  pi  po   fr    sr da0 da1    in    sy    cs us sy  id
> 0 0 0  858M  1449M  335   0   0   1  355  4954   0   0  1917  4403  5302  0  0  99
> 0 0 0  858M  1449M    0   0   0   0    0   265   0   0     1   120    80  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   265   0   0     0   124    81  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   265   0   0     1   120    68  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   265   0   0     1   127    92  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   265   0   0     0   120    91  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   265   0   0     1   121    82  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   265   0   0     0   120    75  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   265   0   0     2   121    96  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   265   0   0     1   126    83  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   217   0   0     1   121    68  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   215   0   0     1   120    88  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   215   0   0     0   121    92  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   215   0   0     0   120    83  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   215   0   0     1   127    90  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   196   0   0     5   120    94  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   196   0   0     1   121    80  0  0 100
> 0 0 0  858M  1449M    0   0   0   0    0   196   0   0     0   123    79  0  0 100
> 1 0 0  858M  1449M    0   0   0   0    0   196   0   0     2   121    76  0  0 100
> 1 0 0  858M  1449M    0   0   0   0    0   196   0   0     4   118   106  0  0 100
> 1 0 0  858M  1449M    0   0   0   0    0   196   0   0     0   112    87  0  0 100

HTH
Michael

--
Michael Schuster
http://recursiveramblings.wordpress.com/
recursion, n: see 'recursion'
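
P.S. If it helps to watch 'sr' over a longer period rather than eyeballing
the columns, here is a minimal sketch in Python that pulls the scan rate out
of captured vmstat text. The function names (parse_vmstat, scan_rates) are
made up for illustration, and it assumes the plain integer layout shown in
this thread; humanized values such as "1.2k" would need extra handling.

```python
def parse_vmstat(text):
    """Return one dict per sample row, keyed by vmstat column name."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "sr" in fields and "fre" in fields:
            # Column header line. 'sy' appears twice (faults and cpu),
            # so the second occurrence is renamed to keep keys unique.
            columns = []
            for name in fields:
                columns.append(name if name not in columns else name + "_cpu")
            continue
        if columns is None or len(fields) != len(columns):
            # Group header ("procs memory ...") or a malformed row: skip it.
            continue
        rows.append(dict(zip(columns, fields)))
    return rows


def scan_rates(rows):
    """Extract the 'sr' column as integers (assumes non-humanized output)."""
    return [int(row["sr"]) for row in rows]


# Three rows copied from the vmstat output quoted in this thread.
SAMPLE = """\
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs us sy id
0 0 0 858M 1449M 335 0 0 1 355 4954 0 0 1917 4403 5302 0 0 99
0 0 0 858M 1449M 0 0 0 0 0 265 0 0 1 120 80 0 0 100
0 0 0 858M 1449M 0 0 0 0 0 196 0 0 0 112 87 0 0 100
"""

if __name__ == "__main__":
    rates = scan_rates(parse_vmstat(SAMPLE))
    print("scan rates:", rates)
    if rates and all(rate > 0 for rate in rates):
        print("'sr' is non-zero in every sample")
```

Saving the output of 'vmstat 1' to a file and passing its contents to
parse_vmstat should work the same way as the embedded sample.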