From owner-freebsd-stable@FreeBSD.ORG Wed Sep 29 05:31:40 2010 Return-Path: Delivered-To: stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 83BE6106564A for ; Wed, 29 Sep 2010 05:31:40 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (adsl-75-1-14-242.dsl.scrm01.sbcglobal.net [75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 3E92D8FC1F for ; Wed, 29 Sep 2010 05:31:39 +0000 (UTC) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id o8T5VRZJ061189; Tue, 28 Sep 2010 22:31:31 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201009290531.o8T5VRZJ061189@gw.catspoiler.org> Date: Tue, 28 Sep 2010 22:31:27 -0700 (PDT) From: Don Lewis To: freebsd@jdc.parodius.com In-Reply-To: <201009282344.o8SNiqSK060715@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: stable@FreeBSD.org, sterling@camdensoftware.com Subject: Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 05:31:40 -0000 On 28 Sep, Don Lewis wrote: > Looking at the timestamps of things and comparing to my logs, I > discovered that the last instance of ntp instability happened when I was > running "make index" in /usr/ports. I tried it again with entertaining > results. After a while, the machine became unresponsive. I was logged > in over ssh and it stopped echoing keystrokes. In parallel I was > running a script that echoed the date, the results of "vmstat -i", and > the results of "ntpq -c pe". The latter showed jitter and offset going > insane. Eventually "make index" finished and the machine was responsive > again, but the time was way off and ntpd croaked because the necessary > time correction was too large. Nothing else anomalous showed up in the > logs. Hmn, about half an hour after ntpd died I started my CPU time > accounting test and two minutes into that test I got a spew of calcru > messages ... I tried this experiment again using a kernel with WITNESS and DEBUG_VFS_LOCKS compiled in, and pinging this machine from another. Things look normal for a while, then the ping times get huge for a while and then recover. 64 bytes from 192.168.101.3: icmp_seq=1169 ttl=64 time=0.135 ms 64 bytes from 192.168.101.3: icmp_seq=1170 ttl=64 time=0.141 ms 64 bytes from 192.168.101.3: icmp_seq=1171 ttl=64 time=0.130 ms 64 bytes from 192.168.101.3: icmp_seq=1172 ttl=64 time=0.131 ms 64 bytes from 192.168.101.3: icmp_seq=1173 ttl=64 time=0.128 ms 64 bytes from 192.168.101.3: icmp_seq=1174 ttl=64 time=38232.140 ms 64 bytes from 192.168.101.3: icmp_seq=1175 ttl=64 time=37231.309 ms 64 bytes from 192.168.101.3: icmp_seq=1176 ttl=64 time=36230.470 ms 64 bytes from 192.168.101.3: icmp_seq=1177 ttl=64 time=35229.632 ms 64 bytes from 192.168.101.3: icmp_seq=1178 ttl=64 time=34228.791 ms 64 bytes from 192.168.101.3: icmp_seq=1179 ttl=64 time=33227.953 ms 64 bytes from 192.168.101.3: icmp_seq=1180 ttl=64 time=32227.091 ms 64 bytes from 192.168.101.3: icmp_seq=1181 ttl=64 time=31226.262 ms 64 bytes from 192.168.101.3: icmp_seq=1182 ttl=64 time=30225.425 ms 64 bytes from 192.168.101.3: icmp_seq=1183 ttl=64 time=29224.597 ms 64 bytes from 192.168.101.3: icmp_seq=1184 ttl=64 time=28223.757 ms 64 bytes from 192.168.101.3: icmp_seq=1185 ttl=64 time=27222.918 ms 64 bytes from 192.168.101.3: icmp_seq=1186 ttl=64 time=26222.086 ms 64 bytes from 192.168.101.3: icmp_seq=1187 ttl=64 time=25221.164 ms 64 bytes from 192.168.101.3: icmp_seq=1188 ttl=64 time=24220.407 ms 64 bytes from 192.168.101.3: icmp_seq=1189 ttl=64 time=23219.575 ms 64 bytes from 192.168.101.3: icmp_seq=1190 ttl=64 time=22218.737 ms 64 bytes from 192.168.101.3: icmp_seq=1191 ttl=64 time=21217.905 ms 64 bytes from 192.168.101.3: icmp_seq=1192 ttl=64 time=20217.066 ms 64 bytes from 192.168.101.3: icmp_seq=1193 ttl=64 time=19216.228 ms 64 bytes from 192.168.101.3: icmp_seq=1194 ttl=64 time=18215.333 ms 64 bytes from 192.168.101.3: icmp_seq=1195 ttl=64 time=17214.503 ms 64 bytes from 192.168.101.3: icmp_seq=1196 ttl=64 time=16213.720 ms 64 bytes from 192.168.101.3: icmp_seq=1197 ttl=64 time=15210.912 ms 64 bytes from 192.168.101.3: icmp_seq=1198 ttl=64 time=14210.044 ms 64 bytes from 192.168.101.3: icmp_seq=1199 ttl=64 time=13209.194 ms 64 bytes from 192.168.101.3: icmp_seq=1200 ttl=64 time=12208.376 ms 64 bytes from 192.168.101.3: icmp_seq=1201 ttl=64 time=11207.536 ms 64 bytes from 192.168.101.3: icmp_seq=1202 ttl=64 time=10206.694 ms 64 bytes from 192.168.101.3: icmp_seq=1203 ttl=64 time=9205.816 ms 64 bytes from 192.168.101.3: icmp_seq=1204 ttl=64 time=8205.014 ms 64 bytes from 192.168.101.3: icmp_seq=1205 ttl=64 time=7204.186 ms 64 bytes from 192.168.101.3: icmp_seq=1206 ttl=64 time=6203.294 ms 64 bytes from 192.168.101.3: icmp_seq=1207 ttl=64 time=5202.510 ms 64 bytes from 192.168.101.3: icmp_seq=1208 ttl=64 time=4201.677 ms 64 bytes from 192.168.101.3: icmp_seq=1209 ttl=64 time=3200.851 ms 64 bytes from 192.168.101.3: icmp_seq=1210 ttl=64 time=2200.013 ms 64 bytes from 192.168.101.3: icmp_seq=1211 ttl=64 time=1199.100 ms 64 bytes from 192.168.101.3: icmp_seq=1212 ttl=64 time=198.331 ms 64 bytes from 192.168.101.3: icmp_seq=1213 ttl=64 time=0.129 ms 64 bytes from 192.168.101.3: icmp_seq=1214 ttl=64 time=58223.470 ms 64 bytes from 192.168.101.3: icmp_seq=1215 ttl=64 time=57222.637 ms 64 bytes from 192.168.101.3: icmp_seq=1216 ttl=64 time=56221.800 ms 64 bytes from 192.168.101.3: icmp_seq=1217 ttl=64 time=55220.960 ms 64 bytes from 192.168.101.3: icmp_seq=1218 ttl=64 time=54220.116 ms 64 bytes from 192.168.101.3: icmp_seq=1219 ttl=64 time=53219.282 ms 64 bytes from 192.168.101.3: icmp_seq=1220 ttl=64 time=52218.444 ms 64 bytes from 192.168.101.3: icmp_seq=1221 ttl=64 time=51217.618 ms 64 bytes from 192.168.101.3: icmp_seq=1222 ttl=64 time=50216.778 ms 64 bytes from 192.168.101.3: icmp_seq=1223 ttl=64 time=49215.932 ms 64 bytes from 192.168.101.3: icmp_seq=1224 ttl=64 time=48215.095 ms 64 bytes from 192.168.101.3: icmp_seq=1225 ttl=64 time=47214.262 ms 64 bytes from 192.168.101.3: icmp_seq=1226 ttl=64 time=46213.440 ms 64 bytes from 192.168.101.3: icmp_seq=1227 ttl=64 time=45212.623 ms 64 bytes from 192.168.101.3: icmp_seq=1228 ttl=64 time=44211.783 ms 64 bytes from 192.168.101.3: icmp_seq=1229 ttl=64 time=43210.903 ms 64 bytes from 192.168.101.3: icmp_seq=1230 ttl=64 time=42210.111 ms 64 bytes from 192.168.101.3: icmp_seq=1231 ttl=64 time=41209.274 ms 64 bytes from 192.168.101.3: icmp_seq=1232 ttl=64 time=40208.448 ms 64 bytes from 192.168.101.3: icmp_seq=1233 ttl=64 time=39207.608 ms 64 bytes from 192.168.101.3: icmp_seq=1234 ttl=64 time=38206.774 ms 64 bytes from 192.168.101.3: icmp_seq=1235 ttl=64 time=37205.842 ms 64 bytes from 192.168.101.3: icmp_seq=1236 ttl=64 time=36205.104 ms 64 bytes from 192.168.101.3: icmp_seq=1237 ttl=64 time=35204.270 ms 64 bytes from 192.168.101.3: icmp_seq=1238 ttl=64 time=34203.433 ms 64 bytes from 192.168.101.3: icmp_seq=1239 ttl=64 time=33202.603 ms 64 bytes from 192.168.101.3: icmp_seq=1240 ttl=64 time=32201.764 ms 64 bytes from 192.168.101.3: icmp_seq=1241 ttl=64 time=31200.924 ms 64 bytes from 192.168.101.3: icmp_seq=1242 ttl=64 time=30200.082 ms 64 bytes from 192.168.101.3: icmp_seq=1243 ttl=64 time=29198.883 ms 64 bytes from 192.168.101.3: icmp_seq=1244 ttl=64 time=28198.414 ms 64 bytes from 192.168.101.3: icmp_seq=1245 ttl=64 time=27197.434 ms 64 bytes from 192.168.101.3: icmp_seq=1246 ttl=64 time=26196.738 ms 64 bytes from 192.168.101.3: icmp_seq=1247 ttl=64 time=25195.912 ms 64 bytes from 192.168.101.3: icmp_seq=1248 ttl=64 time=24195.074 ms 64 bytes from 192.168.101.3: icmp_seq=1249 ttl=64 time=23194.231 ms 64 bytes from 192.168.101.3: icmp_seq=1250 ttl=64 time=22193.407 ms 64 bytes from 192.168.101.3: icmp_seq=1251 ttl=64 time=21192.565 ms 64 bytes from 192.168.101.3: icmp_seq=1252 ttl=64 time=20191.725 ms 64 bytes from 192.168.101.3: icmp_seq=1253 ttl=64 time=19190.852 ms 64 bytes from 192.168.101.3: icmp_seq=1254 ttl=64 time=18190.060 ms 64 bytes from 192.168.101.3: icmp_seq=1255 ttl=64 time=17189.220 ms 64 bytes from 192.168.101.3: icmp_seq=1256 ttl=64 time=16188.381 ms 64 bytes from 192.168.101.3: icmp_seq=1257 ttl=64 time=15183.118 ms 64 bytes from 192.168.101.3: icmp_seq=1258 ttl=64 time=14182.711 ms 64 bytes from 192.168.101.3: icmp_seq=1259 ttl=64 time=13181.876 ms 64 bytes from 192.168.101.3: icmp_seq=1260 ttl=64 time=12181.034 ms 64 bytes from 192.168.101.3: icmp_seq=1261 ttl=64 time=11180.192 ms 64 bytes from 192.168.101.3: icmp_seq=1262 ttl=64 time=10179.357 ms 64 bytes from 192.168.101.3: icmp_seq=1263 ttl=64 time=9178.522 ms 64 bytes from 192.168.101.3: icmp_seq=1264 ttl=64 time=8177.692 ms 64 bytes from 192.168.101.3: icmp_seq=1265 ttl=64 time=7176.850 ms 64 bytes from 192.168.101.3: icmp_seq=1266 ttl=64 time=6176.026 ms 64 bytes from 192.168.101.3: icmp_seq=1267 ttl=64 time=5175.185 ms 64 bytes from 192.168.101.3: icmp_seq=1268 ttl=64 time=4174.355 ms 64 bytes from 192.168.101.3: icmp_seq=1269 ttl=64 time=3173.479 ms 64 bytes from 192.168.101.3: icmp_seq=1270 ttl=64 time=2172.658 ms 64 bytes from 192.168.101.3: icmp_seq=1271 ttl=64 time=1171.835 ms 64 bytes from 192.168.101.3: icmp_seq=1272 ttl=64 time=170.971 ms 64 bytes from 192.168.101.3: icmp_seq=1273 ttl=64 time=0.138 ms 64 bytes from 192.168.101.3: icmp_seq=1274 ttl=64 time=0.162 ms 64 bytes from 192.168.101.3: icmp_seq=1275 ttl=64 time=0.133 ms 64 bytes from 192.168.101.3: icmp_seq=1276 ttl=64 time=0.140 ms 64 bytes from 192.168.101.3: icmp_seq=1277 ttl=64 time=0.138 ms 64 bytes from 192.168.101.3: icmp_seq=1278 ttl=64 time=0.132 ms 64 bytes from 192.168.101.3: icmp_seq=1279 ttl=64 time=0.132 ms 64 bytes from 192.168.101.3: icmp_seq=1280 ttl=64 time=0.132 ms 64 bytes from 192.168.101.3: icmp_seq=1281 ttl=64 time=0.129 ms At that point the machine silently rebooted inspite of being compiled with KDB and DDB and not KDB_UNATTENDED. This silent reboot is reproduceable.