Date: Wed, 29 Sep 2010 00:00:17 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Don Lewis <truckman@FreeBSD.org>
Cc: stable@FreeBSD.org, sterling@camdensoftware.com
Subject: Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
Message-ID: <20100929070017.GA82362@icarus.home.lan>
In-Reply-To: <201009290531.o8T5VRZJ061189@gw.catspoiler.org>
References: <201009282344.o8SNiqSK060715@gw.catspoiler.org> <201009290531.o8T5VRZJ061189@gw.catspoiler.org>
On Tue, Sep 28, 2010 at 10:31:27PM -0700, Don Lewis wrote:
> On 28 Sep, Don Lewis wrote:
>
> > Looking at the timestamps of things and comparing to my logs, I
> > discovered that the last instance of ntp instability happened when I was
> > running "make index" in /usr/ports. I tried it again with entertaining
> > results. After a while, the machine became unresponsive. I was logged
> > in over ssh and it stopped echoing keystrokes. In parallel I was
> > running a script that echoed the date, the results of "vmstat -i", and
> > the results of "ntpq -c pe". The latter showed jitter and offset going
> > insane. Eventually "make index" finished and the machine was responsive
> > again, but the time was way off and ntpd croaked because the necessary
> > time correction was too large. Nothing else anomalous showed up in the
> > logs. Hmm, about half an hour after ntpd died I started my CPU time
> > accounting test and two minutes into that test I got a spew of calcru
> > messages ...
>
> I tried this experiment again using a kernel with WITNESS and
> DEBUG_VFS_LOCKS compiled in, and pinging this machine from another.
> Things look normal for a while, then the ping times get huge for a while
> and then recover.
>
> 64 bytes from 192.168.101.3: icmp_seq=1169 ttl=64 time=0.135 ms
> 64 bytes from 192.168.101.3: icmp_seq=1170 ttl=64 time=0.141 ms
> 64 bytes from 192.168.101.3: icmp_seq=1171 ttl=64 time=0.130 ms
> 64 bytes from 192.168.101.3: icmp_seq=1172 ttl=64 time=0.131 ms
> 64 bytes from 192.168.101.3: icmp_seq=1173 ttl=64 time=0.128 ms
> 64 bytes from 192.168.101.3: icmp_seq=1174 ttl=64 time=38232.140 ms
> 64 bytes from 192.168.101.3: icmp_seq=1175 ttl=64 time=37231.309 ms
> 64 bytes from 192.168.101.3: icmp_seq=1176 ttl=64 time=36230.470 ms
> 64 bytes from 192.168.101.3: icmp_seq=1177 ttl=64 time=35229.632 ms
> 64 bytes from 192.168.101.3: icmp_seq=1178 ttl=64 time=34228.791 ms
> 64 bytes from 192.168.101.3: icmp_seq=1179 ttl=64 time=33227.953 ms
> 64 bytes from 192.168.101.3: icmp_seq=1180 ttl=64 time=32227.091 ms
> 64 bytes from 192.168.101.3: icmp_seq=1181 ttl=64 time=31226.262 ms
> 64 bytes from 192.168.101.3: icmp_seq=1182 ttl=64 time=30225.425 ms
> 64 bytes from 192.168.101.3: icmp_seq=1183 ttl=64 time=29224.597 ms
> 64 bytes from 192.168.101.3: icmp_seq=1184 ttl=64 time=28223.757 ms
> 64 bytes from 192.168.101.3: icmp_seq=1185 ttl=64 time=27222.918 ms
> 64 bytes from 192.168.101.3: icmp_seq=1186 ttl=64 time=26222.086 ms
> 64 bytes from 192.168.101.3: icmp_seq=1187 ttl=64 time=25221.164 ms
> 64 bytes from 192.168.101.3: icmp_seq=1188 ttl=64 time=24220.407 ms
> 64 bytes from 192.168.101.3: icmp_seq=1189 ttl=64 time=23219.575 ms
> 64 bytes from 192.168.101.3: icmp_seq=1190 ttl=64 time=22218.737 ms
> 64 bytes from 192.168.101.3: icmp_seq=1191 ttl=64 time=21217.905 ms
> 64 bytes from 192.168.101.3: icmp_seq=1192 ttl=64 time=20217.066 ms
> 64 bytes from 192.168.101.3: icmp_seq=1193 ttl=64 time=19216.228 ms
> 64 bytes from 192.168.101.3: icmp_seq=1194 ttl=64 time=18215.333 ms
> 64 bytes from 192.168.101.3: icmp_seq=1195 ttl=64 time=17214.503 ms
> 64 bytes from 192.168.101.3: icmp_seq=1196 ttl=64 time=16213.720 ms
> 64 bytes from 192.168.101.3: icmp_seq=1197 ttl=64 time=15210.912 ms
> 64 bytes from 192.168.101.3: icmp_seq=1198 ttl=64 time=14210.044 ms
> 64 bytes from 192.168.101.3: icmp_seq=1199 ttl=64 time=13209.194 ms
> 64 bytes from 192.168.101.3: icmp_seq=1200 ttl=64 time=12208.376 ms
> 64 bytes from 192.168.101.3: icmp_seq=1201 ttl=64 time=11207.536 ms
> 64 bytes from 192.168.101.3: icmp_seq=1202 ttl=64 time=10206.694 ms
> 64 bytes from 192.168.101.3: icmp_seq=1203 ttl=64 time=9205.816 ms
> 64 bytes from 192.168.101.3: icmp_seq=1204 ttl=64 time=8205.014 ms
> 64 bytes from 192.168.101.3: icmp_seq=1205 ttl=64 time=7204.186 ms
> 64 bytes from 192.168.101.3: icmp_seq=1206 ttl=64 time=6203.294 ms
> 64 bytes from 192.168.101.3: icmp_seq=1207 ttl=64 time=5202.510 ms
> 64 bytes from 192.168.101.3: icmp_seq=1208 ttl=64 time=4201.677 ms
> 64 bytes from 192.168.101.3: icmp_seq=1209 ttl=64 time=3200.851 ms
> 64 bytes from 192.168.101.3: icmp_seq=1210 ttl=64 time=2200.013 ms
> 64 bytes from 192.168.101.3: icmp_seq=1211 ttl=64 time=1199.100 ms
> 64 bytes from 192.168.101.3: icmp_seq=1212 ttl=64 time=198.331 ms
> 64 bytes from 192.168.101.3: icmp_seq=1213 ttl=64 time=0.129 ms
> 64 bytes from 192.168.101.3: icmp_seq=1214 ttl=64 time=58223.470 ms
> 64 bytes from 192.168.101.3: icmp_seq=1215 ttl=64 time=57222.637 ms
> 64 bytes from 192.168.101.3: icmp_seq=1216 ttl=64 time=56221.800 ms
> 64 bytes from 192.168.101.3: icmp_seq=1217 ttl=64 time=55220.960 ms
> 64 bytes from 192.168.101.3: icmp_seq=1218 ttl=64 time=54220.116 ms
> 64 bytes from 192.168.101.3: icmp_seq=1219 ttl=64 time=53219.282 ms
> 64 bytes from 192.168.101.3: icmp_seq=1220 ttl=64 time=52218.444 ms
> 64 bytes from 192.168.101.3: icmp_seq=1221 ttl=64 time=51217.618 ms
> 64 bytes from 192.168.101.3: icmp_seq=1222 ttl=64 time=50216.778 ms
> 64 bytes from 192.168.101.3: icmp_seq=1223 ttl=64 time=49215.932 ms
> 64 bytes from 192.168.101.3: icmp_seq=1224 ttl=64 time=48215.095 ms
> 64 bytes from 192.168.101.3: icmp_seq=1225 ttl=64 time=47214.262 ms
> 64 bytes from 192.168.101.3: icmp_seq=1226 ttl=64 time=46213.440 ms
> 64 bytes from 192.168.101.3: icmp_seq=1227 ttl=64 time=45212.623 ms
> 64 bytes from 192.168.101.3: icmp_seq=1228 ttl=64 time=44211.783 ms
> 64 bytes from 192.168.101.3: icmp_seq=1229 ttl=64 time=43210.903 ms
> 64 bytes from 192.168.101.3: icmp_seq=1230 ttl=64 time=42210.111 ms
> 64 bytes from 192.168.101.3: icmp_seq=1231 ttl=64 time=41209.274 ms
> 64 bytes from 192.168.101.3: icmp_seq=1232 ttl=64 time=40208.448 ms
> 64 bytes from 192.168.101.3: icmp_seq=1233 ttl=64 time=39207.608 ms
> 64 bytes from 192.168.101.3: icmp_seq=1234 ttl=64 time=38206.774 ms
> 64 bytes from 192.168.101.3: icmp_seq=1235 ttl=64 time=37205.842 ms
> 64 bytes from 192.168.101.3: icmp_seq=1236 ttl=64 time=36205.104 ms
> 64 bytes from 192.168.101.3: icmp_seq=1237 ttl=64 time=35204.270 ms
> 64 bytes from 192.168.101.3: icmp_seq=1238 ttl=64 time=34203.433 ms
> 64 bytes from 192.168.101.3: icmp_seq=1239 ttl=64 time=33202.603 ms
> 64 bytes from 192.168.101.3: icmp_seq=1240 ttl=64 time=32201.764 ms
> 64 bytes from 192.168.101.3: icmp_seq=1241 ttl=64 time=31200.924 ms
> 64 bytes from 192.168.101.3: icmp_seq=1242 ttl=64 time=30200.082 ms
> 64 bytes from 192.168.101.3: icmp_seq=1243 ttl=64 time=29198.883 ms
> 64 bytes from 192.168.101.3: icmp_seq=1244 ttl=64 time=28198.414 ms
> 64 bytes from 192.168.101.3: icmp_seq=1245 ttl=64 time=27197.434 ms
> 64 bytes from 192.168.101.3: icmp_seq=1246 ttl=64 time=26196.738 ms
> 64 bytes from 192.168.101.3: icmp_seq=1247 ttl=64 time=25195.912 ms
> 64 bytes from 192.168.101.3: icmp_seq=1248 ttl=64 time=24195.074 ms
> 64 bytes from 192.168.101.3: icmp_seq=1249 ttl=64 time=23194.231 ms
> 64 bytes from 192.168.101.3: icmp_seq=1250 ttl=64 time=22193.407 ms
> 64 bytes from 192.168.101.3: icmp_seq=1251 ttl=64 time=21192.565 ms
> 64 bytes from 192.168.101.3: icmp_seq=1252 ttl=64 time=20191.725 ms
> 64 bytes from 192.168.101.3: icmp_seq=1253 ttl=64 time=19190.852 ms
> 64 bytes from 192.168.101.3: icmp_seq=1254 ttl=64 time=18190.060 ms
> 64 bytes from 192.168.101.3: icmp_seq=1255 ttl=64 time=17189.220 ms
> 64 bytes from 192.168.101.3: icmp_seq=1256 ttl=64 time=16188.381 ms
> 64 bytes from 192.168.101.3: icmp_seq=1257 ttl=64 time=15183.118 ms
> 64 bytes from 192.168.101.3: icmp_seq=1258 ttl=64 time=14182.711 ms
> 64 bytes from 192.168.101.3: icmp_seq=1259 ttl=64 time=13181.876 ms
> 64 bytes from 192.168.101.3: icmp_seq=1260 ttl=64 time=12181.034 ms
> 64 bytes from 192.168.101.3: icmp_seq=1261 ttl=64 time=11180.192 ms
> 64 bytes from 192.168.101.3: icmp_seq=1262 ttl=64 time=10179.357 ms
> 64 bytes from 192.168.101.3: icmp_seq=1263 ttl=64 time=9178.522 ms
> 64 bytes from 192.168.101.3: icmp_seq=1264 ttl=64 time=8177.692 ms
> 64 bytes from 192.168.101.3: icmp_seq=1265 ttl=64 time=7176.850 ms
> 64 bytes from 192.168.101.3: icmp_seq=1266 ttl=64 time=6176.026 ms
> 64 bytes from 192.168.101.3: icmp_seq=1267 ttl=64 time=5175.185 ms
> 64 bytes from 192.168.101.3: icmp_seq=1268 ttl=64 time=4174.355 ms
> 64 bytes from 192.168.101.3: icmp_seq=1269 ttl=64 time=3173.479 ms
> 64 bytes from 192.168.101.3: icmp_seq=1270 ttl=64 time=2172.658 ms
> 64 bytes from 192.168.101.3: icmp_seq=1271 ttl=64 time=1171.835 ms
> 64 bytes from 192.168.101.3: icmp_seq=1272 ttl=64 time=170.971 ms
> 64 bytes from 192.168.101.3: icmp_seq=1273 ttl=64 time=0.138 ms
> 64 bytes from 192.168.101.3: icmp_seq=1274 ttl=64 time=0.162 ms
> 64 bytes from 192.168.101.3: icmp_seq=1275 ttl=64 time=0.133 ms
> 64 bytes from 192.168.101.3: icmp_seq=1276 ttl=64 time=0.140 ms
> 64 bytes from 192.168.101.3: icmp_seq=1277 ttl=64 time=0.138 ms
> 64 bytes from 192.168.101.3: icmp_seq=1278 ttl=64 time=0.132 ms
> 64 bytes from 192.168.101.3: icmp_seq=1279 ttl=64 time=0.132 ms
> 64 bytes from 192.168.101.3: icmp_seq=1280 ttl=64 time=0.132 ms
> 64 bytes from 192.168.101.3: icmp_seq=1281 ttl=64 time=0.129 ms
>
> At that point the machine silently rebooted in spite of being compiled
> with KDB and DDB and not KDB_UNATTENDED. This silent reboot is
> reproducible.
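One thing the ping output above shows: within each slow run, the reported round-trip time drops by roughly 1000 ms per sequence number. That is what one would expect if the replies all came back at about the same moment after a stall, with the peak RTT approximating how long the stall lasted (about 38 s and 58 s here). A minimal Python sketch for pulling such runs out of a saved ping log follows. It is not part of the original exchange; the names parse_ping and find_stalls and the 100 ms threshold are assumptions chosen for illustration, and the sample is a short excerpt of the log quoted above.

```python
import re

# Matches the "icmp_seq=N ttl=N time=X ms" part of a ping reply line.
PING_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")

# Short excerpt of the quoted log (most lines omitted).
SAMPLE = """\
64 bytes from 192.168.101.3: icmp_seq=1173 ttl=64 time=0.128 ms
64 bytes from 192.168.101.3: icmp_seq=1174 ttl=64 time=38232.140 ms
64 bytes from 192.168.101.3: icmp_seq=1175 ttl=64 time=37231.309 ms
64 bytes from 192.168.101.3: icmp_seq=1176 ttl=64 time=36230.470 ms
64 bytes from 192.168.101.3: icmp_seq=1213 ttl=64 time=0.129 ms
64 bytes from 192.168.101.3: icmp_seq=1214 ttl=64 time=58223.470 ms
64 bytes from 192.168.101.3: icmp_seq=1215 ttl=64 time=57222.637 ms
"""


def parse_ping(text):
    """Return a list of (icmp_seq, rtt_ms) tuples found in ping output."""
    return [(int(seq), float(rtt)) for seq, rtt in PING_RE.findall(text)]


def find_stalls(samples, threshold_ms=100.0):
    """Find runs of slow replies whose RTT shrinks from one reply to the next.

    Returns a list of (first_seq, last_seq, peak_rtt_ms). The threshold is
    an arbitrary cut-off (assumption), not a value taken from the thread.
    """
    stalls, run = [], []
    for seq, rtt in samples:
        slow = rtt > threshold_ms
        # A run continues only while replies stay slow and keep shrinking.
        if slow and run and rtt < run[-1][1]:
            run.append((seq, rtt))
            continue
        if run:
            # RTTs decrease within a run, so the first entry is the peak.
            stalls.append((run[0][0], run[-1][0], run[0][1]))
        run = [(seq, rtt)] if slow else []
    if run:
        stalls.append((run[0][0], run[-1][0], run[0][1]))
    return stalls


if __name__ == "__main__":
    for first, last, peak in find_stalls(parse_ping(SAMPLE)):
        print(f"seq {first}-{last}: peak RTT {peak / 1000:.1f} s")
```

Feeding it the full log instead of the excerpt should report two runs, one starting at icmp_seq=1174 and one at icmp_seq=1214.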
Given all the information here, in addition to the other portion of the
thread (indicating ntpd reports extreme offset between the system clock
and its stratum 1 source), I would say the motherboard is faulty or
there is a system device which is behaving badly (possibly something
pertaining to interrupts, but I don't know how to debug this on a low
level).

Can you boot verbosely and provide all of the output here or somewhere
on the web?

If possible, I would start by replacing the mainboard. The board looks
to be a consumer-level board (I see an nfe(4) controller, for example).

--
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |