To: freebsd-questions@freebsd.org
From: David Newman
Subject: intermittent network failures with drill and icinga2
Organization: Network Test Inc.
Date: Fri, 16 Aug 2019 21:06:10 -0700

12.0-RELEASE-p9, icinga2 2.10.5_1, drill 1.7.0

Do drill and ping use different system calls to resolve hostnames to IP
addresses?

I'm asking because around 5 to 10 times per day, icinga2 returns an error
because this system can't resolve a hostname to an IP address. However,
the system is reachable by ssh during these error periods, and it _can_
resolve hostnames when using ping.

Here's an example where drill doesn't work and ping does:

[dnewman@hood ~]$ drill mail.networktest.com @puck.nether.net
Error: error sending query: Could not send or receive, because of
network error
[dnewman@hood ~]$ ping puck.nether.net
PING puck.nether.net (204.42.254.5): 56 data bytes
64 bytes from 204.42.254.5: icmp_seq=0 ttl=51 time=76.332 ms
[dnewman@hood ~]$ drill mail.networktest.com @puck.nether.net
Error: error sending query: Could not send or receive, because of
network error

The /etc/resolv.conf file points to two internal nameservers, both
reachable:

[dnewman@hood ~]$ cat /etc/resolv.conf
search inf.networktest.com networktest.com
nameserver 172.31.53.12
nameserver 172.31.53.13

Also, icinga2 resolves hundreds of hostnames, but this problem occurs
almost exclusively when doing a check on puck.nether.net. I don't think
there's anything wrong with puck.nether.net DNS or reachability; even
this system can ping it, and I can resolve it from any other host. Other
host checks and networking on this system otherwise work fine.

Thanks in advance for clues on what might cause these intermittent
failures in drill and icinga2, and what to do to fix them.

dn

ps. This system is a VMware VM. I don't believe it's a VMware issue,
however; aside from the periodic inability to reach one host, its
networking works OK, and all other server VMs on the same VMware host
with similar network configurations don't have this issue.
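
To narrow down which step fails, the two lookup paths can be exercised
separately. The following is only a rough sketch, assuming Python 3 is
available on the host; the function names (build_query, direct_query,
system_lookup) are made up for illustration and are not part of drill,
ping, or icinga2. As far as I understand it, ping resolves names through
the libc resolver, while "drill name @server" sends a DNS query over UDP
directly to that server, so the sketch tests each path on its own.

```python
import socket
import struct


def build_query(name, qtype=1, txid=0x1234):
    """Build a minimal DNS query packet (RFC 1035 layout) for `name`."""
    # Header: ID, flags (only RD set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (default 1 = A) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def direct_query(name, server_ip, timeout=3.0):
    """Send one UDP query to server_ip port 53 and return the RCODE.

    Raises OSError (including timeouts) if the packet cannot be sent
    or no reply arrives, which is the failure of interest here.
    """
    family = socket.AF_INET6 if ":" in server_ip else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server_ip, 53))
        data, _ = sock.recvfrom(4096)
    # RCODE is the low four bits of the fourth header byte.
    return data[3] & 0x0F


def system_lookup(host):
    """Resolve `host` through the libc resolver; return unique addresses."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})
```

Calling system_lookup("puck.nether.net") and then
direct_query("mail.networktest.com", addr) for each returned address
during an error period should show whether the failure is in resolving
the server's name or in reaching one particular address (IPv4 or IPv6)
on port 53.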