From owner-freebsd-net@freebsd.org Fri Jul 17 17:51:18 2020 Return-Path: Delivered-To: freebsd-net@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 3ED6436AE26 for ; Fri, 17 Jul 2020 17:51:18 +0000 (UTC) (envelope-from ticso@cicely7.cicely.de) Received: from raven.bwct.de (raven.bwct.de [IPv6:2a02:21e0:16e0:fe::101:1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "raven.bwct.de", Issuer "raven.bwct.de" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4B7dw54zbCz4Ky6 for ; Fri, 17 Jul 2020 17:51:17 +0000 (UTC) (envelope-from ticso@cicely7.cicely.de) Received: from mail.cicely.de ([10.1.1.37]) by raven.bwct.de (8.15.2/8.15.2) with ESMTPS id 06HHpBSA088566 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL); Fri, 17 Jul 2020 19:51:13 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=cicely.de; s=default; t=1595008273; bh=JpjRWz33C+Hs/pvdhvOZGr33nsQB+RQB4LQ4ndUevPs=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To; b=PEhSGe3U5WJt5oM2hFNVES2fC0K6vRUJJUME0H1yNa6FT+q/jT6Dq416rMV3ZdGX5 ULOYWceERXRDGDDBAbWltFGsdVglIQdaT7HhP/oi9j8S9bvA3G2j/0qcqE6SrFFUG1 6VKsSOBaYXb1YMStjM1ElYouosD65qKRinzu9zl0= Received: from cicely7.cicely.de (cicely7.cicely.de [10.1.1.9]) by mail.cicely.de (8.14.5/8.14.4) with ESMTP id 06HHp1fw064576 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 17 Jul 2020 19:51:01 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (localhost [127.0.0.1]) by cicely7.cicely.de (8.15.2/8.15.2) with ESMTP id 06HHov39081275; Fri, 17 Jul 2020 19:50:57 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: (from ticso@localhost) by cicely7.cicely.de (8.15.2/8.15.2/Submit) id 06HHot1k081274; Fri, 17 Jul 2020 19:50:55 +0200 (CEST) (envelope-from ticso) Date: Fri, 17 Jul 2020 19:50:55 +0200 From: Bernd Walter To: Michael Tuexen Cc: ticso@cicely.de, freebsd-net@freebsd.org, Bernd Walter Subject: Re: SCTP problem, how to debug? Message-ID: <20200717175055.GB79604@cicely7.cicely.de> Reply-To: ticso@cicely.de References: <20200717160739.GA79604@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Operating-System: FreeBSD cicely7.cicely.de 12.0-STABLE amd64 User-Agent: Mutt/1.5.11 X-Spam-Status: No, score=-2.9 required=4.5 tests=ALL_TRUSTED=-1, BAYES_00=-1.9 autolearn=ham version=3.3.0 X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on spamd.cicely.de X-Rspamd-Queue-Id: 4B7dw54zbCz4Ky6 X-Spamd-Bar: / Authentication-Results: mx1.freebsd.org; dkim=pass header.d=cicely.de header.s=default header.b=PEhSGe3U; dmarc=none; spf=none (mx1.freebsd.org: domain of ticso@cicely7.cicely.de has no SPF policy when checking 2a02:21e0:16e0:fe::101:1) smtp.mailfrom=ticso@cicely7.cicely.de X-Spamd-Result: default: False [-0.97 / 15.00]; HAS_REPLYTO(0.00)[ticso@cicely.de]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[cicely.de:s=default]; NEURAL_HAM_MEDIUM(-0.95)[-0.955]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; MV_CASE(0.50)[]; NEURAL_HAM_LONG(-0.98)[-0.979]; MIME_GOOD(-0.10)[text/plain]; REPLYTO_DOM_NEQ_FROM_DOM(0.00)[]; DMARC_NA(0.00)[cicely.de]; TO_DN_SOME(0.00)[]; RCVD_COUNT_THREE(0.00)[4]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[cicely.de:+]; NEURAL_HAM_SHORT(-0.24)[-0.241]; R_SPF_NA(0.00)[no SPF record]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; SUBJECT_ENDS_QUESTION(1.00)[]; ASN(0.00)[asn:21461, ipnet:2a02:21e0::/32, country:DE]; RCVD_TLS_LAST(0.00)[]; MID_RHS_MATCH_FROM(0.00)[] X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jul 2020 17:51:18 -0000 On Fri, Jul 17, 2020 at 07:27:00PM +0200, Michael Tuexen wrote: > > > > On 17. Jul 2020, at 18:07, Bernd Walter wrote: > > > > I'm running an LED matrix with SCTP. > > The matrix consists from 24 raspberry pi running NFS-root FreeBSD > > 12.0-RELEASE (they have an SD card for u-boot and loader). > > A client system is running FreeBSD 12.1-RELEASE. > I fixed iterator related bugs and this was MFCed to stable/12 recently. > The problem was that the iterator stopped sending. The client side should be easy to update. The modules can be a bit more tricky, but I will try. > > > > The matrix modules have a one to many service socket. > > The daemon regularily sends status informations (temperature, etc.) to > > each association and has a second thread to receive. > > > > The client system has two deamons running. > > > > One daemon is to control power output related to temperature states. > > It has one thread reestablishing associations via sctp_connectx() to > > each of the 24 modules using a single one to many socket. > > Another thread collects all regular received data and updates stored > > values. > > Yet another thread sends power control data via SCTP_SENDALL, so that all modules know > > the maxed allowed brightness rating. > > > > The other daemon uses the same threads to reconnect and receive. > > It connects to the very same sockets on the modules. > > Another thread updates picture data and wanted power rating. > > That is sending picture data to the given matrix module and then > > SCTP_SENDALL an update trigger to all modules. > > That is reduced brightness at night times, ... > > > > All SCTP_SENDALL are just trigger with 0 length and different ppid values. > Are you really sending messages of length 0? That shouldn't work... No - I was wrong. Just checked and I do send a dummy byte: void apa_push_leds() { // send dummy content, because we wouldn't send anything without char message = 0; send_message(&message, 1, 0, 0x00000002, SCTP_SENDALL); } send_message(const void* data, size_t len, uint32_t stream, uint32_t ppid, uint32_t flags, const String& dest = "") { Mutex::Guard mtx(sctp_mtx); ssize_t remain = len; if (!dest.empty()) { struct addrinfo ai; struct addrinfo *lips; bzero(&ai, sizeof(ai)); ai.ai_flags = AI_ADDRCONFIG | AI_NUMERICSERV; ai.ai_family = AF_INET6; ai.ai_protocol = IPPROTO_SCTP; ai.ai_socktype = SOCK_SEQPACKET; int res = 0; String sport = port; String addr = dest; res = getaddrinfo(addr.c_str(), sport.c_str(), &ai, &lips); if (res != 0) { throw Error("failed to resolve local ips"); } struct addrinfo *lip; for (lip = lips; lip && remain > 0; lip = lip->ai_next) { while (remain > 0) { ssize_t res; res = sctp_sendmsgx(sctp_socket, ((uint8_t*)data) + (len - remain), remain, lip->ai_addr, 1, ppid, flags | SCTP_EOR, stream, 0, 0); if (res > 0) { remain -= res; } else { if (errno != EAGAIN && errno != ENOBUFS) { return; } else { int res; do { struct pollfd pfd; pfd.fd = sctp_socket; pfd.events = POLLOUT; pfd.revents = 0; res = poll(&pfd, 1, 5000); } while (res == 0); } } } } freeaddrinfo(lips); } else { while (remain > 0) { ssize_t res; res = sctp_sendmsg(sctp_socket, ((uint8_t*)data) + (len - remain), remain, NULL, 0, ppid, flags | SCTP_EOR, stream, 0, 0); if (res > 0) { remain -= res; } else { if (errno != EAGAIN && errno != ENOBUFS) { return; } else { int res; do { struct pollfd pfd; pfd.fd = sctp_socket; pfd.events = POLLOUT; pfd.revents = 0; res = poll(&pfd, 1, 5000); } while (res == 0); } } } } } > > > > From time to time (1-5 days) I notice that a module won't get updates > > anymore. > > I see that the association got a SCTP_SENDER_DRY_EVENT event. > > Therefor my expectation is that there is nothing to send. > > I still see the association in the socket list and also receive the regular > > temperature data. > > However, obviously sending won't happen. > > The other modules still get data. > > > > When I restart the client daemon, things start to work again. > > > > Currently I'm clueless on how to debug this problem any fsurther. > Can you try stable/12? > > Doing a full network log would be too big and I'm not very experienced > > to understand the SCTP packets. > > I have no idea to see what data is in the send buffer. > > netstat with TCP would show send and receive queue, not so with SCTP. > > Data is send with a single thread, which sctp_sendmsgx() the data for > > all modules sequencially into the same socket. > I'm not sure I understood what you are actually doing on which socket > and how many associations are involved. Each of the 24 modules has a single socket with two associations from the client host. The client host has two daemon, which has a socket each and both sockets have an association to each of the 24 modules. This is the client host: Proto Type Local Address Foreign Address (state) sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe66:62de.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fecf:7cb7.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe4c:b9c9.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fee6:41f6.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec4:6a45.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe93:5ab4.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec6:aaea.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:feef:ba3.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe87:b229.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe82:9ece.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe96:bf2a.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe10:195b.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe46:cb7.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:feb5:65dc.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec5:30dd.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe52:54bc.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe8a:2fcb.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec6:5d6e.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe03:c920.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fecb:66a3.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe9c:9e54.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe7c:5702.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fef2:186c.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe4d:3de5.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fecb:66a3.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe66:62de.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fecf:7cb7.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe4c:b9c9.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fee6:41f6.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec4:6a45.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe93:5ab4.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec6:aaea.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:feef:ba3.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe87:b229.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe82:9ece.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe96:bf2a.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe10:195b.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe46:cb7.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:feb5:65dc.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec5:30dd.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe52:54bc.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe8a:2fcb.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec6:5d6e.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe03:c920.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe9c:9e54.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe7c:5702.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fef2:186c.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe4d:3de5.1000 ESTABLISHED This is one of the 24 modules: Proto Type Local Address Foreign Address (state) sctp46 1toN 127.0.0.1.1000 LISTEN fe80::1%lo0.1000 ::1.1000 2003:df:b017:115:ba27:ebff:fe87:b229.1000 fe80::ba27:ebff:fe87:b229%ue0.1000 10.215.74.118.1000 sctp46 1toN 2003:df:b017:115:ba27:ebff:fe87:b229.1000 2003:df:b017:115::100.10405 ESTABLISHED sctp46 1toN 2003:df:b017:115:ba27:ebff:fe87:b229.1000 2003:df:b017:115::100.38953 ESTABLISHED > > I havn't checked yet if I get an error with the write to the specific > > module IP. -- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.