Date: Fri, 17 Jul 2020 19:50:55 +0200 From: Bernd Walter <ticso@cicely7.cicely.de> To: Michael Tuexen <Michael.Tuexen@lurchi.franken.de> Cc: ticso@cicely.de, freebsd-net@freebsd.org, Bernd Walter <ticso@cicely7.cicely.de> Subject: Re: SCTP problem, how to debug? Message-ID: <20200717175055.GB79604@cicely7.cicely.de> In-Reply-To: <EE7DEB1F-D941-4FC2-91FB-7A1A3D3E11C3@lurchi.franken.de> References: <20200717160739.GA79604@cicely7.cicely.de> <EE7DEB1F-D941-4FC2-91FB-7A1A3D3E11C3@lurchi.franken.de>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Jul 17, 2020 at 07:27:00PM +0200, Michael Tuexen wrote: > > > > On 17. Jul 2020, at 18:07, Bernd Walter <ticso@cicely7.cicely.de> wrote: > > > > I'm running an LED matrix with SCTP. > > The matrix consists from 24 raspberry pi running NFS-root FreeBSD > > 12.0-RELEASE (they have an SD card for u-boot and loader). > > A client system is running FreeBSD 12.1-RELEASE. > I fixed iterator related bugs and this was MFCed to stable/12 recently. > The problem was that the iterator stopped sending. The client side should be easy to update. The modules can be a bit more tricky, but I will try. > > > > The matrix modules have a one to many service socket. > > The daemon regularily sends status informations (temperature, etc.) to > > each association and has a second thread to receive. > > > > The client system has two deamons running. > > > > One daemon is to control power output related to temperature states. > > It has one thread reestablishing associations via sctp_connectx() to > > each of the 24 modules using a single one to many socket. > > Another thread collects all regular received data and updates stored > > values. > > Yet another thread sends power control data via SCTP_SENDALL, so that all modules know > > the maxed allowed brightness rating. > > > > The other daemon uses the same threads to reconnect and receive. > > It connects to the very same sockets on the modules. > > Another thread updates picture data and wanted power rating. > > That is sending picture data to the given matrix module and then > > SCTP_SENDALL an update trigger to all modules. > > That is reduced brightness at night times, ... > > > > All SCTP_SENDALL are just trigger with 0 length and different ppid values. > Are you really sending messages of length 0? That shouldn't work... No - I was wrong. Just checked and I do send a dummy byte: void apa_push_leds() { // send dummy content, because we wouldn't send anything without char message = 0; send_message(&message, 1, 0, 0x00000002, SCTP_SENDALL); } send_message(const void* data, size_t len, uint32_t stream, uint32_t ppid, uint32_t flags, const String& dest = "") { Mutex::Guard mtx(sctp_mtx); ssize_t remain = len; if (!dest.empty()) { struct addrinfo ai; struct addrinfo *lips; bzero(&ai, sizeof(ai)); ai.ai_flags = AI_ADDRCONFIG | AI_NUMERICSERV; ai.ai_family = AF_INET6; ai.ai_protocol = IPPROTO_SCTP; ai.ai_socktype = SOCK_SEQPACKET; int res = 0; String sport = port; String addr = dest; res = getaddrinfo(addr.c_str(), sport.c_str(), &ai, &lips); if (res != 0) { throw Error("failed to resolve local ips"); } struct addrinfo *lip; for (lip = lips; lip && remain > 0; lip = lip->ai_next) { while (remain > 0) { ssize_t res; res = sctp_sendmsgx(sctp_socket, ((uint8_t*)data) + (len - remain), remain, lip->ai_addr, 1, ppid, flags | SCTP_EOR, stream, 0, 0); if (res > 0) { remain -= res; } else { if (errno != EAGAIN && errno != ENOBUFS) { return; } else { int res; do { struct pollfd pfd; pfd.fd = sctp_socket; pfd.events = POLLOUT; pfd.revents = 0; res = poll(&pfd, 1, 5000); } while (res == 0); } } } } freeaddrinfo(lips); } else { while (remain > 0) { ssize_t res; res = sctp_sendmsg(sctp_socket, ((uint8_t*)data) + (len - remain), remain, NULL, 0, ppid, flags | SCTP_EOR, stream, 0, 0); if (res > 0) { remain -= res; } else { if (errno != EAGAIN && errno != ENOBUFS) { return; } else { int res; do { struct pollfd pfd; pfd.fd = sctp_socket; pfd.events = POLLOUT; pfd.revents = 0; res = poll(&pfd, 1, 5000); } while (res == 0); } } } } } > > > > From time to time (1-5 days) I notice that a module won't get updates > > anymore. > > I see that the association got a SCTP_SENDER_DRY_EVENT event. > > Therefor my expectation is that there is nothing to send. > > I still see the association in the socket list and also receive the regular > > temperature data. > > However, obviously sending won't happen. > > The other modules still get data. > > > > When I restart the client daemon, things start to work again. > > > > Currently I'm clueless on how to debug this problem any fsurther. > Can you try stable/12? > > Doing a full network log would be too big and I'm not very experienced > > to understand the SCTP packets. > > I have no idea to see what data is in the send buffer. > > netstat with TCP would show send and receive queue, not so with SCTP. > > Data is send with a single thread, which sctp_sendmsgx() the data for > > all modules sequencially into the same socket. > I'm not sure I understood what you are actually doing on which socket > and how many associations are involved. Each of the 24 modules has a single socket with two associations from the client host. The client host has two daemon, which has a socket each and both sockets have an association to each of the 24 modules. This is the client host: Proto Type Local Address Foreign Address (state) sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe66:62de.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fecf:7cb7.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe4c:b9c9.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fee6:41f6.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec4:6a45.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe93:5ab4.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec6:aaea.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:feef:ba3.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe87:b229.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe82:9ece.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe96:bf2a.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe10:195b.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe46:cb7.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:feb5:65dc.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec5:30dd.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe52:54bc.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe8a:2fcb.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec6:5d6e.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe03:c920.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fecb:66a3.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe9c:9e54.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe7c:5702.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fef2:186c.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe4d:3de5.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fecb:66a3.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe66:62de.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fecf:7cb7.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe4c:b9c9.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fee6:41f6.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec4:6a45.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe93:5ab4.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec6:aaea.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:feef:ba3.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe87:b229.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe82:9ece.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe96:bf2a.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe10:195b.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe46:cb7.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:feb5:65dc.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec5:30dd.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe52:54bc.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe8a:2fcb.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec6:5d6e.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe03:c920.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe9c:9e54.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe7c:5702.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fef2:186c.1000 ESTABLISHED sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe4d:3de5.1000 ESTABLISHED This is one of the 24 modules: Proto Type Local Address Foreign Address (state) sctp46 1toN 127.0.0.1.1000 LISTEN fe80::1%lo0.1000 ::1.1000 2003:df:b017:115:ba27:ebff:fe87:b229.1000 fe80::ba27:ebff:fe87:b229%ue0.1000 10.215.74.118.1000 sctp46 1toN 2003:df:b017:115:ba27:ebff:fe87:b229.1000 2003:df:b017:115::100.10405 ESTABLISHED sctp46 1toN 2003:df:b017:115:ba27:ebff:fe87:b229.1000 2003:df:b017:115::100.38953 ESTABLISHED > > I havn't checked yet if I get an error with the write to the specific > > module IP. -- B.Walter <bernd@bwct.de> http://www.bwct.de Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20200717175055.GB79604>