Since commit https://github.com/InterLinked1/lbbs/commit/acfdcda3c1391de7faead7ec59ae42559fc5ec5d, test_terminals has become a flaky test:
[2024-11-09 01:17:44.923] WARNING: test.c:346 test_client_expect_eventually_buf: Failed to receive expected output at line 223: �� (read returned 0)
[2024-11-09 01:17:44.923] DEBUG: test.c:871 run_test: Test '/home/runner/work/lbbs/lbbs/tests/test_terminals.so' returned -1
[2024-11-10 01:01:43.552] DEBUG[36049]: thread.c:140 __thread_unregister: Thread 36050 has been joined by thread 36049 at node.c:617 node_shutdown()
[2024-11-10 01:01:43.552] WARNING: test.c:346 test_client_expect_eventually_buf: Failed to receive expected output at line 223: �� (read returned 0)
Output in a run leading up to the latter:
[2024-11-10 01:01:19.902] DEBUG[36040]: node.c:1116 bbs_node_update_winsize: Node 2's terminal now has 80 cols and 50 rows
[2024-11-10 01:01:19.902] DEBUG[36040]: node.c:1147 bbs_node_update_winsize: Skipping TIOCSWINSZ for winsize on node 2 (no active PTY allocation)
[2024-11-10 01:01:19.902] DEBUG[36040]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC DONT NAWS
[2024-11-10 01:01:20.202] DEBUG[36040]: socket.c:1456 bbs_poll: poll returned 0 on fd 12
[2024-11-10 01:01:20.202] DEBUG[36040]: net_telnet.c:103 telnet_read_command: poll returned 0
[2024-11-10 01:01:20.202] DEBUG[36040]: net_telnet.c:879 telnet_handshake: Request to enable ECHO not yet acknowledged, retrying
[2024-11-10 01:01:20.202] DEBUG[36040]: net_telnet.c:221 telnet_option_send: him: NO, himq: EMPTY, us: YES, usq: EMPTY
[2024-11-10 01:01:20.202] DEBUG[36040]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC WONT ECHO
[2024-11-10 01:01:20.202] DEBUG[36040]: net_telnet.c:221 telnet_option_send: him: NO, himq: EMPTY, us: NO, usq: EMPTY
[2024-11-10 01:01:20.202] DEBUG[36040]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC WILL ECHO
[2024-11-10 01:01:20.502] DEBUG[36040]: socket.c:1456 bbs_poll: poll returned 0 on fd 12
[2024-11-10 01:01:20.502] DEBUG[36040]: net_telnet.c:103 telnet_read_command: poll returned 0
[2024-11-10 01:01:20.502] DEBUG[36049]: thread.c:106 thread_register: Thread 36049 spawned from handler started by thread 36040 at socket.c:1098 __bbs_tcp_listener()
[2024-11-10 01:01:20.502] DEBUG[36049]: node.c:2189 bbs_node_begin: Running BBS for node 2
[2024-11-10 01:01:20.502] AUTH[36049]: node.c:2190 bbs_node_begin: New TELNET connection to node 2 from 127.0.0.1:44972
[2024-11-10 01:01:20.503] DEBUG[36049]: pty.c:273 bbs_pty_allocate: PTY thread 140142142555840 allocated for node 2
[2024-11-10 01:01:20.503] DEBUG[36049]: socket.c:325 bbs_set_fd_tcp_nodelay: Disabled Nagle's algorithm on socket 12
[2024-11-10 01:01:20.503] DEBUG[36049]: term.c:116 bbs_node_set_input: Node 2 (fd 13): input now unbuffered, echo disabled
[2024-11-10 01:01:20.503] DEBUG[36050]: thread.c:106 thread_register: Thread 36050 spawned from pty_master started by thread 36049 at pty.c:269 bbs_pty_allocate()
[2024-11-10 01:01:20.503] DEBUG[36050]: pty.c:543 pty_master: Starting PTY master for node 2: 9 => /dev/pts/1
[2024-11-10 01:01:22.829] DEBUG[36042]: socket.c:1674 bbs_node_poll: Node 1: poll returned 0
[2024-11-10 01:01:23.506] DEBUG[36049]: socket.c:1674 bbs_node_poll: Node 2: poll returned 0
[2024-11-10 01:01:23.506] DEBUG[36049]: node.c:1500 node_read_cursor_pos: No response to cursor position query after 3 seconds...
[2024-11-10 01:01:26.833] DEBUG[36042]: socket.c:1674 bbs_node_poll: Node 1: poll returned 0
[2024-11-10 01:01:27.510] DEBUG[36049]: socket.c:1674 bbs_node_poll: Node 2: poll returned 0
[2024-11-10 01:01:30.837] DEBUG[36042]: socket.c:1674 bbs_node_poll: Node 1: poll returned 0
[2024-11-10 01:01:31.514] DEBUG[36049]: socket.c:1674 bbs_node_poll: Node 2: poll returned 0
[2024-11-10 01:01:34.841] DEBUG[36042]: socket.c:1674 bbs_node_poll: Node 1: poll returned 0
[2024-11-10 01:01:35.518] DEBUG[36049]: socket.c:1674 bbs_node_poll: Node 2: poll returned 0
[2024-11-10 01:01:38.846] DEBUG[36042]: socket.c:1674 bbs_node_poll: Node 1: poll returned 0
[2024-11-10 01:01:38.846] -- No input received from node, disconnecting...
[2024-11-10 01:01:38.846] DEBUG[36042]: node.c:2161 node_handler_term: Exiting
[2024-11-10 01:01:38.846] DEBUG[36042]: node.c:2228 bbs_node_exit: Node 1 has ended its TELNET session
[2024-11-10 01:01:38.846] DEBUG[36042]: node.c:583 node_shutdown: Terminating node 1
[2024-11-10 01:01:38.846] DEBUG[36042]: term.c:116 bbs_node_set_input: Node 1 (fd 11): input now buffered, echo enabled
[2024-11-10 01:01:38.846] DEBUG[36042]: thread.c:327 __bbs_pthread_join: Attempting to join thread 140142163527360 (LWP 36043) at node.c:617 node_shutdown()
[2024-11-10 01:01:38.846] DEBUG[36043]: pty.c:877 pty_master: poll returned 1 (revent[1] = POLLNVAL)
[2024-11-10 01:01:38.846] DEBUG[36043]: pty.c:879 pty_master: PTY slave (server) closed the connection
[2024-11-10 01:01:38.846] DEBUG[36043]: pty.c:894 pty_master: PTY master exiting for node 1
[2024-11-10 01:01:38.846] DEBUG[36043]: thread.c:145 __thread_unregister: Thread 36043 is exiting (must be joined)
[2024-11-10 01:01:38.871] DEBUG[36042]: socket.c:1456 bbs_poll: poll returned 0 on fd 3
[2024-11-10 01:01:38.871] DEBUG[36042]: thread.c:140 __thread_unregister: Thread 36043 has been joined by thread 36042 at node.c:617 node_shutdown()
[2024-11-10 01:01:38.871] DEBUG[36042]: node.c:658 node_shutdown: Shutdown pending finalization for node 1
[2024-11-10 01:01:38.871] DEBUG[36042]: node.c:676 node_free: Node 1 now freed
[2024-11-10 01:01:38.871] == Node 1 has exited
[2024-11-10 01:01:38.871] DEBUG[36042]: thread.c:138 __thread_unregister: Thread 36042 is exiting (detached)
[2024-11-10 01:01:39.522] DEBUG[36049]: socket.c:1674 bbs_node_poll: Node 2: poll returned 0
[2024-11-10 01:01:43.526] DEBUG[36049]: socket.c:1674 bbs_node_poll: Node 2: poll returned 0
[2024-11-10 01:01:43.526] -- No input received from node, disconnecting...
[2024-11-10 01:01:43.526] DEBUG[36049]: node.c:2161 node_handler_term: Exiting
[2024-11-10 01:01:43.526] DEBUG[36049]: node.c:2228 bbs_node_exit: Node 2 has ended its TELNET session
[2024-11-10 01:01:43.526] DEBUG[36049]: node.c:583 node_shutdown: Terminating node 2
[2024-11-10 01:01:43.526] DEBUG[36049]: term.c:116 bbs_node_set_input: Node 2 (fd 13): input now buffered, echo enabled
[2024-11-10 01:01:43.526] DEBUG[36049]: thread.c:327 __bbs_pthread_join: Attempting to join thread 140142142555840 (LWP 36050) at node.c:617 node_shutdown()
[2024-11-10 01:01:43.527] DEBUG[36050]: pty.c:877 pty_master: poll returned 1 (revent[1] = POLLNVAL)
[2024-11-10 01:01:43.527] DEBUG[36050]: pty.c:879 pty_master: PTY slave (server) closed the connection
[2024-11-10 01:01:43.527] DEBUG[36050]: pty.c:894 pty_master: PTY master exiting for node 2
[2024-11-10 01:01:43.527] DEBUG[36050]: thread.c:145 __thread_unregister: Thread 36050 is exiting (must be joined)
[2024-11-10 01:01:43.552] DEBUG[36049]: socket.c:1456 bbs_poll: poll returned 0 on fd 3
[2024-11-10 01:01:43.552] DEBUG[36049]: thread.c:140 __thread_unregister: Thread 36050 has been joined by thread 36049 at node.c:617 node_shutdown()
[2024-11-10 01:01:43.552] WARNING: test.c:346 test_client_expect_eventually_buf: Failed to receive expected output at line 223: �� (read returned 0)
Side-by-side comparison of failing vs passing test:
FAIL:
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:847 telnet_handshake: Finished processing commands received at connection time
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:221 telnet_option_send: him: NO, himq: EMPTY, us: NO, usq: EMPTY
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC DO TERMINAL TYPE
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:134 telnet_read_command: Received Telnet command IAC WILL TERMINAL TYPE
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:485 __telnet_process_command: him: WANTYES, himq: EMPTY, us: NO, usq: EMPTY
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:91 telnet_send_command6: Sent Telnet command: IAC SB TERMINAL TYPE ECHO IAC SE
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:221 telnet_option_send: him: NO, himq: EMPTY, us: NO, usq: EMPTY
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC DO NAWS
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:134 telnet_read_command: Received Telnet command IAC SB TERMINAL TYPE
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:756 __telnet_process_command: Terminal type is xterm
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:221 telnet_option_send: him: NO, himq: EMPTY, us: NO, usq: EMPTY
[2024-11-13 01:04:54.175] DEBUG[35984]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC DO TSPEED
[2024-11-13 01:04:54.176] DEBUG[35984]: net_telnet.c:134 telnet_read_command: Received Telnet command IAC SB NAWS
[2024-11-13 01:04:54.176] DEBUG[35984]: net_telnet.c:703 __telnet_process_command: Got 0 80 48 50 52 255
[2024-11-13 01:04:54.176] DEBUG[35984]: node.c:1116 bbs_node_update_winsize: Node 2's terminal now has 80 cols and 50 rows
[2024-11-13 01:04:54.176] DEBUG[35984]: node.c:1147 bbs_node_update_winsize: Skipping TIOCSWINSZ for winsize on node 2 (no active PTY allocation)
[2024-11-13 01:04:54.176] DEBUG[35984]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC DONT NAWS
PASS:
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:847 telnet_handshake: Finished processing commands received at connection time
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:221 telnet_option_send: him: NO, himq: EMPTY, us: NO, usq: EMPTY
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC DO TERMINAL TYPE
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:134 telnet_read_command: Received Telnet command IAC WILL TERMINAL TYPE
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:485 __telnet_process_command: him: WANTYES, himq: EMPTY, us: NO, usq: EMPTY
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:91 telnet_send_command6: Sent Telnet command: IAC SB TERMINAL TYPE ECHO IAC SE
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:221 telnet_option_send: him: NO, himq: EMPTY, us: NO, usq: EMPTY
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC DO NAWS
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:134 telnet_read_command: Received Telnet command IAC SB TERMINAL TYPE
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:756 __telnet_process_command: Terminal type is xterm
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:212 telnet_process_command_additional: Processing additional Telnet command IAC SB NAWS
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:703 __telnet_process_command: Got 0 80 48 50 52 255
[2024-11-13 01:27:02.046] DEBUG[35731]: node.c:1116 bbs_node_update_winsize: Node 2's terminal now has 80 cols and 50 rows
[2024-11-13 01:27:02.046] DEBUG[35731]: node.c:1147 bbs_node_update_winsize: Skipping TIOCSWINSZ for winsize on node 2 (no active PTY allocation)
[2024-11-13 01:27:02.046] DEBUG[35731]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC DONT NAWS
[2024-11-13 01:27:02.347] DEBUG[35731]: socket.c:1456 bbs_poll: poll returned 0 on fd 9
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:103 telnet_read_command: poll returned 0
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:221 telnet_option_send: him: NO, himq: EMPTY, us: NO, usq: EMPTY
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC DO TSPEED
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:134 telnet_read_command: Received Telnet command IAC WILL TSPEED
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:485 __telnet_process_command: him: WANTYES, himq: EMPTY, us: NO, usq: EMPTY
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:91 telnet_send_command6: Sent Telnet command: IAC SB TSPEED ECHO IAC SE
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:879 telnet_handshake: Request to enable ECHO not yet acknowledged, retrying
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:221 telnet_option_send: him: NO, himq: EMPTY, us: YES, usq: EMPTY
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC WONT ECHO
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:221 telnet_option_send: him: NO, himq: EMPTY, us: NO, usq: EMPTY
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:75 telnet_send_command: Sent Telnet command: IAC WILL ECHO
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:134 telnet_read_command: Received Telnet command IAC SB TSPEED
[2024-11-13 01:27:02.347] DEBUG[35731]: net_telnet.c:772 __telnet_process_command: Terminal speed is 38400,38400
Analysis:
Both TERMINAL TYPE and NAWS involve multiple exchanges if the client agrees to use the option.
In the case of TERMINAL TYPE, we first try to negotiate TERMINAL TYPE.
Then, we send IAC SB TERMINAL TYPE to probe the terminal type.
The response is IAC SB TERMINAL TYPE, with the terminal type.
Likewise for NAWS.
Problems with test_terminals can begin to arise if these are interleaved... for example:
<- IAC DO TERMINAL TYPE
-> IAC WILL TERMINAL TYPE
<- IAC SB TERMINAL TYPE
<- IAC DO NAWS
-> IAC SB TERMINAL TYPE
<- IAC DO TSPEED
-> IAC SB NAWS
<- IAC DONT NAWS
(echo not acknowledged branch)
<- IAC WONT ECHO
<- IAC WILL ECHO
(no further responses, spawn PTY)
(No response to cursor position query)
Notice that we sent IAC DO NAWS before we received IAC SB TERMINAL TYPE.
In this example, that was due to procesing the TERMINAL TYPE response above,
as when the option is negotiated, all that we do is send IAC SB TERMINAL TYPE...
but due to not waiting for the response, we would just proceed onwards
and then make the IAC DO NAWS request.
Now, this is perfectly legal, option negotiation does not need to be synchronous,
and this "pipelining" can make negotiation faster. However, if at each subsequent
stage we only read a single command, we'll then be "one step behind" in processing
commands, and the test will get out of synchronization and stall.
XXX And compounding this issue, not all command handlers are written to elegantly
handle receiving multiple responses at once (though the TERMINAL TYPE handler is).
For that reason, the below exchange would not cause the same issue:
<- IAC DO TERMINAL TYPE
-> IAC WILL TERMINAL TYPE
<- IAC SB TERMINAL TYPE
<- IAC DO NAWS
-> IAC SB TERMINAL TYPE , IAC SB NAWS
<- IAC DONT NAWS
<- IAC DO TSPEED
-> IAC WILL TSPEED
<- IAC SB TSPEED
<- IAC WONT ECHO
<- IAC WILL ECHO
-> IAC SB TSPEED
(no further responses, spawn PTY)
The main difference here is IAC SB TERMINAL TYPE , IAC SB NAWS are sent in the same TCP packet,
allowing them to be read in the same read() call and processed. As mentioned, the TERMINAL TYPE
handler has supported to recognize it received multiple commands and parse them out separately.
XXX Although again, this should be generic so that we could successfully handle multiple commands
at any stage. An easy way would be using bbs_readline with IAC SE as the delimiter when needed.
There are a few workarounds that mitigate the issue described above:
1. When we send IAC SB TERMINAL TYPE, explicitly wait for the IAC SB TERMINAL TYPE response
and process it before sending IAC DO NAWS.
2. After the read_and_process_command immediately following sending IAC DO NAWS.
check if we actually got a response for NAWS (either refusal of the option or the dimensions).
If not, we probably handled something else, and should try reading it again before moving on.
3. In the most event-driven format, continue onwards, and after reading all remaining input at the end,
if we have not finalized processing of a particular option, circle back to it then.
(Of the first two options, #2 is obviously better; it's not necessary to wait synchronously in #1)
Now, although it's NAWS in the example above, if we already had dimensions,
then it would be the same situation with TSPEED
Note also that TERMINAL TYPE involves us sending IAC SB TERMINAL TYPE,
but there is no corresponding IAC SB NAWS sent to the client.
For NAWS, if the option is negotiated, the client will send the dimensions.
For some reason, TERMINAL TYPE requires us to request it even after the option is negotiated,
so the overall "negotiation flow" differs slightly for these two options.
Circling back to the terminal tests, the tests sporadically fail in the first case above,
due to waiting for a IAC DO TSPEED, which is not received. The last synchronized thing it
receives is IAC DONT NAWS. However, that was after sending the TSPEED command in the failing case,
(as opposed to before, in the passing one, where it was processed in the NAWS block).
You must be
11/13/2024 6:54 PM — InterLinked
Autoresolved by code commit on GitHub:
net_telnet: Fix flaky test_terminals test.
https://github.com/InterLinked1/lbbs/commit/2aa33a2ecbbc536be40ca4c86cbaf914fdca6cd9