Strange deadlock observed tonight:
Last few successfully logged (and flushed to log file) log messages:
[2024-11-29 21:09:42.351] DEBUG[1493365]: imap_client.c:147 imap_poll: Polling 12 fds for IMAP session 0x7f55b1f845d0 (for 25s)
[2024-11-29 21:10:06.828] DEBUG[1493808]: readline.c:165 bbs_readline: bbs_poll_read returned 0
[2024-11-29 21:10:06.829] DEBUG[1493808]: net_imap.c:4892 handle_client: 0x7f559fff45d0 <= * BYE IMAP4rev1 server terminating connection
[2024-11-29 21:10:06.829] DEBUG[1493808]: mod_mail.c:232 mailbox_dispatch_event: Dispatching mailbox event 'Logout' (maildir: )
[2024-11-29 21:10:06.829] DEBUG[1493808]: node.c:2259 bbs_node_exit: Node 2 has ended its IMAP session
[2024-11-29 21:10:06.829] DEBUG[1493808]: io.c:414 bbs_io_teardown_all_transformers: Removing I/O transformer at index 0
Issue appears similar to this one: https://stackoverflow.com/questions/6697058/pthread-mutex-lock-locks-but-no-owner-is-set
In the backtrace, all the threads are blocked on line 533 of logger.c, which is RWLIST_RDLOCK(&remote_log_fds);
Since they are all trying to do a read lock, they all succeed, but what they are all blocking on is line 345 in lock.c
, since this is a mutex (the internal BBS lock mutex).
As can be seen from the gdb output (bottom), this mutex has no apparent owner (normally, the locking LWP would be listed here). So, there are no clues as to why we are failing to acquire the mutex. In the code, there is no way that this could still be held legitimately (according to the backtrace), and the lock doesn't seem to be held at all, so it's unclear why all attempts to acquire the mutex are blocked.
Of course, it's always possible a thread was cancelled while it was holding the mutex... seems unlikely, but I can't think of another explanation either.
(gdb) print t
$1 = (bbs_rwlock_t *) 0x55af29d6f910 <remote_log_fds+16>
$2 = {lock = {__data = {__readers = 104, __writers = 0, __wrphase_futex = 0, __writers_futex = 0, __pad3 = 0, __pad4 = 0, __cur_writer = 0, __shared = 0, __rwelision = 0 '\000',
__pad1 = "\000\000\000\000\000\000", __pad2 = 0, __flags = 0}, __size = "h", '\000' <repeats 54 times>, __align = 104}, intlock = {__data = {__lock = 2, __count = 0, __owner = 0,
__nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002", '\000' <repeats 38 times>, __align = 2}, info = {initialized = 1,
staticinit = 1, destroyed = 0, owners = 0, lastlocked = 1732932606, lwp = 1493808, lineno = 533, filename = "logger.c", '\000' <repeats 15 times>}}
Thread 22 (Thread 0x7f559fff76c0 (LWP 1493808) "lbbs"):
#0 futex_wait (private=0, expected=2, futex_word=0x55af29d6f948 <remote_log_fds+72>) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x55af29d6f948 <remote_log_fds+72>, private=0) at ./nptl/lowlevellock.c:49
#2 0x00007f55ebd2d3c2 in lll_mutex_lock_optimized (mutex=0x55af29d6f948 <remote_log_fds+72>) at ./nptl/pthread_mutex_lock.c:48
#3 ___pthread_mutex_lock (mutex=mutex@entry=0x55af29d6f948 <remote_log_fds+72>) at ./nptl/pthread_mutex_lock.c:93
#4 0x000055af29d1ef14 in __bbs_rwlock_rdlock (t=t@entry=0x55af29d6f910 <remote_log_fds+16>, filename=filename@entry=0x55af29d57ec6 "logger.c", lineno=lineno@entry=533, func=func@entry=0x55af29d58f9>
#5 0x000055af29d20916 in __bbs_log (loglevel=loglevel@entry=LOG_DEBUG, level=level@entry=6, file=file@entry=0x55af29d61d38 "thread.c", lineno=lineno@entry=327, func=func@entry=0x55af29d621e0 at logger.c:533
(gdb) print *mutex
$7 = {__data = {__lock = 2, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002", '\000' <repeats 38 times>,
__align = 2}
You must be