This is filed under LBBS for mod_webmail, but it applies equally to evergreen.
The current mechanism for extracting the plaintext/HTML message component is to parse the body manually and look at the content types. Instead, we should be able to use the BODYSTRUCTURE data item of the IMAP FETCH command to do this more efficiently, which also avoids downloading content we don't need. This can also be used to pick out attachments and send those to the frontend.
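To illustrate the idea, here is a rough sketch in Python (not the C code in mod_webmail, and not a proposed implementation). It parses a BODYSTRUCTURE response into nested lists, walks the parts to compute their IMAP section numbers, and picks the part to download. The helper names (`parse_sexp`, `find_parts`, `choose_body`) and the sample response are made up for illustration. It assumes the server sends extension data (so the disposition field is present), does not handle IMAP literals (`{n}`), and treats message/rfc822 parts as opaque leaves.

```python
import re

# Tokens: "(", ")", a quoted string, or a bare atom (NIL, numbers, ...).
TOKEN = re.compile(r'\(|\)|"((?:[^"\\]|\\.)*)"|([^\s()"]+)')

def parse_sexp(text):
    """Parse an IMAP parenthesized list into nested Python lists."""
    stack = [[]]
    for m in TOKEN.finditer(text):
        tok = m.group(0)
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        elif m.group(1) is not None:
            # Quoted string: undo backslash escapes.
            stack[-1].append(re.sub(r"\\(.)", r"\1", m.group(1)))
        else:
            atom = m.group(2)
            stack[-1].append(None if atom.upper() == "NIL" else atom)
    return stack[0][0]

def leaf_info(node, section):
    """Summarize one non-multipart body part."""
    params = node[2] or []
    # Parameters are a flat list: name, value, name, value, ...
    pdict = {params[i].lower(): params[i + 1] for i in range(0, len(params) - 1, 2)}
    # Text parts carry an extra "lines" field, which shifts the disposition by one.
    disp_index = 9 if node[0].upper() == "TEXT" else 8
    disp = node[disp_index] if len(node) > disp_index else None
    return {
        "section": section,
        "type": f"{node[0]}/{node[1]}".lower(),
        "charset": pdict.get("charset"),
        "encoding": (node[5] or "7bit").lower(),
        "size": int(node[6]),
        "attachment": bool(disp) and disp[0].lower() == "attachment",
    }

def find_parts(node, prefix=""):
    """Yield leaf_info() for every leaf part, depth first."""
    if isinstance(node[0], list):
        # Multipart: child parts come first, followed by the subtype string.
        index = 0
        for child in node:
            if not isinstance(child, list):
                break
            index += 1
            section = f"{prefix}.{index}" if prefix else str(index)
            yield from find_parts(child, section)
    else:
        # A message that is not multipart is addressed as section 1.
        yield leaf_info(node, prefix or "1")

def choose_body(structure, want_html=True):
    """Pick the inline text part to display, preferring HTML or plaintext."""
    preferred = "text/html" if want_html else "text/plain"
    fallback = "text/plain" if want_html else "text/html"
    candidates = [p for p in find_parts(structure) if not p["attachment"]]
    for wanted in (preferred, fallback):
        for part in candidates:
            if part["type"] == wanted:
                return part
    return None

# Made-up response: multipart/mixed containing multipart/alternative plus a PDF.
SAMPLE = (
    '((("TEXT" "PLAIN" ("CHARSET" "utf-8") NIL NIL "7BIT" 120 5 NIL NIL NIL)'
    '("TEXT" "HTML" ("CHARSET" "utf-8") NIL NIL "QUOTED-PRINTABLE" 900 20 NIL NIL NIL)'
    ' "ALTERNATIVE" ("BOUNDARY" "b2") NIL NIL)'
    '("APPLICATION" "PDF" ("NAME" "a.pdf") NIL NIL "BASE64" 5000 NIL'
    ' ("ATTACHMENT" ("FILENAME" "a.pdf")) NIL NIL)'
    ' "MIXED" ("BOUNDARY" "b1") NIL NIL)'
)

body = choose_body(parse_sexp(SAMPLE))
print(f"UID FETCH <uid> (BODY.PEEK[{body['section']}])")
```

With the section number in hand, only the chosen part needs to be fetched, and the `charset` and `encoding` fields say how to decode it before it is encoded as JSON. The parts flagged as attachments can be listed for the frontend from their sizes and types without downloading them.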
Informative Reference: https://en.wikipedia.org/wiki/MIME#Multipart_messages
Edit: I've changed this from an Improvement to a Bug, because the current parsing does not work for certain messages, e.g. UID 1978 on INT.
[2025-08-05 13:45:00.891] DEBUG[285761]: imap_server_fetch.c:1045 process_fetch: 0x7f4fd0ff6870 <= 26 OK UID FETCH Completed
[2025-08-05 13:45:00.933] DEBUG[285760]: mod_webmail.c:2948 fetch_mime: FETCH result: want HTML=1, bodylen=1580
[2025-08-05 13:45:00.938] DEBUG[285760]: string.c:736 bbs_utf8_remove_invalid: Invalid UTF-8 sequence(1) (A0...) encountered (position 89)
[2025-08-05 13:45:00.942] DEBUG[285760]: string.c:736 bbs_utf8_remove_invalid: Invalid UTF-8 sequence(0) (20...) encountered (position 90)
[2025-08-05 13:45:00.946] WARNING[285760]: string.c:751 bbs_utf8_remove_invalid: 2 invalid UTF-8 sequences removed
[2025-08-05 13:45:00.950] WARNING[285760]: mod_webmail.c:3025 fetch_mime: Failed to encode body 0x7f4fd8040f00 (1579) as JSON
[2025-08-05 13:45:00.953] WARNING[285760]: mod_webmail.c:3263 handle_fetch: Message 1978 missing body
[2025-08-05 13:45:00.958] DEBUG[285760]: mod_webmail.c:3612 idle_start: Starting IDLE...
I've seen this failure before without UTF-8 errors as well; the log above is just one specific case.