Subject : Re: How to manage accented characters in mail header?
From : pkpearson (at) *nospam* nowhere.invalid (Peter Pearson)
Newsgroups : comp.lang.python
Date : 04. Jan 2025, 16:00:21
Other headers
Message-ID : <ltt0o4FlcuoU1@mid.individual.net>
References : 1
User-Agent : slrn/1.0.3 (Linux)
On Sat, 4 Jan 2025 14:31:24 +0000, Chris Green <cl@isbd.net> wrote:
> I have a Python script that filters my incoming E-Mail. It has been
> working OK (with various updates and improvements) for many years.
>
> I now have a minor new problem when handling E-Mail with a From: that
> has accented characters in it:-
>
>     From: Sébastien Crignon <sebastien.crignon@amvs.fr>
>
>
> I use Python mailbox to parse the message:-
>
>     import mailbox
>     ...
>     ...
>     msg = mailbox.MaildirMessage(sys.stdin.buffer.read())
>
> Then various mailbox methods to get headers etc.
> I use the following to get the From: address:-
>
>     str(msg.get('from', "unknown")).lower()
>
> The result has the part with the accented character wrapped as follows:-
>
>     From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>
>
>
> I know I have hit this issue before but I can't remember the fix. The
> problem I have now is that searching the above doesn't work as
> expected. Basically I just need to get rid of the ?utf-8? wrapped bit
> altogether as I'm only interested in the 'real' address. How can I
> easily remove the UTF8 section in a way that will work whether or not
> it's there?
This seemed to work for me:
import email.header
text, encoding = email.header.decode_header(some_string)[0]
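
Note that on Python 3 the text returned for an encoded part is bytes, so it still needs decoding with the reported encoding. If only the 'real' address matters, something along these lines might be simpler. It is just a sketch: split_from is a made-up helper name, and it assumes the header holds a single address. email.utils.parseaddr pulls the address out whether or not the name is encoded, and make_header turns the decode_header pairs back into readable text:

```python
import email.header
import email.utils


def split_from(value):
    """Return (display_name, address) for a From: header value.

    Hypothetical helper, assumes the header contains one address.
    """
    # parseaddr separates the display name from the address; an
    # RFC 2047 encoded-word in the name is passed through untouched.
    raw_name, address = email.utils.parseaddr(value)
    # decode_header yields (bytes-or-str, charset) pairs; make_header
    # reassembles them and str() gives the decoded text.
    pairs = email.header.decode_header(raw_name)
    name = str(email.header.make_header(pairs))
    # Lowercase only the address, never the still-encoded header.
    return name, address.lower()


encoded = "=?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>"
plain = "Chris Green <cl@isbd.net>"

print(split_from(encoded))
print(split_from(plain))
```

Lowercasing is done after the split because calling .lower() on the raw header would also lowercase the base64 payload, which then no longer decodes to the original name.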

-- 
To email me, substitute nowhere->runbox, invalid->com.