Subject : Re: Printing UTF-8 mail to terminal
From : loris.bennett (at) *nospam* fu-berlin.de (Loris Bennett)
Newsgroups : comp.lang.python
Date : 01. Nov 2024, 08:11:30
Organization : FUB-IT, Freie Universität Berlin
Message-ID : <87msijo2cd.fsf@zedat.fu-berlin.de>
References : 1 2 3
User-Agent : Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cameron Simpson <cs@cskk.id.au> writes:
On 31Oct2024 16:33, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
I have a command-line program which creates an email containing German
umlauts. On receiving the mail, my mail client displays the subject and
body correctly:
[...]
So far, so good. However, when I use the --verbose option to print
the mail to the terminal via
>
if args.verbose:
    print(mail)
>
I get:
>
Subject: Übungsbetreff
>
Sehr geehrter Herr Dr. Bennett,
>
Dies ist eine =C3=9Cbung.
>
What do I need to do to prevent the body from getting mangled?
>
That looks to me like quoted-printable. This is an encoding for binary
transport of text to make it robust against transports that are not
8-bit clean. So your Unicode text is encoded as UTF-8, and then that
is encoded in quoted-printable for transport through the email system.
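The two-step encoding described above can be reproduced with the standard library. This is only a minimal sketch: it uses the `quopri` module on the sample sentence from this thread, and the variable names are made up for illustration.

```python
import quopri

# Sample body text from the thread.
text = "Dies ist eine Übung."

# Step 1: Unicode text -> UTF-8 bytes ("Ü" becomes the two bytes C3 9C).
utf8_bytes = text.encode("utf-8")

# Step 2: UTF-8 bytes -> quoted-printable, which is plain ASCII.
qp_bytes = quopri.encodestring(utf8_bytes)

# Undoing both steps in reverse order recovers the original text.
round_trip = quopri.decodestring(qp_bytes).decode("utf-8")

# The encoded form is ASCII-only, so it survives 7-bit transports.
print(qp_bytes.decode("ascii"))
```

The `=C3=9C` in the mangled output is simply the two UTF-8 bytes of "Ü" written out as hexadecimal escapes.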
As I mentioned, I think the problem is to do with the way the salutation
text provided by the "salutation server" and the mail body from a file
are encoded. This seems to be different.
Your terminal probably accepts UTF-8 - I imagine other German text
renders correctly?
Yes, it does.
You need to get the text and undo the quoted-printable encoding.
>
If you're using the Python email module to parse (or construct) the
message as a `Message` object I'd expect that to happen automatically.
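As a rough illustration of that automatic decoding, the sketch below parses a small raw message with the `email` package. The raw text is a hypothetical minimal message built from the sample subject and body in this thread, and `policy.default` is an assumption made here so that the parser returns an `EmailMessage` with `get_content()` available.

```python
from email import message_from_string, policy

# Hypothetical raw message, using the sample subject and body.
raw = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "Subject: =?utf-8?q?=C3=9Cbungsbetreff?=\n"
    "\n"
    "Dies ist eine =C3=9Cbung.\n"
)

# policy.default selects the modern API (EmailMessage objects).
parsed = message_from_string(raw, policy=policy.default)

# Header access decodes the RFC 2047 encoded word.
subject = str(parsed["Subject"])

# get_content() undoes the transfer encoding and the charset encoding.
body = parsed.get_content()
```

With the older `compat32` policy the equivalent step would be `get_payload(decode=True)` followed by an explicit `.decode()` with the message's charset.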
I am using
email.message.EmailMessage
as, from the Python documentation
https://docs.python.org/3/library/email.examples.html
I gathered that that is the standard approach.
And you are right that encoding for the actual mail which is received is
automatically sorted out. If I display the raw email in my client I get
the following:
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
...
Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
...
Dies ist eine =C3=9Cbung.
I would interpret that as meaning that the subject and body are encoded
in the same way.
The problem just occurs with the unsent string representation printed to
the terminal.
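A minimal sketch of the difference between the string representation and the decoded content follows. It makes several assumptions: the message is a single-part text message, the transfer encoding is forced with `cte="quoted-printable"` to mirror the headers quoted above (with a body this short the default heuristics may well choose a different encoding), and the helper name `format_for_terminal` is made up for this example.

```python
from email.message import EmailMessage

mail = EmailMessage()
mail["Subject"] = "Übungsbetreff"
# Assumption: force quoted-printable to match the raw mail shown above.
mail.set_content("Dies ist eine Übung.\n", cte="quoted-printable")

# str(mail) serialises the message, so the body is shown in its
# transfer encoding rather than as readable text.
as_sent = str(mail)


def format_for_terminal(msg: EmailMessage) -> str:
    """Return decoded headers plus the decoded body (hypothetical helper).

    Assumes a single-part text message.
    """
    headers = "\n".join(f"{name}: {value}" for name, value in msg.items())
    return f"{headers}\n\n{msg.get_content()}"


terminal_text = format_for_terminal(mail)
```

Under these assumptions, calling `print(format_for_terminal(mail))` in the `--verbose` branch instead of `print(mail)` would show the umlauts in both subject and body, provided the terminal accepts UTF-8.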
Cheers,
Loris
-- 
This signature is currently under construction.