SLRTR 000000: ZWJ in Sphinx

Subject: SLRTR 000000: ZWJ in Sphinx
From: ram (at) *nospam* zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Date: 05. May 2025, 12:34:52
Organization: Stefan Ram
Message-ID: <SLRTR-000000-20250505123051@ram.dialup.fu-berlin.de>

The Use of the U+200D "ZERO WIDTH JOINER" (ZWJ) Character in
reStructuredText Input for "Sphinx"
(Technical Report SLRTR 000000)

(This technical report was prepared by the author during his spare
time.)

Stefan Ram
2025

Abstract - The character U+200D "ZERO WIDTH JOINER" (ZWJ) may be
employed in inputs written in the "reStructuredText" (rst) markup
notation for the software documentation tool "Sphinx" in order to permit
the inclusion of special characters within embedded code segments.
To ensure that Sphinx's automatic line breaking continues to function
correctly, two minor adjustments to Sphinx are required.

I. Introduction

The software documentation tool "Sphinx" accepts texts composed in the
"reStructuredText" (rst) notation. Within paragraphs, code segments are
denoted by enclosing the relevant text between pairs of grave accents
(``) as illustrated in Figure 1.

Figure 1: A Code Segment within a Paragraph

|... the expression ``x[ 2 ]`` may be used ...

Such segments are, however, subject to two restrictions:
- They must not begin or end with a space character (" ").
- They must not contain pairs of grave accents.

II. Versions of the Software Considered

This report pertains to Sphinx, version 8.2.3.

III. The U+200D ZERO WIDTH JOINER (ZWJ) Character as a Workaround

Despite the restrictions stated in Section I, it is possible to include
a space at the beginning of an embedded code segment by prefixing it
with the invisible character U+200D "ZERO WIDTH JOINER" (ZWJ).
Similarly, a space may be placed at the end of such a segment by
suffixing it with a ZWJ. Furthermore, a sequence of multiple grave
accents within an embedded code segment can be achieved by interposing
a ZWJ between the grave accents.
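
The workaround described above can be sketched as a small Python
helper. This is an illustration only: the name rst_literal is
hypothetical and belongs neither to Sphinx nor to docutils. The sketch
assumes a non-empty code segment and does not handle a segment that
itself begins or ends with a grave accent, a case this report does not
cover.

```python
import re

ZWJ = '\u200d'  # U+200D ZERO WIDTH JOINER


def rst_literal(code: str) -> str:
    """Wrap code in rst inline-literal markup, adding ZWJ where needed.

    Hypothetical helper for illustration; not part of Sphinx or docutils.
    """
    # Put a ZWJ between adjacent grave accents so that no pair remains.
    code = re.sub(r'`(?=`)', '`' + ZWJ, code)
    # Shield a leading or trailing space with a ZWJ.
    if code.startswith(' '):
        code = ZWJ + code
    if code.endswith(' '):
        code = code + ZWJ
    return '``' + code + '``'


if __name__ == '__main__':
    # repr() makes the otherwise invisible ZWJ visible as '\u200d'.
    print(repr(rst_literal('x[ 2 ]')))
    print(repr(rst_literal(' x ')))
    print(repr(rst_literal('a``b')))
```

A segment that needs no shielding, such as "x[ 2 ]", is returned with
only the enclosing grave accents added.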

The ZWJ character is invisible in Sphinx's output. If desired, it may
also be removed from the output by means of post-processing.
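
A minimal sketch of such a post-processing step follows. The function
name strip_zwj is hypothetical. The sketch removes every ZWJ from a
string, which is only appropriate if the text contains no ZWJ that is
needed for other purposes (for example, within emoji sequences).

```python
ZWJ = '\u200d'  # U+200D ZERO WIDTH JOINER


def strip_zwj(text: str) -> str:
    """Return text with every ZWJ removed (hypothetical helper)."""
    return text.replace(ZWJ, '')


if __name__ == '__main__':
    rendered = 'the segment ' + ZWJ + ' x ' + ZWJ + ' is shown'
    print(repr(strip_zwj(rendered)))
```

To post-process generated files, the same function could be applied to
the contents of each output file after it has been read as UTF-8 text.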

IV. Consideration of ZWJ in Line Breaking and Word Division

Sphinx interprets a ZWJ as a character of width one and regards it as
a potential break point within words. Consequently, the formatting of
output text may be affected. This behavior can be modified by two
changes to the Sphinx source code.

A. Adjustment of Character Width

Within the source file "docutils\utils\__init__.py" (part of the
docutils package, on which Sphinx depends), the number of ZWJ characters
should be subtracted from the total text width, so that a ZWJ is not
counted as a character of width one. This is accomplished by inserting
the following line immediately before the "return width" statement in
the definition of the column_width function:

Figure 2: The line to be inserted

|width -= text.count('\u200d')
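
The effect of the inserted line can be illustrated with a simplified,
self-contained stand-in for column_width. The function below is an
assumption made for illustration and is not the docutils
implementation: it merely derives a width from the East Asian Width
class of each character, discounts combining characters, and then
applies the adjustment of Figure 2.

```python
import unicodedata

ZWJ = '\u200d'  # U+200D ZERO WIDTH JOINER

# Column widths per East Asian Width class (assumed for this sketch).
EAST_ASIAN_WIDTHS = {'W': 2, 'F': 2, 'Na': 1, 'H': 1, 'N': 1, 'A': 1}


def column_width(text: str) -> int:
    """Simplified stand-in for a column-width function (illustration only)."""
    width = sum(EAST_ASIAN_WIDTHS[unicodedata.east_asian_width(c)]
                for c in text)
    # Combining characters occupy no column of their own.
    width -= sum(1 for c in text if unicodedata.combining(c))
    # The line from Figure 2: a ZWJ is not counted as width one.
    width -= text.count(ZWJ)
    return width


if __name__ == '__main__':
    print(column_width('a' + ZWJ + 'b'))
    print(column_width(ZWJ))
```

Without the line from Figure 2, this stand-in would count each ZWJ as
one column, so a string such as 'a', ZWJ, 'b' would be measured one
column too wide.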

B. Adjustment of Break Point Determination

(This adjustment is likely unnecessary for ZWJ within embedded code
segments, but may be required if ZWJ is used within words of running
text for any reason.)

In the Sphinx source file "sphinx\writers\text.py", words should not be
split at the occurrence of ZWJ within a word. To this end, the
definition shown in Figure 3 may be inserted below the definition of the
split function (which is itself nested within the definition of the
_split method of the TextWrapper class). The indentation of the new
col_width function should match that of the preceding split function.

Figure 3: The definition to be inserted

|def col_width(t: str) -> int:
|    '''for the purpose of word splitting, treat
|       zero-width characters just as characters
|       of width one.'''
|    width = column_width(t)
|    if width == 0:
|        width = 1
|    return width

The source code should further be modified such that this new col_width
function is passed as the key function in the call to "groupby" three
lines below, replacing the previous use of column_width.
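
The following self-contained sketch illustrates why the key function
passed to "groupby" matters. It does not reproduce the Sphinx source:
column_width here is a deliberately crude stand-in that assumes
adjustment A has been applied (a ZWJ has width zero, every other
character width one), and the helper name runs is hypothetical.

```python
from itertools import groupby

ZWJ = '\u200d'  # U+200D ZERO WIDTH JOINER


def column_width(t: str) -> int:
    """Crude stand-in: ZWJ has width zero, any other character width one."""
    return len(t) - t.count(ZWJ)


def col_width(t: str) -> int:
    """Treat zero-width characters as width one for word splitting."""
    width = column_width(t)
    if width == 0:
        width = 1
    return width


def runs(text: str, key) -> list[tuple[int, str]]:
    """Group consecutive characters of text that share the same key value."""
    return [(w, ''.join(g)) for w, g in groupby(text, key)]


if __name__ == '__main__':
    word = 'ab' + ZWJ + 'cd'
    # Keyed by column_width, the ZWJ forms a run of its own,
    # so the word falls apart into three pieces.
    print(runs(word, column_width))
    # Keyed by col_width, the whole word stays in a single run.
    print(runs(word, col_width))
```

With column_width as the key, the ZWJ separates the word into three
runs; with col_width, the word remains a single run and therefore
offers no additional break point at the ZWJ.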

(End of Technical Report)


