Mini-Language for hyphenation

Subject: Mini-Language for hyphenation
From: ram (at) *nospam* zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.misc
Date: 21 Jan 2025, 12:53:55
Organization: Stefan Ram
Message-ID: <hyphenation-20250121121804@ram.dialup.fu-berlin.de>

  I foresee the need for a mini-language for hyphenation in
  my current plain-text paragraph wrapper project. Here are
  my plans; comments are welcome:

example

  . This is an unadorned word "example". The system might automatically
  insert possibilities for hyphenation from a hyphenation dictionary.

ex[[-]]am[[-]]ple

  Here, possibilities for hyphenation have been inserted. It
  is assumed that nested brackets occur so rarely in natural
  text that the risk of a clash is negligible. Means for
  escaping are nevertheless discussed below.

ba[ck[k-|k]]en

  This is a hyphenation of a German word according to the
  rules from 1973. It's either "backen" or "bak-
  ken".

Bett[[-|t]]uch

  "Bettuch" or "Bett-
  tuch", according to spelling rules from 1973.

  So, the general pattern in my mini-language is:

[no-hyphenation text[pre-break text|post-break text]]

  .

Bett[t]uch

  When brackets occur in the text that do not satisfy the
  syntax of my mini-language, they will simply be left alone.
  I.e., this is just literally "Bett[t]uch" with a "t" to
  be "typeset" in literal brackets.
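
  The general pattern and the leave-alone rule above can be
  sketched in Python as follows. This is only a rough sketch
  covering the basic form shown so far, with one alternative per
  break; the names BREAK, unbroken and broken_forms are made up
  for illustration and are not part of the proposal.

```python
import re

# Basic form only: [no-hyphenation text[pre-break text|post-break text]]
# The "|post-break text" part is optional, as in "ex[[-]]am[[-]]ple".
BREAK = re.compile(r"\[([^\[\]|]*)\[([^\[\]|]*)(?:\|([^\[\]|]*))?\]\]")

def unbroken(word):
    """Text as typeset when no break is taken at any break point."""
    return BREAK.sub(lambda m: m.group(1), word)

def broken_forms(word):
    """One (end of line, start of next line) pair per break point."""
    forms = []
    for m in BREAK.finditer(word):
        head = unbroken(word[:m.start()]) + m.group(2)
        tail = (m.group(3) or "") + unbroken(word[m.end():])
        forms.append((head, tail))
    return forms

print(unbroken("ba[ck[k-|k]]en"), broken_forms("ba[ck[k-|k]]en"))
print(unbroken("Bett[t]uch"), broken_forms("Bett[t]uch"))
```

  Brackets that do not match the pattern, as in "Bett[t]uch",
  are never touched by the regular expression, so they pass
  through as literal text.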

ba[ck[k-|k][-|ck@-99]]

  Here, two possibilities for hyphenation are given, the second one
  has a value of -99 added to the quality of the break, which means
  that "[k-|k]" will be preferred.
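
  A possible way to read several alternatives with quality
  adjustments, again only as a sketch. It assumes that an
  alternative without "@" has an adjustment of 0 and that a
  higher quality is preferred; the names CONSTRUCT, ALT and
  alternatives are hypothetical.

```python
import re

# One construct: [no-hyphenation text[alt][alt]...]
CONSTRUCT = re.compile(r"\[([^\[\]]*)((?:\[[^\[\]]*\])+)\]")
# One alternative: [pre-break text|post-break text@adjustment]
ALT = re.compile(r"\[([^\[\]|@]*)(?:\|([^\[\]|@]*))?(?:@(-?\d+))?\]")

def alternatives(text):
    """Return (pre, post, adjustment) triples of the first construct,
    best adjustment first. Missing adjustments count as 0 (assumption)."""
    m = CONSTRUCT.search(text)
    if m is None:
        return []
    alts = [(pre, post, int(adj or 0))
            for pre, post, adj in ALT.findall(m.group(2))]
    # sorted() is stable, so equally rated alternatives keep their order.
    return sorted(alts, key=lambda a: a[2], reverse=True)

print(alternatives("ba[ck[k-|k][-|ck@-99]]"))
```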

backen[[#]]

  This inserts an invisible marker of width zero that can later be
  located in the wrapped paragraph to learn on which line the "n"
  ended up.

b[[#97]]cken

  Here, the "a" is given by its code point number.

b[[#u61]]cken

  Here, the "a" is given by its code point number in hex notation.
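
  Decoding the two code point notations could look like this in
  Python. It is a minimal sketch; the names CODE_POINT and
  decode_code_points are made up, and the bare marker "[[#]]" is
  deliberately left untouched.

```python
import re

# [[#97]] is decimal, [[#u61]] is hexadecimal; [[#]] does not match.
CODE_POINT = re.compile(r"\[\[#(u[0-9A-Fa-f]+|[0-9]+)\]\]")

def decode_code_points(text):
    """Replace every code point notation by the character it denotes."""
    def replace(m):
        spec = m.group(1)
        if spec.startswith("u"):
            return chr(int(spec[1:], 16))  # hex notation
        return chr(int(spec))              # decimal notation
    return CODE_POINT.sub(replace, text)

print(decode_code_points("b[[#97]]cken"))
print(decode_code_points("b[[#u61]]cken"))
```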

  Escape Mechanisms

  In programming languages, we may indeed have nested brackets as
  in "a[ b[ 20 ]]". Using the above notation, this can be written
  as "a[[#91]] b[[#91]] 20 [[#93]][[#93]]".

  My mini-language is intended to be a low-level mechanism
  for the specification of hyphenation rules. Higher-level
  formatting languages may be built on top of it, which may
  automatically convert "a[ b[ 20 ]]" into "a[[#91]] b[[#91]] 20
  [[#93]][[#93]]" when it appears in the context of source code.
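
  Such an automatic conversion might be sketched as below. The
  function name escape_brackets is hypothetical. The replacement
  has to happen in a single pass, because two chained replace()
  calls would also rewrite the brackets of the freshly inserted
  "[[#91]]".

```python
# Map each bracket to its code point notation: "[" is 91, "]" is 93.
ESCAPES = {ord("["): "[[#91]]", ord("]"): "[[#93]]"}

def escape_brackets(source):
    """Escape every bracket of source code so that none of them
    is mistaken for mini-language syntax."""
    # str.translate looks at each input character exactly once.
    return source.translate(ESCAPES)

print(escape_brackets("a[ b[ 20 ]]"))
```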

  However, as a last resort, one may use a special notation to
  redefine the characters of the mini-language:

[[#40=#91]]
[[#91=]]

  Above, the parenthesis "(" (40) is given the role of the bracket
  "[" (91), and then the bracket is defined to have no special role
  in the mini-language. (The value to the right of "=" always
  represents the role this symbol has in the /original/ mini-language.)
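
  One way to picture these redefinitions is a table from each
  active character to the role it plays in the original
  mini-language. The sketch below rests on several assumptions:
  only "[", "]" and "|" are tracked as special, directives are
  always read in the original bracket notation, and the names
  DIRECTIVE and apply_directives are invented for illustration.

```python
import re

# [[#40=#91]] assigns a role; [[#91=]] removes the role of a character.
DIRECTIVE = re.compile(r"\[\[#(\d+)=(?:#(\d+))?\]\]")

def apply_directives(directives, special="[]|"):
    """Return a dict mapping each active character to its original role."""
    # Assumption: at the start every special character plays its own role.
    roles = {c: c for c in special}
    for d in directives:
        m = DIRECTIVE.fullmatch(d.strip())
        if m is None:
            continue  # not a directive; ignored in this sketch
        char = chr(int(m.group(1)))
        if m.group(2) is not None:
            roles[char] = chr(int(m.group(2)))  # take over an original role
        else:
            roles.pop(char, None)               # no special role any more
    return roles

roles = apply_directives(["[[#40=#91]]", "[[#91=]]"])
print(roles)
```

  After both directives, "(" carries the role of the original "["
  and "[" itself no longer appears in the table.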


