On 03/11/2024 05:07 PM, Lawrence D'Oliveiro wrote:
> On Fri, 8 Mar 2024 21:36:14 -0800, Ross Finlayson wrote:
>
>> What I'd like to know about is who keeps dialing the "harmonization"
>> efforts, which really must give grouse to the "harmonisation"
>> spellers ...
>
> Some words came from French and had “-ize”, others did not and had “-ise”.
> Some folks in Britain decided to change the former to the latter.
>
> “Televise”, “merchandise”, “advertise” -- never any “-ize” form.
>
> “Synchronize”, “harmonize”, “apologize” -- “-ize” originally.
>

Hey thanks, that's something I hadn't thought of,
that the harmonization was coming from this
side of the pond and not just vice-versa. As regards
that, "harmonization" is a controlled-language
effort applied to natural languages, which
are organic, though of course subject to their extended
memory, the written corpi, which I write corpi, not corpora.

It's like when the dictionary adds new words,
the old words are still words, in the "Wortbuch",
an abstract dictionary of all the words, that I read
about in Curme. (I'm a fan of Tesniere and Curme.)

About parsing and re-writing systems, I'm really wondering
a lot about compilation units, lines, spacing and indentation,
blocks, comments, quoting, punctuation, identifiers,
brackets, commas, and stops: how to write grammars
for all sorts of usual source languages in terms of those,
and as a result get a novel sort of linear data structure
above those, in whatever languages are so recognized,
with any sections that aren't recognized left as the source text.
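
A minimal sketch of that kind of first pass, in Python, might look like the
following. It assumes a C-style lexical layer; the token classes, the names
`Span`, `TOKEN_RE` and `lift`, and the "text" fallback kind are all made up
here for illustration and don't come from any existing tool. Recognized pieces
become typed spans over the character sequence, and anything unrecognized
stays in the sequence as raw source text.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    kind: str   # "comment", "string", "ident", ... or "text" if unrecognized
    start: int  # offset into the source, inclusive
    end: int    # offset into the source, exclusive


# Hypothetical token classes for a C-style language. Order matters:
# comments and strings are tried before anything else.
TOKEN_RE = re.compile(r'''
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\\n])*")
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<number>\d+)
  | (?P<bracket>[(){}\[\]])
  | (?P<comma>,)
  | (?P<stop>;)
  | (?P<space>\s+)
''', re.VERBOSE | re.DOTALL)


def lift(source: str) -> List[Span]:
    """Lift a character sequence into a linear sequence of typed spans."""
    spans: List[Span] = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m:
            spans.append(Span(m.lastgroup, m.start(), m.end()))
            pos = m.end()
        else:
            # Unrecognized character: keep it as raw text, merging
            # adjacent unrecognized characters into one span.
            if spans and spans[-1].kind == "text":
                spans[-1] = spans[-1]._replace(end=pos + 1)
            else:
                spans.append(Span("text", pos, pos + 1))
            pos += 1
    return spans


if __name__ == "__main__":
    src = 'x = f(a, "b"); // hi\n'
    for s in lift(src):
        print(s.kind, repr(src[s.start:s.end]))
```

Since the spans only carry offsets, joining them back up reproduces the
source exactly, so nothing is lost by the lifting.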

I looked around a bit, and after "re-writing" on the Wiki
and "multi-pass parser" there are some sorts of ideas,
usually in terms of fungible intermediate languages
for targeting those to whatever languages. Here,
though, it's mostly to deal with a gamut of existing code.
There are lots of syntax recognizers and highlighters
and this kind of thing, "auto-detect" in the static
analysis toolkit for the languages, given
that a compilation unit is only gonna have one or
a few languages in it, with regards for example to
"code in text" or "text in code", about comments,
sections, blocks, or "language integrated code"
or "convenience code", "sugar modes", you know.

So it's about what the _grammar_ specifications would be,
and the lexical and syntax specifications, to
arrive at a multi-pass parser that compiles a whole
bunch of language specs, finds which ones apply
where in the compilation unit, then starts building
them up, "lifting" them above the character sequence,
building an "abstract syntax sequence" (yeah I know)
above that, then building a model of the productions
directly above that, one that happens to be exactly derived
from the grammar productions, with the same sort
of structure as the grammar productions.
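
As a rough illustration of the "finds which ones apply where" pass, here is a
small Python sketch. It is only one guess at how detection could work: it
splits a unit into blank-line-separated blocks and scores each block against
a few distinctive markers per language. The `MARKERS` table, the two language
names, and the functions `score` and `detect` are all hypothetical, and a real
detector would need far richer specs than a handful of regexes.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny marker sets, one list per language spec.
MARKERS: Dict[str, List[str]] = {
    "c_like": [r"#include\b", r"\b(?:int|void|char|return)\b", r"[{}]", r"//"],
    "sql_like": [r"\b(?:SELECT|FROM|WHERE|INSERT|UPDATE)\b", r"--"],
}


def score(block: str, patterns: List[str]) -> int:
    """Count how many marker occurrences a block contains."""
    return sum(len(re.findall(p, block)) for p in patterns)


def detect(source: str) -> List[Tuple[str, str]]:
    """Label each blank-line-separated block with the best-scoring spec,
    or "text" when no marker matches at all."""
    result = []
    for block in re.split(r"\n\s*\n", source.strip()):
        scores = {name: score(block, pats) for name, pats in MARKERS.items()}
        best = max(scores, key=scores.get)
        result.append((best if scores[best] > 0 else "text", block))
    return result


if __name__ == "__main__":
    unit = (
        "int main() {\n  return 0; // done\n}\n"
        "\n"
        "SELECT name FROM users WHERE id = 1; -- lookup\n"
        "\n"
        "Just some prose here.\n"
    )
    for lang, block in detect(unit):
        print(lang, "|", block.splitlines()[0])
```

Later passes would then run only the grammars that were picked for each
block, and leave the "text" blocks alone as plain source text.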

(Order, loop, optional, a superset of EBNF, to support
syntaxes with bracket blocks like C-style and syntaxes
with indent blocks, though I'm not into that, the various
inversions of comments and code, the various interpolations
of quoting, brackets and grouping and precedence,
commas and joining and separating, and, because SQL
doesn't really comport itself to BNF, these kinds of things.)
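
To make "a model of the productions with the same structure as the grammar
productions" concrete, here is a small Python sketch under some loud
assumptions: the grammar is plain data built from order (`Seq`), loop
(`Loop`), optional (`Opt`), alternation (`Alt`) and terminals (`Tok`), all
names invented for this example; matching is greedy with no backtracking; and
there are no named, recursive rules. The parse result is a tree whose nodes
carry the same labels and nesting as the grammar that produced them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tok:          # terminal: matches one token exactly
    text: str


@dataclass(frozen=True)
class Seq:          # order: every part, in sequence
    parts: tuple


@dataclass(frozen=True)
class Alt:          # alternation: first choice that matches wins
    choices: tuple


@dataclass(frozen=True)
class Opt:          # optional: zero or one
    part: object


@dataclass(frozen=True)
class Loop:         # loop: zero or more
    part: object


def parse(g, toks, i=0):
    """Return (tree, next_index), or None if g does not match at i.
    The tree mirrors the shape of the grammar node g."""
    if isinstance(g, Tok):
        if i < len(toks) and toks[i] == g.text:
            return ("Tok", g.text), i + 1
        return None
    if isinstance(g, Seq):
        kids = []
        for part in g.parts:
            r = parse(part, toks, i)
            if r is None:
                return None
            kid, i = r
            kids.append(kid)
        return ("Seq", kids), i
    if isinstance(g, Alt):
        for choice in g.choices:
            r = parse(choice, toks, i)
            if r is not None:
                return ("Alt", r[0]), r[1]
        return None
    if isinstance(g, Opt):
        r = parse(g.part, toks, i)
        return (("Opt", r[0]), r[1]) if r is not None else (("Opt", None), i)
    if isinstance(g, Loop):
        kids = []
        while True:
            r = parse(g.part, toks, i)
            if r is None or r[1] == i:  # stop on failure or no progress
                break
            kids.append(r[0])
            i = r[1]
        return ("Loop", kids), i
    raise TypeError(f"not a grammar node: {g!r}")


# Example production: a bracketed, comma-separated list of "a" tokens.
LIST = Seq((
    Tok("("),
    Opt(Seq((Tok("a"), Loop(Seq((Tok(","), Tok("a"))))))),
    Tok(")"),
))

if __name__ == "__main__":
    print(parse(LIST, ["(", "a", ",", "a", ")"]))
```

Indent blocks, comment/code inversions and quoting interpolation would need
more node kinds than this, which is where the "superset" part comes in.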

Of course it's obligatory that this would be about C/C++,
and as well Java, which of course is in the
same style, or is a derivative of it. For example,
M4/C/C++ code is already subject to a multi-pass parser, and
Java at some point added language features which
fundamentally require a multi-pass parser, so it's not
like the entire resources of the mainframe have to fit
a finite-state-machine on the read-head. In fact at
compile-time specifically, "it's fair to consider
a concatenation of the compilation units as a linear
input in space", then figuring the "liftings" are linear
in that, in space, and that the productions so
derived are as concise as a minimal model of the
productions, so the intermediate bit is discardable.

The point is introducing a sort of common model of language
representation, of source language, for reference
implementations of the grammars, and making
the act of ingesting sources in languages a
first-class kind of thing. I'm looking for one of those,
and that's about as much as I've figured out it is.
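
The "concatenation of the compilation units as a linear input" idea can be
sketched in a few lines of Python. This is only an illustration under my own
assumptions: the function names `concatenate` and `locate` and the file names
in the example are made up. The units are joined into one string, and a table
of start offsets lets any position in the combined input be mapped back to
its unit.

```python
from bisect import bisect_right
from typing import List, Tuple


def concatenate(units: List[Tuple[str, str]]) -> Tuple[str, List[int], List[str]]:
    """Join (name, text) units into one linear input.
    Also return each unit's start offset and name, in order."""
    starts: List[int] = []
    names: List[str] = []
    parts: List[str] = []
    pos = 0
    for name, text in units:
        starts.append(pos)
        names.append(name)
        parts.append(text)
        pos += len(text)
    return "".join(parts), starts, names


def locate(offset: int, starts: List[int], names: List[str]) -> Tuple[str, int]:
    """Map an offset in the combined input back to (unit name, local offset)."""
    k = bisect_right(starts, offset) - 1
    return names[k], offset - starts[k]


if __name__ == "__main__":
    text, starts, names = concatenate([
        ("a.c", "int x;\n"),   # hypothetical unit names and contents
        ("b.c", "int y;\n"),
    ])
    print(locate(text.index("y"), starts, names))
```

Spans lifted over the combined input then stay plain integer ranges, and the
offset table is the only extra bookkeeping, which is linear in the number of
units.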

It's such a usual idea I must imagine that it's
commonplace, as it's just the very simplest
act of the model, iterating these things and
reading them out.

I probably might not care about it, but it's getting
to where it takes a parser that can parse SQL,
for example, or, you know, when there are lots
of source formats but it's just data and definitions.
So yeah, if you know of a very active
open project in that, I'd be real interested in a
sort of "source/object/relational mapping", ...,
as it were, a "source/grammatical-production mapping",
where as a result you identify grammars and pick sources
and it prints out the things.

I'm familiar with the traditional approaches,
and intend to employ them. I figure this
must be a very traditional approach if
nobody's heard of it.