Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
From: 2QdxY4RzWzUUiLuE (at) *nospam* potatochowder.com
Newsgroups: comp.lang.python
Date: 02 Oct 2024, 02:20:59
Message-ID: <mailman.25.1727828470.3018.python-list@python.org>
On 2024-10-01 at 23:03:01 +0200,
Left Right <olegsivokon@gmail.com> wrote:
If I recognize the first digit, then I *can* hand that over to an
external function to accumulate the digits that follow.
And what is that external function going to do with this information?
The point is you didn't parse anything if you just sent the digit.
You just delegated the parsing further. Parsing is only meaningful if
you extracted some information, but your idea is, essentially "what if
I do nothing?".
If the parser detects the first digit of a number, then the parser can
read digits one at a time (i.e., "streaming"), assimilate and accumulate
the value of the number being parsed, and successfully finish parsing
the number when it reads a non-digit. Whether the function that accumulates
the value during the process is internal or external isn't relevant; the
point is that it is possible to parse integers from most significant
digit to least significant digit under a streaming model (and if you're
sufficiently clever, you can even write partial results to external
storage and/or another transmission protocol, thus allowing for numbers
bigger (as measured by JSON or your internal representation) than your
RAM).
At most, the parser has to remember the non-digit character it read so
that it (the parser) can begin to parse whatever comes after the number.
Does that break your notion of "streaming"?
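As a rough sketch of the idea above (the names parse_integer,
first_digit, and stream are made up for illustration, and this handles
only unsigned decimal integers, not the full JSON number grammar), a
parser can accumulate the value one digit at a time and hand back the
one non-digit character it had to read in order to know the number had
ended:

```python
import io


def parse_integer(first_digit, stream):
    """Accumulate an unsigned decimal integer, most significant digit first.

    first_digit is the digit the caller has already consumed.  stream is
    any object whose read(1) returns one character, or '' at end of input.
    Returns (value, lookahead), where lookahead is the non-digit character
    that ended the number ('' if the input ran out).
    """
    value = int(first_digit)
    while True:
        ch = stream.read(1)
        # Anything other than exactly one ASCII digit ends the number.
        if len(ch) != 1 or not ("0" <= ch <= "9"):
            return value, ch
        value = value * 10 + int(ch)


# The caller saw "1", recognized a number, and delegates the rest.
stream = io.StringIO('2345, "next"')
value, lookahead = parse_integer("1", stream)
print(value, repr(lookahead))  # 12345 ','
```

The only state held is the running value and a single character of
lookahead. A variant that cannot afford to hold the value could forward
each digit to storage or to another protocol as it arrives instead of
folding it into value.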
Why do I have to start with the least significant digit?
Under that constraint, I'm not sure I can parse anything. How can I
parse a string (and hand it over to an external function) until I've
found the closing quote?
Nobody says that parsing a number is the only pathological case. You,
however, exaggerate by saying you cannot parse _anything_. You can
parse booleans or null, for example. There's no problem there.
My intent was only to repeat what you implied: that any parser that
reads its input until it has parsed a value is not streaming.
So how much information can the parser keep before you consider it not
to be "streaming"?
[...]
In principle, any language that has infinite words will have the same
problem with streaming [...]
So what magic allows anyone to stream any JSON file over SCSI or IP?
Let alone some kind of "live stream" that by definition is indefinite,
even if it only lasts a few tenths of a second?
[...] If you ever pondered h/w or low-level
protocols s.a. SCSI or IP [...]
I spent a good deal of my career designing and implementing all manner
of communications protocols, from transmitting and receiving single
bits over a wire all the way up to what are now known as session and
presentation layers. Some imposed maximum lengths in certain places;
some allowed for indefinite amounts of data to be transferred from one
end to the other without stopping, resetting, or overflowing. And yet
somehow, the universe never collapsed.
If you believe that some implementation of fsync fails to meet a
specification, or fails to work correctly on files containing JSON, then
file a bug report.