Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
From: 2QdxY4RzWzUUiLuE (at) *nospam* potatochowder.com
Newsgroups: comp.lang.python
Date: 01 Oct 2024, 16:47:24
Message-ID: <mailman.21.1727797649.3018.python-list@python.org>
On 2024-09-30 at 21:34:07 +0200,
Left Right via Python-list <python-list@python.org> wrote:

> > What am I missing?  Handwavingly, start with the first digit, and as
> > long as the next character is a digit, multiply the accumulated result
> > by 10 (or the appropriate base) and add the next value.  Oh, and handle
> > scientific notation as a special case, and perhaps fail spectacularly
> > instead of recovering gracefully in certain edge cases.  And in the
> > pathological case of a single number with 60 billion digits, run out of
> > memory (and complain loudly to the person who claimed that the file
> > contained a "dataset").  But why do I need to start with the least
> > significant digit?
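
Concretely, a toy sketch of that handwaving (the names are my own, not
anything from the json module, and it ignores signs, fractions, and
exponents) might look like this:

    import io

    def parse_unsigned_int(stream):
        """Accumulate an unsigned decimal integer, most significant
        digit first, stopping at the first non-digit character."""
        total = 0
        seen_digit = False
        while (ch := stream.read(1)) and ch in "0123456789":
            total = total * 10 + int(ch)   # multiply by 10, add the next digit
            seen_digit = True
        return total if seen_digit else None

    print(parse_unsigned_int(io.StringIO("12345,")))   # -> 12345

The magnitude takes care of itself; no least-significant-digit-first
pass is needed.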
 
> You probably forgot that it has to be _streaming_. Suppose you parse
> the first digit: can you hand this information over to an external
> function to process the parsed data? -- No! because you don't know the
> magnitude yet.  What about two digits? -- Same thing.  You cannot
> leave the parser code until you know the magnitude (otherwise the
> information is useless to the external code).

If I recognize the first digit, then I *can* hand that over to an
external function to accumulate the digits that follow.

> So, even if you have enough memory and don't care about special cases
> like scientific notation: yes, you will be able to parse it, but it
> won't be a streaming parser.

Under that constraint, I'm not sure I can parse anything.  How can I
parse a string (and hand it over to an external function) until I've
found the closing quote?
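
By the same token, something along these lines (my own made-up names
once more, and no escape handling) would hand string contents over
piecemeal before the closing quote ever shows up:

    import io

    def stream_string_body(stream, consumer, chunk_size=8):
        """Hand the body of a quoted string to an external consumer in
        chunks, without waiting for the closing quote."""
        assert stream.read(1) == '"'
        buf = []
        while (ch := stream.read(1)) and ch != '"':
            buf.append(ch)
            if len(buf) >= chunk_size:
                consumer(''.join(buf))   # hand over a piece mid-string
                buf.clear()
        if buf:
            consumer(''.join(buf))

    # prints "hello, s", then "treaming", then " world"
    stream_string_body(io.StringIO('"hello, streaming world"'), print)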

How much state can a parser maintain (before it invokes an external
function) and still be considered streaming?  I fear that we may be
getting hung up on terminology rather than solving the problem at hand.
