Sujet : Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
De : grant.b.edwards (at) *nospam* gmail.com (Grant Edwards)
Groupes : comp.lang.python
Date : 30. Sep 2024, 20:41:46
Message-ID : <mailman.10.1727721708.3018.python-list@python.org>
User-Agent : slrn/1.0.3 (Linux)
On 2024-09-30, Dan Sommers via Python-list <python-list@python.org> wrote:
> On 2024-09-30 at 11:44:50 -0400,
> Grant Edwards via Python-list <python-list@python.org> wrote:
>> On 2024-09-30, Left Right via Python-list <python-list@python.org> wrote:
>>> [...]
>>> Imagine a pathological case of this shape: 1... <60GB of digits>. This
>>> is still valid JSON (it doesn't have any limits on how many digits a
>>> number can have). And you cannot parse this number in a streaming way
>>> because in order to do that, you need to start with the least
>>> significant digit.
>>
>> Which is how arabic numbers were originally parsed, but when
>> westerners adopted them from a R->L written language, they didn't
>> flip them around to match the L->R written language into which they
>> were being adopted.
>>
>> Interesting.
>>
>> So now long numbers can't be parsed as a stream in software. They
>> should have anticipated this problem back in the 13th century and
>> flipped the numbers around.
>
> What am I missing? Handwavingly, start with the first digit, and as
> long as the next character is a digit, multiply the accumulated
> result by 10 (or the appropriate base) and add the next value.
>
> [...] But why do I need to start with the least significant digit?
Excellent question. That's actually a pretty standard way to parse
numeric literals. I accepted at face value the claim that in JSON
there is something that requires parsing numeric literals from the
least significant end -- but I can't think of why the usual algorithms
used by other languages' lexers for yonks wouldn't work for JSON.
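For what it's worth, the handwaved algorithm above is trivial to sketch
in Python. This is just an illustration of most-significant-digit-first
accumulation over a character stream (the function name and the chunked
feeding are mine, not from any particular JSON parser):

```python
def parse_number_stream(chars):
    """Parse a decimal integer from an iterator of characters,
    most significant digit first, using O(1) state per digit:
    multiply the running result by the base and add the new digit."""
    result = 0
    for ch in chars:
        if not ch.isdigit():
            break  # first non-digit ends the numeric literal
        result = result * 10 + int(ch)
    return result

# The digits can arrive one at a time -- no need to see the least
# significant digit (or the end of the number) before starting:
print(parse_number_stream(iter("1234567,")))  # -> 1234567
```

Nothing here requires buffering the whole literal; the same loop works
whether the characters come from a string, a file, or a network stream.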
-- Grant