Newsportal USENET - Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
From: arj.python (at) *nospam* gmail.com (Abdur-Rahmaan Janhangeer)
Newsgroups: comp.lang.python
Date: 30 Sep 2024, 06:49:21

Message-ID: <mailman.1.1727675375.3018.python-list@python.org>

I don't know if you have tried Polars, but it seems to work well with JSON data:

import polars as pl

# read_json is eager: it loads the whole file into memory as a DataFrame.
df = pl.read_json("file.json")
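
For the streaming part of the question, here is a minimal standard-library sketch. It assumes the export decompresses to a single top-level JSON array of records, which is an assumption and not something checked against the actual Kenna export layout. `iter_gunzip` and `iter_json_array` are hypothetical helper names, not part of any library.

```python
import codecs
import json
import zlib
from typing import Any, Iterable, Iterator


def iter_gunzip(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Decompress a gzip stream one chunk at a time."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    tail = decomp.flush()
    if tail:
        yield tail


def iter_json_array(byte_chunks: Iterable[bytes],
                    encoding: str = "utf-8") -> Iterator[Any]:
    """Yield the items of a top-level JSON array as they become complete.

    Assumption: the document is one JSON array, e.g. [{...}, {...}].
    Only the unparsed remainder is buffered, not the whole document.
    """
    decoder = json.JSONDecoder()
    to_text = codecs.getincrementaldecoder(encoding)()
    buf = ""
    started = False
    for chunk in byte_chunks:
        buf += to_text.decode(chunk)
        while True:
            buf = buf.lstrip()
            if not started:
                if not buf:
                    break
                if buf[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                buf = buf[1:]
                started = True
                continue
            if buf[:1] == ",":
                buf = buf[1:].lstrip()
            if not buf or buf[0] == "]":
                break
            try:
                item, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # item is incomplete; wait for the next chunk
            if not buf[end:].strip():
                break  # item may be cut short (e.g. a number); wait
            yield item
            buf = buf[end:]
```

With requests, the chunks could come from `requests.get(url, stream=True).iter_content(chunk_size=1 << 20)`, with the URL and auth headers taken from the Kenna docs. If the server sends the data with `Content-Encoding: gzip`, requests should already decompress it in `iter_content`, in which case the `iter_gunzip` step would be skipped. Malformed input is not reported by this sketch: it simply stops yielding.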

Kind Regards,

Abdur-Rahmaan Janhangeer
about <https://compileralchemy.github.io/> | blog <https://www.pythonkitchen.com>
github <https://github.com/Abdur-RahmaanJ>
Mauritius

On Mon, Sep 30, 2024 at 8:00 AM Asif Ali Hirekumbi via Python-list <python-list@python.org> wrote:

> Dear Python Experts,
>
> I am working with the Kenna Application's API to retrieve vulnerability
> data. The API endpoint provides a single, massive JSON file in gzip format,
> approximately 60 GB in size. Handling such a large dataset in one go is
> proving to be quite challenging, especially in terms of memory management.
>
> I am looking for guidance on how to efficiently stream this data and
> process it in chunks using Python. Specifically, I am wondering if there’s
> a way to use the requests library or any other libraries that would allow
> us to pull data from the API endpoint in a memory-efficient manner.
>
> Here are the relevant API endpoints from Kenna:
>
>    - Kenna API Documentation <https://apidocs.kennasecurity.com/reference/welcome>
>    - Kenna Vulnerabilities Export <https://apidocs.kennasecurity.com/reference/retrieve-data-export>
>
> If anyone has experience with similar use cases or can offer any advice, it
> would be greatly appreciated.
>
> Thank you in advance for your help!
>
> Best regards,
> Asif Ali
