Storing 4M+ JSON snippets
Page 1 of 2
Posts 1 to 10 of 11
  1. #1

    Banned


    Join Date
    Sep 2020
    Posts
    50

    Default Storing 4M+ JSON snippets

    Hi all,

    I have a script that generates 4 million+ snippets of JSON code.
    What would you agree is the best way to store them?

    I have considered:
    - JSON files - way too many, even if split into subfolders
    - MongoDB
    - MySQL with a JSON-type field

    The aim is to be able to easily retrieve them for processing later.
    Language is Python.

    Bonus question: each of these 4M+ JSONs will have 10-15 images related to it. Where and how do you store the images?

  2. #2

    bored now

    eek's Avatar
    Join Date
    Jun 2010
    Location
    😂
    Posts
    25,839

    Default

    Your bonus question points you to the only sane answer to your question, but you haven't got there yet.
    merely at clientco for the entertainment

  3. #3

    Contractor Among Contractors

    _V_'s Avatar
    Join Date
    Aug 2006
    Posts
    1,791

    Default

    I would vote for MongoDB in this use case. Because BSON is the native format of documents stored in MongoDB, you can parse each snippet and store it as a queryable object.

    Of course, if you just want to store them as strings, you can choose pretty much any SQL database you like. One option, if you don't want a server-based DB, is to insert them into a local SQLite database.

    SQLite Home Page
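
    If you go the MongoDB route, something like this with pymongo would do it, assuming a local mongod is running (the database and collection names here are made up):

    Code:
    import json
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumes a local mongod
    col = client["snips_db"]["snippets"]               # names are made up

    def store(snippet_strings):
        # Parse each JSON string so it lands in Mongo as a queryable document
        docs = [json.loads(s) for s in snippet_strings]
        col.insert_many(docs, ordered=False)  # unordered: keep going on errors

    # Later, query on any field inside the documents, e.g.:
    # col.find({"status": "pending"})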
    You above all people should know probabilistic analysis is a stochastic process of independent deterministic variables subject to constant change....

  4. #4

    More fingers than teeth


    Join Date
    Apr 2008
    Posts
    17,261

    Default

    Hi SAS

  5. #5

    More fingers than teeth


    Join Date
    Apr 2008
    Posts
    17,261

    Default

    I've done this exact same thing with cloud storage, both on Azure and S3.

    About 600k folders with about 10 images and 15 JSON files in each.
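
    Roughly what the upload looks like on the S3 side - a minimal sketch with boto3, assuming AWS credentials are already configured (the bucket name is made up):

    Code:
    import json
    import os
    import boto3

    s3 = boto3.client("s3")       # assumes credentials are configured
    BUCKET = "my-snippet-store"   # made-up bucket name

    def upload_item(item_id, snippet, image_paths):
        # One key prefix ("folder") per item, mirroring the layout above
        s3.put_object(
            Bucket=BUCKET,
            Key=f"{item_id}/data.json",
            Body=json.dumps(snippet).encode("utf-8"),
        )
        for path in image_paths:
            s3.upload_file(path, BUCKET, f"{item_id}/{os.path.basename(path)}")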

  6. #6

    Fingers like lightning

    TheGreenBastard's Avatar
    Join Date
    Dec 2015
    Posts
    889

    Default

    Postgres + JSONB
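
    A minimal sketch of that with psycopg2, assuming a local Postgres (the DSN and table name are made up):

    Code:
    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=snips")  # made-up DSN
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS snippets (
            id  BIGSERIAL PRIMARY KEY,
            doc JSONB NOT NULL
        )
    """)
    # A GIN index makes arbitrary queries into the documents fast
    cur.execute("CREATE INDEX IF NOT EXISTS snippets_doc_idx "
                "ON snippets USING GIN (doc)")
    cur.execute("INSERT INTO snippets (doc) VALUES (%s)",
                (Json({"kind": "example", "value": 42}),))
    cur.execute("SELECT doc FROM snippets WHERE doc->>'kind' = %s", ("example",))
    print(cur.fetchone())
    conn.commit()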

  7. #7

    Super poster

    Hobosapien's Avatar
    Join Date
    Feb 2016
    Location
    LA - la la fantasy land
    Posts
    3,036

    Default

    Azure Table Storage for the Json snippets.
    Azure Blob container for the images.

    Link the json to the images using columns in the Azure table (one column for each image blob key).

    Use Python via Azure Functions to manipulate the data if you want 'serverless', i.e. M$ handle the infrastructure, availability, and backup. Though regular syncing to a local or alternative cloud backup is a good idea.

    Not sure how much it may cost, so use the Azure pricing calculator based on your estimates for an idea.

    Sorted.
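
    A minimal sketch of that setup with the azure-data-tables and azure-storage-blob packages (the connection string, table name, and container name are made up):

    Code:
    import os
    from azure.data.tables import TableServiceClient
    from azure.storage.blob import BlobServiceClient

    CONN = "..."  # your storage account connection string goes here
    table = TableServiceClient.from_connection_string(CONN).get_table_client("snippets")
    images = BlobServiceClient.from_connection_string(CONN).get_container_client("images")

    def store(item_id, snippet_json, image_paths):
        # Note: string properties in Table Storage cap at 64 KB, so this
        # only works while the individual snippets stay small.
        entity = {"PartitionKey": item_id[:2], "RowKey": item_id, "Json": snippet_json}
        for i, path in enumerate(image_paths):
            blob_name = f"{item_id}/{i}{os.path.splitext(path)[1]}"
            with open(path, "rb") as f:
                images.upload_blob(name=blob_name, data=f, overwrite=True)
            entity[f"Image{i}"] = blob_name  # one column per image blob key
        table.create_entity(entity=entity)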
    Maybe tomorrow, I'll want to settle down. Until tomorrow, I'll just keep moving on.

  8. #8

    More time posting than coding


    Join Date
    May 2018
    Posts
    277

    Default

    Come out of the old school and consider using:

    - Azure Data Lake Storage (Gen2)
    - AWS S3
    - GCP Cloud Storage / Filestore.

    And if you want a copy kept without your permission, and without being charged for it, then go for Alibaba Cloud.
    Last edited by BigDataPro; 23rd September 2020 at 15:59.

  9. #9

    More fingers than teeth

    OwlHoot's Avatar
    Join Date
    Jul 2005
    Posts
    14,778

    Default

    Quote Originally Posted by anim View Post
    I have a script that generates 4 million+ snippets of JSON code. What would you agree is the best way to store them? [...] The aim is to be able to easily retrieve them for processing later.
    If you want to pack them away in the database pronto, but aren't too bothered about retrieval speed, then Cassandra would be a good choice.

    Edit: It's free (open source) and by now fairly mature.
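
    A minimal sketch with the cassandra-driver package, assuming a local node (the keyspace and table names are made up):

    Code:
    import json
    import uuid
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()  # assumes a local node
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS snips
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    session.execute("""
        CREATE TABLE IF NOT EXISTS snips.snippets (id uuid PRIMARY KEY, doc text)
    """)
    # Writes like this are what Cassandra is quick at; retrieval is by key
    session.execute("INSERT INTO snips.snippets (id, doc) VALUES (%s, %s)",
                    (uuid.uuid4(), json.dumps({"hello": "world"})))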
    Work in the public sector? Read the IR35 FAQ here

  10. #10

    More time posting than coding

    darrylmg's Avatar
    Join Date
    Sep 2012
    Location
    UK - South West
    Posts
    283

    Default

    Quote Originally Posted by anim View Post
    [...] Bonus question: each of these 4M+ JSONs will have 10-15 images related to it. Where and how do you store the images?
    We don't know how your app works.
    If you access the image first and then go looking for the related JSON, you could embed the JSON in the image itself and save the subsequent lookup.
    Google for "steganography".
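
    A minimal sketch with the third-party stegano package (the filenames and payload are made up):

    Code:
    from stegano import lsb

    # Hide the JSON string in the image's least-significant bits.
    # Needs a lossless format (PNG); JPEG compression would destroy the payload.
    secret = lsb.hide("photo.png", '{"id": 1, "label": "example"}')
    secret.save("photo_with_json.png")

    # Later, recover the JSON straight from the image - no separate lookup.
    print(lsb.reveal("photo_with_json.png"))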
    Don't believe it, until you see it!
