
Parse Large JSON Files in Node.js


Handling Huge JSON files in Node.js

Parsing large JSON files in Node.js is resource intensive, and things get worse when the file is bigger than the memory available to the process. Reading such a file into memory all at once can trigger an out-of-memory error or slow the application to a crawl. To overcome this, Node.js offers stream-based solutions that make it possible to handle large JSON files in a memory-efficient way. This article covers the best practices and tools Node.js provides for parsing large JSON files.

Why is it so hard to parse large JSON files?

The problem is not just file size but the nature of the JSON format itself. JSON is hierarchical, with nested objects and arrays, which makes the data hard to process incrementally. Loading it all at once can exceed the maximum memory Node.js allows a process to use, crashing your application or degrading its performance.

For example, a 500 MB JSON file can easily exhaust the heap of an average Node.js application. In such cases, streaming and chunk-based processing scale far better.
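For comparison, here is a minimal sketch of the naive approach that runs into these limits: the whole file is buffered in memory before parsing. The file name largefile.json is a placeholder used throughout this article.

const fs = require('fs');

// Naive approach: the entire file is buffered in memory before JSON.parse runs.
// For very large files this can exhaust the heap and crash the process.
fs.readFile('largefile.json', 'utf8', (err, raw) => {
  if (err) throw err;
  const data = JSON.parse(raw);   // Blocks the event loop and holds the whole document in memory
  console.log(Array.isArray(data.items) ? data.items.length : 'parsed');
});

The streaming techniques below avoid exactly this all-at-once buffering.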

Approaches to Efficient Parsing

1. JSONStream for Streaming JSON Data

One of the most widely used libraries for parsing large JSON files in Node.js is JSONStream. It lets you parse a JSON file as a stream: instead of loading the entire file into memory, it reads the file piece by piece and emits each matched value as it is parsed. Here is how to use JSONStream step by step.

Installation

Install the JSONStream library by running the following command:


npm install JSONStream 

Example Code: Parse a Large JSON File Using JSONStream

Below is an example of how to parse a large JSON file using JSONStream:

const fs = require('fs');
const JSONStream = require('JSONStream');
const es = require('event-stream');  // For handling additional stream processing

// Function which returns a read stream over the JSON file
const getStream = function () {
    const fileStream = fs.createReadStream('largefile.json', { encoding: 'utf8' });
    return fileStream.pipe(JSONStream.parse('items.*'));  // Update the path to match your JSON structure
};

// Pipe the data through the JSONStream parser
getStream()
  .pipe(es.mapSync(function (data) {
    console.log(data); // Process each item as it's parsed
  }))
  .on('error', function (err) {
    console.error('Error occurred:', err);
  });

In the code above:

fs.createReadStream() reads a JSON file as a readable stream.

JSONStream.parse() parses the JSON incrementally and emits each value matched by the path you provide, such as 'items.*' for an array of objects under the items key (see the sample JSON after this list).

event-stream (es.mapSync()) applies a processing function to each parsed item as it arrives, so every chunk can be handled individually.
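For reference, the 'items.*' path above assumes largefile.json is shaped roughly like this hypothetical sample:

{
  "items": [
    { "id": 1, "name": "first item" },
    { "id": 2, "name": "second item" }
  ]
}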

This pattern keeps memory usage low and prevents your application from crashing on files that are too large to hold in memory.

2. Using event-stream for More Complex Processing

JSONStream handles straightforward parsing efficiently on its own, but if you need to do additional work on each chunk of data, event-stream is your ticket. It provides flexible, powerful asynchronous handling of streamed data.

Example Code: Using event-stream

Building on the example above, you can add more logic to process each chunk, for instance saving each parsed item to a database, before moving on to the next chunk.

In the sample below, es.mapSync() applies a mapping function to each chunk synchronously, while .pause() and .resume() control the flow of the stream so the next item is not emitted until the current one has been handled.

const fs = require('fs');
const JSONStream = require('JSONStream');
const es = require('event-stream');

// Creating a stream for parsing
const getStream = function () {
    const fileStream = fs.createReadStream('largefile.json', { encoding: 'utf8' });
    return fileStream.pipe(JSONStream.parse('items.*'));
};

// Process and save each item to a database (hypothetical)
const stream = getStream();

stream
  .pipe(es.mapSync(function (data) {
    console.log('Processing item:', data);
    stream.pause();                    // Pause the stream while the current item is handled
    processItem(data, function () {    // Hypothetical async task, e.g. a database insert
      stream.resume();                 // Resume once the item has been processed
    });
    return data;
  }))
  .on('error', function (err) {
    console.error('Error processing data:', err);
  })
  .on('end', function () {
    console.log('Stream processing completed');
  });

This is very handy when you need to perform work such as database writes or API calls between chunks.

3. stream-json as an Alternative to JSONStream

Although JSONStream has been the default choice for many, there is a catch: the library has not been actively maintained since 2018. For modern applications, the newer stream-json library is preferable. For larger files, stream-json is faster and supports additional features.

Install stream-json

Copy and paste the following into your terminal to install stream-json:

npm install stream-json 

Example Code with stream-json: Here's how you can use stream-json to parse a large JSON file:

const fs = require('fs');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');

// Read the large JSON file and stream it
fs.createReadStream('largefile.json')
  .pipe(parser())        // Tokenize the incoming JSON
  .pipe(streamArray())   // Emit each element of the top-level array
  .on('data', ({ key, value }) => {
    console.log(value);  // Process each item
  })
  .on('end', () => {
    console.log('File processing completed.');
  });

This approach lets you process each element of the JSON file without loading everything into memory at once, which keeps your application running smoothly and responsive.
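Note that streamArray() expects the top level of the file to be a JSON array. If your array sits under a key such as items, as in the earlier examples, stream-json's Pick filter can select it before streaming. A minimal sketch, assuming the same hypothetical largefile.json:

const fs = require('fs');
const { parser } = require('stream-json');
const { pick } = require('stream-json/filters/Pick');
const { streamArray } = require('stream-json/streamers/StreamArray');

fs.createReadStream('largefile.json')
  .pipe(parser())
  .pipe(pick({ filter: 'items' }))   // Select the "items" array inside the top-level object
  .pipe(streamArray())
  .on('data', ({ value }) => {
    console.log(value);              // Each element of items, one at a time
  })
  .on('end', () => {
    console.log('Done.');
  });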

4. Error Handling and Best Practices

Streams also let you handle errors gracefully. Attach .on('error', callback) to catch errors that occur during parsing or streaming, and finish up cleanly once processing is done with .on('end', callback).
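One way to centralize this error handling is Node's built-in stream.pipeline(), which forwards an error from any stage to a single callback and cleans up the other streams. A minimal sketch combined with stream-json, using the same hypothetical file as above:

const fs = require('fs');
const { pipeline } = require('stream');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');

const items = streamArray();
items.on('data', ({ value }) => {
  console.log(value);               // Process each item as it arrives
});

// pipeline() wires the streams together and reports any error exactly once.
pipeline(
  fs.createReadStream('largefile.json'),
  parser(),
  items,
  (err) => {
    if (err) {
      console.error('Pipeline failed:', err);
    } else {
      console.log('Pipeline succeeded.');
    }
  }
);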

With very large files, you can also chunk the data yourself if the available libraries don't do what you need: read a fixed number of bytes at a time and process them incrementally.
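As a rough sketch of manual chunking, fs.createReadStream accepts a highWaterMark option that controls how many bytes each emitted chunk contains; you then scan or accumulate those chunks yourself. The 1 MB size below is an arbitrary example value:

const fs = require('fs');

// Read the file roughly 1 MB at a time instead of loading it all at once.
const stream = fs.createReadStream('largefile.json', {
  highWaterMark: 1024 * 1024   // Chunk size in bytes (arbitrary example value)
});

let bytesSeen = 0;
stream.on('data', (chunk) => {
  bytesSeen += chunk.length;   // chunk is a Buffer; incremental processing would go here
});
stream.on('end', () => {
  console.log(`Finished reading ${bytesSeen} bytes.`);
});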

Conclusion

Large JSON files can be parsed efficiently in Node.js by using streaming techniques. Libraries such as JSONStream, event-stream, and stream-json let developers handle large files by reading the data in chunks, which avoids memory problems and keeps the application from running out of memory. With this approach you can process even the largest JSON files in your Node.js application without compromising speed or reliability.

Ready to transform your business with our technology solutions?   Contact Us today to Leverage Our NodeJS Expertise.
