Parsing large JSON files in Node.js can be resource intensive, and things get worse once a file no longer fits in memory. Reading such a file into memory in one go can be painfully slow or even trigger an out-of-memory error. To overcome this, Node.js offers stream-based solutions that let you handle large JSON files in a memory-efficient way. This article covers some of the best practices and tools available in Node.js for parsing large JSON files.
The problem is not only the file size but also the nature of the JSON format itself. JSON is hierarchical, with nested objects and arrays, which makes the data hard to process incrementally. Loading it all at once risks exhausting the memory limit of the Node.js process, which can crash your application or degrade its performance.
For example, a 500MB JSON file can easily exhaust the heap of an average Node.js application. In such cases, streaming and chunk-based processing scale far better.
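For contrast, here is a minimal sketch of the naive approach that streaming replaces; it buffers the whole file before parsing, so memory use grows with the file size (largefile.json and the items field are placeholders matching the examples later in this article).
const fs = require('fs');

// Naive approach: the entire file is read into memory before JSON.parse runs.
// For very large files this is slow and can exhaust the heap.
const raw = fs.readFileSync('largefile.json', 'utf8');
const data = JSON.parse(raw);
console.log(data.items.length); // only works once everything fits in memory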
1. JSONStream for Streaming JSON Data
One of the most widely used Node.js libraries for parsing large JSON files is JSONStream. It lets you parse JSON as a stream: instead of loading the entire file into memory, it reads the file in chunks and emits each matched value as it is parsed. Here's how to use JSONStream step by step.
Installation
Install the JSONStream library, along with event-stream, which the example below uses for chunk handling:
npm install JSONStream event-stream
Example Code: Parsing a Large JSON File with JSONStream
Below is an example of how to parse a large JSON file using JSONStream:
const fs = require('fs');
const JSONStream = require('JSONStream');
const es = require('event-stream'); // For handling the parsed items as they stream in

// Function which returns a read stream from the JSON file
const getStream = function () {
  const fileStream = fs.createReadStream('largefile.json', { encoding: 'utf8' });
  return fileStream.pipe(JSONStream.parse('items.*')); // Adjust the JSON path to match your file's structure
};

// Pipe the data through the JSONStream parser
getStream()
  .pipe(es.mapSync(function (data) {
    console.log(data); // Process each item as it's parsed
  }))
  .on('error', function (err) {
    console.error('Error occurred:', err);
  });
In the code above:
fs.createReadStream() reads the JSON file as a readable stream.
JSONStream.parse() parses the JSON incrementally and emits every value matched by the path you provide, such as 'items.*' when the objects live in an array under the items key.
event-stream (es.mapSync) handles each emitted item as it arrives, so you can process the data chunk by chunk.
This pattern keeps memory usage low and prevents your application from crashing on very large files.
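For reference, the 'items.*' path above assumes largefile.json is shaped roughly like this (the field names are purely illustrative):
{
  "items": [
    { "id": 1, "name": "first item" },
    { "id": 2, "name": "second item" }
  ]
}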
2. Utilizing event-stream for More Elaborate Processing
JSONStream on its own handles straightforward parsing efficiently. But if you need to do additional work on each data chunk, event-stream is your ticket: it provides flexible, powerful constructs for asynchronous handling of streamed data.
Example Code: Using event-stream
As shown above, you can add more logic to process every chunk. For instance, you might save each parsed item to a database or perform some other task before moving on to the next chunk.
In the sample below, es.mapSync() applies a mapping function to each chunk synchronously, while .pause() and .resume() control the flow of the stream so the next item is not delivered until the current one has been handled.
const fs = require('fs');
const JSONStream = require('JSONStream');
const es = require('event-stream');

// Creating a stream for parsing
const getStream = function () {
  const fileStream = fs.createReadStream('largefile.json', { encoding: 'utf8' });
  return fileStream.pipe(JSONStream.parse('items.*'));
};

// Process and save each item to a database (processItem is a hypothetical async function)
const stream = getStream();

stream
  .pipe(es.mapSync(function (data) {
    console.log('Processing item:', data);
    stream.pause();                   // pause the flow while the item is handled
    processItem(data, function () {   // e.g. a database write; resume in its callback
      stream.resume();
    });
    return data;
  }))
  .on('error', function (err) {
    console.error('Error processing data:', err);
  })
  .on('end', function () {
    console.log('Stream processing completed');
  });
This is very handy when you need to perform work such as database writes or API calls for each chunk before moving on to the next one.
3. stream-json as an Alternative to JSONStream
Although JSONStream has been the default choice for many developers, there is a catch: the library has not been actively maintained since 2018. For modern applications, the newer stream-json library is preferable. For larger files, stream-json is faster and supports additional features.
Install stream-json
Copy and paste the following into your terminal to install stream-json:
npm install stream-json
Example Code with stream-json: Here’s how you can use stream-json to parse a large JSON file
const fs = require('fs');
const { parser } = require('stream-json');
const { pick } = require('stream-json/filters/Pick');
const { streamArray } = require('stream-json/streamers/StreamArray');

// Read the large JSON file and stream it
fs.createReadStream('largefile.json')
  .pipe(parser())                   // tokenize the JSON input
  .pipe(pick({ filter: 'items' }))  // select the items array (same data as 'items.*' above)
  .pipe(streamArray())              // emit each array element as { key, value }
  .on('data', ({ key, value }) => {
    console.log(value); // Process each item
  })
  .on('end', () => {
    console.log('File processing completed.');
  });
This approach lets you process each element of the JSON file without loading everything into memory at once, which keeps your application responsive and stable.
4. Error Handling and Best Practices
Handle errors gracefully when working with streams: attach .on('error', callback) to catch errors that occur during parsing or streaming, and finish up cleanly once processing completes via .on('end', callback).
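Note that .pipe() does not forward errors between stages, so each stream in the chain needs its own handler. As a minimal sketch (assuming Node.js 16+ and the stream-json setup from the previous section), the built-in stream.pipeline utility propagates errors from every stage and cleans up the chain automatically:
const fs = require('fs');
const { Writable } = require('stream');
const { pipeline } = require('stream/promises'); // promise-based pipeline, built into Node.js 16+
const { parser } = require('stream-json');
const { pick } = require('stream-json/filters/Pick');
const { streamArray } = require('stream-json/streamers/StreamArray');

async function run() {
  await pipeline(
    fs.createReadStream('largefile.json'),
    parser(),                  // tokenize the JSON input
    pick({ filter: 'items' }), // select the items array, as in the earlier examples
    streamArray(),             // emit each element as { key, value }
    new Writable({
      objectMode: true,
      write({ value }, _encoding, done) {
        console.log(value);    // handle each parsed item
        done();
      },
    })
  );
  console.log('File processing completed.');
}

run().catch((err) => console.error('Stream failed:', err)); // errors from any stage land here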
With very large files, you can also chunk the data yourself if the available libraries don't meet your needs: read a fixed number of bytes at a time and process them incrementally, as sketched below.
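As a rough sketch of that idea, fs.createReadStream lets you control how many bytes are delivered per chunk through its highWaterMark option (the 64 KB size below is arbitrary); you would still need your own logic for JSON values that span chunk boundaries:
const fs = require('fs');

// Read the file in fixed-size chunks instead of all at once.
const stream = fs.createReadStream('largefile.json', {
  encoding: 'utf8',
  highWaterMark: 64 * 1024, // bytes delivered per 'data' event
});

let chunkCount = 0;
stream.on('data', (chunk) => {
  chunkCount += 1;
  // Feed each chunk into your own incremental parsing logic here.
});

stream.on('end', () => {
  console.log(`Finished after ${chunkCount} chunks`);
});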
You can parse large JSON files efficiently in Node.js by using streaming techniques. Libraries such as JSONStream, event-stream, and stream-json let developers handle large files by reading the data in chunks, which avoids memory problems and keeps the application stable. With this approach, you can process even the largest JSON files in your Node.js application without compromising speed or reliability.