Converting a large text file into an sqlite database

Sometimes we need to deal with large text files that are more than some text editors can handle. I recently have to convert a large text file containing a list of words which contains more than 600,000 lines.  The file size was just under 100MB and to open it with a text editor was an exercise in frustration. 

I decided that the file should be converted into a database, so that it will become easier to search and analyse the data. In my searches I came across a Nodejs module called line-by-line that can facilitate reading a large text file one line at a time. Since I want the database to be more portable, I was thinking of using something like SQLite, which is SQL compatible, can be easily transported/installed and can be visually edited if necessary, with tools like SQLite Browser. For this purpose, there is a Nodejs module called sqlite3.

So I set out to create a simple Nodejs app combining the line-by-line module to create an SQLite database from the large text file. I started with a directory with a package.json file:

{
   "name": "text-to-db",
   "description": "convert text to an sqlite db",
   "version": "0.0.1",
   "private": true,
   "dependencies": {
      "line-by-line": "*",
      "sqlite3":"*"
   }
}

The in the terminal I run

npm install

This will install all the dependencies inside a “node_modules” sub directory. Next I create an app.js file with the following code:

var fs = require('fs');
var reader = require('line-by-line');
var sqlite3 = require("sqlite3").verbose();
var file = "data.db";
var exists = fs.existsSync(file);
var db = new sqlite3.Database(file);
db.serialize(function(){
    if(!exists) {
       db.run("CREATE TABLE words (id INT PRIMARY KEY,word TEXT)");
    }
});
var counter=0;
lr = new reader('large-file.txt');
lr.on('error', function (err) {
    console.log("error has occured="+err);
});
lr.on('line', function (line) {
    lr.pause();
    setTimeout(function(){
    db.serialize(function(){
       var stmt = db.prepare("INSERT INTO words VALUES (?,?)");
       //line is the text read from current line
       stmt.run(counter,line);
       stmt.finalize(); 
    }
 );
 //show what line we're up to
 console.log(counter+" "+line);
 lr.resume();
 }
 ,100);
 counter++;
});
lr.on('end', function () {
    //end of the line
    db.close();
});

Obviously this needs to be adjusted to your needs, but it just demonstrates the versatility of Nodejs in running tasks. The SetTimeOut() was added to allow for more readability in the console while the data is being processed.

To run the app, just type in the console:

node app.js

Hope this is of use to someone out there.