Editor’s note: This article was last updated on 4 January 2023 to ensure that all information is compatible with the latest version of Node.js and to add information about other NLP libraries, like NLP.js and Compromise.cool.
The internet facilitates a never-ending creation of large volumes of unstructured textual data. Luckily, we have modern systems that can make sense of this kind of data.
Modern computer systems can make sense of natural languages using an underlying technology called natural language processing (NLP).
Python is usually the go-to language when it comes to NLP because of its wealth of language processing packages, like the Natural Language Toolkit. However, JavaScript is growing rapidly and the existence of npm gives its developers access to a large number of packages, including packages to perform NLP for different languages.
In this article, we will focus on getting started with NLP using Node. We will be using a JavaScript library called natural. By adding the natural library to our project, our code will be able to parse, interpret, manipulate, and understand natural languages from user input.
This article will barely scratch the surface of NLP, but it will be useful for developers who already use NLP with Python and want to transition to achieve the same results with Node. Complete newbies will also learn a lot about NLP as a technology and its usage with Node.
Natural language processing technology can take human language as input and perform one or more of the following operations:

- Determine the sentiment or emotion the text expresses
- Classify the text into organized groups
- Identify the intent behind a statement
- Correct spelling and match words phonetically
NLP is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.
Significant implementations of NLP aren’t too far from us these days as most of our devices integrate AI, ML, and NLP to enhance human-to-machine communications. Here are some common examples of NLP in action.
One of the most helpful technologies is the Google Search engine. You put in text and receive millions of related results as a response. This is possible because of the NLP technology that can make sense of the input and perform a series of logical operations. This is also what allows Google Search to understand your intent and suggest the proper spelling to you when you spell a search term incorrectly.
Virtual assistants such as Siri, Alexa, and Google Assistant show an advanced level of the implementation of NLP. After receiving verbal input from you, they can identify the intent, perform an operation, and send back a response in a natural language.
Chatbots can analyze large amounts of textual data and give different responses based on that data and their ability to detect intent. This gives the overall feel of a natural conversation rather than one with a machine.
Have you noticed that email clients are constantly getting better at filtering spam emails out of your inbox? This is possible because the filter engines can understand the content of emails — mostly using Bayesian spam filtering — and decide if it’s spam or not.
The use cases above show that AI, ML, and NLP are already being used heavily on the web. Because humans interact with websites using natural languages, we should build our websites with NLP capabilities.
To code along with this article, you will need to create an `index.js` file, paste in the snippet you want to try, and run the file with Node. Let's begin!
We can install natural by running the following command:
```bash
npm install natural
```
The source code for each of the usage examples in the next section is available on GitHub. Feel free to clone it, fork it, or submit an issue.
Let’s learn how to perform some basic but important NLP tasks using natural.
Tokenization is the process of splitting input text into smaller parts known as "tokens." The tokens can be characters, words, or subwords. Tokenization is the first step in natural language processing: it breaks raw input into pieces that a machine can work with.
For example, let’s look at the text string: The quick brown fox jumps over the lazy dog
The string isn't implicitly segmented at spaces, as a natural language speaker would segment it. The raw input, all 43 characters, must be explicitly split into nine tokens using a space delimiter (i.e., matching the string `" "` or the regular expression `/\s{1}/`).
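Before reaching for a library, it helps to see how far a naive split gets us. Here's a minimal sketch in plain JavaScript (no library involved):

```js
// Naive tokenization: split on runs of whitespace
var text = 'The quick brown fox jumps over the lazy dog';
console.log(text.split(/\s+/));
// [ 'The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog' ]
```

This works for simple sentences, but punctuation quickly becomes a problem (a trailing "dog." would keep its period), which is why dedicated tokenizers exist.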
natural ships with a number of smart tokenizer algorithms that can break text into arrays of tokens. Here’s a code snippet showing the usage of the Word tokenizer:
```js
// index.js
var natural = require('natural');
var tokenizer = new natural.WordTokenizer();

console.log(tokenizer.tokenize("The quick brown fox jumps over the lazy dog"));
```
Running this with Node gives the following output:
```
[ 'The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog' ]
```
Stemming is the act of reducing a word to its word stem (also known as its base or root form). It is used in information retrieval and extraction, as well as in linguistic morphology; search engines use it to index words. For example, words such as cats, catlike, and catty will be stemmed down to the root word, cat.
Natural currently supports two stemming algorithms: Porter and Lancaster (Paice/Husk). Here’s a code snippet implementing stemming using the Porter algorithm:
```js
// index.js
const natural = require('natural');

console.log(natural.PorterStemmer.tokenizeAndStem("I can see that we are going to be friends"));
```
In the code above, we use the `tokenizeAndStem()` method of the Porter stemmer to break the string into individual words and reduce each word to its base form. The result is an array of stemmed tokens:
```
[ 'go', 'friend' ]
```
N.B., in the result above, stop words were removed by the algorithm. Stop words are words that are filtered out before natural language processing (e.g., be, an, and to are all stop words).
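If you're curious which words get filtered, natural bundles its English stop word list. The sketch below assumes the `natural.stopwords` export (present at the time of writing; check your installed version if it is undefined):

```js
// index.js
var natural = require('natural');

// natural.stopwords: an array of common English stop words (assumed export)
console.log(natural.stopwords.slice(0, 10));
```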
The Porter algorithm is the oldest and most widely known stemming algorithm, and it is also among the least aggressive, so its word stems remain reasonably clear and intelligible. The Porter stemmer is a suffix-stripping algorithm: in essence, it strips words down to their base forms using predefined rules. It employs more than 50 rules, organized into five phases and a few substeps, to eliminate frequent suffixes.
Some examples of the rules are:

- SSES → SS (e.g., caresses becomes caress)
- IES → I (e.g., ponies becomes poni)
- S → "" (e.g., cats becomes cat)
On the other hand, Lancaster is quite aggressive: its heavy word chopping can leave stems that are hard to recognize. Because the stems lose some of their relatability, it is the least used. The Lancaster stemmer is made up of more than 100 rules, roughly twice as many as the Porter stemmer, and its authors used a different notation from Porter's to define them. Each rule consists of five parts, two of which are optional.
For example, the published rule "sei3y>" matches words ending in -ies (the ending is stored reversed), removes three letters, appends y, and allows further rules to apply, turning flies into fly.
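To see the difference in aggressiveness for yourself, run the same words through both stemmers; natural exposes the Lancaster implementation as `LancasterStemmer`. A quick sketch:

```js
// index.js
var natural = require('natural');

// Compare the two stemmers side by side
['maximum', 'presumably', 'friendships'].forEach(function (word) {
  console.log(word, natural.PorterStemmer.stem(word), natural.LancasterStemmer.stem(word));
});
// Lancaster generally produces shorter, less readable stems than Porter
```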
Natural provides implementations of four algorithms for calculating string distance: Hamming distance, Jaro-Winkler, Levenshtein distance, and the Dice coefficient. Using these algorithms, we can tell whether two strings match or how similar they are. For this example, we will use Hamming distance.
Hamming distance measures the distance between two strings of equal length by counting the number of different characters. The third parameter indicates whether the case should be ignored. By default, the algorithm is case sensitive.
Here's a code snippet showing the usage of the Hamming algorithm for calculating string distance:
```js
// index.js
var natural = require('natural');

console.log(natural.HammingDistance("karolin", "kathrin", false));
console.log(natural.HammingDistance("karolin", "kerstin", false));
console.log(natural.HammingDistance("short string", "longer string", false));
```
The output:
```
3
3
-1
```
The first two comparisons return `3` because three letters differ. The last one returns `-1` because the lengths of the strings being compared are different.
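Hamming distance is only one of the four supported metrics; the other three do not require equal-length strings. Here's a quick sketch of their basic usage:

```js
// index.js
var natural = require('natural');

// Jaro-Winkler: similarity score between 0 (no match) and 1 (exact match)
console.log(natural.JaroWinklerDistance('dixon', 'dicksonx'));

// Levenshtein: the number of single-character edits between the strings
console.log(natural.LevenshteinDistance('ones', 'onez'));

// Dice coefficient: similarity between 0 and 1 based on shared bigrams
console.log(natural.DiceCoefficient('thing', 'thang'));
```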
Text classification, also known as text tagging, is the process of classifying text into organized groups. That is, if we have a new unknown statement, our processing system can decide which category it fits into the most based on its content.
Some of the most common use cases for automatic text classification include the following:

- Sentiment analysis: grouping text by the opinion it expresses
- Spam detection: deciding whether an incoming email belongs in the inbox
- Topic labeling: organizing articles or support tickets by subject
- Intent detection: routing user requests in chatbots and virtual assistants
natural currently supports two classifiers: Naive Bayes and logistic regression. The following example uses the `BayesClassifier` class:
```js
// index.js
var natural = require('natural');
var classifier = new natural.BayesClassifier();

classifier.addDocument('i am long qqqq', 'buy');
classifier.addDocument('buy the q\'s', 'buy');
classifier.addDocument('short gold', 'sell');
classifier.addDocument('sell gold', 'sell');
classifier.train();

console.log(classifier.classify('i am short silver'));
console.log(classifier.classify('i am long copper'));
```
In the code above, we trained the classifier on sample text. It will use reasonable defaults to tokenize and stem the text. Based on the sample text, the console will log the following output:
```
sell
buy
```
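The logistic regression classifier is a drop-in replacement with the same `addDocument()`, `train()`, and `classify()` interface; only the constructor changes. A minimal sketch:

```js
// index.js
var natural = require('natural');
var classifier = new natural.LogisticRegressionClassifier();

classifier.addDocument('i am long qqqq', 'buy');
classifier.addDocument('buy the q\'s', 'buy');
classifier.addDocument('short gold', 'sell');
classifier.addDocument('sell gold', 'sell');
classifier.train();

console.log(classifier.classify('i am long copper')); // expected: 'buy', as in the Bayes example
```

A trained classifier can also be persisted with `classifier.save()` and restored later with the classifier class's `load()` method, so you don't have to retrain on every run.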
Sentiment analysis, also known as opinion mining or emotion AI, is one of the most widely used applications of NLP. It identifies and extracts viewpoints from spoken or written language to ascertain a person's emotion.

Sentiment analysis is used to assess whether a piece of text is positive, negative, or neutral. Businesses use it to monitor brand awareness and customer feedback in order to understand how well a product is performing and what is required to increase sales.
Natural supports algorithms that can calculate the sentiment of each piece of text by summing the polarity of each word and normalizing it with the length of the sentence. If a negation occurs, the result is made negative.
Here’s an example of its usage:
```js
// index.js
var natural = require('natural');
var Analyzer = natural.SentimentAnalyzer;
var stemmer = natural.PorterStemmer;
var analyzer = new Analyzer("English", stemmer, "afinn");

// getSentiment expects an array of strings
console.log(analyzer.getSentiment(["I", "don't", "want", "to", "play", "with", "you"]));
```
The constructor has three parameters:

- Language: the language of the input text; `"English"` in this case
- Stemmer: a stemmer to increase the coverage of the sentiment analyzer; here, the `PorterStemmer`
- Vocabulary: the vocabulary type to use; `"afinn"`, `"senticon"`, or `"pattern"` are valid values

Running the code above gives the following output:
```
0.42857142857142855
```

A score greater than zero indicates positive sentiment, a score below zero indicates negative sentiment, and zero is neutral.
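Because `getSentiment()` expects an array of tokens rather than a raw string, you would usually tokenize user input first. Here's a minimal sketch combining the tokenizer from earlier with the analyzer (note that `WordTokenizer` strips punctuation, so a word like "don't" is split into two tokens):

```js
// index.js
var natural = require('natural');
var analyzer = new natural.SentimentAnalyzer('English', natural.PorterStemmer, 'afinn');
var tokenizer = new natural.WordTokenizer();

// Tokenize free-form input before scoring it
console.log(analyzer.getSentiment(tokenizer.tokenize('This is a wonderful day')));
```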
Using natural, we can compare two words that are spelled differently but sound similar using phonetic matching. Here's an example using the `metaphone.compare()` method:
```js
// index.js
var natural = require('natural');
var metaphone = natural.Metaphone;
var wordA = 'phonetics';
var wordB = 'fonetix';

if (metaphone.compare(wordA, wordB))
  console.log('They sound alike!');

// We can also obtain the raw phonetics of a word using process()
console.log(metaphone.process('phonetics'));
```
We also obtained the raw phonetics of a word using `process()`. We get the following output when we run the code above:
```
They sound alike!
FNTKS
```
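Metaphone isn't the only phonetic algorithm natural ships; `SoundEx` offers the same `compare()` and `process()` interface. A quick sketch using the classic Robert/Rupert pair:

```js
// index.js
var natural = require('natural');
var soundEx = natural.SoundEx;

if (soundEx.compare('Robert', 'Rupert'))
  console.log('They sound alike!');

// The raw SoundEx code for a word
console.log(soundEx.process('Robert')); // 'R163'
```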
Users may make typographical errors when supplying input to a web application through a search bar or an input field. Natural has a probabilistic spellchecker that can suggest corrections for misspelled words using an array of tokens from a text corpus.
Let's explore an example using a tiny corpus of just two words for simplicity:
```js
// index.js
var natural = require('natural');
var corpus = ['something', 'soothing'];
var spellcheck = new natural.Spellcheck(corpus);

console.log(spellcheck.getCorrections('soemthing', 1));
console.log(spellcheck.getCorrections('soemthing', 2));
```
It suggests corrections (sorted by probability in descending order) that are up to a maximum edit distance away from the input word. A maximum distance of one will cover 80% to 95% of spelling mistakes. After a distance of two, it becomes very slow.
We get the following output from running the code:
```
[ 'something' ]
[ 'something', 'soothing' ]
```
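In a real application, the corpus would be built from a larger body of text rather than hardcoded. Here's a minimal sketch that derives the corpus from a sample string using the tokenizer from earlier:

```js
// index.js
var natural = require('natural');
var tokenizer = new natural.WordTokenizer();

// Build the corpus from any text you have available
var corpus = tokenizer.tokenize('The quick brown fox jumps over the lazy dog');
var spellcheck = new natural.Spellcheck(corpus);

console.log(spellcheck.getCorrections('quik', 1)); // [ 'quick' ]
```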
Created by the AXA group, NLP.js is an NLP package for bot development that supports 40 languages. It offers entity extraction, sentiment analysis, automatic language identification, and other features. It is the ideal Node.js library for creating chatbots:
```js
const { NlpManager } = require('node-nlp');

const manager = new NlpManager({ languages: ['en'], forceNER: true });

// Adds the utterances and intents for the NLP
manager.addDocument('en', 'bye bye take care', 'greetings.bye');
manager.addDocument('en', 'okay see you later', 'greetings.bye');
manager.addDocument('en', 'hello', 'greetings.hello');
manager.addDocument('en', 'hi', 'greetings.hello');

// Train also the NLG
manager.addAnswer('en', 'greetings.bye', 'Till next time');
manager.addAnswer('en', 'greetings.bye', 'see you soon!');
manager.addAnswer('en', 'greetings.hello', 'Hey there!');
manager.addAnswer('en', 'greetings.hello', 'Greetings!');

// Train and save the model
(async () => {
  await manager.train();
  manager.save();
  const response = await manager.process('en', 'I should go now');
  console.log(response);
})();
```
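The logged response object includes the matched intent, a confidence score, and the generated answer. If you only need the reply text, read it off the response inside the async block above; note that the answer is picked from the `addAnswer()` entries for the matched intent, so the exact string may vary:

```js
// Inside the async block, after manager.process():
console.log(response.intent); // 'greetings.bye'
console.log(response.answer); // e.g., 'Till next time'
```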
Compromise.cool is an extremely user-friendly and lightweight library. It converts text into structured data, can run NLP directly in the browser, and helps you draw reasonable conclusions from text. Compromise only works with the English language.
Here is a simple code snippet:
```js
import nlp from 'compromise'

var doc = nlp('Sam is coming')
doc.verbs().toNegative()
// 'Sam is not coming'
```
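To read the transformed sentence back out as a string, compromise documents expose a `text()` method:

```js
console.log(doc.text()) // 'Sam is not coming'
```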
Wink offers NLP features for a variety of tasks, including enhancing negations, controlling elisions, and generating ngrams, stems, and phonetic codes for tokens. It provides a collection of APIs for working with strings such as names, sentences, paragraphs, and tokens, where tokens are represented as arrays of strings or words. These APIs carry out the necessary preprocessing for many ML applications, including classification and semantic search:
```js
// Load wink-nlp-utils
var nlp = require('wink-nlp-utils');

// Extract a person's name from a string
var name = nlp.string.extractPersonsName('Dr. Sarah Connor M. Tech., PhD. - AI');
console.log(name); // -> 'Sarah Connor'

// Remove stop words
var t = nlp.tokens.removeWords(['mary', 'had', 'a', 'little', 'lamb']);
console.log(t); // -> [ 'mary', 'little', 'lamb' ]
```
Here's a quick summary of what we've learned in this article:

- NLP enables computers to process and understand natural language as input
- The natural library brings tokenization, stemming, string distance, text classification, sentiment analysis, phonetic matching, and spellchecking to Node.js
- NLP.js, Compromise, and wink-nlp-utils are solid alternatives, each with its own strengths

The source code for each of the usage examples in this article is available on GitHub. Feel free to clone it, fork it, or submit an issue.