2019-02-12

#node

Maciej Cieślar

Feb 12, 2019 ⋅ 6 min read

How to extract text from an image using JavaScript

Maciej Cieślar A JavaScript developer and a blogger at mcieslar.com.

Many note-taking apps nowadays offer to take a picture of a document and turn it into text. I was curious and decided to dig a little deeper to see what exactly was going on.

Having done a little research, I came across Optical Character Recognition (OCR), a field of research in pattern recognition and AI revolving around precisely what we are interested in: reading text from an image. There is a very promising JavaScript library implementing OCR called tesseract.js, which works not only in Node but also in the browser, with no server needed!

I would like to focus on working out how to add tesseract.js to an application and then check how well it does its job by creating a function to mark all of the matched words in an image.

Here’s a link to the repository.

Tesseract.js

To add tesseract to a project we can simply type this in the terminal:

npm install tesseract.js

After importing it into our codebase, everything should work as expected, at least according to the package’s docs. In reality, though, I kept getting an error about a missing worker.js file, and since neither the docs nor some very thorough googling were of much help, I used a workaround. I copied a file called worker.min.js from node_modules/tesseract.js and pasted it into my public folder, from which I serve my static files. After that, I changed the path to the worker inside tesseract like so:

tesseract.workerOptions.workerPath = 'http://localhost:8080/worker.min.js';

and everything worked correctly.

Application

Let’s create a simple application to recognize text in an image. We would like it to render the image twice: once to show the user their original image of choice, and once to highlight the words that were matched. Finally, we would also like our app to show the user its progress at all times.

HTML markup

<label for="recognition-image-input">Choose image</label>
<input type="file" accept="image/jpeg, image/png" id="recognition-image-input" /><br />
<label for="recognition-confidence-input">Confidence</label>
<input type="number" max="100" min="0" id="recognition-confidence-input" value="70" /><br />
<label for="recognition-progress">File recognition progress:</label>
<progress id="recognition-progress" max="100" value="0">0%</progress>
<div id="recognition-text"></div>
<div id="recognition-images">
  <div id="original-image"></div>
  <div id="labeled-image"></div>
</div>

<input type="file"> lets the user choose an image, and <input type="number"> sets the desired confidence, which indicates how certain of the result the user would like the app to be. Matches that do not meet the confidence requirement won’t show up in the result. <progress> informs the user how far along the recognition is, <div id="recognition-text"> shows the recognized text, and <div id="recognition-images"> works as a placeholder for the images.

By listening on the change event of the <input type="file" /> we can get the user’s image of choice and render the results.

Before that, however, let’s save the references to the HTML elements in variables for the future code snippets to be more readable:

const recognitionImageInputElement = document.querySelector(
 '#recognition-image-input',
);
const recognitionConfidenceInputElement = document.querySelector(
 '#recognition-confidence-input',
);
const recognitionProgressElement = document.querySelector('#recognition-progress');
const recognitionTextElement = document.querySelector('#recognition-text');
const originalImageElement = document.querySelector('#original-image');
const labeledImageElement = document.querySelector('#labeled-image');

Listening on the change event

When the user selects an image on their computer the change event is fired.

The <input type="file"> element has a property called files which holds all the files the user has selected. We are not accepting multiple files, however, so there will always be just one file at the 0th index.

recognitionImageInputElement.addEventListener('change', () => {
  if (!recognitionImageInputElement.files) {
    return null;
  }
  const file = recognitionImageInputElement.files[0];
});
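
The guard in the listener can also be pulled into a small helper that additionally covers an empty selection (for example, when the user opens the file dialog and cancels it). This is only a minimal sketch: pickFirstFile is a hypothetical name, not part of the original code or of tesseract.js, and plain arrays stand in for a real FileList.

```javascript
// Hypothetical helper: returns the first selected file, or null when
// nothing was selected (missing or empty file list).
function pickFirstFile(fileList) {
  if (!fileList || fileList.length === 0) {
    return null;
  }
  return fileList[0];
}

// Plain arrays stand in for a real FileList here.
console.log(pickFirstFile([])); // null
console.log(pickFirstFile([{ name: 'scan.png' }])); // { name: 'scan.png' }
```

Inside the change listener it could be called as pickFirstFile(recognitionImageInputElement.files).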

How to recognize an image

Tesseract has a method called recognize which accepts two arguments — an imageLike and options. An imageLike can be many things. In our case, we are going to use a File object that will be available to us once a user chooses an image. options are only used to set the language of the image or (in some advanced cases) to change the defaults of tesseract. We won’t, however, be interested in that here.

Every piece of text recognized by tesseract has a confidence value (from 0 to 100) that tells us how sure tesseract is of the result.

A note about confidence

Confidence can be tricky because of two things.

First, paragraphs have their own confidence, as do words and symbols. The confidence of a line is equal to the lowest amongst confidences of its constituent words. By the same principle, the confidence of a word is equal to the confidence of a symbol tesseract is least confident about.

This means that a low line confidence doesn’t necessarily mean the whole line was misrecognized; it could be just one word that is causing trouble.
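
The rule described above can be illustrated with a few lines of code. This is only a sketch of the relationship as described here, not of how tesseract computes confidence internally; lowestConfidence and the sample words are made up for the illustration.

```javascript
// Returns the lowest confidence among the given items,
// or null for an empty list.
function lowestConfidence(items) {
  if (items.length === 0) {
    return null;
  }
  return Math.min(...items.map(({ confidence }) => confidence));
}

// Made-up sample line: two well-recognized words and one poor one.
const sampleLineWords = [
  { text: 'Hello', confidence: 93 },
  { text: 'W0rld', confidence: 41 },
  { text: 'again', confidence: 90 },
];

console.log(lowestConfidence(sampleLineWords)); // 41
```

A single badly recognized word drags the whole line down to 41, even though the other two words were matched with high confidence.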

Secondly, confidence indicates how much an object resembles a certain character.

If the image is, for instance, somebody’s face, then the iris of their eye might be mistaken for the letter ‘O’ with fairly high confidence. This means that filtering out everything below a given confidence level will not necessarily leave us with nothing but good matches.

Recognizing an image

Now that we have a file let’s extract text from it by calling the .recognize() method. Also, by adding a handler to the .progress() method we can update the <progress> element.

return tesseract
  .recognize(file, {
    lang: 'eng',
  })
  .progress(({ progress, status }) => {
    if (!progress || !status || status !== 'recognizing text') {
      return null;
    }
    const p = (progress * 100).toFixed(2);
    recognitionProgressElement.textContent = `${status}: ${p}%`;
    recognitionProgressElement.value = p;
  })

Inside the .progress() handler we are given two pieces of information: progress, a number ranging from 0 to 1 that tells us how far along the processing is, and status, which is simply a message telling us what’s going on.

We multiply progress by a hundred so that the displayed value reads 50 instead of 0.50.
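
To try the conversion outside the browser, the same logic can be written as a pure function. This is a sketch under stated assumptions: formatProgress is a hypothetical helper, and the only status string taken from the article is 'recognizing text'; the other one is a placeholder.

```javascript
// Hypothetical helper: turns a progress update into values for the
// <progress> element, or returns null for updates we want to skip.
function formatProgress({ progress, status }) {
  if (!progress || status !== 'recognizing text') {
    return null;
  }
  // Scale 0..1 to 0..100 and round to two decimal places.
  const percent = Number((progress * 100).toFixed(2));
  return { value: percent, label: `${status}: ${percent}%` };
}

console.log(formatProgress({ progress: 0.5, status: 'recognizing text' }));
// { value: 50, label: 'recognizing text: 50%' }

// 'some other status' is a placeholder for any status we ignore.
console.log(formatProgress({ progress: 1, status: 'some other status' }));
// null
```

The returned value and label could then be assigned to the value and textContent of the <progress> element.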

Dealing with the result

The result of the .recognize() method is confusing, to say the least. It is not well documented, so we have to deduce some things on our own:

{
    blocks: Array[1]
    confidence: 87
    html: "<div class='ocr_page' id='page_1' ..."
    lines: Array[3]
    oem: "DEFAULT"
    paragraphs: Array[1]
    psm: "SINGLE_BLOCK"
    symbols: Array[33]
    text: "Hello World↵from beyond↵the Cosmic Void↵↵"
    version: "3.04.00"
    words: Array[7]
}

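html is the extracted text embedded in HTML tags, and text is the extracted text itself. paragraphs, words and symbols (the paragraphs, words and characters of the text, respectively) are arrays of objects. Among other things, each of these objects has a text property and a confidence value, and each word also has a bbox property holding its coordinates.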

We are going to use the paragraphs property to show the extracted text to the user inside the <p> elements, and the words property to create black-bordered boxes and place them on the second picture to show the user exactly what the positions were of the matched words.

Showing extracted text to the user

We want to render the paragraphs to the user and the best way to do so is to create a <p> element for each paragraph. A paragraph has a text property that can be set as the <p> element’s textContent.

Inside the previously created <div id="recognition-text"> element we can render the paragraphs with the .append() method:

const paragraphsElements = res.paragraphs.map(({ text }) => {
  const p = document.createElement('p');
  p.textContent = text;
  return p;
});
recognitionTextElement.append(...paragraphsElements);

Rendering images

To render the images we have to create them first because so far we only have the <div> elements that work as containers:

const originalImage = document.createElement('img');

const labeledImage = originalImage.cloneNode(true);

There is a little problem, however, with setting their src property as we don’t have the URL that points to the image — instead we have a File object.

To render a File object inside the <img> tag we have to use the FileReader constructor like this:

const setImageSrc = (image: HTMLImageElement, imageFile: File) => {
 return new Promise((resolve, reject) => {
   const fr = new FileReader();
   fr.onload = function() {
     if (typeof fr.result !== 'string') {
       return reject(null);
     }
     image.src = fr.result;
     return resolve();
   };
   fr.onerror = reject;
   fr.readAsDataURL(imageFile);
 });
};

We pass the File object to the .readAsDataURL() method and then wait for the onload handler to fire with the result. The result can now be set as the src of the image.

The code will look like this:

const originalImage = document.createElement('img');
await setImageSrc(originalImage, file);
const labeledImage = originalImage.cloneNode(true);

Marking the matched words

To show the box on every matched word we have to first filter out every word whose confidence is below the value previously set (inside the <input id="recognition-confidence-input"> element):

const wordsElements = res.words
  .filter(({ confidence }) => {
    return confidence > parseInt(recognitionConfidenceInputElement.value, 10);
  })
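
The same filter can be written as a standalone function, which makes it easy to try on sample data. The sketch below assumes each word object has text and confidence properties, as used in the snippets above; filterByConfidence and the sample words are made up for the illustration.

```javascript
// Keeps only the words whose confidence is above the threshold.
function filterByConfidence(words, minConfidence) {
  return words.filter(({ confidence }) => confidence > minConfidence);
}

// Made-up sample data for illustration.
const sampleWords = [
  { text: 'Hello', confidence: 92 },
  { text: 'W0rld', confidence: 55 },
  { text: 'O', confidence: 88 }, // e.g. an iris mistaken for a letter
];

// The threshold arrives as a string, as it would from an <input>.
const threshold = parseInt('70', 10);

console.log(filterByConfidence(sampleWords, threshold).map(({ text }) => text));
// [ 'Hello', 'O' ]
```

Note that the made-up false positive 'O' survives the filter because its confidence is high, which is the caveat raised in the note about confidence.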

Then, thanks to a bbox property that is available on each word object we know the coordinates of every matched word. The coordinates are x0, x1, y0 and y1, where:

x0 — start of the word on the horizontal axis, it becomes the left CSS property
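
As a sketch of how the coordinates could translate into CSS, the helper below assumes that y0 is the top edge of the word and that x1 and y1 are its right and bottom edges, so the width is x1 - x0 and the height is y1 - y0. It also assumes the box is positioned absolutely over an image displayed at its natural size. bboxToStyle is a hypothetical name and the coordinates are made up.

```javascript
// Converts a bounding box into CSS position and size values (in pixels).
// Assumes (x0, y0) is the top-left corner and (x1, y1) the bottom-right one.
function bboxToStyle({ x0, y0, x1, y1 }) {
  return {
    left: `${x0}px`,
    top: `${y0}px`,
    width: `${x1 - x0}px`,
    height: `${y1 - y0}px`,
  };
}

// Made-up coordinates for illustration.
console.log(bboxToStyle({ x0: 10, y0: 20, x1: 110, y1: 45 }));
// { left: '10px', top: '20px', width: '100px', height: '25px' }
```

In the browser, these values could be copied onto an absolutely positioned <div> with Object.assign(box.style, bboxToStyle(word.bbox)), where box is the element created for the word.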

Testing it out

I have taken a screenshot of my recent post to see how well tesseract handles well-formatted text on a single-color background.

Original image:

Labeled image:

Here is the extracted text:

Recently on Facebook David Smooke (the CEO of Hackernoon) posted an article in which he listed 2018’s Top Tech Stories. He also mentioned that if someone wished to make a similar list about say JavaScript he would be happy to feature it on the frontpage of Hackernoon.

In a constant struggle to get more people to read my work I could not miss this opportunity, sol immediately started to plan how to approach making such a list.

And there you have it!

Conclusion

The tesseract.js library provides us with a ready-to-use OCR implementation that is efficient and, for the most part, accurate. The additional advantage of the library is its immense flexibility thanks to being compatible with both Node.js and a browser. There is even an option to include custom training data which could make it work better for your specific applications.
