OCR Online

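# Online OCR Examples

---

## 1) Client-side (HTML + CSS + JavaScript): Tesseract.js

Save the block below as `ocr.html` and open it in your browser. It loads Tesseract.js from a CDN and runs OCR inside the browser (no server required).

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Online OCR (Tesseract.js)</title>
  <style>
    body { font-family: sans-serif; max-width: 40rem; margin: 2rem auto; }
    #preview { max-width: 100%; }
    #output { white-space: pre-wrap; border: 1px solid #ccc; padding: 0.5rem; }
  </style>
  <!-- Assumption: Tesseract.js v4 from jsDelivr; v4 provides loadLanguage/initialize -->
  <script src="https://cdn.jsdelivr.net/npm/tesseract.js@4/dist/tesseract.min.js"></script>
</head>
<body>
  <h1>Online OCR: Tesseract.js</h1>
  <!-- Assumption: element ids below are illustrative -->
  <input type="file" id="file" accept="image/*">
  <p>Preview:</p>
  <img id="preview" alt="image preview">
  <p>OCR output:</p>
  <pre id="output">(no text yet)</pre>
  <script>
    const fileInput = document.getElementById('file');
    const output = document.getElementById('output');
    fileInput.addEventListener('change', async () => {
      const file = fileInput.files[0];
      if (!file) return;
      // Show the selected image, then run OCR on it
      document.getElementById('preview').src = URL.createObjectURL(file);
      output.textContent = 'Recognizing...';
      try {
        const worker = await Tesseract.createWorker();
        await worker.loadLanguage('eng');
        await worker.initialize('eng');
        const { data: { text } } = await worker.recognize(file);
        await worker.terminate();
        output.textContent = text || '(no text found)';
      } catch (err) {
        output.textContent = 'ERROR: ' + err.message;
      }
    });
  </script>
</body>
</html>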
```

**Notes:**

- Tesseract.js runs OCR in the browser, which suits small images and privacy-preserving use (your images don't leave the browser).
- To support other languages, replace `'eng'` with the language code you need, for example `await worker.loadLanguage('ara'); await worker.initialize('ara');`. Language data is downloaded from the CDN, which increases loading time. In newer Tesseract.js releases the language is passed to `createWorker` instead.

---

## 2) Server-side Java example (Spring Boot + Tess4J)

This example demonstrates a minimal Spring Boot app that accepts an image upload and returns the extracted text using Tess4J (a Java wrapper around Tesseract). It requires Tesseract to be installed on the server and Tess4J to be added as a dependency.

### pom.xml (relevant dependencies)

```xml
<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
  <dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>5.4.0</version>
  </dependency>
</dependencies>
```

> Make sure Tesseract is installed on your machine and the `TESSDATA_PREFIX` environment variable points to the tessdata folder, or call `setDatapath` on the instance.

### `OcrController.java`

```java
package com.example.ocr;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;

@RestController
public class OcrController {

    @PostMapping(value = "/api/ocr", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    public String ocr(@RequestParam("file") MultipartFile file) throws IOException {
        // Decode the upload in memory so no temp file has to be cleaned up
        BufferedImage image;
        try (InputStream in = file.getInputStream()) {
            image = ImageIO.read(in); // returns null if no reader understands the data
        }
        if (image == null) {
            return "ERROR: unsupported or unreadable image";
        }

        ITesseract tesseract = new Tesseract();
        // Optional: point Tess4J at tessdata if TESSDATA_PREFIX is not set
        // tesseract.setDatapath("/usr/share/tesseract-ocr/4.00/tessdata");
        tesseract.setLanguage("eng");
        try {
            return tesseract.doOCR(image);
        } catch (TesseractException e) {
            return "ERROR: " + e.getMessage();
        }
    }
}
```

### Run notes

- Install Tesseract (e.g., `sudo apt install tesseract-ocr` on Ubuntu) and verify the install with `tesseract --version`.
- If Java fails to find tessdata, set the `TESSDATA_PREFIX` environment variable or call `tesseract.setDatapath("/path/to/tessdata");` in code.
- This server endpoint returns plain text. For production, add validation, content-type checks, error handling, and rate limiting.

---

## Quick comparison & recommendations

- **If you want the simplest and fastest option to try:** use the HTML/Tesseract.js file. Open it in a browser and test immediately.
- **If you need server-side processing, scalability, or specialized pre/post-processing:** use the Java/Tess4J approach or call a cloud OCR API (Google Vision, AWS Textract, Azure Computer Vision) from your backend.

---

If you want, I can:

- Provide a version that supports drag & drop + multiple languages for the browser.
- Provide a full Spring Boot project zip with build files and Dockerfile.

Save the code you want and tell me which version you want extended.
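
The run notes recommend adding validation and content-type checks before production use. One possible shape for that check is sketched below in plain Java, with no Spring dependency. `UploadValidator`, its method names, and the 5 MB limit are hypothetical and not part of the original example. The sketch inspects the leading bytes of the upload instead of trusting the client-supplied `Content-Type` header, and it only recognizes PNG and JPEG.

```java
/** Hypothetical helper (not part of the original example): sniffs the image type from leading bytes. */
final class UploadValidator {

    // File signatures ("magic bytes") for the two formats this sketch accepts
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    // Assumed upload limit, chosen for illustration only
    static final long MAX_BYTES = 5L * 1024 * 1024;

    private UploadValidator() {}

    /** Returns the detected MIME type, or null if the bytes match neither signature. */
    static String detectImageType(byte[] data) {
        if (startsWith(data, PNG)) return "image/png";
        if (startsWith(data, JPEG)) return "image/jpeg";
        return null;
    }

    /** True if the upload is non-empty, within the size limit, and a recognized image type. */
    static boolean isAcceptable(byte[] data) {
        return data != null
                && data.length > 0
                && data.length <= MAX_BYTES
                && detectImageType(data) != null;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In the controller, a call such as `UploadValidator.isAcceptable(file.getBytes())` could run before OCR, returning an error response when it fails. A signature check only confirms how the file starts, so decoding can still fail on truncated or corrupt images and should still be handled.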