And to contribute to the democratization of AI, we trained an SLM (Small Language Model) that is at least 10x more economical than its competitors while delivering similar or superior quality.

-> Now available by subscription on AWS Marketplace.


-> Therefore, OCR is not just a technical tool — it is critical infrastructure for innovation in the Generative AI era.

-> Digitizing these archives is the first step to unlocking the true potential of generative AI.

Accessible, fast, and sustainable:
-> Our proposal is clear: to evolve the OCR market with a solution that matches or exceeds the quality of LLM-based systems, at up to 10 times lower cost and with unparalleled processing speed.


OCR with Agent Architecture.

Flexible, verticalized, and ready for generative AI.

-> Our solution is not just a text extractor.

-> It is a data transformation platform that adapts to different contexts and sectors. The agent architecture gives each vertical (legal, educational, financial, or governmental) its own optimized treatment.
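To make the idea of per-vertical treatment concrete, here is a minimal sketch of how documents could be routed to a vertical-specific agent. It is an illustration only, not Dharma-AI's implementation: the `Document` class, the `AGENTS` registry, the `agent` decorator, and the placeholder rules inside each agent are all hypothetical names and assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Document:
    vertical: str  # e.g. "legal", "educational", "financial", "governmental"
    text: str      # raw text as returned by the OCR step
    metadata: Dict[str, str] = field(default_factory=dict)


# Hypothetical registry: one post-processing agent per vertical.
AGENTS: Dict[str, Callable[[Document], Document]] = {}


def agent(vertical: str):
    """Register the decorated function as the agent for one vertical."""
    def register(fn: Callable[[Document], Document]):
        AGENTS[vertical] = fn
        return fn
    return register


@agent("legal")
def legal_agent(doc: Document) -> Document:
    # Placeholder rule (assumption): flag contract-like wording.
    is_contract = "hereinafter" in doc.text.lower()
    doc.metadata["doc_type"] = "contract" if is_contract else "other"
    return doc


@agent("educational")
def educational_agent(doc: Document) -> Document:
    # Placeholder rule (assumption): count lines that start with a marked box "(x)".
    marked = [ln for ln in doc.text.splitlines() if ln.strip().lower().startswith("(x)")]
    doc.metadata["marked_answers"] = str(len(marked))
    return doc


def process(doc: Document) -> Document:
    """Send the document to its vertical's agent, or tag it as generic."""
    handler = AGENTS.get(doc.vertical)
    if handler is None:
        doc.metadata["agent"] = "generic"
        return doc
    doc.metadata["agent"] = doc.vertical
    return handler(doc)
```

With a registry like this, supporting a new vertical means adding one function rather than changing the shared pipeline.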

Specific functionalities include:

  • Recognition of multiple-choice selections in tests and forms
  • Direct processing of large PDF files
  • Automatic spell check on extracted text
  • Identification and separation of footers, headers, and margins
  • Integration with generative AI pipelines for model training
  • Support for metadata and semantic structuring

-> This flexibility allows companies to transform previously inaccessible archives into valuable digital assets, ready to feed AI models, generate insights, and accelerate decisions.
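As an illustration of one functionality listed above, separating headers and footers from body text, the sketch below uses a common heuristic: a line that opens or closes most pages is treated as a header or footer. This is an assumption for illustration, not a description of how Smart OCR works. The function name `split_repeated_lines`, the `min_share` threshold, and the digit normalization are hypothetical choices, and the heuristic is only meaningful for documents with several pages.

```python
import re
from collections import Counter
from typing import Dict, List


def _key(line: str) -> str:
    # Replace digit runs so "Page 1" and "Page 2" count as the same footer.
    return re.sub(r"\d+", "#", line.strip())


def split_repeated_lines(pages: List[List[str]], min_share: float = 0.6) -> Dict[str, list]:
    """Separate first/last lines that repeat on at least min_share of the pages."""
    first = Counter(_key(p[0]) for p in pages if p)
    last = Counter(_key(p[-1]) for p in pages if p)
    threshold = min_share * len(pages)
    headers = {k for k, n in first.items() if n >= threshold}
    footers = {k for k, n in last.items() if n >= threshold}

    body = []
    for page in pages:
        lines = list(page)
        if lines and _key(lines[0]) in headers:
            lines = lines[1:]   # drop the repeated header line
        if lines and _key(lines[-1]) in footers:
            lines = lines[:-1]  # drop the repeated footer line
        body.append(lines)
    return {"headers": sorted(headers), "footers": sorted(footers), "body": body}


pages = [
    ["ACME Report", "Intro text", "Page 1"],
    ["ACME Report", "More text", "Page 2"],
    ["ACME Report", "Closing text", "Page 3"],
]
result = split_repeated_lines(pages)
```

Here `result["body"]` keeps only the middle line of each page, while the repeated title and the page-number lines are reported separately as header and footer patterns.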

A green OCR

By using an Agent Architecture based on SLMs (Small Language Models), we can offer:

  • Advanced data post-processing functionalities, comparable to those of tools such as GPT-4 Vision and Document AI
  • Up to 10x lower operational cost
  • At least 10x reduction in CO₂ emissions, water consumption, and electricity usage

OCR as the engine of generative AI

-> By digitizing archives with accuracy and speed, Dharma-AI’s Smart OCR becomes the first link in the generative AI value chain. It prepares data, organizes content, and enables the training of models that can generate text, answer questions, summarize documents, and much more.

Comparison tables


* Google Vision also accepts PDFs, but only from Google Cloud Storage (GCS) and up to 2,000 pages.
* Textract also accepts PDFs, but only up to 3,000 pages and 500 MB.
** With the addition of other services that significantly increase their prices