AI Summary
A practical integration guide for enhancing OCR text extraction with visual and language LLM capabilities using local Ollama models in Caption Extractor. Developers working with document processing, image analysis, and text correction workflows benefit from this reusable agent framework.
Install
Copy this and paste it into Claude Code, Cursor, or any AI assistant:
I want to set up the "AI Agent Integration Guide" agent in my project. Please run this command in my terminal:

# Add AGENTS.md to your project root
curl --retry 3 --retry-delay 2 --retry-all-errors -o AGENTS.md "https://raw.githubusercontent.com/amitkshirsagar13/caption-extractor/main/docs/AI_AGENTS.md"

Then explain what the agent does and how to invoke it.
Description
This document explains how to use the AI agent features in Caption Extractor, which enhance OCR processing with visual and language LLMs running on a local Ollama instance.
Overview
Caption Extractor now includes three processing stages that can be enabled or disabled independently:
• OCR Processing - Traditional OCR using PaddleOCR
• Image Agent - Visual LLM analysis for image description, scene, text, and story
• Text Agent - LLM-based text correction and completion
Memory Usage
• OCR: ~2GB RAM
• Image Agent (llava:latest): +4GB RAM
• Text Agent (llama3.2:latest): +2GB RAM
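The figures above can be turned into a quick budget check before deciding which stages to enable. The following is a minimal sketch, assuming the per-stage figures simply add up; the `estimate_ram_gb` function and its 0/1 arguments are hypothetical and not part of Caption Extractor.

```shell
# Rough RAM estimate in GB, using the figures listed under Memory Usage.
# estimate_ram_gb is a hypothetical helper, for illustration only.
# Arguments: $1 = OCR, $2 = Image Agent, $3 = Text Agent (1 = enabled, 0 = disabled).
estimate_ram_gb() {
  total=0
  if [ "${1:-1}" -eq 1 ]; then total=$((total + 2)); fi   # OCR: ~2GB
  if [ "${2:-0}" -eq 1 ]; then total=$((total + 4)); fi   # Image Agent (llava:latest): +4GB
  if [ "${3:-0}" -eq 1 ]; then total=$((total + 2)); fi   # Text Agent (llama3.2:latest): +2GB
  echo "$total"
}

estimate_ram_gb 1 1 1   # all three stages: prints 8
estimate_ram_gb 1 0 1   # OCR plus Text Agent only: prints 4
```

Treat the result as a lower bound for planning; actual usage depends on image size and on how Ollama loads and unloads models.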
AI Agent Integration Guide
Install Ollama
• Install Ollama from https://ollama.ai
• Pull the required models:
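The command list for this step is cut off in the source, so what follows is only a sketch of what pulling the models might look like. It assumes the required models are the two tags named under Memory Usage (`llava:latest` and `llama3.2:latest`) and relies on the standard `ollama pull` subcommand; the `pull_models` helper and the `DRY_RUN` variable are hypothetical.

```shell
# pull_models is a hypothetical helper. The model tags are taken from the
# Memory Usage section and are assumed to be the "required models".
# With DRY_RUN=1 (the default here) it only prints the commands;
# set DRY_RUN=0 to actually download the models through Ollama.
pull_models() {
  for model in llava:latest llama3.2:latest; do
    if [ "${DRY_RUN:-1}" -eq 1 ]; then
      echo "ollama pull $model"
    else
      ollama pull "$model"
    fi
  done
}

pull_models
```

Running it once in dry-run mode shows exactly which downloads would start, which is useful because each model is several gigabytes.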