
AI Agent Integration Guide

by amitkshirsagar13

AI Summary

A practical guide to enhancing OCR text extraction in Caption Extractor with vision and language LLMs running locally through Ollama. It is aimed at developers working on document processing, image analysis, and text-correction workflows who want a reusable agent framework.

Install

Copy this and paste it into Claude Code, Cursor, or any AI assistant:

I want to set up the "AI Agent Integration Guide" agent in my project.

Please run this command in my terminal:
# Add AGENTS.md to your project root
curl --retry 3 --retry-delay 2 --retry-all-errors -o AGENTS.md "https://raw.githubusercontent.com/amitkshirsagar13/caption-extractor/main/docs/AI_AGENTS.md"

Then explain what the agent does and how to invoke it.

Description

This document explains how to use the AI agent features in Caption Extractor, which enhance OCR processing with visual and language LLMs running locally through Ollama.

Overview

The Caption Extractor now includes three processing stages that can be enabled or disabled independently:

• OCR Processing - traditional OCR using PaddleOCR
• Image Agent - visual LLM analysis for image description, scene, text, and story
• Text Agent - LLM-based text correction and completion

Memory Usage

• OCR: ~2GB RAM
• Image Agent (llava:latest): +4GB RAM
• Text Agent (llama3.2:latest): +2GB RAM
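Because the figures above add up per stage, the RAM needed for a run is roughly the sum of the enabled stages, about 8GB with all three on. The sketch below only encodes that arithmetic. The function name `estimate_ram_gb` and its three 0/1 arguments are hypothetical and not part of Caption Extractor, and the numbers are the approximations listed above, not measurements.

```bash
# Hypothetical helper (not part of Caption Extractor): estimate RAM in GB
# from the approximate per-stage figures under "Memory Usage".
# Arguments: OCR, Image Agent, Text Agent, each 1 (enabled) or 0 (disabled).
estimate_ram_gb() {
  ocr=${1:-0}
  image=${2:-0}
  text=${3:-0}
  total=0
  # OCR (PaddleOCR): ~2GB
  if [ "$ocr" = 1 ]; then total=$((total + 2)); fi
  # Image Agent (llava:latest): +4GB
  if [ "$image" = 1 ]; then total=$((total + 4)); fi
  # Text Agent (llama3.2:latest): +2GB
  if [ "$text" = 1 ]; then total=$((total + 2)); fi
  echo "$total"
}
```

For example, `estimate_ram_gb 1 0 1` (OCR plus Text Agent) prints 4, and `estimate_ram_gb 1 1 1` prints 8.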


Install Ollama

• Install Ollama from https://ollama.ai
• Pull the required models: `ollama pull llava:latest` (Image Agent) and `ollama pull llama3.2:latest` (Text Agent)
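Before enabling the agents, it can help to confirm that both models are actually available locally. The following is a sketch, assuming the default tabular output of `ollama list` (a header row, then one model per line with its name in the first column). The function name `missing_models` is hypothetical and not provided by Ollama or Caption Extractor.

```bash
# Hypothetical helper: read `ollama list` output on stdin and print each
# required model that is missing from the NAME column.
# Returns 0 if every model is present, 1 otherwise.
missing_models() {
  # Collect the first column (model names), skipping the header row.
  installed=$(awk 'NR > 1 { print $1 }')
  status=0
  for model in llava:latest llama3.2:latest; do
    # -x matches whole lines only; -F treats the name as a fixed string.
    if ! printf '%s\n' "$installed" | grep -qxF "$model"; then
      echo "$model"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `ollama list | missing_models && echo "all models present"`. Any model it prints can then be fetched with `ollama pull`.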


Health Signals

• Maintenance: last committed 4 months ago (stale)
• Adoption: under 100 stars (0 ★, niche)
• Docs: README + description (well-documented)

GitHub Signals

• Issues: 0
• Updated: 4 months ago
• License: none


Works With

Claude Code
Claude.ai