# Pipelines
MyOCR pipelines orchestrate multiple components (predictors, models) to perform end-to-end OCR tasks. They provide a high-level interface for processing images or documents.
## Available Pipelines
### CommonOCRPipeline

Defined in `myocr/pipelines/common_ocr_pipeline.py`.
This pipeline performs standard OCR: text detection, optional text direction classification, and text recognition.
Initialization:
```python
from myocr.pipelines import CommonOCRPipeline
from myocr.modeling.model import Device

# Initialize the pipeline for GPU (or pass 'cpu')
pipeline = CommonOCRPipeline(device=Device('cuda:0'))
```
Configuration:
The pipeline loads its configuration from `myocr/pipelines/config/common_ocr_pipeline.yaml`. This file specifies the paths to the ONNX models used for detection, classification, and recognition, relative to the `MODEL_PATH` defined in `myocr.config`.
```yaml
# Example: myocr/pipelines/config/common_ocr_pipeline.yaml
model:
  detection: "dbnet++.onnx"
  cls_direction: "cls.onnx"
  recognition: "rec.onnx"
```
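To make the path resolution concrete, the sketch below loads a config of this shape and joins each model file onto a base directory. It is a minimal illustration, not the pipeline's actual loader: the `MODEL_PATH` value here is a made-up placeholder (the real one comes from `myocr.config`), and it assumes PyYAML is available.

```python
import os
import yaml  # PyYAML; assumed available

# Placeholder base directory; the real value is defined in myocr.config
MODEL_PATH = os.path.expanduser("~/.myocr/models")

config_text = """
model:
  detection: "dbnet++.onnx"
  cls_direction: "cls.onnx"
  recognition: "rec.onnx"
"""

config = yaml.safe_load(config_text)

# Resolve each relative model file against the base directory
model_files = {name: os.path.join(MODEL_PATH, rel)
               for name, rel in config["model"].items()}
print(model_files["detection"])
```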
Processing:
The `__call__` method takes the path to an image file.
```python
image_path = 'path/to/your/image.png'
ocr_results = pipeline(image_path)

if ocr_results:
    # Access recognized text and bounding boxes
    print(ocr_results)
```
Workflow:

- Loads the image.
- Uses `TextDetectionPredictor` to find text regions.
- Uses `TextDirectionPredictor` to classify the orientation of detected regions.
- Uses `TextRecognitionPredictor` to recognize the text within each oriented region.
- Returns a result object containing bounding boxes, text, and potentially confidence scores (details depend on the `Predictor` implementation).
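The chain of steps above can be sketched as plain functions. Everything in this snippet is an illustrative stand-in: `TextRegion` and the three functions mimic the roles of the predictors, but they are not the actual MyOCR interfaces.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TextRegion:
    box: tuple          # (x1, y1, x2, y2) bounding box
    angle: int = 0      # orientation in degrees
    text: str = ""      # recognized content

def detect(image) -> List[TextRegion]:
    # Stand-in for TextDetectionPredictor: find text regions
    return [TextRegion(box=(10, 10, 120, 40))]

def classify_direction(regions: List[TextRegion]) -> List[TextRegion]:
    # Stand-in for TextDirectionPredictor: set each region's orientation
    for r in regions:
        r.angle = 0
    return regions

def recognize(regions: List[TextRegion]) -> List[TextRegion]:
    # Stand-in for TextRecognitionPredictor: fill in the text content
    for r in regions:
        r.text = "hello"
    return regions

def run_ocr(image) -> List[TextRegion]:
    # Detection -> direction classification -> recognition
    return recognize(classify_direction(detect(image)))

results = run_ocr(image=None)
```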
### StructuredOutputOCRPipeline

Defined in `myocr/pipelines/structured_output_pipeline.py`.
This pipeline extends `CommonOCRPipeline` by adding a step to extract structured information (e.g., JSON) from the recognized text using a large language model (LLM) via the `OpenAiChatExtractor`.
Initialization:
Requires a device and a Pydantic model defining the desired JSON output schema.
```python
from myocr.pipelines import StructuredOutputOCRPipeline
from myocr.modeling.model import Device
from pydantic import BaseModel, Field

# Define your desired output structure
class InvoiceData(BaseModel):
    invoice_number: str = Field(description="The invoice number")
    total_amount: float = Field(description="The total amount due")
    due_date: str = Field(description="The payment due date")

# Initialize the pipeline
pipeline = StructuredOutputOCRPipeline(device=Device('cuda:0'), json_schema=InvoiceData)
```
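To show how a schema like `InvoiceData` constrains the result, the snippet below validates a JSON payload of the kind an LLM might return. This uses the standard Pydantic v2 API (`model_validate_json`) directly and does not depend on MyOCR; the payload is invented for illustration.

```python
from pydantic import BaseModel, Field

class InvoiceData(BaseModel):
    invoice_number: str = Field(description="The invoice number")
    total_amount: float = Field(description="The total amount due")
    due_date: str = Field(description="The payment due date")

# A JSON payload of the shape an LLM extractor might return (invented example)
llm_output = '{"invoice_number": "INV-001", "total_amount": 99.5, "due_date": "2025-01-31"}'

# Pydantic v2: parse and validate the JSON against the schema
invoice = InvoiceData.model_validate_json(llm_output)
print(invoice.total_amount)
```

If the payload is missing a required field or has the wrong type, validation raises a `ValidationError` instead of silently producing a partial object.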
Configuration:
This pipeline loads its specific configuration from `myocr/pipelines/config/structured_output_pipeline.yaml`, which includes settings for the `OpenAiChatExtractor` (LLM model name, API base URL, and API key).
```yaml
# Example: myocr/pipelines/config/structured_output_pipeline.yaml
chat_bot:
  model: "gpt-4o"
  base_url: "https://api.openai.com/v1"
  api_key: "YOUR_API_KEY"
```
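Rather than committing a real key in place of `YOUR_API_KEY`, a common pattern is to read it from the environment at startup. This is a generic sketch, not MyOCR's own config loader; the `OPENAI_API_KEY` variable name is a convention, not something the pipeline requires.

```python
import os

# Read the key from the environment, falling back to the YAML placeholder
# only for local experimentation (never commit a real key)
api_key = os.environ.get("OPENAI_API_KEY", "YOUR_API_KEY")

chat_bot_config = {
    "model": "gpt-4o",
    "base_url": "https://api.openai.com/v1",
    "api_key": api_key,
}
```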
Processing:
The `__call__` method takes an image path.
```python
image_path = 'path/to/your/invoice.pdf'
structured_data = pipeline(image_path)

if structured_data:
    print(structured_data)
```
Workflow:

- Performs standard OCR using the inherited `CommonOCRPipeline` to get the raw recognized text.
- If text is found, passes the text content to the `OpenAiChatExtractor`.
- The extractor interacts with the configured LLM, providing the text and the desired `json_schema` (Pydantic model) as instructions.
- The LLM attempts to extract the relevant information and format it according to the schema.
- Returns an instance of the provided Pydantic model populated with the extracted data.
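One way a Pydantic model can serve as "instructions" to an LLM is via its JSON Schema rendering. The exact prompt the `OpenAiChatExtractor` builds is internal to that class, but Pydantic v2's standard `model_json_schema()` shows the kind of machine-readable description an extractor could embed in the prompt:

```python
import json
from pydantic import BaseModel, Field

class InvoiceData(BaseModel):
    invoice_number: str = Field(description="The invoice number")
    total_amount: float = Field(description="The total amount due")
    due_date: str = Field(description="The payment due date")

# Render the model as JSON Schema (field names, types, descriptions,
# and which fields are required)
schema = InvoiceData.model_json_schema()
print(json.dumps(schema, indent=2))
```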
## Performance Optimization

### Batch Processing

### GPU Acceleration

### Memory Management
## Error Handling
Pipelines handle various error cases:
- Invalid image format
- Missing model files
- GPU out of memory
- Invalid configuration
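A caller can map these failure cases onto exception handlers around the pipeline call. The exception types below are illustrative guesses at how such errors commonly surface in Python (e.g., GPU out-of-memory often appears as a `RuntimeError`); the pipeline's actual exceptions depend on its implementation.

```python
def safe_ocr(pipeline, image_path):
    """Run a pipeline call, mapping common failures to a None result."""
    try:
        return pipeline(image_path)
    except FileNotFoundError:
        # Missing image or model file
        print(f"File not found: {image_path}")
    except (ValueError, OSError) as e:
        # Invalid image format or configuration
        print(f"Invalid input or configuration: {e}")
    except RuntimeError as e:
        # GPU out-of-memory errors often surface as RuntimeError
        print(f"Runtime failure (possibly GPU OOM): {e}")
    return None

# Usage with a stand-in pipeline that always fails
def failing_pipeline(path):
    raise FileNotFoundError(path)

result = safe_ocr(failing_pipeline, "missing.png")
```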
See the Troubleshooting Guide for common issues and solutions.