Models¶

This section describes the deep learning models that MyOCR uses for text detection, text recognition, and text direction classification.

Model Loading and Management¶

MyOCR uses a flexible model loading system defined in myocr/modeling/model.py. It supports loading models in three formats:

- ONNX (OrtModel): Loads and runs optimized models using the ONNX Runtime (onnxruntime). This is often preferred for inference due to performance benefits.
- PyTorch (PyTorchModel): Loads standard PyTorch models, potentially leveraging pre-defined architectures from libraries like torchvision.
- Custom PyTorch (CustomModel): Loads custom PyTorch models defined via YAML configuration files. These configurations specify the model's architecture, including backbones, necks, and heads, using components defined within myocr/modeling/.

A ModelLoader class acts as a factory to instantiate the correct model type based on the specified format (onnx, pt, custom).

```
# Example (Conceptual)
from myocr.modeling.model import ModelLoader, Device

# Load an ONNX model for CPU inference
loader = ModelLoader()
onnx_model = loader.load(model_format='onnx', model_name_path='path/to/your/model.onnx', device=Device('cpu'))

# Load a custom model defined by YAML for GPU inference
custom_model = loader.load(model_format='custom', model_name_path='path/to/your/config.yaml', device=Device('cuda:0'))
```
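To make the factory idea concrete, here is a minimal, self-contained sketch of how a loader could dispatch on the format string. It is not MyOCR's actual implementation: `SketchLoader`, the stub model classes, and the `_REGISTRY` mapping are hypothetical names used only for illustration, and no real model is loaded.

```python
from dataclasses import dataclass


@dataclass
class StubModel:
    """Hypothetical stand-in for a loaded model; it only records metadata."""
    path: str
    device: str


class StubOrtModel(StubModel):
    """Stand-in for an ONNX Runtime backed model."""


class StubPyTorchModel(StubModel):
    """Stand-in for a standard PyTorch model."""


class StubCustomModel(StubModel):
    """Stand-in for a model built from a YAML configuration."""


class SketchLoader:
    """Hypothetical factory: maps a format string to a model class."""

    _REGISTRY = {
        "onnx": StubOrtModel,
        "pt": StubPyTorchModel,
        "custom": StubCustomModel,
    }

    def load(self, model_format: str, model_name_path: str, device: str = "cpu") -> StubModel:
        try:
            model_cls = self._REGISTRY[model_format]
        except KeyError:
            raise ValueError(f"Unsupported model format: {model_format!r}") from None
        return model_cls(path=model_name_path, device=device)


loader = SketchLoader()
model = loader.load("onnx", "path/to/your/model.onnx")
print(type(model).__name__)  # StubOrtModel
```

Keeping the mapping in one dictionary means that supporting another format only requires a new entry, and an unknown format fails early with a clear error.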

Model Architectures¶

The myocr/modeling/ directory houses the building blocks for custom models:

- architectures/: Defines the overall structure connecting backbones, necks, and heads (e.g., DBNet, CRNN).
- backbones/: Contains feature extraction networks (e.g., ResNet, MobileNetV3).
- necks/: Includes feature fusion modules (e.g., FPN, the Feature Pyramid Network).
- heads/: Defines the final layers responsible for specific tasks (e.g., detection probability maps, sequence decoding).
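The backbone, neck, and head split can be sketched as a small registry lookup followed by function composition. The example below is an illustration under stated assumptions, not MyOCR code: the registries, the `build_model` helper, and the config keys are hypothetical, and each stage is a plain function over a list of numbers rather than a neural network layer.

```python
from typing import Callable, Dict, List

Stage = Callable[[List[float]], List[float]]

# Hypothetical registries; in a real project these would hold network modules.
BACKBONES: Dict[str, Stage] = {"double": lambda x: [2.0 * v for v in x]}
NECKS: Dict[str, Stage] = {"identity": lambda x: list(x)}
HEADS: Dict[str, Stage] = {"sum": lambda x: [sum(x)]}


def build_model(config: Dict[str, str]) -> Stage:
    """Look up each stage by name and chain them: backbone -> neck -> head."""
    backbone = BACKBONES[config["backbone"]]
    neck = NECKS[config["neck"]]
    head = HEADS[config["head"]]

    def forward(x: List[float]) -> List[float]:
        return head(neck(backbone(x)))

    return forward


# The dict stands in for a parsed YAML file; its keys are assumptions.
model = build_model({"backbone": "double", "neck": "identity", "head": "sum"})
print(model([1.0, 2.0, 3.0]))  # [12.0]
```

Because each stage is resolved by name, swapping one backbone for another is a configuration change rather than a code change.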

Available Models¶

Text Detection (DBNet++)¶

DBNet++: A text detection model based on the DBNet (differentiable binarization) architecture
Input: RGB image
Output: Text region polygons
Features:
- High accuracy for arbitrary-shaped text
- Fast inference speed
- Robust to various text orientations

Architecture:

Backbone: ResNet
Neck: FPN
Head: DBHead
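DBNet takes its name from differentiable binarization, which replaces a hard threshold with a steep sigmoid, B = 1 / (1 + exp(-k (P - T))), so that the binarization step can be trained together with the rest of the network. The sketch below evaluates that function on a tiny hand-written probability map. The amplification factor k = 50 follows the DBNet paper; the map values, the fixed threshold of 0.3 (the network itself predicts a threshold map), and the function names are illustrative assumptions rather than values taken from MyOCR.

```python
import math
from typing import List


def approx_binarize(prob: float, thresh: float, k: float = 50.0) -> float:
    """Steep sigmoid centered on the threshold: 1 / (1 + exp(-k * (P - T)))."""
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def binarize_map(prob_map: List[List[float]], thresh: float = 0.3) -> List[List[int]]:
    """Round the approximate binary map to a 0/1 mask."""
    return [
        [1 if approx_binarize(p, thresh) >= 0.5 else 0 for p in row]
        for row in prob_map
    ]


# Hand-written probabilities: the lower-right 2x2 block plays the role of text.
prob_map = [
    [0.05, 0.10, 0.08],
    [0.12, 0.90, 0.85],
    [0.07, 0.88, 0.95],
]
mask = binarize_map(prob_map)
print(mask)  # [[0, 0, 0], [0, 1, 1], [0, 1, 1]]
```

In a full pipeline, connected regions of the resulting mask would then be traced and turned into the text region polygons listed as the model's output.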

Text Recognition (CRNN)¶

CRNN: A hybrid CNN-RNN model for text recognition
Input: Cropped text region
Output: Recognized text
Features:
- Supports Chinese and English characters
- Handles variable-length text
- Robust to different fonts and styles
Architecture:
```
Backbone: CNN
Neck: BiLSTM
Head: CTC
```
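A CTC head emits one class index per time step, including a special blank class, which is how the model handles variable-length text. A common way to turn those per-frame predictions into a string is greedy decoding: collapse consecutive repeats, then drop blanks. The sketch below assumes the per-frame argmax has already been taken; the function name, the blank index of 0, and the lowercase alphabet are illustrative assumptions, not MyOCR's actual decoder or character set.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], charset: str, blank: int = 0) -> str:
    """Collapse consecutive repeated indices, then remove blanks."""
    chars: List[str] = []
    prev: Optional[int] = None
    for idx in frame_ids:
        # Emit a character only when the index changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(charset[idx - 1])  # index 0 is reserved for blank
        prev = idx
    return "".join(chars)


charset = "abcdefghijklmnopqrstuvwxyz"
# Frames for "hello": the blank between the two l's keeps them from merging.
frames = [0, 8, 8, 0, 5, 12, 12, 0, 12, 15, 15]
print(ctc_greedy_decode(frames, charset))  # hello
```

The blank class is what allows doubled letters to survive decoding: without the blank frame between the two runs of `12`, they would collapse into a single "l".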
