Models
This section provides details about the deep learning models used within the MyOCR project for tasks like text detection, recognition, and direction classification.
Model Loading and Management
MyOCR utilizes a flexible model loading system defined in myocr/modeling/model.py. It supports loading models in different formats:
- ONNX (OrtModel): Loads and runs optimized models using ONNX Runtime (onnxruntime). This is often preferred for inference due to performance benefits.
- PyTorch (PyTorchModel): Loads standard PyTorch models, potentially leveraging pre-defined architectures from libraries like torchvision.
- Custom PyTorch (CustomModel): Loads custom PyTorch models defined via YAML configuration files. These configurations specify the model's architecture, including backbones, necks, and heads, using components defined within myocr/modeling/.
A ModelLoader class acts as a factory to instantiate the correct model type based on the specified format (onnx, pt, custom).
# Example (Conceptual)
from myocr.modeling.model import ModelLoader, Device
# Load an ONNX model for CPU inference
loader = ModelLoader()
onnx_model = loader.load(model_format='onnx', model_name_path='path/to/your/model.onnx', device=Device('cpu'))
# Load a custom model defined by YAML for GPU inference
custom_model = loader.load(model_format='custom', model_name_path='path/to/your/config.yaml', device=Device('cuda:0'))
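The factory behaviour described above can be pictured as a small dispatch table. The sketch below is not MyOCR's implementation: SketchLoader and the three stub classes are hypothetical stand-ins that only mirror the format strings (onnx, pt, custom) and the class names mentioned in this section, and the device is passed as a plain string rather than a Device object.

```python
from dataclasses import dataclass


@dataclass
class StubModel:
    """Hypothetical stand-in for a loaded model; it only records its inputs."""
    path: str
    device: str


class OrtModelStub(StubModel):
    pass


class PyTorchModelStub(StubModel):
    pass


class CustomModelStub(StubModel):
    pass


class SketchLoader:
    """Hypothetical factory: maps a format string to a model class."""

    _registry = {
        "onnx": OrtModelStub,
        "pt": PyTorchModelStub,
        "custom": CustomModelStub,
    }

    def load(self, model_format: str, model_name_path: str, device: str = "cpu") -> StubModel:
        try:
            model_cls = self._registry[model_format]
        except KeyError:
            supported = ", ".join(sorted(self._registry))
            raise ValueError(
                f"Unsupported model format {model_format!r}; expected one of: {supported}"
            ) from None
        return model_cls(path=model_name_path, device=device)


loader = SketchLoader()
model = loader.load("onnx", "path/to/your/model.onnx", device="cpu")
print(type(model).__name__)
```

Keeping the mapping in one dictionary means an unknown format fails early with a clear message, and supporting a new format only requires one more entry.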
Model Architectures
The myocr/modeling/ directory houses the building blocks for custom models:
- architectures/: Defines the overall structure connecting backbones, necks, and heads (e.g., DBNet, CRNN).
- backbones/: Contains feature extraction networks (e.g., ResNet, MobileNetV3).
- necks/: Includes feature fusion modules (e.g., FPN, the Feature Pyramid Network).
- heads/: Defines the final layers responsible for specific tasks (e.g., detection probability maps, sequence decoding).
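To illustrate how a configuration can select and chain these building blocks, here is a minimal sketch that runs without PyTorch. Everything in it is an assumption made for illustration: the registries, the toy stage functions, and the config keys are hypothetical and do not reflect MyOCR's actual YAML schema or classes.

```python
from typing import Callable, Dict, List

# Each stage is modeled as a function over a list of numbers so the sketch
# runs without PyTorch; real stages would be neural network modules.
Stage = Callable[[List[float]], List[float]]

BACKBONES: Dict[str, Stage] = {
    # Pretend "feature extraction": double every value.
    "toy_backbone": lambda x: [2.0 * v for v in x],
}
NECKS: Dict[str, Stage] = {
    # Pretend "feature fusion": add the mean to every value.
    "toy_neck": lambda x: [v + sum(x) / len(x) for v in x],
}
HEADS: Dict[str, Stage] = {
    # Pretend "task head": collapse the features to a single score.
    "toy_head": lambda x: [max(x)],
}


def build_model(config: Dict[str, str]) -> Stage:
    """Look up each named component and chain backbone -> neck -> head."""
    backbone = BACKBONES[config["backbone"]]
    neck = NECKS[config["neck"]]
    head = HEADS[config["head"]]

    def forward(x: List[float]) -> List[float]:
        return head(neck(backbone(x)))

    return forward


# Hypothetical config; the keys are illustrative, not MyOCR's YAML schema.
config = {"backbone": "toy_backbone", "neck": "toy_neck", "head": "toy_head"}
model = build_model(config)
result = model([1.0, 2.0, 3.0])
print(result)
```

Because each stage is looked up by name, swapping one backbone for another is a configuration change rather than a code change.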
Available Models
Text Detection (DBNet++)
- DBNet++: A text detection model based on the DBNet architecture
  - Input: RGB image
  - Output: Text region polygons
  - Features:
    - High accuracy for arbitrary-shaped text
    - Fast inference speed
    - Robust to various text orientations
  - Architecture:
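DBNet takes its name from differentiable binarization: a steep sigmoid applied to the difference between a predicted probability map and a predicted threshold map, with a steepness factor k (50 in the published formulation). The following is a minimal sketch of that formula on plain Python lists. The function names are hypothetical, and whether MyOCR applies this step inside the model or in post-processing is not stated in this section.

```python
import math


def approx_binarize(prob: float, thresh: float, k: float = 50.0) -> float:
    """Soft step function: 1 / (1 + exp(-k * (prob - thresh)))."""
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def binarize_map(prob_map, thresh_map, k: float = 50.0):
    """Apply the soft binarization element-wise to two same-shaped 2D lists."""
    return [
        [approx_binarize(p, t, k) for p, t in zip(prob_row, thresh_row)]
        for prob_row, thresh_row in zip(prob_map, thresh_map)
    ]


# Toy 2x2 maps; real maps have one value per image pixel.
prob_map = [[0.9, 0.2], [0.5, 0.7]]
thresh_map = [[0.3, 0.3], [0.5, 0.3]]
binary = binarize_map(prob_map, thresh_map)

# Pixels whose soft value exceeds 0.5 are treated as text.
mask = [[value > 0.5 for value in row] for row in binary]
print(mask)
```

Text region polygons would then be obtained by tracing the connected regions of such a mask, a step not shown here.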
 
Text Recognition (CRNN)
- CRNN: A hybrid CNN-RNN model for text recognition
  - Input: Cropped text region
  - Output: Recognized text
  - Features:
    - Supports Chinese and English characters
    - Handles variable-length text
    - Robust to different fonts and styles
  - Architecture:
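CRNN models are commonly trained with CTC, in which case variable-length text is recovered by taking the best class at each timestep, merging consecutive repeats, and dropping the blank symbol. The sketch below shows that greedy decoding under stated assumptions: the blank index of 0, the charset layout, and the function name are all hypothetical, and this section does not confirm which decoder MyOCR uses.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(scores: Sequence[Sequence[float]], charset: str) -> str:
    """Pick the best class per timestep, merge repeats, then drop blanks."""
    best_path: List[int] = [
        max(range(len(step)), key=step.__getitem__) for step in scores
    ]
    chars: List[str] = []
    previous = BLANK
    for index in best_path:
        if index != BLANK and index != previous:
            chars.append(charset[index - 1])  # index 0 is reserved for blank
        previous = index
    return "".join(chars)


# Six timesteps over three classes: blank, 'a', 'b'.
scores = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged)
    [0.90, 0.05, 0.05],  # blank (separates the two a's)
    [0.10, 0.60, 0.30],  # a
    [0.10, 0.20, 0.70],  # b
    [0.20, 0.10, 0.70],  # b (repeat, merged)
]
decoded = ctc_greedy_decode(scores, charset="ab")
print(decoded)
```

The blank between the first two runs of 'a' is what allows a doubled letter to survive the merge step.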
 
Text Classification Models
- Text Direction Classifier: Determines text orientation
  - Input: Text region
  - Output: Orientation angle
  - Features:
    - 0° and 180° classification
    - Helps improve recognition accuracy
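One way a 0°/180° prediction can help recognition is by flipping upside-down crops before they reach the recognizer. The sketch below illustrates that idea on a toy image stored as a list of rows. The function names are hypothetical, and this is not a description of how MyOCR wires the classifier into its pipeline.

```python
def rotate_180(image):
    """Rotate a 2D image (list of rows) by 180 degrees."""
    return [list(reversed(row)) for row in reversed(image)]


def correct_orientation(image, predicted_angle: int):
    """Return an upright copy of the crop given a 0 or 180 degree prediction."""
    if predicted_angle == 180:
        return rotate_180(image)
    if predicted_angle == 0:
        return [list(row) for row in image]
    raise ValueError(f"Unsupported angle: {predicted_angle}")


# Toy 2x3 "image"; a real crop would hold pixel values.
crop = [[1, 2, 3], [4, 5, 6]]
upright = correct_orientation(crop, 180)
print(upright)
```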
 
 
Model Performance
- Coming soon.