Inference via REST API

MyOCR provides a built-in RESTful API service, allowing you to perform OCR tasks via HTTP requests. This is useful for integrating MyOCR into web applications, microservices, or accessing it from different programming languages.

You can run this API service directly for development or deploy it using Docker for production.

Option 1: Running Directly (for Development)

This method runs the API service directly on your host machine, typically suitable for local development and testing.

1. Prerequisites:

  • Ensure you have completed the Installation steps, including installing dependencies and downloading models.
  • Make sure you are in the root directory of the myocr project.

2. Start the Server:

# Start the server using python (check main.py for the exact host/port)
# This might use a development server (e.g., Flask's default) and port (e.g., 5000).
python main.py 
  • The server uses the models and configurations defined within the project.
  • The port and host depend on how main.py is configured.
  • Note: For production deployments, using Docker with Gunicorn (Option 2) is recommended.

3. API Endpoints (using example port 5000 - adjust if needed):

  • GET /ping: Checks if the service is running.

    curl http://127.0.0.1:5000/ping
    

  • POST /ocr: Performs basic OCR on an uploaded image.

    • Request: Send a POST request whose JSON body contains the image as a base64-encoded string.

      curl -X POST \
          -H "Content-Type: application/json" \
          -d '{"image": "BASE64_IMAGE"}' \
          http://127.0.0.1:5000/ocr
      

    • Response: Returns a JSON object containing the recognized text and bounding box information (similar to the output of CommonOCRPipeline).

  • POST /ocr-json: Performs OCR and extracts structured information based on a schema.

    • Request: Send a POST request whose JSON body contains the image as a base64-encoded string.

      curl -X POST \
          -H "Content-Type: application/json" \
          -d '{"image": "BASE64_IMAGE"}' \
          http://127.0.0.1:5000/ocr-json
    
    • Response: Returns a JSON object matching the provided schema, populated with the extracted data.
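
The curl requests above can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the server listens on http://127.0.0.1:5000, as in the examples, and that both endpoints accept a JSON body with a single "image" field. The helper names build_ocr_request and run_ocr are hypothetical and not part of MyOCR. The reply is returned as plain decoded JSON because its exact fields are not specified here.

```python
import base64
import json
import urllib.request

# Assumed base URL, taken from the example port used above; adjust as needed.
BASE_URL = "http://127.0.0.1:5000"


def build_ocr_request(image_bytes, endpoint="/ocr", base_url=BASE_URL):
    """Build a POST request carrying the image as a base64 string (hypothetical helper)."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        base_url + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_ocr(image_path, endpoint="/ocr", base_url=BASE_URL, timeout=60):
    """Send the image at image_path to the service and return the decoded JSON reply."""
    with open(image_path, "rb") as f:
        request = build_ocr_request(f.read(), endpoint, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, run_ocr("your_image.jpg") targets /ocr, while run_ocr("your_image.jpg", endpoint="/ocr-json") targets the structured endpoint.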

4. Optional UI:

A separate Next.js-based UI is available for interacting with these endpoints: doc-insight-ui.

Option 2: Deploying with Docker (for Production)

Docker provides a containerized environment for running the API service, which keeps deployments consistent and serves the application with Gunicorn for better performance.

1. Prerequisites:

  • Docker installed.
  • For GPU support: NVIDIA Container Toolkit installed.
  • Ensure models are downloaded to the default location (~/.MyOCR/models/) on the host machine before building the image, as the Docker build process might copy them.

2. Build the Docker Image using the Helper Script:

The recommended way to build the image is using the provided script. It handles tagging with the correct version.

# Ensure the script is executable
chmod +x scripts/build_docker_image.sh

# Determine the application version (it appears in the image tag, e.g., myocr:cpu-$VERSION)
VERSION=$(python -c 'import myocr.version; print(myocr.version.VERSION)')

# Build the desired image (replace [cpu|gpu] with 'cpu' or 'gpu')
bash scripts/build_docker_image.sh [cpu|gpu]

# Example: Build the CPU image for the current version
# bash scripts/build_docker_image.sh cpu 

# The script will output the final image tag (e.g., myocr:cpu-0.1.0)

3. Run the Docker Container:

Use the image tag generated by the build script (e.g., myocr:cpu-X.Y.Z or myocr:gpu-X.Y.Z). The service inside the container runs on port 8000.

Tip

Set these environment variables if necessary, for example by passing each one to docker run with -e NAME=value:

CHAT_BOT_MODEL=qwen2.5:14b

CHAT_BOT_BASEURL=http://127.0.0.1:11434/v1

CHAT_BOT_APIKEY=key

  • GPU Version (replace $IMAGE_TAG with the actual tag):
    # Example: docker run -d --gpus all -p 8000:8000 --name myocr-service myocr:gpu-0.1.0
    docker run -d --gpus all -p 8000:8000 --name myocr-service $IMAGE_TAG
    
  • CPU Version (replace $IMAGE_TAG with the actual tag):
    # Example: docker run -d -p 8000:8000 --name myocr-service myocr:cpu-0.1.0
    docker run -d -p 8000:8000 --name myocr-service $IMAGE_TAG
    
  • The -p 8000:8000 flag maps port 8000 on your host machine to port 8000 inside the container.
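
Because the container is started in detached mode (-d), the service may not accept requests immediately. The following is a minimal Python sketch that polls GET /ping until the service responds. It assumes the container is mapped to http://localhost:8000 as in the commands above and that /ping answers with HTTP 200 when healthy. The helper name wait_for_service is hypothetical and not part of MyOCR.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(ping_url="http://localhost:8000/ping", timeout=60.0, interval=1.0):
    """Poll ping_url until it answers with HTTP 200 or timeout seconds pass (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(ping_url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Service not reachable yet; retry until the deadline.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A script could call wait_for_service() right after docker run and send OCR requests only if it returns True.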

4. Accessing API Endpoints (Docker):

Once the container is running, access the API endpoints using the host machine's IP/hostname (or localhost) and the mapped host port (8000 in the examples):

# Example Ping
curl http://localhost:8000/ping 

# Base64-encode the image
IMAGE_PATH="your_image.jpg"

BASE64_IMAGE=$(base64 -w 0 "$IMAGE_PATH")  # Linux
#BASE64_IMAGE=$(base64 -i "$IMAGE_PATH" | tr -d '\n') # macOS

# Example Basic OCR
curl -X POST \
  -H "Content-Type: application/json" \
  -d "{\"image\": \"${BASE64_IMAGE}\"}" \
  http://localhost:8000/ocr

# Example Structured OCR
curl -X POST \
  -H "Content-Type: application/json" \
  -d "{\"image\": \"${BASE64_IMAGE}\"}" \
  http://localhost:8000/ocr-json

Remember to replace your_image.jpg with the actual path to the image on the machine where you are running the curl command.
