A lightweight wrapper for YOLOv8 using ONNX Runtime

Figure: Understanding traffic using YOLO

Running YOLOv8 through ONNX Runtime instead of PyTorch significantly reduces the computational and memory footprint of inference. This matters most when deploying machine learning models in production environments such as Docker containers, where resources are often limited.

PyTorch is widely popular for its flexibility and ease of use, but it consumes substantial memory and storage space, which can become a major obstacle when deploying models in real-world applications. The Open Neural Network Exchange (ONNX) format addresses this issue: a model exported to ONNX can be executed by ONNX Runtime, a considerably lighter dependency, with significantly lower computational demands.

Core Components Explained

Letterbox Resizing

The letterbox function resizes images while preserving their original aspect ratios by adding padding. Maintaining the correct aspect ratio is crucial because YOLO models require specific input dimensions; distorted images can result in inaccurate predictions.

python
import cv2
import numpy as np


def letterbox(im: np.ndarray, new_shape: int | tuple[int, int] = 640):
    """Resize while preserving aspect ratio using letterbox padding.

    Parameters
    ----------
    im : np.ndarray
        BGR image with shape ``HxWx3``.
    new_shape : int | tuple[int, int], default 640
        Desired output size. An *int* produces a square canvas
        ``new_shape x new_shape``; a tuple gives ``(height, width)``.

    Returns
    -------
    tuple[np.ndarray, tuple[float, float], tuple[int, int]]
        *(img, (rh, rw), (dw, dh))* where:
        * *img*  - padded / resized image,
        * *(rh, rw)* - resize ratio applied to height and width,
        * *(dw, dh)* - half of the padding added on width and height.
    """
    # Convert scalar → tuple (h, w)
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    h0, w0 = im.shape[:2]
    r = min(new_shape[0] / h0, new_shape[1] / w0)  # scale without distortion
    new_unpad = (int(round(w0 * r)), int(round(h0 * r)))

    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]
    dw, dh = dw / 2, dh / 2  # distribute padding equally on both sides
    top, bottom = int(np.floor(dh)), int(np.ceil(dh))
    left, right = int(np.floor(dw)), int(np.ceil(dw))

    if (w0, h0) != new_unpad:
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)

    # Add border (color 114 - same used in YOLOv8 training)
    im = cv2.copyMakeBorder(
        im,
        top, bottom, left, right,
        cv2.BORDER_CONSTANT, value=(114, 114, 114)
    )
    return im, (r, r), (left, top)
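
To make the arithmetic concrete, the sketch below reproduces only the scale and padding computation for a 1280x720 frame, without touching any pixels, and then maps a point from letterbox space back to the original image. The helper names `letterbox_geometry` and `to_original` are illustrative assumptions, not part of the wrapper, and the sketch assumes a square target canvas.

```python
import math


def letterbox_geometry(h0: int, w0: int, new_shape: int = 640):
    """Return (scale, (new_w, new_h), (top, bottom, left, right)).

    Hypothetical helper that mirrors the arithmetic in letterbox()
    for a square canvas; it does not resize or pad any image.
    """
    r = min(new_shape / h0, new_shape / w0)           # uniform scale factor
    new_w, new_h = int(round(w0 * r)), int(round(h0 * r))
    dw, dh = (new_shape - new_w) / 2, (new_shape - new_h) / 2
    top, bottom = math.floor(dh), math.ceil(dh)       # odd padding: extra row at the bottom
    left, right = math.floor(dw), math.ceil(dw)       # odd padding: extra column on the right
    return r, (new_w, new_h), (top, bottom, left, right)


def to_original(x: float, y: float, r: float, left: int, top: int):
    """Map a letterbox-space point back to original-image pixels."""
    return (x - left) / r, (y - top) / r


r, size, pads = letterbox_geometry(720, 1280)
print(r, size, pads)                                # 0.5 (640, 360) (140, 140, 0, 0)
print(to_original(320, 320, r, pads[2], pads[0]))   # (640.0, 360.0)
```

The centre of the 640x640 canvas maps back to the centre of the 1280x720 frame, which is the same "subtract the padding, then divide by the ratio" step that predict applies to the box coordinates.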

Non-Maximum Suppression (NMS)

The nms function removes redundant bounding boxes. It repeatedly keeps the highest-scoring remaining box and discards every other box whose Intersection over Union (IoU) with it exceeds the threshold, so each object is reported once, by its most confident detection.

python
def nms(xyxy: np.ndarray, scores: np.ndarray, thr: float = 0.45) -> np.ndarray:
    """Greedy Non-Maximum Suppression in pure NumPy.

    Removes overlapping boxes whose Intersection over Union (IoU) exceeds
    *thr*.
    """
    keep: list[int] = []
    idxs = scores.argsort()[::-1]  # highest score first
    x1, y1, x2, y2 = xyxy.T
    areas = (x2 - x1) * (y2 - y1)

    while idxs.size:
        i, idxs = idxs[0], idxs[1:]
        keep.append(i)

        # IoU between the kept box and the rest
        xx1 = np.maximum(x1[i], x1[idxs])
        yy1 = np.maximum(y1[i], y1[idxs])
        xx2 = np.minimum(x2[i], x2[idxs])
        yy2 = np.minimum(y2[i], y2[idxs])

        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[idxs] - inter + 1e-6)

        # Discard boxes with IoU above the threshold
        idxs = idxs[iou <= thr]

    return np.asarray(keep, np.int32)
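
As a worked example of the same greedy procedure, here is a minimal pure-Python sketch on three hand-picked boxes. The names `iou` and `greedy_nms` are illustrative, and unlike the NumPy version the sketch omits the small epsilon in the denominator, so it assumes boxes with non-zero area.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, thr=0.45):
    """Keep the best box, drop everything overlapping it above thr, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i, order = order[0], order[1:]
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thr]
    return keep


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (5, 0, 15, 10)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of 90 / 110 ≈ 0.82 and is suppressed, while box 2 overlaps it with 50 / 150 ≈ 0.33, which is below the 0.45 threshold, so it survives.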

Segmentation Mask Reconstruction

The build_masks function reconstructs segmentation masks by combining each detection's mask coefficients with the shared mask prototypes. The linear combination is passed through a sigmoid to produce per-pixel probabilities, and each mask is then resized to the requested output shape and thresholded at 0.5.

python
def build_masks(coeffs: np.ndarray, proto: np.ndarray, shape: tuple[int, int]) -> np.ndarray:
    """Reconstruct and resize segmentation masks.

    The YOLO-Seg head outputs *coeffs* (mask coefficients) and *proto*
    (mask prototypes). A linear combination followed by a sigmoid produces
    per-pixel probabilities, which are then resized to the original image
    and thresholded at 0.5.
    """
    # Ensure channel-first order
    if proto.shape[0] != coeffs.shape[1]:
        proto = proto.transpose(2, 0, 1)  # HWC → CHW

    C, H, W = proto.shape
    masks = sigmoid(coeffs @ proto.reshape(C, -1)).reshape(-1, H, W)

    return np.stack([
        cv2.resize(m, (shape[1], shape[0]),
                   interpolation=cv2.INTER_LINEAR) > 0.5
        for m in masks
    ])
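
The linear combination at the heart of this step is easier to see on a toy case. The sketch below uses two hand-made 2x2 prototypes and a single detection's coefficients in plain Python; the values are invented for illustration, and a real model typically has far more prototypes at a much higher resolution.

```python
import math

# Two 2x2 prototypes (C=2, H=2, W=2); values are made up for illustration.
protos = [
    [[1.0, 0.0], [0.0, 0.0]],   # prototype 0 is active at the top-left pixel
    [[0.0, 0.0], [0.0, 1.0]],   # prototype 1 is active at the bottom-right pixel
]
coeffs = [3.0, -3.0]            # one detection: favour prototype 0, reject prototype 1


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


H, W = 2, 2
# Per-pixel weighted sum of prototypes: the scalar form of coeffs @ proto.reshape(C, -1).
logits = [
    [sum(c * p[y][x] for c, p in zip(coeffs, protos)) for x in range(W)]
    for y in range(H)
]
mask = [[sigmoid(v) > 0.5 for v in row] for row in logits]

print(logits)  # [[3.0, 0.0], [0.0, -3.0]]
print(mask)    # [[True, False], [False, False]]
```

Only the pixel where the positively weighted prototype is active ends up in the mask; a logit of exactly 0 gives a probability of 0.5, which does not pass the strict `> 0.5` threshold.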

The YOLO_ONNX Wrapper Class

The YOLO_ONNX class orchestrates the entire process: initializing the ONNX Runtime session, preprocessing images, executing inference, and post-processing results. The pipeline is structured to produce outputs closely aligned with the original YOLO results.

python
import onnxruntime as ort


class YOLO_ONNX:
    """Thin wrapper around YOLOv8-Seg models exported to ONNX."""

    def __init__(
        self,
        model_path: str | bytes,
        names: list[str],
        imgsz: int = 640,
        conf_thres: float = 0.25,
        iou_thres: float = 0.45,
        providers: list[str] | None = None,
    ) -> None:
        self.names = names
        self.nc = len(names)
        self.imgsz = imgsz
        self.conf_thres = conf_thres
        self.iou_thres = iou_thres

        # Initialise onnxruntime session
        self.session = ort.InferenceSession(
            model_path,
            providers=providers or ["CPUExecutionProvider"],
        )
        self.input = self.session.get_inputs()[0].name
        self.out = [o.name for o in self.session.get_outputs()]

python
# Method of YOLO_ONNX, shown separately: indent it inside the class body.
def predict(self, bgr: np.ndarray) -> list[Result]:
    """Run forward pass and post-processing on a BGR image."""

    im0 = bgr.copy()  # Preserve original for the output

    # 1) Pre-processing ---------------------------------------------------
    img, ratio, (dw, dh) = letterbox(im0, self.imgsz)
    # BGR→RGB, HWC→CHW, [0,1]
    inp = img[:, :, ::-1].transpose(2, 0, 1).astype(np.float32) / 255.0

    # 2) ONNX inference ---------------------------------------------------
    pred, proto = self.session.run(self.out, {self.input: inp[None]})
    pred, proto = pred[0].T, proto[0]  # Remove batch dimension

    # 3) Parse outputs ----------------------------------------------------
    boxes_xywh = pred[:, :4]                     # cx, cy, w, h
    # Class scores; the Ultralytics export already applies the sigmoid,
    # so these are probabilities rather than raw logits.
    cls_scores = pred[:, 4: 4 + self.nc]
    mask_coeffs_all = pred[:, 4 + self.nc:]     # mask coefficients

    # highest confidence per row
    conf = cls_scores.max(1)
    # corresponding class indices
    cls_ids = cls_scores.argmax(1)

    keep = conf > self.conf_thres                # confidence threshold
    boxes_xywh, conf, cls_ids, mask_coeffs = (
        boxes_xywh[keep],
        conf[keep],
        cls_ids[keep],
        mask_coeffs_all[keep],
    )

    # Convert cx,cy,w,h → x1,y1,x2,y2 in letterbox space
    xyxy = np.empty_like(boxes_xywh)
    xyxy[:, 0] = boxes_xywh[:, 0] - boxes_xywh[:, 2] / 2  # x1
    xyxy[:, 1] = boxes_xywh[:, 1] - boxes_xywh[:, 3] / 2  # y1
    xyxy[:, 2] = boxes_xywh[:, 0] + boxes_xywh[:, 2] / 2  # x2
    xyxy[:, 3] = boxes_xywh[:, 1] + boxes_xywh[:, 3] / 2  # y2

    # Undo padding and scaling
    xyxy[:, [0, 2]] -= dw
    xyxy[:, [1, 3]] -= dh
    xyxy /= ratio[0]

    # Clip to image bounds
    xyxy[:, [0, 2]] = xyxy[:, [0, 2]].clip(0, im0.shape[1])
    xyxy[:, [1, 3]] = xyxy[:, [1, 3]].clip(0, im0.shape[0])

    # 4) Class-wise NMS ---------------------------------------------------
    keep_idx: list[int] = []
    for c in np.unique(cls_ids):
        idx = np.where(cls_ids == c)[0]
        keep_idx += idx[nms(xyxy[idx], conf[idx], self.iou_thres)].tolist()

    xyxy, cls_ids, conf, mask_coeffs = (
        xyxy[keep_idx],
        cls_ids[keep_idx],
        conf[keep_idx],
        mask_coeffs[keep_idx],
    )

    if mask_coeffs.shape[0] == 0:
        return [
            Result(
                im0,
                Boxes(np.zeros((0, 4)), np.array([]), np.array([])),
                Masks(
                    np.zeros((0, *im0.shape[:2]), dtype=bool), im0.shape[:2]),
                self.names
            )
        ]

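    # 5) Mask reconstruction ---------------------------------------------
    # The prototypes live in letterbox space, so build the masks at the
    # padded input size, crop away the padding, then resize to the original
    # image. Resizing straight to im0 would misalign non-square images.
    lb_masks = build_masks(mask_coeffs, proto, img.shape[:2])
    new_h = int(round(im0.shape[0] * ratio[0]))
    new_w = int(round(im0.shape[1] * ratio[1]))
    lb_masks = lb_masks[:, dh:dh + new_h, dw:dw + new_w]
    masks = Masks(
        np.stack([
            cv2.resize(m.astype(np.uint8), (im0.shape[1], im0.shape[0]),
                       interpolation=cv2.INTER_NEAREST).astype(bool)
            for m in lb_masks
        ]),
        im0.shape[:2],
    )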

    # Return Ultralytics-style list (one Result per input image)
    return [Result(im0, Boxes(xyxy, cls_ids, conf), masks, self.names)]
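
The Result, Boxes and Masks containers used by predict are not shown in this article. A minimal sketch, assuming they are plain dataclasses whose positional fields follow the order of the constructor calls above, could look like this. The attribute names (`xyxy`, `cls`, `conf`, `data`, `orig_shape`, `orig_img`) are assumptions chosen to resemble the Ultralytics result objects, not the author's actual definitions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Boxes:
    xyxy: Any   # (N, 4) array: x1, y1, x2, y2 in original-image pixels
    cls: Any    # (N,) class indices
    conf: Any   # (N,) confidence scores


@dataclass
class Masks:
    data: Any                     # (N, H, W) boolean masks
    orig_shape: tuple[int, int]   # (height, width) of the original image


@dataclass
class Result:
    orig_img: Any      # the untouched input image
    boxes: Boxes
    masks: Masks
    names: list[str]   # class index -> label

    def __len__(self) -> int:
        return len(self.boxes.cls)


# Stand-in values instead of real arrays, just to show how fields are accessed.
res = Result("image", Boxes([[0, 0, 10, 10]], [2], [0.9]),
             Masks([], (720, 1280)), ["car", "bus", "truck"])
print(res.names[res.boxes.cls[0]], len(res))  # truck 1
```

Keeping the field order identical to the calls in predict means the method works unchanged with these definitions.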

Helper Functions

Several essential utility functions are included, such as:

sigmoid: Converts raw model outputs into probability values, crucial for interpreting segmentation results.

python
def sigmoid(x: np.ndarray | float) -> np.ndarray | float:
    """Element-wise logistic sigmoid σ(x) = 1 / (1 + e^{-x}).

    The function is used when reconstructing segmentation masks because the
    network outputs logits that must be converted to probabilities.
    """
    return 1 / (1 + np.exp(-x))
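
One caveat worth knowing: for large negative logits, `np.exp(-x)` can overflow, in which case NumPy emits a RuntimeWarning even though the result still rounds to 0. If that is a concern, a numerically stable variant branches on the sign of the input. The scalar sketch below illustrates the idea in pure Python; `stable_sigmoid` is a hypothetical name and is not part of the wrapper.

```python
import math


def stable_sigmoid(x: float) -> float:
    """Logistic sigmoid that never exponentiates a large positive number."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))   # exp(-x) <= 1, cannot overflow
    z = math.exp(x)                         # x < 0, so z < 1 and cannot overflow
    return z / (1.0 + z)


print(stable_sigmoid(0.0))      # 0.5
print(stable_sigmoid(-1000.0))  # 0.0
print(stable_sigmoid(1000.0))   # 1.0
```

Both branches compute the same function; they only differ in which exponential is evaluated.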

Significance for Practical Applications

This ONNX-based wrapper was created because few existing resources and tools are tailored to running YOLO models with ONNX Runtime. Computational efficiency and resource management are frequently overlooked, yet they become essential in deployments with strict resource constraints. By removing the PyTorch dependency at inference time, the wrapper reduces resource consumption and simplifies deployment, making YOLO-based segmentation models more practical to use and scale in diverse operational settings.