Gustavo Zeloni

A lightweight wrapper for YOLOv8 using ONNX Runtime

This article demonstrates how to build a small, efficient wrapper for YOLOv8-Seg on top of ONNX Runtime.

Figure: Understanding traffic using YOLO

This approach avoids the heavy compute and memory footprint of a full PyTorch installation, which matters most when deploying machine learning models in production environments such as Docker containers, where resources are often limited.

PyTorch is popular for its flexibility and ease of use, but it consumes substantial memory and storage space. Those requirements can become major obstacles when deploying models in real-world applications. The Open Neural Network Exchange (ONNX) format addresses this issue: a model is exported once to ONNX and then executed with ONNX Runtime, so PyTorch is no longer needed at inference time.

Core Components Explained

Letterbox Resizing

The letterbox function resizes images while preserving their original aspect ratios by adding padding. Maintaining the correct aspect ratio is crucial because YOLO models require specific input dimensions; distorted images can result in inaccurate predictions.

```python
import cv2
import numpy as np


def letterbox(
    im: np.ndarray, new_shape: int | tuple[int, int] = 640
) -> tuple[np.ndarray, tuple[float, float], tuple[int, int]]:
    """Resize while preserving aspect ratio using letterbox padding.

    Parameters
    ----------
    im : np.ndarray
        BGR image with shape ``HxWx3``.
    new_shape : int | tuple[int, int], default 640
        Desired output size. An *int* produces a square canvas
        ``new_shape x new_shape``; a tuple gives ``(height, width)``.

    Returns
    -------
    tuple[np.ndarray, tuple[float, float], tuple[int, int]]
        *(img, (rh, rw), (dw, dh))* where:

        * *img* - padded / resized image,
        * *(rh, rw)* - resize ratio applied to height and width,
        * *(dw, dh)* - padding added on the left and top, in pixels
          (half of the total padding, rounded down).
    """
    # Convert scalar → tuple (h, w)
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    h0, w0 = im.shape[:2]
    r = min(new_shape[0] / h0, new_shape[1] / w0)  # scale without distortion
    new_unpad = (int(round(w0 * r)), int(round(h0 * r)))  # (width, height)

    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]
    dw, dh = dw / 2, dh / 2  # distribute padding equally on both sides
    top, bottom = int(np.floor(dh)), int(np.ceil(dh))
    left, right = int(np.floor(dw)), int(np.ceil(dw))

    if (w0, h0) != new_unpad:
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)

    # Add border (color 114 - same used in YOLOv8 training)
    im = cv2.copyMakeBorder(
        im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114)
    )
    return im, (r, r), (left, top)
```
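To make the geometry concrete, the sketch below repeats only the arithmetic of letterbox for a 1280x720 frame, without touching any pixels. The names `letterbox_geometry` and `to_original` are hypothetical helpers introduced for illustration; they are not part of the wrapper.

```python
def letterbox_geometry(h0: int, w0: int, new_shape: int = 640):
    """Return (ratio, (new_w, new_h), (left, top)) for a square canvas."""
    r = min(new_shape / h0, new_shape / w0)      # scale without distortion
    new_w, new_h = round(w0 * r), round(h0 * r)  # size before padding
    left = (new_shape - new_w) // 2              # floor of half the padding
    top = (new_shape - new_h) // 2
    return r, (new_w, new_h), (left, top)


def to_original(x: float, y: float, r: float, left: int, top: int):
    """Map a point from letterbox space back to original-image pixels."""
    return (x - left) / r, (y - top) / r


r, size, (left, top) = letterbox_geometry(720, 1280)
print(r, size, (left, top))                 # 0.5 (640, 360) (0, 140)
print(to_original(100, 200, r, left, top))  # (200.0, 120.0)
```

The second call shows the inverse mapping that predictions need: subtract the padding first, then divide by the resize ratio.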

Non-Maximum Suppression (NMS)

The nms function removes redundant bounding boxes. It repeatedly keeps the highest-scoring box and discards every remaining box whose Intersection over Union (IoU) with it exceeds a threshold (0.45 by default), so that each object ends up with a single detection.

```python
import numpy as np


def nms(xyxy: np.ndarray, scores: np.ndarray, thr: float = 0.45) -> np.ndarray:
    """Greedy Non-Maximum Suppression in pure NumPy.

    Removes overlapping boxes whose Intersection over Union (IoU)
    exceeds *thr*.
    """
    keep: list[int] = []
    idxs = scores.argsort()[::-1]  # highest score first
    x1, y1, x2, y2 = xyxy.T
    areas = (x2 - x1) * (y2 - y1)

    while idxs.size:
        i, idxs = idxs[0], idxs[1:]
        keep.append(i)

        # IoU between the kept box and the rest
        xx1 = np.maximum(x1[i], x1[idxs])
        yy1 = np.maximum(y1[i], y1[idxs])
        xx2 = np.minimum(x2[i], x2[idxs])
        yy2 = np.minimum(y2[i], y2[idxs])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[idxs] - inter + 1e-6)

        # Discard boxes with IoU above the threshold
        idxs = idxs[iou <= thr]

    return np.asarray(keep, np.int32)
```
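As a worked example of the suppression rule, here is a dependency-free sketch of the same greedy procedure on three boxes. The helper names `iou` and `greedy_nms` are illustrative only, and the small epsilon used in the NumPy version is omitted for readability.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, thr=0.45):
    """Keep the best box, drop boxes overlapping it by more than thr, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best, order = order[0], order[1:]
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) <= thr]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(round(iou(boxes[0], boxes[1]), 3))  # 0.681
print(greedy_nms(boxes, scores))          # [0, 2]
```

The second box overlaps the first with an IoU of 81 / 119, which is above 0.45, so it is suppressed; the third box does not overlap at all and survives.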

Segmentation Mask Reconstruction

The build_masks function reconstructs segmentation masks by combining each detection's mask coefficients with the shared mask prototypes. A sigmoid turns the resulting linear combination into per-pixel probabilities, which are then resized to the dimensions of the original image and thresholded at 0.5 to produce binary masks.

```python
import cv2
import numpy as np


def build_masks(
    coeffs: np.ndarray, proto: np.ndarray, shape: tuple[int, int]
) -> np.ndarray:
    """Reconstruct and resize segmentation masks.

    The YOLO-Seg head outputs *coeffs* (mask coefficients) and *proto*
    (mask prototypes). A linear combination followed by a sigmoid produces
    per-pixel probabilities, which are then resized to the original image
    and thresholded at 0.5.
    """
    # Ensure channel-first order
    if proto.shape[0] != coeffs.shape[1]:
        proto = proto.transpose(2, 0, 1)  # HWC → CHW

    C, H, W = proto.shape
    # sigmoid() is the helper defined under "Helper Functions"
    masks = sigmoid(coeffs @ proto.reshape(C, -1)).reshape(-1, H, W)
    return np.stack([
        cv2.resize(m, (shape[1], shape[0]), interpolation=cv2.INTER_LINEAR) > 0.5
        for m in masks
    ])
```
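The linear combination is easier to follow on a toy case. The sketch below uses two made-up 2x2 prototypes and a single detection; real models use more and larger prototypes, and the resize step is left out here.

```python
import numpy as np

# Two toy prototypes of size 2x2 (shape: C=2, H=2, W=2).
proto = np.array([
    [[1.0, -1.0], [2.0, -2.0]],
    [[0.0, 0.0], [1.0, 1.0]],
])
# One detection: weight prototype 0 by +1 and prototype 1 by -1.
coeffs = np.array([[1.0, -1.0]])

C, H, W = proto.shape
logits = (coeffs @ proto.reshape(C, -1)).reshape(-1, H, W)  # shape (1, 2, 2)
probs = 1.0 / (1.0 + np.exp(-logits))                        # sigmoid
mask = probs > 0.5                                           # same as logits > 0

print(logits[0].tolist())  # [[1.0, -1.0], [1.0, -3.0]]
print(mask[0].tolist())    # [[True, False], [True, False]]
```

Because the sigmoid is monotonic, thresholding the probabilities at 0.5 selects exactly the pixels whose combined logit is positive.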

The YOLO_ONNX Wrapper Class

The YOLO_ONNX class orchestrates the entire process, from initializing the ONNX Runtime session to preprocessing images, executing inference, and post-processing results. The pipeline is designed to keep outputs closely aligned with the original YOLO results.

```python
from __future__ import annotations

import numpy as np
import onnxruntime as ort

# Result, Boxes and Masks are the wrapper's lightweight result containers;
# their definitions are not shown in this article.


class YOLO_ONNX:
    """Thin wrapper around YOLOv8-Seg models exported to ONNX."""

    def __init__(
        self,
        model_path: str | bytes,
        names: list[str],
        imgsz: int = 640,
        conf_thres: float = 0.25,
        iou_thres: float = 0.45,
        providers: list[str] | None = None,
    ) -> None:
        self.names = names
        self.nc = len(names)
        self.imgsz = imgsz
        self.conf_thres = conf_thres
        self.iou_thres = iou_thres

        # Initialise onnxruntime session
        self.session = ort.InferenceSession(
            model_path,
            providers=providers or ["CPUExecutionProvider"],
        )
        self.input = self.session.get_inputs()[0].name
        self.out = [o.name for o in self.session.get_outputs()]

    def predict(self, bgr: np.ndarray) -> list[Result]:
        """Run forward pass and post-processing on a BGR image."""
        im0 = bgr.copy()  # Preserve original for the output

        # 1) Pre-processing ---------------------------------------------------
        img, ratio, (dw, dh) = letterbox(im0, self.imgsz)
        # BGR→RGB, HWC→CHW, [0,1]
        inp = img[:, :, ::-1].transpose(2, 0, 1).astype(np.float32) / 255.0
        inp = np.ascontiguousarray(inp)

        # 2) ONNX inference ---------------------------------------------------
        pred, proto = self.session.run(self.out, {self.input: inp[None]})
        pred, proto = pred[0].T, proto[0]  # Remove batch dimension

        # 3) Parse outputs ----------------------------------------------------
        boxes_xywh = pred[:, :4]                  # cx, cy, w, h
        cls_scores = pred[:, 4: 4 + self.nc]      # per-class scores
        mask_coeffs_all = pred[:, 4 + self.nc:]   # mask coefficients

        conf = cls_scores.max(1)        # highest confidence per row
        cls_ids = cls_scores.argmax(1)  # corresponding class indices
        keep = conf > self.conf_thres   # confidence threshold
        boxes_xywh, conf, cls_ids, mask_coeffs = (
            boxes_xywh[keep], conf[keep], cls_ids[keep], mask_coeffs_all[keep],
        )

        # Convert cx,cy,w,h → x1,y1,x2,y2 in letterbox space
        xyxy = np.empty_like(boxes_xywh)
        xyxy[:, 0] = boxes_xywh[:, 0] - boxes_xywh[:, 2] / 2  # x1
        xyxy[:, 1] = boxes_xywh[:, 1] - boxes_xywh[:, 3] / 2  # y1
        xyxy[:, 2] = boxes_xywh[:, 0] + boxes_xywh[:, 2] / 2  # x2
        xyxy[:, 3] = boxes_xywh[:, 1] + boxes_xywh[:, 3] / 2  # y2

        # Undo padding and scaling
        xyxy[:, [0, 2]] -= dw
        xyxy[:, [1, 3]] -= dh
        xyxy /= ratio[0]

        # Clip to image bounds
        xyxy[:, [0, 2]] = xyxy[:, [0, 2]].clip(0, im0.shape[1])
        xyxy[:, [1, 3]] = xyxy[:, [1, 3]].clip(0, im0.shape[0])

        # 4) Class-wise NMS ---------------------------------------------------
        keep_idx: list[int] = []
        for c in np.unique(cls_ids):
            idx = np.where(cls_ids == c)[0]
            keep_idx += idx[nms(xyxy[idx], conf[idx], self.iou_thres)].tolist()

        xyxy, cls_ids, conf, mask_coeffs = (
            xyxy[keep_idx], cls_ids[keep_idx], conf[keep_idx], mask_coeffs[keep_idx],
        )

        if mask_coeffs.shape[0] == 0:
            # No detections: return an empty, well-formed Result
            return [
                Result(
                    im0,
                    Boxes(np.zeros((0, 4)), np.array([]), np.array([])),
                    Masks(np.zeros((0, *im0.shape[:2]), dtype=bool), im0.shape[:2]),
                    self.names,
                )
            ]

        # 5) Mask reconstruction ---------------------------------------------
        # The prototypes cover the whole letterboxed canvas (CHW layout
        # assumed), so crop away the padded border before resizing; otherwise
        # masks of non-square images would be misaligned.
        sy, sx = proto.shape[1] / img.shape[0], proto.shape[2] / img.shape[1]
        new_h = round(im0.shape[0] * ratio[0])
        new_w = round(im0.shape[1] * ratio[1])
        top, left = round(dh * sy), round(dw * sx)
        bottom, right = round((dh + new_h) * sy), round((dw + new_w) * sx)
        proto = proto[:, top:bottom, left:right]

        masks = Masks(
            build_masks(mask_coeffs, proto, im0.shape[:2]),
            im0.shape[:2],
        )

        # Return Ultralytics-style list (one Result per input image)
        return [Result(im0, Boxes(xyxy, cls_ids, conf), masks, self.names)]
```
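The class relies on three containers, `Result`, `Boxes` and `Masks`, whose definitions are not shown in this article. One plausible minimal shape for them, assuming plain dataclasses, is sketched below. The field names are hypothetical (loosely modelled on Ultralytics naming); only the positional argument order is taken from the calls in `predict`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Boxes:
    xyxy: np.ndarray  # (N, 4) corners in original-image pixels
    cls: np.ndarray   # (N,) class indices
    conf: np.ndarray  # (N,) confidence scores


@dataclass
class Masks:
    data: np.ndarray             # (N, H, W) boolean masks
    orig_shape: tuple[int, int]  # (height, width) of the source image


@dataclass
class Result:
    orig_img: np.ndarray
    boxes: Boxes
    masks: Masks
    names: list[str]


# Build a Result by hand, in the same argument order predict() uses.
img = np.zeros((4, 6, 3), dtype=np.uint8)
res = Result(
    img,
    Boxes(np.array([[1.0, 1.0, 3.0, 2.0]]), np.array([0]), np.array([0.9])),
    Masks(np.zeros((1, 4, 6), dtype=bool), img.shape[:2]),
    ["car"],
)
print(res.names[int(res.boxes.cls[0])], res.masks.data.shape)  # car (1, 4, 6)
```

Keeping the containers this small means calling code written against Ultralytics-style results needs few changes, while the runtime dependency stays limited to NumPy.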

Helper Functions

Several essential utility functions are included, such as:

  • sigmoid: Converts raw model outputs into probability values, crucial for interpreting segmentation results.
```python
import numpy as np


def sigmoid(x: np.ndarray | float) -> np.ndarray | float:
    """Element-wise logistic sigmoid σ(x) = 1 / (1 + e^{-x}).

    The function is used when reconstructing segmentation masks because
    the network outputs logits that must be converted to probabilities.
    """
    return 1 / (1 + np.exp(-x))
```
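For strongly negative inputs, `np.exp(-x)` can overflow and emit a runtime warning even though the final value is still correct. If that matters in a given deployment, a numerically stable variant is one option. The sketch below is an optional alternative under that assumption, and `stable_sigmoid` is a hypothetical name, not part of the wrapper.

```python
import numpy as np


def stable_sigmoid(x):
    """Logistic sigmoid that only ever exponentiates non-positive values."""
    x = np.asarray(x, dtype=np.float64)
    e = np.exp(-np.abs(x))  # always in [0, 1], so it cannot overflow
    # x >= 0: 1 / (1 + e^-x);  x < 0: e^x / (1 + e^x)  (algebraically equal)
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))


print(stable_sigmoid([-1000.0, 0.0, 1000.0]).tolist())  # [0.0, 0.5, 1.0]
```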

Significance for Practical Applications

Creating this ONNX-based wrapper was driven by the lack of existing resources and tools specifically tailored for integrating YOLO models with ONNX runtimes. While computational efficiency and resource management are frequently overlooked, they become essential in deployments with strict resource constraints. This optimized wrapper enhances performance, reduces resource consumption, and simplifies deployment, significantly improving the practical usability and scalability of YOLO-based segmentation models in diverse operational settings.