A lightweight wrapper for YOLOv8 using ONNX Runtime
This article shows how to build a small, self-contained wrapper for YOLOv8-Seg on top of ONNX Runtime.
[Figure: Understanding traffic using YOLO]
Running the model through ONNX Runtime instead of PyTorch reduces the compute and memory footprint of a deployment. This matters most in production environments such as Docker containers, where resources are often limited.
PyTorch is popular for its flexibility and ease of use, but it is a heavy dependency: it consumes substantial memory and storage, and those requirements can become major obstacles in real-world deployments. The Open Neural Network Exchange (ONNX) format addresses this by letting a trained model be exported once and then executed by ONNX Runtime, a much smaller dependency, without installing PyTorch in the deployment image.
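For context, the export step can be sketched as follows. This is a minimal sketch, assuming the ultralytics package is installed in the training environment (not in the deployment image). The helper name export_to_onnx is hypothetical, and yolov8n-seg.pt is used only as an example checkpoint.

```python
def export_to_onnx(weights: str = "yolov8n-seg.pt", imgsz: int = 640) -> str:
    """Export a YOLOv8-Seg checkpoint to ONNX and return the exported file path.

    Assumption: this runs where `ultralytics` (and therefore PyTorch) is
    installed; the deployment image only needs the resulting .onnx file.
    """
    # Imported lazily so that defining this helper does not require PyTorch.
    from ultralytics import YOLO

    model = YOLO(weights)
    # imgsz should match the imgsz later passed to the ONNX wrapper.
    return model.export(format="onnx", imgsz=imgsz)
```

The exported file is the only artifact the rest of this article depends on.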
Core Components Explained
Letterbox Resizing
The letterbox function resizes images while preserving their original aspect ratios by adding padding. Maintaining the correct aspect ratio is crucial because YOLO models require specific input dimensions; distorted images can result in inaccurate predictions.
import cv2
import numpy as np


def letterbox(
    im: np.ndarray, new_shape: int | tuple[int, int] = 640
) -> tuple[np.ndarray, tuple[float, float], tuple[int, int]]:
    """Resize while preserving aspect ratio using letterbox padding.

    Parameters
    ----------
    im : np.ndarray
        BGR image with shape ``HxWx3``.
    new_shape : int | tuple[int, int], default 640
        Desired output size. An *int* produces a square canvas
        ``new_shape x new_shape``; a tuple gives ``(height, width)``.

    Returns
    -------
    tuple[np.ndarray, tuple[float, float], tuple[int, int]]
        *(img, (rh, rw), (dw, dh))* where:
        * *img* - padded / resized image,
        * *(rh, rw)* - resize ratio applied to height and width,
        * *(dw, dh)* - left and top padding in pixels (half of the total
          padding on each axis, rounded down).
    """
    # Convert scalar → tuple (h, w)
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    h0, w0 = im.shape[:2]
    r = min(new_shape[0] / h0, new_shape[1] / w0)  # scale without distortion
    new_unpad = (int(round(w0 * r)), int(round(h0 * r)))  # (width, height)

    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]
    dw, dh = dw / 2, dh / 2  # distribute padding equally on both sides
    top, bottom = int(np.floor(dh)), int(np.ceil(dh))
    left, right = int(np.floor(dw)), int(np.ceil(dw))

    if (w0, h0) != new_unpad:
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)

    # Add border (color 114 - same used in YOLOv8 training)
    im = cv2.copyMakeBorder(
        im,
        top, bottom, left, right,
        cv2.BORDER_CONSTANT, value=(114, 114, 114)
    )
    return im, (r, r), (left, top)
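To make the geometry concrete, the arithmetic inside letterbox can be traced for a 1280x720 frame on a 640x640 canvas. This is a small sketch in plain Python; the helper names letterbox_geometry and to_original are hypothetical and only mirror the scale and padding computations above, without touching any pixels.

```python
def letterbox_geometry(h0: int, w0: int, size: int = 640):
    """Return (scale, (new_w, new_h), (left, top)) for a square canvas."""
    r = min(size / h0, size / w0)              # same scale rule as letterbox
    new_w, new_h = int(round(w0 * r)), int(round(h0 * r))
    left = (size - new_w) // 2                 # floor of half the width padding
    top = (size - new_h) // 2                  # floor of half the height padding
    return r, (new_w, new_h), (left, top)


def to_original(x: float, y: float, r: float, left: int, top: int):
    """Map a point from letterbox space back to original-image pixels."""
    return (x - left) / r, (y - top) / r


r, new_unpad, (left, top) = letterbox_geometry(720, 1280)
# r = 0.5, new_unpad = (640, 360), left = 0, top = 140

centre = to_original(320, 320, r, left, top)
# The centre of the canvas maps to the centre of the frame: (640.0, 360.0)
```

The inverse mapping is the reason letterbox returns the ratio and the left/top offsets: every box predicted in letterbox space has to go through it before it can be drawn on the original image.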
Non-Maximum Suppression (NMS)
The nms function removes duplicate detections of the same object. It keeps the highest-scoring box, discards every remaining box whose Intersection over Union (IoU) with it exceeds a threshold, and then repeats the process with the next-highest-scoring survivor.
def nms(xyxy: np.ndarray, scores: np.ndarray, thr: float = 0.45) -> np.ndarray:
    """Greedy Non-Maximum Suppression in pure NumPy.

    Removes overlapping boxes whose Intersection over Union (IoU) exceeds
    *thr*.
    """
    keep: list[int] = []
    idxs = scores.argsort()[::-1]  # highest score first
    x1, y1, x2, y2 = xyxy.T
    areas = (x2 - x1) * (y2 - y1)

    while idxs.size:
        i, idxs = idxs[0], idxs[1:]
        keep.append(i)

        # IoU between the kept box and the rest
        xx1 = np.maximum(x1[i], x1[idxs])
        yy1 = np.maximum(y1[i], y1[idxs])
        xx2 = np.minimum(x2[i], x2[idxs])
        yy2 = np.minimum(y2[i], y2[idxs])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[idxs] - inter + 1e-6)

        # Discard boxes with IoU above the threshold
        idxs = idxs[iou <= thr]

    return np.asarray(keep, np.int32)
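The suppression test at the heart of nms can be checked by hand on three boxes. The sketch below is plain Python; box_iou is a hypothetical scalar helper that applies the same IoU formula, including the 1e-6 guard against division by zero, and the boxes and scores are made up for illustration.

```python
def box_iou(a, b) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-6)


best = (0, 0, 10, 10)        # highest score, kept first
near_dup = (1, 1, 11, 11)    # overlaps `best` heavily
other = (20, 20, 30, 30)     # a different object

iou_dup = box_iou(best, near_dup)    # 81 / 119, roughly 0.68
iou_other = box_iou(best, other)     # no overlap at all

# With the default threshold of 0.45 only `other` survives the first round.
survivors = [
    name
    for name, iou in (("near_dup", iou_dup), ("other", iou_other))
    if iou <= 0.45
]
```

Lowering the threshold suppresses more aggressively, which helps with duplicates but can merge distinct objects that sit close together, such as adjacent cars in dense traffic.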
Segmentation Mask Reconstruction
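The build_masks function reconstructs segmentation masks by combining each detection's mask coefficients with the shared mask prototypes. It applies a sigmoid to the linear combination to obtain per-pixel probabilities, resizes each mask to the dimensions of the original image, and thresholds it at 0.5. Note that the prototypes cover the letterboxed input, padding included, so the direct resize lines up exactly with the original image only when the image has the same aspect ratio as the model input.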
def build_masks(coeffs: np.ndarray, proto: np.ndarray, shape: tuple[int, int]) -> np.ndarray:
    """Reconstruct and resize segmentation masks.

    The YOLO-Seg head outputs *coeffs* (mask coefficients) and *proto*
    (mask prototypes). A linear combination followed by a sigmoid produces
    per-pixel probabilities, which are then resized to the original image
    and thresholded at 0.5.
    """
    # Ensure channel-first order
    if proto.shape[0] != coeffs.shape[1]:
        proto = proto.transpose(2, 0, 1)  # HWC → CHW

    C, H, W = proto.shape
    masks = sigmoid(coeffs @ proto.reshape(C, -1)).reshape(-1, H, W)

    return np.stack([
        cv2.resize(m, (shape[1], shape[0]),
                   interpolation=cv2.INTER_LINEAR) > 0.5
        for m in masks
    ])
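For images that are not square, the letterbox padding can be cropped out of each probability map before it is resized. The following is a minimal sketch, not part of the original wrapper: crop_letterbox is a hypothetical helper, and it assumes the prototype grid covers the whole letterboxed input with the padding split evenly on both sides, as letterbox does above. The rounding is approximate by up to one prototype pixel.

```python
import numpy as np


def crop_letterbox(mask: np.ndarray, orig_shape: tuple[int, int]) -> np.ndarray:
    """Cut the letterbox padding out of a prototype-resolution mask.

    mask       : (mh, mw) array covering the whole letterboxed input.
    orig_shape : (height, width) of the original image.
    """
    mh, mw = mask.shape[:2]
    h0, w0 = orig_shape
    r = min(mh / h0, mw / w0)                       # same scale rule as letterbox
    pad_w, pad_h = (mw - w0 * r) / 2, (mh - h0 * r) / 2
    # The small offsets keep exact halves from rounding inconsistently.
    top, left = int(round(pad_h - 0.1)), int(round(pad_w - 0.1))
    bottom, right = int(round(mh - pad_h + 0.1)), int(round(mw - pad_w + 0.1))
    return mask[top:bottom, left:right]


# A 160x160 prototype-sized mask for a 1280x720 frame: 35 rows of padding
# at the top and bottom are removed, leaving a 90x160 region.
cropped = crop_letterbox(np.zeros((160, 160), dtype=np.float32), (720, 1280))
```

Inside build_masks, each probability map m would go through crop_letterbox(m, shape) before the cv2.resize call, so that the resize stretches only the image region and not the gray borders.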
The YOLO_ONNX Wrapper Class
The YOLO_ONNX class orchestrates the entire process: initializing the ONNX Runtime session, preprocessing images, executing inference, and post-processing results. The pipeline is designed to produce outputs that closely match the original YOLO results.
import onnxruntime as ort


class YOLO_ONNX:
    """Thin wrapper around YOLOv8-Seg models exported to ONNX."""

    def __init__(
        self,
        model_path: str | bytes,
        names: list[str],
        imgsz: int = 640,
        conf_thres: float = 0.25,
        iou_thres: float = 0.45,
        providers: list[str] | None = None,
    ) -> None:
        self.names = names
        self.nc = len(names)
        self.imgsz = imgsz
        self.conf_thres = conf_thres
        self.iou_thres = iou_thres

        # Initialise onnxruntime session
        self.session = ort.InferenceSession(
            model_path,
            providers=providers or ["CPUExecutionProvider"],
        )
        self.input = self.session.get_inputs()[0].name
        self.out = [o.name for o in self.session.get_outputs()]

    def predict(self, bgr: np.ndarray) -> list[Result]:
        """Run forward pass and post-processing on a BGR image."""
        im0 = bgr.copy()  # Preserve original for the output

        # 1) Pre-processing ---------------------------------------------------
        img, ratio, (dw, dh) = letterbox(im0, self.imgsz)
        # BGR→RGB, HWC→CHW, [0,1], contiguous float32
        inp = img[:, :, ::-1].transpose(2, 0, 1).astype(np.float32) / 255.0
        inp = np.ascontiguousarray(inp)

        # 2) ONNX inference ---------------------------------------------------
        pred, proto = self.session.run(self.out, {self.input: inp[None]})
        pred, proto = pred[0].T, proto[0]  # Remove batch dimension

        # 3) Parse outputs ----------------------------------------------------
        boxes_xywh = pred[:, :4]                   # cx, cy, w, h
        cls_logits = pred[:, 4: 4 + self.nc]       # per-class scores
        mask_coeffs_all = pred[:, 4 + self.nc:]    # mask coefficients

        # highest confidence per row
        conf = cls_logits.max(1)
        # corresponding class indices
        cls_ids = cls_logits.argmax(1)

        keep = conf > self.conf_thres  # confidence threshold
        boxes_xywh, conf, cls_ids, mask_coeffs = (
            boxes_xywh[keep],
            conf[keep],
            cls_ids[keep],
            mask_coeffs_all[keep],
        )

        # Convert cx,cy,w,h → x1,y1,x2,y2 in letterbox space
        xyxy = np.empty_like(boxes_xywh)
        xyxy[:, 0] = boxes_xywh[:, 0] - boxes_xywh[:, 2] / 2  # x1
        xyxy[:, 1] = boxes_xywh[:, 1] - boxes_xywh[:, 3] / 2  # y1
        xyxy[:, 2] = boxes_xywh[:, 0] + boxes_xywh[:, 2] / 2  # x2
        xyxy[:, 3] = boxes_xywh[:, 1] + boxes_xywh[:, 3] / 2  # y2

        # Undo padding and scaling
        xyxy[:, [0, 2]] -= dw
        xyxy[:, [1, 3]] -= dh
        xyxy /= ratio[0]

        # Clip to image bounds
        xyxy[:, [0, 2]] = xyxy[:, [0, 2]].clip(0, im0.shape[1])
        xyxy[:, [1, 3]] = xyxy[:, [1, 3]].clip(0, im0.shape[0])

        # 4) Class-wise NMS ---------------------------------------------------
        keep_idx: list[int] = []
        for c in np.unique(cls_ids):
            idx = np.where(cls_ids == c)[0]
            keep_idx += idx[nms(xyxy[idx], conf[idx], self.iou_thres)].tolist()

        xyxy, cls_ids, conf, mask_coeffs = (
            xyxy[keep_idx],
            cls_ids[keep_idx],
            conf[keep_idx],
            mask_coeffs[keep_idx],
        )

        # Nothing detected: return an empty but well-formed Result
        if mask_coeffs.shape[0] == 0:
            return [
                Result(
                    im0,
                    Boxes(np.zeros((0, 4)), np.array([]), np.array([])),
                    Masks(
                        np.zeros((0, *im0.shape[:2]), dtype=bool), im0.shape[:2]),
                    self.names
                )
            ]

        # 5) Mask reconstruction ---------------------------------------------
        masks = Masks(
            build_masks(mask_coeffs, proto, im0.shape[:2]),
            im0.shape[:2],
        )

        # Return Ultralytics-style list (one Result per input image)
        return [Result(im0, Boxes(xyxy, cls_ids, conf), masks, self.names)]
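The listing above uses three containers, Result, Boxes and Masks, whose definitions are not shown. One plausible shape for them is sketched below with dataclasses. The field names are assumptions chosen to resemble Ultralytics attribute names; only the positional order of the arguments is taken from the calls in predict.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Boxes:
    xyxy: np.ndarray   # (N, 4) boxes in original-image pixels
    cls: np.ndarray    # (N,) class indices
    conf: np.ndarray   # (N,) confidence scores


@dataclass
class Masks:
    data: np.ndarray              # (N, H, W) boolean masks
    orig_shape: tuple[int, int]   # (height, width) of the source image


@dataclass
class Result:
    orig_img: np.ndarray   # original BGR frame
    boxes: Boxes
    masks: Masks
    names: list[str]       # class index -> label

    def __len__(self) -> int:
        """Number of detections in this result."""
        return len(self.boxes.conf)


# The empty result returned by predict when nothing passes the thresholds,
# built here for a tiny 4x6 frame.
empty = Result(
    np.zeros((4, 6, 3), dtype=np.uint8),
    Boxes(np.zeros((0, 4)), np.array([]), np.array([])),
    Masks(np.zeros((0, 4, 6), dtype=bool), (4, 6)),
    ["car"],
)
```

Because predict annotates its return type as list[Result], these classes would need to be defined before the YOLO_ONNX class in the same module.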
Helper Functions
The wrapper also relies on small utility functions, such as:
sigmoid: converts raw model outputs (logits) into probabilities, which build_masks needs before it can threshold the masks.
def sigmoid(x: np.ndarray | float) -> np.ndarray | float:
    """Element-wise logistic sigmoid σ(x) = 1 / (1 + e^{-x}).

    The function is used when reconstructing segmentation masks because the
    network outputs logits that must be converted to probabilities.
    """
    return 1 / (1 + np.exp(-x))
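For strongly negative logits, np.exp(-x) can overflow to infinity. The result still converges to the correct value of 0, but NumPy emits a RuntimeWarning. If that becomes noisy, a numerically stable variant is one option. The sketch below is an optional alternative, not part of the original wrapper, and the name stable_sigmoid is hypothetical.

```python
import numpy as np


def stable_sigmoid(x: np.ndarray) -> np.ndarray:
    """Logistic sigmoid that never exponentiates a positive number."""
    x = np.atleast_1d(np.asarray(x, dtype=np.float64))
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))   # exp argument is <= 0
    ex = np.exp(x[~pos])                        # exp argument is < 0
    out[~pos] = ex / (1.0 + ex)                 # algebraically the same function
    return out


probs = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
# Saturates cleanly at both ends: 0.0, 0.5, 1.0
```

Both branches compute the same function; splitting by sign only ensures that the exponential is always evaluated where it cannot overflow.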
Significance for Practical Applications
This ONNX-based wrapper was motivated by the scarcity of resources and tools for running YOLO models directly on ONNX Runtime. Computational efficiency and resource management are frequently overlooked, yet they become essential in deployments with strict resource constraints. By removing the PyTorch dependency at inference time, the wrapper reduces resource consumption and simplifies deployment, which makes YOLO-based segmentation models more practical to run and scale in constrained environments.