onnx-runtime

Name: onnx-runtime
Author: mkurman/zorai

$ npx mdskill add mkurman/zorai/onnx-runtime

Accelerate ML inference by converting models to ONNX and optimizing execution.

Enables fast CPU, GPU, and mobile inference for models trained in different frameworks.
Works with PyTorch, TensorFlow, and scikit-learn models once they are converted to ONNX (for example with torch.onnx, tf2onnx, or skl2onnx).
Lets you choose execution providers and apply graph optimizations and quantization.
Returns inference outputs to the calling application (as NumPy arrays in the Python API).

SKILL.md

.github/skills/onnx-runtime

---
name: onnx-runtime
description: "ONNX Runtime — cross-platform ML inference optimizer. Convert PyTorch, TensorFlow, scikit-learn models to ONNX. GPU, CPU, and mobile acceleration. Quantization, graph optimization, and custom ops."
tags: [onnx, model-optimization, inference, cross-platform, quantization, edge, zorai]
---
## Overview

ONNX Runtime is a cross-platform ML inference engine that runs models in the ONNX format. It supports CPU, GPU (CUDA, DirectML), and mobile inference, and provides graph optimizations, quantization tools, and support for custom operators.

## Installation

```bash
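uv pip install onnxruntime      # CPU
uv pip install onnxruntime-gpu  # CUDA (install only one of the two packages per environment)

# ONNX Runtime only runs .onnx files: export models first, e.g. with torch.onnx (PyTorch),
# Hugging Face Optimum (transformers), tf2onnx (TensorFlow), or skl2onnx (scikit-learn)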
```

## Basic Inference

```python
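import numpy as np
import onnxruntime as ort

# Load the model; the CPU provider is always available
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

# Dummy batch: assumes an image model taking float32 input of shape (1, 3, 224, 224)
x = np.random.randn(1, 3, 224, 224).astype(np.float32)
result = session.run([output_name], {input_name: x})  # one array per requested output
print(result[0].shape)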
```

## GPU and Optimization

```python
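import onnxruntime as ort

# GPU inference (needs onnxruntime-gpu): providers are tried in priority order,
# with CPU listed as the fallback
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # providers this session actually uses

# Graph optimizations: ORT_ENABLE_ALL is already the default level, set here explicitly
options = ort.SessionOptions()
options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
options.optimized_model_filepath = "model_optimized.onnx"  # optional: save the optimized graph
session = ort.InferenceSession("model.onnx", sess_options=options, providers=["CPUExecutionProvider"])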
```

## Quantization

```python
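from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic quantization: weights are stored as int8, activations are quantized at run time
quantize_dynamic("model.onnx", "model_quantized.onnx", weight_type=QuantType.QInt8)
# Weights shrink up to ~4x (float32 -> int8); the accuracy impact depends on the model,
# so validate on your own data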
```

## References
- [ONNX Runtime docs](https://onnxruntime.ai/docs/)
- [ONNX model zoo](https://github.com/onnx/models)
