On-Device
iOS SDK
Deploy models natively on iPhone and iPad
Android SDK
Deploy models natively on Android devices
llama.cpp
CPU-first inference with cross-platform support
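A minimal sketch of CPU inference through the llama-cpp-python bindings; the GGUF path is a placeholder for any local checkpoint:

```python
# Sketch using llama-cpp-python (pip install llama-cpp-python).
# model_path is a placeholder; any local GGUF checkpoint works.
from llama_cpp import Llama

llm = Llama(model_path="./model-q4_k_m.gguf", n_ctx=2048)  # runs on CPU by default
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain quantization in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```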
MLX
Optimized inference on Apple Silicon
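A minimal sketch with the mlx-lm package (Apple Silicon only); the repo id is illustrative:

```python
# Sketch using mlx-lm (pip install mlx-lm); requires an Apple Silicon Mac.
# The Hugging Face repo id is illustrative; any MLX-format model works.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
print(generate(model, tokenizer, prompt="What is MLX?", max_tokens=64))
```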
ONNX
Cross-platform inference with ONNX Runtime
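A minimal sketch with ONNX Runtime; the model file and tensor shape are placeholders for whatever the exported graph expects:

```python
# Sketch using onnxruntime (pip install onnxruntime).
# "model.onnx" and the input shape are placeholders for the exported graph.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name                  # read the graph's input name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)   # example image-shaped input
outputs = sess.run(None, {input_name: x})
print(outputs[0].shape)
```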
Ollama
Easy local deployment and model management
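A minimal sketch with the official ollama Python client; it assumes the Ollama server is running and the model tag (illustrative here) has been pulled:

```python
# Sketch using the ollama client (pip install ollama); assumes `ollama serve`
# is running and the model was fetched with `ollama pull llama3.2`.
import ollama

response = ollama.chat(
    model="llama3.2",  # illustrative tag
    messages=[{"role": "user", "content": "Why run models locally?"}],
)
print(response["message"]["content"])
```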
GPU Inference
Transformers
Flexible inference with Hugging Face Transformers
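A minimal sketch with the pipeline API; the model id is illustrative, and device_map="auto" (which needs the accelerate package) places weights on a GPU when one is available:

```python
# Sketch using the Hugging Face pipeline API; the model id is illustrative.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",
    device_map="auto",  # needs `pip install accelerate`; falls back to CPU
)
out = pipe("The key idea behind attention is", max_new_tokens=40)
print(out[0]["generated_text"])
```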
vLLM
High-throughput production serving
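A minimal sketch of offline batched generation; the model id is illustrative, and the same engine can be exposed as an OpenAI-compatible server with `vllm serve <model>`:

```python
# Sketch of offline batched generation with vLLM; the model id is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Summarize continuous batching in one line."], params)
print(outputs[0].outputs[0].text)
```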
SGLang
Structured generation and fast serving
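A minimal sketch of SGLang's frontend DSL with a regex-constrained generation; it assumes a server was started separately (e.g. `python -m sglang.launch_server --model-path <model> --port 30000`):

```python
# Sketch of SGLang's frontend DSL; assumes a locally launched SGLang server.
import sglang as sgl

@sgl.function
def rate(s, text):
    s += sgl.user(f"Rate this feedback from 1 to 5: {text}")
    # The regex constrains decoding to a single digit (structured generation).
    s += sgl.assistant(sgl.gen("score", regex=r"[1-5]", max_tokens=2))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = rate.run(text="Great latency, easy setup.")
print(state["score"])
```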
Modal
Serverless GPU deployment
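A minimal sketch of a serverless GPU function; the app name, GPU type, and model are illustrative, and the file runs with `modal run app.py`:

```python
# Sketch of a Modal GPU function; app name, GPU type, and model are illustrative.
import modal

app = modal.App("inference-sketch")
image = modal.Image.debian_slim().pip_install("transformers", "torch")

@app.function(gpu="A10G", image=image)
def generate(prompt: str) -> str:
    from transformers import pipeline
    pipe = pipeline("text-generation", model="distilgpt2", device=0)
    return pipe(prompt, max_new_tokens=32)[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(generate.remote("Serverless GPUs let you"))
```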
Baseten
Production model inference platform
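A minimal sketch of invoking a model already deployed on Baseten over REST; the model id, API key, and payload schema are all placeholders, and the exact endpoint for a deployment is shown in the Baseten dashboard:

```python
# Sketch of calling a deployed Baseten model; the model id, API key,
# and payload schema are placeholders for your own deployment's values.
import requests

resp = requests.post(
    "https://model-<MODEL_ID>.api.baseten.co/production/predict",
    headers={"Authorization": "Api-Key <BASETEN_API_KEY>"},
    json={"prompt": "Hello from production"},
)
print(resp.json())
```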
Fal
Fast inference API platform
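A minimal sketch with the fal-client package; it assumes the FAL_KEY environment variable is set, and the application id and result fields are illustrative:

```python
# Sketch using fal-client (pip install fal-client); assumes FAL_KEY is set.
# The application id and result fields are illustrative.
import fal_client

result = fal_client.subscribe(
    "fal-ai/flux/dev",
    arguments={"prompt": "a lighthouse at dawn"},
)
print(result["images"][0]["url"])
```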
Tools
Model Bundling Services
Package and distribute optimized model bundles for edge deployment