LFM models are designed for efficient deployment across a wide range of platforms. Run models on-device for privacy and low latency, or scale up with GPU inference for production workloads.

On-Device

iOS SDK

Deploy models natively on iPhone and iPad

Android SDK

Deploy models natively on Android devices

llama.cpp

CPU-first inference with cross-platform support
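
As a minimal sketch, the llama-cpp-python bindings can load a quantized GGUF checkpoint and run chat completion on CPU; the file name below is a placeholder for whichever LFM GGUF build you have downloaded.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="lfm-model.Q4_K_M.gguf",  # placeholder: a local GGUF file
    n_ctx=4096,                          # context window size
    n_threads=8,                         # CPU threads to use
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize llama.cpp in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```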

MLX

Optimized inference on Apple Silicon
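
A sketch using the mlx-lm package on Apple Silicon; the Hugging Face repo id is a placeholder for an MLX-converted LFM checkpoint.

```python
from mlx_lm import load, generate

# Load an MLX-format model and its tokenizer from the Hugging Face Hub.
model, tokenizer = load("your-org/lfm-model-mlx")  # placeholder repo id

text = generate(
    model,
    tokenizer,
    prompt="Explain Apple Silicon in one sentence.",
    max_tokens=128,
)
print(text)
```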

ONNX

Cross-platform inference with ONNX Runtime
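
A sketch of running an exported model with ONNX Runtime; the file name, input name, and shapes are placeholders that depend on how the model was exported.

```python
import numpy as np
import onnxruntime as ort

# Create an inference session on CPU; swap in CUDAExecutionProvider for GPU.
sess = ort.InferenceSession("lfm-model.onnx", providers=["CPUExecutionProvider"])

input_ids = np.array([[1, 2, 3, 4]], dtype=np.int64)  # toy token ids
outputs = sess.run(None, {"input_ids": input_ids})    # None = fetch all outputs
print(outputs[0].shape)                               # e.g. the logits tensor
```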

Ollama

Easy local deployment and model management
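
A sketch using the official ollama Python client against a local Ollama server; the model tag is a placeholder for whichever LFM build you have pulled (e.g. with `ollama pull <tag>`).

```python
import ollama

response = ollama.chat(
    model="lfm-model",  # placeholder model tag
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response["message"]["content"])
```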

GPU Inference

Transformers

Flexible inference with Hugging Face Transformers
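
A minimal sketch with Hugging Face Transformers; the checkpoint id is a placeholder for a published LFM repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/lfm-model"  # placeholder checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place layers on available GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```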

vLLM

High-throughput production serving
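
A sketch of offline batch inference with vLLM; the checkpoint id is a placeholder. For serving, `vllm serve <model>` exposes an OpenAI-compatible endpoint instead.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/lfm-model")  # placeholder checkpoint id
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM batches and schedules requests internally for high throughput.
outputs = llm.generate(["Explain continuous batching in one sentence."], params)
print(outputs[0].outputs[0].text)
```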

SGLang

Structured generation and fast serving
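
A sketch of querying an SGLang server through its OpenAI-compatible API; the model path and port are placeholders, and the server is assumed to be started first.

```python
# Assumes a server launched beforehand, e.g.:
#   python -m sglang.launch_server --model-path your-org/lfm-model --port 30000
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="your-org/lfm-model",  # must match the served model
    messages=[{"role": "user", "content": "What is structured generation?"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```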

Modal

Serverless GPU deployment
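
A sketch of a serverless GPU function on Modal; the image contents, GPU type, and model id are placeholder choices, not a prescribed setup.

```python
import modal

app = modal.App("lfm-inference")
image = modal.Image.debian_slim().pip_install("transformers", "torch", "accelerate")

@app.function(gpu="A10G", image=image)
def generate(prompt: str) -> str:
    # Imports run inside the container, where the dependencies exist.
    from transformers import pipeline
    pipe = pipeline("text-generation", model="your-org/lfm-model", device_map="auto")
    return pipe(prompt, max_new_tokens=64)[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(generate.remote("Hello from a serverless GPU:"))
```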

Baseten

Production model inference platform
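
A sketch of a Baseten Truss model class (the `model/model.py` of a truss created with `truss init` and deployed with `truss push`); the checkpoint id is a placeholder.

```python
from transformers import pipeline

class Model:
    def __init__(self, **kwargs):
        self._pipe = None

    def load(self):
        # Runs once at startup on the deployed instance.
        self._pipe = pipeline("text-generation", model="your-org/lfm-model")

    def predict(self, model_input: dict) -> dict:
        out = self._pipe(model_input["prompt"], max_new_tokens=64)
        return {"text": out[0]["generated_text"]}
```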

Fal

Fast inference API platform
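
A sketch of calling a fal-hosted endpoint with the fal_client package; the application id and argument schema are placeholders that depend on the deployed model.

```python
import fal_client

result = fal_client.subscribe(
    "your-org/lfm-model",            # placeholder application id
    arguments={"prompt": "Hello!"},  # schema depends on the endpoint
)
print(result)
```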

Tools

Model Bundling Services

Package and distribute optimized model bundles for edge deployment