Efficient LoRA fine-tuning for vision-language models: LLaVA, Qwen-VL, InternVL, PaliGemma, and any other Hugging Face VLM.
Sister library to langtune. Same FastVisionModel API. Triton kernels are applied automatically to the language decoder. Train locally or dispatch to the cloud with zero code changes.
SUPPORTED MODELS
Decoder + CLIP
Q-Former + LLM
ViT + Qwen2
InternViT + LLM
SigLIP + Gemma2
Q-Former + OPT/T5
ViT + Qwen2
AutoModelForVision2Seq
FASTVISIONMODEL API
12 TRAINING METHODS
Captioning, VQA, instruction following
4-bit NF4: run 70B VLMs on a 24 GB GPU
Weight-decomposed LoRA on vision decoder
Preference optimization for image-text pairs
Odds Ratio Preference Optimization
Simple Preference Optimization
Kahneman-Tversky Optimization for vision preferences
PPO with a vision reward model
Group Relative Policy Optimization (RLVR)
Infused adapter on vision-language layers
Prefix tuning for task steering
Standard LoRA on language decoder
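As a rough illustration of what the LoRA entries in the list above refer to, the sketch below shows the low-rank update in plain Python: a frozen weight W is combined with two small trainable factors B and A, scaled by alpha / r. It is not langvision's implementation; the function names and the tiny matrices are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Illustrative only: plain-Python LoRA arithmetic, not langvision's implementation.
# All function names here are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w is the frozen weight (d_out x d_in); b (d_out x r) and a (r x d_in)
    are the trainable low-rank factors; r is the adapter rank.
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_trainable_params(d_in, d_out, r):
    """Number of trainable parameters LoRA adds to one linear layer."""
    return r * (d_in + d_out)


# Tiny example: a 2 x 3 frozen weight with rank-1 factors.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B = [[0.0], [0.0]]        # B is conventionally initialised to zero,
A = [[0.5, -0.5, 1.0]]    # so the adapter starts as a no-op.
assert lora_effective_weight(W, A, B, alpha=16) == W

# Compare adapter size with the full matrix for a 4096 x 4096 projection at rank 16.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, r=16)
print(f"trainable: {lora} of {full} ({lora / full:.2%})")
```

For a 4096 x 4096 projection at rank 16, the adapter trains 131,072 parameters against 16,777,216 in the full matrix, which is under 1%. That ratio is the reason LoRA-style methods fit in far less optimizer memory than full fine-tuning.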
THE LANGTRAIN LIBRARY SUITE
Install langvision and fine-tune any Hugging Face VLM in minutes.