PYTHON LIBRARY

langvision
vision LLMs

Efficient LoRA fine-tuning for vision-language models — LLaVA, Qwen-VL, InternVL, PaliGemma, and any HF VLM.

Sister library to langtune. Same FastVisionModel API. Triton kernels applied automatically to the language decoder. Train locally or dispatch to the cloud — zero code changes.


$ pip install langvision

SUPPORTED MODELS

Any vision-language model on HuggingFace

LLaVA 1.5/1.6 (popular): Decoder + CLIP

InstructBLIP: Q-Former + LLM

Qwen-VL: ViT + Qwen2

InternVL2: InternViT + LLM

PaliGemma (new): SigLIP + Gemma2

BLIP-2: Q-Former + OPT/T5

mPLUG-Owl3: ViT + Qwen2

Any HF VLM (open): AutoModelForVision2Seq

FASTVISIONMODEL API

Local or remote. Your choice.

local_vision_train.py

from langvision import FastVisionModel
from PIL import Image

# Load LLaVA in 4-bit with Langtrain Triton kernels
model, processor = FastVisionModel.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    load_in_4bit=True,
)

# Add LoRA on language decoder only (vision encoder frozen)
model = FastVisionModel.get_peft_model(
    model, r=16, method="qlora",
    train_vision_encoder=False,
)

# Fine-tune on your image-text pairs (`dataset` is your own dataset, prepared beforehand)
FastVisionModel.train(
    model, processor, dataset,
    method="qlora",
    output_dir="./vision-model",
)

# Run inference
image = Image.open("cat.jpg")
response = FastVisionModel.generate(
    model, processor, image,
    prompt="Describe this image in detail."
)
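To give a sense of what `r=16` means in the `get_peft_model` call above, here is a rough sketch of the standard LoRA parameter count. It is not langvision's internals: it assumes each adapted weight matrix of shape d_out x d_in gets two low-rank factors, A (r x d_in) and B (d_out x r), while the original matrix stays frozen. The helper names and the 4096 x 4096 projection size are assumptions chosen for illustration.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (r, d_in) and B has shape (d_out, r); the frozen W is untouched.
    return r * d_in + d_out * r


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen d_out x d_in weight matrix itself."""
    return d_in * d_out


d_in = d_out = 4096  # assumed projection size, for illustration only
r = 16               # same rank as in the example above

added = lora_param_count(d_in, d_out, r)
full = full_param_count(d_in, d_out)
print(f"LoRA adds {added:,} trainable parameters next to {full:,} frozen ones")
print(f"trainable fraction for this matrix: {added / full:.2%}")
```

Under these assumptions the adapter is under one percent of the size of the matrix it adapts, which is why the rank is the main knob for trading adapter capacity against memory.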

12 TRAINING METHODS

From SFT to GRPO. All vision-aware.

SFT: Captioning, VQA, instruction following

QLoRA: 4-bit NF4, run 70B VLMs on a 24 GB GPU

DoRA: Weight-decomposed LoRA on vision decoder

DPO: Preference optimization for image-text pairs

ORPO: Odds Ratio Preference Optimization

SimPO: Simple Preference Optimization

KTO: Kahneman-Tversky Optimization for vision preferences

RLHF: PPO with a vision reward model

GRPO: Group Relative Policy Optimization (RLVR)

IA³: Infused adapter on vision-language layers

Prefix: Prefix tuning for task steering

LoRA: Standard LoRA on language decoder
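Several of the methods listed above (DPO, ORPO, SimPO, KTO) learn from preference pairs. As a minimal sketch of the idea, the function below computes the standard DPO loss for a single chosen/rejected pair from summed sequence log-probabilities. It illustrates the published DPO objective, not langvision's implementation; the function name, argument names, and the `beta=0.1` default are assumptions for illustration.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given sequence log-probabilities."""
    # Implicit rewards: how much more the policy favors each response
    # than the frozen reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the margin is zero.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy has shifted probability toward the chosen response.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(baseline, improved)
```

The loss falls as the policy raises the chosen response's log-probability relative to the reference and lowers the rejected one's. In a vision setting the log-probabilities would be conditioned on the image as well as the prompt.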

THE LANGTRAIN LIBRARY SUITE

langtune + langvision

| Feature | langtune | langvision |
|---|---|---|
| Unsloth-compatible API | ✓ | ✓ |
| Local GPU training | ✓ | ✓ |
| Remote cloud dispatch | ✓ | ✓ |
| Triton/CUDA kernels | ✓ | ✓ |
| 12 training methods | ✓ | - |
| 12 vision methods | - | ✓ |
| Vision encoder LoRA | - | ✓ |
| Multimodal DPO/ORPO | - | ✓ |
| Streaming remote metrics | ✓ | ✓ |

Fine-tune vision models today

Install langvision and fine-tune any HuggingFace VLM in minutes.

