PYTHON LIBRARY

langtune
text LLMs

Efficient LoRA fine-tuning for large language models — with custom Triton and CUDA kernels for maximum GPU throughput.

Unsloth-compatible API. Trains locally on your GPU or dispatches to langtrain-server cloud workers. 12 training methods. Zero code changes to switch modes.

$ pip install langtune

TRITON + CUDA KERNELS

Applied automatically. Always.

Every training run automatically patches your model with the fastest available kernels — compiled CUDA extension first, Triton JIT fallback, PyTorch native last.

RELATIVE SPEEDUP OVER PYTORCH BASELINE

FlashAttention2: 2–4× vs HF eager

FusedRMSNorm: 2.6× vs nn.LayerNorm

Triton RoPE: 3.5× vs HF apply_rotary

FusedCrossEntropy: 26× VRAM vs F.cross_entropy

TurboQuant KV: 4× KV vs FP16 KV cache
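
For readers unfamiliar with what a fused RMSNorm kernel computes, here is a minimal pure-Python reference of the RMSNorm operation itself. It is a sketch for illustration only, not langtune's kernel: the function name `rms_norm` and the `eps` default are assumptions, and a fused GPU kernel would typically perform the reduction and the scaling in a single pass rather than as separate steps.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight.

    Hypothetical helper for illustration; not part of the langtune API.
    """
    mean_sq = sum(v * v for v in x) / len(x)      # mean of squared activations
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)      # reciprocal root-mean-square
    return [v * inv_rms * w for v, w in zip(x, weight)]


hidden = [3.0, 4.0]
out = rms_norm(hidden, weight=[1.0, 1.0])
print(out)
```

With a weight of all ones, the output has a mean square very close to 1, which is the normalization property the kernel preserves.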

3-TIER ACCELERATION HIERARCHY

1. Compiled CUDA Extension (FASTEST)

Pre-compiled from cuda_kernels/csrc/. Zero JIT overhead. Max GPU occupancy.

2. Triton JIT Kernels (DEFAULT)

JIT compiled per SM arch on first call. RMSNorm, RoPE, FusedCE, KV Quant.

3. PyTorch Native (FALLBACK)

Always available. Used when GPU is absent or kernels not built.
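
The three tiers above amount to a priority-ordered fallback. The sketch below shows one way such a selection could be written; it is not langtune's actual dispatch code. The function names and the extension module name `langtune_cuda_ext` are hypothetical, and the GPU flag is hard-coded so the example runs without PyTorch installed.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def pick_backend(has_gpu, has_cuda_ext, has_triton):
    """Choose the fastest usable tier: compiled CUDA, then Triton, then PyTorch."""
    if has_gpu and has_cuda_ext:
        return "cuda_ext"   # tier 1: pre-compiled extension
    if has_gpu and has_triton:
        return "triton"     # tier 2: JIT-compiled kernels
    return "pytorch"        # tier 3: always available


backend = pick_backend(
    has_gpu=False,  # in a real environment, use torch.cuda.is_available()
    has_cuda_ext=module_available("langtune_cuda_ext"),  # hypothetical module name
    has_triton=module_available("triton"),
)
print(backend)
```

Keeping the probes separate from the decision makes the ordering easy to test without a GPU.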

FASTLANGUAGEMODEL API

Same code. Two execution modes.

local_train.py
from langtune import FastLanguageModel

# Load with 4-bit QLoRA + Langtrain Triton kernels
model, tokenizer = FastLanguageModel.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    load_in_4bit=True,     # NF4 QLoRA
)

# Add LoRA adapter
model = FastLanguageModel.get_peft_model(
    model, r=16, method="qlora"
)

# Train: FusedRMSNorm, RoPE, FlashAttn2 applied automatically
# `dataset` is assumed to be your training dataset, loaded beforehand
FastLanguageModel.train(
    model, tokenizer, dataset,
    method="qlora",
    output_dir="./output",
)
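
To give a feel for what `r=16` means in the snippet above, the following sketch counts the trainable parameters a rank-16 LoRA adapter adds to a single weight matrix. It uses the standard LoRA factorization (two low-rank matrices of shapes `d_out × r` and `r × d_in`); the 4096 × 4096 shape is an illustrative assumption, and `lora_param_count` is a hypothetical helper, not a langtune function.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


d_in = d_out = 4096   # illustrative projection shape (assumption)
r = 16                # LoRA rank, as in the snippet above

full = d_in * d_out                       # frozen base weights in this matrix
adapter = lora_param_count(d_in, d_out, r)
fraction = adapter / full

print(f"{adapter:,} adapter params vs {full:,} frozen params ({fraction:.2%})")
# prints: 131,072 adapter params vs 16,777,216 frozen params (0.78%)
```

For this shape the adapter is under 1% of the matrix it adapts; the overall fraction for a full model depends on which layers receive adapters.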

12 TRAINING METHODS

Every alignment technique. One library.

SFT: Supervised fine-tuning on instruction data

LoRA: Low-rank adaptation; trains <1% of parameters

QLoRA: 4-bit NF4 quantization + LoRA; runs on 8 GB VRAM

DoRA: Weight-decomposed LoRA; improved convergence

GaLore: Gradient low-rank projection; full-param expressivity

IA³: Infused adapter; 100× fewer params than LoRA

DPO: Direct Preference Optimization; no reward model

ORPO: Odds Ratio Preference Optimization

SimPO: Simple Preference Optimization

KTO: Kahneman-Tversky Optimization

RLHF: PPO with custom reward model

Prefix: Prefix tuning for task steering
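
As an illustration of why DPO needs no reward model, here is a minimal sketch of the standard DPO loss for a single preference pair. It works directly on sequence log-probabilities from the policy and a frozen reference model. The function names and the `beta=0.1` default are assumptions for this example and are not taken from langtune.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair: -log(sigmoid(beta * (chosen_gain - rejected_gain)))."""
    chosen_gain = policy_chosen_logp - ref_chosen_logp        # implicit reward, chosen
    rejected_gain = policy_rejected_logp - ref_rejected_logp  # implicit reward, rejected
    margin = beta * (chosen_gain - rejected_gain)
    return softplus(-margin)  # equals -log(sigmoid(margin))


# Policy identical to the reference: no preference learned yet.
baseline = dpo_loss(-12.0, -12.0, -12.0, -12.0)

# Policy now favors the chosen response more than the reference does.
improved = dpo_loss(-10.0, -14.0, -12.0, -12.0)

print(baseline, improved)
```

The loss falls as the policy raises the chosen response's log-probability relative to the rejected one, measured against the reference model.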

Ready to fine-tune?

Install langtune and start training in under 5 minutes.

© 2026 Langtrain AI Private Limited. All rights reserved.