Last updated: October 10, 2025

Cloud Deployment

Deploy your fine-tuned models to production with auto-scaling, monitoring, and CI/CD integration.

Quick Deploy

Deploy your model to production in minutes:

  • One-Click Deploy: Simple deployment from the dashboard
  • Custom Domains: Use your own domain with SSL certificates (a hedged SDK sketch follows the code example below)
Code Example
# Deploy via CLI
langtrain deploy create \
  --model my-fine-tuned-model \
  --name production-api \
  --region us-east-1 \
  --min-instances 1 \
  --max-instances 10

# Deploy via Python SDK ("client" is an authenticated LangTrain client;
# see the Python SDK docs for initialization)
deployment = client.deployments.create(
    model_id="your-model-id",
    name="production-api",
    config={
        "region": "us-east-1",
        "instance_type": "gpu-medium",
        "min_instances": 1,
        "max_instances": 10,
        "auto_scaling": True
    }
)
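
Custom domains are attached to an existing deployment. The snippet below is a hedged sketch only: the client.deployments.update method and the custom_domain field are assumptions for illustration, not confirmed SDK surface; check the Python SDK reference for the actual call.

# Hypothetical sketch: attach a custom domain to the deployment created above.
# "client.deployments.update" and "custom_domain" are assumptions, not
# confirmed SDK API.
deployment = client.deployments.update(
    deployment_id=deployment.id,
    config={"custom_domain": "api.example.com"}  # SSL per Custom Domains above
)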

Container Deployment

Deploy using Docker containers for maximum flexibility:

  • Custom Images: Bring your own Docker images
  • Kubernetes: Native Kubernetes support with Helm charts
Code Example
# Generate Dockerfile
langtrain deploy generate-dockerfile --model my-model

# Build, tag for your registry, and push
docker build -t your-registry/my-model:latest .
docker push your-registry/my-model:latest

# Kubernetes deployment manifest (save as deployment.yaml, then run:
# kubectl apply -f deployment.yaml)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langtrain-model
spec:
  replicas: 3
  selector:
    matchLabels:
      app: langtrain-model
  template:
    metadata:
      labels:
        app: langtrain-model
    spec:
      containers:
      - name: model
        image: your-registry/my-model:latest
        ports:
        - containerPort: 8000
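
After the manifest is applied, you can smoke-test a pod locally with kubectl port-forward. The sketch below assumes the server answers on the /health route used by the load-balancer configuration in the next section; the route and port-forward target are assumptions.

import requests

# Hypothetical smoke test. First forward the container port locally, e.g.:
#   kubectl port-forward deployment/langtrain-model 8000:8000
# The /health route is an assumption based on the health-check config below.
resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
print("Container is healthy:", resp.status_code)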

Load Balancing

Distribute traffic across multiple instances:

  • Health Checks: Automatic health monitoring and failover
  • Traffic Routing: Smart routing based on model performance
Code Example
# Configure load balancer
deployment_config = {
    "load_balancer": {
        "algorithm": "round_robin",
        "health_check": {
            "path": "/health",
            "interval": 30,
            "timeout": 5,
            "healthy_threshold": 2,
            "unhealthy_threshold": 3
        },
        "sticky_sessions": False
    },
    "auto_scaling": {
        "metric": "requests_per_second",
        "target": 100,
        "scale_up_cooldown": 300,
        "scale_down_cooldown": 600
    }
}
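
If you self-host the container, your server must answer the health check configured above. Below is a minimal sketch of such an endpoint using FastAPI; the framework choice is an assumption, and any server that returns HTTP 200 on /health will satisfy the checker.

from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/health")
def health() -> Response:
    # Return 200 while the instance can serve traffic. Per the config above,
    # the load balancer marks an instance healthy after 2 consecutive passes
    # and unhealthy after 3 consecutive failures.
    return Response(status_code=200)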

Monitoring & Alerts

Monitor your deployed models in real-time:

  • Metrics: Request latency, throughput, error rates, and custom metrics
  • Alerts: Configure alerts for critical issues
Code Example
# Set up monitoring
client.monitoring.create_alert(
    deployment_id="your-deployment-id",
    metric="response_time_p95",
    threshold=2000,  # milliseconds (2 seconds)
    comparison="greater_than",
    notification_channels=["email", "slack"]
)

# Custom metrics
client.monitoring.track_metric(
    deployment_id="your-deployment-id",
    metric_name="business_metric",
    value=42,
    tags={"version": "v1.2", "region": "us-east-1"}
)
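
track_metric pairs naturally with your own instrumentation. The sketch below times an inference call and reports the latency as a custom metric; call_model is a hypothetical stand-in for your actual inference function.

import time

def tracked_inference(prompt):
    start = time.monotonic()
    result = call_model(prompt)  # hypothetical stand-in for your inference call
    elapsed_ms = (time.monotonic() - start) * 1000

    # Report latency with the track_metric call shown above
    client.monitoring.track_metric(
        deployment_id="your-deployment-id",
        metric_name="inference_latency_ms",
        value=elapsed_ms,
        tags={"version": "v1.2", "region": "us-east-1"}
    )
    return result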

CI/CD Integration

Integrate with your existing CI/CD pipelines:

  • GitHub Actions: Pre-built actions for deployment automation
  • API Integration: REST API for programmatic deployments (a hedged sketch follows the workflow below)
Code Example
# GitHub Actions workflow (save as .github/workflows/deploy.yml)
name: Deploy Model
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Deploy to LangTrain
      uses: langtrain/deploy-action@v1
      with:
        api-key: ${{ secrets.LANGTRAIN_API_KEY }}
        model-id: ${{ vars.MODEL_ID }}
        deployment-name: production-api
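
For pipelines outside GitHub Actions, the same deployment can be created over the REST API, as noted above. The endpoint path and payload below are assumptions modeled on the SDK call in Quick Deploy; consult the REST API reference for the exact route.

import os
import requests

# Hypothetical REST call: the base URL and /v1/deployments route are
# assumptions modeled on the SDK example, not a confirmed endpoint.
resp = requests.post(
    "https://api.langtrain.ai/v1/deployments",
    headers={"Authorization": f"Bearer {os.environ['LANGTRAIN_API_KEY']}"},
    json={
        "model_id": "your-model-id",
        "name": "production-api",
        "config": {"region": "us-east-1", "min_instances": 1, "max_instances": 10}
    },
    timeout=30
)
resp.raise_for_status()
print(resp.json())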
