Last updated: October 10, 2025

Full Fine-tuning

A complete guide to full-parameter fine-tuning for maximum model customization and performance.

When to Use Full Fine-tuning

Full fine-tuning is recommended when:

Maximum Performance: You need the absolute best performance for your task (see the LoRA comparison sketch after the example below)

Domain Adaptation: Adapting to a domain that differs substantially from the base model's training data

Task Specialization: Creating highly specialized models for specific tasks
Code Example
# Full fine-tuning configuration
config = {
    "method": "full",
    "model": "llama-2-7b",
    "learning_rate": 1e-5,             # conservative rate for full fine-tuning
    "batch_size": 8,
    "gradient_accumulation_steps": 4,  # effective batch size: 8 * 4 = 32
    "epochs": 3,
    "warmup_steps": 500,               # linear warmup before the scheduler takes over
    "weight_decay": 0.01,              # L2 regularization
    "optimizer": "adamw",
    "scheduler": "cosine"              # cosine learning-rate decay
}
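
To make the tradeoff against LoRA concrete, here is a minimal sketch comparing trainable parameter counts for the two methods. The layer count, hidden size, LoRA rank, and target modules are illustrative assumptions for a 7B model, not values defined by the platform.
Code Example
# Rough comparison of trainable parameters: full fine-tuning vs. LoRA.
# All dimensions below are illustrative assumptions for a 7B model.
full_params = 7_000_000_000   # full fine-tuning updates every weight

num_layers = 32               # transformer blocks (assumed)
hidden = 4096                 # model width (assumed)
rank = 16                     # LoRA rank (assumed)
targets_per_layer = 2         # e.g. attention q/v projections (assumed)

# Each LoRA adapter adds two rank-r matrices per target module
lora_params = num_layers * targets_per_layer * (2 * hidden * rank)

print(f"Full fine-tuning: {full_params:,} trainable parameters")
print(f"LoRA (r={rank}): {lora_params:,} trainable parameters")
print(f"LoRA trains ~{100 * lora_params / full_params:.2f}% of the weights")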

Dataset Preparation

Prepare your dataset for full fine-tuning:

Data Quality: High-quality, diverse training data is crucial (a validation sketch follows the example below)

Format: Datasets can be provided in several formats, including JSONL, CSV, and Parquet
Code Example
# Example training data format (one JSON object per line in the JSONL file)
{
    "instruction": "Summarize the following text:",
    "input": "Large language models have shown remarkable capabilities...",
    "output": "LLMs demonstrate strong performance across many NLP tasks."
}

# Upload the dataset, holding out 10% for validation
dataset = client.datasets.upload(
    file_path="full_training_data.jsonl",
    name="full-finetune-dataset",
    validation_split=0.1
)
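
Because data quality is so important, it is worth validating records before uploading. The sketch below checks that every JSONL line parses and contains the three expected keys; the key names match the format example above, but the helper itself is illustrative and not part of the SDK.
Code Example
import json

REQUIRED_KEYS = {"instruction", "input", "output"}

def validate_jsonl(path: str) -> None:
    """Fail fast on malformed lines or missing fields before uploading."""
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            record = json.loads(line)  # raises on invalid JSON
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                raise ValueError(f"line {lineno}: missing keys {missing}")

validate_jsonl("full_training_data.jsonl")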

Training Configuration

Configure your full fine-tuning job:

Hardware Requirements: Full fine-tuning requires significant GPU memory (a back-of-envelope estimate follows the example below)

Training Time: Training takes considerably longer than LoRA-based methods
Code Example
# Start full fine-tuning job
job = client.fine_tune.create(
    model="mistral-7b",
    dataset=dataset.id,
    config={
        "method": "full",
        "learning_rate": 5e-6,
        "batch_size": 4,
        "epochs": 2,
        "gradient_checkpointing": True,
        "fp16": True,
        "deepspeed_stage": 2,
        "save_steps": 500,
        "logging_steps": 100,
        "evaluation_strategy": "steps",
        "eval_steps": 500
    }
)

print(f"Full fine-tuning job started: {job.id}")

Distributed Training

Scale your training across multiple GPUs:

Multi-GPU: Automatic data parallelism across available GPUs

DeepSpeed: Integration with DeepSpeed for memory-efficient training
Code Example
# Distributed training configuration
distributed_config = {
    "method": "full",
    "distributed": {
        "strategy": "deepspeed",
        "stage": 3,  # ZeRO stage 3 for maximum memory efficiency
        "gradient_clipping": 1.0,
        "allgather_bucket_size": 2e8,
        "reduce_bucket_size": 2e8
    },
    "hardware": {
        "gpu_count": 8,
        "instance_type": "gpu-large",
        "gradient_accumulation_steps": 16
    }
}

# Launch distributed training
job = client.fine_tune.create(
    model="llama-2-13b",
    dataset=dataset.id,
    config=distributed_config
)
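
When scaling out, check the effective (global) batch size: per-device batch size times GPU count times gradient accumulation steps. The sketch below reads the values from distributed_config; the per-device batch size is an assumption carried over from the earlier single-GPU example, since the distributed config above does not set one explicitly.
Code Example
per_device_batch = 4  # assumed; not set in distributed_config above
gpu_count = distributed_config["hardware"]["gpu_count"]  # 8
accum = distributed_config["hardware"]["gradient_accumulation_steps"]  # 16

effective_batch = per_device_batch * gpu_count * accum
print(f"Effective batch size: {effective_batch}")  # 4 * 8 * 16 = 512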

Monitoring Training

Monitor your full fine-tuning progress:

Metrics: Track loss, learning rate, and validation metrics

Early Stopping: Training stops automatically when the validation metric stops improving, preventing overfitting
Code Example
# Monitor training progress (uses the standard-library time module)
import time

while job.status in ["queued", "running"]:
    job = client.fine_tune.get(job.id)

    if job.metrics:
        print(f"Step: {job.metrics.step}")
        print(f"Training Loss: {job.metrics.train_loss:.4f}")
        print(f"Validation Loss: {job.metrics.eval_loss:.4f}")
        print(f"Learning Rate: {job.metrics.learning_rate:.2e}")

    time.sleep(60)  # poll once per minute

print(f"Training completed with status: {job.status}")

Best Practices

Tips for successful full fine-tuning:

Learning Rate: Start with low learning rates, typically 5e-6 to 1e-5 (a sweep sketch follows the example below)

Regularization: Use weight decay and dropout to prevent overfitting

Validation: Always use a validation set to monitor generalization
Code Example
# Best practices configuration
best_practices_config = {
    "method": "full",
    "learning_rate": 2e-6,  # Conservative learning rate
    "weight_decay": 0.01,   # L2 regularization
    "dropout": 0.1,         # Dropout for regularization
    "gradient_clipping": 1.0,  # Prevent gradient explosion
    "early_stopping": {
        "patience": 3,
        "metric": "eval_loss",
        "min_delta": 0.001
    },
    "save_strategy": "epoch",
    "load_best_model_at_end": True
}
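
Because the learning rate is the most sensitive of these knobs, a small sweep over conservative values can be worth the compute. The sketch below launches one job per candidate rate by copying the config above; it assumes the client and dataset objects from the earlier snippets, and the candidate values are illustrative.
Code Example
# Launch a small learning-rate sweep based on best_practices_config.
jobs = []
for lr in (2e-6, 5e-6, 1e-5):
    sweep_config = dict(best_practices_config, learning_rate=lr)
    job = client.fine_tune.create(
        model="mistral-7b",
        dataset=dataset.id,
        config=sweep_config,
    )
    jobs.append(job)
    print(f"Started job {job.id} with lr={lr:.0e}")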
