MLOps in 2025: Deploying ML Models to Production the Right Way

You've trained a model. Accuracy is great. You show it to stakeholders and everyone is excited. Then you try to deploy it and everything falls apart. The model works on your laptop but breaks in the cloud. Data drifts and accuracy silently degrades. There's no way to roll back to the previous version. Retraining takes a weekend of manual effort. This is the reality of ML without MLOps.

MLOps (Machine Learning Operations) is the set of practices, tools, and culture that bridges the gap between model development and reliable production deployment. In 2025, any serious ML team — whether at a Kathmandu startup or a global tech company — needs MLOps discipline. This guide walks you through the complete modern MLOps stack, with real code you can use today.

The MLOps Lifecycle

MLOps is not a one-time setup — it's a continuous cycle. Unlike traditional software, ML systems have an extra challenge: they degrade over time as the world changes (data drift), and improving them requires retraining on new data.

MLOps lifecycle (continuous loop): Data → Train → Evaluate → Deploy → Monitor → Feedback → back to Data

⚠️ The Pain Points Without MLOps

Works on my machine: Model trained on Python 3.9 breaks on the server running Python 3.11
Silent accuracy degradation: Input data distribution shifts; no one notices until customers complain
No experiment tracking: You don't remember which hyperparameters gave the best result last month
Manual deployments: Every update requires SSHing into servers and running scripts by hand
No rollback: If the new model breaks something, there's no quick way to revert
Data versioning chaos: Different team members train on different data slices
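One way to catch the "silent accuracy degradation" case is to compare the distribution of each live input feature against the training data. The sketch below uses the Population Stability Index (PSI), a common drift score; it is an illustration rather than part of the stack above. The function name, the bin count, and the synthetic data are assumptions, and the usual 0.1 / 0.25 alert levels are rules of thumb, not fixed standards.

```python
import math
import random


def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so live values outside the reference range land in the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps log() finite when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    rng = random.Random(42)
    training = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # synthetic reference data
    same = [rng.gauss(0.0, 1.0) for _ in range(5000)]      # no drift
    shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]   # mean has moved
    print(f"PSI, same distribution:    {population_stability_index(training, same):.3f}")
    print(f"PSI, shifted distribution: {population_stability_index(training, shifted):.3f}")
```

In a real service the reference sample would come from the training set and the live sample from recent requests, with one score per feature exported to the monitoring stack.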

The Modern MLOps Stack

Category	Tool	Purpose	Free Tier?
Experiment Tracking	MLflow	Log metrics, params, models, artifacts	Yes — self-hosted
Experiment Tracking	Weights & Biases	Cloud-based tracking with rich visualisations	Yes — 100GB storage
Data Versioning	DVC	Git for large datasets and model files	Yes — open source
Model Serving	FastAPI + Uvicorn	Lightweight, async Python API server	Yes — open source
Containerisation	Docker	Package app + dependencies into portable images	Yes — free for public
Container Orchestration	Kubernetes	Scale and manage containerised ML services	GKE Autopilot free tier
Cloud Deployment	Railway / Render	Simple PaaS for deploying Docker containers	Limited (free or trial tiers; check current pricing)
CI/CD	GitHub Actions	Automate build, test, deploy on git push	Yes — 2,000 min/month for private repos; free for public repos
Model Registry	MLflow Registry	Version and stage models (Staging/Production)	Yes — self-hosted
Monitoring	Prometheus + Grafana	Metrics collection and dashboards	Yes — self-hosted
Feature Store	Feast	Manage and serve ML features consistently	Yes — open source

Step-by-Step Guide

Step 1: Train and Track with MLflow

MLflow experiment tracking is the foundation of MLOps. Every training run should log its parameters, metrics, and artifacts so you can compare experiments and reproduce results.

python

# train.py — Model training with MLflow experiment tracking
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, roc_auc_score,
)
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# ──────────────────────────────────────────────────────────
# Configuration — all hyperparameters in one place
# ──────────────────────────────────────────────────────────
CONFIG = {
    "n_estimators": 200,
    "max_depth": 5,
    "learning_rate": 0.05,
    "subsample": 0.8,
    "min_samples_split": 10,
    "test_size": 0.2,
    "random_state": 42,
}


def load_data():
    """Load your dataset here. Replace with actual data loading."""
    from sklearn.datasets import make_classification
    X, y = make_classification(
        n_samples=5000, n_features=20, n_informative=15,
        n_redundant=5, random_state=42
    )
    return pd.DataFrame(X), pd.Series(y)


def train_model():
    # Set MLflow tracking URI — use a local server or remote
    mlflow.set_tracking_uri("http://localhost:5000")
    mlflow.set_experiment("fraud-detection-v2")

    with mlflow.start_run(run_name="GBM-experiment-1") as run:
        print(f"MLflow Run ID: {run.info.run_id}")

        # ── Load & Split Data ──
        X, y = load_data()
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=CONFIG["test_size"],
            random_state=CONFIG["random_state"], stratify=y
        )

        # ── Log configuration ──
        mlflow.log_params(CONFIG)
        mlflow.log_param("train_samples", len(X_train))
        mlflow.log_param("test_samples",  len(X_test))
        mlflow.log_param("n_features",    X.shape[1])

        # ── Build pipeline (scaler + model) ──
        pipeline = Pipeline([
            ("scaler", StandardScaler()),
            ("model", GradientBoostingClassifier(
                n_estimators   = CONFIG["n_estimators"],
                max_depth      = CONFIG["max_depth"],
                learning_rate  = CONFIG["learning_rate"],
                subsample      = CONFIG["subsample"],
                min_samples_split = CONFIG["min_samples_split"],
                random_state   = CONFIG["random_state"],
            ))
        ])

        # ── Train ──
        print("Training model...")
        pipeline.fit(X_train, y_train)

        # ── Evaluate ──
        y_pred  = pipeline.predict(X_test)
        y_proba = pipeline.predict_proba(X_test)[:, 1]

        metrics = {
            "accuracy":  accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred),
            "recall":    recall_score(y_test, y_pred),
            "f1":        f1_score(y_test, y_pred),
            "roc_auc":   roc_auc_score(y_test, y_proba),
        }

        # ── Log metrics ──
        mlflow.log_metrics(metrics)

        # ── Cross-validation (training split only, so the test set stays unseen) ──
        cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring="roc_auc")
        mlflow.log_metric("cv_roc_auc_mean", cv_scores.mean())
        mlflow.log_metric("cv_roc_auc_std",  cv_scores.std())

        print("\nResults:")
        for k, v in metrics.items():
            print(f"  {k}: {v:.4f}")
        print(f"  CV ROC-AUC: {cv_scores.mean():.4f} ± {cv_scores.std():.4f}")

        # ── Save and log model ──
        mlflow.sklearn.log_model(
            pipeline,
            "model",
            registered_model_name="fraud-detector",
            input_example=X_test.iloc[:5],
            signature=mlflow.models.infer_signature(X_test, y_pred),
        )

        print(f"\nModel saved to MLflow. Run ID: {run.info.run_id}")
        return run.info.run_id


if __name__ == "__main__":
    run_id = train_model()
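To guard against the "works on my machine" problem, it can help to record the interpreter and package versions next to each run. The sketch below uses only the standard library; the function name `environment_snapshot` and the package list are illustrative assumptions, not part of train.py. MLflow also stores dependency files with a logged model, so treat this as a lightweight complement rather than a replacement.

```python
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Return the interpreter version and installed versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": sys.implementation.name,
        "packages": versions,
    }


if __name__ == "__main__":
    snapshot = environment_snapshot(["scikit-learn", "mlflow", "pandas", "numpy"])
    print(snapshot)
    # Inside an active MLflow run, this dict could be attached as an artifact with:
    #   mlflow.log_dict(snapshot, "environment.json")
```

Comparing two snapshots makes a Python 3.9 versus 3.11 mismatch visible before it turns into a production failure.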

Step 2: Serve with FastAPI

python

# serve.py — Production FastAPI model server
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, Field
import mlflow.sklearn
import numpy as np
import logging
import time
import os
from contextlib import asynccontextmanager
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST
from fastapi.responses import Response

# ── Logging ──
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# ── Prometheus metrics ──
REQUEST_COUNT    = Counter("predictions_total", "Total predictions", ["status"])
REQUEST_LATENCY  = Histogram("prediction_latency_seconds", "Prediction latency")
INPUT_ERRORS     = Counter("input_errors_total", "Invalid input count")

# ── Global model ──
model = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    global model
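    # Assumes a registered version has already been promoted to the Production stage
    # (train.py only registers it). MLflow 2.9+ deprecates stages in favour of
    # aliases, e.g. MODEL_URI=models:/fraud-detector@champion
    model_uri = os.getenv("MODEL_URI", "models:/fraud-detector/Production")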
    logger.info(f"Loading model from: {model_uri}")
    try:
        model = mlflow.sklearn.load_model(model_uri)
        logger.info("Model loaded successfully")
    except Exception as e:
        logger.error(f"Failed to load model: {e}")
        raise
    yield


app = FastAPI(
    title="Fraud Detection API",
    description="ML model serving with MLflow + FastAPI",
    version="2.0.0",
    lifespan=lifespan,
)

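# Wide-open CORS is convenient for demos; restrict allow_origins to known
# front-end domains before exposing the API publicly.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)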


class PredictionRequest(BaseModel):
    features: list[float] = Field(..., description="Model input features", min_length=1)


class PredictionResponse(BaseModel):
    prediction: int
    probability: float
    confidence: str
    latency_ms: float


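# Plain `def`, not `async def`: scikit-learn inference is blocking CPU work, so
# letting FastAPI run the handler in its threadpool keeps the event loop free.
@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):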
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")

    start = time.time()

    try:
        X = np.array(request.features).reshape(1, -1)
        pred  = int(model.predict(X)[0])
        proba = float(model.predict_proba(X)[0, 1])
    except Exception as e:
        INPUT_ERRORS.inc()
        REQUEST_COUNT.labels(status="error").inc()
        logger.error(f"Prediction error: {e}")
        raise HTTPException(status_code=422, detail=f"Prediction failed: {e}")

    latency = (time.time() - start) * 1000
    REQUEST_COUNT.labels(status="success").inc()
    REQUEST_LATENCY.observe(latency / 1000)

    # Bucket the score by its distance from the 0.5 decision boundary
    if proba > 0.8 or proba < 0.2:
        confidence = "high"
    elif proba > 0.6 or proba < 0.4:
        confidence = "medium"
    else:
        confidence = "low"

    return PredictionResponse(
        prediction=pred,
        probability=round(proba, 4),
        confidence=confidence,
        latency_ms=round(latency, 2),
    )


@app.get("/metrics")
def metrics():
    """Prometheus metrics endpoint."""
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)


@app.get("/health")
def health():
    return {"status": "healthy", "model_loaded": model is not None}
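The request schema in serve.py only checks that at least one feature is present, while the model from Step 1 was trained on 20. A stricter check and the confidence rule can both be pulled into small pure functions that are easy to unit test. This is a sketch under assumptions: `EXPECTED_FEATURES`, `validate_features`, and `confidence_bucket` are hypothetical names, not part of the original server.

```python
import math

EXPECTED_FEATURES = 20  # assumption: matches n_features used in train.py


def validate_features(features, expected=EXPECTED_FEATURES):
    """Return an error message for a bad feature vector, or None if it is usable."""
    if len(features) != expected:
        return f"expected {expected} features, got {len(features)}"
    if not all(math.isfinite(x) for x in features):
        return "features must be finite numbers (no NaN or infinity)"
    return None


def confidence_bucket(proba):
    """Map a positive-class probability to the labels returned by /predict."""
    if proba > 0.8 or proba < 0.2:
        return "high"
    if proba > 0.6 or proba < 0.4:
        return "medium"
    return "low"


if __name__ == "__main__":
    print(validate_features([0.0] * 19))  # too short, so a message is returned
    print(confidence_bucket(0.93))
```

Inside the handler, a non-None result from `validate_features` could be turned into an HTTP 422 before the model is ever called, which keeps malformed input out of the prediction error counter.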

Step 3: Dockerise the Service

dockerfile

# Dockerfile — Multi-stage build for production ML API
# ──────────────────────────────────────────────────────────
# Stage 1: Build stage (installs all deps including build tools)
# ──────────────────────────────────────────────────────────
FROM python:3.11-slim AS builder

WORKDIR /build

# Install system build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc g++ libgomp1 && \
    rm -rf /var/lib/apt/lists/*

# Install Python dependencies into a virtual env
COPY requirements.txt .
RUN python -m venv /opt/venv && \
    /opt/venv/bin/pip install --no-cache-dir --upgrade pip && \
    /opt/venv/bin/pip install --no-cache-dir -r requirements.txt


# ──────────────────────────────────────────────────────────
# Stage 2: Runtime stage (lean production image)
# ──────────────────────────────────────────────────────────
FROM python:3.11-slim AS runtime

# Create non-root user for security
RUN useradd -m -u 1000 appuser
WORKDIR /app

# Copy virtual env from builder
COPY --from=builder /opt/venv /opt/venv

# Activate venv
ENV PATH="/opt/venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Copy application code
COPY --chown=appuser:appuser serve.py .

USER appuser

EXPOSE 8000

# Health check: Docker marks the container "unhealthy" after 3 failed probes.
# Restarting it is left to the orchestrator or hosting platform.
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"

# Run Gunicorn as the process manager with Uvicorn (ASGI) workers.
# requirements.txt must therefore include both gunicorn and uvicorn.
CMD ["gunicorn", "serve:app", "--workers", "2", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "120"]
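The HEALTHCHECK above probes /health with `urllib`. When starting the container locally, the same probe can be wrapped in a retry loop so a script waits for the model to finish loading. The sketch below is one possible shape; `wait_until_healthy` and its parameters are hypothetical, and the `fetch` and `sleep` arguments exist only to make the function testable without a running server.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url, attempts=10, delay_s=3.0, fetch=None, sleep=time.sleep):
    """Poll `url` until it answers HTTP 200 or the attempts run out."""

    def default_fetch(target):
        with urllib.request.urlopen(target, timeout=5) as resp:
            return resp.status

    fetch = fetch or default_fetch
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # service not reachable yet; retry after the delay
        if attempt < attempts:
            sleep(delay_s)
    return False


if __name__ == "__main__":
    # Stand-in for a real server: fails twice, then reports healthy.
    responses = iter([OSError("connection refused"), OSError("connection refused"), 200])

    def fake_fetch(_url):
        result = next(responses)
        if isinstance(result, Exception):
            raise result
        return result

    ok = wait_until_healthy("http://localhost:8000/health", fetch=fake_fetch, sleep=lambda s: None)
    print("healthy" if ok else "gave up")
```

Against a real container, calling `wait_until_healthy("http://localhost:8000/health")` with the defaults would wait up to roughly half a minute.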

Step 4: Deploy to Railway or Render

For small to medium ML APIs, Railway and Render are practical options for Nepali developers: no Kubernetes setup, free or low-cost entry plans (pricing changes often, so check the current tiers), and automatic HTTPS.

bash

# ──────────────────────────────────────────────────────────
# Option A: Deploy to Railway
# ──────────────────────────────────────────────────────────

# 1. Install Railway CLI
npm install -g @railway/cli

# 2. Login and initialise project
railway login
railway init  # Creates railway.json

# 3. Add environment variables
#    (syntax for older CLI releases; newer ones use: railway variables --set "KEY=value")
railway variables set OPENAI_API_KEY=sk-xxx
railway variables set MODEL_URI=models:/fraud-detector/Production
railway variables set MLFLOW_TRACKING_URI=https://your-mlflow-server.com

# 4. Deploy (Railway auto-detects Dockerfile)
railway up

# Your API is live at: https://your-app.railway.app


# ──────────────────────────────────────────────────────────
# Option B: Deploy to Render (via render.yaml)
# ──────────────────────────────────────────────────────────
cat > render.yaml << 'EOF'
services:
  - type: web
    name: fraud-detection-api
    runtime: docker
    dockerfilePath: ./Dockerfile
    region: singapore   # closest to Nepal
    plan: starter       # $7/month — 512MB RAM, 0.5 CPU
    healthCheckPath: /health
    envVars:
      - key: MODEL_URI
        value: models:/fraud-detector/Production
      - key: OPENAI_API_KEY
        sync: false    # Set in Render dashboard (secret)
EOF

# Push to GitHub and connect repo in Render dashboard
git add render.yaml && git commit -m "Add Render config"
git push origin main
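Once the service is live, a quick smoke test against /predict confirms that the deployed model answers. The sketch below builds the JSON body that serve.py expects using only the standard library. The helper names are hypothetical, and the URL is the placeholder from the deploy steps, not a real endpoint.

```python
import json
import urllib.request


def build_predict_request(base_url, features):
    """Build a POST request for the /predict endpoint defined in serve.py."""
    body = json.dumps({"features": list(features)}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, features, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the request here; call predict(...) once your own URL is in place.
    req = build_predict_request("https://your-app.onrender.com", [0.0] * 20)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

A successful call to `predict` should return the four fields of `PredictionResponse`: `prediction`, `probability`, `confidence`, and `latency_ms`.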

Step 5: CI/CD with GitHub Actions

CI/CD pipeline (automated deployment flow): git push → GitHub Actions triggered → Run tests → Docker build → Push to GHCR → Deploy to Render

yaml

# .github/workflows/deploy.yml
# Runs on every push to main — tests, builds, and deploys
name: MLOps CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}/fraud-detection-api

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.11
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-asyncio httpx

      - name: Run unit tests
        run: pytest tests/ -v --tb=short

      - name: Run model validation
        run: python scripts/validate_model.py
        env:
          MODEL_URI: ${{ secrets.MODEL_URI }}
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'

    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v4

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'

    steps:
      - name: Trigger Render deployment
        run: |
          curl -sf -X POST \
            -H "Authorization: Bearer ${{ secrets.RENDER_API_KEY }}" \
            "https://api.render.com/v1/services/${{ secrets.RENDER_SERVICE_ID }}/deploys" \
            -H "Content-Type: application/json" \
            -d '{"clearCache": false}'

      - name: Wait for deployment
        run: |
          echo "Waiting for Render to deploy..."
          sleep 60
          curl -sf https://your-app.onrender.com/health || exit 1
          echo "Deployment successful!"
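The workflow calls `scripts/validate_model.py`, which the guide does not show. A minimal version of its core logic might look like the sketch below: compare evaluation metrics against minimum thresholds and return a non-zero exit code so the CI job fails. The threshold values and function names are hypothetical, and the example metrics are illustrative numbers rather than real results.

```python
import sys

# Hypothetical minimums; choose values that match your own baseline.
THRESHOLDS = {"roc_auc": 0.90, "recall": 0.80}


def failed_checks(metrics, thresholds=THRESHOLDS):
    """Return human-readable failures; an empty list means the model passes."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.4f} is below the minimum {minimum:.4f}")
    return failures


def main(metrics):
    """Print any failures and return a process exit code (0 = pass, 1 = fail)."""
    failures = failed_checks(metrics)
    for line in failures:
        print(f"FAIL {line}")
    return 1 if failures else 0


if __name__ == "__main__":
    # Illustrative numbers only. A real script would load the candidate model
    # from MODEL_URI and score it on a held-out dataset.
    example_metrics = {"roc_auc": 0.95, "recall": 0.85}
    exit_code = main(example_metrics)
    if exit_code:
        sys.exit(exit_code)  # a non-zero exit fails the GitHub Actions step
    print("Model passed validation")
```

Because the `build` job depends on `test`, a failing gate stops the image from being built or deployed.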

💡 Use DVC for Data Versioning

DVC (Data Version Control) is Git for large files — datasets, model checkpoints, and feature stores. It stores file metadata in Git and the actual data in cheap object storage (S3, GCS, Azure Blob).

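bash

pip install "dvc[s3]"

dvc init
dvc add data/train.csv            # writes data/train.csv.dvc with the file's checksum
git add data/train.csv.dvc data/.gitignore
git commit -m "Track training data with DVC"

dvc remote add -d myremote s3://my-ml-bucket/dvc
dvc push                          # upload the data to S3

# teammates can now run:
dvc pull                          # fetch exactly the same data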
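The "tracks with checksum" step is what makes `dvc pull` reproducible: the small metadata file committed to Git pins a content hash, so any change to the data produces a different hash. The sketch below illustrates the idea with an MD5 digest, which is the hash DVC records by default as far as I know. It is a conceptual illustration, not DVC's actual implementation, and `file_md5` is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.csv")
        with open(path, "w", newline="") as fh:
            fh.write("id,label\n1,0\n")
        before = file_md5(path)

        with open(path, "a", newline="") as fh:
            fh.write("2,1\n")  # one extra row changes the content hash
        after = file_md5(path)

        print("hash changed after edit:", before != after)
```

Two teammates whose files hash to the same value are training on byte-identical data, which addresses the "data versioning chaos" pain point.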

Conclusion

Building a Reliable ML System

The stack we've covered — MLflow for experiment tracking, FastAPI for serving, Docker for packaging, GitHub Actions for CI/CD, and Railway/Render for deployment — gives you a production-grade MLOps foundation without requiring a dedicated DevOps team.

Start small: add MLflow tracking to your existing training scripts this week. Then containerise your best model. Then automate the deployment. Each step independently adds value and you can stop at any point.

In Nepal's growing AI ecosystem, the engineers who can both build models and reliably deploy them are the rarest and most valuable. MLOps is your competitive advantage.