little51 / llm-dev

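Companion resources for the book 《大模型项目实战：多领域智能应用开发》 (Large Model Projects in Practice: Multi-Domain Intelligent Application Development)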

chat-application llm llm-training llm-inference llm-deployment

Updated Mar 14, 2026
JavaScript

eos3-ai / bubble-rag

rag llm-training llm-deployment rag-chatbot

Updated Sep 25, 2025
Python

A curated collection of open-source Large Language Model (LLM) projects that are production-ready and can be used for solving real-world problems. This repository focuses on high-performance, scalable LLM solutions across various industries and applications.

data-science production agents fine-tuning rag large-language-models llm llm-deployment

Updated May 29, 2026

paralleliq / piqc-knowledge-base

Production-ready checklists and frameworks for deploying LLMs, GenAI models, and AI infrastructure. Covers vLLM, Kubernetes, GPU optimization, observability, compliance, and Day-0 to Day-2 operations.

kubernetes machine-learning deployment optimization best-practices checklists model-serving gpu-optimization mlops production-readiness ai-governance vllm genai llm-deployment ai-infrastructure

Updated Apr 15, 2026
Shell

emineugurlu / emineugurlu

🚀 AI-Native Developer & Computer Engineer | Designing scalable cognitive ecosystems through advanced AI orchestration. Specializing in LLM deployment, Semantic Q&A, and high-performance backend architecture. "I don’t just write code — I orchestrate intelligence."

python open-source computer-engineering fastapi llm-deployment ai-orchestration cognitive-ecosystems high-performance-backend

Updated Jun 29, 2026

BjornMelin / local-llm-workbench

🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.

cuda gpu-acceleration model-management inference-optimization model-quantization cpu-inference llama-cpp local-llm llm-deployment llm-benchmarking ollama-optimization hybrid-inference wsl-ai-setup context-window-scaling

Updated Mar 27, 2025
Shell

ajmaluk / 5-Day-AI-Agents-Intensive-Course-with-Google

5-Day Hands-on AI Agents Course using Google ADK & Vertex AI | From first agent to production deployment

memory mcp multi-agent gemini observability ai-agents rag kaggle-course vertex-ai llm generative-ai llm-deployment agentic-workflows model-context-protocol google-adk

Updated Nov 25, 2025
Jupyter Notebook

SangiSI / llm-foundations-and-systems

Structured repository covering LLM foundations, fine-tuning workflows, optimization strategies, deployment patterns, evaluation methods, and Responsible AI considerations.

python machine-learning fine-tuning responsible-ai large-language-models prompt-engineering llm-evaluation llm-deployment llm-optimization llm-monitoring llm-fundamentals

Updated Dec 19, 2025
Jupyter Notebook

davide97l / LLM-deploy-API

API for efficiently deploying Large Language Model (LLM) applications with Flask

flask-application flask-api llm-inference llm-deployment

Updated Feb 26, 2024

paralleliq / modelspec

ModelSpec is an open, declarative specification for describing how AI models, especially LLMs, are deployed, served, and operated in production. It captures execution, serving, and orchestration intent to enable validation, reasoning, and automation across modern AI infrastructure.

kubernetes runtime json-schema inference autoscaling observability model-deployment model-serving declarative-config production-ai mlops ai-systems llm ai-ops gpu-inference vllm llm-deployment ai-infrastructure modelspec

Updated Apr 27, 2026
Python

linny006 / llmops-radar

Live index of the newest LLMOps tooling — track what's shipping in LLM observability and deployment

machine-learning developer-tools awesome-list live-data mlops ai-engineering ai-tools auto-updated ai-monitoring ai-observability llmops prompt-management llm-deployment llm-observability ai-infrastructure llm-gateway llm-stack awesome-llmops

Updated Jun 30, 2026
Python

SabaSyed / Latex-to-Code

Generates Python code from LaTeX expressions, using a synthetic dataset and the CodeT5-base model.

transformers postprocessing synthetic-data fine-tuning transformer-models t5-model inferencing large-language-models llm t5-base llm-deployment llm-models latex-to-python

Updated Oct 16, 2024
Jupyter Notebook

hexgrid-cloud / open-llm-benchmarks

Open-source LLM inference benchmarks: TTFT, TPOT, throughput, latency, and cost per token for models such as Llama, Qwen, Gemma, DeepSeek, and gpt-oss, deployed on different dedicated GPUs.

gpu inference benchmarks llama quantization tensor-rt llm vllm open-source-llm qwen llm-deployment sglang

Updated Jun 18, 2026
Python

mairwunnx-infra / zeroclaw

🧠 Infrastructure for deploying Zeroclaw, dockerized and compatible with Portainer

docker docker-compose portainer llm-deployment ai-deployment openclaw zeroclaw openclaw-docker zeroclaw-docker

Updated May 28, 2026
Dockerfile

shyamsridhar123 / llmops-project

LLMOps infrastructure and deployment pipelines using Bicep. Azure-based MLOps for large language model deployment and management.

azure infrastructure-as-code mlops bicep llmops llm-deployment

Updated Aug 16, 2024
Bicep

ajithvcoder / emlo4-session-16-ajithvcoder

AWS EKS + IRSA, volumes, Istio and KServe, Next.js app, FastAPI serving, Kubernetes, Helm charts, and multi-model or LLM deployment. The School of AI EMLO-V4 course assignment: https://theschoolof.ai/#programs

docker kubernetes nextjs istio aws-eks fastapi kserve llm-deployment

Updated Jan 26, 2025
Python

HAYDARKILIC / mlops_and_deployment

A 6-week hands-on masterclass in production MLOps engineering. Build a file-backed experiment tracker, a containerized model registry, an inference server with dynamic batching, a cached pipeline DAG, automated drift detectors (PSI/KS), and high-performance LLM serving infra (KV-cache/continuous batching) from scratch.