
LLM Development Landscape

Presented at Data + AI Day 2024
6th October 2024

Kamolphan Liwprasert


Transcript

  1. LLM Development Landscape ✨ Overview: the big picture of LLM app development ✨ Concepts worth knowing about LLMs ✨ Ways to develop LLM applications ✨ Frameworks you can choose from
  2. Definition: LLM. A large language model (LLM) is a computational model capable of language generation or other natural language processing tasks. https://en.wikipedia.org/wiki/Large_language_model
  3. Definition: Multimodal LLM. Multimodal = characterized by several different modes of activity or occurrence. https://research.google/blog/multimodal-medical-ai/
  4. Model Serving. The application (📱 💻 🌐) calls the model (🤖) through an API. This is the classic client-server pattern: the app is the client, and the model server hosts the model.
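The client-server pattern on this slide can be sketched end to end in plain Python: a stub "model server" answers JSON requests over HTTP, and any client app just POSTs a prompt to it. The `/generate` endpoint, the JSON shape, and the echo "model" are all illustrative assumptions, not any specific product's API.

```python
# Minimal sketch of the client-server pattern for model serving.
# The "model" is a stub that echoes the prompt back.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Server side: read the prompt, run the "model", return JSON.
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        data = json.dumps({"completion": f"Echo: {body['prompt']}"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):  # keep the demo quiet
        pass


def start_server() -> int:
    """Start the model server on an ephemeral port; return the port."""
    server = HTTPServer(("127.0.0.1", 0), ModelHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_address[1]


def client_generate(port: int, prompt: str) -> str:
    # Client side: any app (mobile, web, CLI) just POSTs JSON over HTTP.
    req = Request(
        f"http://127.0.0.1:{port}/generate",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["completion"]


port = start_server()
print(client_generate(port, "hello"))  # → Echo: hello
```

Swapping the echo stub for a real inference call (e.g. a vLLM or hosted-API backend) changes nothing on the client side, which is the point of the pattern.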
  5. Why self-host LLMs? 💲 Cost-efficient in the long term (e.g. on-premise), though you need to tune latency to make the model fast enough ⚙ Customization & fine-tuning, with no lock-in to a particular model 🔒 Security compliance & data residency / privacy
  6. LangChain 🦜🔗 A Python/JS framework for developing applications powered by large language models (LLMs). https://www.langchain.com/langchain
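The core idea LangChain builds on is composing a prompt template, a model call, and an output parser into one pipeline (a "chain"). The sketch below illustrates that concept in plain Python; `make_chain`, `fake_llm`, and the `ANSWER:` convention are hypothetical stand-ins, not the real LangChain API.

```python
# Conceptual sketch of a LangChain-style "chain":
# template -> model -> parser, composed into a single callable.
from typing import Callable


def make_chain(template: str,
               llm: Callable[[str], str],
               parser: Callable[[str], str]) -> Callable[[dict], str]:
    def run(inputs: dict) -> str:
        prompt = template.format(**inputs)  # 1) fill the prompt template
        raw = llm(prompt)                   # 2) call the model
        return parser(raw)                  # 3) parse the raw output
    return run


def fake_llm(prompt: str) -> str:
    # A real chain would send `prompt` to an LLM provider here.
    return f"ANSWER: paraphrase of [{prompt}]"


chain = make_chain(
    template="Summarize in one line: {text}",
    llm=fake_llm,
    parser=lambda raw: raw.removeprefix("ANSWER: ").strip(),
)
print(chain({"text": "LLMs generate language."}))
```

Real LangChain expresses the same composition declaratively (e.g. piping a prompt into a model into a parser), so swapping providers or parsers means changing one component, not the whole pipeline.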
  7. LlamaIndex. Turn your enterprise data into production-ready LLM applications. (Python / TypeScript) https://www.llamaindex.ai/
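The workflow LlamaIndex automates is: ingest documents, build an index, retrieve the most relevant chunk for a query, and hand it to the LLM as context. A toy sketch of that retrieval step, using naive keyword overlap where a real pipeline would use embeddings (all names and documents here are made up for illustration):

```python
# Toy retrieval over an in-memory "index": pick the document that
# shares the most words with the query. Real systems use embeddings.
docs = [
    "vLLM serves large language models with high throughput.",
    "LangChain composes prompts, models, and tools into chains.",
    "Bangkok hosts several developer community events each year.",
]


def score(query: str, doc: str) -> int:
    """Number of query words that appear in the document."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().replace(".", "").split()))


def retrieve(query: str) -> str:
    """Return the highest-scoring document for the query."""
    return max(docs, key=lambda d: score(query, d))


context = retrieve("how to serve language models")
print(context)  # the vLLM sentence wins: it shares "language" and "models"
```

The retrieved `context` would then be prepended to the user's question in the prompt, which is the retrieval-augmented generation (RAG) pattern these frameworks package up.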
  8. Semantic Kernel from Microsoft. Semantic Kernel is an SDK that integrates Large Language Models (LLMs) like OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C#, Python, and Java. https://github.com/microsoft/semantic-kernel
  9. vLLM = model serving for LLMs. Easy, fast, and cheap LLM serving for everyone. vLLM is fast with: ✅ State-of-the-art serving throughput ✅ Efficient management of attention key and value memory with PagedAttention ✅ Continuous batching of incoming requests ✅ Fast model execution with CUDA/HIP graphs ✅ Quantization: GPTQ, AWQ, SqueezeLLM, FP8 KV cache ✅ Optimized CUDA kernels https://github.com/vllm-project/vllm (Benchmark chart on the slide: throughput, higher is better)
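Continuous batching, one of the techniques listed above, can be illustrated with a toy scheduler: instead of waiting for a whole batch to finish (static batching), the server admits waiting requests into the running batch the moment a sequence completes. This is a simplified sketch of the scheduling idea only, not vLLM's implementation; the "decode steps" are arbitrary stand-ins for generation length.

```python
# Toy continuous-batching scheduler: refill free batch slots every step.
from collections import deque


def continuous_batching(requests: dict[str, int], max_batch: int) -> list[str]:
    """requests maps request id -> decode steps needed; returns finish order."""
    waiting = deque(requests.items())
    running: dict[str, int] = {}
    finished: list[str] = []
    while waiting or running:
        # Admit new work whenever a slot is free -- the key difference from
        # static batching, which only refills once the whole batch empties.
        while waiting and len(running) < max_batch:
            rid, steps = waiting.popleft()
            running[rid] = steps
        # One decode step for every sequence currently in the batch.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]
                finished.append(rid)
    return finished


print(continuous_batching({"a": 3, "b": 1, "c": 2, "d": 1}, max_batch=2))
```

Here "b" finishes after one step and its slot is immediately given to "c", so short requests are not stuck waiting behind long ones, which is where much of the throughput gain comes from.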
  10. Responsible AI ✅ Always verify correctness ✅ Human-centered design: design for the people who will use it ⚠ Watch out for data privacy ⚠ Biases and fairness: keep the system fair to its users
  11. Event announcements :) DevFest Cloud Bangkok by GDG Cloud Bangkok: Sunday 3 November 2024 @ K+ Building Samyan, register now: bit.ly/devfest-cloud-bkk24. Technologista by PyLadies x Women Techmakers: Saturday 26 October 2024 @ Cleverse, register now: bit.ly/technologista-2024