BoostAI Infra

Getting Started

What's New
Lists the latest updates to the BoostAI Infra documentation.

Open-Source Enablement

Dify
Provides guidance on version compatibility verification and on source code installation, compilation, and build for the open-source software Dify.
FlashAttention
Provides guidance on installation verification and source code compilation and build for the open-source software FlashAttention.
LangChain
Provides guidance on basic compatibility verification and source code compilation and build for the open-source software LangChain.
LlamaIndex
Provides guidance on installation verification and source code compilation and build for the open-source software LlamaIndex.
NumPy
Provides guidance on installation verification and source code compilation and build for the open-source software NumPy.
Ollama
Provides guidance on installation verification and source code compilation and build for the open-source software Ollama.
OpenClaw
Provides guidance on installation verification and source code compilation and build for the open-source software OpenClaw.
Paddle Inference
Provides guidance on installation, basic verification, and source code compilation and build for the open-source software Paddle Inference.
PaddlePaddle
Provides guidance on installation, basic verification, and source code compilation and build for the open-source software PaddlePaddle.
PyTorch
Provides guidance on installation, basic function verification, and source code compilation and build for the open-source software PyTorch.
Safetensors
Provides guidance on installation, basic function verification, and source code compilation and build for the open-source software Safetensors.
SGLang
Provides guidance on installation verification and source code compilation and build for the open-source software SGLang.
TensorFlow
Provides guidance on installation, basic function verification, and source code compilation and build for the open-source software TensorFlow.
Tokenizers
Provides guidance on installation verification and source code compilation and build for the open-source software Tokenizers.
Transformers
Provides guidance on installation verification and source code compilation and build for the open-source software Transformers.
vLLM
Provides guidance on version compatibility verification, basic installation verification, and source code compilation and build for the open-source software vLLM.
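
As a rough illustration of what an installation check for the Python packages above might look like, the following minimal sketch reports the installed version of each package using the standard library. The distribution names in PACKAGES are assumed PyPI names, not names taken from the BoostAI Infra guides, and installed_versions is a hypothetical helper. Each guide describes its own verification steps, which take precedence over this sketch.

```python
from importlib import metadata

# Assumed PyPI distribution names (not taken from the guides);
# adjust the list to the packages actually installed.
PACKAGES = [
    "numpy",
    "torch",
    "tensorflow",
    "transformers",
    "tokenizers",
    "safetensors",
    "vllm",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

A version string only shows that the package metadata is present. It does not confirm that the package imports or runs correctly on the target platform, which is what the functional verification steps in the individual guides are for.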

Acceleration

CMF
Cache Management Framework (CMF) is developed for the Kunpeng hardware platform and consists of a kernel-mode driver and a command-line tool. It modifies hardware registers to control the allocation of system resources such as the L2 cache.
Core Isolation
Dynamic core isolation is an optimization solution used on servers that execute both intelligent computing tasks and general-purpose computing tasks. It reduces resource contention between different tasks and minimizes the latency jitter of tasks such as data preparation and operator delivery for intelligent computing tasks.
vLLM-Router
vLLM-Router is a Kunpeng routing plugin for the vLLM open-source community. It aims to support data-parallel deployment and to provide high-performance request routing and load balancing.
vLLM-ops
vLLM-ops provides optimization patches for the Kunpeng platform, based on the open-source community projects vLLM and vLLM-metax.

Tuning Guide

Kunpeng 920 + Atlas 800I A2 Inference Server
Details the deployment procedures for the vLLM, vLLM-Ascend, and MindIE Turbo frameworks on Atlas 800I A2 inference servers with Kunpeng 920 processors, and covers how to run and tune the DeepSeek 70B model.
Kunpeng 920 + Atlas 300I Duo Inference Card
Details the deployment of the DeepSeek 70B model in an environment with Kunpeng 920 and two Atlas 300I Duo inference cards, along with performance tuning procedures.