Introduction to vLLM-Router
Project Introduction
vLLM-Router is a routing plugin contributed by Kunpeng to the vLLM open-source community. It supports data parallel deployment and provides high-performance request routing and load balancing. The router includes several load balancing algorithms, such as prefix-cache-aware, random, and round-robin, which can be chosen to suit the deployment scenario and optimize overall system performance.
Among these algorithms, prefix-cache-aware routing sends prompts that share a prefix to the same service instance. Combined with vLLM's prefix caching feature, this lets the instance reuse the cached computation for the shared prefix, which significantly reduces the time to first token (TTFT) and improves end-to-end throughput.
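The idea behind prefix-cache-aware routing can be illustrated with a minimal sketch. This is not the code in src/router.py or src/tree.py; the class name `PrefixRouter`, the `min_match` threshold, the instance names, and the character-level matching are all assumptions made for illustration (vLLM's own prefix cache operates on blocks of tokens, not characters). The sketch keeps a small prefix tree that remembers which instance last served each prefix, and falls back to round-robin when no sufficiently long prefix is known.

```python
from itertools import cycle


class PrefixRouter:
    """Toy prefix-aware router (hypothetical, for illustration only)."""

    def __init__(self, instances, min_match=4):
        self.instances = list(instances)
        self.min_match = min_match               # assumed threshold, in characters
        self._fallback = cycle(self.instances)   # round-robin fallback
        self._root = {}                          # char -> [children, owning instance]

    def _longest_match(self, prompt):
        """Return (matched length, instance that last served that prefix)."""
        node, depth, owner = self._root, 0, None
        for ch in prompt:
            entry = node.get(ch)
            if entry is None:
                break
            node, owner = entry
            depth += 1
        return depth, owner

    def _insert(self, prompt, instance):
        """Record that `instance` now holds the cache for every prefix of `prompt`."""
        node = self._root
        for ch in prompt:
            entry = node.get(ch)
            if entry is None:
                entry = [{}, instance]
                node[ch] = entry
            else:
                entry[1] = instance
            node = entry[0]

    def route(self, prompt):
        """Pick the instance with the longest known prefix, else round-robin."""
        depth, owner = self._longest_match(prompt)
        if owner is None or depth < self.min_match:
            owner = next(self._fallback)
        self._insert(prompt, owner)
        return owner


router = PrefixRouter(["instance-0", "instance-1"])
first = router.route("You are a helpful assistant. Summarize report A.")
second = router.route("Translate this sentence into French.")
third = router.route("You are a helpful assistant. Summarize report B.")
print(first, second, third)
```

In this sketch the first and third prompts share a long system-prompt prefix, so they land on the same instance, while the unrelated second prompt is placed by the round-robin fallback. A production router would also need to weigh instance load and evict stale tree entries, which this sketch omits.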
Directory Structure
vllm-router/
├── router
│ ├── __init__.py
│ └── protocol.py # Routing API definition
├── src # Source code implementation directory
│ ├── __init__.py
│ ├── router.py # Routing function implementation
│ └── tree.py # Prefix cache tree implementation
├── test
│ ├── online_test1.py # Simulating the first server
│ ├── online_test2.py # Simulating the second server
│ ├── test_router.py # Routing function unit test
│ ├── test_server.py # Server unit test
│ └── test_tree.py # Prefix cache tree unit test
├── utils
│ ├── __init__.py
│ ├── error.py # Custom error definitions
│ └── logger.py # Logging
├── docs
│ └── en # Document directory
│   ├── api_reference.md # API reference
│   ├── menu_vllm_router.md # Documentation guide
│   ├── release_notes.md # Basic information and feature updates of each released version
│   └── user_guide.md # User guide
├── LICENSE # Open-source license file
├── CC-BY # Open-source document license file
├── README_en.md # Project description
├── launch_server.py # Service startup entry script
└── requirements.txt # Python dependency list
Release Notes
For version information and feature updates of vLLM-Router, see Release Notes.
Documents
| Resource Type | Resource Name | Resource Description |
|---|---|---|
| Document | API Reference | Provides the vLLM-Router API description. |
| Document | Release Notes | Provides basic information and feature updates of each vLLM-Router version. |
| Document | User Guide | Provides a quick start guide for vLLM-Router. |
Communication Matrix
| Source Device | Source IP Address | Source Port | Destination Device | Destination IP Address | Destination Port (Listening) | Protocol | Port Description | Listening Port Configurable (Yes/No) | Authentication Mode | Encryption Mode | Plane | Version | Special Scenario |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| User server | IP address of the user server | * | Server running the vLLM-Router service | IP address of the server running the vLLM-Router service | 7000–9000 | HTTP/HTTPS | Receiving inference requests from users | Yes | N/A | N/A | Service plane | All | None |
Contribution Statement
We welcome your contributions to the community. If you have questions or suggestions, or want to request a feature or report a bug, please submit an issue. For details, see the contribution guideline. You are also welcome to share insights in the Discussions. Thank you for your support.
Disclaimer
This code repository contributes to the vLLM open-source project by adding data parallel deployment capability to vLLM. It strictly adheres to the coding style, methods, and security design of the native open-source software. Any vulnerabilities and security issues in the software shall be resolved by the corresponding upstream communities according to their response mechanisms. Please pay attention to the notifications and version updates released by the upstream communities. The Kunpeng computing community does not assume any responsibility for software vulnerabilities and security issues.
License
This project is released under the Apache License 2.0. For details, see LICENSE. The documents of this project are licensed under CC-BY 4.0. For details, see CC-BY.
Acknowledgments
vLLM-Router is developed by the following Huawei department:
Kunpeng Computing BoostKit Development Dept
Thank you to everyone in the community for your PRs. We warmly welcome contributions to vLLM-Router!