Introduction to vLLM-Router

Project Introduction

vLLM-Router is a routing plugin contributed by Kunpeng to the vLLM open-source community. It aims to support data-parallel deployment and to provide high-performance request routing and load balancing. The router includes several load balancing algorithms, such as prefix-cache-aware, random, and round-robin, which can be selected to suit the deployment scenario and optimize overall system performance.
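To make the random and round-robin strategies concrete, here is a minimal sketch of how such policies could look. The class names `RoundRobinPolicy` and `RandomPolicy` and the `select` method are hypothetical and chosen for illustration; they are not taken from the vLLM-Router source.

```python
import itertools
import random


class RoundRobinPolicy:
    """Hypothetical policy: cycle through instances in a fixed order."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._cycle = itertools.cycle(list(instances))

    def select(self, prompt=None):
        # The prompt is ignored; each call returns the next instance in turn.
        return next(self._cycle)


class RandomPolicy:
    """Hypothetical policy: pick an instance uniformly at random."""

    def __init__(self, instances, seed=None):
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._rng = random.Random(seed)

    def select(self, prompt=None):
        # The prompt is ignored; every instance is equally likely.
        return self._rng.choice(self._instances)


# Example instance addresses (placeholders, not real endpoints).
rr = RoundRobinPolicy(["http://host-a:8000", "http://host-b:8000"])
picks = [rr.select() for _ in range(4)]
```

Neither policy looks at the prompt, so both spread load evenly but cannot exploit cached prefixes. That is the gap the prefix-cache-aware strategy is meant to fill.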

Among these algorithms, prefix-cache-aware routing sends prompts that share a prefix to the same service instance. Combined with vLLM's prefix caching feature, this significantly reduces the time to first token (TTFT) and improves end-to-end throughput, because the instance can reuse cached computation for the shared prefix instead of recomputing it.
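The idea of routing by shared prefix can be sketched with a small character-level trie. This is an illustrative sketch only: `PrefixAwareRouter`, `route`, and `min_match` are hypothetical names, and the actual tree in `src/tree.py` may differ (for example, it may match on tokens rather than characters and may evict old entries).

```python
class _Node:
    """Trie node that remembers which instance last served this prefix."""

    __slots__ = ("children", "instance")

    def __init__(self):
        self.children = {}
        self.instance = None


class PrefixAwareRouter:
    """Hypothetical router: reuse the instance with the longest matching prefix."""

    def __init__(self, instances, min_match=4):
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._min_match = min_match  # assumed threshold for a useful match
        self._root = _Node()
        self._next = 0  # round-robin cursor used when no prefix matches

    def _longest_match(self, prompt):
        # Walk the trie as far as the prompt allows.
        node, depth, instance = self._root, 0, None
        for ch in prompt:
            child = node.children.get(ch)
            if child is None:
                break
            node, depth, instance = child, depth + 1, child.instance
        return depth, instance

    def _insert(self, prompt, instance):
        # Record the chosen instance on every node along the prompt's path.
        node = self._root
        for ch in prompt:
            node = node.children.setdefault(ch, _Node())
            node.instance = instance

    def route(self, prompt):
        depth, instance = self._longest_match(prompt)
        if instance is None or depth < self._min_match:
            # No sufficiently long shared prefix: fall back to round-robin.
            instance = self._instances[self._next % len(self._instances)]
            self._next += 1
        self._insert(prompt, instance)
        return instance


router = PrefixAwareRouter(["instance-0", "instance-1"])
first = router.route("You are a helpful assistant. Question 1")
other = router.route("Translate the following text")
again = router.route("You are a helpful assistant. Question 2")
```

In this sketch the first and third prompts share a long prefix, so they land on the same instance, while the unrelated second prompt is assigned by round-robin. The trie grows without bound here; a production router would need some form of eviction and load awareness.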

Directory Structure

vllm-router/
├── router                    
│   ├── __init__.py        
│   └── protocol.py              # Routing API definition
├── src                          # Source code implementation directory
│   ├── __init__.py        
│   ├── router.py                # Routing function implementation
│   └── tree.py                  # Prefix cache tree implementation
├── test                      
│   ├── online_test1.py          # Simulating the first server
│   ├── online_test2.py          # Simulating the second server
│   ├── test_router.py           # Routing function unit test
│   ├── test_server.py           # Server unit test
│   └── test_tree.py             # Prefix cache tree unit test
├── utils                     
│   ├── __init__.py            
│   ├── error.py                 # Customized errors
│   └── logger.py                # Log
├── docs
│   └── en                       # Document directory
│       ├── api_reference.md     # API reference
│       ├── menu_vllm_router.md  # Documentation guide
│       ├── release_notes.md     # Basic information and feature updates of each released version
│       └── user_guide.md        # User guide
├── LICENSE                      # Open-source license file
├── CC-BY                        # Open-source document license file
├── README_en.md                 # Project description
├── launch_server.py             # Service startup entry script
└── requirements.txt             # Python dependency list

Release Notes

For details about each vLLM-Router version, see Release Notes.

Documents

Resource Type   Resource Name   Resource Description
Document        API Reference   Provides the vLLM-Router API description.
Document        Release Notes   Provides basic information and feature updates of each vLLM-Router version.
Document        User Guide      Provides a quick start guide for vLLM-Router.

Communication Matrix

Source Device: User server
Source IP Address: IP address of the user server
Source Port: *
Destination Device: Server running the vLLM-Router service
Destination IP Address: IP address of the server running the vLLM-Router service
Destination Port (Listening): 7000–9000
Protocol: HTTP/HTTPS
Port Description: Receiving inference requests from users
Listening Port Configurable (Yes/No): Yes
Authentication Mode: N/A
Encryption Mode: N/A
Plane: Service plane
Version: All
Special Scenario: None

Contribution Statement

We welcome your contributions to the community. If you have questions or suggestions, or want to request a feature or report a bug, you can submit an issue. For details, see the contribution guideline. You are also welcome to share insights in Discussions. Thank you for your support.

Disclaimer

This code repository contributes to the vLLM open-source project by adding the data-parallel deployment capability to vLLM. It strictly adheres to the coding style, methods, and security design of the native open-source software. Any vulnerabilities and security issues in the software shall be resolved by the corresponding upstream communities according to their response mechanisms. Please pay attention to the notifications and version updates released by the upstream communities. The Kunpeng computing community does not assume any responsibility for software vulnerabilities and security issues.

License

This project is released under the Apache License 2.0. For details, see LICENSE. The documents of this project are licensed under CC-BY 4.0. For details, see CC-BY.

Acknowledgments

vLLM-Router is developed by the following Huawei department:

Kunpeng Computing BoostKit Development Dept

Thank you to everyone in the community for your PRs. We warmly welcome contributions to vLLM-Router!