Install vLLM with pip:
pip install vllm
Load and run a model:
vllm serve "Qwen/Qwen3-0.6B"
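This starts an OpenAI-compatible server on http://localhost:8000. As an alternative to running a server, the model can also be loaded and run directly in Python with vLLM's offline inference API. A minimal sketch, assuming a local GPU; the sampling values below are illustrative, not required settings:

from vllm import LLM, SamplingParams

# Load the model once; vLLM handles GPU memory management and batching internally.
llm = LLM(model="Qwen/Qwen3-0.6B")

# Illustrative sampling parameters (assumed values, adjust as needed).
params = SamplingParams(temperature=0.7, max_tokens=128)

# generate() takes a list of prompts and returns one result per prompt.
outputs = llm.generate(["What is the capital of France?"], params)
print(outputs[0].outputs[0].text)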
Call the server with curl:
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen3-0.6B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
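Because the server exposes an OpenAI-compatible API, the same request can be made from Python with the openai client library. A minimal sketch; the api_key value is a placeholder that the client library requires but a default vLLM server does not check:

from openai import OpenAI

# Point the client at the local vLLM server; the key is a placeholder
# unless the server was configured to require one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)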
