Open
Labels
question (Further information is requested)
Description
Checklist
- I have searched existing issues, and this is a new question or discussion topic.
Question Description
How do I turn off thinking when running Qwen3 inference from Python code? I am on version 3.12 and following this documentation page:
https://swift.readthedocs.io/en/v3.12/Instruction/Inference-and-deployment.html
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.infer_engine import TransformersEngine, RequestConfig, InferRequest
model = 'Qwen/Qwen2.5-0.5B-Instruct'
# Load the inference engine
engine = TransformersEngine(model, max_batch_size=2)
request_config = RequestConfig(max_tokens=512, temperature=0)
# Using 2 infer_requests to demonstrate batch inference
infer_requests = [
    InferRequest(messages=[{'role': 'user', 'content': 'Who are you?'}]),
    InferRequest(messages=[{'role': 'user', 'content': 'Where is the capital of Zhejiang?'},
                           {'role': 'assistant', 'content': 'The capital of Zhejiang Province, China, is Hangzhou.'},
                           {'role': 'user', 'content': 'What are some fun places here?'}]),
]
resp_list = engine.infer(infer_requests, request_config)
query0 = infer_requests[0].messages[0]['content']
print(f'response0: {resp_list[0].choices[0].message.content}')
print(f'response1: {resp_list[1].choices[0].message.content}')
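One workaround I am considering, in case it helps frame the question: the Qwen3 model card describes a `/no_think` soft switch that can be placed in a user turn of the hybrid-thinking checkpoints. The sketch below only patches the message list and does not call any swift API. The helper name `with_no_think` is hypothetical, and I have not confirmed whether swift 3.12 offers a dedicated option for this, which is what I am asking about.

```python
from copy import deepcopy

# Soft switch described in the Qwen3 model card (hybrid-thinking checkpoints).
NO_THINK_TAG = '/no_think'


def with_no_think(messages):
    """Return a copy of `messages` with '/no_think' appended to the last user turn.

    Hypothetical helper for illustration only; it is not part of swift.
    """
    patched = deepcopy(messages)  # leave the caller's list untouched
    for msg in reversed(patched):
        if msg.get('role') == 'user' and isinstance(msg.get('content'), str):
            if NO_THINK_TAG not in msg['content']:
                msg['content'] = f"{msg['content']} {NO_THINK_TAG}"
            break  # only the most recent user turn is modified
    return patched


messages = [{'role': 'user', 'content': 'Who are you?'}]
patched = with_no_think(messages)
print(patched)
```

The patched list could then be passed as `InferRequest(messages=patched)` in the code above. Even with the switch, the model may still emit an empty thinking block, so this is a prompt-level workaround and not a real setting.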