Quick fix:

Error when loading the model: NPU function error: aclrtSynchronizeStream(stream_), error code is 107003

Bash

export PYTORCH_NPU_ALLOC_CONF=expandable_segments:False

Or

Python

import os
os.environ['PYTORCH_NPU_ALLOC_CONF'] = 'expandable_segments:False'
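
Note: as far as I understand, the NPU caching allocator reads PYTORCH_NPU_ALLOC_CONF when it initializes, so the Python variant should run before torch / torch_npu (and transformers, which imports them). A minimal sketch under that assumption, reusing the model path from the reproduction below:

Python

import os
# Assumption: this must be set before torch / torch_npu is imported,
# so the NPU caching allocator picks it up at initialization.
os.environ['PYTORCH_NPU_ALLOC_CONF'] = 'expandable_segments:False'

import torch
import torch_npu  # registers the NPU backend
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_path = "/home/zsy/zsy/Models/Qwen2-VL-7B-Instruct"  # path from this post
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_path)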

1. Environment

  • OS: aarch64 GNU/Linux, EulerOS V2R13

  • Device: Ascend 910B3

  • torch==2.1.0

  • torch-npu==2.1.0.post3-20240523

  • python=3.9.10

  • CANN 8.0.T13

2. Symptom (with error-log context)

Error occurs when loading the model:

# Load Qwen2-VL-7B-Instruct from a local checkpoint; the failure occurs here.
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "/home/zsy/zsy/Models/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("/home/zsy/zsy/Models/Qwen2-VL-7B-Instruct")

Error traceback:

(agent) [root@03e78bdcc010 Vid-RAG]# python test_env.py
tools module loaded
/home/ma-user/anaconda3/envs/agent/lib/python3.9/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards:   0%|                                                            | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/zsy/zsy/Agent-4-Video/Vid-RAG/test_env.py", line 43, in <module>
    model = Qwen2VLForConditionalGeneration.from_pretrained(
  File "/home/ma-user/anaconda3/envs/agent/lib/python3.9/site-packages/transformers/modeling_utils.py", line 4264, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/ma-user/anaconda3/envs/agent/lib/python3.9/site-packages/transformers/modeling_utils.py", line 4777, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/ma-user/anaconda3/envs/agent/lib/python3.9/site-packages/transformers/modeling_utils.py", line 942, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/ma-user/anaconda3/envs/agent/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 397, in set_module_tensor_to_device
    clear_device_cache()
  File "/home/ma-user/anaconda3/envs/agent/lib/python3.9/site-packages/accelerate/utils/memory.py", line 56, in clear_device_cache
    torch.npu.empty_cache()
  File "/home/ma-user/anaconda3/envs/agent/lib/python3.9/site-packages/torch_npu/npu/memory.py", line 143, in empty_cache
    torch_npu._C._npu_emptyCache()
RuntimeError: unmapHandles:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:401 NPU function error: aclrtSynchronizeStream(stream_), error code is 107003
[ERROR] 2025-01-03-22:31:30 (PID:91978, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[Error]: The stream is not in the current context.
        Check whether the context where the stream is located is the same as the current context.
EE9999: Inner Error!
EE9999: 2025-01-03-22:31:30.241.883  Stream synchronize failed, stream is not in current ctx, stream_id=2.[FUNC:StreamSynchronize][FILE:api_impl.cc][LINE:1018]
        TraceBack (most recent call last):
        rtStreamSynchronize execute failed, reason=[stream not in current context][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        synchronize stream failed, runtime result = 107003[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]

3. Solution

huangyunlong, 2 months ago (on the Gitee issue linked below):

Try turning off virtual memory (expandable segments) and see whether the error goes away:

export PYTORCH_NPU_ALLOC_CONF=expandable_segments:False
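
For context, the traceback fails inside torch_npu._C._npu_emptyCache() at unmapHandles, apparently while releasing expandable-segment mappings, which is why disabling the feature avoids the error. A minimal sanity-check sketch, assuming a single visible device npu:0 (a hypothetical test, not from the original issue):

Python

import os
os.environ['PYTORCH_NPU_ALLOC_CONF'] = 'expandable_segments:False'  # set before importing torch_npu

import torch
import torch_npu

# Allocate, free, then empty the cache; with expandable segments disabled,
# this empty_cache() should no longer raise error 107003.
x = torch.ones(1024, 1024, device="npu:0")
del x
torch.npu.empty_cache()
print("empty_cache OK")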

Source: RuntimeError: unmapHandles:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:401 NPU function error: aclrtSynchronizeStream(stream_), error code is 107003 · Issue #IB361T · Ascend/pytorch - Gitee.com
