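TensorRT is NVIDIA's deep learning inference optimization library. Depending on the model, the GPU and the precision mode, it can make PyTorch models roughly 2-5x faster.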
Pros:
Cons:
Pros:
Cons:
```bash
# Install Torch-TensorRT
pip install torch-tensorrt
# Verify the installation
python -c "import torch_tensorrt; print('TensorRT installed successfully')"
```
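The one-liner above ends in a traceback when the package is missing. The sketch below is a gentler pre-flight check that uses only the standard library. It calls `importlib.util.find_spec` to see whether `torch` and `torch_tensorrt` are installed, and it queries `torch.cuda.is_available()` only when torch is present. The function name `check_tensorrt_setup` is hypothetical.

```python
import importlib
import importlib.util


def check_tensorrt_setup() -> dict:
    """Report whether torch, torch_tensorrt and a CUDA device are usable.

    find_spec returns None for a missing top-level package instead of
    raising, so a missing install shows up as False rather than a traceback.
    """
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        "torch_tensorrt": importlib.util.find_spec("torch_tensorrt") is not None,
        "cuda": False,
    }
    if report["torch"]:
        torch = importlib.import_module("torch")
        report["cuda"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_tensorrt_setup().items():
        print(f"{name:15s} {'OK' if ok else 'missing'}")
```

`find_spec` only shows that the package is installed. It does not show that the package loads against your installed torch and CUDA versions, so the `import torch_tensorrt` one-liner remains the definitive test.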
Add the following to `demo_lightglue_camera_position_async.py`:

1. Add the import at the top of the file:
```python
try:
    from tensorrt_wrapper import create_tensorrt_models
    TENSORRT_AVAILABLE = True
except ImportError:
    # tensorrt_wrapper (or a dependency it imports) is missing: keep plain PyTorch
    TENSORRT_AVAILABLE = False
```
2. Add the arguments to the argparse setup:
```python
parser.add_argument("--use_tensorrt", action="store_true",
                    help="Use TensorRT optimized models")
parser.add_argument("--tensorrt_precision", type=str, default="fp16",
                    choices=["fp32", "fp16", "int8"],
                    help="TensorRT precision mode")
```
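Below is a self-contained sketch of how the two flags behave once they are added. The parser is a stand-in for the demo's real parser. It leaves out the demo's other arguments, such as `--input` and `--max_keypoints`, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Stand-in parser holding only the two TensorRT-related flags.
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_tensorrt", action="store_true",
                        help="Use TensorRT optimized models")
    parser.add_argument("--tensorrt_precision", type=str, default="fp16",
                        choices=["fp32", "fp16", "int8"],
                        help="TensorRT precision mode")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # No flags: TensorRT disabled, precision falls back to the fp16 default.
    print(parser.parse_args([]))
    # Explicit INT8 request.
    print(parser.parse_args(["--use_tensorrt", "--tensorrt_precision", "int8"]))
```

Because of `choices`, argparse rejects any other precision (for example `bf16`) with a usage error and exit status 2. The wrapper therefore never sees an unsupported value.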
3. After the models are loaded (around line 338), add:
```python
print("Loaded SuperPoint and LightGlue models")  # existing line

# TensorRT optimization
if opt.use_tensorrt and TENSORRT_AVAILABLE and device == "cuda":
    try:
        print("Compiling models with TensorRT...")
        extractor, matcher = create_tensorrt_models(
            extractor, matcher, precision=opt.tensorrt_precision
        )
        print("✓ TensorRT models compiled successfully")
    except Exception as e:
        print(f"✗ Failed to compile with TensorRT: {e}")
        print("Falling back to PyTorch models")
```
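The block above combines three guards (the flag is set, the wrapper imported, the device is CUDA) with a try/except fallback. The sketch below isolates that same control flow so it can be tested without a GPU. `maybe_optimize` and the injected `compile_fn` are hypothetical names, and `compile_fn` stands in for `create_tensorrt_models`.

```python
from typing import Any, Callable, Tuple


def maybe_optimize(
    extractor: Any,
    matcher: Any,
    compile_fn: Callable[..., Tuple[Any, Any]],
    *,
    use_tensorrt: bool,
    tensorrt_available: bool,
    device: str,
    precision: str = "fp16",
) -> Tuple[Any, Any, bool]:
    """Return (extractor, matcher, used_tensorrt).

    Keeps the original PyTorch models if any guard fails or compilation raises.
    """
    if not (use_tensorrt and tensorrt_available and device == "cuda"):
        return extractor, matcher, False
    try:
        new_extractor, new_matcher = compile_fn(extractor, matcher, precision=precision)
    except Exception as e:  # same broad catch as the demo code
        print(f"✗ Failed to compile with TensorRT: {e}")
        print("Falling back to PyTorch models")
        return extractor, matcher, False
    return new_extractor, new_matcher, True
```

The original models survive a failed compile because Python performs a tuple assignment only after the call returns. The demo code has the same property: `extractor, matcher = create_tensorrt_models(...)` leaves both names untouched if the call raises.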
```bash
# First run (compiles the models; takes 5-15 minutes)
python demo_lightglue_camera_position_async.py \
    --input "udp://0.0.0.0:12346" \
    --max_keypoints 128 \
    --use_fp16 \
    --use_tensorrt \
    --tensorrt_precision fp16

# Second run (loads the already compiled models, so it starts quickly)
python demo_lightglue_camera_position_async.py \
    --input "udp://0.0.0.0:12346" \
    --max_keypoints 128 \
    --use_tensorrt
```
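The second run is fast only if the compiled models are saved and reused. The guide does not show how `tensorrt_wrapper.py` does this. The sketch below shows one common approach, using a cache directory and a hypothetical `load_or_build` helper. It derives a cache key from everything the compiled engine depends on and rebuilds whenever that key changes. Serialization is abstracted as a `build()` callable that returns bytes.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(**config) -> str:
    """Stable short hash of everything the compiled engine depends on."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]


def load_or_build(cache_dir: Path, config: dict, build: Callable[[], bytes]) -> bytes:
    """Return a cached serialized engine, calling build() only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"engine_{cache_key(**config)}.bin"
    if path.exists():
        return path.read_bytes()  # fast path: later runs
    engine = build()              # slow path: first run (minutes)
    path.write_bytes(engine)
    return engine
```

For the key, include at least the precision, the input size (here driven by `max_keypoints`), the GPU model and the TensorRT version. Serialized TensorRT engines are tied to the GPU and TensorRT version they were built with.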
| Mode | FPS | Speedup |
|---|---|---|
| PyTorch FP32 | ~15-18 | Baseline |
| PyTorch FP16 | ~22-24 | +30% |
| TensorRT FP16 | ~35-45 | +100-150% |
| TensorRT INT8 | ~50-60 | +200-250% |
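The speedup column is the FPS ratio against the PyTorch FP32 baseline, minus one. The check below uses the midpoint of each FPS range. Using midpoints is an assumption, since the table only gives ranges. Per-frame latency is included as well, because that is what limits a real-time camera pipeline.

```python
def midpoint(lo: float, hi: float) -> float:
    return (lo + hi) / 2


def speedup_pct(fps: float, baseline_fps: float) -> float:
    """Throughput gain over the baseline, in percent."""
    return (fps / baseline_fps - 1) * 100


def latency_ms(fps: float) -> float:
    """Average time per frame at the given throughput."""
    return 1000.0 / fps


# FPS ranges copied from the table above.
TABLE = {
    "PyTorch FP32": (15, 18),
    "PyTorch FP16": (22, 24),
    "TensorRT FP16": (35, 45),
    "TensorRT INT8": (50, 60),
}

if __name__ == "__main__":
    base = midpoint(*TABLE["PyTorch FP32"])
    for mode, (lo, hi) in TABLE.items():
        fps = midpoint(lo, hi)
        print(f"{mode:14s} {fps:5.1f} FPS  {latency_ms(fps):5.1f} ms/frame  "
              f"{speedup_pct(fps, base):+6.0f}%")
```

With midpoints, TensorRT FP16 comes out at about +142% (40 vs 16.5 FPS, 25 ms per frame). TensorRT INT8 comes out at about +233% (55 FPS, about 18 ms per frame). Both fall inside the ranges in the table.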
Error: `torch_tensorrt not found`
Fix: install the package with `pip install torch-tensorrt`.
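Error: `CUDA out of memory` or CUDA version mismatch
Fix:
- Use `nvidia-smi` to check GPU memory usage and the driver's CUDA version
- Reduce `workspace_size` or `max_keypoints`

Error: `Unsupported operation: XXX`
Fix:
- Exclude these operations with the `torch_executed_ops` parameter so that they run in PyTorch instead of TensorRT

To apply these fixes, edit the compile call in `tensorrt_wrapper.py`: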
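```python
self.trt_model = torch_tensorrt.compile(
    self.model,
    inputs=example_input,                   # list of example tensors / torch_tensorrt.Input specs
    enabled_precisions={torch.half},        # allow FP16 kernels
    workspace_size=2 << 30,                 # 2 GiB workspace; lower it on CUDA out of memory
    min_block_size=7,                       # min. consecutive convertible ops per TensorRT block
    torch_executed_ops=["unsupported_op"],  # ops kept in PyTorch (replace with the real op name)
)
```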
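`workspace_size=2 << 30` is a left shift: 2 × 2^30 = 2,147,483,648 bytes, or 2 GiB. The advice for `CUDA out of memory` is to lower this value, so one option is to retry compilation with progressively smaller workspaces. The sketch below rests on two assumptions. First, compilation is wrapped in a hypothetical callable `compile_with_workspace(size_bytes)`. Second, an out-of-memory failure surfaces as a `RuntimeError` whose message contains "out of memory". That is how PyTorch reports CUDA OOM, but the exception raised during TensorRT compilation may differ.

```python
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")

GIB = 1 << 30  # 1 GiB in bytes, so 2 << 30 == 2 * GIB


def workspace_candidates(start_gib: int = 2, min_mib: int = 256) -> Iterator[int]:
    """Yield workspace sizes in bytes, halving from start_gib down to min_mib."""
    size = start_gib * GIB
    while size >= min_mib << 20:
        yield size
        size //= 2


def compile_with_fallback(compile_with_workspace: Callable[[int], T]) -> T:
    """Try each workspace size until compilation no longer runs out of memory."""
    last_error: Optional[Exception] = None
    for size in workspace_candidates():
        try:
            return compile_with_workspace(size)
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise  # unrelated failure (e.g. unsupported op): don't mask it
            last_error = e
    raise RuntimeError(
        "TensorRT compilation ran out of memory at every workspace size"
    ) from last_error
```

A smaller workspace lowers peak memory during the build. The trade-off is that TensorRT may skip kernels that need more scratch space, so the resulting engine can be somewhat slower.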
```bash
# INT8 needs calibration data (use real images)
python demo_lightglue_camera_position_async.py \
    --use_tensorrt \
    --tensorrt_precision int8
```
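INT8 calibration estimates activation ranges from sample inputs, so the images should resemble what the camera actually sees. The sketch below assumes frames have already been captured into a list, for example from the UDP stream, and both helper names are hypothetical. It picks evenly spaced frames so that the calibration set covers the whole recording rather than a single moment, then groups them into batches for the calibrator.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def select_calibration_frames(frames: Sequence[T], n: int) -> List[T]:
    """Pick n frames spread evenly across the recording (all of them if fewer)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(frames) <= n:
        return list(frames)
    step = len(frames) / n
    return [frames[int(i * step)] for i in range(n)]


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches; the last batch may be smaller."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

Even spacing matters for a moving camera: consecutive frames are nearly identical, so a contiguous slice would under-represent lighting and scene changes.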
- `tensorrt_wrapper.py`: TensorRT wrapper
- `convert_to_tensorrt.py`: ONNX → TensorRT conversion script
- `tensorrt_optimization_guide.md`: detailed guide
- `tensorrt_integration_example.py`: integration example