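Digital Image Processing Study Project (4): Gesture Recognition
In automated vision testing, gesture recognition is the key step.
Abstract
● Designed a gesture recognition scheme based on traditional machine learning: the hand region is segmented through YCrCb color-space conversion, Otsu thresholding, and morphological processing, and Freeman chain code analysis is used to extract gesture-direction features and fingertip-shape features.
● Built a gesture database of 126 images (including samples with complex backgrounds and abnormal angles), developed an SVM classifier for static gesture recognition (about 80% accuracy, reported per-image time < 0.005 ms), and implemented real-time dynamic recognition with OpenCV.
● Designed an abnormal-gesture detection mechanism: chain code feature analysis triggers on-screen text feedback for non-pointing gestures such as a closed fist or an open hand.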
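1. Project requirements
In automated vision testing, gesture recognition is the key step.
(1) Design a scheme for hand segmentation, feature extraction, and recognition. Abnormal gestures must trigger a text prompt or voice feedback. Note: deep-learning gesture recognition methods are not allowed; traditional machine learning methods such as BP neural networks or linear discriminant methods may be used.
(2) Build a small gesture image library of at least 120 images: at least 80 with a plain background and a normal angle (within ±30 degrees), at least 20 with a complex background, and at least 20 with an abnormal angle or abnormal gesture.
(3) Perform static gesture recognition and evaluate its performance (recognition rate and recognition speed).
(4) Perform dynamic gesture recognition and evaluate its performance.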
2. Experimental procedure
(1) Key processing steps
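The abstract lists Otsu thresholding on the Cr channel as one of the key segmentation steps. As a rough illustration of what that step computes, the pure-Python sketch below picks the threshold that maximizes the between-class variance over a list of 8-bit values. The function name `otsu_threshold` and the sample `cr_values` are made up for this example; the project code itself calls `cv2.threshold` with `cv2.THRESH_OTSU`, whose returned threshold may differ when several thresholds are equally good.

```python
def otsu_threshold(values):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Values <= t form one class, values > t form the other.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_low, sum_low = 0, 0
    for t in range(256):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, variance is undefined
        mean_low = sum_low / count_low
        mean_high = (total_sum - sum_low) / count_high
        # Between-class variance, up to a constant factor of 1 / total**2
        between_var = count_low * count_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Hypothetical Cr values: a background cluster and a brighter "skin" cluster.
cr_values = [100] * 50 + [105] * 50 + [150] * 30 + [155] * 30
t = otsu_threshold(cr_values)
mask = [1 if v > t else 0 for v in cr_values]  # 1 marks the brighter cluster
```

Any threshold between the two clusters separates them equally well here, so the sketch simply keeps the first one it finds.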
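(2) Basis for recognition
Two feature components are used for recognition: (1) the direction code of the longest run of consecutive identical codes in the chain code sequence; (2) the number of times a direction code is followed, between the 40th and the 65th subsequent chain code, by the code pointing in the opposite direction.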
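For (1), the rationale is that the base of the hand always lies on the image border, so the chain code along that cut is necessarily a run of identical codes, which gives one cue for the pointing direction. This cue cannot determine the direction reliably when the hand lies along a diagonal of the image.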
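To make feature (1) concrete, here is a minimal pure-Python sketch that encodes a closed boundary as an 8-direction chain code and then finds the longest run. It reuses the same (dx, dy) to code mapping as the scripts in section 3; the helper names `freeman_chain_code` and `longest_run_code` and the rectangular `boundary` are hypothetical, chosen only for illustration.

```python
from itertools import groupby

# Same (dx, dy) -> code mapping as direction_map in section 3.
DIRECTION_MAP = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def freeman_chain_code(points):
    """Encode a closed, 8-connected boundary (list of (x, y)) as direction codes."""
    n = len(points)
    codes = []
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        codes.append(DIRECTION_MAP[(x1 - x0, y1 - y0)])
    return codes


def longest_run_code(codes):
    """Return (code, run_length) for the longest run of identical codes."""
    best_code, best_len = None, 0
    for code, group in groupby(codes):
        run_len = len(list(group))
        if run_len > best_len:  # ties keep the earliest run
            best_code, best_len = code, run_len
    return best_code, best_len


# Hypothetical boundary: a rectangle spanning x = 0..5, y = 0..2.
boundary = (
    [(x, 0) for x in range(0, 5)]          # top edge, moving +x
    + [(5, y) for y in range(0, 2)]        # right edge, moving +y
    + [(x, 2) for x in range(5, 0, -1)]    # bottom edge, moving -x
    + [(0, y) for y in range(2, 0, -1)]    # left edge, moving -y
)
codes = freeman_chain_code(boundary)
run_code, run_len = longest_run_code(codes)
```

Unlike a real hand contour, this toy rectangle has two equally long runs (codes 0 and 4); the sketch reports the first one.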
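For (2), the goal is to separate gestures that indicate a direction from those that do not (such as a closed fist or an open hand), and the key is recognizing the shape of a finger. Assuming that only the index finger is extended when pointing, the contour around the fingertip follows a regular pattern in the chain code: after traversing roughly one finger width, the direction code reverses by 180°. For a hand contour simplified by morphological processing, this change is regular. The accuracy depends heavily on the assumed finger width, which changes with the size of the finger and its distance from the camera; because of the limits of the gesture image library, the width is restricted to a fixed range here.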
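The reversal count in feature (2) can be sketched as follows. This is a simplified variant written for illustration, not the exact loop used in section 3 (which treats the tail of the sequence separately): for each start position it looks ahead, wrapping around the closed contour, for the first code pointing the opposite way, counts it only if it is at least `min_gap` steps away, and then resumes from that position. The name `count_reversals` and the small gap values in the example are assumptions.

```python
def count_reversals(codes, min_gap=40, max_gap=65):
    """Count first-occurrence 180-degree reversals with min_gap <= gap < max_gap."""
    n = len(codes)
    count = 0
    i = 0
    while i < n:
        opposite = (codes[i] + 4) % 8  # code pointing the opposite way
        step = 1
        for offset in range(1, max_gap):
            if codes[(i + offset) % n] == opposite:
                if offset >= min_gap:
                    count += 1
                step = offset  # resume scanning from the reversal
                break
        i += step
    return count


# Hypothetical toy contour: a 10 x 4 rectangle, with gaps scaled down to match.
toy = [0] * 10 + [2] * 4 + [4] * 10 + [6] * 4
reversals = count_reversals(toy, min_gap=3, max_gap=8)

# A 1 x 1 square reverses after only 2 steps, which is below min_gap.
too_narrow = count_reversals([0, 2, 4, 6], min_gap=3, max_gap=8)
```

The example shows why the result is sensitive to the chosen range: the same shape is counted or ignored depending on how the gap compares with its width.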
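(3) Gesture image library
With help from classmates, I took 126 photos as the gesture image library and split them into 97 training images and 29 test images.
Training set: 16 images for each of the four directions (including 3 with a complex background), plus 18 "fist" images and 15 "open hand" images used as "cannot see clearly (unknown)" data (including 3 with a complex background).
Test set: 5 images for each of the four directions (including 2 with a complex background), plus 4 "fist" images (including 2 with a complex background) and 5 "open hand" images.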
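The scripts in section 3 read the class from the numeric prefix of each training file name (for example `0_3.png` for "up"). A small sketch like the one below can be used to check that a folder listing matches the split described above. The helper `count_labels` and the generated `train_files` list are hypothetical; a real check would pass the result of `os.listdir` instead.

```python
from collections import Counter

# Same label ids as label_map in section 3.
LABEL_NAMES = {0: 'up', 1: 'down', 2: 'left', 3: 'right', 4: 'unknown'}


def count_labels(filenames):
    """Count images per class from names of the form '<label>_<index>.png'."""
    counts = Counter()
    for name in filenames:
        prefix = name.split('_')[0]
        if name.endswith('.png') and prefix.isdigit() and int(prefix) in LABEL_NAMES:
            counts[LABEL_NAMES[int(prefix)]] += 1
    return counts


# Hypothetical listing that mirrors the training split described above.
train_files = [f"{label}_{i}.png" for label in range(4) for i in range(16)]
train_files += [f"4_{i}.png" for i in range(18 + 15)]  # fist + open hand
counts = count_labels(train_files)
```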
3. Implementation code
Note: to save time, the code was written with the help of generative AI.
(1) Static gesture recognition
import cv2
import numpy as np
from sklearn.svm import SVC
import os
import time
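def get_chain_code(img, threshold=200):
    # Find outer contours only and keep every boundary point.
    # The threshold argument is unused; it is kept so existing calls still work.
    contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    # Direction map for the 8-direction chain code: (dx, dy) -> code
    direction_map = {
        (1, 0): 0,
        (1, 1): 1,
        (0, 1): 2,
        (-1, 1): 3,
        (-1, 0): 4,
        (-1, -1): 5,
        (0, -1): 6,
        (1, -1): 7
    }
    all_chain_codes = []
    for contour in contours:
        n = len(contour)
        if n < 2:
            # A single-pixel contour has no direction; skip it
            continue
        chain_code = []
        for i in range(n):
            # Current point and next point
            current = contour[i][0]
            next_point = contour[(i + 1) % n][0]  # wrap around to close the contour
            # Coordinate difference
            dx = int(next_point[0]) - int(current[0])
            dy = int(next_point[1]) - int(current[1])
            # Check that the two points are 8-neighbors
            if abs(dx) > 1 or abs(dy) > 1:
                raise ValueError("Contour points are not 8-connected")
            # Look up the chain code and append it
            code = direction_map.get((dx, dy))
            if code is None:
                raise ValueError(f"Invalid direction vector: ({dx}, {dy})")
            chain_code.append(code)
        all_chain_codes.append(chain_code)
    return all_chain_codes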
def analyze_chain_code(chain_code):
    if not chain_code:
        raise ValueError("chain_code must not be empty")
    # 1. Find the direction code of the longest run of identical codes
    max_length = 0
    current_length = 1
    max_code = None
    for i in range(1, len(chain_code)):
        if chain_code[i] == chain_code[i - 1]:
            current_length += 1
        else:
            if current_length > max_length:
                max_length = current_length
                max_code = chain_code[i - 1]
            current_length = 1
    # Check the final run
    if current_length > max_length:
        max_length = current_length
        max_code = chain_code[-1]
    # 2. Count 180° reversals that occur roughly 40 to 65 chain codes apart
    change_count = 0
    i = 0
    length = len(chain_code)
    while i < length - 65:
        opposite_code = (chain_code[i] + 4) % 8  # code pointing the opposite way
        # Look for the first opposite code within the following window
        for j in range(i + 1, i + 65):
            if chain_code[j] == opposite_code:
                if j >= i + 40:
                    change_count += 1
                i = j - 1  # resume from j after the increment below
                break
        i += 1
    # Tail: the window for the last 65 start positions runs past the end of the
    # list, so indices wrap around (the contour is closed)
    i = max(0, length - 65)
    while i < length:
        opposite_code = (chain_code[i] + 4) % 8
        for offset in range(1, 65):
            if chain_code[(i + offset) % length] == opposite_code:
                if offset >= 40:
                    change_count += 1
                i = length  # stop after the first reversal found in the tail
                break
        i += 1
    return max_code, change_count
# Load gesture data
def load_gesture_data(data_folder):
    images = []
    labels = []
    label_map = {0: 'up', 1: 'down', 2: 'left', 3: 'right', 4: 'unknown'}
    reverse_label_map = {'up': 'up', 'down': 'down', 'left': 'left', 'right': 'right', 'unknown': 'unknown'}
    # Iterate over every image file
    for image_name in os.listdir(data_folder):
        if not image_name.endswith('.png'):
            continue
        image_path = os.path.join(data_folder, image_name)
        image = cv2.imread(image_path)
        if image is not None:
            # The label is the numeric prefix before the first underscore
            label_str = image_name.split('_')[0]
            try:
                label = int(label_str)
                if label in label_map:
                    images.append(image)
                    labels.append(label)
            except ValueError:
                continue
    return images, labels, label_map, reverse_label_map
# Image preprocessing and feature extraction
def preprocess_and_extract_features(image):
    # (1) Convert the BGR image (as loaded by OpenCV) to YCrCb and take the Cr channel
    ycrcb_image = cv2.cvtColor(image, cv2.COLOR_BGR2YCrCb)
    cr_channel = ycrcb_image[:, :, 1]
    # (2) Apply Gaussian blur to the Cr channel
    blurred_cr = cv2.GaussianBlur(cr_channel, (5, 5), 0)
    # (3) Binarize the blurred Cr channel with Otsu's threshold
    _, binary_image = cv2.threshold(blurred_cr, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Morphological cleanup: opening followed by closing
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (50, 50))
    binary_image = cv2.morphologyEx(binary_image, cv2.MORPH_OPEN, kernel)  # opening
    binary_image = cv2.morphologyEx(binary_image, cv2.MORPH_CLOSE, kernel)  # closing
    features = []
    chain_codes = get_chain_code(binary_image)
    for code in chain_codes:
        max_code, change_count = analyze_chain_code(code)
        features.append([max_code, change_count])
    return features, binary_image
# Create the output folder (the path is left blank here; set it before running)
output_folder = r''
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
# Data folder paths (left blank here; set them before running)
train_data_folder = r''
test_data_folder = r''
# Load training data
train_images, train_labels, label_map, reverse_label_map = load_gesture_data(train_data_folder)
# Extract features from every training image, keeping labels aligned with features
all_train_features = []
kept_train_labels = []
for image, label in zip(train_images, train_labels):
    features, _ = preprocess_and_extract_features(image)
    if features:  # make sure at least one contour produced features
        all_train_features.append(features[0])
        kept_train_labels.append(label)
# Convert features and labels to numpy arrays
X_train = np.array(all_train_features)
y_train = np.array(kept_train_labels)
# Create and train the SVM classifier
svm_clf = SVC(kernel='rbf', C=1.0)
svm_clf.fit(X_train, y_train)
# Load test data
test_images = []
test_image_names = []
# Iterate over every test image file
for image_name in os.listdir(test_data_folder):
    if not image_name.endswith('.png'):
        continue
    image_path = os.path.join(test_data_folder, image_name)
    image = cv2.imread(image_path)
    if image is not None:
        test_images.append(image)
        test_image_names.append(image_name)
# Extract features from every test image
all_test_features = []
binary_test_images = []
for image in test_images:
    features, binary_image = preprocess_and_extract_features(image)
    if features:  # make sure at least one contour produced features
        all_test_features.append(features[0])
        binary_test_images.append(binary_image)
# Check that there is at least one usable test sample
if len(all_test_features) == 0:
    raise ValueError("No valid test samples found. Please check your test data.")
# Convert features to a numpy array
X_test = np.array(all_test_features)
# Predict
predicted_labels = svm_clf.predict(X_test)
# Save the segmented images, named after the predicted label
total_time = 0.0
for idx, binary_image in enumerate(binary_test_images):
    # Note: this timer wraps only the label lookup below; feature extraction and
    # svm_clf.predict have already run above and are not included in the time
    start_time = time.perf_counter()
    # Use the predicted English label in the file name
    predicted_label_name = label_map.get(predicted_labels[idx])
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    total_time += elapsed_time
    output_filename_binary = f'{idx}_{predicted_label_name}_predicted.png'
    output_path_binary = os.path.join(output_folder, output_filename_binary)
    cv2.imwrite(output_path_binary, binary_image)
    print(f"Image {idx} : Predicted Label: {predicted_label_name}, Elapsed Time: {elapsed_time:.6f} seconds")
# Performance: average measured time per image
average_time_per_image = total_time / len(binary_test_images) if binary_test_images else 0
print(f"Average recognition time per image: {average_time_per_image:.6f} s")
# Naming convention for training images
# 0_0.png # 'up'
# 0_1.png # 'up'
# ...
# 1_1.png # 'down'
# ...
# 2_2.png # 'left'
# ...
# 3_3.png # 'right'
# ...
# 4_4.png # 'unknown'
# ...
# Naming convention for test images
# test_0.png
# test_1.png
# ...
(2) Dynamic gesture recognition
import cv2
import numpy as np
from sklearn.svm import SVC
import os
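def get_chain_code(img, threshold=200):
    # Find outer contours only and keep every boundary point.
    # The threshold argument is unused; it is kept so existing calls still work.
    contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    # Direction map for the 8-direction chain code: (dx, dy) -> code
    direction_map = {
        (1, 0): 0,
        (1, 1): 1,
        (0, 1): 2,
        (-1, 1): 3,
        (-1, 0): 4,
        (-1, -1): 5,
        (0, -1): 6,
        (1, -1): 7
    }
    all_chain_codes = []
    for contour in contours:
        n = len(contour)
        if n < 2:
            # A single-pixel contour has no direction; skip it
            continue
        chain_code = []
        for i in range(n):
            # Current point and next point
            current = contour[i][0]
            next_point = contour[(i + 1) % n][0]  # wrap around to close the contour
            # Coordinate difference
            dx = int(next_point[0]) - int(current[0])
            dy = int(next_point[1]) - int(current[1])
            # Check that the two points are 8-neighbors
            if abs(dx) > 1 or abs(dy) > 1:
                raise ValueError("Contour points are not 8-connected")
            # Look up the chain code and append it
            code = direction_map.get((dx, dy))
            if code is None:
                raise ValueError(f"Invalid direction vector: ({dx}, {dy})")
            chain_code.append(code)
        all_chain_codes.append(chain_code)
    return all_chain_codes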
def analyze_chain_code(chain_code):
    if not chain_code:
        raise ValueError("chain_code must not be empty")
    # 1. Find the direction code of the longest run of identical codes
    max_length = 0
    current_length = 1
    max_code = None
    for i in range(1, len(chain_code)):
        if chain_code[i] == chain_code[i - 1]:
            current_length += 1
        else:
            if current_length > max_length:
                max_length = current_length
                max_code = chain_code[i - 1]
            current_length = 1
    # Check the final run
    if current_length > max_length:
        max_length = current_length
        max_code = chain_code[-1]
    # 2. Count 180° reversals that occur roughly 40 to 65 chain codes apart
    change_count = 0
    i = 0
    length = len(chain_code)
    while i < length - 65:
        opposite_code = (chain_code[i] + 4) % 8  # code pointing the opposite way
        # Look for the first opposite code within the following window
        for j in range(i + 1, i + 65):
            if chain_code[j] == opposite_code:
                if j >= i + 40:
                    change_count += 1
                i = j - 1  # resume from j after the increment below
                break
        i += 1
    # Tail: the window for the last 65 start positions runs past the end of the
    # list, so indices wrap around (the contour is closed)
    i = max(0, length - 65)
    while i < length:
        opposite_code = (chain_code[i] + 4) % 8
        for offset in range(1, 65):
            if chain_code[(i + offset) % length] == opposite_code:
                if offset >= 40:
                    change_count += 1
                i = length  # stop after the first reversal found in the tail
                break
        i += 1
    return max_code, change_count
# Load gesture data
def load_gesture_data(data_folder):
    images = []
    labels = []
    label_map = {0: 'up', 1: 'down', 2: 'left', 3: 'right', 4: 'unknown'}
    reverse_label_map = {'up': 'up', 'down': 'down', 'left': 'left', 'right': 'right', 'unknown': 'unknown'}
    # Iterate over every image file
    for image_name in os.listdir(data_folder):
        if not image_name.endswith('.png'):
            continue
        image_path = os.path.join(data_folder, image_name)
        image = cv2.imread(image_path)
        if image is not None:
            # The label is the numeric prefix before the first underscore
            label_str = image_name.split('_')[0]
            try:
                label = int(label_str)
                if label in label_map:
                    images.append(image)
                    labels.append(label)
            except ValueError:
                continue
    return images, labels, label_map, reverse_label_map
# Image preprocessing and feature extraction
def preprocess_and_extract_features(image):
    # (1) Convert the BGR image (as delivered by OpenCV) to YCrCb and take the Cr channel
    ycrcb_image = cv2.cvtColor(image, cv2.COLOR_BGR2YCrCb)
    cr_channel = ycrcb_image[:, :, 1]
    # (2) Apply Gaussian blur to the Cr channel
    blurred_cr = cv2.GaussianBlur(cr_channel, (5, 5), 0)
    # (3) Binarize the blurred Cr channel with Otsu's threshold
    _, binary_image = cv2.threshold(blurred_cr, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Morphological cleanup: opening followed by closing
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (50, 50))
    binary_image = cv2.morphologyEx(binary_image, cv2.MORPH_OPEN, kernel)  # opening
    binary_image = cv2.morphologyEx(binary_image, cv2.MORPH_CLOSE, kernel)  # closing
    # Find contours (used here for the bounding box)
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    features = []
    processed_image = image.copy()
    for contour in contours:
        # Skip small contours
        if cv2.contourArea(contour) < 100:
            continue
        # Minimum-area bounding rectangle
        rect = cv2.minAreaRect(contour)
        box = cv2.boxPoints(rect)
        box = np.int32(box)
        # Chain codes are computed for every contour in the binary image
        chain_codes = get_chain_code(binary_image)
        for code in chain_codes:
            max_code, change_count = analyze_chain_code(code)
            features.append([max_code, change_count])
        # Draw the bounding rectangle
        cv2.drawContours(processed_image, [box], 0, (0, 255, 0), 2)
        # Only use the first valid contour
        break
    return features, processed_image
# Data folder path (left blank here; set it before running)
train_data_folder = r''
# Load training data
train_images, train_labels, label_map, reverse_label_map = load_gesture_data(train_data_folder)
# Extract features from every training image, keeping labels aligned with features
all_train_features = []
kept_train_labels = []
processed_train_images = []
for image, label in zip(train_images, train_labels):
    features, processed_image = preprocess_and_extract_features(image)
    if features:  # make sure at least one contour produced features
        all_train_features.append(features[0])
        kept_train_labels.append(label)
        processed_train_images.append(processed_image)
# Convert features and labels to numpy arrays
X_train = np.array(all_train_features)
y_train = np.array(kept_train_labels)
# Create and train the SVM classifier
svm_clf = SVC(kernel='rbf', C=1.0)
svm_clf.fit(X_train, y_train)
# Open the laptop camera
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Preprocess and extract features
    features, processed_frame = preprocess_and_extract_features(frame)
    # Default prediction
    predicted_label_name = 'unknown'
    if features:
        # Predict
        predicted_label = svm_clf.predict([features[0]])[0]
        predicted_label_name = label_map.get(predicted_label, 'unknown')
    # Show the recognition result
    text_position = (10, 30)
    font = cv2.FONT_HERSHEY_SIMPLEX
    font_scale = 1
    color = (0, 255, 0) if predicted_label_name != 'unknown' else (0, 0, 255)
    thickness = 2
    cv2.putText(processed_frame, f"Predicted Label: {predicted_label_name}", text_position, font, font_scale, color,
                thickness)
    # Warn about an abnormal gesture
    if predicted_label_name == 'unknown':
        anomaly_text_position = (10, 70)
        cv2.putText(processed_frame, "Anomaly Gesture Detected!", anomaly_text_position, font, font_scale, (0, 0, 255),
                    thickness)
    # Show the processed frame
    cv2.imshow('Gesture Recognition', processed_frame)
    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
# Release the camera and close the window
cap.release()
cv2.destroyAllWindows()
# Naming convention for images
# 0_0.png # 'up'
# 0_1.png # 'up'
# ...
# 1_1.png # 'down'
# ...
# 2_2.png # 'left'
# ...
# 3_3.png # 'right'
# ...
# 4_4.png # 'unknown'
# ...
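4. Performance evaluation
(1) Accuracy
In the static gesture recognition run, 5 of the 29 test images were misclassified (24/29 ≈ 82.8%), so overall accuracy is estimated at around 80%. Most of the errors were on images with more complex gestures (some are shown below):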



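In the dynamic gesture recognition run, the hand position could not be located reliably against complex backgrounds. Against a plain dark background, gestures that indicate a direction were recognized with fairly high accuracy, but the classifier showed clear overfitting, and the results for the "fist" and "open hand" gestures were unstable.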
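The static accuracy figure can be reproduced with a few lines once ground-truth labels are available for the test images. The sketch below is only an illustration: the test files are named `test_N.png` and carry no label, so `true_labels` is assumed to be entered by hand, and the positions of the five wrong predictions are invented for the example.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the ground truth."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def errors_per_class(true_labels, predicted_labels):
    """Count misclassified samples per true class."""
    return Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)


# Test split from section 2: 5 images per direction, 9 'unknown' (4 fist + 5 open hand).
true_labels = ['up'] * 5 + ['down'] * 5 + ['left'] * 5 + ['right'] * 5 + ['unknown'] * 9
predicted = list(true_labels)
# Hypothetical error positions, chosen only so that 5 of 29 are wrong.
for idx, wrong in [(0, 'left'), (7, 'unknown'), (20, 'up'), (22, 'left'), (28, 'down')]:
    predicted[idx] = wrong

acc = accuracy(true_labels, predicted)
per_class = errors_per_class(true_labels, predicted)
```

A per-class breakdown like `per_class` makes it easier to see whether the errors concentrate in the "unknown" class or in particular directions.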
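(2) Recognition speed
Static gesture recognition: the average measured time per image is 0.000002 s and the maximum is 0.000004 s. Note that the timer in the static script wraps only the label lookup after prediction, so these figures do not include preprocessing, feature extraction, or SVM prediction.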
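To time the whole per-image pipeline rather than a single step, the timer has to wrap every stage that runs for one image. A minimal sketch, assuming the stages are bundled into one callable: `time_per_item` is a hypothetical helper, and `fake_pipeline` is a stand-in for "preprocess, extract features, predict" so the example runs without OpenCV.

```python
import time


def time_per_item(process, items):
    """Return (mean_seconds, max_seconds) for calling process(item) on each item."""
    if not items:
        raise ValueError("items must not be empty")
    durations = []
    for item in items:
        start = time.perf_counter()
        process(item)  # everything that should count as recognition goes here
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations), max(durations)


# Hypothetical stand-in for the real pipeline on one image.
def fake_pipeline(image):
    return sum(image) % 5


mean_s, max_s = time_per_item(fake_pipeline, [[1, 2, 3]] * 10)
print(f"mean {mean_s:.6f} s, max {max_s:.6f} s")
```

With the real scripts, `process` would call `preprocess_and_extract_features` and `svm_clf.predict` on a single image.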
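Dynamic gesture recognition is similarly fast, but the predicted label keeps flickering from frame to frame, which indicates that the algorithm has low robustness.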
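One simple way to put a number on this flickering is to log the predicted label for each frame and compute how often it changes between consecutive frames. The sketch below assumes such a log exists; `flip_rate` and the sample `frames` list are hypothetical and are not part of the original scripts.

```python
def flip_rate(labels):
    """Fraction of consecutive frame pairs whose predicted label differs."""
    if len(labels) < 2:
        return 0.0
    flips = sum(a != b for a, b in zip(labels, labels[1:]))
    return flips / (len(labels) - 1)


# Hypothetical per-frame predictions while the hand holds one gesture.
frames = ['up', 'up', 'left', 'up', 'up', 'unknown', 'up']
rate = flip_rate(frames)
```

A rate near 0 means the prediction is stable while the gesture is held; a high rate matches the jumpy behavior described above.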
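5. Summary
Perhaps my approach itself was flawed: using Freeman chain codes for gesture recognition turned out not to be a good method. It was the simplest idea that came to mind first, but the results were not ideal, so I will treat this as practice.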