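Abstract

● Designed a gesture-recognition scheme based on traditional machine learning: the hand region is segmented through YCrCb color-space conversion, Otsu thresholding, and morphological processing, and Freeman chain-code analysis is used to extract gesture-direction and fingertip-shape features.

● Built a gesture database of 126 images (including complex-background and abnormal-angle samples) and trained an SVM classifier for static gesture recognition (24 of 29 test images correct, about 83%); real-time dynamic recognition is implemented with OpenCV. The measured time per image is under 0.005 ms, but the timer covers only the final label lookup, not segmentation or feature extraction.

● Designed an abnormal-gesture detection mechanism: chain-code features flag non-directional gestures such as a fist or an open hand and trigger on-screen text feedback.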

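1. Project Requirements

Gesture recognition is a key component of automated vision testing.

(1) Design a scheme for hand segmentation, feature extraction, and recognition. Abnormal gestures must trigger a text prompt or voice feedback. Note: deep-learning gesture recognition is not allowed; traditional machine-learning methods such as BP neural networks or linear discriminant analysis may be used.
(2) Build a small gesture image library of at least 120 images: 80 or more with a plain background and a normal angle (within ±30 degrees), 20 or more with a complex background, and 20 or more with an abnormal angle or an abnormal gesture.
(3) Perform static gesture recognition and evaluate its performance (recognition rate and recognition speed).
(4) Perform dynamic gesture recognition and evaluate its performance.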

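2. Experimental Procedure

(1) Key processing steps

(2) Basis for recognition

The two feature components used for recognition are (1) the direction code that forms the longest run of consecutive identical codes in the chain-code sequence, and (2) the number of times a direction code in the sequence is followed, between 40 and 65 codes later, by the opposite direction code.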
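To make the term "chain-code sequence" concrete, here is a minimal sketch that encodes a hand-written closed contour with the same 8-direction table used in the scripts below. The `chain_code` helper and the small square contour are made up for illustration. Image coordinates have y pointing down, so code 2 means "down" on screen.

```python
# Direction table: (dx, dy) -> Freeman code, with y growing downward as in images.
DIRECTION_MAP = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

def chain_code(points):
    """Encode a closed contour given as a list of 8-connected (x, y) points."""
    codes = []
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # the last point links back to the first
        codes.append(DIRECTION_MAP[(x1 - x0, y1 - y0)])
    return codes

# Hypothetical 3x3 square traced from its top-left corner.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(chain_code(square))  # [0, 0, 2, 2, 4, 4, 6, 6]
```

Each side of the square shows up as a short run of one code, which is the property both features rely on.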

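For (1), the rationale is that the base of the hand always lies on the image border, so the contour along that cut is a run of identical direction codes, which serves as one cue for the pointing direction. This cue cannot give an accurate direction when the hand lies along a diagonal of the image.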
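A minimal sketch of feature (1), assuming the chain code is a plain Python list. The `longest_run` name and the sample sequence are illustrative only, and like the scripts below it does not merge a run that wraps around from the end of the sequence to the start.

```python
from itertools import groupby

def longest_run(chain):
    """Return (code, run_length) for the longest run of identical codes."""
    best_code, best_len = None, 0
    for code, group in groupby(chain):
        run_len = sum(1 for _ in group)
        if run_len > best_len:
            best_code, best_len = code, run_len
    return best_code, best_len

# Hypothetical contour with a straight stretch of code 4 along the image border.
chain = [0, 0, 1, 2, 2, 4, 4, 4, 4, 4, 6, 6, 7]
print(longest_run(chain))  # (4, 5)
```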

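For (2), the goal is to separate gestures that indicate a direction from those that do not (such as a fist or an open hand), and the key is recognizing the shape of a finger. Assuming a direction is indicated by extending only the index finger, the contour around the fingertip follows a pattern in the chain code: after the contour has travelled one finger width, the direction code reverses by 180°. On a hand contour simplified by morphological processing, this change is regular. The accuracy depends heavily on the assumed finger width, which varies with the size of the finger and its distance from the camera. Given the limits of the image library, the width is restricted here to a fixed range.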
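The reversal idea can be sketched on a synthetic contour. This is a simplified variant, not the skip-ahead logic used in `analyze_chain_code` below: it scans every position independently, and the gap bounds are shrunk to 3 and 6 so the toy shapes stay readable. `count_reversals`, `finger`, and `square` are made-up names.

```python
def count_reversals(chain, min_gap, max_gap):
    """Count positions whose opposite code first appears min_gap..max_gap-1 steps ahead."""
    n = len(chain)
    count = 0
    for i in range(n):
        opposite = (chain[i] + 4) % 8  # direction rotated by 180 degrees
        for gap in range(1, max_gap):
            if chain[(i + gap) % n] == opposite:  # closed contour, so wrap around
                if gap >= min_gap:
                    count += 1
                break  # only the first occurrence matters
    return count

# A thin rectangle (finger-like): 10 up, 4 across the tip, 10 down, 4 back.
finger = [6] * 10 + [0] * 4 + [2] * 10 + [4] * 4
# A wide square: every side is longer than the allowed gap.
square = [6] * 10 + [0] * 10 + [2] * 10 + [4] * 10

print(count_reversals(finger, 3, 6))  # 2 (one per narrow end)
print(count_reversals(square, 3, 6))  # 0
```

The thin shape triggers a reversal at each narrow end, while the wide one triggers none, which is why the result is so sensitive to the chosen width range.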

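(3) Gesture image library

With help from classmates, I took 126 photos for the gesture image library and split them into 97 training images and 29 test images.

Training set: 16 images for each of the four directions (including 3 with a complex background), plus 18 "fist" and 15 "open hand" images used as "unknown" data (including 3 with a complex background).

Test set: 5 images for each of the four directions (including 2 with a complex background), plus 4 "fist" images (including 2 with a complex background) and 5 "open hand" images.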
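As a quick arithmetic check of the split described above, the per-class counts can be tallied in a few lines. The dictionary keys are labels chosen for this sketch; the counts are the ones stated in the text.

```python
# Per-class image counts as stated in the text.
train_counts = {"up": 16, "down": 16, "left": 16, "right": 16, "fist": 18, "open_hand": 15}
test_counts = {"up": 5, "down": 5, "left": 5, "right": 5, "fist": 4, "open_hand": 5}

train_total = sum(train_counts.values())
test_total = sum(test_counts.values())

print(train_total, test_total, train_total + test_total)  # 97 29 126
```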

3. Implementation Code

Note: to save time, the code was written with the help of generative AI.

(1) Static gesture recognition

import cv2
import numpy as np
from sklearn.svm import SVC
import os
import time

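def get_chain_code(img, threshold=200):
    # Note: `threshold` is never used; it is kept only so existing calls still work.

    # Find contours: outermost only (RETR_EXTERNAL), keeping every boundary point
    contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

    # Direction lookup for the 8-direction chain code, keyed by (dx, dy)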
    direction_map = {
        (1, 0): 0,
        (1, 1): 1,
        (0, 1): 2,
        (-1, 1): 3,
        (-1, 0): 4,
        (-1, -1): 5,
        (0, -1): 6,
        (1, -1): 7
    }

    all_chain_codes = []

    for contour in contours:
        chain_code = []
        n = len(contour)

        for i in range(n):
            # Current point and the next point
            current = contour[i][0]
            next_point = contour[(i + 1) % n][0]  # wrap around to close the contour

            # Coordinate difference
            dx = next_point[0] - current[0]
            dy = next_point[1] - current[1]

            # Consecutive points must be 8-connected
            if abs(dx) > 1 or abs(dy) > 1:
                raise ValueError("Contour points are not 8-connected")

            # Look up the chain code and append it
            code = direction_map.get((dx, dy))
            if code is None:
                raise ValueError(f"Invalid direction vector: ({dx}, {dy})")
            chain_code.append(code)

        all_chain_codes.append(chain_code)

    return all_chain_codes


def analyze_chain_code(chain_code):

    # Feature 1: direction code of the longest run of identical consecutive codes
    max_length = 0
    current_length = 1
    max_code = None

    for i in range(1, len(chain_code)):
        if chain_code[i] == chain_code[i - 1]:
            current_length += 1
        else:
            if current_length > max_length:
                max_length = current_length
                max_code = chain_code[i - 1]
            current_length = 1

    # The loop never closes the final run, so check it here
    if chain_code and current_length > max_length:
        max_length = current_length
        max_code = chain_code[-1]

    # Feature 2: count 180° reversals whose first occurrence lies 40 to 64 codes ahead
    change_count = 0
    i = 0
    length = len(chain_code)
    while i < length - 65:
        opposite_code = (chain_code[i] + 4) % 8  # code pointing the opposite way
        # Scan ahead for the first occurrence of the opposite code
        for j in range(i, min(i + 65, length)):
            if chain_code[j] == opposite_code:
                if j >= i + 40:
                    change_count += 1
                i = j - 1  # resume from the reversal (i is incremented below)
                break
        i += 1

    # Tail of the sequence: the contour is closed, so wrap around with modulo indexing.
    # Scanning stops at the first reversal found, so at most one is counted here.
    i = max(length - 65, 0)
    while i < length:
        opposite_code = (chain_code[i] + 4) % 8
        for offset in range(1, 65):
            if chain_code[(i + offset) % length] == opposite_code:
                if offset >= 40:
                    change_count += 1
                i = length  # end the outer loop
                break
        i += 1
    return max_code, change_count


# Load the gesture data
def load_gesture_data(data_folder):
    images = []
    labels = []
    label_map = {0: 'up', 1: 'down', 2: 'left', 3: 'right', 4: 'unknown'}
    reverse_label_map = {'up': 'up', 'down': 'down', 'left': 'left', 'right': 'right', 'unknown': 'unknown'}

    # Walk through every image file
    for image_name in os.listdir(data_folder):
        if not image_name.endswith('.png'):
            continue

        image_path = os.path.join(data_folder, image_name)
        image = cv2.imread(image_path)
        if image is not None:
            # The label is the integer before the first underscore in the file name
            label_str = image_name.split('_')[0]
            try:
                label = int(label_str)
                if label in label_map:
                    images.append(image)
                    labels.append(label)
            except ValueError:
                continue

    return images, labels, label_map, reverse_label_map


# Image preprocessing and feature extraction
def preprocess_and_extract_features(image):

    # (1) Convert the BGR image (as loaded by cv2.imread) to YCrCb and take the Cr channel
    ycrcb_image = cv2.cvtColor(image, cv2.COLOR_BGR2YCrCb)
    cr_channel = ycrcb_image[:, :, 1]

    # (2) Smooth the Cr channel with a Gaussian filter
    blurred_cr = cv2.GaussianBlur(cr_channel, (5, 5), 0)

    # (3) Binarize the smoothed Cr channel with Otsu's threshold
    _, binary_image = cv2.threshold(blurred_cr, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Morphology: opening removes small specks, closing fills small holes
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (50, 50))
    binary_image = cv2.morphologyEx(binary_image, cv2.MORPH_OPEN, kernel)  # opening
    binary_image = cv2.morphologyEx(binary_image, cv2.MORPH_CLOSE, kernel)  # closing

    # One [max_code, change_count] feature vector per external contour
    features = []
    chain_codes = get_chain_code(binary_image)
    for code in chain_codes:
        max_code, change_count = analyze_chain_code(code)
        features.append([max_code, change_count])
    return features, binary_image


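# Create the output folder (set the path before running; it is left empty here)
output_folder = r''
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

# Data folder paths (set before running)
train_data_folder = r''
test_data_folder = r''

# Load the training data
train_images, train_labels, label_map, reverse_label_map = load_gesture_data(train_data_folder)

# Extract features from every training image
all_train_features = []
kept_train_labels = []

for image, label in zip(train_images, train_labels):
    features, _ = preprocess_and_extract_features(image)
    if features:  # at least one contour was found
        all_train_features.append(features[0])
        kept_train_labels.append(label)  # keep labels aligned with the features

# Convert to numpy arrays
X_train = np.array(all_train_features)
y_train = np.array(kept_train_labels)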

# Create and train the SVM classifier
svm_clf = SVC(kernel='rbf', C=1.0)
svm_clf.fit(X_train, y_train)

# Load the test data
test_images = []
test_image_names = []

# Walk through every test image file
for image_name in os.listdir(test_data_folder):
    if not image_name.endswith('.png'):
        continue

    image_path = os.path.join(test_data_folder, image_name)
    image = cv2.imread(image_path)
    if image is not None:
        test_images.append(image)
        test_image_names.append(image_name)

# Extract features from every test image
all_test_features = []
binary_test_images = []

for image in test_images:
    features, binary_image = preprocess_and_extract_features(image)
    if features:  # at least one contour was found
        all_test_features.append(features[0])
        binary_test_images.append(binary_image)

# Make sure there is something to classify
if len(all_test_features) == 0:
    raise ValueError("No valid test samples found. Please check your test data.")

# Convert the features to a numpy array
X_test = np.array(all_test_features)

# Predict
predicted_labels = svm_clf.predict(X_test)
# Save the segmented (binary) image of each test sample, named after its predicted label
total_time = 0.0

for idx, binary_image in enumerate(binary_test_images):
    # Note: the timed region covers only the label lookup below. Segmentation,
    # feature extraction, and SVM prediction ran earlier, outside the timer.
    start_time = time.perf_counter()

    # Use the predicted English label in the file name
    predicted_label_name = label_map.get(predicted_labels[idx])

    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    total_time += elapsed_time

    output_filename_binary = f'{idx}_{predicted_label_name}_predicted.png'

    output_path_binary = os.path.join(output_folder, output_filename_binary)

    cv2.imwrite(output_path_binary, binary_image)

    print(f"Image {idx} : Predicted Label: {predicted_label_name}, Elapsed Time: {elapsed_time:.6f} seconds")

# Performance: average measured time per processed image
average_time_per_image = total_time / len(binary_test_images) if binary_test_images else 0
print(f"Average recognition time per image: {average_time_per_image:.6f} s")

# Naming convention for training images
#     0_0.png  # 'up'
#     0_1.png  # 'up'
#     ...
#     1_1.png  # 'down'
#     ...
#     2_2.png  # 'left'
#     ...
#     3_3.png  # 'right'
#     ...
#     4_4.png  # 'unknown'
#     ...
# Naming convention for test images
#   test_0.png
#   test_1.png
#   ...

(2) Dynamic gesture recognition

import cv2
import numpy as np
from sklearn.svm import SVC
import os

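def get_chain_code(img, threshold=200):
    # Note: `threshold` is never used; it is kept only so existing calls still work.

    # Find contours: outermost only (RETR_EXTERNAL), keeping every boundary point
    contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

    # Direction lookup for the 8-direction chain code, keyed by (dx, dy)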
    direction_map = {
        (1, 0): 0,
        (1, 1): 1,
        (0, 1): 2,
        (-1, 1): 3,
        (-1, 0): 4,
        (-1, -1): 5,
        (0, -1): 6,
        (1, -1): 7
    }

    all_chain_codes = []

    for contour in contours:
        chain_code = []
        n = len(contour)

        for i in range(n):
            # Current point and the next point
            current = contour[i][0]
            next_point = contour[(i + 1) % n][0]  # wrap around to close the contour

            # Coordinate difference
            dx = next_point[0] - current[0]
            dy = next_point[1] - current[1]

            # Consecutive points must be 8-connected
            if abs(dx) > 1 or abs(dy) > 1:
                raise ValueError("Contour points are not 8-connected")

            # Look up the chain code and append it
            code = direction_map.get((dx, dy))
            if code is None:
                raise ValueError(f"Invalid direction vector: ({dx}, {dy})")
            chain_code.append(code)

        all_chain_codes.append(chain_code)

    return all_chain_codes


def analyze_chain_code(chain_code):

    # Feature 1: direction code of the longest run of identical consecutive codes
    max_length = 0
    current_length = 1
    max_code = None

    for i in range(1, len(chain_code)):
        if chain_code[i] == chain_code[i - 1]:
            current_length += 1
        else:
            if current_length > max_length:
                max_length = current_length
                max_code = chain_code[i - 1]
            current_length = 1

    # The loop never closes the final run, so check it here
    if chain_code and current_length > max_length:
        max_length = current_length
        max_code = chain_code[-1]

    # Feature 2: count 180° reversals whose first occurrence lies 40 to 64 codes ahead
    change_count = 0
    i = 0
    length = len(chain_code)
    while i < length - 65:
        opposite_code = (chain_code[i] + 4) % 8  # code pointing the opposite way
        # Scan ahead for the first occurrence of the opposite code
        for j in range(i, min(i + 65, length)):
            if chain_code[j] == opposite_code:
                if j >= i + 40:
                    change_count += 1
                i = j - 1  # resume from the reversal (i is incremented below)
                break
        i += 1

    # Tail of the sequence: the contour is closed, so wrap around with modulo indexing.
    # Scanning stops at the first reversal found, so at most one is counted here.
    i = max(length - 65, 0)
    while i < length:
        opposite_code = (chain_code[i] + 4) % 8
        for offset in range(1, 65):
            if chain_code[(i + offset) % length] == opposite_code:
                if offset >= 40:
                    change_count += 1
                i = length  # end the outer loop
                break
        i += 1
    return max_code, change_count


# Load the gesture data
def load_gesture_data(data_folder):
    images = []
    labels = []
    label_map = {0: 'up', 1: 'down', 2: 'left', 3: 'right', 4: 'unknown'}
    reverse_label_map = {'up': 'up', 'down': 'down', 'left': 'left', 'right': 'right', 'unknown': 'unknown'}

    # Walk through every image file
    for image_name in os.listdir(data_folder):
        if not image_name.endswith('.png'):
            continue

        image_path = os.path.join(data_folder, image_name)
        image = cv2.imread(image_path)
        if image is not None:
            # The label is the integer before the first underscore in the file name
            label_str = image_name.split('_')[0]
            try:
                label = int(label_str)
                if label in label_map:
                    images.append(image)
                    labels.append(label)
            except ValueError:
                continue

    return images, labels, label_map, reverse_label_map


# Image preprocessing and feature extraction
def preprocess_and_extract_features(image):
    # (1) Convert the BGR frame to YCrCb and take the Cr channel
    ycrcb_image = cv2.cvtColor(image, cv2.COLOR_BGR2YCrCb)
    cr_channel = ycrcb_image[:, :, 1]

    # (2) Smooth the Cr channel with a Gaussian filter
    blurred_cr = cv2.GaussianBlur(cr_channel, (5, 5), 0)

    # (3) Binarize the smoothed Cr channel with Otsu's threshold
    _, binary_image = cv2.threshold(blurred_cr, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Morphology: opening removes small specks, closing fills small holes
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (50, 50))
    binary_image = cv2.morphologyEx(binary_image, cv2.MORPH_OPEN, kernel)  # opening
    binary_image = cv2.morphologyEx(binary_image, cv2.MORPH_CLOSE, kernel)  # closing

    # Find contours (simplified points are enough for the bounding box)
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    features = []
    processed_image = image.copy()

    for contour in contours:
        # Skip small contours
        if cv2.contourArea(contour) < 100:
            continue

        # Minimum-area bounding rectangle
        rect = cv2.minAreaRect(contour)
        box = cv2.boxPoints(rect)
        box = np.int32(box)

        # Chain-code features are computed for every external contour in the mask;
        # the caller uses features[0]
        chain_codes = get_chain_code(binary_image)
        for code in chain_codes:
            max_code, change_count = analyze_chain_code(code)
            features.append([max_code, change_count])

        # Draw the bounding rectangle
        cv2.drawContours(processed_image, [box], 0, (0, 255, 0), 2)

        # Use only the first contour that passes the area filter
        break

    return features, processed_image

# Data folder path (set before running; it is left empty here)
train_data_folder = r''

# Load the training data
train_images, train_labels, label_map, reverse_label_map = load_gesture_data(train_data_folder)

# Extract features from every training image
all_train_features = []
kept_train_labels = []
processed_train_images = []

for image, label in zip(train_images, train_labels):
    features, processed_image = preprocess_and_extract_features(image)
    if features:  # at least one contour was found
        all_train_features.append(features[0])
        kept_train_labels.append(label)  # keep labels aligned with the features
        processed_train_images.append(processed_image)

# Convert to numpy arrays
X_train = np.array(all_train_features)
y_train = np.array(kept_train_labels)

# Create and train the SVM classifier
svm_clf = SVC(kernel='rbf', C=1.0)
svm_clf.fit(X_train, y_train)

# Open the laptop camera (device 0)
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess the frame and extract features
    features, processed_frame = preprocess_and_extract_features(frame)

    # Default result when no contour is found
    predicted_label_name = 'unknown'

    if features:
        # Predict
        predicted_label = svm_clf.predict([features[0]])[0]
        predicted_label_name = label_map.get(predicted_label, 'unknown')

    # Overlay the recognition result
    text_position = (10, 30)
    font = cv2.FONT_HERSHEY_SIMPLEX
    font_scale = 1
    color = (0, 255, 0) if predicted_label_name != 'unknown' else (0, 0, 255)
    thickness = 2

    cv2.putText(processed_frame, f"Predicted Label: {predicted_label_name}", text_position, font, font_scale, color,
                thickness)

    # Text feedback for abnormal (non-directional) gestures
    if predicted_label_name == 'unknown':
        anomaly_text_position = (10, 70)
        cv2.putText(processed_frame, "Anomaly Gesture Detected!", anomaly_text_position, font, font_scale, (0, 0, 255),
                    thickness)

    # Show the annotated frame
    cv2.imshow('Gesture Recognition', processed_frame)

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the camera and close the window
cap.release()
cv2.destroyAllWindows()

# Naming convention for images
#     0_0.png  # 'up'
#     0_1.png  # 'up'
#     ...
#     1_1.png  # 'down'
#     ...
#     2_2.png  # 'left'
#     ...
#     3_3.png  # 'right'
#     ...
#     4_4.png  # 'unknown'
#     ...

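4. Performance Evaluation

(1) Accuracy

In the static gesture recognition run, 5 of the 29 test images were misclassified, which gives an accuracy of 24/29, or about 83%, on this test set. Most of the errors occurred on images with more complicated gestures (some examples below):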

Misclassified as "unknown"
Misclassified as "left"
Misclassified as "right"

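In the dynamic gesture recognition run, the hand could not be located reliably against a complex background. Against a plain dark background, accuracy on directional gestures was fairly high, but the classifier showed clear overfitting, and results for the "fist" and "open hand" gestures were unstable.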
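The test files are named `test_N.png` and carry no label, so the static accuracy had to be counted by hand. A minimal sketch of automating that, assuming a hand-written ground-truth mapping exists, is shown below. The `score` helper and the three-image example are made up to show the call shape and are not the actual test results.

```python
def score(predictions, ground_truth):
    """Compare two {image_name: label} dicts; return (accuracy, misclassified names)."""
    wrong = [name for name, label in ground_truth.items()
             if predictions.get(name) != label]
    total = len(ground_truth)
    return (total - len(wrong)) / total, wrong

# The figure reported above: 5 errors out of 29 test images.
print(f"{(29 - 5) / 29:.1%}")  # 82.8%

# Made-up three-image example, only to show how the helper is called.
truth = {"test_0.png": "up", "test_1.png": "left", "test_2.png": "unknown"}
preds = {"test_0.png": "up", "test_1.png": "right", "test_2.png": "unknown"}
acc, wrong = score(preds, truth)
print(wrong)  # ['test_1.png']
```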

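(2) Recognition speed

Static gesture recognition: average measured time per image 0.000002 s, maximum 0.000004 s. These figures come from a timer that wraps only the label lookup after prediction, so they exclude segmentation, feature extraction, and SVM prediction and understate the true per-image cost.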
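To measure the full per-image cost, the timer would need to wrap the whole pipeline. The sketch below is one way to do that with a generic helper; `time_per_item` is a hypothetical name, and the stand-in workload replaces the real call, which in the static script would be `preprocess_and_extract_features(image)` followed by `svm_clf.predict(...)`.

```python
import time

def time_per_item(process, items):
    """Run process(item) for each item; return (mean_seconds, max_seconds)."""
    durations = []
    for item in items:
        start = time.perf_counter()
        process(item)
        durations.append(time.perf_counter() - start)
    if not durations:
        return 0.0, 0.0
    return sum(durations) / len(durations), max(durations)

# Stand-in workload so the sketch runs without images or a trained model.
mean_s, max_s = time_per_item(lambda n: sum(range(n)), [10_000, 20_000, 30_000])
print(f"mean {mean_s * 1000:.3f} ms, max {max_s * 1000:.3f} ms")
```

Numbers obtained this way would not be comparable with the figures above, since they include the image-processing steps.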

Dynamic gesture recognition was similarly fast, but the constantly flickering result shows that the algorithm is not robust.
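One common way to damp flickering per-frame labels is a majority vote over the last few frames. This was not part of the project; the sketch below only illustrates the idea, with `LabelSmoother` and its `window` parameter as assumed names. It trades a short delay for stability and does not address the underlying feature weakness.

```python
from collections import Counter, deque

class LabelSmoother:
    """Return the most frequent label among the last `window` frames."""

    def __init__(self, window=15):
        self.history = deque(maxlen=window)  # old labels drop out automatically

    def update(self, label):
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]

# Hypothetical per-frame predictions with two spurious flips.
smoother = LabelSmoother(window=5)
frames = ["up", "up", "left", "up", "unknown", "up"]
smoothed = [smoother.update(label) for label in frames]
print(smoothed)  # ['up', 'up', 'up', 'up', 'up', 'up']
```

In the camera loop, the value returned by `update` would be drawn on the frame in place of the raw per-frame prediction.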

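5. Conclusion

Perhaps my approach was flawed: Freeman chain codes did not turn out to be a good basis for gesture recognition. It was the simplest approach that first came to mind, but the results were not satisfactory, so I treat this project as a practice exercise.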
