Measuring the cache hit rate with perf

Li-Yongjun · Published 2023-06-22 20:28:01

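Preface

From the earlier article "Cache Coherence" (《缓存一致性》), we know that whether an access hits the cache has a large effect on program performance. The effect is even more pronounced in networking: if the packet to be processed is not in the cache, throughput drops sharply.
In that article we ran a program and saw directly how cache misses caused by cache-coherence traffic slow it down. Today we use the perf tool to measure the actual cache miss ratio, which should give a more concrete picture of the cache and its effect on performance.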

Code example

The code is the same as in the previous article.

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define COUNT 1000000000

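struct _t {
	// Padding fields, commented out in this (not filled) variant.
	// Enabling both lines is intended to push a.x and b.x onto
	// different cache lines.
	// long p1, p2, p3, p4, p5, p6, p7;
	long x;
	// long p9, p10, p11, p12, p13, p14, p15;
};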

struct _t a;
struct _t b;

void *test_thread1(void *arg)
{
	for (long i = 0; i < COUNT; i++)
		a.x = i;

	return NULL;
}

void *test_thread2(void *arg)
{
	for (long i = 0; i < COUNT; i++)
		b.x = i;

	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_t test1_thread_t;
	pthread_t test2_thread_t;

	if (pthread_create(&test1_thread_t, NULL, test_thread1, "test_1_thread") != 0) {
		printf("test1_thread_t create error\n");
		exit(1);
	}

	if (pthread_create(&test2_thread_t, NULL, test_thread2, "test_2_thread") != 0) {
		printf("test2_thread_t create error\n");
		exit(1);
	}

	pthread_join(test1_thread_t, NULL);
	pthread_join(test2_thread_t, NULL);

	return EXIT_SUCCESS;
}

Makefile


CC=/home/liyongjun/project/board/buildroot/OrangePiPC/host/bin/arm-linux-gcc

TARGET=cacheline_not_fill
# TARGET=cacheline_fill

all:
	${CC} ${TARGET}.c -g -O0 -Wall -pthread -o ${TARGET}.out

clean:
	rm *.out

tftp:
	cp ${TARGET}.out ~/tftp

Running

# ./perf stat -e cache-references -e cache-misses ./cacheline_not_fill.out

 Performance counter stats for './cacheline_not_fill.out':

       12005744527      cache-references
         986698086      cache-misses              #    8.219 % of all cache refs

      15.095276549 seconds time elapsed

      29.822868000 seconds user
       0.000000000 seconds sys

# ./perf stat -e cache-references -e cache-misses ./cacheline_fill.out

 Performance counter stats for './cacheline_fill.out':

       12005381835      cache-references
             63555      cache-misses              #    0.001 % of all cache refs

      13.942023631 seconds time elapsed

      27.839129000 seconds user
       0.000000000 seconds sys

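Without cache-line padding, cache-misses reaches 8.219% of all cache references and the run takes about 15.1 s;
with cache-line padding, cache-misses is only 0.001% and the run takes about 13.9 s.
Elapsed time drops by about 7.6% (15.095 s to 13.942 s). An improvement of that size would be very noticeable if it showed up in network throughput.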

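Summary

A low cache hit rate can seriously hurt program performance, network throughput and so on, so code should be written to avoid cache misses where possible. Techniques include those covered in "iCache && dCache": laying out code sections by function, prefetching, and cache-line alignment. The perf tool can then be used to measure the actual cache hit rate.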
