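Today I measured execution speed on an ALIENTEK (原子) F767 board, and the result was disappointing: it falls well short of the advertised 462 DMIPS. I would appreciate some help understanding why.

In the test I ran a loop of roughly 300,000 iterations. The disassembly contains 27 instructions in total (including the loop instructions themselves), and the whole run takes about 34 ms. By my calculation that is only about 56% of the rated 462 DMIPS. My analysis and questions follow.

Of the 27 instructions, 7 are LDR loads. I suspected that slow memory loads were dragging down the computation, so I ran these tests:
Test 1. Cache enabled vs. disabled: no difference (even after changing the code to read from a fixed memory location, enabling or disabling the cache still made no difference).
Test 2. On-chip SRAM vs. SDRAM: on-chip SRAM is only 3% faster.
Test 3. On-chip SRAM vs. DTCM (the program was modified slightly, because the DTCM is only 128 KB): essentially no difference.

Questions ("no difference" above means a difference within 1%):
1. Why does Test 1 show no difference?
2. The difference in Test 2 seems too small.
3. Why does Test 3 show no difference? The manual advertises that adding cache and TCM gives the F7 a clear performance boost, yet using them or not gives the same result.
4. Judging from the disassembly, the code uses only the simplest operations (add, subtract, compare, branch), with no multiplies or divides, let alone floating point. Why does it reach only 56% of 462 DMIPS?

The listing, annotated with the time measured for each statement, is below:

0x08005A92 E022 B 0x08005ADA
155: for(j=1;j<640;j++)
156: {
0x08005A94 2601 MOVS r6,#0x01
0x08005A96 E01C B 0x08005AD2
157: pt=&image[0]; //1.5ms
0x08005A98 4C66 LDR r4,[pc,#408] ; @0x08005C34
158: pt+=i*640+j-640; //7ms
159:
0x08005A9A EB050085 ADD r0,r5,r5,LSL #2
0x08005A9E EB0610C0 ADD r0,r6,r0,LSL #7
0x08005AA2 F5A07020 SUB r0,r0,#0x280
0x08005AA6 4404 ADD r4,r4,r0
160: k=*(pt-1) + *pt * 2+ *(pt+1); //11ms
161:
0x08005AA8 F8141C01 LDRB r1,[r4,#-0x01]
0x08005AAC 7820 LDRB r0,[r4,#0x00]
0x08005AAE EB010040 ADD r0,r1,r0,LSL #1
0x08005AB2 7861 LDRB r1,[r4,#0x01]
0x08005AB4 1847 ADDS r7,r0,r1
162: pt+=1280;
0x08005AB6 F50464A0 ADD r4,r4,#0x500
163: f=*(pt-1) + *pt * 2 + *(pt+1); //11ms
164:
165:
0x08005ABA F8141C01 LDRB r1,[r4,#-0x01]
0x08005ABE 7820 LDRB r0,[r4,#0x00]
0x08005AC0 EB010040 ADD r0,r1,r0,LSL #1
0x08005AC4 7861 LDRB r1,[r4,#0x01]
0x08005AC6 EB000A01 ADD r10,r0,r1
166: k+=f-k; //1.5ms
167:
188: }
189: }
0x08005ACA EBAA0007 SUB r0,r10,r7
0x08005ACE 4407 ADD r7,r7,r0
0x08005AD0 1C76 ADDS r6,r6,#1
0x08005AD2 F5B67F20 CMP r6,#0x280
0x08005AD6 DBDF BLT 0x08005A98
153: for(i=1;i<480-1;i++)
0x08005AD8 1C6D ADDS r5,r5,#1
0x08005ADA F5B57FEF CMP r5,#0x1DE
0x08005ADE DDD9 BLE 0x08005A94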
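As a rough cross-check of the 56% estimate, the loop bounds and the per-pass instruction count can be read straight off the listing: the outer loop runs 478 times, the inner loop 639 times, and one inner pass (0x08005A98 through the BLT) is 21 instructions, 7 of them loads. The sketch below turns the measured 34 ms into cycles per pass and instructions per cycle. The 216 MHz core clock is an assumption (the post does not state it), and the function names are made up for illustration. Note also that DMIPS is a Dhrystone benchmark score rather than a raw instruction rate, so comparing instructions per second against 462 DMIPS is only an approximation.

```c
#include <assert.h>
#include <stdio.h>

/* Loop bounds and instruction count read off the posted listing. */
enum {
    OUTER_PASSES    = 478, /* i = 1..478 (CMP r5,#0x1DE / BLE) */
    INNER_PASSES    = 639, /* j = 1..639 (CMP r6,#0x280 / BLT) */
    INSTR_PER_INNER = 21   /* 0x08005A98 up to and including the BLT */
};

/* Total number of inner-loop passes (the "roughly 300,000" in the post). */
long inner_passes(void)
{
    return (long)OUTER_PASSES * INNER_PASSES;
}

/* Core cycles spent per inner-loop pass for a given run time and clock. */
double cycles_per_pass(double elapsed_s, double clock_hz)
{
    assert(elapsed_s > 0.0 && clock_hz > 0.0);
    return elapsed_s * clock_hz / (double)inner_passes();
}

/* Average instructions per cycle, ignoring the small outer-loop overhead. */
double instr_per_cycle(double elapsed_s, double clock_hz)
{
    return INSTR_PER_INNER / cycles_per_pass(elapsed_s, clock_hz);
}

/* Print a summary. 216 MHz is an ASSUMED clock; the post does not state it. */
void print_summary(void)
{
    const double elapsed_s = 0.034;        /* measured time from the post */
    const double assumed_clock_hz = 216e6; /* assumption, not from the post */

    printf("inner passes : %ld\n", inner_passes());
    printf("cycles/pass  : %.1f\n", cycles_per_pass(elapsed_s, assumed_clock_hz));
    printf("instr/cycle  : %.2f\n", instr_per_cycle(elapsed_s, assumed_clock_hz));
}
```

Calling print_summary() from main() prints the three figures. Under the stated assumptions this comes to about 24 cycles for the 21 instructions of one pass, which is a little under one instruction per cycle on a core that can issue two per cycle at best.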
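It's the one below. TCM gives the best performance, but the difference only becomes obvious on a complex system. Our cache is large enough that CoreMark fits entirely inside it, so nothing ever has to be swapped out. In a real application you should place different parts of the program in different memories according to what the application needs; a sensible system design like that can improve performance a lot.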
https://www.stmcu.org.cn/document/download/index/id-212351
It's on the forum: search for "coremark" and you will find 如何将coremark程序移植到STM32上.pdf (a guide to porting the CoreMark program to STM32).
Here are the test results.
----------------------------------------------------
Config.: 1 - Exec in Ram ITCM - Data in DTCM
----------------------------------------------------
System frequency: 200MHz
-------------------------------------------
| Flash WS | ART | D-cache | I-cache |
+-----------+---------+---------+---------+
| NA | OFF | OFF | OFF |
-------------------------------------------
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 11952016
Total time (secs): 11.952016
Iterations/Sec : 1004.014720
Iterations : 12000
Compiler version : uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05
Compiler flags : -O3 Otime
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xd340
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 1004.014720 / uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05 -O3 Otime / STACK
----------------------------------------------------
Config.: 2 - Exec in Flash AXI - Data in DTCM
----------------------------------------------------
System frequency: 200MHz
-------------------------------------------
| Flash WS | ART | D-cache | I-cache |
+-----------+---------+---------+---------+
| 6 | OFF | OFF | ON |
-------------------------------------------
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 12156203
Total time (secs): 12.156203
Iterations/Sec : 987.150346
Iterations : 12000
Compiler version : uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05
Compiler flags : -O3 Otime
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xd340
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 987.150346 / uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05 -O3 Otime / STACK
----------------------------------------------------
Config.: 3 - Exec in Flash AXI - Data in SRAM1
----------------------------------------------------
System frequency: 200MHz
-------------------------------------------
| Flash WS | ART | D-cache | I-cache |
+-----------+---------+---------+---------+
| 6 | OFF | ON | ON |
-------------------------------------------
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 12218410
Total time (secs): 12.218410
Iterations/Sec : 982.124515
Iterations : 12000
Compiler version : uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05
Compiler flags : -O3 Otime
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xd340
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 982.124515 / uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05 -O3 Otime / STACK
----------------------------------------------------
Config.: 4 - Exec in Flash ITCM - Data in DTCM
----------------------------------------------------
System frequency: 200MHz
-------------------------------------------
| Flash WS | ART | D-cache | I-cache |
+-----------+---------+---------+---------+
| 6 | ON | OFF | OFF |
-------------------------------------------
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 12127302
Total time (secs): 12.127302
Iterations/Sec : 989.502859
Iterations : 12000
Compiler version : uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05
Compiler flags : -O3 Otime
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xd340
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 989.502859 / uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05 -O3 Otime / STACK
----------------------------------------------------
Config.: 5 - Exec in External SRAM - Data in DTCM
----------------------------------------------------
System frequency: 200MHz
-------------------------------------------
| Flash WS | ART | D-cache | I-cache |
+-----------+---------+---------+---------+
| NA | OFF | OFF | ON |
-------------------------------------------
FMC SRAM config:
-----------------
| Mem Bus width |
+---------------+
| 16 |
-----------------
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 12984019
Total time (secs): 12.984019
Iterations/Sec : 924.213065
Iterations : 12000
Compiler version : uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05
Compiler flags : -O3 Otime
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xd340
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 924.213065 / uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05 -O3 Otime / STACK
----------------------------------------------------
Config.: 6 - Exec in External SDRAM swapped - Data in DTCM
----------------------------------------------------
System frequency: 200MHz
-------------------------------------------
| Flash WS | ART | D-cache | I-cache |
+-----------+---------+---------+---------+
| NA | OFF | OFF | ON |
-------------------------------------------
FMC SDRAM config:
-----------------
| Mem Bus width |
+---------------+
| 32 |
-----------------
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 12325498
Total time (secs): 12.325498
Iterations/Sec : 973.591493
Iterations : 12000
Compiler version : uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05
Compiler flags : -O3 Otime
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xd340
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 973.591493 / uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05 -O3 Otime / STACK
----------------------------------------------------
Config.: 7 - Exec in External SDRAM not swapped - Data in DTCM
----------------------------------------------------
System frequency: 200MHz
-------------------------------------------
| Flash WS | ART | D-cache | I-cache |
+-----------+---------+---------+---------+
| NA | OFF | OFF | ON |
-------------------------------------------
FMC SDRAM config:
-----------------
| Mem Bus width |
+---------------+
| 32 |
-----------------
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 64608165
Total time (secs): 64.608165
Iterations/Sec : 185.735038
Iterations : 12000
Compiler version : uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05
Compiler flags : -O3 Otime
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xd340
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 185.735038 / uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05 -O3 Otime / STACK
----------------------------------------------------
Config.: 8 - Exec in QuadSPI Flash - Data in DTCM
----------------------------------------------------
System frequency: 200MHz
-------------------------------------------
| Flash WS | ART | D-cache | I-cache |
+-----------+---------+---------+---------+
| NA | OFF | OFF | ON |
-------------------------------------------
QSPI config:
| Prescaler | QSPI CLK | DDRMODE | Inst. lines nb |
+-----------+----------+---------+-----------------+
| 3 | 50MHz | ON | 1 |
----------------------------------------------------
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 14840884
Total time (secs): 14.840884
Iterations/Sec : 808.577171
Iterations : 12000
Compiler version : uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05
Compiler flags : -O3 Otime
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xd340
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 808.577171 / uVision 5.12.00 - RealView MDK-ARM V5.12 - ARMCCV5.05 -O3 Otime / STACK
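To make the eight runs above easier to compare, the Iterations/Sec figures can be normalized against Config. 1 (code in ITCM RAM, data in DTCM) and against the 200 MHz system clock printed in each log. The following is a minimal sketch of that arithmetic; the scores are copied from the logs, while the function and array names are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_CONFIGS 8

/* Iterations/Sec copied from the eight runs above, in order (Config. 1..8). */
static const double kScore[NUM_CONFIGS] = {
    1004.014720, 987.150346, 982.124515, 989.502859,
    924.213065,  973.591493, 185.735038, 808.577171
};

/* CoreMark/MHz for a 1-based configuration number. */
double coremark_per_mhz(int config, double mhz)
{
    assert(config >= 1 && config <= NUM_CONFIGS && mhz > 0.0);
    return kScore[config - 1] / mhz;
}

/* Score of a configuration as a percentage of Config. 1. */
double percent_of_config1(int config)
{
    assert(config >= 1 && config <= NUM_CONFIGS);
    return 100.0 * kScore[config - 1] / kScore[0];
}

/* Print one line per configuration, using the 200 MHz clock from the logs. */
void print_table(void)
{
    const double mhz = 200.0;

    for (int c = 1; c <= NUM_CONFIGS; c++) {
        printf("Config %d: %6.3f CoreMark/MHz, %5.1f%% of Config 1\n",
               c, coremark_per_mhz(c, mhz), percent_of_config1(c));
    }
}
```

Calling print_table() from main() prints the comparison. Seen this way, Configs 2, 3, 4 and 6 all land within about 3% of the TCM-only run, and the SDRAM run without swapping (Config. 7) is the clear outlier at under a fifth of the Config. 1 score.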
343119498@qq.com
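Please send it to me and I will run it, thanks. Judging from your results, though, it looks like cache, DTCM and so on only improve performance by a small amount.
Hmm, the numbers are a bit disappointing. I was planning to use an H7 or an i.MX RT for some image recognition. What needs to be recognized is simple, but it has to finish within 20 ms. Looking at these results, that is probably not going to work: even the RT would be at most 3 times faster, so a single pass of this computation would still take more than 10 ms, leaving no room for the recognition algorithm itself.
Still, thank you for your reply.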