2023 Computer System Architecture Lab Report



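I. Hazards in the Pipeline

Objectives:
1. Become proficient in operating and using the WinDLX simulator, and become familiar with the DLX instruction set architecture and its features;
2. Deepen understanding of the basic concepts of computer pipelining;
3. Learn more about the function and basic operation of each stage of the basic DLX pipeline;
4. Deepen understanding of data hazards and structural hazards, and learn how these two kinds of hazards affect CPU performance;
5. Learn methods for resolving data hazards, and master the use of forwarding to reduce the stalls caused by data hazards.

Platform: WinDLX simulator

Tasks and procedure:
1. Use the WinDLX simulator to run the following three programs: the factorial program fact.s, the greatest-common-divisor program gcm.s, and the prime-number program prim.s. Run each program in single-step mode, in continuous mode, and with breakpoints; observe how the program executes in the pipeline, and observe the contents of the CPU registers and memory. Become proficient with WinDLX.
2. Run structure_d.s in WinDLX. Through simulation, find the instruction pairs that have structural (resource) hazards and the units that cause them; record the number of stall cycles caused by structural hazards and compute their share of the total execution cycles; discuss how structural hazards affect CPU performance and how they can be resolved.
3. With forwarding disabled (uncheck Enable Forwarding in the Configuration menu), run data_d.s in WinDLX. Record the number of stall cycles caused by data hazards and the total number of cycles, and compute the ratio of stall cycles to total cycles. Then enable forwarding (check Enable Forwarding), run data_d.s again, repeat the measurements of step 3, and compute the speedup obtained with forwarding.

1. Factorial program
We ran the factorial program fact.s in the WinDLX simulator. This program illustrates the use of floating-point instructions: it reads an integer from standard input, computes its factorial, and prints the result. It calls the input subroutine in input.s, which reads a positive integer.

Results: after loading fact.s and input.s, we ran the program without setting any breakpoints.

a. Without forwarding:

    Total: 236 Cycle(s) executed.
    ID executed by 145 Instruction(s).
    2 Instruction(s) currently in Pipeline.
    Stalls:
      RAW stalls: 53 (22.46% of all Cycles)
      WAW stalls: 0 (0.00% of all Cycles)
      Structural stalls: 0 (0.00% of all Cycles)
      Control stalls: 25 (10.59% of all Cycles)
      Trap stalls: 12 (5.08% of all Cycles)
      Total: 90 Stall(s) (38.14% of all Cycles)

b. With forwarding:

    Total: 215 Cycle(s) executed.
    ID executed by 145 Instruction(s).
    2 Instruction(s) currently in Pipeline.
    Stalls:
      RAW stalls: 17 (7.91% of all Cycles), thereof:
        LD stalls: 3 (17.65% of RAW stalls)
        Branch/Jump stalls: 3 (17.65% of RAW stalls)
        Floating point stalls: 11 (64.70% of RAW stalls)
      WAW stalls: 0 (0.00% of all Cycles)
      Structural stalls: 0 (0.00% of all Cycles)
      Control stalls: 25 (11.63% of all Cycles)
      Trap stalls: 12 (5.58% of all Cycles)
      Total: 54 Stall(s) (25.12% of all Cycles)

These figures show the effect of forwarding. With forwarding, the Statistics window reports 215 total cycles and 54 stalls (17 RAW, 25 control, 12 trap). Without forwarding, the control and trap stalls are unchanged, while the RAW stalls rise from 17 to 53 and the total cycle count rises to 236. The speedup from forwarding is therefore 236/215 ≈ 1.098: DLX with forwarding is 9.8% faster than DLX without it.

2. Data hazards
First, here is a program that contains data hazards:

        LHI   R2,(A>>16)&0xFFFF
        ADDUI R2,R2,A&0xFFFF
        LHI   R3,(B>>16)&0xFFFF
        ADDUI R3,R3,B&0xFFFF
    loop:
        LW    R1,0(R2)
        ADD   R1,R1,R3
        SW    0(R2),R1
        LW    R5,0(R1)
        ADDI  R5,R5,#10
        ADDI  R2,R2,#4
        SUB   R4,R3,R2
        BNEZ  R4,loop
        TRAP  #0
    A:  .word 0,4,8,12,16,20,24,28,32,36
    B:  .word 9,8,7,6,5,4,3,2,1,0

Running the program without forwarding gives:

    Total: 202 Cycle(s) executed.
    ID executed by 85 Instruction(s).
    2 Instruction(s) currently in Pipeline.
    Stalls:
      RAW stalls: 104 (51.48% of all Cycles)
      WAW stalls: 0 (0.00% of all Cycles)
      Structural stalls: 0 (0.00% of all Cycles)
      Control stalls: 9 (4.46% of all Cycles)
      Trap stalls: 3 (1.48% of all Cycles)
      Total: 116 Stall(s) (57.42% of all Cycles)

The program ran for 202 cycles, of which 104 were RAW stall cycles caused by data hazards. RAW stall cycles as a share of total cycles = 104/202 = 51.48%.

Running the program with forwarding gives:

    Total: 128 Cycle(s) executed.
    ID executed by 85 Instruction(s).
    2 Instruction(s) currently in Pipeline.
    Stalls:
      RAW stalls: 30 (23.44% of all Cycles), thereof:
        LD stalls: 20 (66.67% of RAW stalls)
        Branch/Jump stalls: 10 (33.33% of RAW stalls)
        Floating point stalls: 0 (0.00% of RAW stalls)
      WAW stalls: 0 (0.00% of all Cycles)
      Structural stalls: 0 (0.00% of all Cycles)
      Control stalls: 9 (7.03% of all Cycles)
      Trap stalls: 3 (2.34% of all Cycles)
      Total: 42 Stall(s) (32.81% of all Cycles)

The program ran for 128 cycles, of which 30 were RAW stall cycles caused by data hazards. RAW stall cycles as a share of total cycles = 30/128 = 23.44%. Forwarding therefore removes most of the stalls caused by data hazards and shortens execution: overall performance is 202/128 ≈ 1.58 times that without forwarding.

3. Structural hazards
The following program contains a structural hazard:

        ADDI  R5,R5,1
        SUBI  R4,R4,1
        AND   R3,R3,R3
        XOR   R7,R7,R7
        ADDI  R8,R8,1
        ADDI  R9,R9,1
        MULT  R1,R5,R4
        MULT  R2,R3,R7

After execution, WinDLX shows the following clock cycle diagram and statistics:

    [Figure: WinDLX clock cycle diagram for the eight instructions above]

    Total: 20 Cycle(s) executed.
    ID executed by 14 Instruction(s).
    5 Instruction(s) currently in Pipeline.
    Stalls:
      RAW stalls: 0 (0.00% of all Cycles), thereof:
        LD stalls: 0 (0.00% of RAW stalls)
        Branch/Jump stalls: 0 (0.00% of RAW stalls)
        Floating point stalls: 0 (0.00% of RAW stalls)
      WAW stalls: 0 (0.00% of all Cycles)
      Structural stalls: 4 (20.00% of all Cycles)
      Control stalls: 0 (0.00% of all Cycles)
      Trap stalls: 0 (0.00% of all Cycles)
      Total: 4 Stall(s) (20.00% of all Cycles)

One structural hazard, between the two consecutive MULT instructions competing for the multiply unit, caused 4 stalls, i.e. 20% of the 20 total cycles. To avoid structural hazards, resources can be duplicated: for example, a pipelined machine can have separate instruction and data memories, or the cache can be split into an instruction cache and a data cache.

II. Loop Unrolling and Instruction Scheduling

Objectives:
1. Deepen understanding of loop-level parallelism, instruction scheduling, loop unrolling, and register renaming;
2. Become familiar with using instruction scheduling to resolve data hazards in the pipeline;
3. Understand how loop unrolling, instruction scheduling, and related techniques improve CPU performance.

Platform: WinDLX simulator

Tasks and procedure:
1. Use instruction scheduling to resolve structural and data hazards in the pipeline
(1) Write a DLX assembly source file *.s whose program contains both data hazards and structural hazards (assume 2 adders, 2 multipliers, and 2 dividers, each with a latency of 3 clock cycles);
(2) Using the "Floating point stages" option in the Configuration menu, set the number of adders, multipliers, and dividers to 2 each and all latencies to 3 clock cycles;
(3) Run the program in WinDLX. Record how many times each kind of hazard occurs, the instruction combinations involved, and the total number of clock cycles;
(4) Apply instruction scheduling to the program to eliminate the hazards;
(5) Run the scheduled program in WinDLX, observe its execution in the pipeline, and record the total number of clock cycles;
(6) Based on the recorded results, compare performance before and after scheduling, and discuss the significance of instruction scheduling for CPU performance.
2. Improve performance with loop unrolling, register renaming, and instruction scheduling
(1) Write a DLX assembly source file *.s containing a simple loop whose iteration count is a multiple of 4;
(2) Run the program in WinDLX. Record how many times each kind of hazard occurs and the total number of clock cycles;
(3) Unroll the loop 3 times, replacing the original loop body with code made of 4 copies of it, and modify the rest of the program accordingly. Then apply register renaming and instruction scheduling to the new loop body;
(4) Run the modified program in WinDLX, and record how many times each kind of hazard occurs and the total number of clock cycles;
(5) Based on the recorded results, compare performance before and after loop unrolling and instruction scheduling.

Programs with hazards

1. Instruction scheduling
First, using the "Floating point stages" option in the Configuration menu, set the number of divide units to 3 and the latency of addition, multiplication, and division to 3 clock cycles. The program before scheduling, sch_bef:

        .data
        .global ONE
    ONE:    .word   1
        .text
        .global main
    main:
        lf      f1,ONE          ;turn divf into a move
        cvti2f  f7,f1           ;by storing in f7 1 in
        nop                     ;floating-point format
        divf    f1,f8,f7        ;move Y=(f8) into f1
        divf    f2,f9,f7        ;move Z=(f9) into f2
        addf    f3,f1,f2
        divf    f10,f3,f7       ;move f3 into X=(f10)
        divf    f4,f11,f7       ;move B=(f11) into f4
        divf    f5,f12,f7       ;move C=(f12) into f5
        multf   f6,f4,f5
        divf    f13,f6,f7       ;move f6 into A=(f13)
    Finish:
        trap    0

Running it gives:

    Total: 27 Cycle(s) executed.
    ID executed by 12 Instruction(s).
    2 Instruction(s) currently in Pipeline.
    Stalls:
      RAW stalls: 9 (33.33% of all Cycles), thereof:
        LD stalls: 1 (11.11% of RAW stalls)
        Branch/Jump stalls: 0 (0.00% of RAW stalls)
        Floating point stalls: 8 (88.89% of RAW stalls)
      WAW stalls: 0 (0.00% of all Cycles)
      Structural stalls: 0 (0.00% of all Cycles)
      Control stalls: 0 (0.00% of all Cycles)
      Trap stalls: 7 (25.93% of all Cycles)
      Total: 16 Stall(s) (59.26% of all Cycles)

The program after scheduling, sch_aft:

        .data
        .global ONE
    ONE:    .word   1
        .text
        .global main
    main:
        lf      f1,ONE          ;turn divf into a move
        cvti2f  f7,f1           ;by storing in f7 1 in
        nop                     ;floating-point format
        divf    f1,f8,f7        ;move Y=(f8) into f1
        divf    f2,f9,f7        ;move Z=(f9) into f2
        divf    f4,f11,f7       ;move B=(f11) into f4
        divf    f5,f12,f7       ;move C=(f12) into f5
        addf    f3,f1,f2
        multf   f6,f4,f5
        divf    f10,f3,f7       ;move f3 into X=(f10)
        divf    f13,f6,f7       ;move f6 into A=(f13)
    Finish:
        trap    0

Running it gives:

    Total: 21 Cycle(s) executed.
    ID executed by 12 Instruction(s).
    2 Instruction(s) currently in Pipeline.
    Stalls:
      RAW stalls: 3 (14.28% of all Cycles), thereof:
        LD stalls: 1 (33.33% of RAW stalls)
        Branch/Jump stalls: 0 (0.00% of RAW stalls)
        Floating point stalls: 2 (66.67% of RAW stalls)
      WAW stalls: 0 (0.00% of all Cycles)
      Structural stalls: 0 (0.00% of all Cycles)
      Control stalls: 0 (0.00% of all Cycles)
      Trap stalls: 6 (28.57% of all Cycles)
      Total: 9 Stall(s) (42.86% of all Cycles)

After scheduling, the cycle count drops from 27 to 21 and the stalls drop from 16 to 9 (RAW stalls from 9 to 3).

2. Loop unrolling
The program before loop unrolling:

        LHI   R2,(A>>16)&0xFFFF
        ADDUI R2,R2,A&0xFFFF
        LHI   R3,(B>>16)&0xFFFF
        ADDUI R3,R3,B&0xFFFF
        ADDU  R4,R0,R3
        NOP
    loop:
        SUBI  R4,R4,#8
        SUB   R5,R4,R2
        BNEZ  R5,loop
        TRAP  #0
    A:  .double 1,2,3,4
    B:  .double 1,2,3,4

Results:

    Total: 30 Cycle(s) executed.
    ID executed by 19 Instruction(s).
    2 Instruction(s) currently in Pipeline.
    Hardware configuration:
      Memory size: 32768 Bytes
      faddEX-Stages: 1, required Cycles: 2
      fmulEX-Stages: 1, required Cycles: 5
      fdivEX-Stages: 1, required Cycles: 19
      Forwarding enabled.
    Stalls:
      RAW stalls: 4 (13.33% of all Cycles), thereof:
        LD stalls: 0 (0.00% of RAW stalls)
        Branch/Jump stalls: 4 (100.00% of RAW stalls)
        Floating point stalls: 0 (0.00% of RAW stalls)
      WAW stalls: 0 (0.00% of all Cycles)
      Structural stalls: 0 (0.00% of all Cycles)
      Control stalls: 3 (10.00% of all Cycles)
      Trap stalls: 3 (10.00% of all Cycles)
      Total: 10 Stall(s) (33.33% of all Cycles)

The program after loop unrolling:

        LHI   R2,(A>>16)&0xFFFF
        ADDUI R2,R2,A&0xFFFF
        LHI   R3,(B>>16)&0xFFFF
        ADDUI R3,R3,B&0xFFFF
        ADDU  R4,R0,R3
        SUBI  R4,R4,#8
        SUBI  R4,R4,#8
        SUBI  R4,R4,#8
        SUBI  R4,R4,#8
        TRAP  #0
    A:  .double 1,2,3,4
    B:  .double 1,2,3,4

Results:

    Total: 14 Cycle(s) executed.
    ID executed by 10 Instruction(s).
    2 Instruction(s) currently in Pipeline.
    Hardware configuration:
      Memory size: 32768 Bytes
      faddEX-Stages: 1, required Cycles: 2
      fmulEX-Stages: 1, required Cycles: 5
      fdivEX-Stages: 1, required Cycles: 19
      Forwarding enabled.
    Stalls:
      RAW stalls: 0 (0.00% of all Cycles), thereof:
        LD stalls: 0 (0.00% of RAW stalls)
        Branch/Jump stalls: 0 (0.00% of RAW stalls)
        Floating point stalls: 0 (0.00% of RAW stalls)
      WAW stalls: 0 (0.00% of all Cycles)
      Structural stalls: 0 (0.00% of all Cycles)
      Control stalls: 0 (0.00% of all Cycles)
      Trap stalls: 3 (21.43% of all Cycles)
      Total: 3 Stall(s) (21.43% of all Cycles)

After loop unrolling, the cycle count drops from 30 to 14 and the stalls drop from 10 to 3: the RAW stalls on the branch and the control stalls disappear together with the loop.

III. Summary
Through this experiment we gained a working command of the WinDLX simulator, became familiar with the DLX instruction set architecture and its features, and came to understand more deeply the methods and techniques for reducing hazards and speeding up the pipeline. This is quite helpful for studying computer architecture and for the later experiments.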
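The stall shares and speedups quoted for fact.s and for the data-hazard program come from two ratios: stall cycles divided by total cycles, and cycles without forwarding divided by cycles with forwarding. A minimal Python sketch that recomputes them from the transcribed WinDLX counts; the function and dictionary names are illustrative only and are not part of WinDLX.

```python
def stall_share(stalls: int, total_cycles: int) -> float:
    """Percentage of all cycles spent in stalls."""
    return 100.0 * stalls / total_cycles


def speedup(slow_cycles: int, fast_cycles: int) -> float:
    """How many times faster the run with fewer cycles is."""
    return slow_cycles / fast_cycles


# (total cycles, RAW stalls) transcribed from the WinDLX Statistics windows.
RUNS = {
    "fact.s": {"no forwarding": (236, 53), "forwarding": (215, 17)},
    "data hazard program": {"no forwarding": (202, 104), "forwarding": (128, 30)},
}

if __name__ == "__main__":
    for name, run in RUNS.items():
        slow, raw_slow = run["no forwarding"]
        fast, raw_fast = run["forwarding"]
        print(f"{name}: RAW share {stall_share(raw_slow, slow):.2f}% -> "
              f"{stall_share(raw_fast, fast):.2f}%, "
              f"speedup {speedup(slow, fast):.3f}")
```

Python rounds the last digit to nearest, so a few percentages may differ by 0.01 from what WinDLX displayed (104/202 prints here as 51.49%, while WinDLX showed 51.48%).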
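The RAW stall counts of the data-hazard program can be approximated with a small timing model of the classic five-stage DLX pipeline. This is a sketch under stated assumptions, not WinDLX's own algorithm: one instruction enters ID per cycle; without forwarding, a register is written in the first half of WB and read in the second half of ID; with forwarding, an ALU result reaches the next instruction's EX without a stall and a load result one cycle later; a branch reads its operand in ID and so waits one cycle more than an ordinary reader. Control and trap stalls are not modeled, and the names Instr, raw_stalls, SETUP, BODY and TRACE are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    text: str                  # assembly text, for readability only
    dest: Optional[str]        # register written, or None
    srcs: Tuple[str, ...]      # registers read
    kind: str = "alu"          # "alu", "load", "store" or "branch"


def raw_stalls(trace: Sequence[Instr], forwarding: bool) -> int:
    """Count RAW stall cycles for a dynamic instruction trace (assumed timing)."""
    ready = {}        # register -> (earliest ID cycle for a normal reader, for a branch)
    cycle = -1        # ID cycle of the previous instruction
    stalls = 0
    for ins in trace:
        natural = cycle + 1
        start = natural
        for reg in ins.srcs:
            if reg in ready:
                normal, branch = ready[reg]
                start = max(start, branch if ins.kind == "branch" else normal)
        stalls += start - natural
        cycle = start
        if ins.dest is not None:
            if not forwarding:
                # Reader may be in ID during the producer's WB, 3 cycles later.
                ready[ins.dest] = (cycle + 3, cycle + 3)
            elif ins.kind == "load":
                # Load-use costs 1 stall; a branch on a loaded value costs 2.
                ready[ins.dest] = (cycle + 2, cycle + 3)
            else:
                # ALU-to-ALU costs nothing; ALU-to-branch costs 1 stall.
                ready[ins.dest] = (cycle + 1, cycle + 2)
    return stalls


SETUP = [
    Instr("LHI R2,(A>>16)&0xFFFF", "R2", ()),
    Instr("ADDUI R2,R2,A&0xFFFF", "R2", ("R2",)),
    Instr("LHI R3,(B>>16)&0xFFFF", "R3", ()),
    Instr("ADDUI R3,R3,B&0xFFFF", "R3", ("R3",)),
]
BODY = [
    Instr("LW R1,0(R2)", "R1", ("R2",), "load"),
    Instr("ADD R1,R1,R3", "R1", ("R1", "R3")),
    Instr("SW 0(R2),R1", None, ("R2", "R1"), "store"),
    Instr("LW R5,0(R1)", "R5", ("R1",), "load"),
    Instr("ADDI R5,R5,#10", "R5", ("R5",)),
    Instr("ADDI R2,R2,#4", "R2", ("R2",)),
    Instr("SUB R4,R3,R2", "R4", ("R3", "R2")),
    Instr("BNEZ R4,loop", None, ("R4",), "branch"),
]
# 85 executed instructions = 4 setup + 10 iterations x 8 + TRAP, so the loop runs 10 times.
TRACE = SETUP + BODY * 10

if __name__ == "__main__":
    print("without forwarding:", raw_stalls(TRACE, forwarding=False))
    print("with forwarding:   ", raw_stalls(TRACE, forwarding=True))
```

Under these assumptions the model yields 104 stall cycles without forwarding and 30 with forwarding (two load-use stalls and one SUB-to-BNEZ stall per iteration), the same RAW counts and the same 20 LD / 10 Branch/Jump split that WinDLX reported for this program.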
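The 4 structural stalls of the structural-hazard program are consistent with the second MULT waiting for the first to release the multiply unit. A sketch, assuming one non-pipelined multiply unit with a 5-cycle latency (the fmul setting printed in the loop-unrolling statistics) and integer ALU operations that never conflict; structural_stalls and PROGRAM are hypothetical names. It also shows the remedy the report suggests, duplicating the resource.

```python
from typing import Dict, List, Optional, Sequence


def structural_stalls(ops: Sequence[Optional[str]],
                      units: Dict[str, int],
                      latency: Dict[str, int]) -> int:
    """Stall cycles when each instruction must wait for a free functional unit.

    ops[i] names the unit instruction i needs, or None for an integer ALU
    operation (assumed never to conflict). Each unit copy is assumed to be
    non-pipelined: it stays busy for latency[unit] cycles.
    """
    free_at: Dict[str, List[int]] = {u: [0] * n for u, n in units.items()}
    cycle = -1            # cycle in which the previous instruction entered EX
    stalls = 0
    for op in ops:
        natural = cycle + 1
        start = natural
        if op is not None:
            copies = free_at[op]
            k = min(range(len(copies)), key=lambda i: copies[i])  # earliest-free copy
            start = max(natural, copies[k])
            copies[k] = start + latency[op]
        stalls += start - natural
        cycle = start
    return stalls


# The structural-hazard program: six integer operations, then two MULTs.
PROGRAM = [None] * 6 + ["mul", "mul"]

if __name__ == "__main__":
    print(structural_stalls(PROGRAM, {"mul": 1}, {"mul": 5}))   # one multiplier
    print(structural_stalls(PROGRAM, {"mul": 2}, {"mul": 5}))   # duplicated multiplier
```

This prints 4 and then 0: with a single unit the second MULT waits latency - 1 = 4 cycles, and with a second copy of the unit the conflict disappears.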
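The reordering from sch_bef to sch_aft is a form of list scheduling. Below is a minimal sketch of a greedy list scheduler with critical-path (height) priority, under simplifying assumptions: one instruction issues per cycle, every floating-point operation has a 3-cycle latency, there are enough divide units that they never conflict, and the lf/cvti2f/nop prologue and the trap are ignored. The encoding and the names DEPS, stall_cycles, height and list_schedule are hypothetical.

```python
from typing import Dict, List

# sch_bef's eight floating-point instructions, each mapped to the
# instructions whose results it reads.
DEPS: Dict[str, List[str]] = {
    "divf f1,f8,f7": [],
    "divf f2,f9,f7": [],
    "addf f3,f1,f2": ["divf f1,f8,f7", "divf f2,f9,f7"],
    "divf f10,f3,f7": ["addf f3,f1,f2"],
    "divf f4,f11,f7": [],
    "divf f5,f12,f7": [],
    "multf f6,f4,f5": ["divf f4,f11,f7", "divf f5,f12,f7"],
    "divf f13,f6,f7": ["multf f6,f4,f5"],
}
LATENCY = 3   # every FP unit set to 3 cycles, as in the experiment


def stall_cycles(order: List[str]) -> int:
    """Stalls when `order` issues in order, one instruction per cycle."""
    issued: Dict[str, int] = {}
    cycle, stalls = -1, 0
    for ins in order:
        earliest = max([issued[p] + LATENCY for p in DEPS[ins]], default=0)
        start = max(cycle + 1, earliest)
        stalls += start - (cycle + 1)
        issued[ins] = start
        cycle = start
    return stalls


def height(ins: str) -> int:
    """Latency of the longest dependence path from `ins` to the end of the block."""
    users = [u for u, preds in DEPS.items() if ins in preds]
    return LATENCY + max((height(u) for u in users), default=0)


def list_schedule() -> List[str]:
    """Each cycle, issue the ready instruction with the greatest height;
    ties keep the original program order."""
    order = list(DEPS)
    issued: Dict[str, int] = {}
    cycle = 0
    while len(issued) < len(order):
        ready = [i for i in order if i not in issued and
                 all(p in issued and issued[p] + LATENCY <= cycle for p in DEPS[i])]
        if ready:
            best = max(ready, key=lambda i: (height(i), -order.index(i)))
            issued[best] = cycle
        cycle += 1
    return sorted(issued, key=issued.get)


if __name__ == "__main__":
    print("sch_bef order:", stall_cycles(list(DEPS)), "stall cycles")
    schedule = list_schedule()
    print("scheduled order:", stall_cycles(schedule), "stall cycles")
    for ins in schedule:
        print("   ", ins)
```

In this model the sch_bef order incurs 8 stall cycles, while the scheduler produces exactly the sch_aft order with 2 stall cycles, the same as the floating-point stall counts WinDLX reported for the two programs.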
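The drop from 30 to 14 cycles in the loop-unrolling experiment can be accounted for by counting loop overhead. The sketch below assumes, from the WinDLX statistics before unrolling, that each trip through the SUBI/SUB/BNEZ loop costs one RAW stall (SUB feeding BNEZ) and each taken branch one control stall; when the loop is unrolled completely, the SUB/BNEZ pair is removed, as in the unrolled program. The function name loop_cost is hypothetical.

```python
from typing import Tuple


def loop_cost(iterations: int, factor: int) -> Tuple[int, int]:
    """Dynamic loop instructions and branch-related stall cycles for the
    countdown loop (SUBI; SUB; BNEZ) unrolled by `factor` (assumed costs)."""
    if factor < 1 or iterations % factor:
        raise ValueError("iteration count must be a positive multiple of the factor")
    trips = iterations // factor
    if trips == 1:
        # Fully unrolled: SUB and BNEZ disappear, leaving `factor` SUBIs.
        return factor, 0
    instructions = trips * (factor + 2)   # factor x SUBI, plus SUB and BNEZ
    raw_stalls = trips                    # SUB -> BNEZ in every trip
    control_stalls = trips - 1            # the final BNEZ falls through
    return instructions, raw_stalls + control_stalls


if __name__ == "__main__":
    for factor in (1, 2, 4):
        print(factor, loop_cost(4, factor))
```

For 4 iterations this gives 12 loop instructions and 7 stall cycles without unrolling, and 4 instructions with no stalls when fully unrolled. Those 8 instructions and 7 stalls, plus the NOP that the unrolled version drops, add up to the 16-cycle difference between 30 and 14.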
