A Classic Asynchronous Dynamic Pipeline
Williams and Horowitz's PS0 pipeline:
- Structure
- Operation
- Performance

A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.), 1986-91:
- successfully used in fabricated chips: Stanford '87, HAL '90s
Implemented using "dynamic logic".
[Figure: three-stage PS0 pipeline (Stage 1, Stage 2, Stage 3); each stage is a processing block plus a completion detector; labels: data in, data out, data, ack]

PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a completion detector:
[Figure: processing block built from dynamic gates (pull-down network, "keeper", PC input; data inputs, data outputs), followed by a completion detector that produces ack]

Dual-Rail Completion Detector
Combines dual-rail signals: indicates when all bits are valid (or reset).
- OR together the 2 rails of each bit
- merge the per-bit results using a "C-element"
C-element:
- if all inputs = 1, output → 1
- if all inputs = 0, output → 0
- else, maintain output value
[Figure: one OR gate per bit (bit 0, bit 1, ..., bit n) feeding a C-element (C) whose output is Done]

PS0 Protocol
- PRECHARGE N: when N+1 completes evaluation
  - delete data: after next stage has copied it
- EVALUATE N: when N+1 completes precharging
  - accept new data: after next stage is emptied
Evaluate → Precharge: 3 events
Precharge → Evaluate: another 3 events
Complete cycle: 6 events
[Figure: stages N, N+1, N+2; events numbered 1 to 6: evaluates, evaluates, evaluates, indicates "done", precharges, indicates "done"]

PS0 Performance
[Figure: the six events of one cycle, numbered 1 to 6]
Cycle Time = $3\,t_{eval} + 2\,t_{CD} + t_{prech}$
(three evaluations, two completion-detector "done" indications and one precharge, where $t_{eval}$, $t_{CD}$ and $t_{prech}$ are the evaluation, completion-detection and precharge delays)

Summary: PS0 Pipelining
Datapaths are latch-free:
- dynamic gates themselves provide implicit latches
  +: chip area savings
  +: extremely low latency
Data items kept separate by control:
- stage deletes data: only after next stage has copied it
- stage accepts new data: only if next stage is empty
→ distinct data items always separated by "spacers"
Control is extremely simple: each controller = single wire
- completion detector directly controls previous stage
  +: chip area savings
  +: low control overhead

Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with "magic clocking":
   - each stage gets its own clock
   - successive clocks are slightly skewed
   → essentially, a clocked simulation of asynchronous handshaking!
   → needs multiple clock phases!
2. Use a single clock, but insert latches between stages:
   - latches are simple, level-sensitive
   - consecutive stages receive complementary clock signals
[Figure: latches between stages, clocked by Ck and its complement]

Comparison (contd.)
Cycle Times?

Drawbacks of PS0 Pipelining
1. Poor throughput:
   - long cycle time: 6 events per cycle
   - data "tokens" are forced far apart in time
2. Limited storage capacity:
   - max only 50% of stages can hold distinct tokens
   - data tokens must be separated by at least one spacer
Our Research Goals: address both issues
- still maintain very low latency
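The Dual-Rail Completion Detector slide can be sketched behaviorally in Python. This is a minimal model, not a gate-level design, and it assumes each dual-rail bit is a `(true_rail, false_rail)` pair in which `(0, 0)` is the spacer and exactly one rail high is a valid value; the names `encode`, `CElement` and `CompletionDetector` are illustrative only.

```python
from typing import List, Tuple

# Assumed dual-rail encoding: (true_rail, false_rail)
#   (0, 0) spacer/reset, (1, 0) logic 1, (0, 1) logic 0.
DualRail = Tuple[int, int]


def encode(bits: List[int]) -> List[DualRail]:
    """Encode ordinary bits as valid dual-rail pairs."""
    return [(b, 1 - b) for b in bits]


class CElement:
    """Muller C-element: output -> 1 if all inputs are 1,
    output -> 0 if all inputs are 0, otherwise hold the old value."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, inputs: List[int]) -> int:
        if all(x == 1 for x in inputs):
            self.out = 1
        elif all(x == 0 for x in inputs):
            self.out = 0
        return self.out


class CompletionDetector:
    """OR the two rails of every bit, then merge the per-bit results
    with one C-element (a behavioral model of the slide's circuit)."""

    def __init__(self) -> None:
        self.c = CElement()

    def done(self, word: List[DualRail]) -> int:
        per_bit = [t | f for t, f in word]  # 1 once that bit is valid
        return self.c.update(per_bit)


if __name__ == "__main__":
    cd = CompletionDetector()
    print(cd.done([(0, 0)] * 3))              # 0: all spacers
    print(cd.done([(1, 0), (0, 0), (0, 1)]))  # 0: partially valid, holds
    print(cd.done(encode([1, 0, 1])))         # 1: all bits valid
    print(cd.done([(0, 0), (0, 1), (1, 0)]))  # 1: partially reset, holds
    print(cd.done([(0, 0)] * 3))              # 0: all reset
```

Because the C-element holds its value, the detector output changes only when the whole word is valid or the whole word is reset; a partially valid or partially reset word leaves it unchanged.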
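A small max-plus timing model makes the six-event cycle from the PS0 Protocol and PS0 Performance slides concrete. It is a sketch under stated assumptions: every stage has the same evaluation, completion-detection and precharge delays (the hypothetical `Delays` fields `t_eval`, `t_cd`, `t_prech`), an ideal source offers the next token as soon as stage 0 has reset, and the sink is modeled as a bare completion detector. Each stage precharges once the next stage indicates "done", and evaluates once the next stage indicates reset and its new input token has arrived.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Delays:
    """Hypothetical per-stage delays, in arbitrary time units."""
    t_eval: int   # dynamic gates evaluate
    t_cd: int     # completion detector indicates "done" or reset
    t_prech: int  # dynamic gates precharge


def simulate_ps0(d: Delays, stages: int = 6, tokens: int = 8) -> List[List[int]]:
    """Return E, where E[k][i] is when stage i finishes evaluating token k.

    Rules applied as max-plus recurrences:
      precharge stage i once stage i+1 indicates "done";
      evaluate stage i once stage i+1 indicates reset AND token k is at its inputs.
    Assumed environment: the source offers token k as soon as stage 0 has
    reset from token k-1; the sink (index `stages`) is a bare completion detector.
    """
    E: List[List[int]] = []
    R: List[List[int]] = []  # R[k][i]: stage i (or sink) indicates reset after token k
    for k in range(tokens):
        e = [0] * stages
        for i in range(stages):
            if i > 0:
                data_ready = e[i - 1]                     # upstream outputs valid
            else:
                data_ready = R[k - 1][0] if k > 0 else 0  # ideal source
            enabled = R[k - 1][i + 1] if k > 0 else 0     # next stage has reset
            e[i] = max(data_ready, enabled) + d.t_eval
        done = [t + d.t_cd for t in e]   # each stage's CD indicates "done"
        done.append(e[-1] + d.t_cd)      # sink CD sees the last stage's data
        prech = [done[i + 1] + d.t_prech for i in range(stages)]
        r = [t + d.t_cd for t in prech]  # each stage's CD indicates reset
        r.append(prech[-1] + d.t_cd)     # sink CD sees the spacer
        E.append(e)
        R.append(r)
    return E


def cycle_time(d: Delays, stages: int = 6, tokens: int = 8) -> int:
    """Steady-state time between successive tokens at a middle stage."""
    E = simulate_ps0(d, stages, tokens)
    mid = stages // 2
    return E[-1][mid] - E[-2][mid]


if __name__ == "__main__":
    d = Delays(t_eval=2, t_cd=1, t_prech=1)
    print(cycle_time(d))                           # 9
    print(3 * d.t_eval + 2 * d.t_cd + d.t_prech)   # 9
```

Under these assumptions the measured period is 9 units for `Delays(2, 1, 1)`, i.e. three evaluations, two completion-detector delays and one precharge. Stage N's own precharge is not on this loop: stage N is re-enabled by stage N+1's reset, which in turn waits for stage N+2 to evaluate.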
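The 50% storage bound on the Drawbacks of PS0 Pipelining slide follows from the spacer rule alone. The brute-force sketch below assumes a snapshot view in which each stage holds either a spacer or data, and adjacent stages holding data must hold copies of the same token, so each maximal run of occupied stages counts as one token. The function names are illustrative.

```python
from itertools import product
from typing import Tuple


def count_tokens(occupied: Tuple[bool, ...]) -> int:
    """Distinct tokens in a snapshot: one per maximal run of occupied
    stages (adjacent occupied stages hold copies of a single token)."""
    return sum(1 for i, busy in enumerate(occupied)
               if busy and (i == 0 or not occupied[i - 1]))


def max_distinct_tokens(n_stages: int) -> int:
    """Best snapshot over all spacer/data patterns of n_stages stages."""
    return max(count_tokens(p)
               for p in product((False, True), repeat=n_stages))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, max_distinct_tokens(n))
```

For n stages this gives ⌈n/2⌉, e.g. 4 distinct tokens in 8 stages, reached for instance by alternating data and spacers.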