Chapter 13  Shared-Memory System Programming

  13.1 ANSI X3H5 shared-memory model
  13.2 POSIX threads model
  13.3 OpenMP model

The Role of a Programming Standard

  Specifies the program execution model
    SPMD, SMP, etc.
  Specifies how parallelism is expressed
    DOACROSS, FORALL, PARALLEL, INDEPENDENT
  Specifies how synchronization is expressed
    Lock, Barrier, Semaphore, Condition Variables
  Specifies how run-time environment information is obtained
    thread ID, number of processes

ANSI X3H5 Shared-Memory Model

  Started in the mid-80s with the emergence of shared-memory parallel computers with proprietary, directive-driven programming environments
  Builds on an earlier standardization result: PCF, shared-memory parallel Fortran
  A conceptual programming model, drawn up in 1993
  Language bindings
    C
    Fortran 77
    Fortran 90
  Parallel blocks (work-sharing constructs)
    Parallel sections (psections ... end psections)
    Parallel loops (pdo ... end pdo)
    Single process (psingle ... end psingle)
  Constructs can be nested
  Code outside the work-sharing blocks is executed redundantly by every team member
  Implicit barriers (suppressed with nowait), explicit barriers, and blocking operations
  Shared and private variables
  Thread synchronization
    Latch: critical section
    Lock: test, lock, unlock
    Event: wait, post, clear
    Ordinal: sequencing

X3H5: Parallelism Constructs

    Program main          ! The program starts in sequential mode; at this point there is only one thread
      A                   ! A is executed only by the base thread, called the master thread
      parallel            ! Switch to parallel mode and spawn several child threads (a team)
        B                 ! B is replicated by every team member
        psections         ! Start of the parallel sections
        section
          C               ! One team member executes C
        section
          D               ! One team member executes D
        end psections     ! Wait until both C and D have finished
        psingle           ! Temporarily switch to sequential mode
          E               ! E is executed by one team member
        end psingle       ! Switch back to parallel mode
        pdo i=1,6         ! Start of the pdo construct
          F(i)            ! The team members share the six iterations of F
        end pdo nowait    ! No implicit barrier synchronization
        G                 ! More replicated code
      end parallel        ! Switch to sequential mode
      H                   ! The initial process executes H alone
      ...                 ! There may be more parallel constructs
    End

[Figure: thread timeline for the example above. Threads P, Q, R; blocks A, B (once per thread), C, D, E, F(1:2), F(3:4), F(5:6), G (once per thread), H; the synchronization points are labeled "implicit barrier synchronization" except after the pdo, which is labeled "no implicit barrier synchronization".]
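The structure of the X3H5 example can be tried out with OpenMP, which section 13.3 of this chapter describes as inheriting many X3H5 concepts. The following is a minimal C sketch under that assumption, not X3H5 syntax: the OpenMP constructs parallel, sections, single, and for nowait stand in for parallel, psections, psingle, and pdo ... nowait. The function name x3h5_style_demo and the loop body F are hypothetical, and the blocks A to H are reduced to comments or trivial assignments.

```c
/* Hypothetical stand-in for the loop body F(i) of the slide. */
static int F(int i) { return i * i; }

/* Walks through the phases of the X3H5 example.
 * Fills f[0..5] with F(1)..F(6) and returns how many of the
 * blocks C, D, and E were executed (expected: 3). */
int x3h5_style_demo(int f[6])
{
    int c_done = 0, d_done = 0, e_runs = 0;

    /* A: sequential mode, only the master thread exists */

    #pragma omp parallel            /* "parallel": fork a team */
    {
        /* B: replicated, every team member executes it */

        #pragma omp sections        /* "psections" */
        {
            #pragma omp section
            c_done = 1;             /* C: executed by one team member */
            #pragma omp section
            d_done = 1;             /* D: executed by one team member */
        }                           /* implicit barrier: C and D are done */

        #pragma omp single          /* "psingle" */
        e_runs += 1;                /* E: executed by exactly one member */
                                    /* implicit barrier after single */

        #pragma omp for nowait      /* "pdo ... end pdo nowait" */
        for (int i = 1; i <= 6; i++)
            f[i - 1] = F(i);        /* the team shares the six iterations */

        /* G: replicated; a thread may start it before the others
         * have finished their loop iterations (no barrier above) */
    }                               /* "end parallel": barrier and join */

    /* H: sequential mode again */
    return c_done + d_done + e_runs;
}
```

Compiled with OpenMP enabled (for example gcc's -fopenmp flag) the region runs on a team of threads; compiled without it the pragmas are ignored and the same code runs sequentially with the same result.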
POSIX Threads Model

  IEEE/ANSI standard: the IEEE POSIX 1003.1c-1995 threads standard
  Lives at the Unix/NT operating-system level, on SMPs
  Chorus, Topaz, Mach Cthreads
  Win32 Thread
    GetThreadHandle, SetThreadPriority, SuspendThread, ResumeThread
    TLS (thread-local storage): TlsAlloc, TlsSetValue
  LinuxThreads: _clone and sys_clone
  User threads and kernel threads (LWP): one-to-one, one-to-many, many-to-many

What Are Threads?

  General-purpose solution for managing concurrency.
  Multiple independent execution streams.
  Shared state.
  Preemptive scheduling.
  Synchronization (e.g. locks, conditions).

[Figure: threads operating on shared state (memory, files, etc.)]

Threads

  Threads share the same memory space.
  Compared with a standard fork(), threads carry very little overhead. The kernel does not have to make a separate copy of the process's memory space, file descriptors, and so on, which saves a great deal of CPU time.
  Like processes, threads can use multiple CPUs. This matters when the software is designed for multiprocessor systems, and for compute-intensive applications.
  Memory sharing is supported without cumbersome IPC or other complex communication mechanisms.
  Linux _clone is not portable; Pthreads is portable.
  The POSIX threads standard records no "family" information: threads have no parents and no children. To wait for a thread to terminate, you must pass that thread's tid to pthread_join(). The thread library cannot work out the tid for you.

  POSIX Threads: Basics and Examples, by Uday Kamath
    http://www.coe.uncc.edu/abw/parallel/pthreads/pthreads.html
  POSIX threads explained: a simple and nimble tool for memory sharing, by Daniel Robbins
    http:/

Thread Calls

    POSIX              Solaris 2
    pthread_create     thr_create
    pthread_exit       thr_exit
    pthread_kill       thr_kill
    pthread_join       thr_join
    pthread_self       thr_self

Thread Synchronization and Mutual Exclusion

    POSIX                     Solaris 2
    pthread_mutex_init        mutex_init
    pthread_mutex_destroy     mutex_destroy
    pthread_mutex_lock        mutex_lock
    pthread_mutex_trylock     mutex_trylock
    pthread_mutex_unlock      mutex_unlock
    pthread_cond_init
    pthread_cond_destroy
    pthread_cond_wait
    pthread_cond_timedwait
    pthread_cond_signal
    pthread_cond_broadcast

Pthreads Example: Computing π (1)

Pthreads Example: Computing π (2)

A Pthreads Condition-Variable Solution to the Producer-Driven Bounded-Buffer Problem

    #include <pthread.h>

    /* SUMSIZE, nslots, nitems, producer_done, sum, the mutexes slot_lock and
       item_lock, the condition variables slots and items, and the buffer
       routines put_item() and get_item() are defined elsewhere (not shown). */

    void *producer(void *arg1)
    {
        int i;
        for (i = 1; i <= SUMSIZE; i++) {
            pthread_mutex_lock(&slot_lock);       /* acquire the right to a free slot */
            while (nslots <= 0)
                pthread_cond_wait(&slots, &slot_lock);
            nslots--;
            pthread_mutex_unlock(&slot_lock);
            put_item(i * i);
            pthread_mutex_lock(&item_lock);       /* announce a new item */
            nitems++;
            pthread_cond_signal(&items);
            pthread_mutex_unlock(&item_lock);
        }
        pthread_mutex_lock(&item_lock);
        producer_done = 1;                        /* wake every waiting consumer */
        pthread_cond_broadcast(&items);
        pthread_mutex_unlock(&item_lock);
        return NULL;
    }

    void *consumer(void *arg2)
    {
        int i, myitem;
        for ( ; ; ) {
            pthread_mutex_lock(&item_lock);       /* acquire the right to an item */
            while ((nitems <= 0) && !producer_done)
                pthread_cond_wait(&items, &item_lock);
            if ((nitems <= 0) && producer_done) { /* nothing left and no more coming */
                pthread_mutex_unlock(&item_lock);
                break;
            }
            nitems--;
            pthread_mutex_unlock(&item_lock);
            get_item(&myitem);
            sum += myitem;
            pthread_mutex_lock(&slot_lock);       /* announce a free slot */
            nslots++;
            pthread_cond_signal(&slots);
            pthread_mutex_unlock(&slot_lock);
        }
        return NULL;
    }
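The code of the two π examples named above is not present in this text. As an illustration only, and not the original slides' code, here is a minimal sketch of one common way to compute π with Pthreads: integrate 4/(1+x^2) over [0,1] with the midpoint rule, give each thread every nthreads-th strip, and add each thread's partial sum to a shared total under a mutex. The names compute_pi, pi_worker, and pi_task are hypothetical. The creator keeps every tid so that it can pass it to pthread_join(), as the threads slide requires.

```c
#include <pthread.h>
#include <stdlib.h>

/* Work description handed to each thread (hypothetical helper type). */
typedef struct {
    int  id;        /* index of this thread, 0 .. nthreads-1 */
    int  nthreads;  /* total number of worker threads */
    long nsteps;    /* number of integration strips */
} pi_task;

/* Shared total, protected by pi_lock. Using globals keeps the sketch
 * short but means compute_pi() must not be called concurrently. */
static double pi_sum;
static pthread_mutex_t pi_lock = PTHREAD_MUTEX_INITIALIZER;

static void *pi_worker(void *arg)
{
    const pi_task *t = arg;
    double h = 1.0 / (double)t->nsteps;
    double local = 0.0;

    /* Cyclic distribution: thread id handles strips id, id+n, id+2n, ... */
    for (long i = t->id; i < t->nsteps; i += t->nthreads) {
        double x = ((double)i + 0.5) * h;   /* midpoint of strip i */
        local += 4.0 / (1.0 + x * x);
    }

    pthread_mutex_lock(&pi_lock);           /* one short critical section */
    pi_sum += local * h;
    pthread_mutex_unlock(&pi_lock);
    return NULL;
}

/* Returns an approximation of pi, or -1.0 on bad arguments or failure. */
double compute_pi(int nthreads, long nsteps)
{
    if (nthreads < 1 || nsteps < 1)
        return -1.0;

    pthread_t *tid  = malloc((size_t)nthreads * sizeof *tid);
    pi_task   *task = malloc((size_t)nthreads * sizeof *task);
    int created = 0;
    double result = -1.0;

    if (tid != NULL && task != NULL) {
        pi_sum = 0.0;
        for (int k = 0; k < nthreads; k++) {
            task[k].id = k;
            task[k].nthreads = nthreads;
            task[k].nsteps = nsteps;
            if (pthread_create(&tid[k], NULL, pi_worker, &task[k]) != 0)
                break;
            created++;
        }
        /* No parent/child bookkeeping exists: join by saved tid. */
        for (int k = 0; k < created; k++)
            pthread_join(tid[k], NULL);
        if (created == nthreads)
            result = pi_sum;
    }
    free(tid);
    free(task);
    return result;
}
```

A call such as compute_pi(4, 1000000L) uses four threads and one million strips. Accumulating into a thread-local variable and taking the mutex once per thread, rather than once per strip, keeps lock contention negligible.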
OpenMP Standard

  The history of OpenMP
  What is a directive/pragma?
    Directive-based general-purpose parallel programming API with emphasis on the ability to parallelize existing serial programs
  Why a new standard?
  Who's involved?
  Parallelism model and basic directives
    Fortran 77, Fortran 90
    C, C++

The History of OpenMP

  A key intermediate step was X3H5 in the late 80s.
    An official standards effort to agree on a parallel dialect of Fortran for shared-memory computers.
    The X3H5 effort failed. It was too big and too late.
  OpenMP is born:
    In 1996 a group formed to create an industry-standard set of directives for SMP programming.
    This group called itself the OpenMP Architecture Review Board (the ARB), which takes care of OpenMP.

The History of OpenMP (cont.)

  The ARB has released the following specifications:
    OpenMP 1.0 for Fortran, Nov. 1997
    OpenMP 1.0 for C/C++, Nov. 1998
    OpenMP Fortran Interpretations, Spring 1999
    OpenMP 2.0 (soon)
  OpenMP is an evolving standard. Send comments over the feedback link on the OpenMP web site (http://www.openmp.org).

Why Establish a New Standard?

  ANSI X3H5, 1994
    Bad timing: distributed-memory machines were in fashion
    Supports only loop-level parallelism; the granularity is too fine
  Pthreads (IEEE POSIX 1003.4a)
    A standard for low-end shared-memory machines (such as SMPs)
    Insufficient support for FORTRAN
    Suited to task parallelism, not to data parallelism
  MPI
    A message-passing programming standard; demands a lot of the programmer
  HPF
    Mainly used on distributed-memory machines
  A large body of existing scientific applications needs to be carried over and ported well

In a Nutshell

  A set of directives (plus library routines and environment variables) used to annotate a sequential program to indicate how it should be executed in parallel
    Inherits many concepts from X3H5
  Portable, simple, and scalable shared-memory multiprocessing API
    Not a new language
    Not automatic parallelization
    Extends base languages: Fortran 77, Fortran 90, C, and C++
  Multi-vendor support, for both UNIX and NT
  Standardizes fine-grained (loop) parallelism; also supports coarse-grained algorithms

What Is OpenMP?

  A set of compiler directives and callable run-time library routines that extend a base language to express the parallelism in a program
  Compiler directives: the following constructs are added to a serial program
    SPMD (Single Program Multiple Data) constructs
    Work-sharing constructs
    Synchronization constructs
    Data environment constructs
  Run-time library routines include:
    Execution environment routines
    Lock routines
  In addition, the FORTRAN standard includes a description of environment variables

Current Status of OpenMP

  On October 28, 1997, representatives of DEC, IBM, Intel, SGI, Kuck & Associates, and other companies decided to define a new industry standard for shared-memory programming that applies to multiple hardware platforms
  Many organizations and ISVs around the world then decided to support the standard, for example DOE/ASCI, Livermore Software Technology Corp., Fluent Inc., Absoft Corp., Ansys Inc., etc.
  It currently supports FORTRAN, C, and C++, and has a dedicated web site at http://www.openmp.org
  It has also attracted considerable attention in research institutions, where it has been regarded as the most popular parallel programming standard for the 21st century
    OpenMP on NOWs (SC98, Nov. 1998)
    Integrated OpenMP and MPI on Clusters
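The ingredients OpenMP is described as consisting of (compiler directives, run-time library routines, and environment variables) can be seen together in a short C sketch. It is an illustration under stated assumptions rather than part of the original slides: the function name team_check is hypothetical, while the parallel directive, the reduction clause, the master directive, and the routine omp_get_num_threads() belong to the OpenMP C/C++ API. The _OPENMP guard lets the file compile even when OpenMP is switched off.

```c
#ifdef _OPENMP
#include <omp.h>    /* run-time library routines */
#endif

/* Counts the members of one parallel team in two independent ways.
 * Returns the count obtained by letting every thread add 1 to a
 * reduction variable; stores in *reported the team size reported by
 * the run-time library (1 when compiled without OpenMP). */
int team_check(int *reported)
{
    int counted = 0;    /* combined across threads by the reduction */
    int n = 1;          /* shared; written by the master thread only */

    /* Directive plus data environment clause: each thread gets a
     * private copy of counted, and the copies are summed at the end. */
    #pragma omp parallel reduction(+:counted)
    {
        counted += 1;
#ifdef _OPENMP
        /* Execution environment routine, called by one thread. */
        #pragma omp master
        n = omp_get_num_threads();
#endif
    }   /* implicit barrier: counted and n are now final */

    *reported = n;
    return counted;
}
```

With OpenMP enabled, the team size is usually chosen through the environment variable OMP_NUM_THREADS defined by the OpenMP specification; setting it to 4 before running the program would typically make both counts equal to 4. Without OpenMP both counts are 1.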
Program Execution Models: SPMD, SMP, and OpenMP

[Figure: the SPMD program execution model, with processes P0, P1, P2, ..., Pn; the SMP program execution model; the OpenMP program execution model. The OpenMP panel is labeled with three directive groups: parallel and work-sharing directives, data environment directives, synchronization directives.]

Compiler Directives (1)

  Work-sharing constructs
    Distribute the work inside the construct among the processors. They must be placed dynamically inside a parallel region construct. No BARRIER is implied on entry to a work-sharing construct.
    DO (the most common): has a SCHEDULE option that selects the scheduling algorithm
    SECTIONS (can be used for pipelined execution)
    SINGLE (executed by only one processor)

Parallel Region and Work-Sharing Directives

  Parallel region: parallel, end parallel
  Work sharing: do, sections, single (parallel do, nowait)
  Fork-join model of parallel execution (static, dynamic, orphaned)

Compiler Directives (2)

  Directive format
    Fixed form: !$OMP, *$OMP, C$OMP
    Free form: !$OMP
  Parallel region construct

    !$OMP PARALLEL [clause[, clause] ...]
          DO I=1,20
            A(I)=A(I)+B(I)
          END DO
    !$OMP END PARALLEL          (implies a BARRIER)

    where each clause can be:
      PRIVATE(list), SHARED(list), COPYIN(list), FIRSTPRIVATE(list),
      DEFAULT(PRIVATE | SHARED | NONE),
      REDUCTION({operator | intrinsic} : list),
      IF(logical_expression)

  DO directive

    !$OMP DO [clause[, clause] ...]
          do_loop
    !$OMP END DO [NOWAIT]

    Example:

    !$OMP PARALLEL
    !$OMP DO
          DO I=2,N
            B(I)=(A(I)+A(I-1))/2.0
          ENDDO
    !$OMP END DO NOWAIT
    !$OMP END PARALLEL

  SECTIONS directive

    !$OMP SECTIONS
    !$OMP SECTION
          block1
    !$OMP SECTION
          block2
    !$OMP SECTION
          block3
    !$OMP END SECTIONS

Compiler Directives (3)

  Data environment constructs
    THREADPRIVATE
    Data scope attribute clauses
      PRIVATE
      SHARED
      DEFAULT
      FIRSTPRIVATE
      LASTPRIVATE
      REDUCTION
      COPYIN

Data Environment Directives

  Data scope attribute clauses: Private, Shared, Default, Firstprivate, Lastprivate, Reduction, and Copyin/Copyout (value undefined entering/exiting parallel region)
  Threadprivate directives: private to a thread but global within the thread (SMP)
    Fortran: COMMON blocks / C: file-scope and static variables

Compiler Directives (4)

  Synchronization constructs
    MASTER
    CRITICAL
    BARRIER
    ATOMIC
    FLUSH
    ORDERED
  Example (ORDERED): specifies the order in which the threads execute

    !$OMP PARALLEL
    !$OMP DO ORDERED SCHEDULE(DYNAMIC)
          DO I=LowBound,UpBound,Step
            CALL WORK(I)
          END DO
    !$OMP END PARALLEL

          SUBROUTINE WORK(K)
    !$OMP ORDERED
          WRITE(*,*) K
    !$OMP END ORDERED
          END

Synchronization Directives

  master, barrier, critical, atomic, flush, ordered

OpenMP's New Orphan Feature (1)

  To make coarse-grained, task-level parallelism easier to support, OpenMP provides orphaned directives
  Orphaned directives are directives that appear lexically outside a parallel region (for example, outside a PARALLEL construct)
  OpenMP provides a binding rule that relates these orphaned directives to the calling