2022年调优DB实现高性能 .pdf-淘文阁

资源描述

《2022年调优DB实现高性能 .pdf》由会员分享，可在线阅读，更多相关《2022年调优DB实现高性能 .pdf（20页珍藏版）》请在taowenge.com淘文阁网|工程机械CAD图纸|机械工程制图|CAD装配图下载|SolidWorks_CaTia_CAD_UG_PROE_设计图分享下载上搜索。

1、调优 DB2 实现高性能：案例研究简介设计不良的系统可能会在上线之后很快出现性能问题。即便经过良好调优的系统在长时间的操作或重大功能变更之后也会遇到性能问题。调优系统是系统管理员不可逃避的任务。作为大多数应用系统的一个重要部分，数据库性能调优是此任务中的一个重要方面。统计数据显示，数据库调优可使从未调优过的系统实现20% 的性能提升但是，如果未能合理地执行调优，则会给生产系统带来风险。这篇文章将展示IBM DB2 for Linux, UNIX, and Windows 环境中的数据库性能调优的实际案例研究。本案例研究中调优的系统是一个基于JIRA Enterprise 包的工作流应用程序，

2、它使用 DB2 作为后端数据库。应用程序使用两种操作模式：夜间批处理模式和日间 OLTP 模式。在夜间批处理时间中，将执行一系列的shell 脚本，以便将外部数据（采用纯文本文件的形式）传输给数据库。在日间OLTP 时间中，操作人员按照JIRA 中定义的工作流处理这种业务数据。在应用程序运行大约一年之后，客户并未看到问题事故率的显著下降。调查表明，其中某些事故是由性能问题引起的，例如数据库死锁和JIRA 文件锁定超时。根据合同，每年约有5% 的工作负载增加。如果系统性能未得到改进，则未来会出现更多与性能相关的事故。性能调优势在必行。发现问题系统性能调优的前提任务就是找到性能问题源自何

3、处。Nigel 的 Linux 性能监视器（ NMON）就是一种收集关键性能数据的出色工具，例如CPU 利用率、内存利用率、磁盘忙率、顶级进程等。NMON 用于为系统内的各服务器收集性能信息。查看了所收集的NMON 数据之后，确定了两个性能问题：在夜间批处理时间中，数据库服务器的CPU 利用率在至少1 个小时的时间内保持80%。数据库服务器的某些磁盘会在一天内定期转为100% 忙。常规数据库性能调优包含以下阶段：1. 收集数据库服务器信息2. 收集数据库使用信息3. 分析数据库信息名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - -

4、名师精心整理 - - - - - - - 第 1 页，共 20 页 - - - - - - - - - 4. 设计调优活动5. 实现和评估调优结果以下几节详细讨论了各个阶段。收集数据库服务器信息在此阶段中，您要收集数据库服务器的硬件和软件信息以及数据库的配置。下面给出了您需要收集的一些信息：数据库服务器的类型CPU 的数量和类型内存数量磁盘驱动器的数量和制造商存储子系统的类型和制造商存储子系统的配置操作系统和数据库信息db2look 工具对于服务器中运行的各个实例的输出（db2look d dbname e o outputfile）各表空间及其容器的描述（db2 list tablesp

5、aces和 db2 list tablespace containers for tablespacename show details）请记住，信息越是丰富，能提供的帮助就越多。您在这里收集的任何信息都将在稍后的分析中提供很大的帮助。例如，在本案例研究中， db2look 和表空间信息解释了磁盘忙为何会成为问题：所有用户数据表都是在相同的表空间上创建的，位于相同的磁盘中。收集数据库使用信息收集数据库使用信息的方法有两种：获取快照和监视事件。两种方法都会收集实时数据库使用信息，例如缓冲池活动、锁定状态、SQL 语句信息等。然而，它们都有着不同的监视机制，也就是说可以在不同的场景中使用。正

6、如名称所表示的那样，快照捕捉数据库在特定时间点的即时信息。按照指定间隔获取的快照可用于观察趋势或者预测潜在的问题。它们对于排查已知时间段内发生的问题或者即席数据库健康检查极有帮助。利用快照时比使用事件监视器时消耗的资源更少。另一方面，事件监视器是事件驱动的。在预先定义的时间段内，事件监视器可在发生指定时间时创建记录。与快照相比，事件监视器可以提供更多基于数据库对象的统计信息（例如，数据库、表、表空间等对象的统计信息）。监视是连续的，因此它会收集所监视时间段内的整体数据库使用情况。由于连续性，在目标系统名师资料总结 - - -精品资料欢迎下载 - - - - - - - -

7、- - - - - - - - - - 名师精心整理 - - - - - - - 第 2 页，共 20 页 - - - - - - - - - 极为繁忙时，所消耗的资源数量可能非常惊人。您应尽力避免在调查生产系统时因监视而导致系统受损。在探索如何降低监视的性能影响之前，应观察设置事件监视器时的选项：表事件监视器、文件事件监视器和管道事件监视器。正如其名称所表示的那样，各种事件监视器之间的差别在于在何处创建事件记录：SQL 表、文件或通过指定管道创建。由于管道事件监视器在实践中并不常用（需要使用一个程序从指定管道中读取数据），因此本文仅关注表和文件事件监视器。表 1. 降低事件监视器的性能影

8、响的提示类别选项使用方法表和文件提示EventtypeCREATE EVENT MONITOR emon1 FOR STATEMENTSSTATEMENTS 监视器是最大的性能威胁。如果关注性能，那么就应该将STATEMENTS 监视器与其他监视器分离开来，使其自成一体。BuffersizeCREATE EVENT MONITOR emon1 FOR CONNECTIONS WRITE TO TABLE BUFFERSIZE 8为了减少表插入或者文件写入操作的开销，事件记录首先将写入一个缓冲区空间。在缓冲区空间充满时，事件记录随之移入事件表或文件。出于性能方面的原因，高度活跃的事件监视器

9、应比相对来说活跃度较低的监视器具有更大的缓冲区。BUFFERSIZE 表示缓冲区空间的容量（以4K 页面为单位）。由于空间是名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 3 页，共 20 页 - - - - - - - - - 从数据库监视器堆中分配的，因此所有事件监视器的合并容量不应超过最大大小（使用 db2 get dbm cfg | grep MON_HEAP_SZ 来确定这个值）。Blocked/NonblockedCREATE EVENT MONITOR emon1 F

10、OR CONNECTIONS WRITE TO TABLE BLOCKED/NONBLOCKED如果设置了BLOCKED ，则在事件记录从缓冲区空间移动到表 /文件时（即缓冲区空间已满时），生成事件的每个代理都会等待移动完成。通过这样的方式即可保证不丢失任何事件数据。然而，这种方式也会降低数据库性能。因此，在关注性能时，事件监视器应设置为 NONBLOCKED 。此时将会存在数据丢失的现象，但对数据库性能的影响可降低到最低限度。特定于表的提示逻辑数据组监视器元素CREATE EVENT MONITOR emon1 FOR DEADLOCKS WITH DETAILS WRITE TO

11、TABLE DLCONN (EXCLUDES(agent_id, lock_wait_start_time), DLLOCK (INCLUDES(lock_mode, table_name)每个事件监视器都使用多个数据库表来存储所收集名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 4 页，共 20 页 - - - - - - - - - 到的数据。举例来说，STATEMENTS 事件监视器收集语句数据并将其存储在表中：CONNHEADER 、STMT 、SUBSECTION 和CON

12、TROL 。通过避免收集不必要的事件表和字段，对性能的影响即可降低到最低限度。表空间CREATE EVENT MONITOR emon1 FOR CONNECTIONS WRITE TO TABLE CONN (TABLE conns, IN mytablespace)在磁盘忙成为性能瓶颈问题时，应将事件表放置在独立的表空间和独立的磁盘中，以便使磁盘写入操作更加平均地分布。PCTDEACTIVATECREATE EVENT MONITOR emon1 FOR CONNECTIONS WRITE TO TABLE CONN PCTDEACTIVATE 90PCTDEACTIVATE 选项

13、用于控制事件监视器的存储占用。它定义为一个百分比数字。举例来说，如果PCTDEACTIVATE 设置为 90，在事件表所在的表空间的占用容量达到90% 时，事件监视器将自动禁用。这个选项仅可用于数据名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 5 页，共 20 页 - - - - - - - - - 库托管表空间（ DMS）。特定于文件的提示Maxfiles/MaxfilesizeEVENT MONITOR emon1 FOR CONNECTIONS WRITE TO FILE

14、 myfile MAXFILES 10 MAXFILESIZE 32与 PCTDEACTIVATE 选项相似， MAXFILES 和MAXFILESIZE 可共同用于控制事件监视器有权使用多少存储空间。MAXFILESZIE 定义单独一个事件监视器文件可以包含的 4K 页面的数量。在达到最大数量时，即创建一个新文件来存储传入的事件数据。这种处理方式将一直继续到文件数量达到预先定义的MAXFILES 值，此时事件监视器将被自动禁用。通过应用以下两种方法，即可进一步降低影响生产性能的风险：1. 在生产环境中执行事件监视之前，应在测试环境中执行测试，或者在生产环境中执行短期的试运行，以便评估实际

15、性能影响。2. 为各性能指标设定一个阈值（例如，CPU 利用率： 90%），并在监视期间密切监视这些指标。在超出阈值时立即停止监视。分析数据库信息收集了所有信息之后，即可执行本节中介绍的各种分析。SQL 语句分析SQL 语句分析的主要信息资源是语句事件监视器。如果事件是使用事件文件监视的，即可使用db2evmon 来格式化输出，如清单 1 所示。清单 1. db2evmon 命令名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 6 页，共 20 页 - - - - - - - - -

16、db2evmon path event_files_directory output_filename 结果项如清单 2 所示。清单 2. 语句事件项示例1) Statement Event . Appl Handle: 53793 Appl Id: *LOCAL.db2inst1.101126060601 Appl Seq number: 00003 Record is the result of a flush: FALSE - Type : Dynamic Operation: Describe Section : 201 Creator : NULLID Package : SQLC2

17、G15 Consistency Token : AAAAALIY Package Version ID : Cursor : SQLCUR201 Cursor was blocking: TRUE Text : select * from schema.table - Start Time: 11/26/2010 15:06:35.641755 Stop Time: 11/26/2010 15:06:35.665380 Elapsed Execution Time: 0.023625 seconds Number of Agents created: 1 User CPU: 0.003768

18、seconds System CPU: 0.000000 seconds Statistic fabrication time (milliseconds): 0 Synchronous runstats time (milliseconds): 0 名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 7 页，共 20 页 - - - - - - - - - Fetch Count: 62 Sorts: 0 Total sort time: 0 Sort overflows: 0 Row

19、s read: 62 Rows written: 0 Internal rows deleted: 0 Internal rows updated: 0 Internal rows inserted: 0 Bufferpool data logical reads: 1 Bufferpool data physical reads: 0 Bufferpool temporary data logical reads: 0 Bufferpool temporary data physical reads: 0 Bufferpool index logical reads: 0 Bufferpoo

20、l index physical reads: 0 Bufferpool temporary index logical reads: 0 Bufferpool temporary index physical reads: 0 Bufferpool xda logical page reads: 0 Bufferpool xda physical page reads: 0 Bufferpool temporary xda logical page reads: 0 Bufferpool temporary xda physical page reads: 0 SQLCA: sqlcode:

21、 0 sqlstate: 00000 Text 行显示了所执行的SQL 语句。 Elapsed Execution Time 表明执行此SQL 语句所耗费的时间。可以通过加总相同语句的所有执行耗时来计算各个SQL 语句的总执行时间。随后，总执行时间最长的语句将成为SQL 语句分析的候选项。IBM 提供了一系列工具来分析SQL 语句。 Visual Explain 、db2exfmt 和db2expln 对于检查各语句的访问计划是非常有用的。db2advis 工具提供了是否需要新索引来优化执行性能的建议。死锁分析死锁事件监视器提供了有关死锁发生原因和所发生的各死锁的历史记录的详细信息。清单3

22、展示了一个示例死锁事件项。名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 8 页，共 20 页 - - - - - - - - - 清单 3. 死锁事件项示例3382) Deadlocked Connection . Deadlock ID: 1 Participant no.: 2 Participant no. holding the lock: 1 Appl Id: 10.207.4.51.40897.100826202041 Appl Seq number: 03988 Tp

23、mon Client Workstation: server01 Appl Id of connection holding the lock: 10.207.4.51.39361.100826202035 Seq. no. of connection holding the lock: 00001 Lock wait start time: 08/27/2010 10:38:13.168058 Lock Name : 0 x020012032900E9161100000052 Lock Attributes : 0 x00000000 Release Flags : 0 x20000000

24、Lock Count : 1 Hold Count : 0 Current Mode : none Deadlock detection time: 08/27/2010 10:38:22.765817 Table of lock waited on : table Schema of lock waited on : schema Data partition id for table : 0 Tablespace of lock waited on : USERSPACE1 Type of lock: Row Mode of lock: X - Exclusive Mode applica

25、tion requested on lock: U - Update Node lock occured on: 0 Lock object name: 73398812713 Application Handle: 957 Deadlocked Statement: Type : Dynamic 名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 9 页，共 20 页 - - - - - - - - - Operation: Fetch Section : 1 Creator : NU

26、LLID Package : SYSSH200 Cursor : SQL_CURSH200C1 Cursor was blocking: FALSE Text : SELECT value1, value2 FROM schema.table WHERE value1 = ? for update with rs List of Locks: Lock Name : 0 x020012032900EC161100000052 Lock Attributes : 0 x00000000 Release Flags : 0 x00000080 Lock Count : 1 Hold Count :

27、 0 Lock Object Name : 73399009321 Object Type : Row Tablespace Name : table Table Schema : schema Table Name : EXCLUSION Data partition id : 0 Mode : U - Update 13384) Deadlocked Connection . Deadlock ID: 1 Participant no.: 1 Participant no. holding the lock: 2 Appl Id: 10.207.4.51.39361.10082620203

28、5 Appl Seq number: 09195 Tpmon Client Workstation: server01 Appl Id of connection holding the lock: 10.207.4.51.40897.100826202041 Seq. no. of connection holding the lock: 00001 Lock wait start time: 08/27/2010 10:38:13.166513 Lock Name : 0 x020012032900EC161100000052名师资料总结 - - -精品资料欢迎下载 - - - - - -

29、 - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 10 页，共 20 页 - - - - - - - - - Lock Attributes : 0 x00000000 Release Flags : 0 x40000000 Lock Count : 1 Hold Count : 0 Current Mode : none Deadlock detection time: 08/27/2010 10:38:22.787777 Table of lock waited on : table Schema of lock waited on : sch

30、ema Data partition id for table : 0 Tablespace of lock waited on : USERSPACE1 Type of lock: Row Mode of lock: U - Update Mode application requested on lock: U - Update Node lock occured on: 0 Lock object name: 73399009321 Application Handle: 951 Deadlocked Statement: Type : Dynamic Operation: Execut

31、e Section : 1 Creator : NULLID Package : SYSSH200 Cursor : SQL_CURSH200C1 Cursor was blocking: FALSE Text : UPDATE schema.table SET value2 = ?, value3 = ? WHERE value1 IN (?,?) List of Locks: Lock Name : 0 x020012032900E9161100000052 Lock Attributes : 0 x00000000 Release Flags : 0 x40000000 名师资料总结 -

32、 - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 11 页，共 20 页 - - - - - - - - - Lock Count : 1 Hold Count : 0 Lock Object Name : 73398812713 Object Type : Row Tablespace Name : USERSPACE1 Table Schema : schema Table Name : table 清单 3 展示了死锁中涉及到了哪两个锁定、各锁定的类型以及对应的SQL 语句。通过修改相关语句，即

33、可减少死锁的出现。缓冲池分析您可以利用缓冲池事件监视器提供的信息来执行缓冲池分析，如清单 4 所示。清单 4. 缓冲池事件项示例3) Bufferpool Event . Bufferpool Name: IBMDEFAULTBP Database Name: database Database Path: /shared/dbg/db2inst3/db2inst3/NODE0000/SQL00001/ Buffer Pool Statistics: Buffer pool data logical page reads: 14871152 Buffer pool data physical

34、page reads: 1699818 Buffer pool data page writes: 53823 Buffer pool index logical page reads: 8606405 Buffer pool index physical page reads: 290822 Buffer pool index page writes: 272282 Buffer pool xda logical page reads: 0 Buffer pool xda physical page reads: 0 Buffer pool xda page writes: 0 Buffer

35、 pool read time (milliseconds): 1536574 名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 12 页，共 20 页 - - - - - - - - - Buffer pool write time (milliseconds): 353641 Files closed: 0 Buffer pool asynch data page reads: 1694131 Buffer pool asynch data page read reqs: 5911

36、0 Buffer pool asynch data page writes: 53371 Buffer pool asynch index page reads: 227455 Buffer pool asynch index page read reqs: 8527 Buffer pool asynch index page writes: 270292 Buffer pool asynch xda page reads: 0 Buffer pool asynch xda page read reqs: 0 Buffer pool asynch xda writes: 0 Buffer po

37、ol asynch read time: 1327887 Buffer pool asynch write time: 347809 No victim buffers available: 1509238 Unread prefetch pages: 2995 Direct I/O Statistics: Sectors read directly: 13610 Sectors written directly: 1695616 Direct read requests: 1382 Direct write requests: 3763 Direct read time: 3758 Dire

38、ct write time: 22236 Vectored IOs: 67407 Pages from vectored IOs: 1921234 Block IOs: 0 Pages from block IOs: 0 可以利用清单 5 中的公式大致地计算缓冲池的效率。清单 5. 计算缓冲池效率的公式1 (Bufferpool data logical page reads + Bufferpool index logical page reads) 名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - -

39、 - - - 第 13 页，共 20 页 - - - - - - - - - divided by (Bufferpool data physical page reads + Bufferpool index physical page reads) 如果计算得出的数量低于90%，则增加缓冲池的大小将是一种合理的调优选项。内存分析数据库事件监视器提供的信息可用于进行内存分析，如清单 6 所示。清单 6. 内存事件项示例3) Database Event Record is the result of a flush: FALSE Lock Statistics: Lock Waits: 0

40、 Total time waited on locks (milliseconds): 0 Deadlocks: 0 Lock escalations: 0 X lock escalations: 0 Lock timeouts: 0 Sort Statistics: Sorts: 844 Total sort time (milliseconds): 160043 Sort overflows: 80 Sort share heap high water mark: 9851 Post Shared Memory Threshold Sorts: 20 Hash Statistics: Ha

41、sh Joins: 25 Hash Loops: 0 Hash Join Small Overflows: 0 名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 14 页，共 20 页 - - - - - - - - - Hash Join Overflows: 0 Post Shared Memory Threshold Hash Joins: 0 Node Number: 0 Memory Pool Type: Backup/Restore/Util Heap Current si

42、ze (bytes): 65536 High water mark (bytes): 196608 Configured size (bytes): 319815680如果输出中包含过多锁升级或者X 锁升级，则可能表明某个LOCKLIST 内存分配不足。较高的排序溢出率（排序溢出除以排序数量）或者较高的散列连接溢出率（散列连接小溢出+ 散列连接溢出） / 散列连接数量）表示未为SORTHEAP 分配足够的内存。如果内存高位接近所配置的大小，则表示所分配的内存大小过小。表空间与表分析表空间和表事件监视器信息可用于确定哪个表空间或者表最常被访问，如清单7 所示。清单 7. 表空间 /表事件

43、项示例5) Tablespace Event . Tablespace Name: USERSPACE1 Record is the result of a flush: FALSE File System Caching: Yes Buffer Pool Statistics: Buffer pool data logical page reads: 14846454 Buffer pool data physical page reads: 1699227名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - -

44、- - - - - 第 15 页，共 20 页 - - - - - - - - - Buffer pool data page writes: 31111 Buffer pool index logical page reads: 8593610 Buffer pool index physical page reads: 290381 Buffer pool index page writes: 272125 Buffer pool xda logical page reads: 0 Buffer pool xda physical page reads: 0 Buffer pool xda

45、 page writes: 0 Buffer pool read time (milliseconds): 1529939 Buffer pool write time (milliseconds): 350770 Files closed: 0 Buffer pool asynch data page reads: 1693042 Buffer pool asynch data page read reqs: 58409 Buffer pool asynch data page writes: 30761 Buffer pool asynch index page reads: 227412

46、 Buffer pool asynch index page read reqs: 8489 Buffer pool asynch index page writes: 270137 Buffer pool asynch xda page reads: 0 Buffer pool asynch xda page read reqs: 0 Buffer pool asynch xda writes: 0 Buffer pool asynch read time: 1325077 Buffer pool asynch write time: 345169 No victim buffers ava

47、ilable: 1435565 Unread prefetch pages: 2982 Direct I/O Statistics: Sectors read directly: 3488 Sectors written directly: 1695176 Direct read requests: 436 Direct write requests: 3752 Direct read time: 476 Direct write time: 22217 4) Table Event . 名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - -

48、- - 名师精心整理 - - - - - - - 第 16 页，共 20 页 - - - - - - - - - Table schema: SCHEMA Table name: TEMP (00001,00002) Data partition id: 0 Record is the result of a flush: FALSE Table type: Temporary Data object pages: 1 Index object pages: 0 Lob object pages: 0 Long object pages: 0 Rows read: 3 Rows written

49、: 1 Overflow Accesses: 0 Page reorgs: 0 Tablespace id: 1 读取/写入数量表示表空间或者表的繁忙程度。如果最常被访问的表与其他表位于相同的磁盘中，则最好将其重新放置在不同的磁盘中，以便更加平均地分散磁盘读取和写入操作。另外一种解决方案是在多个物理磁盘上分散表中的数据。设计调优活动您可以根据所收集到的全部信息，设计实际调优活动。然而，每项调优活动都有着一定的风险和成本。在决定实现解决方案之前，必须进行谨慎的风险和ROI 分析。分析结果可能将调优活动分为以下类别：立即实现、有条件地实现或不实现。对于本案例研究，我们制作了表 2 来帮助制定

50、调优决策。表 2. 调优决策表调优活动性能改进风险ROI 决策条件添加新索引低低低立即无升级 CPU中低中有条件地峰值 CPU 利用率达到90%重新分配数据库表高高中有条件地观察到较高的CPU 等待 I/O实现和评估调优结果名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 17 页，共 20 页 - - - - - - - - - 测试了调优活动之后，即可将其部署到生产环境之中。为了评估调优结果，您可以再次使用NMON 来评估调优实现了怎样的性能改进。在这个案

展开阅读全文