Oracle Diagnostic Case.doc


Oracle Diagnostic Case: Job Tasks Stop Executing
2006-11-02 22:46  Source: Internet

Abstract: This article walks through the diagnosis of an abnormal Oracle job case, analyzes the cause and the fix, and shows from the inside how Oracle job scheduling and its internal timing mechanism work.

Problem and environment

The developers reported that the database's scheduled jobs were not running normally, which caused some operations to fail. I stepped in to handle the incident.

System environment:

    SunOS DB 5.8 Generic_108528-21 sun4u sparc SUNW,Ultra-4
    Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production

Resolution process

First, check the database jobs:

    $ sqlplus "/ as sysdba"

    SQL*Plus: Release 9.2.0.3.0 - Production on Wed Nov 17 20:23:53 2004

    Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

    Connected to:
    Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
    With the Partitioning, OLAP and Oracle Data Mining options
    JServer Release 9.2.0.3.0 - Production

    SQL> select job,last_date,last_sec,next_date,next_sec,broken,failures,interval from dba_jobs;

    JOB LAST_DATE LAST_SEC NEXT_DATE NEXT_SEC B FAILURES INTERVAL
    --- --------- -------- --------- -------- - -------- --------------------------------------
     31 16-NOV-04 01:00:02 17-NOV-04 01:00:00 N        0 trunc(sysdate+1)+1/24
     27 16-NOV-04 00:00:04 17-NOV-04 00:00:00 N        0 TRUNC(SYSDATE) + 1
     35 16-NOV-04 01:00:02 17-NOV-04 01:00:00 N        0 trunc(sysdate+1)+1/24
     29 16-NOV-04 00:00:04 17-NOV-04 00:00:00 N        0 TRUNC(SYSDATE) + 1
     30 01-NOV-04 06:00:01 01-DEC-04 06:00:00 N        0 trunc(add_months(sysdate,1),'MM')+6/24
     65 16-NOV-04 04:00:03 17-NOV-04 04:00:00 N        0 trunc(sysdate+1)+4/24
     46 16-NOV-04 02:14:27 17-NOV-04 02:14:27 N        0 sysdate+1
     66 16-NOV-04 03:00:02 17-NOV-04 18:14:49 N        0 trunc(sysdate+1)+3/24

    8 rows selected.

None of the jobs had run on schedule. Job 31, for example, should have run at 17-NOV-04 01:00:00, but it had not.

Creating a test job

    -- A do-nothing procedure, used only to check whether the job queue runs anything at all
    create or replace PROCEDURE pining
    IS
    BEGIN
       NULL;
    END;
    /

    variable jobno number;
    variable instno number;
    begin
       select instance_number into :instno from v$instance;
       -- first run 5 minutes from now (1/288 of a day), then every 5 minutes
       dbms_job.submit(:jobno, 'pining;', trunc(sysdate+1/288,'MI'), 'trunc(SYSDATE+1/288,''MI'')', TRUE, :instno);
    end;
    /

The result was the same: the test job did not run. Running it manually with dbms_job.run(), however, worked without any problem.

Recovery attempts

I suspected that the CJQ0 process had failed. First set JOB_QUEUE_PROCESSES to 0, so that Oracle kills CJQ0 and the associated job processes:

    SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES = 0;

Wait 2 to 3 minutes, then set it again:

    SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES = 5;

PMON then restarts the CJQ0 process:

    Thu Nov 18 11:59:50 2004
    ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
    Thu Nov 18 12:01:30 2004
    ALTER SYSTEM SET job_queue_processes=10 SCOPE=MEMORY;
    Thu Nov 18 12:01:30 2004
    Restarting dead background process CJQ0
    CJQ0 started with pid=8

But the jobs still did not run, and when the parameter was changed again, CJQ0 simply died:

    Thu Nov 18 13:52:05 2004
    ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
    Thu Nov 18 14:09:30 2004
    ALTER SYSTEM SET job_queue_processes=10 SCOPE=MEMORY;
    Thu Nov 18 14:10:27 2004
    ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
    Thu Nov 18 14:10:42 2004
    ALTER SYSTEM SET job_queue_processes=10 SCOPE=MEMORY;
    Thu Nov 18 14:31:07 2004
    ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
    Thu Nov 18 14:40:14 2004
    ALTER SYSTEM SET job_queue_processes=10 SCOPE=MEMORY;
    Thu Nov 18 14:40:28 2004
    ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
    Thu Nov 18 14:40:33 2004
    ALTER SYSTEM SET job_queue_processes=1 SCOPE=MEMORY;
    Thu Nov 18 14:40:40 2004
    ALTER SYSTEM SET job_queue_processes=10 SCOPE=MEMORY;
    Thu Nov 18 15:00:42 2004
    ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
    Thu Nov 18 15:01:36 2004
    ALTER SYSTEM SET job_queue_processes=15 SCOPE=MEMORY;

Next I tried restarting the database, which had to be done at night:

    PMON started with pid=2
    DBW0 started with pid=3
    LGWR started with pid=4
    CKPT started with pid=5
    SMON started with pid=6
    RECO started with pid=7
    CJQ0 started with pid=8
    QMN0 started with pid=9
    .

CJQ0 started normally, but the jobs still did not run. Out of options, I kept digging.
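The INTERVAL expressions in the job listing and in the test job rely on Oracle date arithmetic, where adding 1 to a date means one day, so 1/24 is one hour and 1/288 is five minutes. The sketch below is only a Python model of that arithmetic, not Oracle code: the helper names are made up for illustration, and `add_months` is a simplification that ignores Oracle's end-of-month rule. It assumes the expression is evaluated at roughly the LAST_DATE/LAST_SEC moment, which is enough to reproduce the NEXT_DATE values in the listing.

```python
import calendar
from datetime import datetime, timedelta


def trunc_day(d):
    """Model of TRUNC(date): drop the time of day."""
    return d.replace(hour=0, minute=0, second=0, microsecond=0)


def trunc_minute(d):
    """Model of TRUNC(date, 'MI'): drop the seconds."""
    return d.replace(second=0, microsecond=0)


def trunc_month(d):
    """Model of TRUNC(date, 'MM'): first day of the month at midnight."""
    return trunc_day(d).replace(day=1)


def add_months(d, n):
    """Simplified ADD_MONTHS: clamps the day, ignores the end-of-month rule."""
    index = d.month - 1 + n
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day)


def next_daily_at(now, hour):
    """trunc(sysdate+1) + hour/24"""
    return trunc_day(now + timedelta(days=1)) + timedelta(hours=hour)


def next_monthly_6am(now):
    """trunc(add_months(sysdate,1),'MM') + 6/24"""
    return trunc_month(add_months(now, 1)) + timedelta(hours=6)


def next_five_minutes(now):
    """trunc(sysdate+1/288,'MI'); 1/288 of a day is 5 minutes."""
    return trunc_minute(now + timedelta(minutes=5))


if __name__ == "__main__":
    # Job 31 last ran at 16-NOV-04 01:00:02
    print(next_daily_at(datetime(2004, 11, 16, 1, 0, 2), 1))
    # Job 30 last ran at 01-NOV-04 06:00:01
    print(next_monthly_6am(datetime(2004, 11, 1, 6, 0, 1)))
```

Under these assumptions every job in the listing had a NEXT_DATE that was already in the past when the session was opened on the evening of 17 November, and the test job was due within five minutes of being submitted, so the schedules themselves were not the problem.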

I then found that Oracle has the following bug:

    1. Clear description of the problem encountered:
       slgcsf() / slgcs() on Solaris will stop incrementing after
       497 days 2 hrs 28 mins (approx) machine uptime.

    2. Pertinent configuration information
       No special configuration other than long machine uptime.

    3. Indication of the frequency and predictability of the problem
       100% but only after 497 days.

    4. Sequence of events leading to the problem
       If the gethrtime() OS call returns a value > 42949672950000000
       nanoseconds then slgcs() stays at 0xffffffff. This can
       cause some problems in parts of the code which rely on
       slgcs() to keep moving.
       eg: In kkjssrh() does now = slgcs(&se) and compares that
       to a previous timestamp. After 497 days uptime slgcs()
       keeps returning 0xffffffff so now - kkjlsrt will
       always return 0.

    5. Technical impact on the customer. Include persistent after effects.
       In this case DBMS JOBS stopped running after 497 days uptime.
       Other symptoms could occur in various places in the code.

So the timer had overflowed. I checked my host:

    bash-2.03$ uptime
     10:00pm  up 500 day(s), 14:57,  1 user,  load average: 1.31, 1.09, 1.08
    bash-2.03$ date
    Fri Nov 19 22:00:14 CST 2004

When the incident began, the host had been up for just over 497 days. I scheduled a reboot of the host.

This one was really frustrating. Who would have thought Oracle could fail like this? Oracle's final statement was:

    fix made it into 9.2.0.6 patchset

At that point 9.2.0.6 for Solaris had not been released yet.

Well, chalk it up to experience. If a problem seems completely inexplicable, go ahead and suspect Oracle itself: if it looks like a bug, it may well be a bug.

After the reboot the problem was resolved. The status was as follows:

    $ sqlplus "/ as sysdba"

    SQL*Plus: Release 9.2.0.3.0 - Production on Fri Nov 26 09:21:21 2004

    Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

    Connected to:
    Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
    With the Partitioning, OLAP and Oracle Data Mining options
    JServer Release 9.2.0.3.0 - Production

    SQL> select job,last_date,last_sec,next_date,next_sec from user_jobs;

    JOB LAST_DATE LAST_SEC NEXT_DATE NEXT_SEC
    --- --------- -------- --------- --------
     70 26-NOV-04 09:21:04 26-NOV-04 09:26:00

    SQL> /

    JOB LAST_DATE LAST_SEC NEXT_DATE NEXT_SEC
    --- --------- -------- --------- --------
     70 26-NOV-04 09:26:01 26-NOV-04 09:31:00

    SQL> select * from v$timer;

         HSECS
    ----------
       3388153

    SQL> select * from v$timer;

         HSECS
    ----------
       3388319
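The 497-day figure in the bug text can be checked with a little arithmetic. The sketch below is a Python model, not Oracle's code: it assumes, as the bug text implies, a 32-bit counter of hundredths of a second that stops at 0xffffffff, and all function and constant names are made up for illustration. It also estimates from the `uptime` and `date` output when that counter would have stopped on this host.

```python
from datetime import datetime, timedelta

COUNTER_MAX = 0xFFFFFFFF          # largest value of a 32-bit unsigned counter
NS_PER_CENTISECOND = 10_000_000   # 1/100 s expressed in nanoseconds

# Uptime (in ns) at which the counter reaches its maximum
THRESHOLD_NS = COUNTER_MAX * NS_PER_CENTISECOND
# The same moment as a duration; 1 centisecond = 10 milliseconds
TIME_TO_SATURATION = timedelta(milliseconds=COUNTER_MAX * 10)


def centisecond_counter(uptime_ns):
    """Hypothetical model of a counter that stops at 0xffffffff."""
    return min(uptime_ns // NS_PER_CENTISECOND, COUNTER_MAX)


def elapsed_centiseconds(previous_uptime_ns, current_uptime_ns):
    """What a 'now - previous timestamp' comparison sees with this counter."""
    return (centisecond_counter(current_uptime_ns)
            - centisecond_counter(previous_uptime_ns))


def saturation_time(observed_at, uptime):
    """Wall-clock moment at which the counter stopped, given an uptime reading."""
    return observed_at - uptime + TIME_TO_SATURATION


if __name__ == "__main__":
    hour_ns = 3600 * 10**9
    print("threshold (ns):", THRESHOLD_NS)
    print("time to saturation:", TIME_TO_SATURATION)
    # Before saturation an hour is seen as 360000 centiseconds
    print("elapsed before:", elapsed_centiseconds(0, hour_ns))
    # After saturation an hour is seen as no time at all
    print("elapsed after:", elapsed_centiseconds(THRESHOLD_NS, THRESHOLD_NS + hour_ns))
    # uptime reported 500 days 14:57 at Fri Nov 19 22:00:14 2004
    print("counter stopped around:",
          saturation_time(datetime(2004, 11, 19, 22, 0, 14),
                          timedelta(days=500, hours=14, minutes=57)))
```

With this model the threshold equals the 42949672950000000 ns quoted in the bug text, and the estimated stop falls on the morning of 16 November 2004, after the last successful job runs and before the first missed ones. Once the counter is stuck, any check that waits for elapsed time to grow sees zero, which fits jobs running under dbms_job.run() but never from the queue.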
