Common Network Fault Diagnosis and Analysis (slide deck, 62 pages)
Equipment Maintenance and Troubleshooting

Router Maintenance and Troubleshooting

show interface

Buffers and Queues (What Is an Ignore?)
- When no buffer is available to store an incoming frame, the Ignore counter increments.
- The queue limit can be exceeded in order to absorb bursts of traffic.
[Figure: interface bus with Ethernet and Token Ring interface cards and their packet buffers, the global pool of system buffers, and the CPU]

Buffers and Queues (Input Drops)
- When not enough system buffers are available, the input drop counter increments.
- System buffers are used for all process-switched packets and for packets generated by the router itself.
[Figure: interface bus, interface buffers, system buffers and CPU; an incoming Ethernet frame is refused with "NO VACANCY"]

Buffers and Queues (Output Drops)
- When not enough interface buffers are available to handle an outgoing frame, the output drop counter increments.
[Figure: interface bus, interface buffers, system buffers and CPU; a new frame for the Token Ring interface is refused with "NO VACANCY"]

Buffers and Queues (Output Drops)
- When the queue limit of the output interface is exceeded, the output drop counter increments.
[Figure: new frames for the Token Ring interface are stopped ("STOP") at the full output queue]

Interface Buffers
    HoStage# show controller cbus
    cBus 0, controller type 6.0, microcode version 10.0
      512 Kbytes of main memory, 128 Kbytes cache memory
      134 1520 byte buffers, 65 4496 byte buffers
      Restarts: 0 line down, 0 hung output, 0 controller error
     MEC 0, controller type 5.1, microcode version 10.0
      Interface 0 - Ethernet0, station address 0000.0c06.4ae0 (bia 0000.0c06.4ae0)
        11 buffer RX queue threshold, 18 buffer TX queue limit, buffer size 1520
        ift 0000, rql 11, tq 0000 0000, tql 18
        Transmitter delay is 0 microseconds
     CTR 1, controller type 9.0, microcode version 10.1
      Interface 8 - TokenRing0, station address 0000.3060.3219 (bia 0000.3060.3219)
        13 buffer RX queue threshold, 31 buffer TX queue limit, buffer size 4496
        ift 0005, rql 13, tq 0000 0000, tql 31
        Transmitter delay is 0 microseconds
     FDDI-T 3, controller type 7.2, microcode version 10.1
      Interface 24 - Fddi0, station address 0000.0c06.36d7 (bia 0000.0c06.36d7)
        13 buffer RX queue threshold, 32 buffer TX queue limit, buffer size 4496
        ift 0006, rql 9, tq 0000 0000, tql 32

System Buffers
    HoStage# show buffers
    Buffer elements:
         500 in free list (500 max allowed)
         51640224 hits, 0 misses, 0 created
    Small buffers, 104 bytes (total 121, permanent 121):
         119 in free list (20 min, 250 max allowed)
         19229201 hits, 0 misses, 0 trims, 0 created
    Middle buffers, 600 bytes (total 90, permanent 90):
         89 in free list (10 min, 200 max allowed)
         20513359 hits, 91 misses, 115 trims, 115 created
    Big buffers, 1524 bytes (total 90, permanent 90):
         90 in free list (5 min, 300 max allowed)
         7160285 hits, 0 misses, 0 trims, 0 created
    Large buffers, 5024 bytes (total 5, permanent 5):
         5 in free list (0 min, 30 max allowed)
         233295 hits, 0 misses, 0 trims, 0 created
    Huge buffers, 18024 bytes (total 0, permanent 0):
         0 in free list (0 min, 4 max allowed)
         0 hits, 0 misses, 0 trims, 0 created

Command: show process cpu
    CPU utilization for five seconds: X%/Y%; one minute: Z%; five minutes: W%
    PID  Runtime(ms)  Invoked  uSecs  5Sec  1Min  5Min  TTY  Process
The fields in the command output are:
- X: average total utilization during the last five seconds
- Y: average utilization due to interrupts during the last five seconds
- X-Y: utilization at process level, i.e. the share attributable to process-switched traffic
- Z: average total utilization during the last minute
- W: average total utilization during the last five minutes
- PID: process ID
- Runtime: CPU time the process has used (in milliseconds)
- Invoked: number of times the process has been called
- uSecs: microseconds of CPU time per invocation
- 5Sec: CPU utilization by the task in the last 5 seconds
- 1Min: CPU utilization by the task in the last minute
- 5Min: CPU utilization by the task in the last 5 minutes
- TTY: terminal that controls the process
- Process: name of the process

High Utilization Analysis
    router-5# show process cpu
    CPU utilization for five seconds: 83%/21%; one minute: 79%; five minutes: 84%
     PID  Runtime(ms)  Invoked  uSecs   5Sec   1Min   5Min TTY Process
       1          104     3707     28  0.00%  0.00%  0.00%   0 Load Meter
       2        10208    15222    670  0.00%  0.01%  0.00%   0 OSPF Hello
       3        34620      579  59792  0.00%  0.20%  0.17%   0 Check heaps
Total utilization is 83%, of which 21% is spent on interrupts; the remaining 83 - 21 = 62% is consumed by processes.

High CPU Utilization Caused by Interrupts
Interrupt load on the CPU comes mainly from fast switching of data traffic. Possible causes:
- Voice ports are configured on the router
- There are active ATM (Asynchronous Transfer Mode) interfaces on the router
- An inappropriate switching path is configured on the router
- The CPU is performing alignment corrections
- The router is overloaded with traffic
- (Check the output of the interface show commands)
- A bug in the IOS software

High CPU Utilization Caused by Processes
If a single process is consuming a large share of the CPU, check the log: unusual activity in a process produces error messages in the log. The following processes can cause high CPU utilization:
- IP Input
- HyBridge Input
- IP Simple Network Management Protocol (SNMP)
- Virtual EXEC
- TCP Timer
- VTEMPLATE Backgr
- Other processes

show memory
                Head    Total(b)    Used(b)    Free(b)  Lowest(b)  Largest(b)
    Processor   815D3828  15440232    6758292    8681940    2827144     3901312
          I/O   2400000   12582912    1625120   10957792   10858580    10871708

                        Processor memory
    Address   Bytes      Prev     Next     Ref PrevF NextF Alloc PC  what
    815D8674  0000001500 815D3828 815D8C7C 001 -     -     801E14DC  List Elements
    815D8C7C  0000005000 815D8674 815DA030 001 -     -     801E1518  List Headers
    815DA030  0000000044 815D8C7C 815DA088 001 -     -     80D6FCCC  *Init*
    815DA088  0000001500 815DA030 815DA690 001 -     -     801EBD04  messages
- Processor: main memory
- I/O: packet-switching memory
- Each row of the lower table is one allocated memory block

Memory Problems
- MALLOC failures
- Memory leaks
- Fragmentation
- Alignment errors
- Spurious accesses
- Memory corruption
- Processor memory parity errors

show processes memory
    Router# show processes memory
    Total: 3149760, Used: 2334300, Free: 815460
     PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
       0   0     226548       1252    1804376          0          0 *Initialization*
       0   0        320    5422288        320          0          0 *Scheduler*
       0   0    5663692    2173356          0    1856100          0 *Dead*
       1   0        264        264       3784          0          0 Load Meter
       2   2       5700       5372      13124          0          0 Virtual Exec
       3   0          0          0       6784          0          0 Check heaps
       4   0         96          0       6880          0          0 Pool Manager
- Allocated: total number of bytes allocated to the process since the router booted.
- Freed: total number of bytes the process has freed.
- Holding: total number of bytes the process currently holds. This is the memory the process actually owns and is the most important figure when troubleshooting. It is not necessarily equal to Allocated minus Freed, because a block allocated by one process may later be returned to the free pool by another process.
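The interrupt/process split from the "High Utilization Analysis" slide is simple arithmetic on the header line, and the uSecs column in the sample output is consistent with Runtime(ms) × 1000 / Invoked, truncated to an integer. The sketch below assumes the header text looks exactly like the sample output; the names `CpuSummary`, `parse_cpu_header` and `usecs_per_invocation` are hypothetical helpers, not part of IOS or any Cisco tool.

```python
import re
from dataclasses import dataclass

# Assumed header format, copied from the sample output on the slide:
# "CPU utilization for five seconds: 83%/21%; one minute: 79%; five minutes: 84%"
HEADER_RE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


@dataclass
class CpuSummary:
    five_sec_total: int      # X
    five_sec_interrupt: int  # Y
    one_min: int             # Z
    five_min: int            # W

    @property
    def five_sec_process(self) -> int:
        # X - Y: CPU time spent at process level during the last five seconds
        return self.five_sec_total - self.five_sec_interrupt


def parse_cpu_header(line: str) -> CpuSummary:
    """Extract X, Y, Z and W from a 'show process cpu' header line."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised header: {line!r}")
    x, y, z, w = (int(group) for group in match.groups())
    return CpuSummary(x, y, z, w)


def usecs_per_invocation(runtime_ms: int, invoked: int) -> int:
    """Microseconds of CPU per call, truncated (matches the sample rows)."""
    return runtime_ms * 1000 // invoked


if __name__ == "__main__":
    summary = parse_cpu_header(
        "CPU utilization for five seconds: 83%/21%; one minute: 79%; five minutes: 84%"
    )
    print(summary.five_sec_process)          # 62
    print(usecs_per_invocation(34620, 579))  # 59792 (Check heaps row)
```

A high Y points toward the interrupt causes listed above, while a high X-Y points toward a busy process; which threshold counts as "high" is left to the operator here.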
Switch Maintenance and Troubleshooting

Auto-negotiation Summary
    Peer 1 config | Peer 2 config | Peer 1 result | Peer 2 result | Notes
    Auto          | Auto          | 100 FD        | 100 FD        | Correct negotiation when both peers are capable of 100 FD
    100 FD        | Auto          | 100 FD        | 100 HD        | DUPLEX MISMATCH
    100 FD        | 100 FD        | 100 FD        | 100 FD        | Correct manual configuration
    100 HD        | Auto          | 100 HD        | 100 HD        | Link is established, but peer 2 does not see any auto-negotiation information from the NIC and defaults to half duplex
    10 HD         | Auto          | 10 HD         | 10 HD         | Link is established, but peer 2 does not see FLPs and defaults to 10 Mbps half duplex
    10 FD         | 100 FD        | No link       | No link       | SPEED MISMATCH
(FD = full duplex, HD = half duplex, FLP = fast link pulse)

Checking for Duplex Mismatch
- CDP warns about the mismatch when the link first comes up.
- Full duplex means collision detection is disabled: a full-duplex device sends frames without first checking whether the medium is idle.
- Symptoms of a duplex mismatch:
  - FCS errors (seen on the FD side)
  - Alignment errors (seen on the FD side)
  - Runts (seen on the FD side)
  - Excessive collisions (seen on the HD side)
  - Late collisions (seen on the HD side)

Summary: 10/100M Auto-negotiation
- Preferably use:
  - Auto to auto
  - Fixed speed/duplex to fixed speed/duplex
- Avoid:
  - Auto to fixed speed/duplex
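The auto-negotiation table can be read as a small set of rules: two auto ports agree on the best common mode; an auto port facing a fixed port senses the speed but falls back to half duplex; two fixed ports link only if their speeds match. The sketch below encodes that simplified model to show why "auto to fixed" is the combination to avoid. It is an assumption-laden illustration (it assumes both peers support 100 FD and ignores Gigabit and vendor quirks); `link_outcome` is a hypothetical name.

```python
from typing import Optional, Tuple

# A port setting: None means "auto"; otherwise (speed in Mbps, "FD" or "HD").
Setting = Optional[Tuple[int, str]]
# An operating mode; None means the link does not come up.
Result = Optional[Tuple[int, str]]


def link_outcome(peer1: Setting, peer2: Setting,
                 common_best: Tuple[int, str] = (100, "FD")
                 ) -> Tuple[Result, Result, str]:
    """Predict each peer's operating mode under a simplified 10/100 model."""
    if peer1 is None and peer2 is None:
        # Both sides negotiate; assume both are capable of common_best.
        return common_best, common_best, "ok"
    if peer1 is None or peer2 is None:
        fixed = peer1 if peer1 is not None else peer2
        speed, _ = fixed
        # The auto side receives no negotiation information from the fixed
        # side: it senses the speed but defaults to half duplex.
        sensed = (speed, "HD")
        r1, r2 = (sensed, fixed) if peer1 is None else (fixed, sensed)
    else:
        if peer1[0] != peer2[0]:
            return None, None, "speed mismatch"
        r1, r2 = peer1, peer2
    verdict = "ok" if r1[1] == r2[1] else "duplex mismatch"
    return r1, r2, verdict


if __name__ == "__main__":
    rows = [(None, None), ((100, "FD"), None), ((100, "FD"), (100, "FD")),
            ((100, "HD"), None), ((10, "HD"), None), ((10, "FD"), (100, "FD"))]
    for p1, p2 in rows:
        print(p1 or "auto", p2 or "auto", link_outcome(p1, p2))
```

Under this model, the only rows that end in "duplex mismatch" are those where one side is fixed at full duplex and the other is left on auto, which is exactly the pairing the summary slide says to avoid.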
Gigabit Ethernet Negotiation Issues
- Some devices do not support Gigabit negotiation, or support it only partially.
- If this keeps the link from coming up, disable negotiation on the Gigabit link.
- Negotiation must be enabled on both ends or disabled on both ends.

Gigabit Ethernet Negotiation Issues
Two switches, A and B, are connected via Gigabit Ethernet.

How to Monitor on CatOS
- sh port: shows port status and some error counters
- sh mac: shows Rx and Tx traffic data per port
- sh counters mod/port: shows all counters
- sh top: shows the 10 ports with the most traffic over 30 seconds; a quick way to find the source of a broadcast storm and the ports involved in a loop

How to Monitor XL Switches
- sh interface: shows port information, similar to sh int on IOS
- sh controller ethernet fast|gig x/x: shows additional port counters

Common Port Problems
- Bad cable (faulty, or the wrong type): the link cannot be established, or FCS, runt, alignment and similar errors appear.
  (GigE: a mode-conditioning cable is required for LX/LH GBICs with MM cable, dist 300 m.)
  (The ZX GBIC is for extended distances: minimum distance 10 km with an 8 dB attenuator, 40 km without an attenuator.)
- Duplex mismatch: CDP normally reports a duplex mismatch but does not fix it automatically.
- Port status not active on CatOS (status LED amber): usually because the port is assigned to a VLAN that does not exist (VTP issues are covered later).

Common Port Problems: err-disable
A port goes into err-disable state (CatOS only) because of:
- A large number of errors on the port
- EtherChannel misconfiguration
- BPDU guard
- Other causes
If a port goes into err-disable state:
- Set the port option errport enable. This prevents all ports from entering err-disable state; use it only when needed.
- Better to find the cause of the error first!

Common Port Problems: err-disable
- Ports in err-disable state must be re-enabled manually.
- Alternatively, err-disabled ports can be set to re-enable automatically after x seconds:
    Taras (enable) set errdisable-timeout
    Usage: set errdisable-timeout <enable|disable> <reason>
           set errdisable-timeout interval <interval>
           (reason = bpdu-guard, channel-misconfig, duplex-mismatch, udld, other, all
            interval = 30..86400 seconds)

Five Trunking Modes
- off: a trunk is never formed.
- auto (the default): responds to trunk negotiation but does not initiate it. The port will not trunk on its own, but it takes part in negotiation if the far end asks for a trunk.
- desirable: the port negotiates and actively tries to form a trunk.

Trunking
- on: forms a trunk and sends DTP packets (the encapsulation, ISL or dot1q, must be configured).
- nonegotiate: forms a trunk but does not send DTP packets.
- Use nonegotiate only temporarily while a trunk is unstable (flapping between trunk and non-trunk), or when the far-end device does not support DTP, such as a router or an XL-series switch.

Trunking
- Recommended: desirable (core) to auto (edge), or desirable to desirable.
- If the link must be a trunk, configure on to on.

Trunking Summary
Problems with a trunk or the link:
- Confirm that the port connects in non-trunk mode.
- Confirm that at least one end is in desirable mode, or that both ends are on; then check that the encapsulation matches on both ends.
- Run show mac to confirm that the in-discards counter is no longer increasing.
- Check the VTP domain name.
- Capture the following for TAC:
  - sh trunk (or sh int x/x switchport)
  - sh spant x/x (or sh spanning int x/x)
  - sh config

FEC/PAgP
- FEC also has on, off, auto and desirable states, but their meanings differ slightly from the trunking modes.
- on: the port joins the channel but does NOT run PAgP (Port Aggregation Protocol).

FEC
- auto: the port responds to negotiation (it listens for PAgP) but does not actively form a channel.
- desirable: the port sends and receives PAgP and actively forms a channel.

FEC
- Warning: auto to on will NOT work.
- Warning: on to on will work, but PAgP does not run.
- Recommended: desirable to desirable, if both ends support PAgP.
- Use on when connecting to a device that does not support PAgP (a router or an XL-series switch).

FEC
- If FEC is misconfigured, or FEC detects a spanning-tree loop, it puts the port into ERR-DISABLE state.
- All ports in the same channel must have identical parameters: the same speed, duplex, trunking status, DTP configuration, allowed VLANs, and so on.

EtherChannel Summary and Key Points
- sh port capa shows the channel capability of a port.
- Check the mode on both ends (ideally desirable to desirable, or on to on; do not use auto to auto or any other mixed combination).
- If the channel does not come up, try the following:
  - Disable trunking on all ports.
  - Make sure speed, duplex and native VLAN match, then put the ports into the channel.
  - Once the channel is up, configure the trunk on the first port; the trunk configuration is then copied to all the other ports.

Spanning Tree Troubleshooting

What Causes Loops?
1) Configuration problems
- Spantree disabled
- Spantree enabled on some switches but not on others
- Speed/duplex mismatches
- PortFast enabled on ports connected to hubs or switches
- Router or multiport NIC configured for bridging
- Different spantree protocols used within the same VLAN
- Misconfigured or buggy trunk- or channel-capable NIC
- Loops with hubs or switches
- Port channeling misconfiguration

What Causes Loops?
2) Design issues
- Too large a switched network
- Bridging over the WAN (delay problems)

What Causes Loops?
3) Software issues
- Software bugs
- Forwarding traffic across blocked ports
- UplinkFast/BackboneFast
- Etc.
- Loss of management communication to line cards

What Causes Loops?
4) Hardware issues
- Bad Layer 1 links (e.g. CRCs, other input errors)
- Unidirectional links
- Data corruption (BPDUs dropped)
- Stuck port (BPDUs dropped)
- NMP stops listening to spanning tree (stuck inband)
- Loss of management communication to line cards

Detecting Spanning Tree Loops
1) The network is EXTREMELY slow for all nodes
2) Network outage
3) High system utilization on the switch
   - System utilization in "show system" above 20% usually indicates a loop
   - Above 7% indicates a possible transitory loop
   - This depends on network traffic and hardware (Cat5000 Sup1 vs. Cat6000 Sup2, etc.)
4) System LED indicators on the switch (utilization bar)
5) High In-lost and Out-lost counts in "show mac"
6) HSRP, OSPF, etc. report duplicate IP addresses
7) Unicast flooding

Detecting Spanning Tree Loops
- Check spantree blocked and root ports for errors using "show port", "show mac" and "show counters".
- Set up a syslog server and set logging for the "spantree" facility to level 6, which shows port transitions through the spantree states (listening, learning, etc.).
- Use "show inband" to check for "RsrcErrors" (BPDUs can be dropped if the supervisor is unable to process them).
- Use "show spantree summary" to check whether you are exceeding the supported number of spanning tree instances.

During an Event
- Remove redundant Ethernet segments from the network
  - Start with the connections between core switches
  - Begin with EtherChannels, if used
  - Wait 30-60 seconds for the network to recover before removing another link
  - If the network does not recover, continue methodically removing redundancy until the network stabilizes
- Avoid rebooting or powering off switches
  - If you do, you lose the logging buffer and the spantree statistics on the switch
- Syslog to a server cannot necessarily be trusted during a network failure

Finding the Smoking Gun
- Use "show system" to find switches with high backplane utilization.
- Use "show mac" and look for large amounts of broadcast/multicast traffic received and transmitted.
- Use "show spantree statistics" to follow the problem through the network:
  - On the root, check "topology change initiator" to see which bridge last generated a TCN.
  - Look at "msg age expiry count" on blocked ports to see whether a BPDU expired on the port (MaxAge was reached).
  - Look at "tcn bpdus xmitted" to see whether a bridge sent many TCNs.
  - Look at "forward trans count" to see how many times the port transitioned into the forwarding state.

Preparing for the Next Time
Take proactive measures (perform these tasks before another event occurs):
- Set the spantree logging level on the switches to 6 ("set logging level spantree 6 default") to see state transitions and TCNs (also log to a server).
- On switches running IOS, use "debug spanning events".
- Enter "clear counters" on all switches.

Finding the Root
- Verify the location of the root
  - The customer might not have set the root deterministically
  - The root might have moved because a new bridge joined the network, or a bridge priority changed

    esc-cat6500-a (enable) show spantree 5
    VLAN 5
    Spanning tree enabled
    Spanning tree type          ieee
    Designated Root             00-d0-06-26-f4-04
    Designated Root Priority    8192
    Designated Root Cost        3
    Designated Root Port        2/1-2 (agPort 13/33)
    Root Max Age   20 sec   Hello Time 2  sec   Forward Delay 15 sec
    Bridge ID MAC ADDR          00-d0-bb-01-30-04
    Bridge ID Priority          32768
    Bridge Max Age 20 sec   Hello Time 2  sec   Forward Delay 15 sec
    Port                     Vlan Port-State    Cost  Priority Portfast   Channel_id
    ------------------------ ---- ------------- ----- -------- ---------- ----------
     2/1-2                   5    forwarding        3       32 disabled   801
     15/1                    5    forwarding        4       32 enable
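The root bridge is the one with the lowest bridge ID: priority is compared first, then MAC address. In the output above the local switch (priority 32768) is not the root, because the designated root advertises priority 8192. The sketch below shows that comparison and why a root left at the default priority can move when a new bridge with a lower MAC joins. The bridge name "root-switch" and the MAC of "new-switch" are invented for illustration, and the sketch ignores the extended system ID that newer software folds into the priority.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True, order=True)
class BridgeId:
    priority: int  # bridge priority (32768 is the common default)
    mac: bytes     # 6-byte MAC; compared only when priorities tie

    def __str__(self) -> str:
        return f"{self.priority}/{'-'.join(f'{b:02x}' for b in self.mac)}"


def parse_mac(text: str) -> bytes:
    """Accept 00-d0-06-26-f4-04, 00:d0:06:26:f4:04 or 00d0.0626.f404."""
    mac = bytes.fromhex(text.replace("-", "").replace(":", "").replace(".", ""))
    if len(mac) != 6:
        raise ValueError(f"not a MAC address: {text!r}")
    return mac


def elect_root(bridges: Dict[str, BridgeId]) -> str:
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=lambda name: bridges[name])


if __name__ == "__main__":
    # Values from the show spantree 5 output; "root-switch" is a hypothetical name.
    bridges = {
        "esc-cat6500-a": BridgeId(32768, parse_mac("00-d0-bb-01-30-04")),
        "root-switch": BridgeId(8192, parse_mac("00-d0-06-26-f4-04")),
    }
    root = elect_root(bridges)
    print(root, bridges[root])  # root-switch 8192/00-d0-06-26-f4-04

    # Nobody set a priority: the lowest MAC wins, so a new bridge can take over.
    defaults = {
        "esc-cat6500-a": BridgeId(32768, parse_mac("00-d0-bb-01-30-04")),
        "new-switch": BridgeId(32768, parse_mac("00-10-7b-00-00-01")),  # invented MAC
    }
    print(elect_root(defaults))  # new-switch
```

Setting an explicit low priority on the intended root, as the designated root here does with 8192, keeps the election deterministic regardless of which MAC addresses later join the network.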