2022年2022年计算机程序设计与解释 .pdf-淘文阁

资源描述

《2022年2022年计算机程序设计与解释 .pdf》由会员分享，可在线阅读，更多相关《2022年2022年计算机程序设计与解释 .pdf（8页珍藏版）》请在taowenge.com淘文阁网|工程机械CAD图纸|机械工程制图|CAD装配图下载|SolidWorks_CaTia_CAD_UG_PROE_设计图分享下载上搜索。

1、Structure and Interpretationof Computer Programs?GaneshNarayan, Gopinath KComputer Science and AutomationIndian Institute of Science nganesh,gopi csa.iisc.ernet.inSridhar VApplied ResearchGroupSatyam CAbstractCall graphs depict the static, caller-calleerelation be-tween “functions”in a program.With

2、most source/targetlanguages supporting functions asthe primitive unit of com-position, call graphs naturally form the fundamental control?ow representation availableto understand/develop soft-ware. They are also the substrate on which various inter-procedural analyses are performed and are integral

3、partof program comprehension/testing. Given their universalityand usefulness, it is imperative to ask if call graphs exhibitany intrinsic graph theoretic features across versions, pro-gram domains and source languages. This work is an at-tempt to answer thesequestions: we present and investigatea se

4、tof meaningful graph measuresthat help us understandcall graphs better;we establish how these measures cor-relate, if any, across different languages and program do-mains; we also assess the overall, language independentsoftware quality by suitably interpreting thesemeasures.1IntroductionComplexityi

5、s one of the most pertinent characteristicsof computer programs and, thanks to Moores law, com-puter programs are becoming ever larger and complex; it snot atypical for a software product to contain hundreds ofthousands, even millions of lines of code where individualcomponents interact in myriad of

6、 ways. In order to tacklesuch complexity,variety of code organizing motifs wereproposed.Of these motifs, functions form the most fun-damental unit of source code: software is organized as setof functions of varying granularity and utility, with func-tions computing various results on their arguments

7、. Criticalfeature of this organizing principle is that functions them-selves can call other functions.This naturally leads to thenotion of function call graph where individual functions arenodes, with edges representing caller-callee relations; in-?In reverence to the Wizard Book.degree depicts the

8、number of functions that could call thefunction and outdegree depicts the number of functions thatthis function can call. Since no further restriction employed,caller-callee relation induces a generic graph structure, pos-sibly with loops and cycles.In this work we study the topology of such (static

9、) callgraphs. Our presentunderstanding of call graphs is limited;we know: that call graphs aredirected and sparse;can havecycles and often do; arenot strongly connected; evolve overtime and could exhibit preferential attachment of nodes andedges. Apart from these basic understanding, we do notknow m

10、uch about the topology of call graphs.2ContributionsIn this paper we answer questions pertaining to topologi-cal properties of call graphs by studying arepresentative setof open source programs. In particular, we ask followingquestions: What is the structure of call graphs? Are thereany consistent p

11、roperties? Are some properties inherent tocertain programming languages/problem classes? In orderto answer thesequestions, we investigate set of meaningfulmetrics from plethora of graph properties 9. Our speci?ccontributions are:1) We motivate and provide insights as to why certaincall graph propert

12、ies are useful and how they help us de-velop better and robust software.2) We compare graphstructure induced by different language paradigms under aneventual but structurally immediate structure call graphs.The authors are unaware of any study that systematicallycompare the call graphs of different

13、languages; in particular,the “ call graph ” structure of functional languages. 3) Ourcorpus, being varied and large, is far more statistically rep-resentative compared to the similar studies (24, 4,18).4) We, apart from con?rming previous results in arigorousmanner, also compute new metrics to captu

14、re ?ner aspectsof graph structure.5) As a side effect, we provide a po-tential means to assesssoftware quality, independent of thesource language.Rest of the paper is organized as follows.We begin by名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 1 页，共 8 页 - - - - - -

15、 - - - justifyingthe utilityof our study and proceed to introducerelevant structural measures in section 4.Section 5 dis-cussesthe corpus and methodology.We then present ourthe measurements and interpretations (Section 6). We con-clude with section 7 and 8.3MotivationWe argue that it helps to unders

16、tand the detailed topol-ogy of call graphs: they de?ne the set of permissible inter-actions and information ?ows, and could in?uence softwareprocessesin non trivial ways. In order to give the reader anintuitive understanding as to how graph topology could in-?uence software processes, we present fol

17、lowingfour sce-narios where it does.Bug PropagationDynamicsConsider how a bug insome function affects the rest of the software. Let foocallbarand barcould return an incorrect value because of abug in bar . if foois to incorporate this return value inits part of computation, it is likely to compute w

18、rong an-swer aswell; that is, barhas infected foo . Note that suchan infectionis contagious and, in principle, barcan in-fect any arbitrary function fnaslong asfnis connected tobar . Thus connectedness asgraph property triviallytrans-lates to infectability.Indeed, with appropriate notions ofinfectio

19、n propagation and immunization, one could under-stand bugs as an epidemic process. It is well known thatgraph topology could in?uence the stationary distributionof this process. In particular, thecritical infection rate theinfection rate beyond which an infection is not containableis highly network

20、speci?c; in fact, certain networks areknown to have zero critical thresholds 5. It pays to knowif call graphs areinstances of such graphs.Software Testing: Different functions contribute differ-ently to software stability.Certain functions that, whenbuggy, are likely to render the system unusable. S

21、uch func-tions, functions whose correctness is central to statisticalcorrectness of the software, are traditionallycharacterizedby per-function attributes like indegree and size. Such sim-ple measure(s), though useful, fail to capture the transi-tive dependencies that could render evena not-so-well

22、con-nected function anAchilles heel. Having unambiguous met-rics that measure a node simportance helps making soft-ware testing more ef?cient.Centralityis such a measurethat gives a node simportance in a graph.Once relevantcentrality measures were assigned, one could expend rel-atively more time tes

23、ting central functions.Or, equally,test central functions and their called contexts for preva-lent error modes like interface nonconformity, context dis-parity and the likes (21, 7).By considering node cen-tralities, one could bias the testing effort to achieve similarcon?dence levels without a cost

24、lier uniform/random testingschedule; though most developers intuitivelyknow the im-portance of individualfunctions and devise elaborate testcasesto stressthesefunctions accordingly, we believe suchan idiosyncratic methodology could be safely replaced byan informed and statistically tenable biasing b

25、ased on cen-tralities. Centrality is alsoreadily helpful in software impactanalysis.SoftwareComprehension:Understanding call graphstructure helps us to construct tools that assist the devel-opers in comprehending software better. For instance, con-sider a tool that magically extracts higher-level st

26、ructuresfrom program call graph by grouping related, lower-levelfunctions. Such a tool, for example, when run on a kernelcode base, would automaticallydecipher different logicalsubsystems, say, networking, ?lesystem, memory manage-ment or scheduling. Devising such a tool amounts to ?nd-ing appropria

27、te similarity metric(s) that partitions the graphso that nodes withina partitionare “ more”similar com-pared to nodes outside. Understandably, different notions ofsimilarities entail different groupings. Recent studies showhow network structure controls such grouping 2 and howproperties of nodes can

28、 be effectively used to improve thedeveloper-perceived clustering validity (26, 17).InterProceduralAnalysisCall graph topology couldin?uences both precision and convergence of Inter Procedu-ral Analysis (IPA). When specializing individual proceduresin a program, procedures that havelarge indegree co

29、uld endup being less optimal:data?ow facts for these functionstend to be too conservative as they are required to be con-sistent across a large number of call sites. By speci?callycloning nodes with large indegree and by distributingtheindegrees “ appropriately”between theseclones, one couldspeciali

30、ze individualclones better. Also, number of itera-tions an iterative IPA takes compute a ?xed-point dependson max(longest path length, largest cycle) of the call graph.4Statistical Properties of InterestAs with most nascent sciences, graph topology litera-ture is strewn with notions that are overlap

31、ping, correlatedand misused gratuitously;for clarity, we restrict ourselvesto followingstructural notions.A note on usage: we em-ploy graphs and networks interchangeably;G =(V, E ),| V |= n and | E |= m; (i, j ) implies i calls j ; didenotesthe degreeof vertex i and dijdenotes the geodesic distanceb

32、etween i and j ; N (i) denotes the immediate neighboursof i; graphs are directed and simple: for every (i1, j1) and(i2, j2) present,either (i1= i2) or (j1= j2) is true.Graphs, in general, could be modeled as random, smallworld , power-law , or scale rich, each permitting differentdynamics.Random gra

33、phs:random graph model 11, is perhapsthe simplest network model:undirected edges are addedat random between a ?xed number n of vertices to create名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 2 页，共 8 页 - - - - - - - - - a network in which each of the12n(n -1) possibl

34、e edgesis independently present with some probabilityp, and thenumber of edgesconnected to each vertex the degree of thevertex is distributed according to a Poisson distribution inthe limit of large n.Small worldgraphs:exhibit high degree of cluster-ing and have mean geodesic distance, de?ned as,- 1

35、=1n (n+1)i = jd- 1ij, in the range of log n; that is, number ofvertices within adistance r of a typical central vertex growsexponentially with r 19.It should be noted that a large number of networks, in-cluding random networks, havein the range of log n or,even, log log n.In this work, we deem a net

36、work to besmall world ifgrows sub logarithmicallyand the networkexhibits high clustering.Power law networks:These arenetworks whose degreedistribution follow the discrete CDF: P X x cx- ,where c is a ?xed constant, and is the scaling exponent.When plotted as a double logarithmicplot, this CDF ap-pea

37、rs as a straight line of slope - . The sole response ofpower-law distributions to conditioning is a changein scale:for large values of x, P X x |XXi is identical tothe (unconditional)distribution PX x. This “ scale in-variance”of power-law distributions is attributed as scale-freeness. Note that thi

38、s notion of scale-freeness does notdepict the fractal-like self similarityin every scale.Graphs with similardegree distributionsdiffer widelyin other structural aspects; rest of the de?nitions introducemetrics that permit ?ner classi?cations.degree correlations:In many real-worldgraphs, theprobabili

39、tyof attachment to the target vertex depends alsoon the degree of the source vertex: many networks showassortative mixing on their degrees,that is, a preference forhigh-degree nodes to attach to other high-degree node; oth-ersshow disassortative mixing where high-degree nodes at-tach to low-degree o

40、nes. Followingmeasure, a variant ofPearson correlation coef?cient 20, gives the degreecorre-lation.=m-1ijiki- m-1i12( ji+ ki)2m-1i12(j2i+ k2i)- m-1i12(ji+ ki)2, whereji, kiarethe degreesof the vertices at the ends of ithedge,with i = 1 m. takes values in the range - 1 1,with 0 signifyingassortativit

41、y and 0 signifyingdissortativity.=0 when there is no discernible correla-tion between degreesof nodesthat share anedge.scale free metric:a useful measurecapturing the fractalnature of graphs is scale-free metric s(g) 16, de?ned as:s(g) =(i,j )Edidj, along with its normalized variantS(g) =s(g)smax; s

42、maxis the maximal s(g) and is dictated bythe type of network understudy1.s(g) is maximal when nodes with similar degree con-nect to each other 13; thus, S(g) is close to one for net-1For unrestricted graphs, smax=ni =1(di/ 2).d2i.works that are fractal like, where the connectivity, at all de-grees,

43、stays similar. On the other hand, in networks wherenodes repeatedly connect to dissimilar nodes, S(g) is closeto zero. Networks that exhibit power-law, but have have ascale free metric S(g) close to zero are called scale rich ;power-law networks whose S(g) value is close to one arecalled scale-free.

44、 Measures S(g) and are similar and arecorrelated; but they employ different normalizations and areuseful in discerning different features.clusteringcoef?cient:is a measure of how clustered,or locally structured, a graph is: it depicts how, on an aver-age,interconnected eachnode sneighbors are. Speci

45、?cally,if node v has kvimmediate neighbors, then the clusteringcoef?cient for thatnode, Cv, is the ratio of number of edgespresent between its neighbours Evto thetotal possible con-nections between vsneighbours, that is, kv(kv- 1)/ 2. Thewhole graph clustering coef?cient, C , is the averageof Cvs:th

46、at is, C =Cvv=2Evkv( kv- 1)v.clusteringpro?le:C has limiteduse when immedi-ate connectivityis sparse.In order to understand inter-connection pro?le of transitively connected neighbours, weuse clustering pro?le 1:Cdk= i | di= k Cd(i )| i|di= k |, whereCd(i) =| (j,k ); j,k N (i) |djkG( Vi)= d|(|N ( i

47、) |2)centrality:of a node is a measure of relative impor-tance of the node within the graph; central nodes are bothpoints of opportunities in that they can reach/in?uencemost nodes in the graph, and of constraints that any per-turbation in them is likely to havegreater impact in a graph.Many central

48、ity measures exist and have been successfullyused in many contexts (6, 10).Here we focus on be-tweenness centralityBu(of node u), de?ned as the ratioof number of geodesic paths that pass through the node(u) to that of the total number of geodesic paths: that is,Bu=ij(i,u,j) ( i,j ); nodes that occur

49、 on many shortest pathsbetween other vertices have higher betweenness than thosethat do not.connected components:size and number of connectedcomponents gives us the macroscopic connectivity of thegraph. In particular, number and size of strongly connectedcomponents gives us the extent of mutual recu

50、rsion presentin the software.Number of weakly connected componentgives us the upper bound on amount of runtime indirectionresolutions possible.edge reciprocity:measuresif the edges are reciprocal,that is, if (i, j ) E , is (j, i) also E ? A robust mea-sure for reciprocityis de?ned as 12: =- a1- awhe

展开阅读全文