WSCE: A Crawler Engine for Large-Scale Discovery of Web Services

Eyhab Al-Masri and Qusay H. Mahmoud

Abstract

This paper addresses issues relating to the efficient access and discovery of Web services across multiple UDDI Business Registries (UBRs). The ability to explore Web services across multiple UBRs is becoming a challenge, particularly as the size and magnitude of these registries increase. As Web services proliferate, finding an appropriate Web service across one or more service registries using existing registry APIs (i.e. UDDI APIs) raises a number of concerns such as performance, efficiency, end-to-end reliability, and, most importantly, quality of returned results. Clients should not have to endlessly search accessible UBRs to find appropriate Web services, particularly when operating via mobile devices. Finding relevant Web services should be time effective and highly productive.
In an attempt to enhance the efficiency of searching for businesses and Web services across multiple UBRs, we propose a novel exploration engine, the Web Service Crawler Engine (WSCE). WSCE is capable of crawling multiple UBRs and enables the establishment of a centralized Web services repository which can be used for large-scale discovery of Web services. The paper presents experimental validation, results, and analysis of the presented ideas.

1. Introduction

The continuous growth and propagation of the Internet have been some of the main factors behind information overload, which in many instances acts as a deterrent to quick and easy discovery of information. Web services are Internet-based, modular applications, and the automatic discovery and composition of Web services is an emerging technology of choice for building understandable applications used for business-to-business integration, and is of immense interest to governments, businesses, and individuals. As Web services proliferate, the same dilemma perceived in the discovery of Web pages will become tangible, and searching for specific business applications or Web services will become challenging and time consuming, particularly as the number of UDDI Business Registries (UBRs) begins to multiply.

In addition, decentralizing UBRs adds another level of complexity on how to effectively find Web services within these distributed registries. Decentralization of UBRs is becoming tangible as new operating systems,
applications, and APIs are already equipped with built-in functionalities and tools that enable organizations or businesses to publish their own internal UBRs for intranet and extranet use, such as the Enterprise UDDI Services in Windows Server 2003, WebSphere Application Server, Systinet Business Registry, and jUDDI, to name a few. Enabling businesses or organizations to self-operate and manage their own UBRs makes a significant increase in the number of business registries likely, and therefore clients will soon face the challenge of finding Web services across hundreds,
if not thousands, of UBRs.

At the heart of the Service Oriented Architecture (SOA) is a service registry which connects and mediates service providers with clients, as shown in Figure 1. Service registries extend the concept of an application-centric Web by allowing clients (or conceivably applications) to access a wide range of Web services that match specific search criteria in an autonomous manner. Without publishing Web services through registries, clients will not be able to locate services in an efficient manner, and service providers will have to devote extra effort to advertising their
services through other channels.

There are several companies that offer Web-based Web service directories, such as WebServiceList [1], RemoteMethods [2], WSIndex [3], and XM [4]. However, because these Web-based service directories fail to adhere to Web services standards such as UDDI, they are likely to become unreliable sources for finding relevant Web services, and may become disconnected from the Web services environment, as in the cases of BindingPoint and SalCentral, which closed their Web-based Web service directories after many years of exposure.

Apart from Web-based service directories, there have been numerous efforts to improve the discovery of Web services [5,6,9,21]; however, many of them have failed to address the issue of handling discovery operations across multiple UBRs. Because UBRs are hosted on Web servers, they depend on network traffic and performance, and therefore clients looking for appropriate Web services are susceptible to performance issues when carrying out multiple UBR search requests.

To address the above-mentioned issues, this work introduces a framework that serves as the heart of our Web Services Repository Builder (WSRB) architecture [7] by enhancing the discovery of Web services without any modifications to existing standards. In this paper, we propose the Web Service Crawler Engine (WSCE), which actively crawls accessible UBRs and collects business and Web service information. Our architecture enables businesses and organizations to maintain autonomous control over their UBRs while allowing clients to perform search queries adapted to large-scale discovery of Web services. Our solution has been tested, and results show high performance rates when compared with other existing models.

The remainder of this paper is organized as follows. Section two discusses related work. Section three discusses some of the limitations of existing UBRs. Section four discusses the motivations for WSCE. Section five presents our Web service crawler engine's architecture. Experiments and results are discussed in Section six, and finally conclusions and future work are discussed in Section seven.

2. Related Work

Discovery of Web services is a fundamental area of research in ubiquitous computing. Many researchers have focused on discovering Web services through a centralized UDDI registry [8,9,10]. Although centralized registries can provide effective methods for the discovery of Web services, they suffer from problems associated with centralized systems, such as single points of failure and bottlenecks. In addition, other issues relating to the scalability of data replication, providing notifications to all subscribers when performing any system upgrades, and handling versioning of services from the same provider have driven researchers to find other alternatives. Other approaches focused on having multiple public/private registries grouped into registry federations [6,12], such as METEOR-S, for enhancing the discovery process. METEOR-S provides a discovery mechanism for publishing Web services over federated registries, but this solution does not provide the means for articulating advanced search techniques, which are essential for locating appropriate business applications. In addition, federated registry environments can potentially employ inconsistent policies, which will have a significant impact on the practicability of conducting inquiries across them. Furthermore, federated registry environments will have increased configuration overhead, additional processing time, and poor performance in terms of execution time when performing service discovery operations. A desirable solution would be a Web services crawler engine such as WSCE that can facilitate the aggregation of Web service references, resources, and description documents, and can provide clients with a standard, universal access point for discovering Web services distributed across multiple registries.

Several approaches focused on applying traditional Information Retrieval (IR) techniques or using keyword-based matching [13,14], which primarily depend on analyzing the frequency of terms. Other attempts focused on schema matching [15,16], which tries to understand the meanings of the schemas and suggest any trends or patterns. Other approaches studied the use of supervised classification and unsupervised clustering of Web services [17], artificial neural networks [18], or unsupervised matching at the operation level [19]. Other approaches focused on a peer-to-peer framework architecture for service discovery and ranking [20], providing a conceptual model based on Web service reputation [21], and providing a keyword-based search engine for querying Web services [22]. However, many of these approaches provide a very limited set of search methods (i.e. search by business name, business location, etc.) and attempt to apply traditional IR techniques that may not be suitable for services discovery, since Web services often contain or provide very brief textual descriptions of what they offer. In addition, the Web services structure is complex and only a small portion of text is often provided. WSCE enhances the process of discovering Web services by providing advanced search capabilities for locating proper business applications across one or
more UDDI registries and any other searchable repositories. In addition, WSCE allows for a high-performance and reliable discovery mechanism, while current approaches are mainly dependent on external resources, which in turn can significantly impact the ability to provide accurate and meaningful results. Furthermore, current techniques do not take into consideration the ability to predict, detect, or recover from failures at the Web service host, or to keep track of any dynamic updates or service changes.

3. UDDI Business Registries (UBRs)

Business registries provide the foundation for the cataloging and classification of Web services and other additional components. A UDDI Business Registry (UBR) serves as a service directory for the publishing of technical information about Web services [23]. The UDDI is an initiative originally backed by several technology companies including Microsoft, IBM, and Ariba [24] and aims at providing a focal point where all businesses, including their Web services, meet together in an open and platform-independent framework. Hundreds of other companies have endorsed the UDDI initiative, including HP, Intel, Fujitsu, BEA, Oracle, SAP, Nortel Networks, WebMethods, Andersen Consulting, and Sun Microsystems, to name a few. E-Business XML (ebXML) is another service registry standard that focuses more on the collaboration between businesses [27]. Although commonalities between UDDI and ebXML registries present opportunities for interoperability between them [26], the UDDI
remains the de facto industry standard for Web service discovery [21]. Although the UDDI provides ways of locating businesses and interfacing with them electronically, it is limited to a single search criterion. The keyword-based search techniques offered by UDDI make it impractical to assume
that it can be very useful for Web services discovery or composition. In addition, a client should not have to endlessly search UBRs to find an appropriate Web service. As Web services proliferate and the number of UBRs increases, limited search capabilities are likely to yield less meaningful search results, which makes the task of performing search queries across one or multiple UBRs very time consuming and less productive.

3.1. Limitations with Current UDDI

Apart from the problems regarding limited search capabilities offered by UDDI, there are other major limitations and shortcomings with the existing UDDI standard. Some of these limitations include: (1) UDDI was intended to be used only for Web services discovery; (2) UDDI registration is voluntary, and therefore it risks becoming passive; (3) UDDI does not provide any guarantees as to the validity and quality of the information it contains; (4) the disconnection between UDDI and the current Web; (5) UDDI is incapable of providing Quality of Service (QoS) measurements for registered Web services, which can provide helpful information to clients when choosing appropriate Web services; (6) UDDI does not clearly define how service providers can advertise pricing models; and (7) UDDI does not maintain or provide any Web service life-cycle information (i.e. Web services across stages). Other limitations with the current UDDI standard [23] are shown in Table 1.

Although the UDDI has been the de facto industry standard for Web services discovery, the ability to find a scalable solution for handling significant amounts of data from multiple UBRs at a large scale is becoming a critical issue. Furthermore, the search time when searching one or multiple UDDI registries (i.e. meta-discovery) raises several concerns in terms of performance, efficiency, reliability, and the quality of returned results.

4. Motivations for WSCE

Web services are syntactically described using the Web Service Description Language (WSDL), which concentrates on describing Web services at the functional level. A more elaborate business-centric model for Web
services is provided by the UDDI, which allows businesses to create many-to-many partnership relationships and serves as a focal point where businesses of all sizes can meet together in an open and global framework. Although there have been numerous standards that support the description and discovery of Web services, no simple way currently exists of combining these sources of information for clients to understand and use. In order for clients to search or invoke services, they first have to manually perform search queries against an existing UBR based on a primitive keyword-based technique, loop through returned results, extract binding information (i.e. through bindingTemplates or via WSDL access points), and manually examine their technical details. In this case, clients have to manually collect Web service information from different types of resources, which may not be a reliable approach for collecting information about Web services.

What is therefore desirable is a Web services crawler engine such as WSCE that facilitates the aggregation of Web service references, resources, and description documents and provides a well-defined access pattern of usage for discovering Web services. WSCE facilitates the establishment of a Web services search engine in which service providers will have enough visibility for their services, and at the same time clients will have the appropriate tools for performing advanced search queries. The design of WSCE is motivated by several factors including: (1) the inability to periodically keep track of business and Web service life-cycle information using the existing UDDI design, even though such information can serve as the basis for documenting Web services across stages; (2) the inherent search criterion offered by the UDDI inquiry API, which would not be beneficial for finding services of interest; (3) the apparent disconnection between U