Lecture 2: Information Organization Methodology
Liu Wei

Chapter 5: The Subject Method

The Subject Method
The subject method indexes and retrieves documents by using words that express a document's subject directly as identifiers. It provides alphabetical access and relies mainly on a system of cross-references to reveal the relationships between terms. The words may be natural language or controlled vocabulary. (p.114)

Characteristics of the Subject Method
- Reveals the content attributes of documents
- Based on natural language (no notation system)
- Represents a document directly by parts of it (words)
- Mainly specific retrieval; generic (class) retrieval is achieved through links between terms
- Mainly exact retrieval; fuzzy retrieval is achieved through technical means

Types of Subject Method
Subject headings
- Use normalized natural-language words as headings that directly express the subject concepts of documents, file them in alphabetical order, and show the relationships between headings with a reference system (which has since borrowed the thesaurus relationships USE, UF, BT, NT, and RT)
- The earliest type (Cutter published his rules for a dictionary catalog in 1876)
- Headings have clear meanings and are easy to use
- Headings are enumerated and pre-coordinated: limited power to express concepts, hard to group terms in multiple directions, few access points, and poor general applicability of the identifiers (because natural language is used?)

Uniterm method
- Also called the unit-term method: uses normalized words taken from natural language as identifiers and expresses subjects by literal (word-form) coordination
- The most basic, conceptually indivisible words represent a document's subject
- Overcomes the shortcomings of subject headings: post-coordinate, easily rotated into an index on every word, flexible
- Used in punched-card systems in the 1950s (the earliest machine retrieval systems)
- Splitting compounds by word form gives poor semantic accuracy
- No links between terms and no reference system

Descriptor method (also called the subject-term method; thesaurus)
- Uses normalized natural-language words as identifiers of document subjects and expresses subject concepts by the conceptual coordination of descriptors
- Proposed in the late 1950s
- Developed by combining the strengths of the uniterm method, subject headings, and faceted classification
- Supports grouping in multiple directions, multiple access points, multi-factor coordination, and flexible broadening, narrowing, or changing of search scope
- Strong in flexibility, specificity, and extensibility

Keyword method (also called free terms)
- Uses keywords in the document that express subject concepts directly as identifiers
- Originated in automatic term extraction by computer
- Keyword: a word appearing in a document's title, abstract, or text that expresses the document's subject and is meaningful for retrieval
- Natural-language words with no normalization (uncontrolled)
- No relationships between terms, but a stop-word list is compiled
- Computers can now add some vocabulary control to improve retrieval precision

The Synthetic Nature of the Descriptor Method
- Like the uniterm method and subject headings it uses natural language, but under strict control that guarantees a one-to-one correspondence between identifier and concept
- Adopts pre-coordination from subject headings where appropriate, instead of splitting terms as finely as possible as the uniterm method does, so that compound and proper concepts can be expressed and coordination errors reduced
- Borrows from faceted classification: coordinates concepts rather than word forms
- Improves the reference system of subject headings
- Adopts from enumerative classification a classified index and a hierarchical (term-family) index of descriptors, and may even incorporate an enumerative or faceted classification scheme directly, integrating classification and subject indexing
- Adopts the rotation technique of the keyword method to compile a permuted index of descriptors

Functions of the Subject Method
- Indexing information content (representing it for retrieval)
- Collocating subject content and related information, or revealing how they are related
- Systematizing or organizing large amounts of information
- Enabling indexing terms and search terms to be compared for a match

Thesaurus
- A thesaurus is one of the tools of vocabulary control.
- A thesaurus is the authority list of indexing and search terms.
- A thesaurus leads from a known concept to the appropriate term for it (concept → term).
- By selecting standardized terms, a thesaurus groups material on the same concept together (grouping).

Chinese Thesaurus (汉语主题词表)
In 1975 the Institute of Scientific and Technical Information of China, the Beijing Library, the information institute of the Commission of Science, Technology and Industry for National Defense, the Electronics Science and Technology Information Institute, Institute 628, the Machinery Science and Technology Information Institute, and other bodies organized the research and compilation of the Chinese Thesaurus, with 1,048 units and 7,519 people taking part nationwide. The work was part of the national Chinese Character Information Processing System Project (the "748 Project"), whose aim was a unified national online information retrieval network. The thesaurus was China's first large, comprehensive retrieval reference work; it comprises a main table, auxiliary tables, a term-family index, a category table, an English-Chinese index, and more, in 3 volumes and 10 fascicles. Its breadth of coverage, advanced compilation techniques, and rigorous structure were first-rate in China and abroad at the time. It won a second-class National Science and Technology Progress Award in 1985.

Thesaurus in Detail (details omitted)

Introduction
The original meaning of "thesaurus" is treasury or collection; the word usually denotes a dictionary of synonyms.
"A book of words and their synonyms" (Merriam-Webster's Dictionary)
"A book of words that are put in groups together according to connections between their meanings rather than in an alphabetical list." (Longman Dictionary of Contemporary English)
e.g., Roget's Thesaurus of English Words and Phrases
In 1957 H. P. Luhn was the first to use "thesaurus" for a dictionary of subject indexing terms (a descriptor list) and to treat it as a tool of vocabulary control. (Another account holds that Brownson formally introduced the term in 1957.)

A Brief History of Thesauri
1959 DuPont's engineering information center developed the first working thesaurus
1960 the Armed Services Technical Information Agency (ASTIA) produced the Thesaurus of ASTIA Descriptors
1961 the American Institute of Chemical Engineers (AIChE) published the Chemical Engineering Thesaurus
1964 the Engineers Joint Council (EJC) published the Thesaurus of Engineering Terms
1967 Thesaurus of Engineering and Scientific Terms (TEST)
1967 the Committee on Scientific and Technical Information (COSATI) published the first set of guidelines for thesaurus construction
1970 Unesco Guidelines for the Establishment and Development of Monolingual Scientific and Technical Thesauri
1974 ANSI (American National Standards Institute) Z39.19, a US national standard for thesaurus construction
1974 ISO 2788, the first international standard for thesaurus construction

Purposes of Using a Thesaurus
Using a thesaurus promotes consistency in document indexing, mainly for post-coordinate information storage and retrieval systems, and enables searching with controlled terms.
"Its purposes are to promote consistency in the indexing of documents, predominantly for postcoordinated information storage and retrieval systems, and to facilitate searching by linking entry terms with descriptors" (ANSI Z39.19-1993, p.38)
Four principal purposes are served by a thesaurus:
a) Translation. To provide a means for translating the natural language of authors, indexers, and users into a controlled vocabulary used for indexing and retrieval.
b) Consistency. To promote consistency in the assignment of index terms.
c) Indication of Relationships. To indicate semantic relationships among terms.
d) Retrieval. To serve as a searching aid in retrieval of documents.
(ANSI Z39.19-1993, p.1)

Vocabulary Control
The need to control vocabulary and to work with word roots arises from two shortcomings of natural language that must be overcome:
- Synonyms (several words, one meaning): different terms representing the same concept
- Polysemes (one word, several meanings): a word with multiple meanings. In spoken language, polysemes are homonyms; in written language, they are homographs, terms with the same spelling representing different concepts. Only the latter is relevant to thesauri.
Vocabulary control in a thesaurus is achieved through three principal means:
a) the delineation of the scope, or meaning, of descriptors: Scope Note (SN)
b) the linking of synonymous and nearly (quasi-)synonymous terms through the equivalence relationship: USE and UF
c) the disambiguation of homographs: Qualifier
(Source: ANSI Z39.19-1993, p.1)

Structure and Relationships
An intrinsic feature of a thesaurus is its ability to distinguish and display the structural relationships between the terms it contains.
There are two broad types of relationships within a thesaurus:
- Micro level: the semantic links between individual terms
- Macro level: how the terms and their inter-relationships relate to the overall structure of the subject field
(Source: J. Aitchison, A. Gilchrist, & D. Bawden. Thesaurus Construction and Use: A Practical Manual. 3rd ed. London: Aslib, 1997. p.47)

Basic Thesaurus Relationships
Three types of relationships between terms:
- Equivalence: the relationship between preferred and non-preferred terms where two or more terms are regarded, for indexing purposes, as referring to the same concept
- Hierarchical: this relationship shows levels of superordination and subordination. The superordinate term represents a class or whole, and the subordinate terms refer to its members or parts
- Associative: the relationship is found between terms which are closely related conceptually but not hierarchically and are not members of an equivalence set
(The descriptions of the relationships on this and the following slides are based mainly on Aitchison, Gilchrist, & Bawden, 1997, Section F.)

Equivalence Relationship
Descriptors = preferred terms; lead-in terms (entry terms) = non-preferred terms
Lead-in term USE DESCRIPTOR
DESCRIPTOR UF Lead-in term
Example: 耗子 (colloquial "mouse") USE 老鼠 (standard "mouse", the preferred term); 老鼠 UF 耗子 (the non-preferred term)
Synonyms: terms that are virtually interchangeable or regarded as the same
- Popular names and scientific names
- Common nouns or scientific names, and trade names
- Standard names and slang
- Terms originating from different cultures sharing a common language (e.g., pavements/sidewalks)
- Competing names for emerging concepts (e.g., the various Chinese translations of "metadata")
- Current or favored term versus outdated or deprecated term (e.g., dishwashers/washing-up machines)
Lexical variants: different word forms for the same expression, such as spelling, grammatical variation, irregular plurals, direct versus indirect order, and abbreviated formats
- Variant spellings, e.g., moslems/muslims; colour/color
- Irregular plurals, e.g., mouse/mice
- Direct and indirect form, e.g., academic library vs. library, academic
- Abbreviations and full names, e.g., ALA vs. American Library Association
Quasi-synonyms, or near-synonyms: terms whose meanings are generally regarded as different in ordinary usage, but which are treated as though they are synonyms for indexing purposes
- Terms having a significant overlap, e.g., urban areas/cities; gifted people/geniuses
- Antonyms or terms representing different viewpoints of the same property continuum, e.g., dryness/wetness; equality/inequality
Upward posting (generic posting): a technique that treats narrower terms as if they are equivalent to, rather than a species of, their broader terms. The effect is to reduce the size of the vocabulary.
SOCIAL CLASS UF Elite; Middle class; Working class
Elite USE SOCIAL CLASS

Hierarchical Relationship
The relationship is reciprocal and is set out in a thesaurus using the following conventions: BT (Broader Term) and NT (Narrower Term).
e.g., Public Libraries BT Libraries
Libraries NT Academic Libraries; Children's Libraries; Public Libraries
- Generic/species relationship: identifies the link between a class or category and its members or species (e.g., Bird/Robin)
- Whole/part relationship:
  Systems and organs of the body (e.g., digestive system/stomach)
  Geographical location (e.g., Taipei/Ta-an District)
  Discipline or field of study (e.g., Chemistry/Organic chemistry)
  Hierarchical social structure (e.g., army and its rank system)
- Instance relationship: the link between a general category of things and events, expressed by a common noun, and an individual instance of that category, the instance then forming a class of one which is represented by a proper name (e.g., SEAS/Pacific Ocean)
- Polyhierarchical relationship: the relationship between a term and its two or more superordinate terms is said to be polyhierarchical.
  NURSES NT Nurse Administrators
  HEALTH ADMINISTRATORS NT Nurse Administrators
  NURSE ADMINISTRATORS BT Health administrators; Nurses

Associative Relationship
The relation is reciprocal, and is distinguished by the abbreviation "RT" (Related Terms).
e.g., TEACHING RT Teaching aids
TEACHING AIDS RT Teaching
There are two kinds of associative relationship:
- Terms belonging to the same category (e.g., motorcycle/bicycle)
- Terms belonging to different categories:
  Whole-part (e.g., buildings/doors)
  A discipline and the objects studied (e.g., ethnography/primitive societies)
  An operation or process and the agent or instrument (e.g., motor racing/racing cars)
  An occupation and the person in that occupation (e.g., accountancy/accountants)
  An action and the product of the action (e.g., publishing/music scores)
  An action and its patient (e.g., data analysis/data)
  Concepts related to their properties (e.g., women/femininity)
  Concepts linked by causal dependence (e.g., injury/accidents)
  A thing or action and its counter-agent (e.g., pests/pesticides)
  A raw material and its product (e.g., leather/leather garments)
  An action and a property associated with it (e.g., precision measurement/accuracy)
  A concept and its opposite (e.g., single people/married people)

Sample Thesaurus Entry
COMPETENCY BASED EDUCATION    Mar. 1980
CIJE: 884    RIE: 2881    GC: 330
SN  Educational system that emphasizes the specification, learning, and demonstrating of those competencies (knowledge, skills, behaviors) that are of central importance to a given task, activity, or career.
UF  Consequence Based Education
    Criterion Referenced Education
    Output Oriented Education
NT  Competency Based Teacher Education
BT  Education
RT  Academic Standards
    Accountability
    Back to Basics
    Individualized Instruction

Display
- Alphabetical
- Classified
- Hierarchical
- Permuted Keyword Index
- Graphical

Thesaurus Design: A Two-Step Approach
Is a thesaurus necessary? If it is, which of the following would be a better or more suitable approach?
- Buying
- Compiling
- Adapting
A very useful Web site for information about thesaurus construction and use, prepared by Willpower Information: http:/

Thesaurus Design: Information System Factors
- Subject field
- Type of literature/data
- Quantity of literature/data
- Language considerations
- System users
- Questions, searchers, profiles
- Resources available
(Source: Aitchison, Gilchrist, & Bawden, 1997, Section B)

How to Develop a Thesaurus: Top-Down Method
1. Convene a group of subject experts to decide on the scope and broad categories of terms to be included.
2. Use existing dictionaries and thesauri to decide on the terms and their relationships.
3. Review and organize the preliminary term set: decide on preferred terms and make USE references from the variants and synonyms; and build hierarchical and associative relationships among the preferred terms.
4. Produce a draft thesaurus, test index, and revise.
(Source: http:/)

How to Develop a Thesaurus: Bottom-Up Method
1. Develop a group of subject experts to serve as advisors; work with them to determine the scope if it is not already set.
2. If there is a set of representative already-indexed documents, use the index terms from this set as your preliminary term list. If not, index a set of representative documents using free language (i.e., no vocabulary control), and take this term set as your preliminary list.
3. Build your thesaurus by reviewing and organizing these terms, using a variety of resources as aids, as in the top-down method. Refer to your subject experts on terms whose meaning or usage is unclear, and for advice on which variant or synonym to prefer (or on whether two terms really are synonyms in the field).
4. Produce a draft thesaurus, test index, and revise.
(Source: http:/)

Steps in Building a Thesaurus
- Collecting terms
- Modifying and inventing terms
- Choosing preferred terms and standardizing the form of words
- Establishing semantic relationships
- Thesaurus arrangement and display
- Testing and revising
- Thesaurus maintenance
The American Society of Indexers provides a list of thesaurus management software.
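The post-coordination of the uniterm method can be pictured as intersecting, at search time, the sets of documents posted to each single term, which is what punched-card systems did physically. The following is a minimal Python sketch assuming a hypothetical three-document collection and made-up helper names (`build_postings`, `coordinate`); it also reproduces the weakness of literal coordination, where word forms combine without regard to meaning.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical toy collection: document id -> uniterms assigned to it.
# Compound concepts are split into single words, as the uniterm method does.
DOCS: dict[int, set[str]] = {
    1: {"computer", "science", "education"},            # computer science education
    2: {"library", "science", "computer", "programs"},  # library science; computer programs
    3: {"library", "education"},                        # library education
}


def build_postings(docs: dict[int, set[str]]) -> dict[str, set[int]]:
    """Invert the collection: each uniterm 'card' lists the documents posted to it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return dict(postings)


def coordinate(postings: dict[str, set[int]], *terms: str) -> set[int]:
    """Post-coordinate search: AND the terms together by intersecting their postings."""
    result: set[int] | None = None
    for term in terms:
        docs_for_term = postings.get(term, set())
        result = set(docs_for_term) if result is None else result & docs_for_term
    return result if result is not None else set()


postings = build_postings(DOCS)
# Document 2 is a false drop: it matches "computer" AND "science" although it
# is about library science and computer programs, not computer science.
print(sorted(coordinate(postings, "computer", "science")))
print(sorted(coordinate(postings, "library", "education")))
```

The false drop on document 2 is the "literal splitting, poor semantic accuracy" problem that conceptual coordination in the descriptor method is meant to reduce.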
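The rotated (permuted) index mentioned for the uniterm, keyword, and descriptor methods can be sketched in KWIC style: each significant word of a title becomes a filing point, and words on the stop-word list are skipped. The stop list, sample titles, and function names below are illustrative assumptions, not a standard; production indexes use far larger stop lists and their own display layouts.

```python
from __future__ import annotations

# Hypothetical stop-word list; the keyword method compiles such a list so that
# function words never become access points.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def keywords(title: str) -> list[str]:
    """Extract uncontrolled keywords: lowercased words not on the stop list."""
    return [w for w in title.lower().split() if w not in STOP_WORDS]


def permuted_index(titles: list[str]) -> list[tuple[str, str]]:
    """Return (filing word, rotated title) pairs sorted by filing word.

    Each title is rotated so that every non-stop word comes to the front once;
    the words that preceded it are moved to the end after a ' / ' marker.
    """
    entries: list[tuple[str, str]] = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            rotated = " ".join(words[i:])
            if i > 0:
                rotated += " / " + " ".join(words[:i])
            entries.append((word.lower(), rotated))
    return sorted(entries)


titles = ["Thesaurus Construction and Use", "Rules for a Dictionary Catalog"]
for filing_word, line in permuted_index(titles):
    print(f"{filing_word:<12} {line}")
```

Every significant word gets its own entry, which is why rotation multiplies access points without any vocabulary control.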
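Because USE/UF, BT/NT, and RT are reciprocal, thesaurus software commonly records a link once and derives its inverse. The sketch below assumes a hypothetical `Thesaurus` class (not the API of any real package): `relate` stores a link with its reciprocal, `descriptor_for` performs the translation purpose by mapping a lead-in term to its descriptor, and `entry` lists a term's relationships in the UF/NT/BT/RT order of the sample entry. The sample terms come from the slides.

```python
from __future__ import annotations

from collections import defaultdict

# Inverse of each relationship code; RT is its own inverse.
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}
# Display order modeled on the sample entry (UF, NT, BT, RT), with USE first.
DISPLAY_ORDER = ("USE", "UF", "NT", "BT", "RT")


class Thesaurus:
    """Hypothetical minimal thesaurus: term -> relationship code -> related terms."""

    def __init__(self) -> None:
        self.links: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def relate(self, term: str, code: str, other: str) -> None:
        """Record one relationship together with its reciprocal."""
        self.links[term][code].add(other)
        self.links[other][RECIPROCAL[code]].add(term)

    def descriptor_for(self, term: str) -> str:
        """Translate a lead-in term into its descriptor; descriptors map to themselves."""
        preferred = self.links.get(term, {}).get("USE")
        # In this sketch a lead-in term points to exactly one descriptor.
        return next(iter(preferred)) if preferred else term

    def entry(self, term: str) -> dict[str, list[str]]:
        """Return the term's relationships, each list sorted, in display order."""
        rels = self.links.get(term, {})
        return {code: sorted(rels[code]) for code in DISPLAY_ORDER if rels.get(code)}


th = Thesaurus()
th.relate("Elite", "USE", "Social class")  # upward posting
th.relate("Middle class", "USE", "Social class")
th.relate("Public libraries", "BT", "Libraries")
th.relate("Teaching", "RT", "Teaching aids")

print(th.descriptor_for("Elite"))  # Social class
print(th.entry("Social class"))    # {'UF': ['Elite', 'Middle class']}
print(th.entry("Libraries"))       # {'NT': ['Public libraries']}
print(th.entry("Teaching aids"))   # {'RT': ['Teaching']}
```

Deriving reciprocals at entry time keeps the two sides of each link from drifting apart, one of the consistency problems a thesaurus is meant to prevent.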
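Flexible broadening of a search in the descriptor method depends on walking the BT/NT hierarchy. As a sketch, assuming the hierarchy is stored as a plain mapping from each term to its broader terms (a hypothetical layout), the code below derives the reciprocal NT lists, "explodes" a term into itself plus everything beneath it, and collects all broader terms along every path. As a result, the polyhierarchical Nurse administrators is reachable from both Nurses and Health administrators.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical storage layout: term -> its broader terms (BT), using the
# slide examples. A term with two broader terms is polyhierarchical.
BROADER: dict[str, set[str]] = {
    "Nurse administrators": {"Nurses", "Health administrators"},
    "Public libraries": {"Libraries"},
    "Academic libraries": {"Libraries"},
    "Children's libraries": {"Libraries"},
}


def narrower_map(broader: dict[str, set[str]]) -> dict[str, set[str]]:
    """Derive the reciprocal NT lists from the BT lists."""
    narrower: dict[str, set[str]] = defaultdict(set)
    for term, parents in broader.items():
        for parent in parents:
            narrower[parent].add(term)
    return dict(narrower)


def explode(term: str, narrower: dict[str, set[str]]) -> set[str]:
    """Broaden a search: the term plus all terms below it, at any depth."""
    found = {term}
    stack = [term]
    while stack:
        for child in narrower.get(stack.pop(), set()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def ancestors(term: str, broader: dict[str, set[str]]) -> set[str]:
    """All broader terms at any level, following every BT path."""
    found: set[str] = set()
    stack = [term]
    while stack:
        for parent in broader.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


NT = narrower_map(BROADER)
print(sorted(explode("Libraries", NT)))
print(sorted(ancestors("Nurse administrators", BROADER)))
```

The `found` set guards against visiting a term twice, which matters once polyhierarchy lets two paths meet at the same term.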
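The first step of the bottom-up method, using the free-language index terms of representative documents as a preliminary term list, can be approximated by counting how many documents use each term. The index terms below are hypothetical. The counts only help order the review, and choices of preferred terms still rest with the subject experts; in this sample those choices include the variants metadata/meta-data and cataloging/cataloguing.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical free-language index terms assigned to representative documents.
indexed_documents = [
    ["metadata", "digital libraries", "cataloging"],
    ["meta-data", "digital libraries"],
    ["cataloguing", "metadata", "interoperability"],
]


def preliminary_term_list(docs: list[list[str]]) -> list[tuple[str, int]]:
    """Count how many documents use each term; most frequent first, ties alphabetical."""
    counts = Counter(term for terms in docs for term in set(terms))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for term, n in preliminary_term_list(indexed_documents):
    print(f"{n}  {term}")
```

Lexical variants show up as separate low-frequency entries, which flags them as candidates for USE references during review.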