Bibliometrics - School of Communication and Information
BIBLIOMETRICS
Tefko Saracevic Rutgers University http://www.scils.rutgers.edu/~tefko
© Tefko Saracevic
What is?
“… all studies which seek to quantify processes of written communication.” (Pritchard)
“… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.” (Fairthorne)
Recorded communication (‘literature’) -> quantitative methods
Alan Pritchard 1969
Coined the term “bibliometrics”: “the application of mathematics and statistical methods to books and other media of communication”
Journal of Documentation (1969) 25(4):348-349
and other related metrics …
Also used to study more than books and articles …
Scientometrics: covering science in general, not just publications
Infometrics: all information objects
Webmetrics or cybermetrics: web connections and manifestations, using bibliometric techniques to study the relationships or properties of different sites on the web
Concepts
Basic (primitive) concepts:
1. Subject
2. Recorded communication -> document, information object
3. Subject literature
Bibliometrics related to:
  science of science
  sociology of science - numerical methods
Literature studies
Qualitative: often in humanities, librarianship
Quantitative: bibliometrics
Mixed
Reasons for quantitative studies of literature
Analysis of structure and dynamics
  search for regularities - predictions possible
Understanding of patterns
  “order out of documentary chaos”
  verification of models, assumptions
Rationale for policies & design
Why quantitative studies?
Qualitative methods often depend on assertions, ‘authoritative’ statements, anecdotal evidence
Science searches for regularities
Success of statistical methods in social sciences
Need for justification & basis for decisions
Something can be counted - irresistible
Application in ...
History of science
Sociology of science
Science policy; resource allocation
Library selection, weeding, policies
Information organization
Information management
  utilization
Historical note
Bibliometrics long precedes information science
But found intellectual home in information science
  study of a basic phenomenon - literature
It is not ‘hot’ lately, but still produces very interesting results
Branched out into web studies (web is a “literature” as well)
What studied?
Governed by data available in documents or information resources in general - that which can be counted
author(s)
origin: organization, country, language
source: journal, publisher, patent …
what … more
contents: text, parts of text, subject, classes
representation
citations: to a document, in a document, co-citation
utilization: circulation, various uses
links
any other quantifiable attribute
Tools
Science Citation Index
Compilation of variables from journals in a subject
Use data
Publication counts from indexes, or other databases
Web structures, links
Variable: authors
number in a subject, field, institution, country
growth
  correlation with indicators like GNP, energy etc.
productivity, e.g. Lotka’s law
collaboration - co-authorship, associated networks
dynamics - productive life, transience, epidemics
papers/author in a subject
mapping
Variable: origin
Rates of production, size, growth by country, institution, language, subject
Comparison between these
Correlation with economic & other indicators
Variable: sources
Concentration most often on journals
Growth, dynamics, numbers
  information explosion - exponential laws
  time movements, life cycles
Scatter - quantity/yield distribution
  Bradford’s law
Various distributions by subject, language, country
Variable: contents
Analysis of texts
  distribution of words – Zipf’s law
  words, phrases in various parts
  subject analysis, classification
  co-word analysis
Variable: representation
frequency of use of index terms, classes
distribution laws - key terms where?
thesaurus structure
Variable: citations
Studied a lot; many pragmatic results
  base for citation indexes, Web of Science, impact factors, co-citation studies etc.
Derived:
  number of references in articles
  number of citations to articles
  research front; citation classics
  bibliographic coupling
citations … more
co-citations
  author connections, subject structure, networks, maps
centrality of authors, papers
validation with qualitative methods
impact
Variable: utilization
frequency distribution of requests for sources, titles
  e.g. 20/80 law
relevance judgement distributions
circulation patterns
use patterns
Variable: links
Development of link-based metrics
  in-links, out-links
Web structure
Web page depth; update
PageRank vs quality
Examples from classic studies
Comparative publications over centuries
Number of journals founded over time
Number of abstracts published over time
National share of abstracts in chemistry
National scientific size vs. economy size
Bibliographic coupling and co-citation
Web structures, links
Examples of laws & methods
Lotka’s law
Bradford’s law
Zipf’s law
Impact factor
Citation structures
Co-citation structures
Alfred J. Lotka 1926
Statistics: the frequency distribution of scientific productivity
Purpose: to “determine, if possible, the part which men of different calibre contribute to the progress of science”
Looked at Chemical Abstracts Index, then Geschichtstafeln der Physik
J. Washington Acad. Sci. 16:317-325
Lotka’s law: x^n • y = C
The total number of authors y in a given subject, each producing x publications, is inversely proportional to x raised to some power n.
Where:
  x = number of publications
  y = no. of authors credited with x publications
  n = constant (equals 2 for scientific subjects)
  C = constant
Inverse square law of scientific productivity
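A minimal sketch of how the law can be used to estimate author counts. It assumes that the number of authors with exactly one publication is known, so that C equals that number (since 1^n = 1). The function name, its parameters, and the starting figure of 100 single-paper authors are illustrative assumptions, not part of the original slides.

```python
def lotka_expected_authors(single_paper_authors, max_papers, n=2.0):
    """Expected number of authors y credited with x publications, from x^n * y = C."""
    # For x = 1 the law reduces to y = C, so C is the count of one-paper authors.
    c = single_paper_authors
    return {x: c / x ** n for x in range(1, max_papers + 1)}


# Hypothetical subject with 100 authors who each published one paper.
expected = lotka_expected_authors(100, 4)
for x, y in expected.items():
    print(f"{x} publ.: about {y:.2f} authors")
```

With n = 2, the expected number of authors with two publications is a quarter of the number with one, and with three publications a ninth, which is why the law is called an inverse square law.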
Lotka's Law - scientific publications
[Figure: no. of authors plotted against no. of publications per author (1, 2, 3, 4 publ.), following x^n • y = C]
Samuel Clement Bradford 1934, 1948
Distribution of quantity vs yield of sources of information on specific subjects
  he studied journals as sources, but the approach is applicable to others
  which journals produce how many articles in a subject, and how are they distributed?
  or: how are articles in a subject scattered across journals?
Purpose: to develop a method for identification of the most productive journals in a subject & deal with what he called “documentary chaos”
First published in: Engineering (1934) 137:85-86, then in his book Documentation (1948)
Bradford’s law
"If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as 1 : n : n^2 : n^3 …"
Bradford's Law of Scattering – an idealized example

No. of source journals   No. of articles per source   Total no. of articles
   1                       60                            60
   2                       35                            70
   3 (subtotal)                                         130
   1                       30                            30
   2                       25                            50
   2                        9                            18
   4                        8                            32
   9 (subtotal)                                         130
  10                        6                            60
   7                        5                            35
   5                        4                            20
   5                        3                            15
  27 (subtotal)                                         130
Bradford's Law of Scattering – zones
[Figure: nucleus and succeeding zones: 3 sources, 130 articles; 9 sources, 130 articles; 27 sources, 130 articles. Label: Garfield hypothesis]
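The zoning can be sketched in code. The example below ranks journals by productivity and cuts the ranked list into zones holding roughly equal numbers of articles. The data are taken from the idealized example in the slides; the function name and the greedy cutting rule are assumptions made for illustration, not Bradford's own procedure.

```python
def bradford_zones(articles_per_journal, n_zones=3):
    """Split journals, ranked by productivity, into zones of roughly equal article totals."""
    ranked = sorted(articles_per_journal, reverse=True)
    target = sum(ranked) / n_zones  # articles each zone should hold
    zones, current, running = [], [], 0
    for count in ranked:
        current.append(count)
        running += count
        # Close a zone once it reaches the target; the last zone takes the rest.
        if running >= target and len(zones) < n_zones - 1:
            zones.append(current)
            current, running = [], 0
    zones.append(current)
    return zones


# Idealized data as (number of journals, articles per journal).
example = [(1, 60), (2, 35), (1, 30), (2, 25), (2, 9), (4, 8),
           (10, 6), (7, 5), (5, 4), (5, 3)]
counts = [articles for journals, articles in example for _ in range(journals)]

zones = bradford_zones(counts)
zone_sizes = [len(zone) for zone in zones]
zone_totals = [sum(zone) for zone in zones]
print(zone_sizes, zone_totals)
```

For this data the zones contain 3, 9 and 27 journals with 130 articles each, so the journal counts follow 1 : n : n^2 with n = 3. Real data rarely divide this cleanly, and the cut points then depend on the rule chosen.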
George Kingsley Zipf 1935, 1949
The psycho-biology of language: an introduction to dynamic philology (1935)
Human behavior and the principle of least effort: An introduction to human ecology (1949)
Looked, among others, at frequency distributions of words in given texts
  counted distribution in James Joyce’s Ulysses
Provided an explanation as to why the found distributions happen: Principle of least effort
Zipf’s law: r • f = c
Where:
  r = rank (in terms of frequency)
  f = frequency (no. of times the given word is used in the text)
  c = constant for the given text
For a given text the rank of a word multiplied by the frequency is a constant
Works well for high frequency words, not so well for low – thus a number of modifications
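A rank-frequency table of the kind Zipf compiled can be sketched as follows. The tokenizing rule (lowercase runs of letters and apostrophes) and the sample sentence are assumptions for illustration; a meaningful test of r • f = c needs a long text.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent word first."""
    # Simplistic tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Words with equal frequency still receive distinct consecutive ranks here.
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]


sample = "the cat and the dog and the bird"
table = rank_frequency(sample)
for rank, word, freq, product in table:
    print(rank, word, freq, product)
```

The last column is the product r • f. Under Zipf's law it should stay roughly constant for the high-frequency words of a sufficiently long text.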
Charles F. Gosnell 1944
Obsolescence
He studied obsolescence of books in academic libraries via their use
  College Res. Libr. (1944) 5:115-125
But this was extended to study of articles via citations, and other sources
Age of citations in articles in a subject: half-life – half of the citations are x years old etc.
  different subjects have very different half-lives
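One simple way to compute a citation half-life is sketched below. It takes the half-life to be the median age of the references cited by articles published in a given year. This is an assumed operationalization, and the function name and sample years are hypothetical.

```python
from statistics import median


def citation_half_life(citing_year, cited_years):
    """Median age, in years, of the references cited by articles from citing_year."""
    ages = [citing_year - year for year in cited_years]
    return median(ages)


# Hypothetical reference list of articles published in 2005.
half_life = citation_half_life(2005, [2004, 2003, 2000, 1995, 1990])
print(half_life)
```

Here the ages are 1, 2, 5, 10 and 15 years, so half of the citations are at most 5 years old. Comparing this figure across subjects shows how quickly each literature ages.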
Curve of obsolescence
[Figure: curve of obsolescence; axis: age at time of use]
Eugene Garfield 1955
Focused on scientific & scholarly communication based on citations
  Science (1955) 122:108-111
Founded Institute for Scientific Information (ISI)
  major product now ISI Web of Knowledge
Impact factor for journals, based on how much a journal is cited
Mapping of a literature in a subject
Citation indexes/Web of Knowledge: MAJOR resources in bibliometric studies
Citation matrix
[Figure: an article linked to the articles it cites (cited articles) and to the articles that cite it (citing articles)]
Science Citation Index
Association-of-ideas index
[Figure: an article linked to the articles it cites (cited articles) and to the articles that cite it (citing articles)]
Co-citation analysis
Articles that cite the same article are likely to both be of interest to the reader of the cited article
[Figure: two citing articles linked to the same article; caption: These two articles are likely to be related]
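Both citation-based measures of relatedness named in these slides can be computed from the same reference lists. In the usual terminology, two documents are co-cited when a later document cites both of them, and two documents are bibliographically coupled when they cite the same earlier document. The sketch below assumes a mapping from each citing paper to the set of papers it cites; the paper labels and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(references):
    """Count, for each pair of cited papers, how many citing papers cite both."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


def coupling_strength(references, paper_a, paper_b):
    """Bibliographic coupling: number of references two citing papers share."""
    return len(references[paper_a] & references[paper_b])


# Hypothetical data: citing paper -> set of cited papers.
references = {
    "P1": {"A", "B"},
    "P2": {"A", "B", "C"},
    "P3": {"B", "C"},
}
cocited = cocitation_counts(references)
print(cocited[("A", "B")], coupling_strength(references, "P1", "P2"))
```

Here A and B are co-cited twice (by P1 and P2), while P1 and P2 are coupled with strength 2 because they share the references A and B.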
Impact factor (IF)
number of citations received in current year by papers published in the journal in the previous two years, divided by number of papers published in the journal in the previous two years
IF has become over time a crucial indicator of journal quality and has given ISI a monopoly position in the evaluation of journal quality
Reported in Journal Citation Reports (1976-)
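The definition translates directly into a ratio. The sketch below uses made-up figures for a hypothetical journal; the function and parameter names are illustrative only.

```python
def impact_factor(citations_to_prev_two_years, papers_in_prev_two_years):
    """Citations received this year to papers from the previous two years, per such paper."""
    if papers_in_prev_two_years == 0:
        raise ValueError("journal published no papers in the previous two years")
    return citations_to_prev_two_years / papers_in_prev_two_years


# Hypothetical journal: 120 papers in the previous two years,
# cited 300 times in the current year.
journal_if = impact_factor(300, 120)
print(journal_if)
```

For these figures the impact factor is 2.5, meaning that a paper from the previous two years was cited on average two and a half times in the current year.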
Garfield’s HistCite
“Bibliographic Analysis and Visualization Software”
Provides citation statistics & graphs for people, journals, institutions …
  various citation scores, no. of cited references in articles …
  various graphs with connections
Example: articles and authors for JASIST (and predecessor names) for 1956-2004
  includes citations to authors
Conclusion
Bibliometrics, & related scientometrics, infometrics, webmetrics, provide insight into a number of properties of information objects
  some general, predictive “laws” formulated
  structures have been exposed, graphed
  myriad data collected & analyzed
A good area for research!
Sources used in making this presentation – among others
Ruth Palmquist: Bibliometrics
Donna Bair-Mundy: Boolean, bibliometrics, and beyond
Short set of bibliometric exercises by J. Downie: http://people.lis.uiuc.edu/~jdownie/biblio/