Productivity and impact of astronomical facilities: A statistical study of publications and citations

have to choose, it is probably better to opt for a small telescope on a well-supported site than a larger one with less support, and service to the community, in the form of catalogues and mission definitions, is rewarded, at least in citation counts, if not always in other ways. A few comparisons are made with other studies. The main difference is that we have included all the papers and all the telescopes for the years chosen, rather than focussing on one or a few observatories or skimming the cream of most-cited papers or ones from the highest-profile journals.


Introduction
Of course your telescope is superior in some way, thus deserving of support, or why would you be using it? And what you do with it is undoubtedly at least a bit better than average, or why would you be doing it? But how can one demonstrate these things? There is, of course, the verdict of history, but it will come too late for this year's funding cycle and your next promotion review, while personal opinions of committee members have other problems. Counting of papers that result from work at particular facilities and of the citations to them has the virtues of taking only a few years and of being roughly repeatable, though "objective" may be too strong an adjective. Of course this approach also has flaws. Papers are sometimes cited because they are wrong (though generally not for very long); groups who cite only each other's work can distort the numbers (the entire American astronomical community has been collectively accused of this); and one particular paper, not necessarily with more or more correct data than others, may be picked as a proxy for an entire program and cited to death, while others languish. And no, the data bases are not perfect. The NASA ADS shows a larger fraction of papers with no citations after a few years than does the Web of Science/Science Citation Index compilation. In addition, sometimes you look up a particular paper in successive years only to find it with fewer total citations at the later time, perhaps because it was initially credited with a bunch belonging to some other paper with similar authorship and publication data. And, of course, "it was unusually cloudy on Mount Moriah the year data for 2000 papers would have been collected," or "We were installing and commissioning the new frammator that year." Nevertheless, we proceed as best we can, using the methods outlined in the next section.
Author e-mail: vtrimble@astro.umd.edu. Corresponding author: jceja@uci.edu
The first person to apply similar methods in an astronomical context was Helmut A. Abt (1981, 1985), using data available to him as editor-in-chief of the Astrophysical Journal. He attempted to determine whether the publicly supported telescopes at Kitt Peak and Cerro Tololo were as productive and valuable as comparable private ones at Palomar and Lick. The answer was (and remains), on the whole, yes. We plunged in (Trimble 1985) a little later, asking questions about individual astronomers' papers and citations. And the present investigation doubles the data base of that published in three papers (Trimble, Zaich & Bosler 2005), which separated the 2001 papers and citations by wavelength band. There, as here, every publication we could find in the designated time frame, every telescope mentioned in the papers, and every citation that could be counted were used. Other investigators have taken very different approaches, examining fewer than 200 papers (Madrid & Macchetto 2006), only the journals with the highest impact factors (Sanchez & Benn 2004), or focussing on specific telescopes, for instance HST (Meylan, Madrid & Macchetto 2004) or CFHT and UKIRT (Crabtree & Bryson 2001). The single most interesting new number we have seen this year comes from White (2007), who reports that the mean number of references per paper published in 2006 was about 31, meaning that your above-average paper will be entitled to a few more than 31 citations over its entire future lifetime, or perhaps a great many more, if the present inflation in number of references per paper continues unabated. The part of this report that we do not understand is the total of about 30 000 astrophysical papers in refereed journals for 2001 and 2002, where we found 7768 observational ones. Mostly theory? Well, that is always said to be the smaller half (and indeed is for most of the obvious, major journals in the field). Meeting abstracts and conference proceedings? The numbers would be about right, but those are not "refereed journals." Some enormous literature catalogued as something else? Perhaps. Let it be said, in any case, that if the number had been 30 000 the present project would not have been attempted.

Methods
Suppose you would like to do this sort of thing yourself. What are the steps? First, collect copies of all the papers published during your time window (calendar years for us, but there are lots of other possibilities, like the first two years after launch or commissioning). Second, record enough information about authors, journal, volume, and page number to be able to find them again, plus the subject matter, and the names of all the observing facilities you can identify from which data reported or analyzed in the paper were collected. If there are no such facilities, then the paper is either a theoretical one or based on catalogues compiled from many sources and will not go into the analysis. Third, look up the number of citations each paper received during some uniform time window after publication. For us, this is the next three calendar years, though shorter intervals are perhaps more popular. Notice, therefore, that this sort of analysis cannot be done in real time, since telescopes must be commissioned, data collected (often over more than one observing run or cycle), papers written, refereed, published, and cited.
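The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`Paper`), its field names, and the helper functions are hypothetical and are not taken from the bookkeeping actually used for this study. The sketch shows a bibliographic record, the test that excludes papers with no identifiable facility, and a count of citations restricted to the three calendar years after the year of publication.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record for one paper (field names are assumptions)."""
    authors: str
    journal: str
    volume: int
    page: int
    year: int                       # calendar year of publication
    subject: str                    # what the authors said they were trying to do
    facilities: list[str] = field(default_factory=list)


def is_observational(paper: Paper) -> bool:
    """Papers with no identifiable facility do not go into the analysis."""
    return len(paper.facilities) > 0


def citations_in_window(pub_year: int, citing_years: list[int], window: int = 3) -> int:
    """Count citations from the `window` calendar years after publication.

    Citations in the publication year itself are excluded, so a 2001 paper
    with window=3 is credited with citations dated 2002, 2003, and 2004.
    """
    first, last = pub_year + 1, pub_year + window
    return sum(1 for year in citing_years if first <= year <= last)
```

For a 2001 paper cited in 2001, 2002, 2002, 2004, and 2005, the function credits three citations: the two from 2002 and the one from 2004.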
Fourth, and of comparable importance, a number of decisions have to be made. First, the subject matter, which we took to be whatever the authors said they were trying to do. Thus quasar/QSO spectra might have been looked at to understand the objects themselves (AGNs), the chemical evolution of metal-line absorbers (galaxies), or the statistics of Lyman-alpha forest lines (large scale structure or cosmology). The edges are fuzzy around, for instance, interstellar material, star formation, and young stellar objects. The next decision was to keep the optical, radio, and space-based papers separate. The main justification for this is that multi-wavelength astronomy is still rather rare (Table 1), though because multi-wavelength papers tend to have citation rates a bit above average, dividing credit among wavelength bands would slightly reduce not only the number of papers per facility but also the ratio of citations per paper. The most difficult decision is how to apportion credit for papers and citations among the facilities used. Proportional to their importance in the investigations, you will say. And this is what Madrid & Macchetto (2006) have done for their fewer than 200 papers. We quickly found that the information simply is not available in a large fraction of our 7768 papers or would be very difficult to extract in others. Credit was therefore apportioned equally among all facilities mentioned by the authors as contributing to the paper, anywhere from half-and-half for two, down to one-twelfth or less for a few synoptic studies covering long periods or radio observations made with ad hoc assortments of dishes generally used separately.
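The equal-credit rule just described amounts to fractional counting: a paper that names n facilities contributes 1/n of a paper and 1/n of its citations to each. A minimal Python sketch follows. The function name and the input layout are assumptions made for illustration; in particular, the sketch ignores the separation by wavelength band and treats each listed facility name as distinct.

```python
from collections import defaultdict


def apportion_equally(papers):
    """Split paper and citation credit equally among facilities.

    `papers` is an iterable of (facilities, citations) pairs, where
    `facilities` lists the facility names the authors mention.
    Returns {facility: [paper_credit, citation_credit]}.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for facilities, citations in papers:
        unique = sorted(set(facilities))    # count each facility once per paper
        if not unique:
            continue                        # no facility: not in the analysis
        share = 1.0 / len(unique)
        for name in unique:
            totals[name][0] += share
            totals[name][1] += share * citations
    return dict(totals)


# Illustrative input only; these are not numbers from the study.
sample = [(["Keck", "VLA"], 10), (["Keck"], 4)]
credit = apportion_equally(sample)
```

With this illustrative input, Keck is credited with 1.5 papers and 9 citations, and the VLA with 0.5 papers and 5 citations. Fractional paper counts are the natural result of this rule.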

Definitions of wavelength bands and facilities
Optical astronomy here includes the data collected with about 330 ground-based optical and infrared telescopes, from the Keck 10-meter mirrors down to amateur-owned instruments of much less than one meter diameter, and also the Hubble Space Telescope, whose lifetime has been long enough for it to have accumulated a decade of archival data by 2002. Radio astronomy includes all arrays, antennas, dishes, and other collectors operating from meter wavelengths down to the submillimeter, of which there were about 100, most of them ground based. But we also included the Cosmic Background Explorer (COBE), which observed many aspects of the 3 K microwave background for a number of years, a couple of balloon-borne CMB projects, and the Japanese satellite (VSOP) used to provide baselines longer than the diameter of the earth for very long baseline interferometry.
Space astronomy is the most heterogeneous, including satellite based observatories at gamma ray, X-ray, ultraviolet, and infrared wavelengths; the Hipparcos astrometric satellite; an assortment of shuttle, balloon, and rocket-borne telescopes and detectors; solar system missions; cosmic ray detectors; and (because we didn't quite know where else to put them) half a dozen Cerenkov light and other detectors for ultra-high energy gamma rays. Part of the rationale for this assortment is that nearly all its 90 members had relatively short lifetimes compared to ground based optical and radio telescopes, so that our snapshot is inevitably unfair to the oldest and the newest. Table 1 adopts these definitions. Clearly, optical astronomy, the oldest field by several centuries (though the oldest telescope used was probably the Mt. Wilson 60 in), is still more than half of observational astronomy. We will see that optical papers are cited a bit more frequently than radio papers, but not quite so often as space papers, as here defined. Multi-wavelength papers score a bit above average, 4.49 citations per paper per year vs. 4.19 for the whole set, and, in particular, almost never end up with zero citations. Perhaps it is just a matter of having more than one set of colleagues who might think of you.

The journals employed
The 7768 papers from 2001 and 2002 appeared in 20 journals. In order, from most papers to fewest, these are: Astronomy and Astrophysics (2127) (36), Astrophysics and Space Science (32, excluding conference proceedings), Acta Astronomica (31), Observatory (21), Astronomische Nachrichten (18, excluding conference proceedings), Revista Mexicana de Astronomia y Astrofisica (17, 2002 only), Journal of the Royal Astronomical Society of Canada (5). Clearly one could catch most of the literature by reading only the first five of these. But Nature and Science carry some of the highest-impact papers; and Icarus (solar system) and Acta Astronomica (microlensing projects and binary stars) occupy some specific niches.
We are missing some journals. Solar astronomy uses almost entirely a set of telescopes (etc.) separate from those used for nighttime research, thus Solar Physics, as well as solar papers in our 20 journals, was not scanned (though a few solar papers that used facilities primarily employed at night appear with solar system). The solar system is underrepresented because the UC Irvine library receives neither Earth, Moon, and Planets nor the solar-system parts of Journal of Geophysical Research. Astrofizica (still published in Armenia) has become almost impossible to find.
Others missing, mostly relatively small and relatively new, are New Astronomy, Astroparticle Physics, and the on-line only Publications of the Astronomical Society of Australia.
For a few of the papers, which journal they appeared in and that they contained some observations was the only thing that could be determined. For a few, the subject matter was obvious, but the telescope(s) used were not. Another small set used well-defined telescopes, but it was impossible to decide what they were really about. Thus the total number of papers mentioned here, 7768, is slightly larger than any of the other totals.

Results
Here is what we found and, in a few cases, what we think it might mean. Table 2 divides the sample into 20 semi-arbitrary subfields. Some are well defined with two or three words (gamma ray bursts, supernovae and their remnants). Others require a few more. That is, cosmology includes large scale structure formation and mapping as well as the standard parameters. The line between active galaxies and normal ones is a bit fuzzy. Milky Way means global studies of structure, stellar populations, evolution, and so forth. Stars are the ones not mentioned separately. Neutron stars and black holes include both singles and binaries. Solar system includes both in situ measurements from space missions (rather highly cited) and ground based ones from earth (less so), and the service category includes catalogues of more than one kind of object, descriptions of missions and facilities (some very highly cited like the first three XMM papers published, some not), calibrations of detectors, spectrographs, and so forth.

By subdiscipline
Clearly there are hot topics like cosmology and brown dwarfs and not so hot ones, mostly stellar. This is not just a function of community size: compare cosmology, supernovae, and binary stars, with the number of papers as a proxy for community size and the citations per paper as a temperature indicator. What is hot and what is not varies by wavelength, though not so much as you might suppose. GRBs have to be discovered from space, but vital follow-up includes both radio and optical observations. On the other hand, radio and space papers on exoplanets and ordinary binary stars are fairly rare, and fairly rarely cited.
There are no large year-to-year anomalies, with the following exception. Our data for the 2001 papers as compiled for this table accidentally included almost 4 years of citations rather than precisely three. The numbers as recorded appear in that column of the table, but the citations per paper per year are the sums of the citations shown, divided by the total papers shown, divided by a factor somewhat larger than three, to correct for this and end up with citations per paper per year. No specific subject matter (or wavelength band or telescope) is likely to benefit or suffer from this.
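The normalization described above can be written out explicitly. The text does not state the divisor beyond "somewhat larger than three", so the sketch below assumes one plausible choice, a paper-weighted mean of the two citation windows. That choice, the function names, and the example numbers are all assumptions for illustration, not the values used for Table 2.

```python
def effective_window(cohorts):
    """Paper-weighted mean citation window, in years.

    `cohorts` is a list of (n_papers, window_years) pairs, one per
    publication year. This weighting is an assumed choice.
    """
    total_papers = sum(n for n, _ in cohorts)
    if total_papers <= 0:
        raise ValueError("need at least one paper")
    return sum(n * w for n, w in cohorts) / total_papers


def citations_per_paper_per_year(citations, papers, window_years):
    """Total citations divided by total papers, divided by the window length."""
    if papers <= 0 or window_years <= 0:
        raise ValueError("papers and window_years must be positive")
    return citations / papers / window_years


# Illustrative numbers only: two equal cohorts, one with a 4-year window
# and one with a 3-year window, and 1400 citations between them.
window = effective_window([(100, 4.0), (100, 3.0)])
rate = citations_per_paper_per_year(1400, 200, window)
```

In this illustration the effective window is 3.5 years and the rate is 2.0 citations per paper per year. Dividing by exactly three would have given a higher rate.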
www.an-journal.org
Of greater importance is that a number of large, expensive facilities are currently in design or construction phases. The drivers for these are definitely a subset of astronomical topics, objects and processes, laid out for the US in the Taylor-McKee report. These include cosmology, galaxy formation and evolution, star and planet formation, and perhaps a few others, but exclude for the most part active galaxies and gamma ray bursts for their own sakes (vs. as probes), normal stars past the formation stage, cataclysmic variables and other binary stars, planetary nebulae and white dwarfs, supernova remnants (and supernovae when they do not contribute to the cosmic distance scale or to chemical evolution of galaxies), and neutron stars and black holes, singly and in binaries. These currently make up roughly half the literature of observational astronomy, but account for considerably fewer than half of the citations. Focus of funding on the driver fields will inevitably exacerbate the differences, and it is a brave graduate student who, in the future, will set out to specialize in W UMa stars, interstellar polarization, planetary nebulae, or even Seyfert galaxies.

The bottom, the top, and the middle
The minimum number of citations possible is zero, unless one had a system that could count disproofs as negative citations, and zero is not so common as is frequently supposed in phrases like "Most papers aren't read by anybody," "Phys. Rev. has more authors than readers," and "One third of papers in Nature are never cited," for which documentation is never provided. In the present sample, 283 of 7724 papers (3.3%) garnered no citations in the ISI data base during the three calendar years after the one in which they were published. The variations with wavelength are at the 1% level, with a smidge more radio papers and a smidge fewer space papers uncited, and less than 2% for multiwavelength items. This is almost certainly an upper limit, because we found a few specific papers with, apparently, zero citations that came from large, productive groups, using major facilities to observe popular objects, who if nothing else would have cited themselves the next year. The Astrophysics Data System also maintains data bases of citations and citation numbers. The percentage of papers with zero citations there is somewhat larger than in ISI, which is why we have chosen to use the latter.
No journal except Science and Nature is exempt from zero-citation papers, though almost half come from seven (none of the top six by numbers of papers published). No subject is spared, not cosmology, or brown dwarfs, or whatever. And no telescope avoids the curse, not HST, the VLT, Keck, or Cassini, except the X-ray missions during the early period when no non-proprietary data were available. But, of course, there are more zeros among the less popular subdisciplines, the less prestigious journals, and the less famous (i.e. less expensive!) telescopes.
As for the top, Table 3 lists the 77 most cited papers, taking us down to 100 citations in three years. By chance, this is just about 1% of the total papers, and they have gathered moss to the extent of 12.9% of the total citations. The largest number belongs to the Hubble Key Project value of the Hubble constant, and cosmology in general has a large presence. The service papers are descriptions and "how to use" documents for XMM and SDSS. You have to go fairly deep into the list to reach anything on the Milky Way, exoplanets, or brown dwarfs, and almost to the bottom to find ordinary stars and interstellar medium.
What about our average papers with about four citations per year? If the average of White's (2007) larger number of papers is the same, then 31 references per current paper implies a half life of only 3.7 years for the literature of the recent past. If this is correct (and we certainly do not know that it isn't), there has been a real change in community customs since Abt (1985) found a much longer one (about 15 years). The paper population whose cream is skimmed by Madrid & Macchetto (2006) must be different. Their 200 papers are, they say, 0.4% of the ADS ones with 2004 publication dates, implying a total of 50 000 papers, which received about 65 000 citations in the next 1.5 years, or 0.9 per paper per year, vs. our 4.19. This again suggests that this giant data base includes a good many meeting abstracts, conference contributions, and other rarely-cited sorts of papers.
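The arithmetic behind these figures is short enough to check directly. The sketch below rests on one reading of the half-life estimate that reproduces the quoted 3.7 years: in a steady state each paper eventually receives about as many citations as the mean reference list contains (31), and half of those arrive at a constant 4.19 per year. The paragraph does not spell the calculation out, so this reading is an assumption, as are the function names.

```python
def half_life_years(refs_per_paper, cites_per_paper_per_year):
    """Years needed to collect half the lifetime citations at a constant rate.

    Assumes the lifetime total equals the mean number of references per paper.
    """
    return (refs_per_paper / 2) / cites_per_paper_per_year


def implied_rate(sample_papers, sample_fraction, total_citations, years):
    """Return (total papers, citations per paper per year) for a parent
    population of which `sample_papers` is the fraction `sample_fraction`."""
    total_papers = sample_papers / sample_fraction
    return total_papers, total_citations / total_papers / years


# Figures quoted in the paragraph above.
t_half = half_life_years(31, 4.19)
n_papers, rate = implied_rate(200, 0.004, 65_000, 1.5)

print(f"half life ~ {t_half:.1f} yr")
print(f"parent population ~ {n_papers:.0f} papers, {rate:.1f} citations/paper/yr")
```

An exponentially declining citation rate would give a different half life from the same inputs, so the agreement with 3.7 years supports the constant-rate reading but does not prove it.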

Telescope by telescope
Tables 4, 5, and 6 contain numbers of papers (2001 + 2002) and citations (for the next three years for each), separated into the three categories defined in Sect. 2.1, and credited equally to all telescopes, space missions, or whatever said by the authors to have been data sources for the papers. One more decision had to be made at this point, besides the wavebands and equal credit items. This is which facilities to keep individual track of, because the total included about 330 optical and infrared entities, 109 radio ones, and 90 in space. The "about" reflects some uncertainty about, for instance, whether three small Greek telescopes used in one paper did or did not include the 0.3 m in Crete used for another and the exact number of 14-in Meade 'scopes deployed by the Backyard Astronomers. There are no such ambiguities for ground-based mirrors larger than 1.8 m (Cananea = Haro, a 2.1 m in Mexico; Catania is something smaller in Italy), for radio telescopes (which are such large animals that, like elephants, they hardly ever get lost), or things flown (except the one resolved by declaring MIR/Kvant/etc. to be a single facility).
The general rule was a minimum of five papers in one year to qualify for individual tracking, waived for the Japanese radio interferometric satellite HALCA/VSOP and the Wyoming Infrared Observatory (WIRO). Several relatively new facilities, the Ryle (radio) telescope in Cambridge, and the Gemini North, Magellan, and Hobby-Eberly optical telescopes, contributed five or more 2002 papers, so we went back and dug them out of the 2001 data as well, when there had been fewer.
In the case of ground-based optical and infrared telescopes, we deliberately kept track of all mirrors 1.85 m or larger in diameter and defined an "other" class for each site that has at least one of these plus smaller telescopes. A few of those "others" score fewer than five, though there are occasional small telescopes, like the 1 m Jacobus Kapteyn in the Canary Islands with many more than five papers to their credit which are nevertheless hidden in an "other" line. Special purpose facilities like microlens surveys, 2MASS, and the Sloan Digital Sky Survey are separated from other telescopes of similar sizes as are prototype interferometers and automated photometric telescopes. Papers for which the telescope cannot be identified at all seem to be limited to the optical band. Table 4 excludes many of these and also a number of papers counted as multiwavelength in Table 1 for which the optical data were images or positions used to align images or positions from other wavelengths.
The facilities responsible for the largest numbers of papers are the obvious ones: Hubble Space Telescope and the largest optical mirrors; the Very Large (radio) Array and, to a lesser extent some millimeter and submillimeter collectors; the newest space facilities (XMM-Newton and Chandra) or the most recently deceased in their wavebands (ISO). But notice that some missions of respectable antiquity (ROSAT, RXTE, IRAS, IUE, Hipparcos) are still of great value to the community even when the data are exclusively archival.
The largest C/P ratios, on the other hand, belong to some entities responsible for fewer papers, including the cosmic microwave balloon programs (WMAP was not yet up!) and SDSS, but also Parkes, HEGRA, Mars Global Surveyor, Lick and the Anglo-Australian Telescope. Focus on specific hot topics is responsible for some of these large numbers: exoplanet searches at Lick and very large scale structure at the AAT, for instance.
The two sets of numbers for 2001 and 2002 have been summed, because the differences are generally small (the answer to some objections raised by referees to the 2001 data alone) and not of obvious significance. One exception is the continuing fade of 4-m class telescopes as frontline work moves to larger ones, predicted by Benn & Sanchez (2001) based on data from the 1990s for 1-2 vs. 3-4 m mirrors and noted in Trimble et al. (2005).
The comparative time histories of Chandra and XMM surprised us slightly. The numbers for 2001 were XMM = 83.5 papers, C/P = 43.4 (including the very-highly-cited initial mission descriptions) and Chandra = 175.8 papers, C/P = 34.6 ("first photons" having appeared in the 2000 literature), while for 2002 we found XMM = 86.9 papers (a slight increase), C/P = 19.8 and Chandra = 258.8 papers (a considerable increase, though it is somewhat the older facility), C/P = 22.6. A colleague suggested a partial cause in the timing of the first public data releases, as being likely to kick the numbers of papers up and the C/P ratio down. A controlled study would be difficult to do. Data collected for XMM by its project scientist showed 82 papers in 2001 and 102 in 2002, indicating, we think, a slight increase in papers making use of more than one facility.

Last, and perhaps least, must come the mention of some facilities that we had almost forgotten, or maybe never even heard of, which nevertheless are sources of astronomical data into the 21st century: Ariel 5 and 6 and even the Uhuru X-ray satellite; radio dishes at Maryland Point, Richmond (Florida), Plativil (North Carolina), and Woodbury (Georgia); and our favorites among the sites of small optical telescopes, Lizard Hollow, Condor Brow, Clark & Coyote, and Raccoon Run, though Hyronerio, Hlovovec, Leledovice, and Skalnate Pleso come close. All are less than 1 meter in diameter and, we suspect, rightly much loved by their users.

Conclusions
More than 20 years ago, one of us (Trimble 1985) examined paper and citation rates for a large number of individual astronomers and concluded that, if you wish to be highly cited, it pays to be a mature, prize-winning theorist, working on high energy astrophysics or cosmology at a prestigious institution. It also paid to be male.
On the other hand, if you are an observer, the present results show that cosmology is still a good bet (but so are exoplanets, brown dwarfs, gamma ray bursters, and some kinds of public-spirited investigations), and it pays to have access to the best telescope around for whatever your purpose may be. Size counts. Being over-subscribed like HST counts. But other winning strategies are focus on a specific high-profile program (exoplanet searches, large scale structure determination, gamma ray burst identification and follow-up) and access to a well-supported site. As Table 4 shows, a 2-meter or smaller at Kitt Peak, Cerro Tololo, or the Canary Islands is a better bet than a 2-meter, or even a 6-meter east of the Oder-Neisse line. The same is true for radio telescopes, though most of the modestly-supported ones have vanished into "other" by location in Table 5. There is, we think, no such thing as a modestly-supported space mission, unless you wish to assign a succession of Soviet X-ray and gamma ray detectors carried by the space station MIR to that category. No systematic tabulation by author gender was attempted here, but the single most highly cited paper from the period (Freedman et al. 2001) has a female first author. It deals with cosmology and used a highly competitive telescope, with over-subscription ratios at the time sitting around 6 or 7.
The obvious differences by subdiscipline and level of facility support might well tempt one to express an opinion on whether the current ensemble of observing facilities and their usage is in any way optimal. We will resist, but believe it is safe to predict that, if funding continues to be ever more focussed on a few subfields thought to be the most fundamental and on a few large, expensive facilities, then the inequalities of both opportunities and rewards will increase, and it will be a brave young astronomer who is willing to choose a career studying binary stars at an institution closely tied to a small wholly owned telescope.
What about the future of this sort of investigation? Continuing it with uniform methodology will gradually erase any inequalities arising from bad weather on Mount Moriah and replacing the frammator at Effelbork. As we write, all or nearly all of the 2006 citations are in the ISI data base, so that tackling papers published in 2003 and citations to them in 2004-2006 is possible. The task of the first author, going page by page through the journals, identifying facilities used and eventually slicing up credit for papers and citations is ponderous and time-consuming, but rather amusing as one goes. The task of the second author (and those mentioned in acknowledgements), looking up the numbers of citations to each paper, is perhaps less ponderous, though probably also less amusing. Any volunteers?

Table 5 Radio, millimeter, and submillimeter papers (2001 + 2002) and citations (2002-2004 + 2003-2005)