The validity of peer review in a general medicine journal
- Author(s): Jackson, JL
- Srinivasan, M
- Rea, J
- Fletcher, KE
- Kravitz, RL
- et al.
Published Web Locationhttps://doi.org/10.1371/journal.pone.0022475
All the opinions in this article are those of the authors and should not be construed to reflect, in any way, those of the Department of Veterans Affairs. Background: Our study purpose was to assess the predictive validity of reviewer quality ratings and editorial decisions in a general medicine journal. Methods: Submissions to the Journal of General Internal Medicine (JGIM) between July 2004 and June 2005 were included. We abstracted JGIM peer review quality ratings, verified the publication status of all articles and calculated an impact factor for published articles (Rw) by dividing the 3-year citation rate by the average for this group of papers; an Rw>1 indicates a greater than average impact. Results: Of 507 submissions, 128 (25%) were published in JGIM, 331 rejected (128 with review) and 48 were either not resubmitted after revision was requested or were withdrawn by the author. Of 331 rejections, 243 were published elsewhere. Articles published in JGIM had a higher citation rate than those published elsewhere (Rw: 1.6 vs. 1.1, p = 0.002). Reviewer quality ratings of article quality had good internal consistency and reviewer recommendations markedly influenced publication decisions. There was no quality rating cutpoint that accurately distinguished high from low impact articles. There was a stepwise increase in Rw for articles rejected without review, rejected after review or accepted by JGIM (Rw 0.60 vs. 0.87 vs. 1.56, p<0.0005). However, there was low agreement between reviewers for quality ratings and publication recommendations. The editorial publication decision accurately discriminated high and low impact articles in 68% of submissions. We found evidence of better accuracy with a greater number of reviewers. Conclusions: The peer review process largely succeeds in selecting high impact articles and dispatching lower impact ones, but the process is far from perfect. While the inter-rater reliability between individual reviewers is low, the accuracy of sorting is improved with a greater number of reviewers.