Scientific output in the year of COVID by #ChristosPetrou

Editor’s Note: Today’s post is by Christos Petrou, founder and Chief Analyst at Scholarly Intelligence. Christos is a former analyst of the Web of Science Group at Clarivate Analytics and the Open Access portfolio at Springer Nature. A geneticist by training, he previously worked in agriculture and as a consultant for A.T. Kearney, and he holds an MBA from INSEAD.

As 2020 nears its end, a counterintuitive picture is emerging for scientific output. Rather than suffering a COVID-driven slowdown, 2020 delivered extraordinary growth for journal content. To put it simply, journals are expected to grow by about 500k papers from 2019 to 2020, as much as they grew overall in the previous six years. Will a 2020 boom yield a 2021 bust?

Early reports of growth

I was inclined to look into the market’s performance after coming across reports from major publishers that claimed unprecedented volumes of journal submissions and published articles. For example, Springer Nature reported that articles grew by 11% in the first half of 2020. Elsevier reported 25% submissions growth for subscription journals in the first nine months of the year. And earlier in the summer, Wiley reported 13% submissions growth for the fiscal year 2020, which half-overlaps with the calendar year 2020. While these publishers frequently beat the growth numbers seen for the market overall, they do not typically beat them by this much.

Assessing market performance with Dimensions

I used the free version of Dimensions to assess the market’s growth this year, and produced two forecasts in order to account for (a) the lag between publication and indexing on Dimensions and (b) the slowdown of publishing operations toward the end of December.

The conservative forecast (‘low’) assumes that Dimensions is fully up to date and publishing operations cease after the 20th of December. It implies that the observed volume of papers refers to 317 days and there are another 38 days of publishing left.

The aggressive forecast (‘high’) assumes that Dimensions is missing publications from the last two weeks and that publishing continues until the end of the year. It implies that the observed volume of papers refers to 303 days and there are another 63 days of publishing left. The end result should be somewhere between the two estimates.
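The two scenarios can be sketched as a simple extrapolation. This is a minimal illustration that assumes papers are published at a constant daily rate across the publishing year; the post does not spell out the exact calculation. The day counts come from the text, while the function name and the observed paper count are hypothetical placeholders.

```python
def full_year_forecast(observed_papers: float, observed_days: int, remaining_days: int) -> float:
    """Scale an observed paper count to the full publishing year.

    Assumption (not stated in the post): a constant daily publishing rate.
    """
    daily_rate = observed_papers / observed_days
    return daily_rate * (observed_days + remaining_days)


# Hypothetical observed count; only the day counts are taken from the text.
observed = 1_000_000

low = full_year_forecast(observed, observed_days=317, remaining_days=38)
high = full_year_forecast(observed, observed_days=303, remaining_days=63)

print(f"low:  {low:,.0f} papers ({low / observed - 1:.1%} above the observed count)")
print(f"high: {high:,.0f} papers ({high / observed - 1:.1%} above the observed count)")
```

Under this assumption, the ‘low’ scenario adds about 12% to the observed count and the ‘high’ scenario about 21%, which is why the choice of scenario matters so much for the final growth estimate.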

All analysis is based on the ERA 2018 journal list, which was devised for national research evaluations in Australia, includes about 25,000 journals, and has a high overlap with the selective indexes of Clarivate’s Web of Science. It is not a perfect representation of the market, but it includes the vast majority of the content that matters the most, about 2.5m papers in 2019 per Dimensions. For the purposes of this analysis, the ERA 2018 journal list stands for the market.

COVID-fueled growth

The market (ERA 2018) broadly did very well in 2020, growing by between 17% and 26%, which compares with a modest CAGR (Compound Annual Growth Rate) of 3% in the period 2013 to 2019. This is a remarkable result, in line with the reports of major publishers, and it shows that COVID truly fueled scientific output in 2020.
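As a rough check on how these figures fit together, the sketch below computes a CAGR and compares six years of 3% growth with one year of 17% to 26% growth. It uses the rounded figures quoted in the post (about 2.5m papers in 2019, 3% CAGR for 2013 to 2019); the implied 2013 volume is back-calculated here and is not a number reported in the text.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


papers_2019 = 2_500_000   # "about 2.5m papers in 2019", rounded
historic_cagr = 0.03      # 2013 to 2019
years = 6

# Back-calculated 2013 volume (an illustration, not a reported figure).
papers_2013 = papers_2019 / (1 + historic_cagr) ** years
six_year_growth = papers_2019 - papers_2013

# Additional papers in 2020 under the 'low' and 'high' growth rates.
growth_low = papers_2019 * 0.17
growth_high = papers_2019 * 0.26

print(f"Check CAGR 2013-2019: {cagr(papers_2013, papers_2019, years):.1%}")
print(f"Growth 2013-2019:     {six_year_growth:,.0f} papers")
print(f"Growth 2019-2020:     {growth_low:,.0f} to {growth_high:,.0f} papers")
```

With these rounded inputs, the six years to 2019 add roughly 400k papers, while 2020 alone adds somewhere between 425k and 650k, which is consistent with the headline estimate of about 500k.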

In previous years, fully Open Access (OA) journals outperformed hybrid and subscription journals (10% CAGR vs 2% CAGR for ERA 2018 journals). In 2020, by contrast, the two groups performed similarly, and each is expected to grow by up to 26%. This must have been welcome news both for fully-OA and subscription/hybrid publishers, although the latter may struggle to gain from the additional content given the budget crunch that many institutions are experiencing as a result of COVID.

Growth is also remarkable for selective journals, as the fairly steady Nature Index is expected to beat historical performance, growing by up to 15% in 2020 and exceeding 100k papers for the first time in recent years.

Figure 1. Past and expected growth by group of journals (based on Dimensions data)

Figure 2. Paper volume (k) in 2019 and 2020 by group of journals (based on Dimensions data)

Further analysis shows that all research areas (apart from Arts & Humanities) achieved high growth, ranging from 15% for Life Sciences in the ‘low’ scenario to 31% for Technology in the ‘high’ scenario. As in previous years, the growth leader is Technology, and this year Biomedicine is claiming the second spot.

The notable exception is the area of Arts & Humanities, which appears to have been negatively affected by COVID. Upon further inspection, four of the five underlying areas (Studies in Creative Arts and Writing; Language, Communication and Culture; History and Archaeology; Philosophy and Religious Studies) may shrink, and only Law and Legal Studies is expected to grow. While I have no reason to question these results, it is possible that slow indexing or a concentration of publishing late in the year, specifically affecting Arts & Humanities, is generating misleading results.

Figure 3. Past and expected growth by research area of ERA 2018 journals (fields of research are assigned by Dimensions at the paper level per the ANZSRC classification; codes 01-05 are shown here as Physical Sciences, 06-07 as Life Sciences, 08-10 as Technology, 11 as Biomedicine, 12-17 as Social Sciences, and 18-22 as Arts & Humanities)

There have been about 90k COVID papers (papers that mention the word COVID in their text) in ERA 2018 journals so far in 2020. As expected, Biomedicine has benefited more than other areas from COVID-related papers, as 7% of its papers mention COVID. Nonetheless, COVID-related papers account for a fraction of this year’s growth for Biomedicine and the other research areas. In fact, Technology may achieve up to 31% growth, with COVID papers accounting for just about 1% of all papers.

The free version of Dimensions does not provide a regional breakdown, but there are indications that most regions have beaten expectations. This is based on an analysis of papers with popular surnames, which I call the Papadopoulos Index after the most common surname in my country of origin, Greece. The growth of papers that include the most popular surname of a country is broadly in line with the growth trajectory of the country itself.

What is the growth mechanism?

I have come across a few explanations for this year’s phenomenal growth. The obvious explanation, that growth has been driven by papers related to COVID, only explains a small fraction of the additional papers.

Another suggestion is that authors have been resubmitting old, rejected papers. While that may be partly true, the high growth of the very selective Nature Index implies that novel, impactful papers account for part of the growth.

Perhaps, while labs were shut and experimentation ground to a halt, researchers brought forward writing that was planned for a later date, akin to a ‘loan from the future’ that may lead to a dip in scientific output in 2021, or they rushed to publish unfinished research. The latter explanation came up in a Twitter exchange with Scholarly Kitchen Chef Lisa Janicke Hinchliffe. It would be similar to ‘salami slicing’, the practice of splitting up findings into multiple separate papers rather than concentrating them into one stronger publication.

Implications for publishers

While the mechanism of this year’s growth is unclear, it may have implications for the scientific output of 2021. If, for example, growth has been a loan from the future, 2021 is likely to be worse than 2020 and possibly in line with 2019. If the growth of 2020 has been driven by ‘salami slicing’, then 2021 might look like a ‘normal’ year for scientific output. Yet ‘normal’ does not mean using as a baseline the results of 2020 and adding some growth to it; instead, it means using 2019 as a baseline and adding two years of ‘normal’ growth to it.

It gets more complex. Contrary to previous years, this year’s growth was as strong for fully-OA content as it was for subscription and hybrid content. As a result, it may be that the return to ‘normality’ will be more abrupt for subscription and hybrid than for fully-OA. A back-of-the-envelope calculation implies that a ‘normal’ 2021 output for fully-OA journals will be equal to the expected 2020 output. By contrast, the ‘normal’ 2021 output for subscription and hybrid journals will be 14% lower than the expected 2020 output. Factor in, too, that OA requirements for funded authors, such as those from Plan S, will go into effect in 2021.
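The back-of-the-envelope comparison can be reproduced along these lines. A ‘normal’ 2021 is taken as the 2019 output plus two years of historical growth, and it is compared with the expected 2020 output. The historical rates (10% and 2% CAGR) come from the text; the 21% growth for 2020 is an assumption, chosen from the middle of the 17% to 26% range because it reproduces the figures quoted above. The post does not state which 2020 growth rate was used, and the function name is hypothetical.

```python
def normal_2021_vs_2020(historic_cagr: float, growth_2020: float) -> float:
    """Relative gap between a 'normal' 2021 and the expected 2020 output.

    'Normal' 2021 = 2019 output compounded for two years at the historical rate.
    Both outputs are expressed relative to 2019 = 1.
    """
    normal_2021 = (1 + historic_cagr) ** 2
    expected_2020 = 1 + growth_2020
    return normal_2021 / expected_2020 - 1


# Assumption: 2020 growth near the midpoint of the 17%-26% range,
# applied equally to both groups of journals.
growth_2020 = 0.21

fully_oa = normal_2021_vs_2020(historic_cagr=0.10, growth_2020=growth_2020)
subscription_hybrid = normal_2021_vs_2020(historic_cagr=0.02, growth_2020=growth_2020)

print(f"Fully-OA:            {abs(fully_oa):.0%} away from expected 2020 output")
print(f"Subscription/hybrid: {subscription_hybrid:.0%} vs expected 2020 output")
```

Under that assumption, two years of 10% growth land fully-OA journals almost exactly on their expected 2020 output, while two years of 2% growth leave subscription and hybrid journals about 14% below it.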

Suffice it to say that planning teams in publishing houses have a rather complex exercise on their hands. The counterintuitive, strong growth of 2020 can give a false sense of security and lead to overly optimistic forecasts for 2021. Yet the market may be running on fumes and may face some turbulence in the coming months before a return to normality. Publishers will need to be conservative in their planning, while also maintaining flexibility to address high paper volumes for as long as the strong performance continues.

