Popularity of statistical software in epidemiology

Bob Muenchen has a series of articles on the Popularity of Data Science Software. By looking at scholarly articles in Google Scholar, he found that SPSS is the most used software, followed by R, SAS, Stata, GraphPad Prism, and MATLAB. He also explains his methodology. While he shows the popularity (or market share) of software for data science, statistical analysis, machine learning, artificial intelligence, predictive analytics, business analytics, and business intelligence, I always wondered what the results would look like for my own field, epidemiology. So let’s try the same approach, but with software more likely to be used in epidemiology, focusing on epidemiology papers.
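For readers who want to reproduce this kind of count, here is a minimal sketch of how per-software, per-year queries could be assembled. It assumes Google Scholar's advanced-search URL parameters `q`, `as_ylo`, and `as_yhi` (publication year range), which are not an official API and may change. The software search terms are simplified, hypothetical placeholders, not Bob's actual strings, and the hit counts would still have to be read off each results page.

```python
from urllib.parse import urlencode

# Google Scholar search page (not an official API; parameters may change).
SCHOLAR_URL = "https://scholar.google.com/scholar"

# Hypothetical, simplified search terms. Real searches need more careful
# strings to avoid false positives (a bare "R" would match almost anything).
SOFTWARE_TERMS = {
    "SPSS": '"SPSS"',
    "SAS": '"SAS Institute"',
    "Stata": '"Stata"',
    "R": '"R Core Team"',
}


def scholar_query_url(software_term: str, keyword: str, year: int) -> str:
    """Build a Google Scholar URL for one software term, one keyword, one year.

    Assumes as_ylo/as_yhi restrict results to publication years in
    [as_ylo, as_yhi], as the advanced-search form does.
    """
    params = {
        "q": f'{software_term} "{keyword}"',
        "as_ylo": year,
        "as_yhi": year,
    }
    return f"{SCHOLAR_URL}?{urlencode(params)}"


def all_query_urls(keyword: str, years: range) -> dict:
    """Return {(software, year): url} for every software and every year."""
    return {
        (name, year): scholar_query_url(term, keyword, year)
        for name, term in SOFTWARE_TERMS.items()
        for year in years
    }


if __name__ == "__main__":
    urls = all_query_urls("epidemiology", range(2015, 2017))
    print(urls[("Stata", 2016)])
```

Keeping one query per software and per year makes each count independent, so a year-by-year series like the ones below can be built by simply repeating the search.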

Results!

Figure 1: Number of epidemiology scholarly articles in 2016, by statistical software.

As in Bob’s overall search, SPSS also dominates my search restricted to the keyword “epidemiology”. But the podium is different. Where he found R and SAS completing the podium, followed by a chasing group of Stata, GraphPad Prism, and MATLAB, I have SPSS followed by a close group of SAS, Stata, R, and GraphPad Prism, in that order.

And what about trends over the years for the top six software packages above?

Figure 2: Number of epidemiology scholarly articles by year and statistical software.

While Bob found SPSS in sharp decline since 2009, in my dataset its decline started in 2013. SAS and Stata showed a gentle increase in use but seem to plateau. Interestingly, he also found usage of SAS peaking in 2010 and of GraphPad Prism in 2013, followed by a decline. If we assume a lag of 4 years between “epidemiology” papers and general papers (the gap between the start of the SPSS decline in his data and in mine), the plateau we see with SAS might be the beginning of a decline, and it does not look good for GraphPad Prism in the coming years.
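The lag argument is simple arithmetic, and the sketch below makes it explicit using only the years quoted above: the SPSS decline starting in 2009 in Bob’s data versus 2013 in mine, and the SAS and GraphPad Prism peaks in 2010 and 2013 in his. It assumes a single, constant lag applies to every package, which is a strong simplification.

```python
# Years quoted in the text: start of the SPSS decline in Bob's overall
# search and in the epidemiology-only search.
SPSS_DECLINE_OVERALL = 2009
SPSS_DECLINE_EPI = 2013

# Peak years Bob reported in the overall search.
OVERALL_PEAKS = {"SAS": 2010, "GraphPad Prism": 2013}


def estimated_lag(overall_year: int, epi_year: int) -> int:
    """Lag (in years) between a trend in general papers and in epidemiology papers."""
    return epi_year - overall_year


def projected_epi_peaks(peaks: dict, lag: int) -> dict:
    """Shift each overall peak year by the lag, assuming the lag is constant."""
    return {software: year + lag for software, year in peaks.items()}


lag = estimated_lag(SPSS_DECLINE_OVERALL, SPSS_DECLINE_EPI)
print(lag)  # 4
print(projected_epi_peaks(OVERALL_PEAKS, lag))  # {'SAS': 2014, 'GraphPad Prism': 2017}
```

Under that assumption, SAS would have peaked around 2014 in epidemiology papers, which fits the plateau described above, and GraphPad Prism would peak around 2017.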

What if we remove SPSS from the graph?

Figure 3: Number of epidemiology scholarly articles by year, after removing SPSS

As in Bob’s article, R is pulling away from the other packages and will probably catch up with SAS and Stata in a couple of years. But this time it’s not alone: GraphPad Prism seems to attract quite a large share of users. I’m both surprised and not surprised. I knew it was popular in medical research, but not to this extent. It has very good graphics, but it is somewhat limited in terms of analytics. That is surprising for epidemiology, as epidemiologists often use advanced analytic methods (or at least, they like to 😉). On the other hand, epidemiology is a very vast field, with all kinds of research requiring different analytic approaches. Ease of use and not needing any programming skills might rank high for epidemiologists choosing statistical software.

Of note, we can see the slow rise of Python. Epidemiologic methods are slowly being integrated into Python, and I expect this rise to continue in the coming years.

And if we look into specific epidemiology journals?

That was a very broad search. Instead of using the keyword “epidemiology”, we can restrict the search to specific journals, chosen with Jane Biosemantic Search (and my own judgment):

“Human” epidemiology journals:
Epidemiology
American Journal of Epidemiology
Transboundary and Emerging Diseases
Annals of Epidemiology
Journal of Clinical Epidemiology
Emerging Infectious Diseases
American Journal of Preventive Medicine
American Journal of Public Health
European Journal of Epidemiology

Veterinary epidemiology journals:
Veterinary Record
Preventive Veterinary Medicine
Transboundary and Emerging Diseases
Zoonoses and Public Health
BMC Veterinary Research
Vector Borne and Zoonotic Diseases
Acta Veterinaria Scandinavica

Figure 4: Number of scholarly articles by year, within “human” epidemiology journals.

SAS and Stata lead, but both are declining. SPSS has maintained its popularity over the years, but might now be starting to slip. SUDAAN appears within these specific journals, and R is slowly increasing its share of papers. GraphPad Prism is nowhere to be seen. And surprisingly, neither are Python and MATLAB.

What about in animal health?

Figure 5: Number of scholarly articles by year, within veterinary epidemiology journals.

Wow! That’s different! SPSS comes first, with no real sign of decline. There is also more diversity in the software used: even if they are a minority, you can find papers using Statistica, Systat, or Minitab.

R is second in popularity, and has overtaken SAS since 2013. While SAS has more or less maintained its popularity over the years, probably due to its wide use in animal science, Stata is losing the popularity it gained up to 2014. GraphPad Prism is gaining in popularity, but not as much as in the broader search. (Note: special care was taken to avoid including veterinary papers discussing the actual reptile rather than the Python programming language…)
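The post does not say how the reptile papers were excluded. One possible approach, shown here purely as a hypothetical sketch, is to post-filter candidate records and drop any that mention “python” together with snake-related vocabulary. The word list is an assumption, not the one used for these figures.

```python
import re

# Hypothetical vocabulary suggesting the animal rather than the language.
REPTILE_TERMS = {
    "snake", "snakes", "reptile", "reptiles",
    "ball python", "burmese python", "python regius", "python bivittatus",
}


def mentions_python(text: str) -> bool:
    """True if the word 'python' appears (case-insensitive, whole word)."""
    return re.search(r"\bpython\b", text, re.IGNORECASE) is not None


def looks_like_reptile(text: str) -> bool:
    """True if any of the hypothetical reptile terms appears as a whole phrase."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(term)}\b", lowered) for term in REPTILE_TERMS)


def is_python_language_paper(text: str) -> bool:
    """Keep a record only if it mentions Python and nothing reptile-related."""
    return mentions_python(text) and not looks_like_reptile(text)


print(is_python_language_paper("Analyses were performed in Python 3.5."))  # True
```

The trade-off is that a paper using the Python language to study snakes would also be dropped, so flagged records are worth a quick manual check.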

Can we break this down by journal?

Figure 6: Number of scholarly articles by year and journal, “human” epidemiology, for R, Stata, SAS, SPSS, and SUDAAN.

The American Journal of Epidemiology and the American Journal of Public Health somewhat distort the figure, as they publish many papers, mostly using SAS and Stata. Let’s remove them from the graph:

Figure 7: Number of scholarly articles by year and journal, “human” epidemiology, for R, Stata, SAS, SPSS, and SUDAAN (take 2).

Depending on the journal, and therefore on the specific field of research, one of SAS, Stata, SPSS, or R is preferred over the others.

Figure 8: Number of scholarly articles by year and journal, veterinary epidemiology, for R, Stata, SAS, SPSS, and GraphPad Prism.

GraphPad Prism and SPSS are popular mainly in more “general” journals like BMC Veterinary Research and the Veterinary Record, while a very specific epidemiology journal like Preventive Veterinary Medicine has seen R dominate since 2013.

And what about Bayesian software?

Search term used for each Bayesian software:
WinBUGS: (“WinBUGS” “a Bayesian modelling framework”)
OpenBUGS: OpenBUGS -WinBUGS
JAGS: “JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling” OR “Just An Other Gibbs Sampler”
Stan: “Stan: A probabilistic programming language” OR “Stan Development Team” OR “The Stan Math Library”
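To see why the OpenBUGS term excludes WinBUGS (so that papers mentioning both are counted only once), it helps to emulate the search terms above on a piece of text. This is a rough local approximation, assuming Google Scholar treats space-separated quoted phrases as AND, `OR` as alternation, and a leading `-` as exclusion, with case-insensitive matching; the real engine indexes and matches text differently. The phrases are copied from the list above.

```python
# Each rule: (phrases that must all appear,
#             alternatives of which at least one must appear,
#             phrases that must not appear). Empty tuple = no constraint.
RULES = {
    "WinBUGS": (("winbugs", "a bayesian modelling framework"), (), ()),
    "OpenBUGS": (("openbugs",), (), ("winbugs",)),
    "JAGS": ((), ("jags: a program for analysis of bayesian graphical models using gibbs sampling",
                  "just an other gibbs sampler"), ()),
    "Stan": ((), ("stan: a probabilistic programming language",
                  "stan development team",
                  "the stan math library"), ()),
}


def matching_software(text: str) -> list:
    """Return the Bayesian software whose (emulated) search term matches the text."""
    t = text.lower()
    hits = []
    for software, (all_of, any_of, none_of) in RULES.items():
        if not all(phrase in t for phrase in all_of):
            continue
        if any_of and not any(phrase in t for phrase in any_of):
            continue
        if any(phrase in t for phrase in none_of):
            continue
        hits.append(software)
    return hits


print(matching_software("Models were fitted in OpenBUGS."))  # ['OpenBUGS']
```

A paper citing both OpenBUGS and the WinBUGS reference paper therefore lands only in the WinBUGS count.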

Figure 9: Epidemiology scholarly articles using Bayesian software.

Good to see that Bayesian analyses are increasingly popular. But why still use WinBUGS, whose development stopped in 2006?

I also never figured out how to run OpenBUGS properly on Linux, and the development team is very small (only one person, I believe). But it is very good to see JAGS taking over, and Stan, initially released in 2012, getting off to a remarkable start.

Figure 10: Scholarly articles using Bayesian software in “human” epidemiology journals.

It is very unexpected to see so few Bayesian papers in epidemiology journals, and their authors don’t really settle on one software. However, I’m only looking at dedicated Bayesian software, which may or may not be used alongside general statistical software. You can also run Bayesian analyses directly within general statistical software (PROC MCMC in SAS, MCMCpack or arm in R, bayesmh in Stata, etc.). These are harder to capture with a Google Scholar search, and I have not tried (yet).

Figure 11: Scholarly articles using Bayesian software in veterinary epidemiology journals.

In veterinary epidemiology, however, it looks like authors decided to use WinBUGS less, and are perhaps trying something else.

Counterpoint: very specific journals

And if we do the same within Statistics in Medicine and Statistical Methods in Medical Research?

Figure 12: Scholarly articles in Statistics in Medicine and Statistical Methods in Medical Research.

And the winner is: R!

Figure 13: Scholarly Bayesian papers within Statistics in Medicine and Statistical Methods in Medical Research.

Here again, WinBUGS is well represented, but it will be interesting to see whether JAGS catches up.

Final note

By the way, there was a single, lonely paper using Julia, found in Statistical Methods in Medical Research (the paper also used R and SAS): Bayesian accelerated failure time model for space-time dependency in a geographically augmented survival model, by Georgiana Onicescu et al., from the Medical University of South Carolina.