In 2001, arXiv moved from Los Alamos National Laboratory (LANL) to Cornell University Library. At the time, annual costs were pegged at $300,000 and were covered by funding from the National Science Foundation, the Department of Energy, and LANL.
The recently announced move from Cornell University Library to Cornell Computing & Information Sciences (CIS) has generated renewed interest in how this major free information initiative is faring. While it’s heavily used and popular with its users, the financial aspects are where the rubber meets the road. Is it going to last? Will the move improve things in this regard?
Since 2001, the arXiv budget has grown, reaching $1.43 million in 2018. According to financial reports, arXiv lost a total of $745,387 between 2010 and 2018. These losses are projected to accelerate rapidly, adding $1,589,827 between 2019 and 2022, for a total operating loss from 2010 to 2022 of $2.3 million. And those projected losses already assume rosy cost estimates that historical averages don’t support.
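As a quick check on the arithmetic, the two loss figures quoted above can simply be added together. This is only a sketch using the numbers as stated in this post; the variable names are mine, not anything from Cornell's reports.

```python
# Figures quoted in the text above, in US dollars.
loss_2010_2018 = 745_387      # cumulative loss reported for 2010-2018
loss_2019_2022 = 1_589_827    # additional loss projected for 2019-2022

total_loss = loss_2010_2018 + loss_2019_2022
projected_share = loss_2019_2022 / total_loss  # portion falling in the projection years

print(f"Total 2010-2022 loss: ${total_loss:,}")
print(f"In millions: ${total_loss / 1e6:.1f} million")
print(f"Share of the total projected for 2019-2022: {projected_share:.0%}")
```

The sum rounds to the $2.3 million cited above, with roughly two-thirds of it falling in the four projected years.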
Because Cornell supports most of the indirect costs for arXiv, their bookkeeping allows arXiv to report cash to reserves. However, I’m using a more sensible approach of looking at fully loaded costs, as these overheads and facilities costs are real, and shouldn’t be brushed aside to make it look like arXiv is generating an actual surplus.
Between 2010 and 2018, figures from Cornell show total costs increasing at an average rate of 20.2% per year. Indirects increased at an average rate of 17.8%. Direct costs increased at an average rate of 24%. The charts below show these increases and the relevant trendlines.
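For readers wondering what an “average rate” of increase means in practice, one common reading is the simple mean of the year-over-year percentage changes. The sketch below assumes that reading (the underlying reports may calculate it differently) and compares it with a compound annual rate. The series is a made-up placeholder, not arXiv data, and both function names are hypothetical.

```python
def average_yoy_growth(values):
    """Mean of year-over-year fractional changes for a series of annual totals."""
    if len(values) < 2:
        raise ValueError("need at least two years of data")
    changes = [later / earlier - 1 for earlier, later in zip(values, values[1:])]
    return sum(changes) / len(changes)


def compound_annual_growth(values):
    """Constant yearly rate that would take the first value to the last one."""
    if len(values) < 2:
        raise ValueError("need at least two years of data")
    years = len(values) - 1
    return (values[-1] / values[0]) ** (1 / years) - 1


# Placeholder series, NOT arXiv data: three consecutive years of costs.
placeholder_costs = [100.0, 120.0, 150.0]

print(f"Mean year-over-year growth: {average_yoy_growth(placeholder_costs):.1%}")
print(f"Compound annual growth:     {compound_annual_growth(placeholder_costs):.1%}")
```

The two measures usually land close together but are not identical, so it matters which one a report is quoting.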
The first chart starts with a single data point for 2001, then shows data for 2010 through 2022 (actuals and projections).
The next chart covers 2010-2022, actuals and projections, for indirect expenses related to arXiv personnel, facilities, and overheads.
Direct costs increased steadily between 2010 and 2022 (actuals and projections). The dip in the middle seems to be an accounting issue, with costs moving from 2016 into 2017, so the trendline is more likely the real story. It’s worth noting that projections for 2019-2022 reflect anticipated cost increases of 2.1-2.9% per year, an assumption historical data don’t seem to support. (Historical increases run between 3.4% and 39.5%, averaging 17.8% for those years.)
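To see how much the growth assumption matters, here is a rough sketch that compounds an indexed cost base over the four projection years at the low and high projected rates and at the 17.8% historical average quoted above. The base of 100 is a hypothetical index rather than a dollar figure from the reports, and holding any one rate constant for four years is my simplification.

```python
def compound(base, annual_rate, years):
    """Value of `base` after growing at `annual_rate` for `years` years."""
    return base * (1 + annual_rate) ** years


BASE_INDEX = 100.0  # hypothetical index: 2018 costs set to 100
YEARS = 4           # 2019 through 2022

# Rates quoted in the text; applying each one uniformly is an assumption.
scenarios = {
    "projected, low (2.1%)": 0.021,
    "projected, high (2.9%)": 0.029,
    "historical average (17.8%)": 0.178,
}

results = {label: compound(BASE_INDEX, rate, YEARS) for label, rate in scenarios.items()}

for label, value in results.items():
    print(f"{label:<28} -> index {value:6.1f}")
```

Under the projected rates, costs end up around 9% to 12% above the 2018 level; at the historical average they come close to doubling.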
Increases averaging 18% and 24% year over year for indirect and direct costs, respectively, seem steep (in some years, costs jumped 40%). The larger increases for direct costs underscore that digital is expensive, and suggest why digital expenses will continue to rise. They certainly have for arXiv.
Even with rosy cost projections, arXiv is projecting cumulative losses of nearly $1.6 million between 2019 and 2022, with all indirect expenses included.
Without indirect costs, arXiv is projected to generate only $608 of positive cashflow in 2021 and, based on the assumptions made by CIS, to lose $27,451 in 2022 (it becomes a $455,856 loss with indirects that year, illustrating how robust Cornell’s support of arXiv has been and will be).
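The two 2022 figures also let us back out roughly how much indirect cost Cornell is expected to absorb that year. The sketch below assumes the only difference between the two results is the indirect costs, which is my reading of the numbers rather than something the reports state outright.

```python
# Projected 2022 results quoted in the text, in US dollars (negative = loss).
result_without_indirects_2022 = -27_451
result_with_indirects_2022 = -455_856

# Assumption: the gap between the two results is entirely indirect costs.
implied_indirects_2022 = result_without_indirects_2022 - result_with_indirects_2022

print(f"Implied 2022 indirect costs: ${implied_indirects_2022:,}")
```

That works out to a little under $430,000 of support in a single year.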
Revenues from philanthropy and memberships are projected to be flat during this same time. On average, member organizations are paying about $2,100 per year. Philanthropic support has been flat at $400,000 per year for a long time, with no adjustment for inflation, so its real value erodes a little each year.
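The erosion of a flat nominal amount is easy to sketch. The example below deflates $400,000 by a constant inflation rate; the 2% rate and the 10-year horizon are illustrative assumptions of mine, not figures from arXiv's reports, and `real_value` is a hypothetical helper.

```python
def real_value(nominal, inflation_rate, years):
    """Purchasing power of a flat nominal amount after `years` of constant inflation."""
    return nominal / (1 + inflation_rate) ** years


NOMINAL_SUPPORT = 400_000    # flat annual philanthropic support quoted in the text
ASSUMED_INFLATION = 0.02     # assumption for illustration only

for year in range(0, 11, 5):
    value = real_value(NOMINAL_SUPPORT, ASSUMED_INFLATION, year)
    print(f"Year {year:>2}: ${value:,.0f} in year-0 dollars")
```

Even at a modest assumed rate, a decade of flat funding loses a noticeable share of its purchasing power.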
There are also discrepancies in reporting. The major one I found is in the 2017 budget. The most recent statement of outcomes varies significantly from the one recorded in a more detailed filing. The difference between the two amounts to about $120,000, with the money missing from the most recent statement. Someone might want to track that down.
My takeaways from this quick run-through of arXiv finances are:
- Cornell is doing the world a big favor hosting arXiv. This “favor” is only possible because it is supported by tuition and other Cornell revenue streams — arXiv is not itself sustainable without large subsidies from a US institution of higher education.
- Cost increases for both direct and indirect costs seem excessive. These should be controlled better, but the projections suggest the move to CIS may make things worse.
- Cost projections from 2019-2022 seem unrealistic given historical actuals and trends. CIS should brace for these to change in ways they won’t like.
So that’s where arXiv is these days. Is it sustainable? With enough friends (like the Simons Foundation, the Cornell University administration, some member organizations, and generous souls) it just may keep plugging along. Let’s hope so. It’s pretty cool, and people find it useful. Could it charge more? Certainly. Would it do better, be better, and be more sustainable as a commercial offering? Probably. There’s a lot to consider here. But one thing is obvious: arXiv should not be held up as an example of how to make free publication services available in a sustainable way.