Peer Review in the S&P Community
I kept these notes lying around my “desk” for far too long, and it is time to start sharing them, the way they are, lightly polished, hoping they will help move the discussion within our community towards a more positive review experience.
Today I share part one of my notes, about the review workload and the review workload only. Time permitting, I will add more, expanding on the actual utilization of the available resources (aka reviewers). Time will tell.
Pt 1: Institutionalized Review Workload
tl;wr;
- The S&P community’s avg. reviewer workload in 2021 is ~16.4 papers per TPC member, with almost flat growth since 2000 (slope = 0.04)
- The S&P community’s workload in 2021 is 16.4 papers vs. ICSE’s 10.5 for the same year
- S&P TPC growth matches community growth, with no workload reduction. Workload reduction should be a priority too (see ICSE) but is challenged by exponential growth, e.g., +535.7% papers at Usenix alone since 2010 (vs. ICSE’s +48.42%)
1. Foreword
In 2020 alone, the top four S&P conferences attracted 3039 submissions, following exponential growth. If you do not believe me, please read Davide Balzarotti’s [System Security Circus](https://www.s3.eurecom.fr/~balzarot/notes/top4/) piece, where he predicted 3093 papers for 2020, an estimation error of just +54 papers.
Yearly exponential growth can cripple TPCs: the same number of reviewers would need to do more work every year. It simply does not scale, so TPCs must grow too, which is happening. Davide’s post covers TPC growth, but here I wanted to take a closer look at how TPC growth translates into reviewers’ workload and how it compares with the SE community. So, I asked Davide for the raw numbers for 2000-2020, added the 2021 data points, and collected comparable data for ICSE.
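As an aside, here is a minimal sketch of one way such a prediction can be made (not necessarily Davide’s method): fit an exponential, i.e., a straight line in log space, to past yearly totals and extrapolate one year ahead. The series below is a made-up placeholder, not the real top-four counts.

```python
import numpy as np

# Made-up yearly submission totals, standing in for the real
# top-four counts; the method is what matters here.
years = np.arange(2014, 2020)
totals = np.array([1400, 1650, 1900, 2200, 2550, 2900])

# Exponential growth is a straight line in log space:
# log(total) = b * year + log(a).
b, log_a = np.polyfit(years, np.log(totals), deg=1)
prediction = np.exp(log_a + b * 2020)
print(round(float(prediction)))  # extrapolated 2020 submissions
```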
2. Defining Workload
tl;wr;
- Conference workload = number of submitted papers * 3
- Reviewer workload = conference workload / TPC size
Precisely quantifying the workload of reviewers is not easy. Reviewers need to do several things, e.g., read papers, write reviews, discuss papers, prepare author-visible comments, revise reviews, and attend PC meetings. We do not have data for all these activities; the best we can do is approximate the workload from the one number we do have, the number of submitted papers.
Roughly 50% of the submitted papers are R1-rejected after two reviews, and the other 50% get two additional reviews, i.e., 0.5 × 2 + 0.5 × 4 = 3 reviews per paper on average. With that, we can estimate the workload of a conference as three times the number of submitted papers.
To complicate the estimation further, not all reviewers review the same number of papers. For example, the popularity of specific topics can distribute the workload unevenly, overloading reviewers working on popular topics. Here too I do not have data, so I calculate the reviewer workload as an average: the conference workload divided by the TPC size.
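To make the two definitions concrete, here is a minimal sketch. The 3039 submissions are the real 2020 top-four total; the aggregate TPC size of 550 is a hypothetical figure, used only to show the calculation.

```python
REVIEWS_PER_PAPER = 3  # 50% of papers get 2 reviews (R1 reject), 50% get 4: 0.5*2 + 0.5*4 = 3

def conference_workload(submissions: int) -> int:
    """Total number of reviews a conference must produce."""
    return submissions * REVIEWS_PER_PAPER

def reviewer_workload(submissions: int, tpc_size: int) -> float:
    """Average number of papers each TPC member reviews."""
    return conference_workload(submissions) / tpc_size

# 3039 submissions across the top four in 2020; 550 is a made-up
# aggregate TPC size, for illustration only.
print(f"{reviewer_workload(3039, 550):.1f} papers per reviewer")  # ~16.6
```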
3. Security and Privacy Review Workload
tl;wr;
- Avg. reviewer workload is ~17.6 papers a year (period 2000-2021)
- Avg. reviewer workload is almost flat over time (slope 0.04)
The plot below also appears in Davide’s post. In this section, I focus on the yearly average workload.
The average workload has not changed much over time, aside from a few notable per-conference fluctuations. Instead of going down (as I wish it did), the workload is almost flat, increasing very slowly and converging towards 20 papers per reviewer.
The almost flat workload indicates that the TPC size is optimized to match the community growth, which makes sense: one of the PC chairs’ goals is to maintain operations and avoid TPCs being overwhelmed by an unmanageable volume of papers.
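For the curious, a slope like the 0.04 above can be obtained with an ordinary least-squares fit of the yearly average workload against the year. A minimal sketch, with a fabricated series standing in for the real 2000-2021 data:

```python
import numpy as np

# Fabricated stand-in for the real yearly average workload series:
# a flat-ish trend around 17.6 papers per reviewer, plus noise.
years = np.arange(2000, 2022)
workload = 17.6 + 0.04 * (years - 2000) + np.random.default_rng(0).normal(0, 1.5, years.size)

# np.polyfit with deg=1 returns [slope, intercept].
slope, intercept = np.polyfit(years, workload, deg=1)
print(f"slope = {slope:.2f} papers/reviewer per year")  # the real series yields ~0.04
```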
4. What About the SE Community?
tl;wr;
- ICSE yearly workload is in a steeper decline (slope -0.20), from 27.8 papers/reviewer in 2010 to 10.5 papers/reviewer in 2021
- ICSE is not growing exponentially (yet)
I collected the TPC size and the number of submitted papers for the 2010 to 2021 editions of ICSE. For the 2020 and 2021 editions, I used the ICSE HotCRP instances. For 2010 to 2019, I used multiple sources. First, I discovered the ICSE PC Chairs Reports, a fantastic source of helpful information, including PC chairs’ insights, strategic decisions and their assessment, and survey results. Wow! These reports also include the number of submitted papers, but sometimes miss the number of PC members (at least, I could not always find them). You can find the reports at http://www.icse-conferences.org/reports.html. The most useful report is the 2018 one, which includes a table of submitted papers back to 2010. Finally, I looked at the conference websites for the PC sizes, either visiting the original website or using the Wayback Machine.
ICSE’s workload is declining more steeply than that of the top four security conferences. In particular, 2016 is the year when the PC size started to outgrow the community, reaching, in 2021, a workload as low as 10.5 papers per reviewer. Among the top four conferences, CCS is the one showing a similarly rapid decrease in workload.
It seems like the SE community is doing better than us. But it also seems they have a simpler game to play: if we look at community growth, ICSE does not look like it is growing exponentially the way the S&P conferences are.
| Conference | Net growth | % growth | 2010 | 2021 |
|---|---|---|---|---|
| ICSE | +184 | +48.42% | 380 | 564 |
| USENIX | +1109 | +535.7% | 207 | 1316 |
| NDSS | +422 | +270% | 156 | 578 |
| Oakland | +685 | +256% | 267 | 952 |
| CCS | +560 | +175% | 320 | 880 |
To put things into perspective, over the past 11 years, the number of papers submitted to ICSE went from 380 in 2010 to 564 in 2021: a net growth of 184 papers, or a +48.42% increase. All top four security conferences experienced far larger percentage increases since 2010: CCS +175% (+560 papers), NDSS +270% (+422 papers), Oakland +256% (+685 papers), and Usenix +535.7% (+1109 papers).
Let me say that differently: since 2010, Usenix experienced a +535.7% increase in submitted papers vs. the roughly +50% growth of ICSE.
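The growth figures are simple arithmetic over the 2010 and 2021 submission counts in the table; a short sketch, for reproducibility:

```python
# (2010 submissions, 2021 submissions), from the table above.
counts = {
    "ICSE": (380, 564),
    "USENIX": (207, 1316),
    "NDSS": (156, 578),
    "Oakland": (267, 952),
    "CCS": (320, 880),
}

for conf, (y2010, y2021) in counts.items():
    net = y2021 - y2010           # net growth in papers
    pct = 100 * net / y2010       # percentage growth over 2010
    print(f"{conf}: +{net} papers ({pct:+.1f}%)")
# e.g., USENIX: +1109 papers (+535.7%)
```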
If you are interested in reading more about how different the reviewing experience is across PL, SE, and S&P, read Andreas Zeller’s blog post: [How do different fields review papers? Experiences from ICSE, PLDI, and CCS](https://andreas-zeller.info/2021/07/27/Reviewing-across-fields-ICSE-PLDI-CCS.html)
5. Not all S&P Conferences are the Same
tl;wr;
- The NDSS and CCS trends are upward (especially NDSS)
- The Usenix and Oakland trends are downward (especially Usenix)
- The trends cancel each other out
Usenix Security (in green) has the highest reviewer workload among the top four. In the 2000-2021 timeframe, Usenix went above the average workload 18 times; only four times was its workload below the average, i.e., in 2001, 2015, 2016, and 2021. Oakland (in yellow) is right behind Usenix, with 13 data points above the average.
CCS (in blue) and NDSS (in red) are different, with a lower-than-average workload. They went above the average only five and six times, respectively.
Using these counts to decide whether to accept a PC invitation will not help much, because the roles may soon swap. In fact, Usenix and Oakland seem to be inverting the trend. For example, three of Usenix’s four below-average years (2015, 2016, and 2021) occurred from 2015 onwards, suggesting a change of direction. On the contrary, the number of reviews per PC member for NDSS and CCS is going up; NDSS, in particular, seems eager to catch up with the other three conferences.
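In case you want to reproduce the counting, it is a simple threshold test against the period average. A minimal sketch, with placeholder values instead of the real per-conference series:

```python
# Placeholder workload series (year -> papers per reviewer);
# the real numbers come from the 2000-2021 data discussed above.
series = {2015: 16.9, 2016: 17.2, 2020: 19.1, 2021: 15.2}
period_avg = 17.6  # the 2000-2021 average from Section 3

above = [year for year, w in series.items() if w > period_avg]
below = [year for year, w in series.items() if w <= period_avg]
print(f"above average: {above}; below average: {below}")
```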
6. Wrapping Up
My interpretation of the TPC growth is that reducing reviewers’ workload has not been priority number one so far (or its effects are not yet visible). Reducing the workload has to happen, eventually, but it is more challenging than I initially thought:
- Exponential growth is, at the moment, priority number one. Other nearby communities managed to drop the workload significantly, but this may well be because they did not experience the same growth as the S&P community.
- Reviewers are invited before the paper submission deadline, so PC chairs need to predict at that point the number of papers to expect. That is not easy, and considering that, PC chairs did a good job at avoiding congesting TPCs, keeping the average workload roughly flat.
Nevertheless, as a general direction, our community should work more towards reducing reviewers’ workload by inviting more reviewers to TPCs.
There may be other solutions, and I plan to share my thoughts on them later. For the moment, I think we should share the numbers above, continue a public discussion about the workload, and find ways to help PC chairs reduce it (let’s not forget that PC chairs are members of the same community, like everyone else).
Acknowledgments
Thanks to Davide Balzarotti, Andreas Zeller, and Carmela Troncoso for their feedback.
Resources
- How do different fields review papers? Experiences from ICSE, PLDI, and CCS by Andreas Zeller (https://andreas-zeller.info/2021/07/27/Reviewing-across-fields-ICSE-PLDI-CCS.html)
- System Security Circus by Davide Balzarotti (https://www.s3.eurecom.fr/~balzarot/notes/top4/)
- ICSE Conference Reports (http://www.icse-conferences.org/reports.html)