David A. Wheeler's Blog

Tue, 18 Feb 2020

Census II Report on Open Source Software

The Linux Foundation and the Laboratory for Innovation Science at Harvard have just released a new report: “Vulnerabilities in the Core: Preliminary Report and Census II of Open Source Software” by Frank Nagle, Jessica Wilkerson, James Dana, and Jennifer L. Hoffman, 2020-02-14. Just click on “Download Report” when you get there. A summary is available from Harvard. Here’s a quick introduction to the paper.

Their long-term goal is to figure out what FOSS packages are most critical through data analysis. This turns out to extremely difficult, as discussed in the paper, and they expressly state that their current results “cannot - and do not purport to - be a definitive claim of which FOSS packages are the most critical”. That said, they have developed a method as a “proof of concept” to start working towards that answer.

They describe their approach in detail. Here’s a quick summary. First they use data from Software Composition Analysis (SCAs) and application security companies, including Snyk and Synopsys Cybersecurity Research Center, to identify components used in actual systems. They then use dependency analysis (via libraries.io) to identify indirect (transitive) dependencies. Finally, they averaged the Z-scores to provide normalized rankings.

Here are some key lessons learned from the report (Chapter 7):

Also, here’s an interesting nugget: “These statistics illustrate an interesting pattern: a high correlation between being employed and being a top contributor to one of the FOSS packages identified as most used.”

I’m on the CII Steering Committee, so I did comment on an earlier draft, but credit goes to the actual authors.

path: /oss | Current Weblog | permanent link to this entry