I was recently looking for an interesting relational dataset for another project and the idea of using the dependencies for every Clojure project on GitHub came up. It turns out that it’s possible to download almost every project.clj using Tentacles, so I decided to…
The most annoying part was dealing with GitHub’s rate limits, but after waiting a few hours I had them all on local disk and was able to play around. I haven’t gotten to dig into the data for the actual project I’m doing, but there were a couple simple queries that I thought were worth sharing.
Most frequently included packages
I was able to download 10770 project.clj files. Here are the 50 most frequently included packages listed in their :dependencies
:
Dependency | Count |
---|---|
org.clojure/clojure-contrib | 1524 |
compojure | 1348 |
hiccup | 743 |
clj-http | 738 |
ring/ring-jetty-adapter | 607 |
cheshire | 558 |
org.clojure/data.json | 552 |
clj-time | 526 |
org.clojure/tools.logging | 490 |
enlive | 444 |
noir | 388 |
ring/ring-core | 375 |
ring | 361 |
org.clojure/tools.cli | 348 |
org.clojure/java.jdbc | 344 |
org.clojure/clojurescript | 339 |
org.clojure/core.async | 235 |
midje | 227 |
org.clojure/math.numeric-tower | 219 |
korma | 206 |
incanter | 202 |
seesaw | 195 |
overtone | 172 |
slingshot | 160 |
quil | 158 |
com.taoensso/timbre | 150 |
http-kit | 149 |
ring/ring-devel | 145 |
org.clojure/math.combinatorics | 145 |
org.clojure/core.logic | 138 |
environ | 132 |
aleph | 132 |
log4j | 131 |
ch.qos.logback/logback-classic | 125 |
org.clojure/tools.nrepl | 124 |
congomongo | 124 |
com.datomic/datomic-free | 123 |
com.novemberain/monger | 123 |
lib-noir | 121 |
org.clojure/core.match | 118 |
ring/ring-json | 111 |
clojure | 110 |
org.clojure/data.xml | 110 |
log4j/log4j | 109 |
mysql/mysql-connector-java | 109 |
postgresql/postgresql | 107 |
org.clojure/data.csv | 101 |
org.clojure/tools.trace | 98 |
org.clojure/tools.namespace | 92 |
ring-server | 92 |
I think it makes a nice hit-list of projects to check out!
A couple interesting things jumped out at me:
- 12.5% of Clojure projects on GitHub are using Compojure. Impressive.
- congomongo, com.novemberain/monger, com.datomic/datomic-free, mysql/mysql-connector-java, and postgresql/postgresql are all clustered together in the low 100’s.
Most frequently applied licenses
Just over half of the project.clj’s don’t contain a :license
. Here are the most popular:
License | Count |
---|---|
EPL | 4430 |
MIT | 336 |
Apache | 106 |
BSD | 92 |
GPL | 90 |
LGPL | 25 |
CC | 21 |
WTFPL | 18 |
AGPL | 11 |
Mozilla | 11 |
The EPL’s dominance doesn’t come as a surprise, given Clojure’s use of it for the core libraries.
23 projects have “WTF” or “fuck” in their license string:
License | Count |
---|---|
WTFPL | 18 |
Do What The Fuck You Want To Public License | 3 |
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE Version 2 | 1 |
All Rights Reserved Muthafucka | 1 |
Conclusion
I’d like to share a mirror of just the project.clj files wrapped up in a single download, but I want to be conscientious of the variety of licenses. I’ll clean up the code for pulling and summarizing all this data soon so others can play with it. In the meantime, feel free to suggest other analyses that could be done on these…