Every project.clj

I was recently looking for an interesting relational dataset for another project, and the idea of using the dependencies of every Clojure project on GitHub came up. It turns out that it’s possible to download almost every project.clj using Tentacles, so I decided to…

The most annoying part was dealing with GitHub’s rate limits, but after waiting a few hours I had them all on local disk and was able to play around. I haven’t yet dug into the data for the actual project I’m doing, but there were a couple of simple queries that I thought were worth sharing.

Most frequently included packages

I was able to download 10770 project.clj files. Here are the 50 most frequently included packages listed in their :dependencies:

Dependency Count
org.clojure/clojure-contrib 1524
compojure 1348
hiccup 743
clj-http 738
ring/ring-jetty-adapter 607
cheshire 558
org.clojure/data.json 552
clj-time 526
org.clojure/tools.logging 490
enlive 444
noir 388
ring/ring-core 375
ring 361
org.clojure/tools.cli 348
org.clojure/java.jdbc 344
org.clojure/clojurescript 339
org.clojure/core.async 235
midje 227
org.clojure/math.numeric-tower 219
korma 206
incanter 202
seesaw 195
overtone 172
slingshot 160
quil 158
com.taoensso/timbre 150
http-kit 149
ring/ring-devel 145
org.clojure/math.combinatorics 145
org.clojure/core.logic 138
environ 132
aleph 132
log4j 131
ch.qos.logback/logback-classic 125
org.clojure/tools.nrepl 124
congomongo 124
com.datomic/datomic-free 123
com.novemberain/monger 123
lib-noir 121
org.clojure/core.match 118
ring/ring-json 111
clojure 110
org.clojure/data.xml 110
log4j/log4j 109
mysql/mysql-connector-java 109
postgresql/postgresql 107
org.clojure/data.csv 101
org.clojure/tools.trace 98
org.clojure/tools.namespace 92
ring-server 92

I think it makes a nice hit-list of projects to check out!
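For anyone who wants to try a similar tally, here is a rough sketch in Python. It is not the code used for this post. It assumes each project.clj is already in memory as a string, reads only the first top-level :dependencies vector with plain text matching (so brackets inside comments or strings would confuse it), and counts a package once per project. The names dependency_names and top_dependencies are made up for illustration.

```python
import re
from collections import Counter

def dependency_names(project_text):
    """Return the names listed in the first :dependencies vector of one project.clj."""
    start = project_text.find(":dependencies")
    if start == -1:
        return []
    open_idx = project_text.find("[", start)
    if open_idx == -1:
        return []
    # Walk forward to the bracket that closes the :dependencies vector.
    depth = 0
    end = len(project_text)
    for i in range(open_idx, len(project_text)):
        ch = project_text[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                end = i + 1
                break
    body = project_text[open_idx:end]
    # Each entry looks like [group/artifact "version" ...]; keep only the name.
    return re.findall(r'\[\s*([^\s\[\]"]+)\s+"', body)

def top_dependencies(project_texts, n=50):
    """Count how many projects list each dependency and return the n most common."""
    counts = Counter()
    for text in project_texts:
        # Assumption: a package listed twice in one project counts once.
        counts.update(set(dependency_names(text)))
    return counts.most_common(n)
```

Calling top_dependencies on the list of file contents gives (name, count) pairs in descending order of count. A proper Clojure reader would be more robust than the bracket matching used here.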

A couple of interesting things jumped out at me:

  1. 12.5% of the project.clj files I downloaded list Compojure as a dependency. Impressive.
  2. congomongo, com.novemberain/monger, com.datomic/datomic-free, mysql/mysql-connector-java, and postgresql/postgresql are all clustered together in the low 100s.
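The Compojure share comes straight from the numbers in the table. A small Python check, where share_of_files is just an illustrative helper name:

```python
def share_of_files(count, total):
    """Percentage of downloaded project.clj files that list a given package."""
    return 100 * count / total

TOTAL_FILES = 10770     # project.clj files downloaded
COMPOJURE_COUNT = 1348  # files listing compojure in :dependencies

print(f"{share_of_files(COMPOJURE_COUNT, TOTAL_FILES):.1f}%")  # prints 12.5%
```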

Most frequently applied licenses

Just over half of the project.clj files don’t contain a :license. Among those that do, here are the most popular:

License Count
EPL 4430
MIT 336
Apache 106
BSD 92
GPL 90
LGPL 25
CC 21
WTFPL 18
AGPL 11
Mozilla 11

The EPL’s dominance doesn’t come as a surprise, given Clojure’s use of it for the core libraries.

23 projects have “WTF” or “fuck” in their license string:

License Count
WTFPL 18
Do What The Fuck You Want To Public License 3
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE Version 2 1
All Rights Reserved Muthafucka 1
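License names in project.clj are free-form strings, so producing tables like these means grouping them somehow. Below is one possible sketch in Python, assuming the license name has already been pulled out as a string. The keyword rules, the LICENSE_RULES table, and both function names are my own guesses for illustration, not the grouping actually used for the counts above.

```python
import re

# Hypothetical keyword rules. Order matters: the LGPL and AGPL names
# contain "General Public License", so they are checked before GPL.
LICENSE_RULES = [
    ("LGPL", r"\blgpl|lesser general public"),
    ("AGPL", r"\bagpl|affero"),
    ("GPL", r"\bgpl|general public license"),
    ("EPL", r"\bepl\b|eclipse"),
    ("MIT", r"\bmit\b"),
    ("Apache", r"apache"),
    ("BSD", r"\bbsd\b"),
    ("CC", r"\bcc\b|creative commons"),
    ("WTFPL", r"\bwtfpl\b"),
    ("Mozilla", r"mozilla|\bmpl\b"),
]

def license_family(name):
    """Map a free-form license name to a family, or None if no rule matches."""
    lowered = name.lower()
    for family, pattern in LICENSE_RULES:
        if re.search(pattern, lowered):
            return family
    return None

def mentions_wtf(name):
    """True if the license string contains "WTF" or "fuck", ignoring case."""
    return re.search(r"wtf|fuck", name, re.IGNORECASE) is not None
```

Feeding every license name through license_family and counting the results with collections.Counter would give a table shaped like the first one. Filtering with mentions_wtf gives the second.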

Conclusion

I’d like to share a mirror of just the project.clj files wrapped up in a single download, but I want to be mindful of the variety of licenses. I’ll clean up the code for pulling and summarizing all this data soon so others can play with it. In the meantime, feel free to suggest other analyses that could be done on these…
