Ruby Crawling Links


An open crawling project

https://github.com/commoncrawl/commoncrawl

Web-crawling framework

https://github.com/chriskite/anemone/tree/master

Automate filling out forms

http://mechanize.rubyforge.org/Mechanize.html

https://www.ruby-toolbox.com/projects/cobweb

A super lightweight DSL crawler

https://github.com/felipecsl/wombat

Distributed computing

http://hadoop.apache.org

https://github.com/infochimps-labs/wukong

http://stackoverflow.com/a/4981595

Headless WebKit, scriptable with a JavaScript API (use this to navigate JavaScript-based sites)

http://phantomjs.org/

http://zombie.labnotes.org/
