dhtcrawler2

mirror of https://github.com/btdig/dhtcrawler2.git synced 2025-02-21 20:59:03 +00:00

dhtcrawler is a DHT crawler written in Erlang. It can join a DHT network and crawl many P2P torrents. The program saves all torrent info into a database and provides an HTTP interface to search for a torrent by keyword.

Kevin Lynx c85e216951 fix cache_indexer		2013-07-16 22:24:55 +08:00
deps	integrate cache_index to hash_reader, default is disabled	2013-07-15 21:27:01 +08:00
ebin	fix cache_indexer	2013-07-16 22:24:55 +08:00
priv	add rmmseg, to segment chinese texts, add a tool to convert the existing torrent file names	2013-07-13 11:45:55 +08:00
tools/db-replset	add database repl-set scripts	2013-07-02 17:42:35 +08:00
www	separate embedded css style to page.temp	2013-07-08 12:43:27 +08:00
create_bin.bat	add text segment config for hash_reader (text_seg), the default is simple	2013-07-13 22:27:17 +08:00
download_sync.bat	add cache_indexer progress displaying	2013-07-15 13:39:41 +08:00
HISTORY.md	update torrent importer, save process state so that can launch next time	2013-07-08 22:16:26 +08:00
README.md	change inc_announce response fileds to empty (only _id)	2013-07-09 21:43:26 +08:00
win_start_cache_indexer.bat	fix cache_indexer	2013-07-16 22:24:55 +08:00
win_start_crawler.bat	first commit	2013-07-01 22:42:14 +08:00
win_start_hash.bat	change hash_reader shell startup script	2013-07-09 21:50:12 +08:00
win_start_http.bat	change http start to pass dbpool size	2013-07-09 22:38:34 +08:00
win_start_import_tors.bat	update torrent importer, save process state so that can launch next time	2013-07-08 22:16:26 +08:00
win_start_torcache.bat	add torrent downloader	2013-07-05 21:07:35 +08:00

README.md

dhtcrawler2

dhtcrawler2 is an extended version of dhtcrawler. Its crawling speed is much improved, and it is much more stable.

This git branch maintains pre-compiled Erlang files so that you can start dhtcrawler2 directly. You don't need to compile it yourself; just download it and run it to collect torrents and search for a torrent by keyword.

Enjoy it!

Usage

Install Erlang R16B or newer

Download MongoDB and start it first:

  mongod --dbpath your-database-path --setParameter textSearchEnabled=true

Start the crawler; on Windows, just double-click win_start_crawler.bat
Start hash_reader; on Windows, just double-click win_start_hash.bat
Start httpd; on Windows, just double-click win_start_http.bat
Wait several minutes, then check localhost:8000
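
The last step ("wait several minutes") can be automated with a small polling helper. The following is only a sketch for a POSIX shell with curl installed; the function name `wait_for_http` and its retry and delay defaults are illustrative assumptions, not part of dhtcrawler2.

```shell
# Hypothetical helper (not shipped with dhtcrawler2): poll a URL until the
# server answers or the retry budget is used up.
#   $1 = URL to poll
#   $2 = number of attempts (assumed default: 30)
#   $3 = seconds to sleep between attempts (assumed default: 10)
wait_for_http() {
    url=$1
    tries=${2:-30}
    delay=${3:-10}
    i=1
    while [ "$i" -le "$tries" ]; do
        # -s: no progress output, -o /dev/null: discard the body.
        # Any HTTP response at all counts as "the server is up".
        if curl -s -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, `wait_for_http http://localhost:8000/ 30 10` would retry for roughly five minutes and return a non-zero status if the HTTP front-end never answers.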

You can also compile the source code and run it manually. The source code is in the src branch of this repo.

You can also find more technical information on my blog (in Chinese): codemacro.com

Source code

dhtcrawler is totally open source and can be used for any purpose, but please keep my name and copyright notice on it. You can check out the dhtcrawler2 source code in the src branch of this git repo.

Config

Most config values are in priv/dhtcrawler.config; this file is generated automatically the first time you run dhtcrawler. The other config values are passed as arguments to Erlang functions. In most cases you don't need to change these config values, except for the network addresses.

MongoDB replica set

This is not specific to dhtcrawler, only to MongoDB, so try to figure it out yourself.

Another http front-end

Yes, of course you can write another HTTP front-end UI based on the torrent database. If you're interested, I can help you with the database format.