66 Commits

Author SHA1 Message Date
Kevin Lynx
bc00e03b33 fix sphinx xml utf8 related issue, filter these unicode control characters, only backup delta file if the operation failed 2013-08-01 23:20:28 +08:00
Kevin Lynx
5e9c36f787 add sphinx search stats 2013-07-31 22:05:53 +08:00
Kevin Lynx
0bdac737ad add a simple page navigation for sphinx_search 2013-07-31 20:57:35 +08:00
Kevin Lynx
40f2bae9b8 fix sphinx_build memory leak bug, caused by mongo_cursor 2013-07-31 12:17:21 +08:00
Kevin Lynx
149f10724e sphinx worker call infinity 2013-07-30 22:43:02 +08:00
Kevin Lynx
18edffc2a1 fix some sphinx related bugs, now it can be used to build sphinx index, still in experiment stage, add `giza' library to query sphinx in http_fontend 2013-07-30 22:17:31 +08:00
Kevin Lynx
0c67e46e5c fix daterange issue which did not record only today's torrents; now it only shows the torrents inserted today 2013-07-23 22:16:40 +08:00
Kevin Lynx
28acbdaa45 adjust http stats display 2013-07-23 21:45:06 +08:00
Kevin Lynx
94a2ac34bc system stats adjust, add more stats to http front-end 2013-07-23 21:41:08 +08:00
Kevin Lynx
6fbd0cb218 add a new force to string log func, add log to httpd, it can log unicode characters to logfiles 2013-07-22 22:58:07 +08:00
Kevin Lynx
928798ed28 complete all http uri to json api 2013-07-22 21:23:44 +08:00
Kevin Lynx
070e97e826 add hash filter stats to the new hash_reader 2013-07-21 22:10:05 +08:00
Kevin Lynx
3864940905 fix hash_download startup bug 2013-07-21 21:32:19 +08:00
Kevin Lynx
67ff84adaa fix hash_download_cache startup bug 2013-07-21 21:30:28 +08:00
Kevin Lynx
e5b35e58ed NOTE: rewrite hash_reader, config changed, dht_hash database changed, require to remove existed dht_hash database 2013-07-21 21:13:05 +08:00
Kevin Lynx
72c35be437 change default config 2013-07-21 09:24:33 +08:00
Kevin Lynx
d00c84135b fix cache_indexer message leak bug 2013-07-20 19:37:41 +08:00
Kevin Lynx
d9deb8dfc9 add simple `get' json api, fix http search space decode 2013-07-20 10:57:27 +08:00
Kevin Lynx
ba92e9cd77 fix hash_date 2013-07-19 21:31:36 +08:00
Kevin Lynx
28fe69d141 hash_date only record today new inserted torrents 2013-07-19 21:00:37 +08:00
Kevin Lynx
45ca7d584e config max download task per hash-reader 2013-07-18 22:03:47 +08:00
Kevin Lynx
35a131fa8f nothing 2013-07-18 14:03:34 +08:00
Kevin Lynx
928fc86934 recompile 2013-07-18 13:17:06 +08:00
Kevin Lynx
f5655ba0f3 fix hash_reader stop working bug 2013-07-18 12:38:31 +08:00
Kevin Lynx
810464330d NOTE: big change! Need to delete config files. The crawler will cache hashes and merge duplicated queries. 2013-07-17 22:55:35 +08:00
Kevin Lynx
629e92115d fix cache_indexer download bug 2013-07-17 19:11:01 +08:00
Kevin Lynx
ff338f2c9b fix cache_indexer state not saved correctly 2013-07-16 22:49:08 +08:00
Kevin Lynx
1ed66b3863 fix memory leak for hash_reader (message queue keep increasing), set http search result to 50 2013-07-16 21:44:16 +08:00
Kevin Lynx
ff85af0806 try to fix high cpu usage when no hash and no wait_download 2013-07-15 23:01:26 +08:00
Kevin Lynx
c5db7ae966 restore `top' cache 2013-07-15 22:14:09 +08:00
Kevin Lynx
31a1bd04c0 to avoid there's no hash and no wait_download, the hash reader may stop working 2013-07-15 22:04:41 +08:00
Kevin Lynx
d81d6a2fd2 integrate cache_index to hash_reader, default is disabled 2013-07-15 21:27:01 +08:00
Kevin Lynx
0f24428faa add cache_indexer progress displaying 2013-07-15 13:39:41 +08:00
Kevin Lynx
5153568dc9 add cache_indexer, not integrated now, see src/cache_indexer/readme.md 2013-07-14 22:59:47 +08:00
Kevin Lynx
0579304407 change hash_reader read hash/wait_download using findAndModify, to avoid the read/delete two operations 2013-07-14 15:33:46 +08:00
Kevin Lynx
86665cb93b only build torrent name indexes 2013-07-14 10:00:38 +08:00
Kevin Lynx
a1fc6ec3c0 add text segment config for hash_reader (text_seg), the default is simple 2013-07-13 22:27:17 +08:00
Kevin Lynx
269584c708 add rmmseg, to segment chinese texts, add a tool to convert the existing torrent file names 2013-07-13 11:45:55 +08:00
Kevin Lynx
676d354515 disable numid for sphinx default 2013-07-12 10:27:23 +08:00
Kevin Lynx
6ddb9447ac Merge branch 'master' of github.com:kevinlynx/dhtcrawler2
Conflicts:
	ebin/dhtcrawler.app
	ebin/tor_download.beam
2013-07-12 09:22:09 +08:00
Kevin Lynx
f5965304f7 add torrent download stats for hash reader 2013-07-11 22:38:39 +08:00
Kevin Lynx
1320002674 integrate torrent downloader monitor, change http today_top to show the today request count, instead total request count, remove ibrowse initial config 2013-07-11 22:01:47 +08:00
Kevin Lynx
5a0b21c7b0 change http top query, add a new database to map date to hashes, to support query by date range 2013-07-11 20:35:16 +08:00
Kevin Lynx
cda02229ad add tor_download req monitor, not integrated yet 2013-07-11 17:50:32 +08:00
Kevin Lynx
42b32810c6 torbuilder(importer) fix badarith bug when there're invalid name torrent files 2013-07-11 09:06:58 +08:00
Kevin Lynx
4adc0a7df4 change http start to pass dbpool size 2013-07-09 22:38:34 +08:00
Kevin Lynx
164c0f0f21 change inc_announce response fields to empty (only _id) 2013-07-09 21:43:26 +08:00
Kevin Lynx
aa7e8bb18a use safe insert in torrent importer 2013-07-09 16:58:50 +08:00
Kevin Lynx
40dbbeb581 fix hash_reader stop working bug when there's only wait_download hash 2013-07-09 15:06:13 +08:00
Kevin Lynx
03f98c35be update torrent importer, save process state so that can launch next time 2013-07-08 22:16:26 +08:00