03-09-2004, 09:00 AM | #1 |
Green Mole
Join Date: Mar 2004
Posts: 9
|
0 links found
Hi, I applied the patch from the http://www.phpdig.net/showthread.php?threadid=573 thread, and I'm still getting 0 links found. Here is the stdout from the command line:
%php -f spider.php forceall
47472: old priority 0, new priority 18
Spidering in progress...
-----------------------------
SITE : http://maggiv8.funpic.de/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
-----------------------------
SITE : http://rbhs.ath.cx/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
-----------------------------
SITE : http://localhost/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
Optimizing tables...
Indexing complete !
%
I'm running FreeBSD 4.9 with Apache/1.3.29 (Unix) and PHP/4.3.4. Any suggestions? |
03-09-2004, 10:15 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi xibalba, and welcome to PhpDig.net!
Perhaps something in this thread might help. Below is output using search depth one:
SITE : http://maggiv8.funpic.de/
Exclude paths :
- @NONE@
1:http://maggiv8.funpic.de/ (time : 00:00:15)
+ + +
level 1...
2:http://maggiv8.funpic.de/www/ (time : 00:00:27)
3:http://maggiv8.funpic.de/search.php (time : 00:00:33)
4:http://maggiv8.funpic.de/phpinfo.php (time : 00:00:41)
No link in temporary table
--------------------------------------------------------------------------------
links found : 4
http://maggiv8.funpic.de/
http://maggiv8.funpic.de/www/
http://maggiv8.funpic.de/search.php
http://maggiv8.funpic.de/phpinfo.php
Optimizing tables...
Indexing complete !

SITE : http://rbhs.ath.cx/
Exclude paths :
- @NONE@
1:http://rbhs.ath.cx/ (time : 00:00:09)
+ + + + +
level 1...
2:http://rbhs.ath.cx/uebimiau/ (time : 00:00:23)
3:http://rbhs.ath.cx/webalizer/ (time : 00:00:29)
4:http://rbhs.ath.cx/moregroupware/ (time : 00:00:35)
5:http://rbhs.ath.cx/phpMyAdmin/ (time : 00:00:41)
6:http://rbhs.ath.cx/phpSysInfo/ (time : 00:00:49)
No link in temporary table
--------------------------------------------------------------------------------
links found : 6
http://rbhs.ath.cx/
http://rbhs.ath.cx/uebimiau/
http://rbhs.ath.cx/webalizer/
http://rbhs.ath.cx/moregroupware/
http://rbhs.ath.cx/phpMyAdmin/
http://rbhs.ath.cx/phpSysInfo/
Optimizing tables...
Indexing complete !
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
03-09-2004, 11:23 AM | #3 |
Green Mole
Join Date: Mar 2004
Posts: 9
|
search depth
Should I be careful with how high I set the search depth?
Even with the search depth set to one for both freebsd.org and rbhs.ath.cx, I get the following output:
%php -f spider.php forceall
47723: old priority 0, new priority 18
Spidering in progress...
-----------------------------
SITE : http://rbhs.ath.cx/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
-----------------------------
SITE : http://freebsd.org/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
Optimizing tables...
Indexing complete !
%
Perhaps something is wrong in my configuration. I read over the other thread you linked me to and couldn't find anything in there that would seem to have fixed this problem.
Weird... it seems to crawl correctly if I add a URI via the command line:
%php -f spider.php http://maggiv8.funpic.de/
47732: old priority 0, new priority 18
Spidering in progress...
-----------------------------
SITE : http://maggiv8.funpic.de/
Exclude paths :
- @NONE@
+1:http://maggiv8.funpic.de/ (time : 00:00:07)
+ + +
level 1...
+2:http://maggiv8.funpic.de/phpinfo.php (time : 00:00:28)
+3:http://maggiv8.funpic.de/search.php (time : 00:00:35)
+4:http://maggiv8.funpic.de/www/ (time : 00:00:40)
+ + + + + + + + + + + +
level 2...
.....
Last edited by xibalba; 03-09-2004 at 11:29 AM. |
03-09-2004, 11:39 AM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. The forceall option is meant to try and force the reindex of sites already indexed regardless of the default days before reindex. If the sites haven't been previously indexed, forceall won't index them.
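Put as commands, the order would be roughly as follows. This is only a minimal sketch, assuming the two invocations already shown in this thread (a URL argument for the first crawl, forceall for later reindexing). The run, first_index and reindex_all helpers and the DRY_RUN variable are hypothetical names for illustration, not part of PhpDig; with DRY_RUN=1 (the default here) the script only prints what it would run.

```shell
#!/bin/sh
# Sketch only: run, first_index, reindex_all and DRY_RUN are hypothetical helpers,
# not part of PhpDig. The spider.php invocations are the ones used in this thread.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands instead of executing them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# First crawl: pass each new site's URL so it gets indexed at least once.
first_index() {
    for url in "$@"; do
        run php -f spider.php "$url"
    done
}

# Later runs: forceall reindexes sites that have already been indexed.
reindex_all() {
    run php -f spider.php forceall
}

first_index http://maggiv8.funpic.de/ http://rbhs.ath.cx/
reindex_all
```

Set DRY_RUN=0 from the directory that holds spider.php to actually run the crawls.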
03-09-2004, 11:50 AM | #5 |
Green Mole
Join Date: Mar 2004
Posts: 9
|
Thanks for the help, Charter. On a tangent, is it possible to set up PhpDig in a distributed fashion?
Say I want to crawl a huge domain, www.example.com, with multiple machines crawling that domain. Is there currently a way to set PhpDig up like this? |
03-09-2004, 12:45 PM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Some users have run spider.php on different (sub)domains at the same time using the same database tables without incident. However, PhpDig doesn't specifically account for multithreading issues.
If you want to try running PhpDig in a distributed fashion on the same domain, perhaps set the following in the config.php file, where X is one or two: PHP Code:
Code:
prompt> php -f spider.php http://www.domain.com/dir1/ &
prompt> php -f spider.php http://www.domain.com/dir2/ &
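The same idea could be wrapped in a small script. This is a rough sketch, assuming one spider.php process per directory as in the commands above. BASE, the directory names, the crawl and crawl_dirs helpers, and the DRY_RUN switch are placeholders for illustration and not part of PhpDig. PhpDig does not coordinate the processes itself, so how cleanly overlapping directories are handled is untested here.

```shell
#!/bin/sh
# Sketch only: BASE, the directory names, crawl, crawl_dirs and DRY_RUN are
# hypothetical; the spider.php invocation is the one from the post above.
BASE="http://www.domain.com"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands instead of executing them

crawl() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "php -f spider.php $1"
    else
        php -f spider.php "$1"
    fi
}

# Start one spider per directory in the background, then wait for all of them.
crawl_dirs() {
    for dir in "$@"; do
        crawl "$BASE/$dir/" &
    done
    wait   # returns once every background spider has exited
}

crawl_dirs dir1 dir2
```

Because the spiders run in the background, their output may interleave in any order; redirecting each one to its own log file would keep the runs readable.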