PhpDig.net

Old 07-07-2004, 03:23 PM   #1
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
PhpDig Version 1.8.1 Released

Hi. PhpDig version 1.8.1 has been released as a 'minor+++' release. The changes can be found in the Changelog file. Three database tables were added. To upgrade, add the tables to your database, reconfigure the new connect.php and config.php files, and overwrite the old files with the new ones.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 07-08-2004, 01:33 AM   #2
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
Hey - thanks for updating. I've been playing with the new version and it works well, but when I add a new URL with a subdirectory as the root (e.g. http://www.geocities.com/website/) phpdig still only stores the base URL in the database. In your changelog, you mention that in this version you can "Search by site or directory." Is this only a variation in the search function itself, not the spider function?

Anyway, everything else looks really great and all the changes you've made will help this become an even better search engine. Now all we need is multiple spiders (I'll stop bugging you).
Old 07-08-2004, 08:40 AM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
>> "Search by site or directory." Is this only a variation in the search function itself, not the spider function?

Hi. Yes, it's the search itself, not the spider function. What you might try is a limited index first: set links per so that you get a cursory spider, then exclude the directories that you don't want and reindex. This is a roundabout way to limit spidering to certain directories on sites you don't own.

With 1.8.1 there are changes to limit indexing using the links per option, and if set, the extra index info that used to be kept in the tempspider table is now deleted, so it might be the case that multiple spidering is now possible. What happens if you run two 1.8.1 spiders on two different sites, setting links per for each spider?

One other tip is that if you want to stop all spidering processes, just keep clicking the delete button, without selecting a site, until the sites being spidered go from locked to unlocked. Using the browser stop button will not necessarily stop the process on the server, but once the tempspider table is empty, spidering should stop.
Old 07-08-2004, 12:37 PM   #4
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
The multiple spiders seemed to work when I ran two spiders from the web interface at the same time. Both finished correctly - nice! I also started a spider with the exec() command, which ran for a while and then stopped with links still in the temporary table and without unlocking. Most likely this is because I didn't set links per for this spider? This is the command I used:

exec("/usr/bin/php -f /home/search/admin/spider.php $site 2>&1 > /dev/null &");
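
For comparison, here is a minimal sketch of one way to build the same command. It is an illustration under stated assumptions, not a confirmed fix for the stall described above. The helper name build_spider_command and the example URL are hypothetical; the two paths are the ones from the command above. In a shell, "2>&1 > /dev/null" points stderr at the original stdout before stdout is redirected, so stderr still reaches the caller, and the PHP manual notes that a program started with exec() must have its output redirected to keep running in the background. Putting "2>&1" after the file redirect sends both streams to the same place.

```php
<?php
// Hypothetical helper: returns the background spider command as a string.
// The binary and script paths are taken from the command quoted in this post.
function build_spider_command($site, $log = '/dev/null')
{
    // escapeshellarg() quotes the value so shell metacharacters in a URL
    // are passed through literally instead of being interpreted.
    // "> file 2>&1" redirects stdout first, then sends stderr to the same target.
    return '/usr/bin/php -f /home/search/admin/spider.php '
        . escapeshellarg($site)
        . ' > ' . escapeshellarg($log) . ' 2>&1 &';
}

$command = build_spider_command('http://www.example.com/');
echo $command, PHP_EOL;
// exec($command); // uncomment to actually launch the spider
```

Passing a real file path as the second argument instead of /dev/null would keep any error output from the spider for later inspection.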

As for the directory spidering issue, I think I might play around with the code to get it to do what I want, unless you plan on adding this feature to a future version. Thanks for the help.
Old 07-08-2004, 01:13 PM   #5
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
>> I also started a spider with the exec() command, which ran for a while and then stopped with links still in the temporary table and without unlocking. Most likely this is because I didn't set links per for this spider?

Hi. Hmm, not sure on this one. In spider.php is the following:
PHP Code:
if (!isset($linksper) or (int)$linksper > LINKS_MAX_LIMIT) {
    // links per is unset or above the maximum: fall back to a configured limit
    if ($run_mode != 'cgi') {
        $linksper = RELINKS_LIMIT;
    }
    else {
        $linksper = LINKS_MAX_LIMIT;
    }
}

In 1.8.1 the links per is either set via the browser interface or by values in the config file. One of the new tables has links per for each site, but utilizing this table didn't get done for version 1.8.1, so for now, links per will be the same for all sites crawled via shell.
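
To make the fallback easier to follow, here is a small self-contained sketch of the same decision as a function. It assumes the comparison against LINKS_MAX_LIMIT is a greater-than check. The function name resolve_links_per and the limit values 5 and 20 are hypothetical and are not PhpDig defaults.

```php
<?php
// Sketch only: the limits are passed in as arguments instead of being read
// from the config constants RELINKS_LIMIT and LINKS_MAX_LIMIT.
function resolve_links_per($linksper, $run_mode, $relinks_limit, $links_max_limit)
{
    // Assumption: a value is replaced when it is unset or above the maximum.
    if ($linksper === null || (int)$linksper > $links_max_limit) {
        // Runs that are not 'cgi' fall back to the relinks limit;
        // 'cgi' runs fall back to the maximum.
        return ($run_mode != 'cgi') ? $relinks_limit : $links_max_limit;
    }
    // A value within range is kept as given.
    return $linksper;
}

echo resolve_links_per(null, 'cgi', 5, 20), PHP_EOL;
echo resolve_links_per(null, 'browser', 5, 20), PHP_EOL;
echo resolve_links_per(3, 'cgi', 5, 20), PHP_EOL;
```

Under these assumptions, a 'cgi' run that does not set links per always ends up with the same maximum, which would match the note that links per is the same for all sites crawled via shell.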

Anyway, back to your exec issue, I'm not sure why the spider quit. Maybe random noise, maybe not. Anything in your error logs?
Old 07-08-2004, 07:11 PM   #6
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
I ran three execs this time with three different sites, and 2 of the 3 spidered completely without stopping. The third one stopped until I emptied the temporary table, and then it started up again for some reason. I checked the error log but everything looked fine - it simply paused and then picked back up where it left off after I emptied the temp table.

Anyway all three finished perfectly with a little encouragement, so multiple spiders are indeed possible. Thanks!

As for the directory dilemma, I'll post in a new thread where it's more appropriate. Thanks again for the help.
Old 07-08-2004, 07:17 PM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Sounds like some random noise, maybe too many database connections?

>> it simply paused and then picked back up where it left off after I emptied the temp table.

The tempspider table probably wasn't really empty. It may take a few tries to be sure the table is empty, as info is constantly being placed in the table during spidering.