PhpDig.net

Old 11-30-2004, 02:53 PM   #1
guinessec
Green Mole
 
Join Date: Nov 2004
Posts: 3
Site not updating - head requests?

Hi,

When I try to spider a site again from the admin control panel, no new links are found (even though I know there are some there).
PhpDig did successfully index a portion of the site initially, but from that point on no further pages were indexed, either then or on any later spider attempt.

Output:
SITE : http://www.mydomain.com/
Exclude paths :
- @NONE@
<SNIP>
No link in temporary table
links found : 0


My 'LIMIT_DAYS' is set to 0, PHP's allow_url_fopen is set to On, and safe_mode is off. I am trying to respider by clicking on the 'green checkmark' next to the 'root' in the update section of the PhpDig control panel.

I have also tried a fresh attempt by using these instructions from another thread:

# empty all the PhpDig database tables
# delete all files that may be in the temp dir
# delete all files in the text_content dir except keepalive.txt
# run spider.php from a browser
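
The two file-cleanup steps in that list could be scripted roughly as follows. This is only a sketch: the function name `clean_phpdig_files` is made up for illustration, both directory paths must be supplied by the caller because they depend on the install, and the database step is left out since it needs the site's own table names and credentials.

```shell
#!/bin/sh
# Sketch of the "delete files" steps only. Hypothetical helper, not part of PhpDig.
# $1 = path to the temp dir, $2 = path to the text_content dir (both assumptions
# about where a given install keeps them).
clean_phpdig_files() {
    temp_dir="$1"
    text_dir="$2"
    # Refuse to run if either directory is missing, rather than guess.
    [ -d "$temp_dir" ] && [ -d "$text_dir" ] || return 1
    # Remove every regular file under the temp dir.
    find "$temp_dir" -type f -exec rm -f {} +
    # Remove every regular file under text_content except keepalive.txt.
    find "$text_dir" -type f ! -name keepalive.txt -exec rm -f {} +
}
```

It would be called with the two real paths, for example `clean_phpdig_files path/to/temp path/to/text_content`, before running spider.php again.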

When I try to telnet, as suggested in an older thread, I receive "Connection closed by foreign host.", which seems to suggest that HEAD requests may be a problem. Er, maybe?
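
For reference, a HEAD check of this kind can be sketched as below. Note that telnet also prints "Connection closed by foreign host." when an HTTP/1.0 server simply closes the connection after answering, so the lines the server sends before that message (status line and headers, or nothing at all) are what matter. The function name `head_request` is hypothetical, and the host shown in the comments is the placeholder from the spider output above.

```shell
#!/bin/sh
# Print the raw HEAD request that would be typed into a telnet session.
# $1 = host name, $2 = path (defaults to /). Hypothetical helper for illustration.
head_request() {
    host="$1"
    path="${2:-/}"
    # HTTP lines end in CRLF; an empty line ends the request.
    printf 'HEAD %s HTTP/1.0\r\nHost: %s\r\n\r\n' "$path" "$host"
}

# Possible uses (both need network access; availability of nc is an assumption):
#   head_request www.mydomain.com | nc www.mydomain.com 80
#   curl -I http://www.mydomain.com/     # curl sends a HEAD request with -I
```

If the server answers with a status line such as `HTTP/1.1 200 OK` followed by headers, HEAD requests are being served.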

So my question is: how does the server admin allow HEAD requests, and are there any security implications?

My site has over 35000 pages, but I'm sure that isn't an issue for phpdig.

Thanks,

Dale.

Last edited by guinessec; 11-30-2004 at 03:29 PM.
Old 12-01-2004, 12:14 PM   #2
guinessec
Green Mole
 
Join Date: Nov 2004
Posts: 3
Quick update on this thread:

I have managed to force updates using this command from the shell:

php -f path/spider.php forceall

As an estimate, it will take about 4 days to crawl my whole site.

I was also thinking that maybe I've been searching for a more complicated answer than my query needs.

Because I have "<meta name="Revisit-after" content="5 Days">" on all my web pages, would this prevent phpdig from spidering a second time from the browser admin interface?
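
For illustration only, a crawler that honors such a tag has to read the interval out of the page first. The sketch below is not PhpDig's own code; the function name `revisit_after_days` is made up, and it assumes the tag is written with double quotes and the attribute order shown above.

```shell
#!/bin/sh
# Read HTML on stdin and print the number of days from the first
# Revisit-after meta tag, or nothing if no such tag is found.
# Hypothetical helper; assumes name="..." comes before content="...".
revisit_after_days() {
    grep -i -o '<meta[^>]*name="revisit-after"[^>]*>' |
        head -n 1 |
        sed -n 's/.*content="\([0-9][0-9]*\)[^"]*".*/\1/p'
}

# Possible use against a live page (needs network access):
#   curl -s http://www.mydomain.com/ | revisit_after_days
```

A crawler could compare that number with the days elapsed since the page was last indexed and skip the page when the interval has not yet passed.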

Thanks,

Dale.
Old 12-14-2004, 11:16 AM   #3
darjanp
Green Mole
 
Join Date: Aug 2004
Posts: 5
Same problem here

I have the same problem. The spider won't index new links in my index.shtml; instead it tries to index other pages that already exist in the index (returning: File date unchanged), even though these files are not linked from index.shtml.

I have tried everything... I just can't get it to work right.
It's the same whether I use version 1.8.5 or 1.8.4.

Also, I can't get the results page to display meta tags.

Older 1.6.x versions were better on these issues, and I'm thinking of installing an old version of PhpDig again.
Old 12-14-2004, 01:03 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
PhpDig tries to follow META revisit-after unless doing a force. The update is for updating what is already there. If you don't want a page, click the delete icon. If you want to reindex, use the textbox. If you want to index across directories, set LIMIT_TO_DIRECTORY to false in the config file. If you want it to index many pages, set search depth to a large number and set links per to zero. If you want to increase the max search depth number, change the *_MAX_LIMIT constants in the config file. If you use an old version of PhpDig, you may find yourself exploited. If you want META description and META keywords in search results, set APPEND_TITLE_META to true, again in the config file.
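
Before changing any of these, it can help to see what the config file currently contains. The sketch below just searches a file for the constant names mentioned in this thread; the function name `show_phpdig_settings` is hypothetical, and the path to the config file must be supplied by the caller since it depends on the install.

```shell
#!/bin/sh
# Print, with line numbers, every line of the given config file that mentions
# one of the constants named in this thread. Hypothetical helper.
# $1 = path to the PhpDig config file.
show_phpdig_settings() {
    config_file="$1"
    grep -n -E 'LIMIT_DAYS|LIMIT_TO_DIRECTORY|APPEND_TITLE_META|_MAX_LIMIT' "$config_file"
}
```

The output shows each matching line as `number:text`, which makes it easy to find the setting to edit.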
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 12-14-2004, 10:41 PM   #5
darjanp
Green Mole
 
Join Date: Aug 2004
Posts: 5
I already did all the things you wrote.
I have APPEND_TITLE_META set to true, DESCRIPTION set to true, and snippets set to false.
Still, on the results pages there is no META description; instead it always displays text from the body, the same as the snippets do.

As for updating... I have a very dynamic page (new movies) and I add up to 10 pages every day. In version 1.6.x, all I had to do was update index.shtml, where all the links to newly added movies are. The spider found all the new HTML pages and stopped.

In version 1.8.5 the spider doesn't find new pages, even if the links are on index.shtml, no matter which way I try. I tried the text box (putting the link to index.shtml in), I have LIMIT_TO_DIRECTORY set to false, and I have tried depth from 2 to 20 and link depth 0... It doesn't work. The spider always tries to spider some other HTML pages, which are not linked from index.shtml in any way (and which were already indexed some time ago).

Now, if I put all the new HTML pages in the text box manually, then it works.
Old 12-15-2004, 04:00 AM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Things used to update like this but now they are different. Maybe this might help you with META tags. Maybe run the "clean" options too.