11-30-2004, 02:53 PM | #1 |
Green Mole
Join Date: Nov 2004
Posts: 3
|
Site not updating - head requests?
Hi,
When I try to spider a site again from the admin control panel, no new links are found (even though I know there are some there). The site did successfully index a portion of the site initially, but from that point on, no further pages were indexed, either then or on any later spider attempt. Output:

SITE : http://www.mydomain.com/
Exclude paths :
- @NONE@
<SNIP>
No link in temporary table
links found : 0

My LIMIT_DAYS is set to 0, PHP's allow_url_fopen is On, safe mode is off, and I am trying to respider by clicking the green checkmark next to the root in the update section of the PhpDig control panel. I have also tried a fresh attempt using these instructions from another thread:

# empty all the PhpDig database tables
# delete all files that may be in the temp dir
# delete all files in the text_content dir except keepalive.txt
# run spider.php from a browser

When I telnet to the server, as suggested in an older thread, I do receive a "Connection closed by foreign host.", which seems to suggest that HEAD requests may be a problem. Maybe? So my question: how does the server admin allow HEAD requests, and are there any security implications? My site has over 35,000 pages, but I'm sure that isn't an issue for PhpDig.

Thanks, Dale.

Last edited by guinessec; 11-30-2004 at 03:29 PM. |
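(Editor's note: to rule out HEAD requests independently of PhpDig, a quick PHP check along the lines below sends a bare HEAD request and prints whatever the server sends back. The hostname and path are placeholders. If the connection is dropped or an error status comes back here, that points at the server configuration rather than PhpDig.)

<?php
// Minimal sketch: test whether the web server answers HEAD requests at all.
// www.mydomain.com and "/" are placeholders; substitute a real host and path.
$host = 'www.mydomain.com';
$fp = fsockopen($host, 80, $errno, $errstr, 10);
if (!$fp) {
    die("Connection failed: $errstr ($errno)\n");
}
fwrite($fp, "HEAD / HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
while (!feof($fp)) {
    echo fgets($fp, 128);   // prints the status line and response headers
}
fclose($fp);
?>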
12-01-2004, 12:14 PM | #2 |
Green Mole
Join Date: Nov 2004
Posts: 3
|
Quick update on this thread:
I have managed to force updates using this command from the shell: php -f path/spider.php forceall. As an estimate, it will take about 4 days to crawl my whole site.

I'm also wondering whether I've been looking for a more complicated answer than I need: because I have "<meta name="Revisit-after" content="5 Days">" on all my web pages, would that prevent PhpDig from spidering a second time from the browser admin interface? Thanks, Dale. |
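(Editor's note: a Revisit-after META tag is exactly the kind of thing a normal, non-forced update pass may honor. The snippet below is not PhpDig's actual code, just a rough illustration of that logic; the URL and last-index date are made-up placeholders.)

<?php
// Illustration only (not PhpDig's code): how a "Revisit-after" META tag can
// keep a spider from reindexing a page before the stated interval has passed.
$url          = 'http://www.mydomain.com/index.shtml';   // placeholder URL
$last_indexed = strtotime('2004-11-28');                  // placeholder date of last spidering

$meta = get_meta_tags($url);             // PHP lowercases the meta names
$revisit_days = 0;
if (isset($meta['revisit-after']) && preg_match('/(\d+)/', $meta['revisit-after'], $m)) {
    $revisit_days = (int) $m[1];         // e.g. "5 Days" -> 5
}

if (time() < $last_indexed + $revisit_days * 86400) {
    echo "Too soon: the revisit-after window has not elapsed.\n";
} else {
    echo "OK to respider.\n";
}
?>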
12-14-2004, 11:16 AM | #3 |
Green Mole
Join Date: Aug 2004
Posts: 5
|
Same problem here
I have the same problem. The spider won't index new links in my index.shtml; instead it tries to index other pages that already exist in the index (returning: "File date unchanged"), even though those files are not linked from index.shtml.
I have tried everything... I just can't get it to work right. It's the same whether I use version 1.8.5 or 1.8.4. Also, on the results page I can't get it to display meta tags. The older 1.6.x versions were better on these issues, and I'm thinking of installing an old version of phpDig again. |
12-14-2004, 01:03 PM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
PhpDig tries to follow META revisit-after unless doing a force. The update is for updating what is already there.
- If you don't want a page, click the delete icon.
- If you want to reindex, use the textbox.
- If you want to index across directories, set LIMIT_TO_DIRECTORY to false in the config file.
- If you want it to index many pages, set the search depth to a large number and set links per page to zero.
- If you want to increase the maximum search depth, change the *_MAX_LIMIT constants in the config file.
- If you use an old version of PhpDig, you may find yourself exploited.
- If you want META description and META keywords in search results, set APPEND_TITLE_META to true, again in the config file.
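(Editor's note: the settings above live in PhpDig's config file as PHP constants. A rough sketch follows; the first three names are taken from this thread, while the two *_MAX_LIMIT names and all of the values shown are assumptions, so check your own config.php for the exact spelling before editing.)

<?php
// Sketch of config settings discussed above; values are examples only.
define('LIMIT_TO_DIRECTORY', false);  // false = allow indexing across directories
define('APPEND_TITLE_META', true);    // true = show META description/keywords in results
define('LIMIT_DAYS', 0);              // 0 = no minimum age before a page is respidered

// Assumed constant names for the depth / links-per-page maximums:
define('SPIDER_MAX_LIMIT', 20);       // largest search depth offered in the admin panel
define('LINKS_MAX_LIMIT', 40);        // largest links-per-page value offered
?>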
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
12-14-2004, 10:41 PM | #5 |
Green Mole
Join Date: Aug 2004
Posts: 5
|
I already did all the things you wrote before.
I have APPEND_TITLE_META set to true, DESCRIPTION set to true, and snippets set to false. Still, on the results pages there is no META description; it always displays text from the body instead, the same as the snippets do.

As for updating: I have a very dynamic site (new movies) and I add up to 10 pages every day. In version 1.6.x, all I had to do was update index.shtml, where all the links to newly added movies are; the spider found all the new HTML pages and stopped. In version 1.8.5 the spider doesn't find the new pages, even though the links are in index.shtml, no matter what I try. I have tried the text box (putting the link to index.shtml in), I have LIMIT_TO_DIRECTORY set to false, and I have tried depths from 2 to 20 and links per page set to 0... It doesn't work. The spider always tries to spider some other HTML files that aren't even linked from index.shtml in any way (and were already indexed some time ago). Now, if I put all the new HTML pages in the text box manually, then it works. |
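(Editor's note: one way to rule out a markup problem, outside PhpDig entirely, is to dump the links a spider would actually see in index.shtml. The sketch below uses a placeholder URL and needs allow_url_fopen = On; it fetches the page and prints every href value. If the newly added movie pages don't show up in that list, no spider will find them from index.shtml.)

<?php
// Quick diagnostic: list the href targets exposed by index.shtml.
$url  = 'http://www.mydomain.com/index.shtml';   // placeholder URL
$html = file_get_contents($url);
if ($html === false) {
    die("Could not fetch $url\n");
}
preg_match_all('/href\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);
foreach (array_unique($matches[1]) as $link) {
    echo $link . "\n";                            // each distinct link, one per line
}
?>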
12-15-2004, 04:00 AM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Things used to update like this, but now they are different. Maybe this might help you with META tags. Maybe run the "clean" options too.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Problem updating to 1.8.8 | 1001studio | Troubleshooting | 0 | 04-04-2006 06:53 AM |
error when updating | webblynx | Troubleshooting | 9 | 12-14-2004 02:07 PM |
Taking Requests | Charter | Mod Requests | 26 | 05-04-2004 11:23 AM |
Code Requests | Charter | Feedback & News | 0 | 02-29-2004 12:45 AM |
funny requests | fzxdude | Troubleshooting | 2 | 01-24-2004 10:15 PM |