|
03-30-2004, 04:54 AM | #1 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
No link in temporary table (yet another one)
Yet another one:
SITE : http://cordoba.ugent.be/
Exclude paths :
- @NONE@
1:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/ (time : 00:00:06)
No link in temporary table
--------------------------------------------------------------------------------
links found : 1
http://cordoba.ugent.be/LW02AC/document/Refs/Theses/
Optimizing tables...
Indexing complete !

Charter, I give up for now. I've tried a couple of suggestions from other similar posts, I've re-installed 180 from scratch. Nothing helps!
This is a site that I've indexed before, with no problems. I've changed the PHP script, I admit. Now it won't index...
http://cordoba.ugent.be/LW02AC/document/Refs/Theses/
__________________
René Haentjens, Ghent University |
03-30-2004, 05:08 AM | #2 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
Here's the result with print $answer."<br>\n"; - no Forbiddens.
I also tried the hosts file modifications (some at least) and the new robot_functions, and I checked tempspider: it's empty...

Spidering in progress...
HTTP/1.1 404 Not Found
HTTP/1.1 200 OK
Date: Tue, 30 Mar 2004 14:06:16 GMT
Server: Apache/1.3.27 (Win32) PHP/4.3.3
X-Powered-By: PHP/4.3.3
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Content-Type: text/html
HTTP/1.1 200 OK
Date: Tue, 30 Mar 2004 14:06:17 GMT
Server: Apache/1.3.27 (Win32) PHP/4.3.3
X-Powered-By: PHP/4.3.3
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Content-Type: text/html
HTTP/1.1 404 Not Found
--------------------------------------------------------------------------------
SITE : http://cordoba.ugent.be/
Exclude paths :
- @NONE@
HTTP/1.1 200 OK
Date: Tue, 30 Mar 2004 14:06:17 GMT
Server: Apache/1.3.27 (Win32) PHP/4.3.3
X-Powered-By: PHP/4.3.3
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Content-Type: text/html
1:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/ (time : 00:00:02)
No link in temporary table
--------------------------------------------------------------------------------
links found : 1
http://cordoba.ugent.be/LW02AC/document/Refs/Theses/
Optimizing tables...
Indexing complete !
__________________
René Haentjens, Ghent University |
03-30-2004, 05:21 AM | #3 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
After spidering, there is one file in the text_content directory (not counting keepalive.txt). It contains one line with 7 spaces, nothing else.
It doesn't help if I delete it and start all over... |
03-30-2004, 05:30 AM | #4 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
My first impression from the Apache access log is that the spider doesn't even try to fetch the page; it only sends HEAD requests, never a GET:
157.193.197.26 - - [30/Mar/2004:16:28:51 +0200] "HEAD /robots.txt HTTP/1.1" 404 0
157.193.197.26 - - [30/Mar/2004:16:28:51 +0200] "HEAD /LW02AC/document/Refs/Theses/ HTTP/1.1" 200 0
157.193.197.26 - - [30/Mar/2004:16:28:52 +0200] "HEAD /LW02AC/document/Refs/Theses/ HTTP/1.1" 200 0
157.193.197.26 - - [30/Mar/2004:16:28:52 +0200] "HEAD /robots.txt HTTP/1.1" 404 0
157.193.197.26 - - [30/Mar/2004:16:28:52 +0200] "HEAD /LW02AC/document/Refs/Theses/ HTTP/1.1" 200 0
157.193.197.26 - admin [30/Mar/2004:16:28:52 +0200] "POST /LW02AC/180phpdig/admin/spider.php HTTP/1.1" 200 1880
157.193.197.26 - - [30/Mar/2004:16:28:53 +0200] "GET /LW02AC/180phpdig/admin/yes.gif HTTP/1.1" 304 - |
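The same reading of the access log can be checked mechanically. The following is only a sketch: `count_methods` is a hypothetical helper, not part of PhpDig, and it assumes Apache's common log format as quoted in the post. It tallies request methods per path, so a page that was probed with HEAD but never fetched with GET stands out.

```php
<?php
// Hypothetical helper (not part of PhpDig): tally HTTP methods per requested path.
function count_methods(array $lines)
{
    $counts = [];
    foreach ($lines as $line) {
        // In the common log format the request is quoted: "METHOD /path HTTP/x.y"
        if (preg_match('/"([A-Z]+) (\S+) HTTP\/[\d.]+"/', $line, $m)) {
            $method = $m[1];
            $path = $m[2];
            $counts[$path][$method] = ($counts[$path][$method] ?? 0) + 1;
        }
    }
    return $counts;
}

// Three of the log lines quoted in the post.
$sample = [
    '157.193.197.26 - - [30/Mar/2004:16:28:51 +0200] "HEAD /robots.txt HTTP/1.1" 404 0',
    '157.193.197.26 - - [30/Mar/2004:16:28:51 +0200] "HEAD /LW02AC/document/Refs/Theses/ HTTP/1.1" 200 0',
    '157.193.197.26 - admin [30/Mar/2004:16:28:52 +0200] "POST /LW02AC/180phpdig/admin/spider.php HTTP/1.1" 200 1880',
];

$counts = count_methods($sample);
$page = '/LW02AC/document/Refs/Theses/';

// True when the page was probed with HEAD but never fetched with GET.
$head_only = isset($counts[$page]['HEAD']) && !isset($counts[$page]['GET']);
var_dump($head_only);
```

In practice the input would come from the real log, for example via `file()` with `FILE_IGNORE_NEW_LINES` on the access log path of the server in question.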
03-30-2004, 06:30 AM | #5 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
By "the new robot_functions" in my earlier reply, I meant the version dated 25 Feb 2004 (unchanged).
I also tried setting PHPDIG_SESSID_REMOVE to false, with no result. |
03-30-2004, 06:50 AM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. The main page that you are trying to index doesn't appear to contain any redirects; it seems to be just simple HTML. Below is the start of an index of the page. What happens if you use a fresh install on new tables? What are SPIDER_MAX_LIMIT, SPIDER_DEFAULT_LIMIT, RESPIDER_LIMIT, and LIMIT_DAYS set to in the config file?

SITE : http://cordoba.ugent.be/
Exclude paths :
- @NONE@
1:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/ (time : 00:00:12)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
level 1...
2:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=Unknown (time : 00:00:57)
3:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=2003 (time : 00:01:05)
4:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=2001 (time : 00:01:13)
5:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=2002 (time : 00:01:20)
6:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=2000 (time : 00:01:28)
7:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=1998 (time : 00:01:36)
8:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=1999 (time : 00:01:43)
...

Also, from the admin panel, when you click the site and then the update button, are any of the directories excluded?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
03-30-2004, 09:41 PM | #7 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
SPIDER_MAX_LIMIT=20, SPIDER_DEFAULT_LIMIT=3, RESPIDER_LIMIT=4, LIMIT_DAYS=7. (I never touched these.) My most recent tests were with a freshly installed 180, a new database and new tables. The PHP script indeed generates straightforward simple HTML with no tricks.
Nothing on the update page seems to indicate any exclusions. Anyway, in my most recent tests, I always deleted the site and checked that tempspider and text_content were empty before my next try.

The good news is that you can index my site. The problem must be with my PhpDig installation: I get the same problem (No link in temporary table ... links found : 1) if I try to index other sites...

I can't even index a simple text page in the root of my website (same PC as PhpDig, cordoba). PhpDig finds it all right, but seems to ignore its content altogether: in the spider table, first_words = filename of the text page, nothing more, 0 words, filesize 0; after "indexing" there are 0 keywords and one text_content file with 7 spaces. (The text page does contain a few lines of text!)

Most recent changes on my PC (after the last successful indexing with PhpDig): new virus checker eTrust 7.0.139 replaced McAfee, and ZoneAlarm was upgraded to 4.5.538.001.
__________________
René Haentjens, Ghent University Last edited by renehaentjens; 03-30-2004 at 10:28 PM. |
03-30-2004, 10:46 PM | #8 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
I think I found it: allow_url_fopen was Off (I had turned it off because of a security problem), and I just spotted the place where PhpDig gets the page content. Yes, you guessed it:
$file_content = @file($uri);
in function phpdigTempFile (robot_functions). With allow_url_fopen Off, file() cannot open an http:// URL, and the @ hides the resulting warning, so the spider silently gets no content.
See also: http://www.phpdig.net/showthread.php...llow_url_fopen
Is there a checklist somewhere, with requirements for PhpDig to function correctly? Could you add some checks in the code of the next version?
Last edited by renehaentjens; 03-30-2004 at 10:59 PM. |
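One possible shape for the kind of check asked for above, as a minimal sketch rather than anything PhpDig actually ships: `ini_flag_is_on` and `fetch_lines` are hypothetical names, while `ini_get`, `filter_var`, `trigger_error` and `file` are standard PHP functions. It is written for current PHP (`filter_var` did not exist in the PHP 4.3.3 shown in the headers earlier in the thread). The idea is to report the disabled setting explicitly instead of letting `@file($uri)` fail silently.

```php
<?php
// Hypothetical helper (not part of PhpDig): interpret a raw php.ini boolean value.
function ini_flag_is_on($raw)
{
    // ini_get() returns a string such as "1", "", "On" or "Off",
    // or false when the setting does not exist.
    return filter_var($raw, FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical wrapper around the file($uri) call quoted in the post.
function fetch_lines($uri)
{
    // Only remote URLs depend on allow_url_fopen; local paths do not.
    $is_remote = (bool) preg_match('#^(https?|ftp)://#i', $uri);
    if ($is_remote && !ini_flag_is_on(ini_get('allow_url_fopen'))) {
        // Say why nothing was fetched instead of hiding it behind @.
        trigger_error("allow_url_fopen is Off: cannot read $uri", E_USER_WARNING);
        return false;
    }
    return @file($uri);
}
```

A caller would then test for `false` before indexing, so that an empty text_content file is never written for a page that was not actually read.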