PhpDig.net

03-30-2004, 05:54 AM #1
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
No link in temporary table (yet another one)

Yet another one:

SITE : http://cordoba.ugent.be/
Exclude paths :
- @NONE@
1:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/
(time : 00:00:06)
No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://cordoba.ugent.be/LW02AC/document/Refs/Theses/
Optimizing tables...
Indexing complete !


Charter, I give up for now. I've tried a couple of suggestions from other, similar posts and reinstalled 180 from scratch. Nothing helps!
This is a site I've indexed before without problems. I admit I've changed the PHP script, but now it won't index at all:
http://cordoba.ugent.be/LW02AC/document/Refs/Theses/
__________________
René Haentjens, Ghent University
03-30-2004, 06:08 AM #2
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
Here's the result with print $answer."<br>\n"; added: no Forbidden responses.
I also tried (some of) the hosts file modifications and the new robot_functions, and I checked tempspider: it's empty...


Spidering in progress...
HTTP/1.1 404 Not Found
HTTP/1.1 200 OK
Date: Tue, 30 Mar 2004 14:06:16 GMT
Server: Apache/1.3.27 (Win32) PHP/4.3.3
X-Powered-By: PHP/4.3.3
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Content-Type: text/html

HTTP/1.1 200 OK
Date: Tue, 30 Mar 2004 14:06:17 GMT
Server: Apache/1.3.27 (Win32) PHP/4.3.3
X-Powered-By: PHP/4.3.3
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Content-Type: text/html

HTTP/1.1 404 Not Found

--------------------------------------------------------------------------------
SITE : http://cordoba.ugent.be/
Exclude paths :
- @NONE@
HTTP/1.1 200 OK
Date: Tue, 30 Mar 2004 14:06:17 GMT
Server: Apache/1.3.27 (Win32) PHP/4.3.3
X-Powered-By: PHP/4.3.3
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Content-Type: text/html

1:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/
(time : 00:00:02)
No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://cordoba.ugent.be/LW02AC/document/Refs/Theses/
Optimizing tables...
Indexing complete !
03-30-2004, 06:21 AM #3
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
After spidering, there is one file in the text_content directory (not counting keepalive.txt). It contains one line with 7 spaces, nothing else.

It doesn't help if I delete it and start all over...
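To spot such leftovers after a run, here is a small Python sketch that lists files in a directory whose content is empty or whitespace only. It is not part of PhpDig; the directory path is an assumption you would point at your own text_content folder, and keepalive.txt is skipped as in the post above.

```python
from pathlib import Path
from typing import Iterable, List

def blank_content_files(directory: str, skip: Iterable[str] = ("keepalive.txt",)) -> List[str]:
    """Return the names of files in `directory` whose content is empty or whitespace only."""
    skipped = set(skip)
    blanks = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.name in skipped:
            continue
        # errors="replace" so an oddly encoded file cannot crash the scan
        if not path.read_text(errors="replace").strip():
            blanks.append(path.name)
    return blanks

# Hypothetical location; point this at PhpDig's text_content directory.
# print(blank_content_files("/path/to/phpdig/text_content"))
```

A file like the one described (a single line of 7 spaces) would be reported, while files with real text would not.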
03-30-2004, 06:30 AM #4
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
My first impression from the Apache access log is that the spider doesn't even try to fetch the page (there are HEAD requests for it, but no GET):

157.193.197.26 - - [30/Mar/2004:16:28:51 +0200] "HEAD /robots.txt HTTP/1.1" 404 0
157.193.197.26 - - [30/Mar/2004:16:28:51 +0200] "HEAD /LW02AC/document/Refs/Theses/ HTTP/1.1" 200 0
157.193.197.26 - - [30/Mar/2004:16:28:52 +0200] "HEAD /LW02AC/document/Refs/Theses/ HTTP/1.1" 200 0
157.193.197.26 - - [30/Mar/2004:16:28:52 +0200] "HEAD /robots.txt HTTP/1.1" 404 0
157.193.197.26 - - [30/Mar/2004:16:28:52 +0200] "HEAD /LW02AC/document/Refs/Theses/ HTTP/1.1" 200 0
157.193.197.26 - admin [30/Mar/2004:16:28:52 +0200] "POST /LW02AC/180phpdig/admin/spider.php HTTP/1.1" 200 1880
157.193.197.26 - - [30/Mar/2004:16:28:53 +0200] "GET /LW02AC/180phpdig/admin/yes.gif HTTP/1.1" 304 -
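On a longer log it helps to tally requests per method and path. A minimal Python sketch, assuming Apache's common log format as in the excerpt above (it is a standalone helper, not part of PhpDig), confirms that the page only received HEAD requests:

```python
import re
from collections import Counter

# Apache common log format: host ident user [time] "METHOD path protocol" status bytes
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

def tally_requests(lines):
    """Count requests per (method, path) pair; lines that don't match are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            method, path, _status = match.groups()
            counts[(method, path)] += 1
    return counts

# The access-log excerpt quoted above.
ACCESS_LOG = """\
157.193.197.26 - - [30/Mar/2004:16:28:51 +0200] "HEAD /robots.txt HTTP/1.1" 404 0
157.193.197.26 - - [30/Mar/2004:16:28:51 +0200] "HEAD /LW02AC/document/Refs/Theses/ HTTP/1.1" 200 0
157.193.197.26 - - [30/Mar/2004:16:28:52 +0200] "HEAD /LW02AC/document/Refs/Theses/ HTTP/1.1" 200 0
157.193.197.26 - - [30/Mar/2004:16:28:52 +0200] "HEAD /robots.txt HTTP/1.1" 404 0
157.193.197.26 - - [30/Mar/2004:16:28:52 +0200] "HEAD /LW02AC/document/Refs/Theses/ HTTP/1.1" 200 0
157.193.197.26 - admin [30/Mar/2004:16:28:52 +0200] "POST /LW02AC/180phpdig/admin/spider.php HTTP/1.1" 200 1880
157.193.197.26 - - [30/Mar/2004:16:28:53 +0200] "GET /LW02AC/180phpdig/admin/yes.gif HTTP/1.1" 304 -
"""

counts = tally_requests(ACCESS_LOG.splitlines())
page = "/LW02AC/document/Refs/Theses/"
for method in ("HEAD", "GET"):
    print(method, page, counts[(method, page)])  # HEAD ... 3, then GET ... 0
```

Three HEAD requests and zero GET requests for the page is consistent with the spider never retrieving the page body.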
03-30-2004, 07:30 AM #5
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
By "the new robot_functions" in my earlier reply, I mean the version dated 25 Feb 2004 (unchanged).

I also tried setting PHPDIG_SESSID_REMOVE to false, with no result.
03-30-2004, 07:50 AM #6
Charter
Head Mole
 
 
Join Date: May 2003
Posts: 2,539
Hi. The main page you are trying to index doesn't appear to contain any redirects; it seems to be simple HTML. Below is the start of my index of the page. What happens if you use a fresh install with new tables? What are SPIDER_MAX_LIMIT, SPIDER_DEFAULT_LIMIT, RESPIDER_LIMIT, and LIMIT_DAYS set to in the config file?

SITE : http://cordoba.ugent.be/
Exclude paths :
- @NONE@
1:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/
(time : 00:00:12)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
level 1...
2:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=Unknown
(time : 00:00:57)

3:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=2003
(time : 00:01:05)

4:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=2001
(time : 00:01:13)

5:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=2002
(time : 00:01:20)

6:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=2000
(time : 00:01:28)

7:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=1998
(time : 00:01:36)

8:http://cordoba.ugent.be/LW02AC/document/Refs/Theses/index.php?dirpath=.%2F&row=1&item=1999
(time : 00:01:43)
...

Also, from the admin panel, when you click the site, then the update button, are any of the directories excluded?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
03-30-2004, 10:41 PM #7
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
SPIDER_MAX_LIMIT=20, SPIDER_DEFAULT_LIMIT=3, RESPIDER_LIMIT=4, LIMIT_DAYS=7. (I never touched these.) My most recent tests were with a freshly installed 180, a new database and new tables. The PHP script indeed generates straightforward, simple HTML with no tricks.

Nothing on the update page seems to indicate any exclusions. Anyway, in my most recent tests, I always deleted the site and checked tempspider and text_content to be empty before my next try.

The good news is that you can index my site.

The problem must be with my PhpDig installation, I get the same problem (No link in temporary table ... links found : 1) if I try to index other sites...

I can't even index a simple text page in the root of my website (on cordoba, the same PC as PhpDig). PhpDig finds it all right but seems to ignore its content altogether: in the spider table, first_words contains only the filename of the text page, with 0 words and filesize 0; after "indexing" there are 0 keywords and one text_content file containing 7 spaces. (The text page does contain a few lines of text!)

Most recent changes on my PC (after last successful indexing with PhpDig): new virus checker eTrust 7.0.139 replaced McAfee, upgrade of ZoneAlarm to 4.5.538.001.

Last edited by renehaentjens; 03-30-2004 at 11:28 PM.
03-30-2004, 11:46 PM #8
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
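I think I found it: allow_url_fopen was Off (I had turned it off because of a security problem), and I just spotted where PhpDig gets the page content. Yes, you guessed it, with:
$file_content = @file($uri);
in function phpdigTempFile (robot_functions). With allow_url_fopen Off, file() cannot open http:// URLs, and the @ operator hides the warning, so the spider silently gets no content.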

See also:
http://www.phpdig.net/showthread.php...llow_url_fopen
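Inside PHP, ini_get('allow_url_fopen') reports the setting at runtime. As an outside check, here is a minimal Python sketch that reads php.ini-style text and returns the effective boolean value of a directive. It is a simplified parser written for illustration: it assumes plain "key = value" lines with ";" comments and ignores [sections], per-directory and .htaccess overrides.

```python
# php.ini boolean spellings (compared case-insensitively)
TRUE_VALUES = {"1", "on", "yes", "true"}
FALSE_VALUES = {"0", "off", "no", "false", "none", ""}

def ini_flag(ini_text, key, default=None):
    """Return the effective boolean value of `key` in php.ini-style text.

    The last assignment wins; `default` is returned if the key is never set.
    Simplified: ignores [sections], per-directory and .htaccess overrides."""
    value = default
    for raw in ini_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ; comments
        name, sep, val = line.partition("=")
        if not sep or name.strip() != key:
            continue
        val = val.strip().strip('"').lower()
        if val in TRUE_VALUES:
            value = True
        elif val in FALSE_VALUES:
            value = False
    return value

SAMPLE_INI = """\
; sample php.ini fragment
allow_url_fopen = Off
"""

# PHP's built-in default for allow_url_fopen is On, hence default=True.
if ini_flag(SAMPLE_INI, "allow_url_fopen", default=True) is False:
    print("allow_url_fopen is Off: file('http://...') cannot fetch pages")
```

A commented-out line such as ";allow_url_fopen = Off" is ignored, so the built-in default applies.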

Is there a checklist somewhere, with requirements for PhpDig to function correctly? Could you add some checks in the code of the next version?
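One form such a check could take is a guard around the fetch, so that a failed or empty read becomes an explicit error instead of empty content passed on to the indexer. A minimal sketch of the idea in Python follows; the names fetch_lines, FetchError and url_access_disabled are hypothetical, and this is not PhpDig's code (its actual fetch is the @file($uri) call quoted above).

```python
from typing import Callable, Iterable, List

class FetchError(RuntimeError):
    """Raised when a page cannot be read (hypothetical name, not part of PhpDig)."""

def fetch_lines(uri: str, opener: Callable[[str], Iterable[str]]) -> List[str]:
    """Read `uri` with `opener` and fail loudly instead of returning nothing.

    `opener` stands in for PHP's file(); passing it in keeps the sketch testable offline."""
    try:
        lines = list(opener(uri))
    except OSError as exc:  # e.g. URL access disabled, DNS or connection failure
        raise FetchError(f"could not read {uri}: {exc}") from exc
    if not any(line.strip() for line in lines):
        # Empty or whitespace-only content, like the 7-space text_content file
        raise FetchError(f"{uri} returned no content; check allow_url_fopen")
    return lines

def url_access_disabled(uri: str) -> Iterable[str]:
    """Simulated opener standing in for a failed file() call (allow_url_fopen Off)."""
    raise OSError("simulated: URL file access disabled")

try:
    fetch_lines("http://cordoba.ugent.be/LW02AC/document/Refs/Theses/", url_access_disabled)
except FetchError as err:
    print("Spider aborted:", err)
```

The design choice is simply to refuse to continue on empty input: the "No link in temporary table" message would then be preceded by an error that names the real cause.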

Last edited by renehaentjens; 03-30-2004 at 11:59 PM.

Similar Threads
Thread Thread Starter Forum Replies Last Post
No link in temporary table yet again... funeral Troubleshooting 2 04-06-2005 02:45 PM
No link in temporary table raphael_ita Troubleshooting 4 12-07-2004 01:25 AM
Help Please: No link in temporary table SystemX Troubleshooting 5 06-27-2004 11:20 PM
No link in temporary table gooseman How-to Forum 4 05-14-2004 03:24 AM
No link in temporary table michabis101 Troubleshooting 20 03-29-2004 02:08 PM


All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.