PhpDig.net

Old 12-14-2004, 08:09 AM   #1
Juri Savin
Green Mole
 
Join Date: Dec 2004
Posts: 4
1.8.4 spider doesn't handle some files :(

Hello!

I have used PhpDig since version 1.6.x, and everything was OK. But at some point I added a new .php file and ran into a problem, and now I have no idea how to solve it.

My situation: a file named import.php presents a catalogue of special equipment. There are a lot of web pages based on import.php, with URLs like import.php?id=XXX. All these links appear in the HTML output of import.php for any id. But the links are visible only to human eyes, not to the spider.

The link code looks like <a href="/catalogue/import.php?id=1062">. Another script built the same way was indexed successfully. This one wasn't.

I upgraded PhpDig to version 1.8.4. No result.

OK. I turned on the Apache mod_rewrite module and turned my script import.php into a virtual directory /catalogue/import/ with URLs like <a href="/catalogue/import/?id=1062">. Same result.

I read just about everything on the forum about the "can't index" problem. I tried different solutions and read the documentation too. Same result.

Could somebody help?
Old 12-14-2004, 01:19 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
First, upgrade to v.1.8.5 for security reasons. Next, maybe this might help?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 12-15-2004, 08:14 AM   #3
Juri Savin
Green Mole
 
Join Date: Dec 2004
Posts: 4
So, first I upgraded to 1.8.5 and got improved security and the same problem. Then I read the post you recommended (thank you). But as a PHP programmer, I don't quite understand what the author of that post meant.

PhpDig works via HTTP requests, like an ordinary web browser. It doesn't read files directly via FTP. So it makes no difference to PhpDig how a link was generated, because PhpDig sees only the web server's output, not the source code.
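
To illustrate the point that a spider only ever sees the served HTML, here is a minimal Python sketch that pulls links out of an HTML string and resolves them against the page URL. It is not PhpDig's code: the names LinkCollector and extract_links are hypothetical, and the sample markup is just the link form quoted in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative or root-relative hrefs into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all <a href> links found in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Sample markup in the form quoted in this thread (not the real page).
sample = '<p><a href="/catalogue/import.php?id=1062">Item 1062</a></p>'
print(extract_links(sample, "http://www.loip.ru/catalogue/import.php"))
```

Running the same extraction on the HTML actually returned by the server (rather than on the PHP source) is one way to check whether the links are present in what a spider receives.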

Today I used my last chance: I converted all the troublesome links into virtual files (mod_rewrite). Now the links look like <a href="/catalogue/import/111.html"></a>. And... it doesn't work again!

Over and over again I get this:

<citation>
SITE : http://www.loip.ru/
Exclude paths :
- @NONE@
1:http://www.loip.ru/catalogue/import/
(time : 00:00:06)
No link in temporary table
</citation>

The page I tried to index is here: http://www.loip.ru/catalogue/import/

Maybe a bug in the HTML source? Or in the PhpDig engine?
Old 12-15-2004, 08:22 AM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Try setting the search depth to a large number, links per to zero, and LIMIT_TO_DIRECTORY to false.
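
As a rough illustration of why a directory limit can matter, here is a Python sketch of one way such a filter could work. This is an assumption for illustration only, not PhpDig's actual rule: the function name is_within_start_directory is hypothetical, and the strict same-host, path-prefix test is a guess at the general idea.

```python
from urllib.parse import urlsplit


def is_within_start_directory(start_url, link_url):
    """Return True if link_url is on the same host and under the
    directory of start_url (hypothetical prefix rule, not PhpDig's)."""
    start = urlsplit(start_url)
    link = urlsplit(link_url)
    if (start.scheme, start.netloc) != (link.scheme, link.netloc):
        return False
    # Directory part of the start path: everything up to the last slash.
    start_dir = start.path[: start.path.rfind("/") + 1] or "/"
    return (link.path or "/").startswith(start_dir)


print(is_within_start_directory(
    "http://www.loip.ru/catalogue/import/",
    "http://www.loip.ru/catalogue/import/111.html",
))
```

Under a rule like this, links that point outside the starting directory would be dropped, which is the kind of filtering that turning the limit off is meant to rule out.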
Old 12-23-2004, 06:37 AM   #5
Juri Savin
Green Mole
 
Join Date: Dec 2004
Posts: 4
Quote:
Originally Posted by Charter
Try setting the search depth to a large number, links per to zero, and LIMIT_TO_DIRECTORY to false.
I upgraded PhpDig to version 1.8.6, set the search depth to 2000 (is that large enough?), links per to zero, and LIMIT_TO_DIRECTORY to false. It fails again:

SITE : http://www.loip.ru/
Exclude paths :
- @NONE@
1:http://www.loip.ru/catalogue/import/
(time : 00:00:06)
No link in temporary table
links found : 1
Old 12-23-2004, 06:52 AM   #6
Juri Savin
Green Mole
 
Join Date: Dec 2004
Posts: 4
By the way, version 1.8.6 fails to index the entire site http://www.loip.ru/.