PhpDig.net

Old 01-15-2004, 11:07 AM   #1
pager
Green Mole
 
Join Date: Jan 2004
Location: Oregon, USA
Posts: 6
Spidering issue with my site

Hello, I'm trying to set up phpdig for a web site. I can make it spider other web sites, but not mine.

I have tried both locally from the command line and remotely from another server.

Any time I try to spider it, the web page freezes for about 30 seconds after I click the "Dig This!" button and then goes to the result page with:

Spidering in progress...

SITE : http://dev.videx.com/
Exclude paths :
- @NONE@
No link in temporary table


links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
[Back] to admin interface.

The site is, if you didn't notice, dev.videx.com, and I have managed to spider other servers in our domain (like www.videx.com).

I have removed the robots.txt file from the site, but I still have a .htaccess file restricting access to the /search folder. Otherwise the site is a basic CSS/PHP-based one on a Mac OS X 10.3 server, and I am using phpdig version 1.6.2.

I have modified my config file to not search through .css files, but still no luck.

Any suggestions?
Old 01-16-2004, 12:56 PM   #2
pager
follow up info

Anyone? Anyone? Bueller?

Well, I've done some more searching and it turns out that the spidering will hang on any Mac OS X 10.3 site that I configure (including a default site with one web page!).

It works fine spidering Mac OS X 10.2 servers, however, so I think it has something to do with the Apache config on the server.

The site that I can't get phpdig to spider is http://dev.videx.com/ and it is running with the following config:

OS: Mac OS X 10.3
Apache: 1.3.28
PHP: 4.3.2
phpdig: 1.6.2

I have tried turning on error logging for php, but it never creates the file. My php.ini file is:

include_path=".:/Library/WebServer/php"
log_errors = On
error_log = ".:/Library/WebServer/log.txt"
error_reporting = E_ALL
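
A note on the error_log line above: the directive expects a single file path, whereas the ".:" prefix is include_path syntax (current directory, then a colon separator). That mismatch may be why the log file is never created. A minimal sketch of a corrected fragment, assuming the same log location is wanted and that the user Apache runs as can write to it:

```ini
; error_log takes one file path, not a colon-separated search list
log_errors = On
; assumed location, copied from the original setting without the ".:" prefix
error_log = "/Library/WebServer/log.txt"
error_reporting = E_ALL
```

Apache usually has to be restarted before a php.ini change takes effect.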

Feel free to attempt to spider http://dev.videx.com/ and let me know if it works.
Old 01-18-2004, 07:28 AM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Below are the results at search depth one for http://dev.videx.com/. When you try to crawl this site, what shows up in your Apache log files?

links found : 17
http://dev.videx.com/
http://dev.videx.com/favicon.ico
http://dev.videx.com/index.html
http://dev.videx.com/products/index.html
http://dev.videx.com/News/index.html
http://dev.videx.com/about/index.html
http://dev.videx.com/products/downloads/manuals/accesscontrol/cyberaudit_manual.pdf
http://dev.videx.com/products/support.html
http://dev.videx.com/products/download.html
http://dev.videx.com/products/listing.html
http://dev.videx.com/news/tradeshows.html
http://dev.videx.com/news/careers.html
http://dev.videx.com/map.html
http://dev.videx.com/news/press.html
http://dev.videx.com/news/studies.html
http://dev.videx.com/about/privacy.html
http://dev.videx.com/about/contact.html
Optimizing tables...
Indexing complete !
Old 01-19-2004, 08:23 AM   #4
pager
log results

I cleared my Apache logs, restarted Apache, and ran an index. Here are the results in the log files:

access log:

12.17.172.219 - - [19/Jan/2004:09:12:30 -0800] "GET / HTTP/1.1" 200 7404


error log:

Processing config directory: /etc/httpd/sites/*.conf
Processing config file: /etc/httpd/sites/0000_any_80_.conf
Processing config file: /etc/httpd/sites/virtual_host_global.conf
[Mon Jan 19 09:11:22 2004] [notice] Apache/1.3.28 (Darwin) PHP/4.3.2 configured -- resuming normal operations
[Mon Jan 19 09:11:22 2004] [notice] Accept mutex: flock (Default: flock)

It doesn't look very helpful to me.

I still can't index the site from other Mac OS X 10.3 servers. I timed the delay between clicking the "Dig this!" button and the spider page coming up with 0 results, and it is about 3 minutes and 20 seconds.
Old 01-19-2004, 09:13 AM   #5
pager
progress

Well, I just updated my phpdig to 1.6.5 and tried out indexing the site.

It works up to a point with the web interface and then gives me the following message from the web browser:

Could not open the page “http://12.17.172.219/phpdig1/admin/spider.php” after trying for 60 seconds.

All the pages that it indexes up to that point are fine. I am going to try it from the command line, where the timeout should not apply.
Old 01-19-2004, 10:05 AM   #6
pager
It's alive!

Everything is working fine now with phpdig 1.6.5. Apparently there was something in the PHP code in 1.6.2 that was causing a problem.

So, in case anyone wants to know, phpdig 1.6.5 works on Mac OS X 10.3.
All times are GMT -8.