01-15-2004, 12:07 PM | #1 |
Green Mole
Join Date: Jan 2004
Location: Oregon, USA
Posts: 6
|
Spidering issue with my site
Hello, I'm trying to set up phpdig for a web site. It spiders other web sites fine, but not mine.
I have tried both locally from the command line and remotely from another server. Any time I try to spider the site, the web page freezes for about 30 seconds after I click the "Dig This!" button and then goes to the result page with:

Spidering in progress...
SITE : http://dev.videx.com/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
[Back] to admin interface.

The site is, if you didn't notice, dev.videx.com, and I have managed to spider other servers in our domain (like www.videx.com). I have removed the robots.txt file from the site but still have a .htaccess restricting access to the /search folder. Otherwise the site is a basic CSS / PHP based one on a Mac OS X 10.3 server, and I am using phpdig version 1.6.2. I have modified my config file to not search through .css files, but still no luck. Any suggestions? |
01-16-2004, 01:56 PM | #2 |
Green Mole
Join Date: Jan 2004
Location: Oregon, USA
Posts: 6
|
follow up info
Anyone? Anyone? Bueller?
Well, I've done some more searching and it turns out that the spidering will hang on any Mac OS X 10.3 site that I configure (including a default site with one web page!). It works fine spidering Mac OS X 10.2 servers, however, so I think it has something to do with the Apache config on the server. The site that I can't get phpdig to spider is http://dev.videx.com/ and it is running with the following config:

OS: Mac OS X 10.3
Apache: 1.3.28
PHP: 4.3.2
phpdig: 1.6.2

I have tried turning on error logging for PHP, but it never creates the file. My php.ini file is:

include_path=".:/Library/WebServer/php"
log_errors = On
error_log = ".:/Library/WebServer/log.txt"
error_reporting = E_ALL

Feel free to attempt to spider http://dev.videx.com/ and let me know if it works. |
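One thing worth checking in the php.ini quoted above: `error_log` expects a single file name that the web server user can write to, whereas the `.:` prefix is the colon-separated list syntax used by `include_path`. That may be why the log file is never created. Below is a minimal sketch in Python that flags this kind of value. The function name `check_error_log_value` is hypothetical and the checks are heuristics, not something PHP itself performs.

```python
import posixpath


def check_error_log_value(value):
    """Return a list of likely problems with a php.ini error_log value.

    Heuristic only: assumes a POSIX-style server such as Mac OS X.
    """
    problems = []
    # php.ini values are often quoted; drop whitespace and quotes.
    path = value.strip().strip('"')

    # "syslog" is a special target rather than a file name.
    if path == "syslog":
        return problems

    # A colon suggests include_path-style list syntax was copied over.
    if ":" in path:
        problems.append("contains ':' (looks like include_path list syntax)")

    # A relative path depends on where the server process happens to run.
    if not posixpath.isabs(path):
        problems.append("not an absolute path")

    return problems


if __name__ == "__main__":
    for candidate in ('".:/Library/WebServer/log.txt"',
                      '"/Library/WebServer/log.txt"'):
        print(candidate, "->", check_error_log_value(candidate) or "looks plausible")
```

Even with a plain absolute path, the file still has to be writable by the user Apache runs as, which this sketch does not check.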
01-18-2004, 08:28 AM | #3 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Below are the results at search depth one for http://dev.videx.com/ - When you try to crawl this site, what shows up in your Apache log files?
links found : 17
http://dev.videx.com/
http://dev.videx.com/favicon.ico
http://dev.videx.com/index.html
http://dev.videx.com/products/index.html
http://dev.videx.com/News/index.html
http://dev.videx.com/about/index.html
http://dev.videx.com/products/downloads/manuals/accesscontrol/cyberaudit_manual.pdf
http://dev.videx.com/products/support.html
http://dev.videx.com/products/download.html
http://dev.videx.com/products/listing.html
http://dev.videx.com/news/tradeshows.html
http://dev.videx.com/news/careers.html
http://dev.videx.com/map.html
http://dev.videx.com/news/press.html
http://dev.videx.com/news/studies.html
http://dev.videx.com/about/privacy.html
http://dev.videx.com/about/contact.html
Optimizing tables...
Indexing complete !
01-19-2004, 09:23 AM | #4 |
Green Mole
Join Date: Jan 2004
Location: Oregon, USA
Posts: 6
|
log results
I cleared my Apache logs, restarted it, and ran an index. Here are the results in the log files:

access log:
12.17.172.219 - - [19/Jan/2004:09:12:30 -0800] "GET / HTTP/1.1" 200 7404

error log:
Processing config directory: /etc/httpd/sites/*.conf
Processing config file: /etc/httpd/sites/0000_any_80_.conf
Processing config file: /etc/httpd/sites/virtual_host_global.conf
[Mon Jan 19 09:11:22 2004] [notice] Apache/1.3.28 (Darwin) PHP/4.3.2 configured -- resuming normal operations
[Mon Jan 19 09:11:22 2004] [notice] Accept mutex: flock (Default: flock)

It doesn't look very helpful to me. I still can't index the site from other Mac OS X 10.3 servers. I timed the delay between when I click on the "Dig this!" button and when the spider page comes up with 0 results, and it is about 3 minutes and 20 seconds. |
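The access log shows a single successful `GET /`, so the front page is being served. To check, independently of phpdig, how long that request takes and which same-host links the page exposes, a rough sketch like the following could be run from the indexing machine. It is written in Python with only the standard library. The names `LinkCollector`, `same_host_links` and `fetch_and_time` are hypothetical, and this is not how phpdig itself parses pages.

```python
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def same_host_links(html, base_url):
    """Return de-duplicated links that stay on the host of base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    kept = []
    for link in collector.links:
        if urlparse(link).netloc == host and link not in kept:
            kept.append(link)
    return kept


def fetch_and_time(url, timeout=30):
    """Fetch a page and return (body, elapsed_seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    return body, time.monotonic() - start


if __name__ == "__main__":
    page_url = "http://dev.videx.com/"  # the site from this thread
    body, elapsed = fetch_and_time(page_url)
    links = same_host_links(body, page_url)
    print(f"fetched in {elapsed:.1f}s, {len(links)} same-host links")
```

If the fetch is quick and links are found here, the delay is more likely inside the indexer than in Apache.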
01-19-2004, 10:13 AM | #5 |
Green Mole
Join Date: Jan 2004
Location: Oregon, USA
Posts: 6
|
progress
Well, I just updated my phpdig to 1.6.5 and tried out indexing the site.
It works up to a point with the web interface and then gives me the following message from the web browser: Could not open the page “http://12.17.172.219/phpdig1/admin/spider.php” after trying for 60 seconds. All the pages that it indexes up to that point are fine. I am going to try it from the command line, where the timeout should not apply. |
01-19-2004, 11:05 AM | #6 |
Green Mole
Join Date: Jan 2004
Location: Oregon, USA
Posts: 6
|
It's alive!
Everything is working fine now with phpdig 1.6.5. Apparently there was something in the PHP code in 1.6.2 that was causing the problem.
So, in case anyone wants to know, phpdig 1.6.5 works on Mac OS X 10.3. |