PhpDig.net

Old 11-05-2005, 11:05 PM   #1
nodoyuna
Green Mole
 
Join Date: Nov 2005
Posts: 8
Cannot spider my website...

When I try to spider my website no data is added to MySQL even though I get the following message...

Spidering in progress... [Stop spider]
Optimizing tables...
Indexing complete !
----------------------------------------
[Back] to admin interface.

The Update form shows no contents in the database. Accessing MySQL directly confirms this: all tables are empty.

By the way, I cannot add any URIs to the spider other than the root directory. I have tried adding subdirectories, .shtml files, and sitemap.txt, but only the root directory remains in the Update Site list on the right-hand side (no URIs are added).

Any suggestions?
Old 11-06-2005, 12:24 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
If you are not on a server with load balancing, try setting PHPDIG_IN_DOMAIN to true and LIMIT_TO_DIRECTORY to false, both in the config file. Then, from the admin panel, use a large search depth, set links per to zero, and choose the no option. You can increase the search depth beyond twenty by editing SPIDER_MAX_LIMIT in the config file.

PhpDig doesn't work when the installation and the site are on a server with load balancing, though you can try editing the Hosts file as a possible bypass.

The list of sites on the right-hand side is a list of domain names, so to see what is indexed for each domain, highlight a domain, click the update button, and then click a blue arrow.
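
As a rough sketch, the config-file changes named in this reply might look like the lines below. The constant names come from the post itself; the value 30 for SPIDER_MAX_LIMIT is only an assumed example, and the existing define() lines in your config file should be edited in place rather than duplicated.

```php
<?php
// Sketch only: edit the matching define() lines that already exist in the config file.
define('PHPDIG_IN_DOMAIN', true);     // value suggested in this reply
define('LIMIT_TO_DIRECTORY', false);  // value suggested in this reply
define('SPIDER_MAX_LIMIT', 30);       // assumed example; any value above 20 allows a deeper search depth
```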
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 11-06-2005, 08:25 PM   #3
nodoyuna
Green Mole
 
Join Date: Nov 2005
Posts: 8
Thanks for the message. Before receiving your answer, I had worked with my website provider (Affinity.com): we were able to spider URIs on dedicated servers, but every URI placed on a shared (clustered) server failed.

I asked Affinity.com to add an entry to the hosts file, but I was told that the TCP/IP address I have is a fake one and not suitable for use in the hosts file [Ed. note: What???? This is way over my head!]. Affinity.com will ask one of its higher-level technicians on Monday to look into possible solutions and get back to me.

I also found another thread in this forum suggesting a workaround: spider http://www.affinity.com/~mywebsite/ and then manually edit the entries in MySQL to http://www.mywebsite.com instead. However, the lower-level technician and I were unable to find a suitable URI for the workaround. We will also ask the higher-level technician to explore this possibility on Monday.
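
For illustration, the manual edit that workaround implies amounts to swapping one base URL for another. The sketch below shows only that string substitution; the function name is hypothetical, and how PhpDig actually stores URLs in its tables is not covered here.

```php
<?php
// Hypothetical helper: swap a temporary base URL for the real one.
// URLs that do not start with $fromBase are returned unchanged.
function rewriteBaseUrl(string $url, string $fromBase, string $toBase): string
{
    if (strncmp($url, $fromBase, strlen($fromBase)) !== 0) {
        return $url; // not under the temporary base; leave untouched
    }
    return $toBase . substr($url, strlen($fromBase));
}

// The path "docs/page.shtml" is an invented example.
echo rewriteBaseUrl(
    'http://www.affinity.com/~mywebsite/docs/page.shtml',
    'http://www.affinity.com/~mywebsite/',
    'http://www.mywebsite.com/'
), "\n"; // prints http://www.mywebsite.com/docs/page.shtml
```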

In conclusion, I think that my installation of PhpDig is working just fine and the problem is on the server side. Do you agree?
Old 11-06-2005, 10:44 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
So far I've found that one of the better ways to tell if someone is trying to index his/her site, where both the site and PhpDig are installed on a load balanced server, is to look at the stats on the PhpDig main admin page after an attempted index:
Code:
DataBase status
Hosts :            1 Entries
Pages :            0 Entries
Index :            0 Entries
Keywords :         0 Entries
Temporary table :  0 Entries
Sometimes people on load balanced servers report that they are able to index other sites, but not their own, in which case the zeros above are replaced with numbers greater than zero. Basically PhpDig and some other such spider search combos bite, in their current form, when it comes to load balancing.
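
The pattern in that status display (one host, nothing else) can be expressed as a small check. This is only a sketch: the function name and the array layout are hypothetical, and the counts would have to be copied by hand from the admin page.

```php
<?php
// Hypothetical helper: $status maps the labels from the admin page to their counts.
// Returns true for the pattern described above: at least one host entry,
// but no pages, index entries, or keywords.
function looksLikeFailedSelfIndex(array $status): bool
{
    $hosts = (int) ($status['Hosts'] ?? 0);
    $rest  = (int) ($status['Pages'] ?? 0)
           + (int) ($status['Index'] ?? 0)
           + (int) ($status['Keywords'] ?? 0);
    return $hosts >= 1 && $rest === 0;
}

var_dump(looksLikeFailedSelfIndex(['Hosts' => 1, 'Pages' => 0, 'Index' => 0, 'Keywords' => 0])); // bool(true)
```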

Here's a really basic history: The way it worked, way back when, is that each site had its own number, but with web growth, there needed to be a way to deal with all the webpage requests. Fast forward, and now there are clusters of web servers that have their own number, and some software exists to process the number of requests.

In short, that means, while you can call http://www.yoursite.com from a browser, there's some, erm, mystery that goes on for that to happen, and it no longer depends on a unique number for each site. Rather, it's a unique number for each cluster of servers, and that, uh, mystery directs browser requests where they are to go.

Load balancing takes webpage requests, using something in the realm of mystery, and directs requests to the best, i.e., less busy server, and feeds the webpage you see onscreen. This process makes it impossible for PhpDig and some other such spider search combos, in their current form, to find where they are supposed to go to do their stuff.
Old 11-07-2005, 08:06 AM   #5
nodoyuna
Green Mole
 
Join Date: Nov 2005
Posts: 8
Great explanation... a couple of questions

Great non-techie explanation, thanks. The display that you provided is exactly what shows up in PhpDig when I try to spider my website. You've hit the nail on the head; this is caused by a cluster server.

A couple of questions:
1) Why can Google, Yahoo, MSN, etc. spider my website when PhpDig cannot?
2) What "MINOR" change can I ask Affinity.com to make to their cluster for PhpDig to work?

AL

Old 11-07-2005, 01:09 PM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
1) Google, Yahoo, MSN, etc. can index your site because their software is not installed on your server. PhpDig, however, is installed on your server. Think of the issue with PhpDig as a feedback loop: PhpDig installed on your server, trying to index a site on your server, gets lost in the process.

2) You can ask your host if they will let you test out a private IP address for free, and have them use that IP in the Hosts file. If/when they set that up, have them let you know so you can do a test index of your site. If it works, you will probably need to pay your host for the private IP address.
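
Before running a test index, a quick check from the server itself may show whether PHP there can reach the site at all. The sketch below is an assumption-laden diagnostic, not part of PhpDig: both function names are hypothetical, it assumes PHP 7.1 or later with allow_url_fopen enabled, and it says nothing about how PhpDig itself connects.

```php
<?php
// Hypothetical diagnostic, meant to be run on the same server that hosts PhpDig.

// Returns the IPv4 address PHP resolves for $host, or null if the lookup fails.
// gethostbyname() returns its input unchanged on failure, so this also returns
// null when $host is already an IP literal.
function resolveHost(string $host): ?string
{
    $ip = gethostbyname($host);
    return $ip === $host ? null : $ip;
}

// Tries to fetch $url with a short timeout and returns the first response line
// (for example "HTTP/1.1 200 OK"), or null if no response came back.
function fetchStatusLine(string $url, int $timeoutSeconds = 5): ?string
{
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds, 'ignore_errors' => true],
    ]);
    @file_get_contents($url, false, $context);
    // The HTTP stream wrapper fills $http_response_header in the local scope.
    return isset($http_response_header[0]) ? $http_response_header[0] : null;
}
```

Calling resolveHost('www.mywebsite.com') and fetchStatusLine('http://www.mywebsite.com/') from a script on the server (with your own domain in place of the placeholder) gives a rough idea of whether the server can find and fetch its own site; a null from either call suggests that a spider running there would fail too.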
Old 11-09-2005, 05:44 PM   #7
nodoyuna
Green Mole
 
Join Date: Nov 2005
Posts: 8
Affinity.com - still unable to spider...

Affinity.com tried to help me and spent a considerable amount of time on the phone with me. They accessed my PhpDig spider and indexed various websites on their servers (both dedicated and shared). All the dedicated-server websites were spidered correctly. All the shared-server websites, like mine, failed as expected.

Affinity.com does not have a method for:
1) Accessing my website via www.affinity.com/~mywebsite/
2) Providing a real fixed TCP/IP address on a temporary basis without a costly upgrade.
Affinity.com also experimented with the robots.txt and hosts files for my website, and none of the changes allowed PhpDig to spider it. In essence, after considerable time on Affinity.com's part, we reached the end of the line and there is no solution.

It occurs to me that one solution is to have someone from the outside spider my website with PhpDig and FTP the MySQL database to my website. Does this make sense? Will it work? Can someone on this board do it for me?

I would hate to have spent all this time installing and troubleshooting the application to have it fail.

Thanks
Old 11-09-2005, 06:00 PM   #8
nodoyuna
Green Mole
 
Join Date: Nov 2005
Posts: 8
PhpDig v.1.8.7

I forgot to add that I have installed PhpDig v.1.8.7.

Nodoyuna
Old 11-10-2005, 10:44 PM   #9
nodoyuna
Green Mole
 
Join Date: Nov 2005
Posts: 8
Deafening silence....

Let me rephrase the question to elicit an answer without a commitment.

Will it work to have someone else index the site for me and FTP the resulting MySQL database?

If this is a viable solution, I will arrange for someone to spider my website from the outside; whoever replies does not need to volunteer for that part.

Thanks
Old 11-10-2005, 11:11 PM   #10
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
You could do that, or install Apache, MySQL, PHP, and PhpDig on the machine in front of you (if you are on Windows, check out easyphp.org) and index from there. Either way, once the site is indexed, make a MySQL dump of the PhpDig tables and load the dump server-side. Then FTP the PhpDig files (including any files created by the index) to the server, changing your database information in the connect.php file, so you can do searches server-side.
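
As a hedged sketch of the dump step, the helper below only builds a mysqldump command line as a string; it does not run anything. The function name is hypothetical, and the table names in the usage note are placeholders: use whichever tables carry your own PhpDig prefix.

```php
<?php
// Hypothetical helper: builds (but does not execute) a mysqldump command line.
// "--password" without a value makes mysqldump prompt for the password.
function buildDumpCommand(string $user, string $database, array $tables, string $outFile): string
{
    $parts = [
        'mysqldump',
        '--user=' . escapeshellarg($user),
        '--password',
        escapeshellarg($database),
    ];
    foreach ($tables as $table) {
        $parts[] = escapeshellarg($table); // dump only the listed tables
    }
    return implode(' ', $parts) . ' > ' . escapeshellarg($outFile);
}
```

For example, buildDumpCommand('username', 'database', ['phpdig_sites', 'phpdig_spider'], 'phpdig_dump.sql') returns a command to run on the machine that did the indexing. On the server, the resulting file can then be loaded with the mysql client, reading the dump from standard input.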
Old 11-11-2005, 12:26 AM   #11
nodoyuna
Green Mole
 
Join Date: Nov 2005
Posts: 8
Connect.php...???

Thanks for the reply

...I am a bit puzzled by the portion about changing connect.php. I assume that the Windows copy of PhpDig on my PC will spider the website and will indeed have its own connect.php.

However, I believe that the server side already has its own connect.php and it should be configured to work server side already.

Pardon my ignorance, but could you be more specific about what change I need to make?

Thanks,

AL
Old 11-11-2005, 10:07 AM   #12
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Quote:
However, I believe that the server side already has its own connect.php and it should be configured to work server side already.
Correct, but just in case the 'server-side' connect.php gets overwritten with the 'local' connect.php, all you need to do is make sure that the following in connect.php matches whatever database you are using:
Code:
    define('PHPDIG_DB_PREFIX','<dbprefix>');
    define('PHPDIG_DB_HOST','<host>');
    define('PHPDIG_DB_USER','<user>');
    define('PHPDIG_DB_PASS','<pass>');
    define('PHPDIG_DB_NAME','<database>');
Example:
Code:
    define('PHPDIG_DB_PREFIX','phpdig_');
    define('PHPDIG_DB_HOST','localhost');
    define('PHPDIG_DB_USER','username');
    define('PHPDIG_DB_PASS','password');
    define('PHPDIG_DB_NAME','database');
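
If the server-side connect.php may have been overwritten, a small check like the one below can flag constants that are missing or still hold an angle-bracket placeholder such as '<host>'. The function name is hypothetical and the check is only a sketch; it does not test whether the credentials actually work.

```php
<?php
// Hypothetical check: returns the names of the five connect.php constants that
// are undefined or still set to a placeholder of the form '<...>'.
function unsetDbConstants(): array
{
    $names = [
        'PHPDIG_DB_PREFIX',
        'PHPDIG_DB_HOST',
        'PHPDIG_DB_USER',
        'PHPDIG_DB_PASS',
        'PHPDIG_DB_NAME',
    ];
    $problems = [];
    foreach ($names as $name) {
        // constant() is only reached when the constant is defined.
        if (!defined($name) || preg_match('/^<.*>$/', (string) constant($name)) === 1) {
            $problems[] = $name;
        }
    }
    return $problems;
}
```

Run after connect.php has been included, an empty array from unsetDbConstants() means all five values are at least filled in.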
Old 11-11-2005, 08:32 PM   #13
nodoyuna
Green Mole
 
Join Date: Nov 2005
Posts: 8
connect overwrite fix....

Thanks....
All times are GMT -8.

