Choosy about domains?
Hi, for the last few days I've been spidering without a single hitch, until today. The last website I tried to spider has a .ph domain, and I wonder if that could be the reason it could not be spidered. Could you try it out for me? The URL is http://www.birdwatch.ph.
Lastly, I also noticed that when I spider a site hosted on Geocities, the site_url becomes www.geocities.com without the folder where the site actually lives (e.g. www.geocities.com/mysite). Is there a way around this? It may seem like a weird request, but I really need it to work this way because I'm working on a hack that will benefit from it. Thanks in advance!!
Hi. What message did you get when you tried to crawl birdwatch.ph? Does setting PHPDIG_DEFAULT_INDEX to false in the config file have any effect?
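The suggested change is a one-line edit in the config file. A minimal sketch, assuming the setting is declared with PHP's define() like other phpDig constants; edit the existing line rather than adding a second one, since PHP complains when a constant is defined twice.

```php
<?php
// In phpDig's config file: switch the default-index setting off.
// Assumption: the constant is declared with define(); change the existing
// line instead of adding a duplicate definition.
define('PHPDIG_DEFAULT_INDEX', false);
```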
I already tried that yesterday, but it didn't work. When I try to spider the site, it times out and it looks as if nothing happened. When I refresh the admin page, the URL has been added to the list, but no pages are crawled.
Any ideas about my other question? Thanks.
I'm also curious about druesome's second question; I found this thread while searching for the answer, but there's no answer yet. Why does phpDig strip the folder name from a site's URL when it stores it? I just spidered http://gino.go-gaia.com/forum and it worked well, sticking to that directory, but in the admin panel the link has the forum directory removed. Sorry if this is an easy question, but can I make phpDig leave the URL I spidered as is, so that if I spider http://gino.go-gaia.com/forum, that URL ends up in the sites table? Thanks.
Hi. As for birdwatch.ph, what do you get on screen when you uncomment //print $answer."<br>\n"; in the robot_functions.php file?
With regard to the admin index page, it shows only the site, domain, or subdomain, as the case may be. This is based on PHP's parse_url function. To view the directories/branches for a specific (sub)domain, just click the site and then click the update button.
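To see why the folder disappears from the admin listing, here is what PHP's built-in parse_url() returns for the two URLs mentioned in this thread. This is an illustration of the PHP function only, not the code from phpDig's robot_functions.php: the host and the directory come back as separate pieces.

```php
<?php
// Illustration of PHP's built-in parse_url(); not phpDig's own code.
$urls = array(
    'http://gino.go-gaia.com/forum',
    'http://www.geocities.com/mysite',
);

foreach ($urls as $url) {
    $parts = parse_url($url);
    // 'host' holds only the (sub)domain, which is what the admin index lists.
    // 'path' holds the directory, which is kept separately from the host.
    printf("%s -> host=%s path=%s\n", $url, $parts['host'], $parts['path']);
}
```

For the first URL this prints host=gino.go-gaia.com and path=/forum, so a listing built from the host alone shows no directory.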
What if I wanted to store the directory information in the spider's "sites" table exactly as entered in the spider script? Or am I missing something?
Hi. To get a feel for how it works, look through the tables and see how the domain is stored in the sites table while path/file info is stored in the spider/tempspider/excludes tables, then search the robot_functions.php file for calls to parse_url.
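The domain/path split described here can be sketched as follows. The helper split_site_url() is hypothetical and not part of phpDig, and the exact site_url format (scheme://host/ plus a path relative to the site root) is an assumption; check the real rows in your sites and spider tables before relying on it.

```php
<?php
// Hypothetical helper, not part of phpDig: split a start URL into the piece
// a "sites" table would hold and the path that spider-type tables would hold.
// Assumed format: site = "scheme://host[:port]/", path relative to the root.
function split_site_url(string $url): array
{
    $parts = parse_url($url);
    $scheme = $parts['scheme'] ?? 'http';

    // Keep a non-default port with the host, since it identifies the site.
    $host = $parts['host'];
    if (isset($parts['port'])) {
        $host .= ':' . $parts['port'];
    }
    $site = $scheme . '://' . $host . '/';

    // Strip the leading slash so the path is relative to the site root.
    $path = ltrim($parts['path'] ?? '', '/');

    return array($site, $path);
}

list($site, $path) = split_site_url('http://gino.go-gaia.com/forum/');
echo $site, "\n";
echo $path, "\n";
```

Under these assumptions, storing the directory in the sites table, as asked above, would mean saving $site . $path instead of $site; whether the rest of phpDig handles that is untested here.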
Thanks, Charter. My host's MySQL was out for about a week (no explanation why), so I haven't been able to try this yet, but I will ASAP. Thanks for pointing me in the right direction.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.