|
10-19-2003, 08:40 AM | #1 |
Orange Mole
Join Date: Oct 2003
Posts: 30
|
Choosy about domains?
Hi, for the last few days I've been spidering without a single hitch, until today. The last website I tried to spider has the .ph domain, and I wonder if that could be the reason it could not be spidered. If you could try it out for me, the URL is http://www.birdwatch.ph.
And lastly, I also noticed that when I spider a site that is hosted under Geocities, the site_url becomes www.geocities.com without including the folder where the site really is (e.g. www.geocities.com/mysite). Is there a way around this? It may seem like a weird request, but I really need it to be this way because I'm working on a hack that will benefit from it. Thanks in advance!! |
10-19-2003, 11:31 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. What message did you get when you tried to crawl birdwatch.ph? Does setting PHPDIG_DEFAULT_INDEX to false in the config file have any effect?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
10-19-2003, 09:26 PM | #3 |
Orange Mole
Join Date: Oct 2003
Posts: 30
|
I already tried that yesterday, but it didn't work. Actually, when I try to spider the site, it times out and it looks like nothing has happened. When I refresh the admin page, the URL has been added to the list, but no page has been crawled.
Any ideas about my other question? Thanks. |
04-19-2004, 05:47 PM | #4 |
Purple Mole
Join Date: Dec 2003
Posts: 106
|
I'm actually curious about druesome's second question as well. I found this thread while searching for the answer, but there's no answer yet. Why does phpDig strip the folder name from a site's URL when it stores it? I just spidered http://gino.go-gaia.com/forum and it worked well, sticking to that directory, but in the admin panel the link has the forum directory removed. Sorry if this is an easy question, but can I make phpDig leave the format of the URL I spidered alone, so that if I spider http://gino.go-gaia.com/forum then that URL will be in the sites table? Thanks.
__________________
Foundmyself.com artist community, art galleries |
04-20-2004, 12:49 PM | #5 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. As to birdwatch.ph, what do you get onscreen when you uncomment //print $answer."<br>\n"; in the robot_functions.php file?
WRT the admin index page, it shows only the site (domain or subdomain, as the case may be). This is based on PHP's parse_url function. To view the directories/branches for a specific (sub)domain, just click the site and then click the update button.
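For anyone following along, here is a minimal sketch of what PHP's built-in parse_url returns for a URL with a folder in it. It is an illustration only, not code taken from phpDig, and the example URL is made up from the Geocities case mentioned earlier in the thread.

```php
<?php
// Illustration only: PHP's built-in parse_url(), not code from phpDig itself.
// The URL below is a made-up example based on the Geocities case in this thread.
$url = 'http://www.geocities.com/mysite/index.html';
$parts = parse_url($url);

// The host is the (sub)domain part; the folder and file
// end up together in the separate 'path' component.
echo $parts['scheme'], "\n"; // http
echo $parts['host'], "\n";   // www.geocities.com
echo $parts['path'], "\n";   // /mysite/index.html
```

Since the folder is in 'path' rather than 'host', anything that keeps only the host will show www.geocities.com on its own.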
|
04-20-2004, 04:52 PM | #6 |
Purple Mole
Join Date: Dec 2003
Posts: 106
|
How about if I wanted to store the directory information exactly as entered in the spider script in the spider's "sites" table? Or am I missing something...
|
04-20-2004, 05:28 PM | #7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. To get a feel for how it works, look through the tables and see how the domain is stored in the sites table and path/file info is stored in the spider/tempspider/excludes tables, and then search the robot_functions.php file for the parse_url function.
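To make the split concrete, here is a rough sketch of separating a URL into a site part and a path/file part, and of joining them again to get a URL that includes the directory. The function split_site_and_path and the array keys 'site', 'path' and 'file' are hypothetical names chosen for this example. They are not phpDig's own code or column names, and the "a dot means a file name" rule is a simplifying assumption.

```php
<?php
// Hypothetical helper, not part of phpDig: keeps the (sub)domain apart
// from the directory path and the file name, as described in this thread.
function split_site_and_path(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    // Site part: scheme and host only, e.g. "http://gino.go-gaia.com/".
    $site = $parts['scheme'] . '://' . $parts['host'] . '/';

    // Everything after the host, without the leading slash.
    $rest = ltrim($parts['path'] ?? '', '/');

    // Assumption: a last segment containing a dot is a file name;
    // otherwise the whole remainder is treated as a directory.
    $file = '';
    $lastSlash = strrpos($rest, '/');
    $lastSegment = $lastSlash === false ? $rest : substr($rest, $lastSlash + 1);
    if (strpos($lastSegment, '.') !== false) {
        $file = $lastSegment;
        $rest = $lastSlash === false ? '' : substr($rest, 0, $lastSlash + 1);
    } elseif ($rest !== '' && substr($rest, -1) !== '/') {
        $rest .= '/'; // normalise directories to end with a slash
    }

    return ['site' => $site, 'path' => $rest, 'file' => $file];
}

// Joining the pieces again yields a full URL that includes the directory.
$row = split_site_and_path('http://gino.go-gaia.com/forum');
echo $row['site'] . $row['path'] . $row['file'], "\n";
```

With a layout like this the directory is not lost. It is just kept separately from the site, so a listing that prints only the site part will not show it.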
|
04-30-2004, 12:39 AM | #8 |
Purple Mole
Join Date: Dec 2003
Posts: 106
|
Thanks Charter - my host lost all MySQL for about a week (no explanation why), so I haven't been able to try this, but I will ASAP. Thanks for pointing me in the right direction.
|