|
01-10-2005, 10:53 PM | #1 |
Orange Mole
Join Date: Jan 2005
Posts: 31
|
robots.txt and URL
Hello,
I have a robots.txt file which works perfectly with PhpDig. Note the forum folder in it: Code:
User-agent: *
Disallow: /cgi-bin/
Disallow: /flash/
Disallow: /forum/
Is it possible to force a few URLs (19 entries, in fact) to be indexed while the forum is excluded from indexing? Thanks for your help and time. Regards, Dominique |
01-10-2005, 11:27 PM | #2 |
Purple Mole
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
|
From what I can see, you are asking how to make PhpDig spider a directory that has been excluded in the robots.txt file.
From my experience of the software, PhpDig is built to read the robots.txt file and obey it. Ethically, if the webmaster has excluded the directory from robots, would it be right to try to index it? No doubt people with more experience than me can answer your question or know how it's done, but I would ask: "Wouldn't that make the robot spider little more than a hacking device?" Hopefully someone with more knowledge will help you. |
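As a rough illustration of what "reading robots.txt and obeying it" means for the rules quoted in the first post, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not PhpDig's own code, and the helper name `is_allowed` is made up for this example; it only shows how a compliant crawler would treat URLs under the listed `Disallow` lines.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the first post of this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /flash/
Disallow: /forum/
"""


def is_allowed(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if a crawler that obeys robots_txt may fetch url.

    is_allowed is a hypothetical helper name, not part of PhpDig.
    """
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://www.john-howe.com/", "http://www.john-howe.com/forum/"):
        print(url, is_allowed(url))
```

With these rules, anything under /forum/ is refused, which is why individual forum topics cannot be indexed while that line is present.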
01-11-2005, 12:18 AM | #3 |
Orange Mole
Join Date: Jan 2005
Posts: 31
|
I did not mean anything ambiguous by my question, but I understand what you mean; I had not thought about it until you mentioned it.
I have a forum (www.john-howe.com/forum) with a lot of sections, and one of them is an FAQ which I wish to include in the index. My question is: how can I do that? I don't want to list my thousand threads in robots.txt. Is it possible to specify in robots.txt which URLs to index? I've found nothing about it. Is "Allow:" supported in PhpDig? Regards, Dom Last edited by djavet; 01-11-2005 at 12:37 AM. |
01-11-2005, 02:05 AM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
First, remove "Disallow: /forum/" from your robots.txt file.
Next, go to the PhpDig admin panel and copy down the "update sites" values for your site. Then, enter each FAQ-type URL that you want to index, one per line, in the PhpDig textbox like so: Code:
http://www.john-howe.com/forum/phpbb/viewtopic.php?t=XXXX
http://www.john-howe.com/forum/phpbb/viewtopic.php?t=YYYY
http://www.john-howe.com/forum/phpbb/viewtopic.php?t=ZZZZ
Once PhpDig is done, go to "update sites" and edit the values back to their original settings. Remember to add "Disallow: /forum/" back to your robots.txt file. PhpDig currently does not understand robots.txt "allow" lines. Also, read this documentation for further information.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
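For context on the "allow" lines mentioned in the reply above: some robots.txt parsers do honour them, even though PhpDig (per that reply) does not. The sketch below uses Python's standard-library `urllib.robotparser` purely as an example of such a parser. The path `/forum/faq/` and the helper name `can_crawl` are hypothetical, chosen only for illustration; the real FAQ topics in this thread live under `viewtopic.php?t=...` URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/forum/faq/" is an invented path for illustration.
# Python's parser applies the first matching rule in file order,
# so the Allow line is listed before the broader Disallow line.
ROBOTS_WITH_ALLOW = """\
User-agent: *
Allow: /forum/faq/
Disallow: /forum/
"""


def can_crawl(url: str, robots_txt: str = ROBOTS_WITH_ALLOW) -> bool:
    """Return True if a parser that understands Allow lines permits url.

    can_crawl is a hypothetical helper name, not part of PhpDig.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    for url in (
        "http://www.john-howe.com/forum/faq/",
        "http://www.john-howe.com/forum/other/",
    ):
        print(url, can_crawl(url))
```

Because PhpDig ignores such lines, the temporary removal of "Disallow: /forum/" described above remains the approach suggested in this thread.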
01-11-2005, 03:19 AM | #5 |
Orange Mole
Join Date: Jan 2005
Posts: 31
|
Great trick. Thanks a lot.
I will try it tonight. Regards, Dom |