PhpDig.net

Old 01-10-2005, 10:53 PM   #1
djavet
Orange Mole
 
Join Date: Jan 2005
Posts: 31
robots.txt and URL

Hello,

I have a robots.txt file which works perfectly with PhpDig.
Note that the forum folder is listed in it:
Code:
User-agent: *
Disallow: /cgi-bin/
Disallow: /flash/
Disallow: /forum/
In my forum (phpBB), I have a FAQ section which I'd like to index with PhpDig.
Is it possible to force a few URLs (19 entries, in fact) to be indexed while the forum is excluded from indexing?

Thanks for your help and time.
Regards, Dominique
Old 01-10-2005, 11:27 PM   #2
Dave A
Purple Mole
 
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
From what I can see, you are asking how to make PhpDig spider a directory that has been excluded in the robots.txt file.
From my experience of the software, PhpDig is built to read the robots.txt file and obey it. Ethically, if the webmaster has excluded the directory from robots, would it be right to try to index it?
No doubt people with more experience than me may well answer your question or know how it's done, but I would ask: "Wouldn't that make the robot spider little more than a hacking device?"
Hopefully someone with more knowledge will help you.
Old 01-11-2005, 12:18 AM   #3
djavet
Orange Mole
 
Join Date: Jan 2005
Posts: 31
There was no hidden intent in my question, but I understand what you mean; I had not thought about it until you mentioned it.
I have a forum (www.john-howe.com/forum) with many sections, and one of them is a FAQ which I wish to include in the index.
My question is: how can I do that?
I don't want to list my thousand threads in robots.txt.

Is it possible to specify in robots.txt which URLs to index? I have found nothing about it. Is "Allow:" supported in PhpDig?

Regards, Dom

Last edited by djavet; 01-11-2005 at 12:37 AM.
Old 01-11-2005, 02:05 AM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
First, remove "Disallow: /forum/" from your robots.txt file.

Next, go to the PhpDig admin panel and copy down the "update sites" values for your site.

Then, enter each FAQ-type URL that you want to index, one per line, in the PhpDig textbox like so:
Code:
http://www.john-howe.com/forum/phpbb/viewtopic.php?t=XXXX
http://www.john-howe.com/forum/phpbb/viewtopic.php?t=YYYY
http://www.john-howe.com/forum/phpbb/viewtopic.php?t=ZZZZ
Now, set the radio button to no, search depth to zero, links per to zero, and click the dig button.

Once PhpDig is done, go to "update sites" and edit the values back to their original settings.

Remember to add "Disallow: /forum/" back to your robots.txt file.
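If typing the 19 FAQ URLs by hand is tedious, the list for the textbox could be generated with a small script. This is only a sketch in Python and not part of PhpDig: the function name faq_urls and the topic IDs are made up for illustration, and the real FAQ topic numbers would have to be filled in.

```python
# Hypothetical helper, not part of PhpDig: builds the one-URL-per-line
# text to paste into the PhpDig textbox.
BASE = "http://www.john-howe.com/forum/phpbb/viewtopic.php"


def faq_urls(topic_ids):
    """Return one viewtopic URL per topic ID, separated by newlines."""
    return "\n".join(f"{BASE}?t={tid}" for tid in topic_ids)


if __name__ == "__main__":
    # Placeholder IDs; replace them with the real FAQ topic numbers.
    print(faq_urls([101, 102, 103]))
```

The printed lines can then be pasted into the textbox as described in the steps above.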

PhpDig currently does not understand robots.txt "allow" lines.
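To see why the "Disallow: /forum/" line has to come out temporarily, the rules can be checked outside PhpDig. The sketch below uses Python's standard urllib.robotparser, which is an independent parser and not PhpDig's own robots.txt code, so it only illustrates how the Disallow lines from the first post apply to a FAQ URL. The helper name is_allowed and the topic number 1 are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the first post.
WITH_FORUM_RULE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /flash/
Disallow: /forum/
"""

# The same file with the forum line temporarily removed.
WITHOUT_FORUM_RULE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /flash/
"""

# Illustrative FAQ URL; the topic number is a placeholder.
FAQ_URL = "http://www.john-howe.com/forum/phpbb/viewtopic.php?t=1"


def is_allowed(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(is_allowed(WITH_FORUM_RULE, FAQ_URL))     # blocked by Disallow: /forum/
    print(is_allowed(WITHOUT_FORUM_RULE, FAQ_URL))  # no rule matches the path
```

With the forum rule in place, the FAQ URL is refused; with it removed, the URL is allowed, which is what the temporary edit relies on.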

Also, read this documentation for further information.
Old 01-11-2005, 03:19 AM   #5
djavet
Orange Mole
 
Join Date: Jan 2005
Posts: 31
Great trick. Thanks a lot.
I will try it tonight.

Regards, Dom