11-25-2003, 08:30 AM | #1 |
Orange Mole
Join Date: Nov 2003
Posts: 41
|
index only HTML files
I have indexed my site and it indexes .html and .swf files. It also indexes the file directories, i.e. the '-' entries, but I just want the HTML files to be indexed. Is there a way of setting this? If so, how and where? I can't find it anywhere. The '-' index links are the biggest problem; the swf files don't really matter. Can anyone please help me? Cheers, alex |
11-25-2003, 10:17 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. You might try adding a robots.txt file in the web root with the following, assuming it's the main site's index.html that you don't want crawled:
User-agent: PhpDig
Disallow: /index.html
To remove the '-' index links that were already crawled, go to the admin panel, click a site, click the update button, click a blue arrow, and on the right side click the red X next to each link you want to delete.
Another option, if you have shell access, would be to crawl via the command line using a text file that contains only the links you want crawled, one per line. There are three options in the config file (SPIDER_MAX_LIMIT, SPIDER_DEFAULT_LIMIT, RESPIDER_LIMIT) that can be set to limit the number of levels crawled when using the shell to index.
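As a rough way to sanity-check robots.txt rules like the ones above before uploading them, the sketch below feeds them to Python's standard `urllib.robotparser`. This is Python's parser, not PhpDig's own, so it only shows how a standards-following crawler would read the rules. The path is written with a leading slash because rules are matched against the URL path, and `/norwich.html` is just an illustrative page name.

```python
from urllib.robotparser import RobotFileParser

# Rules as they would appear in robots.txt at the web root.
# "PhpDig" is the user-agent name used in the reply above.
ROBOTS_TXT = """\
User-agent: PhpDig
Disallow: /index.html
"""


def allowed(path: str, agent: str = "PhpDig") -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/norwich.html"):
        print(path, "allowed" if allowed(path) else "blocked")
```

With this parser, a rule written without the leading slash (`Disallow: index.html`) would not match `/index.html`, which is why the slash is included here.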
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
11-26-2003, 01:41 AM | #3 |
Orange Mole
Join Date: Nov 2003
Posts: 41
|
not what i meant
Cheers. I meant that within each folder it spiders there are three results, for example:
-
hello.txt
hello.swf
The top result is just a '-', but it links to the folder itself, so you get a kind of FTP-style page, not an HTML page. The swf doesn't really matter because I don't think it appears as a result in any searches. When I said index, I meant the FTP-style version of the folder in question. Does that make sense? Surely there is a way of telling PhpDig to ONLY index HTML files, and no folders or files without the .html file type. Hope this makes more sense, alex. |
11-26-2003, 09:16 AM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Is there a link from dir/filename.html to just dir/ in the filename.html files? What are the filenames of the HTML files? You might try setting a .htaccess file in web root with the following as the first line:
Options -Indexes
For the swf files, try adding swf to the FORBIDDEN_EXTENSIONS list in the config file.
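To see whether a directory URL is still serving an auto-generated listing (the kind of page `Options -Indexes` is meant to switch off), one rough check is to look at the page title. The sketch below assumes an Apache-style listing, whose title starts with "Index of /"; the function name is made up for illustration, and other servers may format their listings differently.

```python
import re

# Apache's auto-generated listings carry a title of the form "Index of /path".
# Assumption: the server produces this Apache-style title.
_AUTOINDEX_TITLE = re.compile(r"<title>\s*Index of\s+/", re.IGNORECASE)


def looks_like_directory_listing(html: str) -> bool:
    """Return True if the HTML looks like an auto-generated directory index."""
    return _AUTOINDEX_TITLE.search(html) is not None


if __name__ == "__main__":
    listing = "<html><head><title>Index of /cars</title></head><body></body></html>"
    page = "<html><head><title>Norwich</title></head><body></body></html>"
    print(looks_like_directory_listing(listing))
    print(looks_like_directory_listing(page))
```

Feeding it the HTML saved from a folder URL before and after uploading the .htaccess file gives a quick before/after comparison.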
|
11-26-2003, 03:50 PM | #5 |
Orange Mole
Join Date: Nov 2003
Posts: 41
|
i dont think there are links from filenames.html
I don't think there are links from filenames.html to dir/.
Couldn't I add '-' to the forbidden extensions list, or will that just mess it all up? The HTML files are named after regions and towns in England, e.g. 'norwich.html'; they are not called 'index.html', if that's what you were thinking. Do you get these directory indexes in your spider results? alex |
11-26-2003, 04:32 PM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. I wouldn't add '-' to the forbidden extensions because it isn't an extension; it's just a representation for domain.com. Yes, I do get '-' in my results. Did using the .htaccess file work?
|
11-26-2003, 04:38 PM | #7 |
Orange Mole
Join Date: Nov 2003
Posts: 41
|
I don't know how to make .htaccess files or add them to my site root. Do you get the 'index of blah blah blah' pages in your search results?
If I type 'index of' into my search field and click go, I get a huge list of search results made up of the pages I don't want listed. Do you get the same? |
11-26-2003, 04:58 PM | #8 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. No, I don't get that because I don't allow directory listings. The attached zip file contains a .htaccess file. Just FTP the .htaccess file to your web root in ASCII mode.
|
11-26-2003, 05:01 PM | #9 |
Orange Mole
Join Date: Nov 2003
Posts: 41
|
Cheers, o.k. I'll have to ask my domain hosts to tell me what my root is, because they set it up and may have changed things round a bit. I'll reply and tell you how it goes, cheers!
alex |
11-26-2003, 05:04 PM | #10 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. The place to FTP the file is the same place the main index.html file would go for your site. For instance, if your main site page is domain.com/index.html, then the web root is where this index.html file resides.
|
11-27-2003, 02:32 AM | #11 |
Orange Mole
Join Date: Nov 2003
Posts: 41
|
Aargghh!!! If I place the .htaccess file on the server it restricts access to the PhpDig administration page; even if I rename the index.php page it still won't allow access.
If that had worked it would've been cool, sorry. Any other ideas on how to avoid this problem? I'd deal with it normally, but the index directories get in the way of the actual relevant results of the search, you see. Cheers, alex |
11-27-2003, 09:03 AM | #12 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Another option would be to make one filename.html that links to the files that you want crawled and index filename.html at level one. After the index is done, just go to the admin panel, click a site, click the update button, click a blue arrow, and delete the '-' on the right hand side.
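The single linking page suggested above could be generated rather than written by hand. The sketch below is one possible approach, not a PhpDig feature: it walks a local copy of the web root and builds a page that links to every .html file and nothing else. The function name, the `crawl-list.html` file name, and the `./public_html` path are all made up for illustration.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_hub_page(web_root: str, hub_name: str = "crawl-list.html") -> str:
    """Return HTML for a page linking to every .html file under web_root."""
    root = Path(web_root)
    links = []
    for page in sorted(root.rglob("*.html")):
        rel = page.relative_to(root).as_posix()
        if rel == hub_name:
            continue  # do not link the hub page to itself
        links.append(f'<li><a href="{quote(rel)}">{escape(rel)}</a></li>')
    body = "\n".join(links)
    return f"<html><body>\n<ul>\n{body}\n</ul>\n</body></html>\n"


if __name__ == "__main__":
    # Hypothetical local path to a copy of the site.
    web_root = "./public_html"
    html = build_hub_page(web_root)
    Path(web_root, "crawl-list.html").write_text(html, encoding="utf-8")
```

Re-running the script whenever pages are added or removed keeps the list current, and only the generated page then needs to be indexed at level one.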
|
12-01-2003, 11:34 AM | #13 |
Orange Mole
Join Date: Nov 2003
Posts: 41
|
Sorry for the long wait,
but I have constantly changing HTML pages, with new ones created regularly, so I need to somehow stop the index directories from being indexed. There must be something in PhpDig that tells the engine to search and display these pages, or else they wouldn't be indexed. Who might know how to find and disable such a function? Thank you, alex |
12-01-2003, 01:13 PM | #14 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Do your HTML pages link to the index directories?
|
12-01-2003, 01:25 PM | #15 |
Orange Mole
Join Date: Nov 2003
Posts: 41
|
no
The pages I have made do not link to the index directory; I was wondering the same thing.
The link created by the spider/indexer is a link to the directory, not to an HTML file. PhpDig is finding the index of a folder and displaying it as a link. Here is an example of that kind of address: http://www.robotstxt.org/wc/ The '-' I get links to pages like the one above, so if I search for my 'cars' HTML page on my site I get results that link to addresses like these (made-up examples): 'cars/cars.html' and 'cars/' |
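The distinction described above, page addresses such as 'cars/cars.html' versus folder addresses such as 'cars/', can be expressed as a simple URL filter. The sketch below is not something PhpDig provides; it is a stand-alone helper, with made-up function names and an `example.com` placeholder host. It could be used to prepare a one-URL-per-line text file for a command-line crawl, or to list which indexed links to delete in the admin panel.

```python
from urllib.parse import urlparse


def is_html_page(url: str) -> bool:
    """True if the URL path ends in .html or .htm (not a folder or other file)."""
    path = urlparse(url).path
    return path.lower().endswith((".html", ".htm"))


def split_results(urls):
    """Split URLs into (HTML pages, everything else), preserving order."""
    pages = [u for u in urls if is_html_page(u)]
    others = [u for u in urls if not is_html_page(u)]
    return pages, others


if __name__ == "__main__":
    # Made-up addresses in the style of the examples in the post above.
    found = [
        "http://example.com/cars/cars.html",
        "http://example.com/cars/",
        "http://example.com/cars/hello.swf",
    ]
    pages, others = split_results(found)
    print("\n".join(pages))  # one link per line, suitable for a URL list file
```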