11-26-2003, 11:48 PM | #1 |
Green Mole
Join Date: Nov 2003
Posts: 10
|
Invision power board - spidering doesn't work?
Hi everyone,
I installed phpDig on my site recently and was very happy with it until I searched for the word "beer" to test my search engine. I know the word appears in only one thread in my Invision Power Board forum, yet I get 193 hits. Worse, if you click on the first result, the page it takes you to does not include the word "beer" (it takes you to the members list instead). Can anyone tell me what is wrong and how I can get the spider to work correctly? Otherwise I am very happy with phpDig (thanks, Charter). Will I have to exclude the forum from the search? george. |
11-27-2003, 09:19 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. That's a lot of beer.
In the config.php file, set the following:
Code:
define('PHPDIG_DEFAULT_INDEX',false);
define('PHPDIG_SESSID_REMOVE',true);
define('PHPDIG_SESSID_VAR','s');
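For anyone wondering what the two session settings are about: when a visitor has no cookie, IPB puts a session ID into its links (in the `s` query variable), so the same page can turn up under many different URLs. Below is a rough sketch of the general idea of dropping that variable before a URL is stored. It is not phpDig's actual code; the function name `strip_session_var`, the session values, and the topic number are made up for illustration.

```php
<?php
// Hypothetical helper, not part of phpDig: drop one query variable
// (for example IPB's session variable "s") from a URL.
// User/password parts of a URL are ignored to keep the sketch short.
function strip_session_var(string $url, string $var): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return $url; // malformed URL or no query string: nothing to strip
    }

    // Split the query string into name => value pairs and drop the variable.
    parse_str($parts['query'], $params);
    unset($params[$var]);

    // Rebuild the URL from the pieces parse_url() found.
    $rebuilt = '';
    if (isset($parts['scheme']))   { $rebuilt .= $parts['scheme'] . '://'; }
    if (isset($parts['host']))     { $rebuilt .= $parts['host']; }
    if (isset($parts['port']))     { $rebuilt .= ':' . $parts['port']; }
    if (isset($parts['path']))     { $rebuilt .= $parts['path']; }
    if ($params)                   { $rebuilt .= '?' . http_build_query($params); }
    if (isset($parts['fragment'])) { $rebuilt .= '#' . $parts['fragment']; }

    return $rebuilt;
}

// Two URLs that differ only in the session ID collapse to the same URL.
// Both lines print: http://www.domain.com/forums/index.php?showtopic=42
echo strip_session_var('http://www.domain.com/forums/index.php?s=abc123&showtopic=42', 's'), "\n";
echo strip_session_var('http://www.domain.com/forums/index.php?s=def456&showtopic=42', 's'), "\n";
```

If the session variable on your board has a different name, that name would go where `'s'` is used here.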
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
11-29-2003, 05:20 AM | #3 |
Green Mole
Join Date: Nov 2003
Posts: 10
|
no luck
Hi Charter,
I followed your advice and am currently re-indexing my site (it has been going for about an hour now). I am indexing to a depth of 5. Is this OK? I was not really sure. If I search for "beer" I now get 119 hits, and I still have the same problem: the returned pages do not actually include the word "beer". Anything else I can try? Best Regards, George Last edited by george; 11-29-2003 at 05:31 AM. |
11-29-2003, 05:30 AM | #4 |
Green Mole
Join Date: Nov 2003
Posts: 10
|
Hi Charter, In the meantime I have deleted the /forums folder from the admin panel so I don't confuse my users. So if you try the search engine, there are no beer hits now. But trust me, the problem was still there.
Thanks George |
11-29-2003, 11:09 AM | #5 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Not sure, but I'm wondering if there is some problem with .js, .cab, and .swf files. Can you try adding js, cab, and swf into the FORBIDDEN_EXTENSIONS in the config file? Also, if this doesn't solve the problem, can you set me up with a demo IPB somewhere on your site and add a couple of posts so that I can crawl there? Do you recall if all the extra beer links went to the members list? You might try crawling http://www.domain.com/forums/ at a level of two, level one for the forum links and level two for the thread links.
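As a rough picture of what an extension filter does, here is a small sketch. It is only an illustration of the idea: the function name `has_forbidden_extension`, the list variable, and the example URLs are made up, and the real format of FORBIDDEN_EXTENSIONS should be checked in your own config file before editing it.

```php
<?php
// Hypothetical illustration only, not phpDig code.
function has_forbidden_extension(string $url, array $extensions): bool
{
    // Look only at the path, so "?file=x.js" in a query string is ignored.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path or malformed URL
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $ext !== '' && in_array($ext, $extensions, true);
}

$forbidden = ['js', 'cab', 'swf'];

// A script file is skipped, a normal forum page is not.
var_dump(has_forbidden_extension('http://www.domain.com/forums/html/global.js', $forbidden));        // bool(true)
var_dump(has_forbidden_extension('http://www.domain.com/forums/index.php?showtopic=1', $forbidden)); // bool(false)
```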
|
11-29-2003, 05:39 PM | #6 |
Green Mole
Join Date: Nov 2003
Posts: 10
|
Hi Charter,
I added those extensions to the forbidden list, plus .ico, since I noticed it spiders my fav.ico file and this is not necessary. I will respider and see what happens. To answer your beer question: no, each link seemed to take me to a different page, but none of them included the word "beer". Regards, george Last edited by george; 11-29-2003 at 06:39 PM. |
11-29-2003, 06:35 PM | #7 |
Green Mole
Join Date: Nov 2003
Posts: 10
|
Hi Charter,
I respidered and still the same problem. I am not sure how to set up another IPB forum for you to play with. I have left the spider results so you can see what it says. Thanks, George |
12-01-2003, 12:55 PM | #8 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. The problem, it seems, is that for almost every http://www.domain.com/forums/index.php?different=query&string=here the text from http://www.domain.com/forums/index.php is returned. It almost looks like some sort of redirect. Anyway, I can duplicate the problem with your site, but other Invision Power Boards seem to index without this trouble. Maybe there are some settings in IPB to allow for spiders?
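One way to check a symptom like this outside of phpDig is to fetch a few URLs and see whether they all come back with the same page. The sketch below groups URLs by a hash of the page body. It is a hypothetical diagnostic, not part of phpDig; the function name `group_by_content` is made up, and an exact hash can miss matches if the board embeds something different (such as a session ID) in every response.

```php
<?php
// Hypothetical diagnostic, not part of phpDig: group URLs by a hash of the
// page body so that URLs returning identical content end up together.
// $fetch is any callable that takes a URL and returns the body or false.
function group_by_content(array $urls, callable $fetch): array
{
    $groups = [];
    foreach ($urls as $url) {
        $body = $fetch($url);
        if ($body === false) {
            continue; // fetch failed, skip this URL
        }
        // Collapse whitespace so trivial formatting differences do not matter.
        $normalized = preg_replace('/\s+/', ' ', trim($body));
        $groups[md5($normalized)][] = $url;
    }
    return $groups;
}

// Usage against a live site (placeholder URLs, as in the post above).
// Passing 'file_get_contents' needs allow_url_fopen to be enabled.
// $groups = group_by_content(
//     ['http://www.domain.com/forums/index.php',
//      'http://www.domain.com/forums/index.php?different=query&string=here'],
//     'file_get_contents'
// );
// A group with more than one URL means those URLs returned the same page.
```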
|
12-01-2003, 03:10 PM | #9 |
Green Mole
Join Date: Nov 2003
Posts: 10
|
Yes, there are various spider options in the Admin panel of Invision Power Board.
The board should recognise popular spiders such as hotbot and googlebot and will log their activity. You can also set the privileges for the bots. I have mine set to "guest", which means the bot can see any page that a normal guest can see. I am getting spidered by hotbot and googlebot quite often and it seems to work OK. It is weird that other IPBs work, because I have made very few modifications. I have added Google ads to my board wrapper and changed a few images; otherwise it is a standard Invision board, version 1.2.
Here is some info from the IPB help pages:
>>>>>> Search Engine Spiders
Enable the search engine spider recognition? - Yes/No toggle
Log all spider visits? - Yes/No toggle
Treat spider/bot as part of which group? - Choose a group that these bots are to be shown under when they index the board.
Show spider/bot in the active users list? - Choose whether they will be shown as anonymous. Yes/No toggle also.
Call Googlebot... - Name of the Googlebot, which will be shown on the active member list.
Call Microsoft / Hotbot... - Name of the Microsoft / Hotbot bot, which will be shown on the active member list.
Call Lycos... - Name of the Lycos bot, which will be shown on the active member list.
Call Ask Jeeves... - Name of the Ask Jeeves bot, which will be shown on the active member list.
Call What U Seek... - Name of the What U Seek bot, which will be shown on the active member list.
Link to this page: http://www.invisionpower.com/documen...oc.php?page=31 |
12-01-2003, 03:57 PM | #10 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Are you able to crawl other IPB forums without running into the problem that appears with your board?
|
12-02-2003, 04:49 AM | #11 |
Green Mole
Join Date: Nov 2003
Posts: 10
|
Spidering of the threads whose URLs contain "showforum" and "showtopic" is OK. I don't really want it to spider anything else.
It would be good to be able to limit spidering depending on the words in the URL or, conversely, to refuse to spider pages that contain certain phrases in the URL. I will try to spider some other boards and see what happens. Thanks, Charter. george Last edited by george; 12-02-2003 at 05:11 AM. |
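The kind of URL filter George is asking for could look roughly like the sketch below. This is only an illustration of the idea and not a claim about what phpDig offers; the function name `should_spider`, its parameters, and the example URLs are made up.

```php
<?php
// Hypothetical filter, not a phpDig feature: decide whether a URL should be
// spidered based on phrases in its query string.
function should_spider(string $url, array $mustContainOne, array $mustNotContain = []): bool
{
    // A URL without a query string yields an empty string here.
    $query = (string) parse_url($url, PHP_URL_QUERY);

    foreach ($mustNotContain as $phrase) {
        if (stripos($query, $phrase) !== false) {
            return false; // explicitly refused
        }
    }
    foreach ($mustContainOne as $phrase) {
        if (stripos($query, $phrase) !== false) {
            return true; // matches an allowed phrase
        }
    }
    return false; // nothing allowed matched
}

$allow = ['showforum', 'showtopic'];

var_dump(should_spider('http://www.domain.com/forums/index.php?showtopic=12', $allow)); // bool(true)
var_dump(should_spider('http://www.domain.com/forums/index.php?act=Members', $allow));  // bool(false)
```

Note that a filter like this would also refuse the plain board index (no query string), so the start page would have to be fetched regardless of the filter for the spider to find any forum links at all.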
12-02-2003, 05:10 AM | #12 |
Green Mole
Join Date: Nov 2003
Posts: 10
|
I just spidered a few pages at a different site.
I did a search for the phrase "connection speed" which is included in one of the spidered pages and it worked fine. You can try it. I will leave it on my board. With my forum, if you do a search for the word "chips" you will get a similar problem to "beer". Thanks, George |
12-04-2003, 03:55 PM | #13 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. I have not yet been able to come up with a reason why this problem appears with your IPB but not other IPBs.
|
12-04-2003, 08:15 PM | #14 |
Green Mole
Join Date: Nov 2003
Posts: 10
|
Hi Charter,
I have recently realised I have another problem that might be related. I was trying to install those Google ads on my IPB forum, and the session ID in the URL was interfering with the delivery of relevant ads. Then I found out that Internet Explorer was often not accepting cookies properly from my site. If the cookies were not accepted, the board would include a session ID in the URL. But if I get the cookie working OK (by messing around with the IE privacy settings), my URLs no longer contain the session IDs, because the session data is stored in the cookie instead. So I wonder whether, now that I am freeing myself of session IDs in URLs, phpDig will work better. I will try to re-spider next time I have time and let you know if things are better. If I can't get it to work, it is not the end of the world, because the IPB forum has a search function anyway. I still think phpDig is very good. Thanks again, Glen |
12-05-2003, 08:37 AM | #15 |
Green Mole
Join Date: Nov 2003
Posts: 10
|
Hi,
I am re-spidering and the problem seems to be fixed! So how did I do it? Well, I did a few things, so I am not sure which one achieved the end result:
1) I modified my privacy settings in Internet Explorer so that cookies from my website were put on the safe list. (N.B. I am running the spider.php file from within Internet Explorer.)
2) I fiddled with the cookie settings in the IPB control panel. My new settings are available in the attached image. I used to have a "cookie name prefix" set, but I now leave it blank. I also changed the "cookie path" from /forum/ to /forum.
Thanks for all your help, Charter. George |