|
03-09-2004, 07:33 AM | #1 | |
Green Mole
Join Date: Mar 2004
Posts: 5
|
Problem indexing site (uses mod_rewrite)
Hi there,
I installed PhpDig 1.8.0 on the site www.personalsite.ch. After setting up the database and changing permissions, I get the following (already described) output when indexing: Quote:
Thank you for your help. |
|
03-09-2004, 10:52 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi ragaller, and welcome to PhpDig.net!
Perhaps try the mod attached in this thread. Below is output with the mod and a search depth of one:

SITE : http://www.personalsite.ch/
Exclude paths :
- @NONE@
1:http://www.personalsite.ch/ (time : 00:00:10)
+ + + + + + +
level 1...
2:http://www.personalsite.ch/portfolio/ (time : 00:00:27)
3:http://www.personalsite.ch/info/ (time : 00:00:34)
4:http://www.personalsite.ch/kontakt/ (time : 00:00:41)
5:http://www.personalsite.ch/kontakt/oisjdfoijdf (time : 00:00:48)
6:http://www.personalsite.ch/webdesign/ (time : 00:00:55)
7:http://www.personalsite.ch/web-it/ (time : 00:01:02)
8:http://www.personalsite.ch/grafik/ (time : 00:01:09)
No link in temporary table
--------------------------------------------------------------------------------
links found : 8
http://www.personalsite.ch/
http://www.personalsite.ch/portfolio/
http://www.personalsite.ch/info/
http://www.personalsite.ch/kontakt/
http://www.personalsite.ch/kontakt/oisjdfoijdf
http://www.personalsite.ch/webdesign/
http://www.personalsite.ch/web-it/
http://www.personalsite.ch/grafik/
Optimizing tables...
Indexing complete !
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
03-10-2004, 12:48 AM | #3 | |
Green Mole
Join Date: Mar 2004
Posts: 5
|
Hi Charter!
Thank you for the answer! I got the engine working, and it produces exactly the output you posted for indexing at depth one from the root level. There is a problem related to mod_rewrite, though: a website using mod_rewrite needs to use absolute links (or root-relative ones). In the header, the base part of the URL is set: Quote:
Is it possible that PhpDig does not read the base URL and treats the links as relative to the current page? If so, a search at depth one from the root level works, but digging deeper breaks. The following is part of the search at depth 2, showing the problem for grafik: Jürgen |
|
03-10-2004, 09:12 AM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. For base href tags, perhaps try the code in this thread.
03-11-2004, 12:56 AM | #5 |
Green Mole
Join Date: Mar 2004
Posts: 5
|
Hi Charter!
I tried indexing with the code for <base> tag parsing (your link in the previous post), both with and without the rewrite patch. The result is still the same: the links are treated as relative ones. I spidered www.personalsite.ch/grafik/ at depth 1, and PhpDig found links like:
www.personalsite.ch/grafik/grafik/portfolio/ ...
which should be:
www.personalsite.ch/grafik/portfolio/
Any further ideas on this one? Maybe I set up something else the wrong way?
Thank you, Jürgen
P.S. personalsite.ch was down yesterday. It works now, in case you'd like to try spidering. |
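The doubled directory in the post above is what ordinary relative-URL resolution produces when the spidered page's own URL is used as the base instead of the site root declared in the <base> tag. Here is a minimal sketch in Python (an illustration only, not PhpDig's PHP code). The href value `grafik/portfolio/` is an assumption about how the page writes its links, chosen to reproduce the URLs reported above:

```python
from urllib.parse import urljoin

page_url = "http://www.personalsite.ch/grafik/"  # page being spidered
base_href = "http://www.personalsite.ch"         # site root, as declared via <base href>
link = "grafik/portfolio/"                       # assumed href as written in the page

# Resolving against the page URL ignores <base> and doubles the directory.
wrong = urljoin(page_url, link)

# Resolving against the declared base yields the intended URL.
right = urljoin(base_href, link)

print(wrong)
print(right)
```

At depth one from the root, the page URL and the base coincide, which would explain why the problem only shows up when digging deeper.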
03-11-2004, 02:59 PM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. The code in that link won't work when the base href tag is something like <base href="http://www.personalsite.ch" />, because the regex in that code doesn't match it, so something else will have to be coded. In the meantime, to get rid of the name/name directories and files, click the site, click the update button, and click the red circle "no way" symbol next to the bogus directories to delete and exclude them.
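For illustration, a pattern that tolerates a base href without a trailing slash, either quote style, and a self-closing tag might look like the sketch below. It is written in Python, and both the pattern and the helper name `find_base_href` are hypothetical; this is not the regex from the linked thread and not PhpDig's code.

```python
import re

# Hypothetical pattern: captures the href value of a <base> tag.
# It accepts single or double quotes, extra attributes before href,
# a missing trailing slash in the URL, and a self-closing " />".
BASE_HREF_RE = re.compile(
    r"""<base\b[^>]*?\bhref\s*=\s*["']?([^"'\s>]+)""",
    re.IGNORECASE,
)


def find_base_href(html):
    """Return the href of the first <base> tag, or None if there is none."""
    match = BASE_HREF_RE.search(html)
    return match.group(1) if match else None


print(find_base_href('<head><base href="http://www.personalsite.ch" /></head>'))
```

A real HTML parser would be more robust than a regex, but a pattern along these lines is closer to what a regex-based spider would use.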
03-13-2004, 10:16 AM | #7 |
Green Mole
Join Date: Mar 2004
Posts: 5
|
Hi Charter
I found a quick solution that seems to work for a website with root-relative links (like mine). In robot_functions, after: $file_content = @file($tempfile); I added: $path = ''; I know this is just a quick and dirty workaround for my exotic case... Jürgen |
03-13-2004, 10:51 AM | #8 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Ah, I see the problem. The regex wasn't matching the base href tag. Using the code in the other thread, if you change the following:
PHP Code:
PHP Code:
Remember to remove any "word" wrapping in the above code.
03-16-2004, 11:22 PM | #9 |
Green Mole
Join Date: Mar 2004
Posts: 5
|
Hi Charter!
This works perfectly now for my site! Thank you so much! |
|
|