Problem indexing site (uses mod_rewrite)
Hi there,
I installed PhpDig 1.8.0 on the site www.personalsite.ch. After setting up the database and changing permissions, I get the following (already described) output when indexing: Quote:
Thank you for your help.
Hi ragaller, and welcome to PhpDig.net!
Perhaps try the mod attached in this thread. Below is the output with the mod and a search depth of one:

SITE : http://www.personalsite.ch/
Exclude paths :
- @NONE@
1:http://www.personalsite.ch/ (time : 00:00:10)
+ + + + + + +
level 1...
2:http://www.personalsite.ch/portfolio/ (time : 00:00:27)
3:http://www.personalsite.ch/info/ (time : 00:00:34)
4:http://www.personalsite.ch/kontakt/ (time : 00:00:41)
5:http://www.personalsite.ch/kontakt/oisjdfoijdf (time : 00:00:48)
6:http://www.personalsite.ch/webdesign/ (time : 00:00:55)
7:http://www.personalsite.ch/web-it/ (time : 00:01:02)
8:http://www.personalsite.ch/grafik/ (time : 00:01:09)
No link in temporary table
--------------------------------------------------------------------------------
links found : 8
http://www.personalsite.ch/
http://www.personalsite.ch/portfolio/
http://www.personalsite.ch/info/
http://www.personalsite.ch/kontakt/
http://www.personalsite.ch/kontakt/oisjdfoijdf
http://www.personalsite.ch/webdesign/
http://www.personalsite.ch/web-it/
http://www.personalsite.ch/grafik/
Optimizing tables...
Indexing complete !
Hi Charter!
Thank you for the answer! I got the engine working, and it produces exactly the output you posted when indexing the root level at depth one. There is a problem related to mod_rewrite, though: a website using mod_rewrite needs to use absolute links (or root-relative ones). The base part of the URL is set in the page header: Quote:
Is it possible that PhpDig does not read the base URL and treats the links as relative to the current page? If so, that would explain why a search at depth one on the root level works, while digging deeper breaks. The following is part of the search at depth 2, showing the problem for grafik:

Jürgen
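To make Jürgen's hypothesis concrete, here is a minimal PHP sketch of resolving a link against the effective base URL (the <base href> value when the page declares one, otherwise the page's own URL). resolve_link() is a hypothetical helper, not PhpDig code, and it is a simplified take on RFC 3986 reference resolution that ignores "../" segments and a few other reference forms. The link text "grafik/portfolio/" is an assumption chosen to reproduce the doubled grafik/grafik path reported later in this thread. On the root page the page directory and the base both come out as "/", so both approaches agree; on /grafik/ they diverge.

```php
<?php
// Hypothetical helper (not part of PhpDig): resolve a link found on a page
// against the effective base URL. Simplified sketch of RFC 3986 section 5.2:
// it does NOT handle "../" or "./" segments, query-only, fragment-only or
// protocol-relative ("//host/...") references.
function resolve_link(string $base, string $link): string
{
    // Link already has a scheme (http:, mailto:, ...): keep it unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) {
        return $link;
    }

    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host']
           . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $path  = $parts['path'] ?? '';

    // Root-relative link: it replaces the whole path of the base.
    if (substr($link, 0, 1) === '/') {
        return $root . $link;
    }

    // Relative link: keep the base path up to and including its last "/".
    // A base with an empty path (e.g. "http://www.personalsite.ch") counts as "/".
    $dir = ($path === '') ? '/' : substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

// Assumed link "grafik/portfolio/" on the page http://www.personalsite.ch/grafik/
echo resolve_link('http://www.personalsite.ch/grafik/', 'grafik/portfolio/'), "\n";
// ignoring <base>:  http://www.personalsite.ch/grafik/grafik/portfolio/
echo resolve_link('http://www.personalsite.ch', 'grafik/portfolio/'), "\n";
// honouring <base>: http://www.personalsite.ch/grafik/portfolio/
```

A crawler that honours <base href> would pass the tag's value as the first argument instead of the page URL.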
Hi. For base href tags, perhaps try the code in this thread.
Hi Charter!
I tried indexing with the code for <base> tag parsing (your link in the previous post), both with and without the rewrite patch. The result is still the same: the links are treated as relative ones.

I spidered www.personalsite.ch/grafik/ at depth 1, and PhpDig found links like:
www.personalsite.ch/grafik/grafik/portfolio/ ...
which should be:
www.personalsite.ch/grafik/portfolio/

Any further ideas on this one? Maybe I set something else up the wrong way?

Thank you,
Jürgen

P.S. personalsite.ch was down yesterday; it works now, just in case you'd like to try spidering it.
Hi. The code in that link won't work when the base href tag is something like <base href="http://www.personalsite.ch" />, because the regex in that code doesn't match it, so something else will have to be coded. In the meantime, to get rid of the name/name directories and files, click the site, click the update button, and click the red circle "no way" symbol next to the bogus directories to delete and exclude them.
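The replacement regex posted in this thread did not survive, so the following is only an illustrative PHP sketch of a pattern that accepts a tag like <base href="http://www.personalsite.ch" />. find_base_href() is a hypothetical function name, not PhpDig code; the pattern is one possible choice that tolerates upper or lower case, single or double quotes, other attributes before href, and both the <base ...> and <base ... /> forms.

```php
<?php
// Hypothetical helper (not PhpDig's actual code): return the href value of
// the first <base> tag in $html, or null if there is none.
//  - /i : case-insensitive (<BASE HREF=...> also matches)
//  - /s : "." also matches newlines, so attributes may span lines
//  - <base\s requires whitespace, so tags like <basefont> are not matched
//  - (["\'])(.*?)\1 : value in matching single or double quotes
//  - [^>]*> : consumes the rest of the tag, including an optional " /"
function find_base_href(string $html): ?string
{
    if (preg_match('/<base\s[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>/is', $html, $m)) {
        return trim($m[2]);
    }
    return null; // no <base href>: links resolve against the page URL
}

echo find_base_href('<head><base href="http://www.personalsite.ch" /></head>'), "\n";
```

If the function returns null, the crawler would keep resolving links against the page URL as before.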
Hi Charter!
I found a quick solution that seems to work for a website with root-relative links (like mine). In robot_functions, after:
$file_content = @file($tempfile);
I added:
$path = '';
I know, this is just a quick-and-dirty workaround for my exotic case...
Jürgen
Hi. Ah, I see the problem. The regex wasn't matching the base href tag. Using the code in the other thread, if you change the following:
PHP Code:
PHP Code:
Remember to remove any "word" wrapping in the above code.
Hi Charter!
This works perfectly now for my site! Thank you so much!
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.