10-10-2003, 01:10 AM | #1 |
Green Mole
Join Date: Oct 2003
Location: Püttlingen (Saar) - Germany
Posts: 8
Not able to index [some site]...
Hi,
I think I recently found a bug in the indexer. Yesterday I tried to index the site http://www.rover-club-berlin.com/, but it is not possible to index it completely (about 550 pages): only the first page gets indexed. I think the problem is that the website author does not use correct markup. Other indexers (the phpCMS indexer, isearch) can spider this site correctly, so I conclude that there is some bug in phpDig.

I also have another problem with the site http://www.rover-club-hessen.de/. The first level can be indexed, but, for example, the pages below "Mitglieder" (Members) are not indexed. At the moment I have no idea where the bug is, and I don't know whether it is a bug in phpDig or bad markup. I was able to index this site completely with the phpCMS indexer and with isearch.

Bernhard
10-10-2003, 11:07 AM | #2 |
Purple Mole
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
The first site has 34 errors in the W3C Validator and is full of Java, for example:

Line 25, column 17: "FRAMESET" not finished but containing element ended
Line 16, column 15: end tag for "HEAD" which is not finished

That is simply too much for phpDig. The second site works fine:
Code:
links found : 40
http://www.rover-club-hessen.de/
http://www.rover-club-hessen.de/NOMATCH
http://www.rover-club-hessen.de/burningbook/guestbook.php
http://www.rover-club-hessen.de/rchevo_2/html/about.htm
http://www.rover-club-hessen.de/burningbook/?page=2
http://www.rover-club-hessen.de/burningbook/?page=3
http://www.rover-club-hessen.de/burningbook/gbae.php
http://www.rover-club-hessen.de/rchevo_2/html/members.htm
http://www.rover-club-hessen.de/rchevo_2/html/meetings.htm
http://www.rover-club-hessen.de/rchevo_2/html/forum.htm
http://www.rover-club-hessen.de/rchevo_2/html/tutorials.htm
http://www.rover-club-hessen.de/rchevo_2/html/links.htm
http://www.rover-club-hessen.de/guestbook.php
http://www.rover-club-hessen.de/rchevo_2/
http://www.rover-club-hessen.de/burningbook/guestbook.php?a20198
http://www.rover-club-hessen.de/burningbook/?page=1
http://www.rover-club-hessen.de/burningbook/
http://www.rover-club-hessen.de/burningbook/help.php
http://www.rover-club-hessen.de/rchevo_2/html/spacecake.htm
http://www.rover-club-hessen.de/rchevo_2/html/geisterfahrer.htm
http://www.rover-club-hessen.de/rchevo_2/html/dermeister.htm
http://www.rover-club-hessen.de/rchevo_2/html/hometown.htm
http://www.rover-club-hessen.de/rchevo_2/html/disasterman.htm
http://www.rover-club-hessen.de/rchevo_2/html/dirty-t.htm
http://www.rover-club-hessen.de/rchevo_2/html/thunderdome.htm
http://www.rover-club-hessen.de/rchevo_2/html/thunderdine.htm
http://www.rover-club-hessen.de/rchevo_2/html/englischepatient.htm
http://www.rover-club-hessen.de/rchevo_2/html/fastrabbit.htm
http://www.rover-club-hessen.de/rchevo_2/html/butterflyel.htm
http://www.rover-club-hessen.de/rchevo_2/html/frosty.htm
http://www.rover-club-hessen.de/rchevo_2/html/joker.htm
http://www.rover-club-hessen.de/rchevo_2/html/dragon.htm
http://www.rover-club-hessen.de/rchevo_2/html/treffen011003.htm
http://www.rover-club-hessen.de/rchevo_2/html/oldtimershow.htm
http://www.rover-club-hessen.de/rchevo_2/html/treffen053103.htm
http://www.rover-club-hessen.de/rchevo_2/html/tutorial_1.htm
http://www.rover-club-hessen.de/rchevo_2/html/tutorial_2.htm
http://www.rover-club-hessen.de/rchevo_2/html/roverlinks.htm
http://www.rover-club-hessen.de/rchevo_2/html/clublinks.htm
http://www.rover-club-hessen.de/rchevo_2/html/tuninglinks.htm
Optimizing tables...
Indexing complete !
__________________
-Roland- :: Test PhpDig 1.6.2 here :: - :: Test-Search for (little) Intelligent Php-Dig Fuzzy ::
Last edited by Rolandks; 10-10-2003 at 11:13 AM.
10-10-2003, 03:40 PM | #3 |
Green Mole
Join Date: Oct 2003
Location: Püttlingen (Saar) - Germany
Posts: 8
Hi Roland,
I know that the first site has many errors (and I swear I did not write it!), but the problem is that other indexers could fetch the site while phpDig could not. I'll try to find the bug myself now.

By the way, I think it would be best if an indexer just searched for anything containing src="URL" or href="URL" between < and >. That way every problem with wrong markup should go away. The drawback is that there could be false positives in the generated URL table.

The second site was updated recently, so some of its errors may have been corrected by now. Anyway, phpDig is a great project!

Bernhard
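The idea above (take any href= or src= value found between < and >, without validating the markup) can be sketched in a few lines. This is a minimal Python illustration, not phpDig's actual code (phpDig is written in PHP); the names extract_links, TAG_RE and ATTR_RE are hypothetical, and keeping only links on the starting host is an added assumption meant to mimic a single-site spider.

```python
import re
from urllib.parse import urljoin, urlparse

# Anything between "<" and ">" is treated as a tag; nesting and closing
# tags are never checked, so broken markup does not stop extraction.
TAG_RE = re.compile(r"<[^<>]*>")

# href= or src= followed by a double-quoted, single-quoted or unquoted value.
ATTR_RE = re.compile(
    r"""\b(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'<>]+))""",
    re.IGNORECASE,
)

SKIP_PREFIXES = ("javascript:", "mailto:", "#")


def extract_links(html, base_url):
    """Return absolute same-host URLs from href/src attributes.

    Results keep document order and contain no duplicates. Fragments
    (#...) are stripped so page.htm#top and page.htm count once.
    """
    host = urlparse(base_url).netloc.lower()
    seen = set()
    links = []
    for tag in TAG_RE.findall(html):
        for dq, sq, bare in ATTR_RE.findall(tag):
            raw = (dq or sq or bare).strip()
            if not raw or raw.lower().startswith(SKIP_PREFIXES):
                continue
            # Resolve relative links against the page URL, drop the fragment.
            url = urljoin(base_url, raw).split("#", 1)[0]
            if urlparse(url).netloc.lower() != host or url in seen:
                continue
            seen.add(url)
            links.append(url)
    return links


# A frameset page with an unclosed HEAD, similar to what the validator flagged.
page = (
    '<html><head><title>Club</head><frameset cols="20%,*">'
    '<frame src="nav.htm"><frame src=main.htm>'
    "<a href='#top'>top</a><a HREF=\"http://other.example.org/\">x</a>"
    '<img src="nav.htm"></html>'
)
print(extract_links(page, "http://www.example.com/"))
# prints ['http://www.example.com/nav.htm', 'http://www.example.com/main.htm']
```

Because no parser is involved, the unclosed HEAD and FRAMESET elements do not matter. The price is the one mentioned above: anything that looks like src=... inside a tag is taken, including tags written out inside script strings or attributes such as data-src, so the URL table can pick up false positives.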
10-11-2003, 04:18 AM | #4 |
Green Mole
Join Date: Oct 2003
Location: Püttlingen (Saar) - Germany
Posts: 8
Hi!
After some testing with various settings, I managed to index http://www.rover-club-berlin.com/. The only setting I needed to change was PHPDIG_DEFAULT_INDEX, which I set to false.

Bernhard