12-28-2004, 02:59 PM | #1 |
Green Mole
Join Date: Dec 2004
Posts: 6
|
indexing for the 1st time but getting "duplicate of existing doc" msg with some files
Hello everyone,
I just installed phpdig and I'm in the process of indexing my website. It's working great, except for one problem. I'm indexing from directory index pages, which list the contents of each directory dynamically, with a search depth of 1. All the links are detected, and most of the pages are searched and indexed. But for some of the pages I get the message "duplicate of an existing document". Do any of you have any idea why that would happen, since I haven't indexed these pages before?
When I go to the update form, the "duplicate" files don't appear in their directory. When I try to index one of these pages by specifying its full URL, I still get the "duplicate" message, even when I set "Use values from Update sites table if present and use default values if values absent from table" to "no". Is there any way to force the indexing of these pages? |
12-28-2004, 11:42 PM | #2 | |
Green Mole
Join Date: Sep 2003
Location: Germany
Posts: 7
|
Quote:
|
|
12-29-2004, 04:47 AM | #3 |
Green Mole
Join Date: Dec 2004
Posts: 6
|
Getting weirder...
Just tried something else:
I created a plain .htm file with links to all the pages that weren't indexed the first time around. This file is placed in the subdirectory alpha/. It links to pages in the subdirectories alpha/b to alpha/z (relative links). The pages in alpha/a were indexed correctly the first time, so I didn't link to any of them.
Now I try to index my .htm file with search depth set to 1: it detects the many links in the page (plenty of + + + + +), but it doesn't go on to level 1. Instead, it tries to index files in alpha/a (even though my indexing page has no links to any of these pages) before stopping suddenly after the 12th file in that subdirectory (no more activity in the browser, but still no [back to admin] link). I'm probably doing something wrong. Could someone please help? |
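One thing that can be checked independently of phpdig is where the relative links in the alpha/ page actually resolve. The sketch below uses Python's standard `urljoin`, which follows the usual URL resolution rules; the host `example.com` and the file names are placeholders, and nothing here claims to reproduce how phpdig's spider resolves links.

```python
from urllib.parse import urljoin

def resolve_links(page_url, hrefs):
    """Return the absolute URL each relative href resolves to from page_url."""
    return [urljoin(page_url, href) for href in hrefs]

# Placeholder URLs: a link page inside alpha/ pointing at alpha/b/.
print(resolve_links("http://example.com/alpha/links.htm", ["b/entry.php"]))

# Same href, but the base URL lacks the trailing slash, so "alpha" is
# treated as a file name and dropped during resolution.
print(resolve_links("http://example.com/alpha", ["b/entry.php"]))
```

If the resolved URLs are not the ones expected, the base URL given to the spider (with or without a trailing slash) is a cheap thing to rule out first.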
12-29-2004, 02:00 PM | #4 |
Green Mole
Join Date: Dec 2004
Posts: 6
|
Still working on the problem... Searched the forum's archives and didn't find anything useful...
I uploaded phpdig to another directory, installed it with a different table prefix, and tried to index the plain .htm file I described above: this time it managed to index most of the linked files (still not all of them, though). So now I have one set of tables with the biggest part of my website indexed, and another set with (most of) the missing files indexed. At the moment I'm working on a script to join both sets of tables, for lack of a better solution... Anyone with a quicker and easier idea is more than welcome! |
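The poster's merge script is not shown in the thread. As a rough illustration of the idea only, here is a minimal sketch that copies pages and their keyword links from one set of prefixed tables into another while remapping ids. Everything in it is an assumption: it uses an in-memory SQLite database instead of MySQL, and the three-table layout (`spider`, `keywords`, `engine`) with these few columns is a simplified, hypothetical schema, not phpdig's real one.

```python
import sqlite3

def create_tables(conn, prefix):
    """Create a simplified, hypothetical index schema under the given prefix."""
    conn.executescript(f"""
        CREATE TABLE {prefix}spider (spider_id INTEGER PRIMARY KEY, path TEXT, file TEXT);
        CREATE TABLE {prefix}keywords (key_id INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
        CREATE TABLE {prefix}engine (spider_id INTEGER, key_id INTEGER, weight INTEGER);
    """)

def merge_index(conn, src, dst):
    """Copy pages missing from dst out of the src tables; return how many were copied."""
    copied = 0
    pages = conn.execute(f"SELECT spider_id, path, file FROM {src}spider").fetchall()
    for old_id, path, file in pages:
        # Skip pages the destination index already knows about.
        if conn.execute(f"SELECT 1 FROM {dst}spider WHERE path = ? AND file = ?",
                        (path, file)).fetchone():
            continue
        # Insert the page and let the destination assign a fresh id.
        new_id = conn.execute(f"INSERT INTO {dst}spider (path, file) VALUES (?, ?)",
                              (path, file)).lastrowid
        links = conn.execute(
            f"SELECT k.keyword, e.weight FROM {src}engine e "
            f"JOIN {src}keywords k ON k.key_id = e.key_id WHERE e.spider_id = ?",
            (old_id,)).fetchall()
        for keyword, weight in links:
            # Reuse the destination's keyword row if it exists, else create it.
            conn.execute(f"INSERT OR IGNORE INTO {dst}keywords (keyword) VALUES (?)",
                         (keyword,))
            key_id = conn.execute(f"SELECT key_id FROM {dst}keywords WHERE keyword = ?",
                                  (keyword,)).fetchone()[0]
            conn.execute(f"INSERT INTO {dst}engine (spider_id, key_id, weight) "
                         f"VALUES (?, ?, ?)", (new_id, key_id, weight))
        copied += 1
    conn.commit()
    return copied
```

The important design point is that ids from the second install cannot be copied as they are: both page ids and keyword ids have to be looked up or regenerated in the destination tables, otherwise the keyword links would point at the wrong rows.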
12-30-2004, 08:20 AM | #5 |
Green Mole
Join Date: Dec 2004
Posts: 6
|
Just in case anyone is still following this thread, I'll let you know that my script worked: the pages and associated keywords were correctly added to the tables. The pages also show up in the search page, which is good!
The only tiny remaining problem: when one of these pages is in the results of the search, the snippet is the beginning of the file, and not the part featuring the (highlighted) keyword. So now I'm having a look at the search_function.php script to see how it works and why it wouldn't show a correct snippet for the files I added "manually". |
12-30-2004, 12:08 PM | #6 |
Green Mole
Join Date: Oct 2004
Posts: 27
|
you make too much work for nothing - tsk tsk - search on duplicate - see
http://www.phpdig.net/forum/showthre...ight=duplicate
__________________
rAdoN was here |
12-30-2004, 02:06 PM | #7 |
Green Mole
Join Date: Dec 2004
Posts: 6
|
I *did* search on duplicate... But I didn't look too far back in time in case it was due to the version or something...
But thanks for the link, I'll try that!
Last edited by Morphea; 12-30-2004 at 02:48 PM. |
12-30-2004, 02:43 PM | #8 |
Green Mole
Join Date: Oct 2004
Posts: 27
|
be not afraid - if doubt - compare code - if different - use newest code
__________________
rAdoN was here |
12-30-2004, 02:48 PM | #9 |
Green Mole
Join Date: Dec 2004
Posts: 6
|
At first I edited my last post, but seeing you replied in the meantime I thought I'd post it in a new message.
So I increased the CHUNK_SIZE (doubled it to 2048), but it still wouldn't work. My pages are simple php files, with no variables passed in the URL like in the example in the link provided. They're like entries in a dictionary or an encyclopedia, so each page is quite different from any other (though maybe not filesize-wise).
After changing the CHUNK_SIZE, I tried to index a specific page by its full URL, with depth set to 0. The spider didn't even seem to try indexing this one page, but instead began trying to index files in another directory (and not even the root directory). I'm sure there HAS to be some way to fix this... But right now I'm quite discouraged... |
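For readers wondering why a chunk size could matter for duplicate detection at all, here is a minimal sketch of one way such a check could behave. It is an assumption made for illustration, not phpdig's actual code: it supposes a page is fingerprinted by hashing only its first `chunk_size` bytes, so pages that share an identical opening (same header, menu, and boilerplate) collide until the chunk is large enough to reach the part that differs.

```python
import hashlib

def fingerprint(content: bytes, chunk_size: int) -> str:
    """Hash only the first chunk_size bytes (hypothetical duplicate key)."""
    return hashlib.md5(content[:chunk_size]).hexdigest()

def find_duplicates(pages: dict, chunk_size: int) -> list:
    """Return (url, first_seen_url) pairs whose fingerprints collide."""
    seen = {}
    duplicates = []
    for url, content in pages.items():
        key = fingerprint(content, chunk_size)
        if key in seen:
            duplicates.append((url, seen[key]))
        else:
            seen[key] = url
    return duplicates

# Two made-up pages sharing a long identical header but different bodies.
header = b"<html>" + b"x" * 1500
pages = {"a.php": header + b"apple", "b.php": header + b"banana"}

print(find_duplicates(pages, 1024))  # chunk ends inside the shared header
print(find_duplicates(pages, 2048))  # chunk now covers the differing bodies
```

Under this assumption, doubling the chunk size only helps if the shared opening of the pages is shorter than the new value, which would be one possible reason why 2048 changed nothing here.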
12-30-2004, 04:03 PM | #10 |
Green Mole
Join Date: Oct 2004
Posts: 27
|
remove mods - do virgin 1.8.6 install - use CHUNK_SIZE 2048 or 4096 - use dynamic title - use LIMIT_TO_DIRECTORY false - use SPIDER_MAX_LIMIT 100 - use LINKS_MAX_LIMIT 100 - use search depth 100 - use links per 0 - links in iframe or heavy javascript not followed - index after setting config - expect duplicate of an existing document at high depth - may be real duplicate - same link is duplicate - see
http://www.phpdig.net/forum/showthread.php?t=1139
relax - know not what more
__________________
rAdoN was here |