07-29-2005, 09:23 AM | #1 |
Green Mole
Join Date: Jan 2005
Posts: 6
|
Documents disappear
I have two sites on which I'm experiencing the same issue. Both sites contain a large number of PDF and DOC files - one site has 300, the other about 500. Every time a new document is added to the site, I manually add that single document to the index.
However, certain documents stop coming up in searches after a period of time. If I just re-submit the document, it says that it's already there, of course. But if I delete all documents and rebuild the entire index, the documents show up again. Then, after a while, they stop being returned in searches once more. I am at a loss as to why this is happening; any advice is appreciated. |
07-29-2005, 09:34 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
What version of PhpDig are you using? Do you use a cron job or the PhpDig admin panel?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
07-29-2005, 09:37 AM | #3 |
Green Mole
Join Date: Jan 2005
Posts: 6
|
v.1.8.7
Admin panel. I could never get the cron job to work correctly. |
07-29-2005, 10:11 AM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Do you run any of the "cleans" prior to experiencing this issue?
07-29-2005, 10:16 AM | #5 |
Green Mole
Join Date: Jan 2005
Posts: 6
|
No - should I?
|
07-30-2005, 07:51 AM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
No, you do not have to run the "cleans." What about space: are you running out of space? Could adding a document be wiping out a previous one? And when you reindex, do all documents (new and old) show up in the search results?
07-30-2005, 08:11 AM | #7 |
Green Mole
Join Date: Jan 2005
Posts: 6
|
Space isn't an issue. Each document is uniquely named -- I trap for that to ensure nothing is getting wiped out.
The documents still exist in the spider table. The keywords still exist in the keywords table. But the connection between the two disappears from the engine table. So when I try to resubmit the document, it says it's already there, but it's not coming up in searches because the keyword connection is gone. If I delete that document from the system (using the admin interface) and re-submit it, then it's fine. But of course, I can't tell what's been axed and what's okay when I get hollered at, so I wipe out the whole thing and re-index everything. That seems to make things better.
Of course, I'd like to preserve the original index. But if there is something going on that precludes that, can you suggest a way I could re-index the site (300/500 docs) without my intervention? Something I could run nightly that wouldn't time out? I really appreciate any advice. This has happened a few times and I really don't like the testy calls from passive-aggressive clients. |
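The symptom described above (rows still present in the spider and keywords tables, but no linking rows left in engine) can be checked with a query instead of waiting for a complaint. What follows is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the real database, the table names come from the post, and the column names (`spider_id`, `key_id`, `file`) plus all sample rows are assumptions for illustration. Check them against your own schema before relying on it.

```python
import sqlite3


def find_orphaned_documents(conn):
    """Return (spider_id, file) for spider rows that have no rows in engine."""
    query = """
        SELECT s.spider_id, s.file
        FROM spider AS s
        LEFT JOIN engine AS e ON e.spider_id = s.spider_id
        WHERE e.spider_id IS NULL
        ORDER BY s.spider_id
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Build a tiny stand-in database; schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE spider (spider_id INTEGER PRIMARY KEY, file TEXT);
        CREATE TABLE keywords (key_id INTEGER PRIMARY KEY, keyword TEXT);
        CREATE TABLE engine (spider_id INTEGER, key_id INTEGER);
        INSERT INTO spider VALUES
            (1, 'report.pdf'), (2, 'minutes.doc'), (3, 'budget.pdf');
        INSERT INTO keywords VALUES (10, 'report'), (11, 'budget');
        -- Document 2 has lost its keyword links, as in the post above.
        INSERT INTO engine VALUES (1, 10), (3, 11);
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for spider_id, file_name in find_orphaned_documents(conn):
        print(spider_id, file_name)
```

A list like this would show which individual documents need to be deleted and re-submitted, rather than wiping and rebuilding the whole index.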
07-30-2005, 08:26 AM | #8 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Is CONTENT_TEXT set to 1 in the config file? If so, is there ever a case where a TXT file in the TEXT_CONTENT_PATH directory is manually removed? The text files in the TEXT_CONTENT_PATH directory are named spider_id.txt (spider_id is a number from the spider table). For a cron job, do as in the documentation, and also make the change shown in this thread.
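If CONTENT_TEXT is enabled, one way to test the "text file was removed" theory is to compare the spider_id values against the files actually present in the TEXT_CONTENT_PATH directory. This is a rough sketch only: the `spider_id.txt` naming comes from the post above, while the function name, the way the IDs are obtained, and the demo directory are hypothetical.

```python
import os
import tempfile


def missing_text_files(spider_ids, text_content_path):
    """Return the spider_ids that have no <spider_id>.txt file in the directory."""
    missing = []
    for spider_id in spider_ids:
        candidate = os.path.join(text_content_path, f"{spider_id}.txt")
        if not os.path.isfile(candidate):
            missing.append(spider_id)
    return missing


if __name__ == "__main__":
    # Demo with a throwaway directory standing in for TEXT_CONTENT_PATH.
    with tempfile.TemporaryDirectory() as demo_dir:
        for spider_id in (1, 3):
            with open(os.path.join(demo_dir, f"{spider_id}.txt"), "w") as handle:
                handle.write("indexed text")
        # IDs 1-3 stand in for the values read from the spider table.
        print(missing_text_files([1, 2, 3], demo_dir))
```

In practice the list of IDs would come from the spider table, and the path would be whatever TEXT_CONTENT_PATH is set to in the config file.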