01-17-2004, 09:03 AM | #1
Green Mole
Join Date: Dec 2003
Posts: 4
Indexing Directories
Hi,
I'm trying to spider my website and need to index a lot of pages (easily over 120,000). Most of the spidering is done, but indexing keeps taking longer and longer. The site is static (a newspaper archive) and is added to daily. The pages are broken down like this: www.foo.com/years/xxxx/0112/, where xxxx is the year and the last part is the issue in MMDD format (0112 = January 12th).
I've told phpdig to spider the site by typing out the year URL, for example www.foo.com/years/2003/. The problems are:
a) it always shows up in the control panel as the indexed site www.foo.com, not as the individual years, and
b) it always wants to look through all the previous years to index.
Do I have to actually create subdomains (e.g. 1996.foo.com) to have separate directories indexed, or is there some other way? I basically want to make a static search database and don't need to reindex anything but the current day's additions.
Thanks in advance if you have any ideas.
Eric McClary www.recordernews.com
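A small sketch of the URL layout described above: it reads the year and issue date back out of an archive URL, which is handy for deciding which year a page belongs to. It assumes the /years/YYYY/MMDD/ pattern and the placeholder host www.foo.com from the post; the function name parse_issue_url is hypothetical and not part of phpdig.

```python
from datetime import date
from urllib.parse import urlparse


def parse_issue_url(url: str) -> date:
    """Return the issue date encoded in an archive URL.

    Assumes the hypothetical layout /years/YYYY/MMDD/, e.g.
    http://www.foo.com/years/2003/0112/ -> 2003-01-12.
    """
    if "://" not in url:
        url = "http://" + url  # allow bare host/path input such as www.foo.com/...
    # Non-empty path segments, e.g. ["years", "2003", "0112"]
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "years":
        raise ValueError(f"not an issue URL: {url}")
    year, mmdd = int(parts[1]), parts[2]
    if len(mmdd) != 4 or not mmdd.isdigit():
        raise ValueError(f"bad issue segment: {mmdd}")
    return date(year, int(mmdd[:2]), int(mmdd[2:]))
```

For example, parse_issue_url("www.foo.com/years/2003/0112/") gives date(2003, 1, 12).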
01-18-2004, 07:39 AM | #2
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. What happens when you click a site, click the update button, and then click a green check mark for a specific directory?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
01-28-2004, 05:07 PM | #3
Green Mole
Join Date: Dec 2003
Posts: 4
Sorry about the late reply.
I can't even open that screen because it's too big. Besides taking at least 15 minutes to open, it makes my computer run out of virtual memory. Like I said, the site is HUUUGGGEEE. Anyway, I'm playing around with a robots.txt file, but that doesn't seem to work: even though I told it to exclude all of it, it still seems to look at the pages I already did. So, a couple of questions:
A) What's the excludes table? Can I place the parts I don't want reindexed in this table?
B) Is there a way to make it not recheck the stuff I already did?
C) Last but not least, the only solution I can see is multiple installs of phpdig (each with its own database). Of course I don't like this answer, but if I did this, is there a way to have phpdig still search through all these databases and give one results page?
I know I'm asking a lot, but I'm hoping there is a solution for searching my huge archaic website.
Thanks,
Eric McClary www.recordernews.com
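One way to sanity-check a robots.txt like the one being tried here is to generate the rules and test them with Python's standard urllib.robotparser. This sketch assumes the /years/YYYY/ layout and disallows every year before the current one; build_robots_txt and load_rules are hypothetical helpers. It only confirms that the rules say what you intend; it cannot show whether a given spider honours robots.txt or what it does with pages it has already stored.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(first_year: int, current_year: int) -> str:
    """Disallow every /years/YYYY/ directory before current_year (assumed layout)."""
    lines = ["User-agent: *"]
    for year in range(first_year, current_year):
        lines.append(f"Disallow: /years/{year}/")
    return "\n".join(lines) + "\n"


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(build_robots_txt(1996, 2004))
print(rules.can_fetch("*", "http://www.foo.com/years/2003/0112/"))  # False: old year is disallowed
print(rules.can_fetch("*", "http://www.foo.com/years/2004/0128/"))  # True: current year is allowed
```

Regenerating the file each January (bumping current_year) would keep only the newest year open to a compliant crawler.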
01-28-2004, 05:11 PM | #4
Green Mole
Join Date: Dec 2003
Posts: 4
Also, a quick thought: what if I changed all the update fields to some date in the far future (like 2080)? Would that make phpdig skip checking them (i.e., does it only look at items by the current date)?
Thanks again,
Eric
Last edited by emcclary; 01-28-2004 at 05:57 PM.
01-28-2004, 07:12 PM | #5
Green Mole
Join Date: Dec 2003
Posts: 4
I just tried using a text file (via the command line) and got the same problem: it updates everything (or at least checks everything). I just want to add to the database, not update it.
__________________
Eric McClary www.recordernews.com
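A sketch of generating the text file of start URLs for only the current day's issue, so that a command-line run is handed nothing but new pages. It assumes one URL per line and the /years/YYYY/MMDD/ layout; issue_url, seed_file_text and write_seed_file are hypothetical names, and the exact phpdig command that consumes the file should be taken from phpdig's own documentation.

```python
from datetime import date
from pathlib import Path
from typing import Iterable


def issue_url(host: str, day: date) -> str:
    """URL of one issue, assuming the hypothetical /years/YYYY/MMDD/ layout."""
    return f"http://{host}/years/{day.year}/{day:%m%d}/"


def seed_file_text(host: str, days: Iterable[date]) -> str:
    """One URL per line (assumed input format), each line newline-terminated."""
    return "".join(issue_url(host, d) + "\n" for d in days)


def write_seed_file(path: Path, host: str, days: Iterable[date]) -> None:
    """Write the URL list to disk for a command-line spider run."""
    path.write_text(seed_file_text(host, days))
```

For example, write_seed_file(Path("today.txt"), "www.foo.com", [date.today()]) writes a one-line file pointing at today's issue.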
01-29-2004, 06:28 PM | #6
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. Perhaps increase LIMIT_DAYS in the config file. Also, you might try version 1.8.0 with a text file via the command line. Make sure tempspider is empty between runs, and set SPIDER_MAX_LIMIT, SPIDER_DEFAULT_LIMIT, and RESPIDER_LIMIT all to zero in the config file so that only the listed pages get indexed and no links are followed.
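To illustrate the idea behind raising LIMIT_DAYS, here is a hypothetical date-based skip rule. It assumes LIMIT_DAYS means roughly "don't revisit a page that was indexed fewer than this many days ago" (check the comment next to it in your config file); the function name needs_reindex and the exact comparison are illustrative, not phpdig's code.

```python
from datetime import date, timedelta
from typing import Optional


def needs_reindex(last_indexed: Optional[date], today: date, limit_days: int) -> bool:
    """Hypothetical skip rule modelled on a 'days before reindex' setting.

    A page is revisited only if it was never indexed, or if it was last
    indexed more than limit_days ago. Illustration only, not phpdig code.
    """
    if last_indexed is None:
        return True  # never indexed: always fetch
    return today - last_indexed > timedelta(days=limit_days)
```

With limit_days=7, a page indexed on 2004-01-28 is skipped on 2004-01-29, while one indexed on 2003-01-12 is revisited. Under this rule a very large limit keeps already-indexed pages from being revisited at all, which is presumably the point of raising LIMIT_DAYS. A far-future date such as 2080 would also be skipped here, but this thread does not confirm that phpdig compares dates the same way.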