04-14-2004, 12:07 PM   #1
catchme
Green Mole
 
Join Date: Apr 2004
Posts: 3
incremental building of dig database

Greetings, Dig Board Members.

I've just started working with dig. Overall, I am happy to find such a fine open-source search engine tool available for PHP.

I run a couple of larger sites, with 8,000 to 80,000 pages of content, that I want to index with a search engine. These sites add about 20 new pages of content per day.

I've noticed that, while it is possible to index these pages with dig, it can be a slow process and a load-intensive one as well.

What I want to accomplish is incremental builds of the dig database.

First, I will do a full index of the existing sites. Afterwards, I would like to index only the new files added to the sites, perhaps every few hours.

Can someone suggest a procedure for indexing only the files that have recently been added to the site?

My thought is to write a script that collects the URIs of the new pages into a file and then feeds that file to spider.php when I run it via cron every few hours.
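
Something like this rough sketch is what I have in mind. The paths, the URL layout, and the assumption that spider.php will accept a text file of URLs as its argument are all guesses on my part, so please correct me if that is not how it works:

<?php
// collect_new_uris.php -- rough sketch, meant to be run from cron.
// Hypothetical assumptions: content lives under /var/www/site/content,
// pages are served from http://www.example.com/content, and a stamp
// file's mtime records when the script last ran.

$doc_root = '/var/www/site/content';
$base_url = 'http://www.example.com/content';
$stamp    = '/var/run/phpdig.laststamp';
$url_list = '/tmp/new_urls.txt';

$last_run = file_exists($stamp) ? filemtime($stamp) : 0;
$fp = fopen($url_list, 'w');
collect($doc_root, $last_run, $doc_root, $base_url, $fp);
fclose($fp);
touch($stamp);  // mark this run so the next run only picks up newer files

// Recursively write the URL of every .html/.htm file newer than $since.
function collect($dir, $since, $root, $base, $fp)
{
    $dh = opendir($dir);
    while (($entry = readdir($dh)) !== false) {
        if ($entry == '.' || $entry == '..') continue;
        $path = $dir . '/' . $entry;
        if (is_dir($path)) {
            collect($path, $since, $root, $base, $fp);
        } elseif (filemtime($path) > $since && preg_match('/\.html?$/', $entry)) {
            // Map the filesystem path back to its public URL
            fwrite($fp, $base . substr($path, strlen($root)) . "\n");
        }
    }
    closedir($dh);
}
?>

Then a crontab entry along these lines would run it every three hours and hand the resulting list to the spider:

0 */3 * * * php -f /path/to/collect_new_uris.php && php -f /path/to/spider.php /tmp/new_urls.txt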

Is this a common procedure for using Dig?

thanks!

Danny