08-27-2004, 12:34 PM   #14
rispbiz
Problem with newurls.txt

Due to many problems trying to index URLs from a text file, I made this little script based on JWSmythe's build.searchimages.pl.

This script pulls each URL from the database, indexes it, deletes it from the database, and then optimizes the tables.
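For reference, the script assumes newsites is just a one-column holding table of URLs waiting to be indexed, something like this (the column type is my guess, adjust to taste):

CREATE TABLE newsites (
  new_site_url VARCHAR(255) NOT NULL
);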

#!/usr/bin/perl

use strict;
use warnings;
use DBI;

# Connect to the PhpDig database (use your own database name, user, and password).
my $db = DBI->connect("DBI:mysql:database:localhost", 'username', 'password')
    || die "Connect failed: $DBI::errstr";

my $source_query = "SELECT new_site_url FROM newsites";

my $source = $db->prepare($source_query) || die "Error on source prepare: $DBI::errstr\n";
$source->execute || die "Error on source execute: $DBI::errstr\n";

while (my @curarray = $source->fetchrow_array) {
    my $req_url = $curarray[0];
    $req_url =~ s/;//g;    # strip stray semicolons
    chomp $req_url;        # strip the trailing newline, if any

    print "Indexing $req_url -> .....\n";
    # List form of system() runs php directly, with no shell in between,
    # so odd characters in the URL can't be misinterpreted.
    system("php", "-f", "/path/to/admin/spider.php", $req_url);
    print "Finished Indexing $req_url -> ...Complete\n";

    # The placeholder (?) keeps quotes in the URL from breaking the query.
    $db->do("DELETE FROM newsites WHERE new_site_url = ?", undef, $curarray[0]);
}

# Optimize once at the end rather than after every single URL.
$db->do("OPTIMIZE TABLE newsites");
$db->do("OPTIMIZE TABLE tempspider");

$db->disconnect;



I can then run this script from a shell or from cron with this command, with no problem:

perl /path/to/cgi-bin/newurls.pl

Note: If running from cron, be sure to allow enough time for the new URLs to be indexed before the next run starts. So don't set your cron job up for every minute; see the examples below.
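For example, a crontab entry that runs it every half hour (adjust the interval to however long your indexing usually takes):

*/30 * * * * perl /path/to/cgi-bin/newurls.pl

If you can't predict how long indexing will take, a lockfile makes an overlapping run exit instead of stacking up. Just a minimal sketch, assuming a writable /tmp (the lockfile name is my own choice, not part of PhpDig); put it near the top of newurls.pl:

use Fcntl qw(:flock);

# Hypothetical lockfile path; any writable location works.
open(my $lock, '>', '/tmp/newurls.lock') || die "Can't open lockfile: $!";
# If a previous run still holds the lock, exit quietly instead of
# spidering the same URLs twice.
flock($lock, LOCK_EX | LOCK_NB) || exit 0;
# ...rest of the script; the lock is released when the script exits.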
__________________
Sometimes the shortest way home is the longest way around!

Thank you PhpDig for a great search engine!
www.2-surf.net