08-27-2004, 12:34 PM   #14
rispbiz
Problem with newurls.txt

Due to many problems trying to index URLs from a text file, I made this little script based on JWSmythe's build.searchimages.pl.

This script pulls each URL from the database, indexes it, deletes it from the database, and then optimizes the tables.
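For reference, the script assumes newsites is just a one-column holding table of URLs waiting to be indexed, something like this (the column type is my guess, adjust to taste):

CREATE TABLE newsites (
  new_site_url VARCHAR(255) NOT NULL
);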

#!/usr/bin/perl

use strict;
use warnings;
use DBI;

# Connect to the PhpDig database (use your own database name, user, and password).
my $db = DBI->connect("DBI:mysql:database:localhost", 'username', 'password')
    || die "Connect failed: $DBI::errstr";

my $source_query = "SELECT new_site_url FROM newsites";

my $source = $db->prepare($source_query) || die "Error on source prepare: $DBI::errstr\n";
$source->execute || die "Error on source execute: $DBI::errstr\n";

while (my @curarray = $source->fetchrow_array) {
    my $req_url = $curarray[0];
    $req_url =~ s/;//g;    # strip stray semicolons
    chomp $req_url;        # strip the trailing newline, if any

    print "Indexing $req_url -> .....\n";
    # List form of system() runs php directly, with no shell in between,
    # so odd characters in the URL can't be misinterpreted.
    system("php", "-f", "/path/to/admin/spider.php", $req_url);
    print "Finished Indexing $req_url -> ...Complete\n";

    # The placeholder (?) keeps quotes in the URL from breaking the query.
    $db->do("DELETE FROM newsites WHERE new_site_url = ?", undef, $curarray[0]);
}

# Optimize once at the end rather than after every single URL.
$db->do("OPTIMIZE TABLE newsites");
$db->do("OPTIMIZE TABLE tempspider");

$db->disconnect;



I can then run this script from a shell or from cron with this command, with no problem:

perl /path/to/cgi-bin/newurls.pl

Note: If running from cron, be sure to allow enough time for the new URLs to be indexed before the next run starts. So don't set your cron job up for every minute; see the examples below.
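For example, a crontab entry that runs it every half hour (adjust the interval to however long your indexing usually takes):

*/30 * * * * perl /path/to/cgi-bin/newurls.pl

If you can't predict how long indexing will take, a lockfile makes an overlapping run exit instead of stacking up. Just a minimal sketch, assuming a writable /tmp (the lockfile name is my own choice, not part of PhpDig); put it near the top of newurls.pl:

use Fcntl qw(:flock);

# Hypothetical lockfile path; any writable location works.
open(my $lock, '>', '/tmp/newurls.lock') || die "Can't open lockfile: $!";
# If a previous run still holds the lock, exit quietly instead of
# spidering the same URLs twice.
flock($lock, LOCK_EX | LOCK_NB) || exit 0;
# ...rest of the script; the lock is released when the script exits.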
__________________
Sometimes the shortest way home is the longest way around!

Thank you PhpDig for a great search engine!
www.2-surf.net