PhpDig.net

Old 02-16-2004, 01:22 PM   #1
tomas
Orange Mole
 
Join Date: Feb 2004
Posts: 47
running out of memory

hello list,

spidering a bunch of pdf-files (about 250) in one directory -
spider.php runs out of mem (8k in php.ini) after file 50 -
setting php.ini to 32k error after file 110 -
setting to 128k error after file 220 -

i think there is a bug in spider.php with freeing mem ???

any ideas???

tomas
Old 02-16-2004, 02:42 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Is it kb or mb? Maybe try breaking the list into smaller lists and/or index from shell.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 02-16-2004, 03:47 PM   #3
tomas
Orange Mole
 
Join Date: Feb 2004
Posts: 47
sorry charter,

of course 8mb -> 32mb ->128mb

why a smaller list - is spider.php eating my memory :-)
when it is called from the browser ???

tomas
Old 02-16-2004, 04:35 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Using shell bypasses the web server. What version of PHP are you using and what's your OS? Maybe this is a timeout issue? What are the actual errors that you are receiving?
Old 02-16-2004, 04:47 PM   #5
tomas
Orange Mole
 
Join Date: Feb 2004
Posts: 47
hi charter,

php-4.3.3
fedora core_1
apache 2

by the way - a trick that may be helpful for others who open pdf-files via javascript, which is not recognized by spider.php:

1) on one of the websites make a dummy-link eg. <a href="pdf.php"></a>
2) setup pdf.php in website-root:

<?php
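// list every file below the website-root; trim() drops the trailing newline
$files = explode("\n", trim(`find . -type f | sort`));
foreach ($files as $file) {
    // match on the extension only, case-insensitive (also catches .PDF)
    if (preg_match('/\.pdf$/i', $file)) {
        // empty link: invisible to visitors, but followed by the spider
        printf("<a href=\"%s\"></a><br>\n", htmlspecialchars($file));
    }
}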
?>

regards
tomas

Last edited by tomas; 02-16-2004 at 05:34 PM.
Old 02-16-2004, 09:40 PM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. I'm not sure if the issue is related to pdftotext and/or PhpDig. Maybe try memory_get_usage and get_defined_vars within the spider.php file to see if anything unusual shows.
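A minimal sketch of the kind of check suggested here, assuming memory_get_usage() is available in the PHP build (older PHP versions only provide it when PHP was compiled with --enable-memory-limit). The helper name phpdigMemCheckpoint and the labels are hypothetical and not part of PhpDig. The idea would be to call it before and after each file in spider.php and look for a figure that keeps growing and never drops back.

```php
<?php
// Hypothetical helper, not part of PhpDig: report memory use at a named point.
function phpdigMemCheckpoint($label)
{
    static $last = 0;              // usage seen at the previous checkpoint
    $now = memory_get_usage();     // bytes currently allocated to the script
    $line = sprintf("%s: %d bytes (%+d since last checkpoint)", $label, $now, $now - $last);
    $last = $now;
    return $line;
}

// Example: a buffer that stays referenced shows up as growth between checkpoints.
echo phpdigMemCheckpoint('start'), "\n";
$buffer = str_repeat('x', 1000000);
echo phpdigMemCheckpoint('after 1 MB buffer'), "\n";
unset($buffer);
echo phpdigMemCheckpoint('after unset'), "\n";

// Names of the variables defined in the current scope, to spot leftovers.
echo implode(', ', array_keys(get_defined_vars())), "\n";
```

If the number after each PDF keeps climbing even though the file is finished, something is still holding a reference to its content.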
Old 02-18-2004, 03:22 PM   #7
tomas
Orange Mole
 
Join Date: Feb 2004
Posts: 47
hello charter,

setting php.ini back to 8mb and running spider.php with bash/cron:

spider dies - and his last words were :-)

<b>Fatal error</b>: Allowed memory size of 8388608 bytes exhausted (tried to allocate 653 bytes) in <b>/var/www/html/search/admin/robot_functions.php</b> on line <b>707</b><br />


?
Old 02-19-2004, 07:04 AM   #8
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. In the phpdigTempFile function of robot_functions.php, perhaps replace the following:
PHP Code:
$f_handler = fopen($tempfile1,'wb');
if (is_array($file_content)) {
    fwrite($f_handler,implode('',$file_content));
}
fclose($f_handler);
$tempfilesize = filesize($tempfile1);
with the following:
PHP Code:
$f_handler = fopen($tempfile1,'wb');
if (is_array($file_content)) {
    fwrite($f_handler,implode('',$file_content));
}
fclose($f_handler);
unset($file_content);
$tempfilesize = filesize($tempfile1);
Old 02-19-2004, 08:52 AM   #9
tomas
Orange Mole
 
Join Date: Feb 2004
Posts: 47
hi charter,

i tried and tested a bit -
now i'm sure the cause is pdfs larger than 2 or 3 mb
with lots of vector-graphics inside.

so - how could we set up spider.php to go on
spidering the next files even if one or more files
are too big for the allowed memory setting in php.ini?

thanks
tomas
Old 02-19-2004, 10:10 AM   #10
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. If you are asking to do something like "if fatal error, no more memory, so skip this file and go to next file" I doubt this can be done because, by the time PHP encounters the fatal error, no more memory, there isn't room to do anything else.

Untested but what you might try though is the following. In the phpdigTempFile function, add the following:
PHP Code:
if (memory_get_usage() + 2000000 > 8000000) {
    return array('tempfile'=>0,'tempfilesize'=>0);
}
right before the following line:
PHP Code:
$f_handler = fopen($tempfile1,'wb');
That way at least if the current memory being used (in bytes) plus 2MB is greater than 8MB then the function will end, the file shouldn't be indexed, and the index process should continue.
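A variation on the same check, sketched under the assumption that you would rather read the limit from php.ini than hard-code 8000000. The helper names phpdigIniBytes and phpdigMemoryTooTight are hypothetical and not part of PhpDig, and the 2000000 bytes of headroom is simply the figure used above. Untested against PhpDig itself.

```php
<?php
// Hypothetical helper: convert a php.ini size such as "8M" or "128K" to bytes.
function phpdigIniBytes($value)
{
    $value = trim($value);
    $number = (int) $value;        // leading digits, e.g. 8 from "8M"
    switch (strtoupper(substr($value, -1))) {
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;  // plain byte count
    }
}

// Hypothetical helper: true when $headroom more bytes would not fit under $limit.
function phpdigMemoryTooTight($used, $limit, $headroom)
{
    // php.ini uses -1 for "no limit"; treat any non-positive value as unlimited.
    return $limit > 0 && ($used + $headroom) > $limit;
}

$limit = phpdigIniBytes(ini_get('memory_limit'));
if (phpdigMemoryTooTight(memory_get_usage(), $limit, 2000000)) {
    echo "skip this file\n";   // in phpdigTempFile: return the empty result instead
}
```

With memory_limit set to 8M the limit works out to 8388608 bytes, the same figure that appears in the fatal error earlier in the thread.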
Old 02-19-2004, 11:52 AM   #11
tomas
Orange Mole
 
Join Date: Feb 2004
Posts: 47

sorry charter - to bother you again and again,
but nothing works.

tomas
Old 02-19-2004, 12:12 PM   #12
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Try changing the numbers as in the code below, or just make a list that leaves out the 2 or 3 MB PDFs that are using so much memory.
PHP Code:
if (memory_get_usage() + 1000000 > 3000000) {
    return array('tempfile'=>0,'tempfilesize'=>0);
}
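For the second suggestion, one possible sketch is to have the pdf.php link page from earlier in the thread list only the smaller PDFs. The helper name smallPdfsOnly and the 2000000-byte cut-off are assumptions for illustration (tomas only reports trouble with files "larger than 2 or 3 mb"); adjust the cut-off to whatever your memory setting tolerates.

```php
<?php
// Hypothetical helper: keep only the PDFs in $paths that are smaller than $maxBytes.
function smallPdfsOnly($paths, $maxBytes)
{
    $keep = array();
    foreach ($paths as $path) {
        if (preg_match('/\.pdf$/i', $path) && is_file($path) && filesize($path) < $maxBytes) {
            $keep[] = $path;
        }
    }
    return $keep;
}

// Same idea as pdf.php: emit empty links, but only for PDFs under the cut-off.
$paths = explode("\n", trim((string) shell_exec('find . -type f | sort')));
foreach (smallPdfsOnly($paths, 2000000) as $path) {
    printf("<a href=\"%s\"></a><br>\n", htmlspecialchars($path));
}
```

The larger files are then never offered to the spider, so they cannot exhaust the memory limit in the first place.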
