02-12-2004, 03:58 AM | #1 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
Index/Spider looping
I managed to get indexing/spidering into a loop. In a very small branch of the site, with no links to elsewhere, it kept repeating the same URL over and over again, with a red cross in front and "Was recently indexed" at the end.
When I stopped the process, the site was locked (of course), and MySQL gave error 145 on table tempspider: can't open file tempspider.MYI. I had to delete the table and recreate it with phpMyAdmin.
As I have long URLs in the site, I first thought that they might have caused the problem. Why, by the way, are the fields 'file' and 'path' limited to 127 chars in the spider table? That is not going to be enough for my site! Can't they be TEXT fields like in tempspider? Anyway, my long URLs are not that long yet: I currently have around 50-60 characters in path and file. So something else must have caused this looping.
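For reference, a spider usually avoids this kind of loop by remembering every URL it has already queued. The following is only a minimal sketch in Python, not the spider's actual PHP code; `crawl`, `normalize`, and the `get_links` callback are hypothetical names made up for illustration.

```python
from collections import deque
from urllib.parse import urldefrag


def normalize(url):
    # Drop the #fragment so page.html and page.html#top count as one URL.
    return urldefrag(url)[0]


def crawl(start_url, get_links, max_pages=1000):
    """Breadth-first crawl that never queues the same URL twice.

    get_links is a hypothetical callback: it takes a URL and returns
    the list of URLs linked from that page.
    """
    start = normalize(start_url)
    seen = {start}            # every URL ever queued
    queue = deque([start])    # URLs still waiting to be visited
    order = []                # URLs in the order they were visited
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            link = normalize(link)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

With a guard like `seen`, a small branch whose pages only link to each other ends after each page has been visited once, and `max_pages` acts as a second safety net.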
__________________
René Haentjens, Ghent University |
02-13-2004, 11:19 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. I can't say more without running a test index on that small branch of your site, but you may find the following article useful:
http://www.databasejournal.com/featu...le.php/3300511
You may also find the links below useful:
http://www.faqts.com/knowledge_base/view.phtml/aid/329
http://www.mysql.com/doc/en/Choosing_types.html
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
02-16-2004, 06:10 AM | #3 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
Thanks, Charter, for not giving up on educating me. The articles that you refer to are always interesting and to the point.
My summary on VARCHAR vs. TEXT: VARCHAR: max 255, no loss of space, trailing spaces removed; TEXT: in most respects like an unlimited VARCHAR, but no DEFAULT value, and sorting only uses the first 1024 chars.
My summary on URL length: no limit specified in RFC 1738; in Windows products up to 2083 chars (max32://max2048).
As there is no loss of space anyway, may I suggest changing the VARCHARs to 255 in the next release, except where for some reason you count on truncation to a smaller size? I understand that checking all possible side effects of DB corruption in the PHP code would be a major coding effort.
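To see whether a site would actually hit the 127-character limit (or a 255-character one), the URLs can be checked up front. This is a rough Python sketch under one loud assumption: that 'path' holds the directory part of the URL and 'file' the last segment, which may not be how the spider really splits them. The function name `too_long` is made up for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def too_long(urls, limit=127):
    """Return (url, field, length) for each path or file part longer than limit.

    Assumption: 'path' is the directory part of the URL and 'file' is the
    last segment; the spider's real split may differ.
    """
    problems = []
    for url in urls:
        full_path = urlsplit(url).path
        path, file = posixpath.split(full_path)
        for field, value in (("path", path), ("file", file)):
            if len(value) > limit:
                problems.append((url, field, len(value)))
    return problems
```

Calling `too_long(urls)` and then `too_long(urls, limit=255)` on the same list shows how many URLs a change from 127 to 255 would rescue.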
__________________
René Haentjens, Ghent University |