07-04-2004, 07:01 PM | #16 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Don't know if this helps at all, but I found a few pages that were indexed with 1.8.0 and weren't with 1.8.1. These are all from www.napathon.net
/AlbumID1107.php - which can be found on: /Rock12.php
/AlbumID1113.php - on /Rock23.php
/AlbumID1114.php & /AlbumID1115.php - on /Rock28.php
Doesn't make sense why these pages wouldn't be spidered. I don't have a complete spider log or anything, but I could go spider again and make that if you need it. |
07-04-2004, 07:34 PM | #17 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Try comparing the version 1.8.0 config file against the version 1.8.1 config file. Perhaps something there is causing the difference?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
07-04-2004, 08:20 PM | #18 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Yes, there are some differences in my config file between 1.8.0 and 1.8.1. I'm not sure whether they'd make that much difference though. Here are the ones that have anything to do with spidering:
SPIDER_MAX_LIMIT 20 (1.8.0) vs. 10 (1.8.1)
SPIDER_DEFAULT_LIMIT 3 (1.8.0) vs. 5 (1.8.1)
Everything else is the same in both versions. I'll copy the 1.8.1 config file to my server, re-spider and let you know what happens. |
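A comparison like the one above can be done mechanically instead of by eye. The following is only a rough sketch in Python, not part of PhpDig: it assumes the settings are declared as PHP define('NAME', value); lines, and the two sample strings stand in for the real config files, with only the two values quoted above filled in.

```python
import re

# Assumption: each setting is a PHP line of the form define('NAME', value);
DEFINE_RE = re.compile(r"""define\(\s*['"](\w+)['"]\s*,\s*(.+?)\s*\)\s*;""")


def parse_defines(text):
    """Return {name: value-as-text} for every define() found in text."""
    return {name: value for name, value in DEFINE_RE.findall(text)}


def diff_settings(old, new):
    """Return {name: (old_value, new_value)} for settings that differ."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


# Stand-ins for the two config files; values are the ones quoted in the post.
config_180 = """
define('SPIDER_MAX_LIMIT',20);
define('SPIDER_DEFAULT_LIMIT',3);
"""
config_181 = """
define('SPIDER_MAX_LIMIT',10);
define('SPIDER_DEFAULT_LIMIT',5);
"""

changes = diff_settings(parse_defines(config_180), parse_defines(config_181))
for name, (old_value, new_value) in changes.items():
    print(f"{name}: {old_value} (1.8.0) vs. {new_value} (1.8.1)")
```

With real files, the two strings would be replaced by the contents of each version's config file read from disk.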
07-05-2004, 05:04 AM | #19 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Try checking CHUNK_SIZE in the config file too.
|
07-05-2004, 05:33 AM | #20 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
CHUNK_SIZE in the 1.8.1 config file I have is 1024 vs. 2048 in my original config file.
When I re-spidered last night after copying the 1.8.1 config file to the server, I got 2 more pages indexed this time. Still far short of what I should have. Charter, when you get a chance, could you put together a complete 1.8.1 zip file with all the latest stuff? I want to download it again to make absolutely certain I'm using all files from that. Then I'll re-spider and let you know what happens. Nice to have plenty of bandwidth to do that. |
07-05-2004, 06:05 AM | #21 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
PhpDig: 1.8.1 alpha <removed> and two replacement files <removed>. Manual install still required as install.php not yet included.
EDIT: PhpDig version 1.8.1 released.
|
07-05-2004, 07:02 AM | #22 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Are you aware that the 1.8.1 alpha zip has nothing separated into the appropriate folders? I don't remember it being like that when I downloaded it the first time.
|
07-05-2004, 07:07 AM | #23 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. It should be separated as before; it's the same file. Maybe an unzip program option needs to be un/checked?
|
07-05-2004, 09:58 AM | #24 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
You were correct. I had "Use Folder Names" un-checked.
OK, this time I emptied all my folders and started from scratch with those two zip files. Search depth = 10. Links per = 0. Here's the spider log:
Spidering in progress...
--------------------------------------------------------------------------------
SITE : http://www.napathon.net/
Exclude paths :
- test/
- phpdig181/
- BW-Original/
- Joe_and_Eddie/
1:http://www.napathon.net/ (time : 00:00:06)
No link in temporary table
--------------------------------------------------------------------------------
links found : 1
http://www.napathon.net/
Optimizing tables...
Indexing complete !
==========================
Hosts: 1 Pages Entries: 1 Pages Index: 177 Entries Keywords: 177 Entries Temporary Table: 0 Entries |
07-05-2004, 10:47 AM | #25 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. When I index http://www.napathon.net/ with a search depth of ten and links_per of zero, and stop it by hand after 100 links, I get the output below. Do you not get this? Does anything show in your error logs?
Spidering in progress...
--------------------------------------------------------------------------------
SITE : http://www.napathon.net/
Exclude paths :
- test/
- phpdig181/
- BW-Original/
- Joe_and_Eddie/
1:http://www.napathon.net/ (time : 00:00:08)
+ + + + + + +
level 1...
2:http://www.napathon.net/miscmenu.php (time : 00:00:20)
+ + + + + + + + + + + +
3:http://www.napathon.net/musicmenu.php (time : 00:00:28)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
4:http://www.napathon.net/SearchMenu.php (time : 00:00:42)
Ok for http://search.napathon.net/search.php (site_id:469)
+
5:http://www.napathon.net/sitemap.php (time : 00:00:50)
6:http://www.napathon.net/FAQ.php (time : 00:00:57)
7:http://www.napathon.net/ContactMe.php (time : 00:01:03)
8:http://www.napathon.net/Privacy.php (time : 00:01:09)
level 2...
9:http://www.napathon.net/1219AshlandIntro.php (time : 00:01:21)
10:http://www.napathon.net/1219AshlandSlideShow.php (time : 00:01:27)
HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/ See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
HTTP/1.1 404 Not Found - http://www.napathon.net/theimage See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
11:http://www.napathon.net/BeeGeesMobile.php (time : 00:01:34)
12:http://www.napathon.net/BoganReunion2003SlideShow.php (time : 00:01:40)
HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/ See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
HTTP/1.1 404 Not Found - http://www.napathon.net/theimage See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
13:http://www.napathon.net/EstherEscortToHeaven.php (time : 00:01:48)
14:http://www.napathon.net/MeAtWork.php (time : 00:01:54)
15:http://www.napathon.net/RekkidRoom.php (time : 00:02:00)
HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/ See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
HTTP/1.1 404 Not Found - http://www.napathon.net/theimage See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
16:http://www.napathon.net/WongFamily.php (time : 00:02:08)
17:http://www.napathon.net/Wonglets.php (time : 00:02:14)
18:http://www.napathon.net/BillsRecordsSlideShow.php (time : 00:02:20)
HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/ See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
HTTP/1.1 404 Not Found - http://www.napathon.net/theimage See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
19:http://www.napathon.net/JohnnyGimbleSlideShow.php (time : 00:02:27)
HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/ See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
HTTP/1.1 404 Not Found - http://www.napathon.net/theimage See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
20:http://www.napathon.net/CDTrusteeReview.php (time : 00:02:33)
21:http://www.napathon.net/MusicIntro.php (time : 00:02:39)
Meta Robots = NoIndex, or already indexed : No content indexed
22:http://www.napathon.net/MyCollection1.php (time : 00:02:45)
23:http://www.napathon.net/NewArrivals1.php (time : 00:02:52)
+ + + + +
24:http://www.napathon.net/BeeGees1.php (time : 00:03:00)
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
25:http://www.napathon.net/Blues1.php (time : 00:03:11)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
26:http://www.napathon.net/Corrs1.php (time : 00:03:24)
+ + + + + + + + + + + + + + + + + + + + +
27:http://www.napathon.net/Country1.php (time : 00:03:34)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
28:http://www.napathon.net/EasyListening1.php (time : 00:03:46)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
29:http://www.napathon.net/Folk1.php (time : 00:03:59)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
30:http://www.napathon.net/Jazz1.php (time : 00:04:10)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
31:http://www.napathon.net/Miscellaneous1.php (time : 00:04:23)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
32:http://www.napathon.net/Reggae1.php (time : 00:04:34)
+ + + + + + + + + + + + + + + + + + +
33:http://www.napathon.net/Rock1.php (time : 00:04:44)
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
34:http://www.napathon.net/SavageGarden2.php (time : 00:04:55)
+ + + + + + + + + + + + + + + + + + + + + + +
35:http://www.napathon.net/TradeList.php (time : 00:05:05)
+ + + + + + + + + + + + + + + + + +
36:http://www.napathon.net/WantList.php (time : 00:05:14)
37:http://www.napathon.net/BadTrader.php (time : 00:05:21)
38:http://www.napathon.net/LPtoCD.php (time : 00:05:28)
+
39:http://www.napathon.net/ABeeGeesChristmas.php (time : 00:05:34)
40:http://www.napathon.net/BeeGeeTastic.php (time : 00:05:40)
41:http://www.napathon.net/KerrvilleEarlyYearsReview.php (time : 00:05:48)
42:http://www.napathon.net/GottaGetReview.php (time : 00:05:54)
43:http://www.napathon.net/ThreeBees.php (time : 00:06:00)
44:http://www.napathon.net/BeeGees6.php (time : 00:06:07)
+ + + + + + + + + + + + + + + + + + + + + +
45:http://www.napathon.net/WeLoveTheBeeGees.php (time : 00:06:18)
46:http://www.napathon.net/BillsRecordsArticle.php (time : 00:06:24)
+
47:http://www.napathon.net/IStartedAJoke.php (time : 00:06:31)
48:http://www.napathon.net/JohnnyGimbleConcert.php (time : 00:06:38)
49:http://www.napathon.net/CliveAnderson.php (time : 00:06:44)
50:http://www.napathon.net/RustyWier.php (time : 00:06:50)
51:http://www.napathon.net/InternetCollecting.php (time : 00:06:58)
52:http://www.napathon.net/BWStevenson.php (time : 00:07:05)
53:http://www.napathon.net/BWStevensonOnRhino.php (time : 00:07:12)
54:http://www.napathon.net/BW-Lyrics.php (time : 00:07:18)
+ + + + + + + + + +
55:http://www.napathon.net/BWStevenson/bw_intro.php (time : 00:07:26)
56:http://www.napathon.net/BWStevenson/bw_page_1.php (time : 00:07:32)
57:http://www.napathon.net/BWStevenson/bw_discography.php (time : 00:07:39)
HTTP/1.1 404 Not Found - http://www.napathon.net/BWStevenson/top/ See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
HTTP/1.1 404 Not Found - http://www.napathon.net/BWStevenson/top See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation. 404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
+
58:http://www.napathon.net/BWStevenson/bw_memories.php (time : 00:07:47)
59:http://www.napathon.net/BWStevenson/bw_tv.php (time : 00:07:55)
Meta Robots = NoIndex, or already indexed : No content indexed
60:http://www.napathon.net/MusicDBSearch.php (time : 00:08:01)
level 3...
Meta Robots = NoIndex, or already indexed : No content indexed
61:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=74 (time : 00:08:12)
62:http://www.napathon.net/AlbumID1552.php (time : 00:08:18)
63:http://www.napathon.net/AlbumID1553.php (time : 00:08:24)
Meta Robots = NoIndex, or already indexed : No content indexed
64:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=55 (time : 00:08:30)
65:http://www.napathon.net/AlbumID1557.php (time : 00:08:36)
Meta Robots = NoIndex, or already indexed : No content indexed
66:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=338 (time : 00:08:43)
67:http://www.napathon.net/AlbumID1063.php (time : 00:08:49)
Meta Robots = NoIndex, or already indexed : No content indexed
68:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=354 (time : 00:08:55)
69:http://www.napathon.net/AlbumID1153.php (time : 00:09:01)
Meta Robots = NoIndex, or already indexed : No content indexed
70:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=39 (time : 00:09:07)
71:http://www.napathon.net/AlbumID42.php (time : 00:09:13)
72:http://www.napathon.net/AlbumID43.php (time : 00:09:20)
Meta Robots = NoIndex, or already indexed : No content indexed
73:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=350 (time : 00:09:26)
74:http://www.napathon.net/AlbumID1116.php (time : 00:09:32)
Meta Robots = NoIndex, or already indexed : No content indexed
75:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=51 (time : 00:09:38)
76:http://www.napathon.net/AlbumID1546.php (time : 00:09:44)
77:http://www.napathon.net/AlbumID1547.php (time : 00:09:50)
78:http://www.napathon.net/AlbumID534.php (time : 00:09:56)
79:http://www.napathon.net/AlbumID1456.php (time : 00:10:03)
80:http://www.napathon.net/AlbumID80.php (time : 00:10:09)
81:http://www.napathon.net/AlbumID82.php (time : 00:10:15)
82:http://www.napathon.net/AlbumID91.php (time : 00:10:22)
83:http://www.napathon.net/AlbumID537.php (time : 00:10:28)
84:http://www.napathon.net/AlbumID83.php (time : 00:10:34)
85:http://www.napathon.net/AlbumID1336.php (time : 00:10:40)
86:http://www.napathon.net/AlbumID1077.php (time : 00:10:47)
87:http://www.napathon.net/AlbumID92.php (time : 00:10:53)
88:http://www.napathon.net/AlbumID802.php (time : 00:11:00)
89:http://www.napathon.net/AlbumID93.php (time : 00:11:06)
90:http://www.napathon.net/AlbumID826.php (time : 00:11:13)
91:http://www.napathon.net/BeeGees2.php (time : 00:11:19)
+ + + + + + + + + + + + +
92:http://www.napathon.net/BeeGees18.php (time : 00:11:28)
+ + + + + + + + + + + + + +
Meta Robots = NoIndex, or already indexed : No content indexed
93:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=92 (time : 00:11:36)
94:http://www.napathon.net/AlbumID242.php (time : 00:11:42)
Meta Robots = NoIndex, or already indexed : No content indexed
95:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=97 (time : 00:11:48)
96:http://www.napathon.net/AlbumID1296.php (time : 00:11:55)
97:http://www.napathon.net/AlbumID253.php (time : 00:12:01)
Meta Robots = NoIndex, or already indexed : No content indexed
98:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=119 (time : 00:12:07)
99:http://www.napathon.net/AlbumID297.php (time : 00:12:13)
Meta Robots = NoIndex, or already indexed : No content indexed
100:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=400 (time : 00:12:19)
|
07-05-2004, 11:28 AM | #26 | |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Here's what shows up in my error log:
Quote:
Wonder why spidering works for you and not for me. All I did was delete the contents of my folders on the server, deleted and re-created my phpdig database and the user for it, made the necessary changes to config.php and connect.php, then re-spidered. |
|
07-05-2004, 11:39 AM | #27 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Those errors are just 404s, created when PhpDig thinks it found a link, but really it's not a link. The 404s shouldn't cause a problem with spidering though. Was version 1.8.1 alpha working for you before, and now it's not? Are you using the two replacement files with the alpha version?
|
07-05-2004, 03:45 PM | #28 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Well, I did indeed discover that I had made a mistake with those two replacement files. True, I had unzipped them, but they didn't end up where I thought they were. Consequently, my previous spidering was with the wrong files.
However, letting the spider run its course with the correct 1.8.1 files, I still only have 1264 pages spidered, which is about 200 or so pages short of what I end up with for 1.8.0. I have no idea why. |
07-05-2004, 08:02 PM | #29 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Actually, I think it is working as it should.
Before respidering anything, using version 1.8.1, if you do an exact phrase search on "rock collection: page" (without the quotes) and see how many Rock Collection: Page X titles show up in the search results, you'll see the following, along with some other titles:
Rock Collection: Page 1
Rock Collection: Page 2
Rock Collection: Page 3
Rock Collection: Page 4
Rock Collection: Page 5
Rock Collection: Page 6
Rock Collection: Page 7
Rock Collection: Page 8
Rock Collection: Page 9
Rock Collection: Page 34
Rock Collection: Page 35
Rock Collection: Page 36
Rock Collection: Page 37
Rock Collection: Page 38
Rock Collection: Page 39
Rock Collection: Page 40
Seeing as how you kept SPIDER_MAX_LIMIT at ten, the index process worked as follows (using RockX.php links as example, omitting other links from the example):
Level Zero:
http://www.napathon.net/ (links to musicmenu.php)
Level One:
http://www.napathon.net/musicmenu.php (links to Rock1.php)
Level Two:
http://www.napathon.net/Rock1.php (links to Rock2.php and Rock41.php)
Level Three:
http://www.napathon.net/Rock2.php (links to Rock3.php and Rock41.php)
http://www.napathon.net/Rock41.php (links to Rock1.php and Rock40.php)
Level Four:
http://www.napathon.net/Rock3.php (links to Rock4.php and Rock41.php)
http://www.napathon.net/Rock40.php (links to Rock1.php and Rock39.php)
Level Five:
http://www.napathon.net/Rock4.php (links to Rock5.php and Rock41.php)
http://www.napathon.net/Rock39.php (links to Rock1.php and Rock38.php)
Level Six:
http://www.napathon.net/Rock5.php (links to Rock6.php and Rock41.php)
http://www.napathon.net/Rock38.php (links to Rock1.php and Rock37.php)
Level Seven:
http://www.napathon.net/Rock6.php (links to Rock7.php and Rock41.php)
http://www.napathon.net/Rock37.php (links to Rock1.php and Rock36.php)
Level Eight:
http://www.napathon.net/Rock7.php (links to Rock8.php and Rock41.php)
http://www.napathon.net/Rock36.php (links to Rock1.php and Rock35.php)
Level Nine:
http://www.napathon.net/Rock8.php (links to Rock9.php and Rock41.php)
http://www.napathon.net/Rock35.php (links to Rock1.php and Rock34.php)
Level Ten:
http://www.napathon.net/Rock9.php (links to Rock10.php and Rock41.php)
http://www.napathon.net/Rock34.php (links to Rock1.php and Rock33.php)
So, with SPIDER_MAX_LIMIT at ten, PhpDig won't go further than ten levels. Applied to the above example, this means Rock10.php through Rock33.php were not indexed.
Solution: Increase SPIDER_MAX_LIMIT in the config file and then select a higher search depth to index your site.
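The level-by-level walk above can be reproduced with a small depth-limited breadth-first crawl. This is a toy model in Python, not PhpDig's own code: it assumes each RockN.php page links only to the first, previous, next and last page (which matches the links named in the example), and the helper names build_links, crawl and missed_rock_pages are made up for the illustration.

```python
from collections import deque

PAGES = 41  # Rock1.php .. Rock41.php, as in the example above


def build_links(pages=PAGES):
    """Toy link graph containing only the links named in the example."""
    links = {"/": ["/musicmenu.php"], "/musicmenu.php": ["/Rock1.php"]}
    for n in range(1, pages + 1):
        # Assumed navigation on every page: first, previous, next, last.
        targets = {1, max(1, n - 1), min(pages, n + 1), pages} - {n}
        links[f"/Rock{n}.php"] = [f"/Rock{t}.php" for t in sorted(targets)]
    return links


def crawl(links, start, max_depth):
    """Breadth-first crawl that stops following links at max_depth."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depth[url] >= max_depth:
            continue  # page is recorded, but its links are not followed
        for target in links.get(url, []):
            if target not in depth:
                depth[target] = depth[url] + 1
                queue.append(target)
    return depth


def missed_rock_pages(max_depth, pages=PAGES):
    """Numbers of the RockN.php pages the crawl never reaches."""
    reached = crawl(build_links(pages), "/", max_depth)
    return [n for n in range(1, pages + 1) if f"/Rock{n}.php" not in reached]


print(missed_rock_pages(10))
```

With a depth of ten the model misses exactly pages 10 through 33, the same gap as in the search results. In this simplified graph a depth of 22 is the first that reaches every Rock page; the real site has other links, so the depth actually needed there may differ.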
|
07-05-2004, 08:23 PM | #30 | |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Quote:
PHP Code:
|