The deeper in the file structure the bot has to dig (for example, multiple folders inside a folder, e.g. http://www.ahfx.net/folder1/folder2/folder3/file.php), the less likely the spider is to visit that page. Normally fewer links point down into deeper pages than to the high-level main pages, so the spiders do not visit the deeper pages as often. There isn't a set limit on URL size, but each browser and operating system has its own limits. It is always better to keep the URL small so people can remember it, and to keep the directory structure short and fat rather than tall and skinny.
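To make the "tall and skinny" idea concrete, here is a minimal sketch that counts how many folders sit between the site root and the file in a URL. The function name directory_depth is made up for illustration, and the second URL is a hypothetical example. Spiders do not publish a depth cutoff, so treat the number only as a rough way to compare your own pages.

```python
from urllib.parse import urlparse

def directory_depth(url):
    """Count the folders between the site root and the file in a URL."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # When the path does not end in "/", the last segment is the file itself,
    # so it is not counted as a folder.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)

# URL taken from the example above
print(directory_depth("http://www.ahfx.net/folder1/folder2/folder3/file.php"))
# Hypothetical top-level page for comparison
print(directory_depth("http://www.ahfx.net/index.php"))
```

Pages with a smaller depth sit closer to the main pages and are usually easier for a spider to reach.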
tandrus3
Hello Adam,
I would like the spiders to come to some pages located in my modules directory and index them, so that I can optimize these pages too. So far, the index.php file has been indexed, and maybe some of the main website pages have been indexed, but these other pages have yet to be found.
I'm worried, however, that the search engines may never find these pages, since they are only doorway pages which will help people in a specific location find my service. (None of the main pages point to these doorway links.)
The site will work for everyone, not just Idaho Falls and Pocatello, and the service is location-specific. Therefore, I believe that I will be using doorway pages in a legitimate way.
The problem is, these doorway pages are also location-specific pages. Until I create similar pages for every city, nationwide, it might seem too "tra-la-la" to have pages for Idaho Falls, and Pocatello on a "Here-is-Where-We-Are-Marketing" page, and nothing for any other cities. Doing so would show my clients less company development than I want to show.
A site map might get me through this problem (in my opinion very few people use these as a navigation tool, yet the spider might access it and explore the rest of the site, including the doorway pages, which I could include on the site map). Do you think I am on the right track?
If so, is creating a site map as simple as creating a page (or a few pages) showing all of my links, or is there more to it than this?
I would be wary of any "doorway" pages. Any time you try to show your business as being somewhere it isn't, the search engines will penalize your site. Adding a sitemap to your site will help the pages you want indexed get indexed. However, realize that pagerank will not filter downstream as well through a sitemap as it does through links from one of the main pages, and just because a page is "indexed" doesn't mean that it will show up in the SERPs.
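For the simple "page showing all of my links" kind of site map asked about above, a minimal sketch could look like the following. It assumes you already have a list of (URL, link text) pairs; the function name build_sitemap_page and the example paths are hypothetical. It only produces a plain HTML page of normal links and does not change how pagerank flows to those pages.

```python
from html import escape

def build_sitemap_page(links, title="Site Map"):
    """Return a plain HTML page with one normal link per (url, text) pair."""
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'
        for url, text in links
    )
    return (
        "<html>\n<head><title>" + escape(title) + "</title></head>\n<body>\n"
        f"  <h1>{escape(title)}</h1>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>\n"
    )

# Hypothetical pages, for illustration only
example_links = [
    ("/index.php", "Home"),
    ("/modules/idaho-falls.php", "Idaho Falls"),
    ("/modules/pocatello.php", "Pocatello"),
]
print(build_sitemap_page(example_links))
```

Save the output as a page on your site and link to it from a page the spiders already visit, so they have a path to follow.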
tandrus5
Hello Adam,
I just got back from CES, and got to talk with some of the Google underlings. I asked them how I might monitor when their crawlers last indexed my page. They hinted that the date in the cache might be an indicator of this information.
I've used your site as a case study (since you are the only person I have personally talked with who has hit #1 at any significant level in the search rankings).
It bothers me that when I look at ahfx.net's cached info on Google, I see a date of 12/26/2005. To me, this older cache date bucks against the theory that "content is king" (assuming the Googlings' hint was correct), since your content is dynamically refreshed every time someone opens the home page.
I know that you have created monitoring software that tracks when your site was last spidered.
This brings me to the following questions:
1) Does your software agree that 12/26/2005 was the last time Google spidered you?
2) If so, why do you think that you are not getting "the love" from Google, when you are always presenting them with "new content"?
Unfortunately they might have led you astray to a point. I've been crawled 403 times this month (Jan 1 - Jan 10th), without counting any days after Dec 26th to the end of the year. However, just because they crawl my site doesn't mean that they re-"indexed" my site. You must also realize that Google uses many datacenters and each datacenter is updated on its own schedule. So I could have a cache date of Dec 26th on one datacenter and Jan 10th on another (as is the case on some of the datacenters right now). I would be interested in what keyword terms you are using to run your tests.
rich duplessis7
I was in a forum and noticed a lot of Yahoo spiders looking at pages. How can I get the Yahoo spider to visit my pages?
All you need is a normal HTML link from a currently "indexed" page in Yahoo, Google, or MSN and their spiders will find you automatically. If you want to submit a new site and invite the search engines to spider your site, you can look at our e-list for information on how to submit your sites for free.
Derrick9
I am new to all of this and I have a few questions.
1. How do I know when msnbot, googlebot and all the other bots out there crawl my site?
2. How do I get them to crawl my site more often?
3. How do I get a better PR? I believe the better my PR the better off I will be, but not sure.
I have a Google site map, but is there anything else I need in order to improve my site?
If anyone can answer these questions, it would be very helpful to me. Also, please visit my site and let me know what you think I need to do in order to improve it.
Derrick, great questions. Here are some short answers:
Check your server logs for the following user agents: Googlebot, MSNbot, and Yahoo! Slurp. That will tell you the last time those bots visited your site. If you don't have access to your logs, you can check the "cached" version of your site to see the last time they indexed your page (but not necessarily the last time they visited your page).
Frequent updates (daily/weekly) to your site content will help bring back spiders more often. Also increasing your pagerank will help bring back spiders more often because your page will be "more important".
Read our blogs about pagerank for great ways to increase your pagerank.
A Sitemap is a good start to get pages indexed, but you need to make sure that you have done your basic SEO to get the benefit out of your sitemap.
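As a rough illustration of the server log check in the first answer, here is a minimal sketch. It assumes your server writes the common "combined" access log format; the sample lines, IP addresses, and the function name last_bot_visits are made up for illustration, and your own log layout may differ. Matching on the user agent string alone also cannot tell a real spider from something pretending to be one.

```python
import re
from datetime import datetime

# User agent substrings to look for (matched case-insensitively)
BOT_NAMES = ["Googlebot", "msnbot", "Yahoo! Slurp"]

# Combined log format: ip - - [timestamp] "request" status bytes "referer" "user agent"
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

def last_bot_visits(lines):
    """Return {bot name: datetime of the most recent visit found in the log lines}."""
    last_seen = {}
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        when = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        agent = match.group("ua").lower()
        for bot in BOT_NAMES:
            if bot.lower() in agent and (bot not in last_seen or when > last_seen[bot]):
                last_seen[bot] = when
    return last_seen

# Made-up sample lines, for illustration only
sample_log = [
    '66.249.66.1 - - [09/Jan/2006:04:10:02 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jan/2006:06:25:14 +0000] "GET /about.php HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '207.46.98.1 - - [08/Jan/2006:11:00:00 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.7 - - [10/Jan/2006:08:00:00 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

for bot, when in last_bot_visits(sample_log).items():
    print(bot, "last visited on", when.isoformat())
```

To run it against a real log, pass the open file object instead of sample_log, since the function only needs something it can loop over line by line.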
jill gaylor11
great information on bots and spiders, very useful
cully cangelosi12
I am wondering, is there anything special I need to do to have my website available sooner on all the search engines? I was told it could take as long as 6 weeks.
Cully, it really depends on how the spiders find your site. If you get a link from a site that the spiders visit on a frequent basis, they will find you quicker. However, that doesn't mean that you will be indexed that quickly. Once you are "found", they have to decide where to put your site. Based on previous experience, I tell people that they will normally need to wait 3 months for all of their pages (in a normal-sized website) to be fully indexed. But keep in mind that indexing and ranking well are totally different.
Michiel Malotaux14
How do I quickly determine the page size of any website?
Chris15
Great resource!
steve mac16
Hi
Great information thanks
Regards
Steve.
yoyoyo117
Heh, this is good knowledge :)
Rick Teller18
Can bots and other such programs get to web pages that are not referenced in a link such as an href or image? For example, if a page can only be reached via a link in an email, will a bot, spider, or crawler find it?
Selvam19
My website is nearly 45 days old and still has not been crawled by Google. What can I do now? How do I make Googlebot crawl my website?
concord20
Doorway pages are Web pages created specifically to fool the search engines' algorithms and draw search engine visitors to a website. They are standalone pages designed only to act as an entry or door to your website. Usually these pages are theme based. They are also known as portal pages, jump pages, gateway pages, and entry pages.
Doorway pages are considered to be black hat and should not be used, although many SEO companies use these pages to gain more traffic.
Excellent post for webmasters! I did find this to be slightly too technical for those curious about web crawlers who are not computer people!
What Is A Spider?23
Excellent post for webmasters and those who already understand search engine spiders. I found from some research that users are mostly searching for a layman's explanation of what a spider is, so I decided to put it into layman's terms, and I hope the article helps further answer your visitors' questions about spiders.
What Is A Spider?24
A spider is a web application or program that visits websites and reads the page information while searching for more pages on the website. This allows companies like Google (with the Googlebot crawler), MSN (with msnbot), and Yahoo! (with Yahoo! Slurp, its web crawler) to add to their abundant source of information.
EM25
Can someone view my profile on MySpace with a Google bot if it's private? Meaning read my messages, look at my pictures, etc.?
Buddy Dixon26
I have numerous web sites that I am either building or have built. Some are doing OK in YAHOO and some are not. I believe I have found what is wrong with the ones that aren´t doing well, my problem is trying to get YAHOO to crawl them again. Any advice would be very much appreciated. Thank you in advance.
RRj27
Thanks for your articles.
Si Gembala28
Spider bots are eating a lot of bandwidth. On my website, Googlebot uses 1.3 gigabytes of bandwidth. How can I reduce the bandwidth they use?
sevgi29
Very nice, thank you admin
lee30
My blog hasn't been indexed for two weeks, and my site has been indexed but never gets updated. Is there anyone who can tell me when Google or Bing updates their search engines? Please help. Thank you, Lee
Viv31
How does a Yahoo! Slurp spider manage to get into the shoutbox of a fansite? And seemingly answer people?
Mark Thomas32
What is the difference between a spider and a bot?
Laurie33
What is a Hound Spider Bot? This shows up as a content title; when clicked in analytics, it shows tmwebminetestconfiguration.php under content performance.
Is this a bad bot? Do you think it installed something on our site?
Thanks for any help you can offer.
Rajesh34
I am new to all of this and I have a few questions.
1. How do I know when msnbot, googlebot and all the other bots out there crawl my site?
2. How do I get them to crawl my site more often?
3. How do I get a better PR? I believe the better my PR the better off I will be, but not sure.
JS35
Thank you for sharing the information. I wonder if you have some information about the MSN bot and Yahoo bot. It seems they hardly visit my sites.
Atacoplease36
Thanks for the article, it was very quick and to the point. I must ask, however: why do they sometimes go over the same content over and over? I don't think they are stuck in a loop when they do that.
Lilly37
I have a blog through Blogspot with Feedjit installed. It's a widget available to Blogspot users, similar to Google Analytics, only it gives me more information, such as the cities my visits are coming from. I've been getting visits from Mountain View, CA for a while and I assumed it was a Google bot or spider. Only recently, however, the Mountain View visits have been disappearing: if I'm watching my live traffic feed and see this city show up, the 'footprints' are then erased. If I refresh my live view, the visit doesn't show anymore. Can you solve this mystery? I can't seem to find any information about it anywhere. Is this a Google bot/spider or an actual visitor erasing their visit?
Murter Kallande38
Please robot register my site
Andrew39
Hi, many thanks for your great article about bots and spiders.
I just recently became more interested in how these spiders and bots work. My website on average shows between 40 and 120 bots and spiders every time I visit it. One thing that I don't understand: I have a "who is online" page in my website admin, and when I click to check who is online, these bots instantly drop by half or even to a third. Did I scare these bots? I quite like these bots, because they get my site a higher traffic ranking in the search engines.
But now, since you mentioned bad bots, I'm a little bit concerned. If I make a page available with lots of email addresses (just unwanted spam email addresses), would these bad bots stay at that page and not go through my whole website like the good bots?