We all know it's a bit of a task to understand how search engines work, never mind trying to explain it to someone else. So I've decided on two routes to explain how search works. Today is the first: a simplified overview of how search works. The second will come in a few weeks in the form of a somewhat detailed guide to how search works and how you can improve your site to make the most of the known search mechanisms. [edit: second route will be posted in a few weeks]
So without further ramblings...
Search engines, and more accurately their algorithms, have always been something of a mystery. As the saying goes, there are known knowns, known unknowns and unknown unknowns. For SEOs this means there are certain things we know, certain things we know change, and things we have no idea exist, because algorithms can change at any second (and generally do), making for many more variables than you may think. Confused yet? If not, just wait a few moments as you continue to read this article.
This overview takes you from your content creation to search queries to the results page.
From Generation to Consumption
1. You create a new blog post or web page, or add some content to a website.
2. A search bot crawling the web finds this content.
Search bots (SBs, like Googlebot) follow links on websites, so if you have few or no links pointing to your content you are unlikely to be crawled.
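SBs won't crawl pages and directories if told not to by a robots.txt file.
If a link has a rel="nofollow" attribute, SBs are asked not to follow it or pass ranking credit through it, although the linked page can still be indexed if it is found another way.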
SBs may also find pages on your site using a sitemap (a specialised XML file).
The more links a page has from higher ranking websites, the better its quality score will be in the index (as long as these links aren't nofollow).
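To make the crawling rules a little more concrete, here is a minimal sketch of how a polite bot might decide which links to queue. It uses Python's standard library only. The site (example.com), the bot name (ExampleBot), the robots.txt rules and the page markup are all made up for illustration, and real search bots are certainly more involved than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (assumption, not a real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page markup with a normal link, a blocked link and a nofollow link.
PAGE_URL = "https://example.com/"
PAGE_HTML = """
<a href="/blog/post-1">A normal link</a>
<a href="/private/notes">Blocked by robots.txt</a>
<a href="/sponsor" rel="nofollow">A nofollow link</a>
"""


class LinkCollector(HTMLParser):
    """Collects href values, skipping links marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if href and "nofollow" not in rel_values:
            self.links.append(href)


def crawlable_links(page_url, html, robots_txt, user_agent="ExampleBot"):
    """Return absolute URLs that this sketch of a bot would queue for crawling."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())  # parse rules without any network access

    collector = LinkCollector()
    collector.feed(html)

    absolute = [urljoin(page_url, link) for link in collector.links]
    return [url for url in absolute if robots.can_fetch(user_agent, url)]


print(crawlable_links(PAGE_URL, PAGE_HTML, ROBOTS_TXT))
```

Only the first link survives: the second is disallowed by robots.txt and the third is marked nofollow.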
3. Shortly after the content is crawled, another bot comes along and indexes it.
Meta tags (title, description etc.) are considered to be stored in one index, used for broad match searching.
"On page" content is believed to live in another index, used for more obscure long tail searches.
It is important to remember that when you search you are not searching the live web, but rather a cache (store) of the web that the search engine holds internally. This is to stop SEOs manipulating the index easily.
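If the idea of "searching a stored copy" feels abstract, the toy sketch below may help. It builds two small word-to-page lookups (one from titles, one from on-page text) and answers queries from those lookups without ever touching the pages again. The split into two indexes mirrors the belief described above; the page data, function names and the "title matches first" ordering are my own assumptions, not how any real engine is known to work.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> (title, on-page text).
PAGES = {
    "https://example.com/a": ("Search basics", "How crawlers find and store pages"),
    "https://example.com/b": ("Baking bread", "A long tail guide to sourdough starters"),
}


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(pages):
    """Build two inverted indexes: word -> set of URLs, for titles and for body text."""
    title_index = defaultdict(set)
    body_index = defaultdict(set)
    for url, (title, body) in pages.items():
        for word in tokenize(title):
            title_index[word].add(url)
        for word in tokenize(body):
            body_index[word].add(url)
    return title_index, body_index


def search(query, title_index, body_index):
    """Return URLs containing every query word, title matches before body matches."""
    words = tokenize(query)
    if not words:
        return []
    title_hits = set.intersection(*(title_index.get(w, set()) for w in words))
    body_hits = set.intersection(*(body_index.get(w, set()) for w in words))
    return sorted(title_hits) + sorted(body_hits - title_hits)


title_index, body_index = build_indexes(PAGES)
print(search("search", title_index, body_index))
print(search("sourdough starters", title_index, body_index))
```

The point to notice is that `search` only ever reads the two indexes. If a page changes after it was indexed, the results won't reflect that until it is crawled and indexed again.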
4. The search engine will then estimate your ranking, generally based on links to the content (though not always).
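As a rough illustration of link-based ranking, here is a simplified PageRank-style calculation. To be clear, this is one well-known approach that I'm swapping in as an example; I'm not claiming it is what any search engine runs today. The link graph, the damping value and the iteration count are all assumptions.

```python
# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "home": ["blog", "about"],
    "blog": ["home"],
    "about": ["home"],
    "orphan": ["home"],  # links out, but nothing links to it
}


def link_scores(links, damping=0.85, iterations=50):
    """Estimate a score per page from who links to whom (simplified PageRank)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over everyone.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


scores = link_scores(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph "home" comes out on top because every other page links to it, and "orphan" comes last because nothing links to it.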
5. The search engine will cross check the content against the engine's policies.
Web spam teams double check for real content, test the search algorithms and refine them.
Google uses over 10K testers (normally in India I believe) to test the quality of results.
Search Engines then check for spam reports.
6. The user sends a search query (searches the index).
In reality you aren't just searching one index of the engine but multiple indexes and factors of the search engine.
During this process Google suggests relevant keywords.
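Keyword suggestions can be pictured as "popular past queries that start with what you've typed so far". The sketch below does exactly that and nothing more. The query log and the ranking by popularity are assumptions made for illustration; how Google actually picks its suggestions isn't covered here.

```python
from collections import Counter

# Hypothetical log of past queries (assumption for illustration).
PAST_QUERIES = [
    "how search works",
    "how search engines rank",
    "how search works",
    "seo tips",
    "how to bake bread",
]


def suggest(prefix, past_queries, limit=3):
    """Suggest past queries starting with the prefix, most frequent first."""
    prefix = prefix.strip().lower()
    counts = Counter(query.lower() for query in past_queries)
    matches = [(query, count) for query, count in counts.items() if query.startswith(prefix)]
    # Sort by popularity (descending), then alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in matches[:limit]]


print(suggest("how s", PAST_QUERIES))
```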
7. Initial search results are shown.
Google may report billions of relevant results in the index, but only the top 10K are generally shown.
Localised search results: Google and Bing will use your IP location to show localised results higher in the results.
8. Results are ordered according to search ranking and authority, and duplicates are removed.
The big search engines use the keywords of the search to find adverts and include these in the relevant hotspots of the results page.
Many search engines also offer refinement tools alongside the results, such as "blogs", "news" & "social" offering the user ease of access to data they want.
Multiple pages from same domain are likely to be grouped together ("clustered results")
Trending sites (locally/nationally) move up the index temporarily.
User personalisation: Google will factor your previous searches and click-throughs (clicks to a website) into the results page, putting your most viewed and searched pages at the top.
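The ordering, duplicate removal and clustering described in this step can be sketched in a few lines. Everything in the example is an assumption: the result data, the `score` field standing in for ranking and authority, and the `content_hash` field standing in for whatever an engine uses to spot duplicate content.

```python
from urllib.parse import urlparse

# Hypothetical candidate results for one query.
RESULTS = [
    {"url": "https://example.com/guide", "score": 0.92, "content_hash": "aaa"},
    {"url": "https://mirror.example.net/guide", "score": 0.80, "content_hash": "aaa"},
    {"url": "https://other.example.org/post", "score": 0.85, "content_hash": "bbb"},
    {"url": "https://example.com/faq", "score": 0.60, "content_hash": "ccc"},
]


def arrange_results(results):
    """Order by score, drop duplicate content, then group pages by domain."""
    ranked = sorted(results, key=lambda result: result["score"], reverse=True)

    seen_hashes = set()
    clusters = {}  # domain -> URLs; domains stay in order of their best page
    for result in ranked:
        if result["content_hash"] in seen_hashes:
            continue  # same content already shown from a higher scoring page
        seen_hashes.add(result["content_hash"])
        domain = urlparse(result["url"]).netloc
        clusters.setdefault(domain, []).append(result["url"])
    return list(clusters.items())


for domain, urls in arrange_results(RESULTS):
    print(domain, urls)
```

Here the mirror copy is dropped as a duplicate of the higher scoring guide, and the two example.com pages end up grouped together even though another site's page scored between them.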
9. The final results pages (SERPs) are shown to the user.
From submission of a query to the results page showing takes less than 100 milliseconds (generally).
This crawl, index, rank & search route is the same for most types of content.
10. Search is a huge industry, so what is said above is always changing to some extent. Most of it stays the same, but there are extra variables outside of the 9 steps above, such as how fast your content is indexed, how you go about getting links, who you link to (apparently) and whether you are known through social media. Websites, blogs etc. are also linked via the algorithms to your social media profiles. For example, try searching for your name, and your site and your Twitter page may show up!
Do you have any questions about how search engines work, or how to get to the top of Google? Simply tweet me @andykinsey or leave a comment below.