How do search engines work?

Search Engine Optimization

Without getting into technical details or programming jargon, a search engine's functionality can best be compared to the index of a book. In this analogy, the entire internet is the book, and the search engine is an electronic index that knows the contents of every page it can find. If you're planning to invest in Search Engine Optimization (SEO), your goal is to get your website into that index and make it as prominent as possible.

Information Gathering and Processing

So then, how does a search engine gather its information? It sends out programs, which are known in the trade as "robots" (because they work automatically) or "spiders" (because they crawl the World Wide Web). These programs gather all the information on your website that they can find, or that you allow them to see.
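The "allow them to see" part is commonly handled with a robots.txt file at the root of a site. The following is a minimal sketch of how a well-behaved spider might check that file before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the "ExampleBot" name, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: every robot is asked
# to stay out of the /private/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/index.html",
        "https://example.com/private/notes.html",
    ):
        print(url, can_crawl(ROBOTS_TXT, "ExampleBot", url))
```

Keep in mind that robots.txt is a request, not a lock. Reputable search engines honor it, but it does not hide a page from anyone who types in the address.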

Information gathered by a search engine then gets "cached," or stored, on the search engine's servers, where the pages of each website are dissected and categorized alongside every other online page that the search engine can find. As an illustration, you can go to Google, do a search on the word "the," and find billions of results. This is because Google reports an estimate of how many pages in its index contain the word "the," and lists those pages in its perceived order of importance.
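To make the book index analogy concrete, here is a toy sketch of an "inverted index," a structure that maps each word to the pages containing it. This is only an illustration of the general idea, not a description of how any particular search engine stores its data. The page URLs and text are invented, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], word: str) -> list[str]:
    """Return the URLs containing the word, sorted alphabetically."""
    return sorted(index.get(word.lower(), set()))


# Invented example pages standing in for cached copies of real sites.
PAGES = {
    "example.com/a": "The quick brown fox",
    "example.com/b": "Jumping over the lazy dog",
    "example.com/c": "Foxes and dogs",
}

if __name__ == "__main__":
    index = build_index(PAGES)
    print(search(index, "the"))
    print(search(index, "fox"))
```

A lookup like this only says which pages contain a word. Deciding the order in which to show them is a separate problem.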

SERPs - Search Engine Results Pages

Most search engines will deliver 10 results on a page for any keyword query. Search Engine Results Pages (or SERPs) can continue for many pages, with each successive page showing web pages that are (ideally) less relevant as you go down the list. In the past, the user would get a bland list of 10 blocks of text, each of which may have shown the page's Title, Description, and URL. Over the past couple of years, "universal search" has started to incorporate images, video results, and maps into the results. Some searches will also serve up definitions, weather forecasts, and movie show times. A Google search on 2+2 will likely give you the answer "4" before it presents any sites with the same content. Website owners should keep this in mind, because new ways of presenting information may call for alternative ways of getting a site in front of searchers, depending on the site owner's audience.

How Search Engines Figure Things Out

Since search engines were first created, the main problem has been the delivery of relevant results to the user. Because there are millions of pages about any given topic, it is necessary to categorize the pages that are most likely to be a match for the consumer's request. A search engine that consistently delivers good results gets visited more often, and gains trust in the eyes of the user. One of the ways that Google became so successful was that it found ways to index credible websites while eliminating search results that were more interested in traffic than relevancy. Google's method for establishing trust was to count the quantity and quality of links to any given website from other pages on the internet, and to use a formula for doing this automatically. This formula is known in the search engine world as the "Google Algorithm."
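The idea of counting both the quantity and the quality of links can be sketched with a simplified PageRank-style calculation, in which a link from a highly scored page is worth more than a link from an obscure one. This is a teaching sketch under stated assumptions, not Google's actual formula. The damping value of 0.85, the iteration count, and the three-page link graph are all chosen for illustration.

```python
def link_scores(
    links: dict[str, list[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Score pages by repeatedly passing each page's score along its links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score to everyone.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Invented link graph: pages "a" and "b" both link to "c"; "c" links to "a".
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"]}

if __name__ == "__main__":
    for page, score in sorted(link_scores(LINKS).items()):
        print(page, round(score, 3))
```

In this toy graph, page "c" comes out on top because two pages link to it, and page "a" outranks page "b" because its single link comes from the well-regarded "c."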

What is a search engine algorithm? Basically, it is a complex set of rules that defines how any page on the internet relates to any other page. The Google Algorithm has over 200 different factors that help determine a web page's relevance for any particular keyword. Some parts of the algorithm are closely guarded trade secrets, while others are very well known. Even though search engine algorithms can be very complex, it is still possible to configure your website so that it gets properly classified by the algorithms of the major search engines.

Submission Tips

If you have a brand new website, you probably want other people to find it on Google, Yahoo, or Bing. There are a few ways to make this happen. First, you can submit the website to each engine by yourself, for free. In many cases you don't have to do anything, because search engines (like all other major server sources on the Internet) get continuous updates of all the active domain names in the world, and will periodically look at the sites that are attached to them. As of January 2010, a search engine may have a site cached within 24 hours of the domain name purchase. Another common way to get a site's content cached by a search engine is to get links from other websites, and allow the search engine spiders to follow those links to your site. Several SEO theorists will tell you that search engines "prefer" to find your site this way, and whether or not that is true, you get more value from the links coming off other people's sites.

One of the important things to consider when you want to get a website found by Google, or any other search engine, is that your site should have enough content on its pages to become a resource for its topic. The content also has to be readable by a search engine, and generally it has to be put on the site in such a way that it would be useful for human readers. Having a minimum of 250 words on any given page would be ideal, though this is not always possible when you have a large product catalog. You should also keep in mind that search engines classify the content on the page by using reference points like the site's title, paragraph headers, and other usability information. As a whole, the way a site links to its own pages is also a big hint for a search engine, so your website navigation should be easy for people and their robot counterparts.
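As a rough self-check, a site owner could pull out the reference points mentioned above (the title and headers) and count the visible words on a page. The sketch below does this with Python's standard html.parser module. The class name, the sample HTML, and the choice to leave the title out of the word count are assumptions made for illustration, and the 250-word figure is simply the guideline from this article.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # not visible to human readers
MIN_WORDS = 250  # guideline from the article, not a search engine rule


class PageSummary(HTMLParser):
    """Collect the title, the headings, and a visible word count."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.headings: list[str] = []
        self.word_count = 0
        self._open: list[str] = []  # tags that are currently open

    def handle_starttag(self, tag, attrs):
        self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            # Pop back to the matching tag, discarding unclosed ones like <br>.
            while self._open and self._open.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or SKIP_TAGS & set(self._open):
            return
        if "title" in self._open:
            self.title += text
            return
        if HEADING_TAGS & set(self._open):
            self.headings.append(text)
        self.word_count += len(text.split())


def summarize(html: str) -> PageSummary:
    parser = PageSummary()
    parser.feed(html)
    return parser


# Invented sample page.
SAMPLE_HTML = """<html><head><title>Widget Guide</title></head>
<body><h1>Choosing a Widget</h1><p>Widgets come in many sizes.</p></body></html>"""

if __name__ == "__main__":
    page = summarize(SAMPLE_HTML)
    print(page.title, page.headings, page.word_count)
    print("Meets word guideline:", page.word_count >= MIN_WORDS)
```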

Avoid Shortcuts or "Search Engine Tricks"

It is also important to note that search engines don't like to be tricked. Since the advent of search, people have found various ways to artificially inflate their search engine rankings. Usually the goal was to siphon traffic for popular searches into dubious offers by using "bait and switch" tactics. An entire industry grew up around fooling search engines, and this provided a bad user experience for the average search engine user. Additionally, people who sold products of any kind would use questionable practices to improve their position on a search engine. For instance, they would repeat a popular keyword several thousand times on a page, but camouflage it by placing the keyword in small type in a font color that matched the page background. Search engines now have multiple ways of spotting tricks designed to inflate rankings, and will take your site out of their index if they believe you are not providing a good user experience.
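Keyword repetition of the kind described above is one of the easier tricks to detect. The sketch below measures how much of a page's text is taken up by a single keyword and flags pages above a cutoff. Real search engines do not publish their detection methods, so the 20% threshold and the function names here are purely hypothetical.

```python
import re
from collections import Counter


def keyword_density(text: str, keyword: str) -> float:
    """Return the fraction of words in text that are exactly the keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


def looks_stuffed(text: str, keyword: str, threshold: float = 0.2) -> bool:
    """Flag text where one keyword exceeds the (hypothetical) threshold."""
    return keyword_density(text, keyword) > threshold


if __name__ == "__main__":
    stuffed = "cheap " * 50 + "buy now"
    normal = "We sell cheap and reliable widgets for home workshops"
    print(looks_stuffed(stuffed, "cheap"))
    print(looks_stuffed(normal, "cheap"))
```

Note that a check like this runs on the text of the page, so hiding the repeated keyword in small type or a matching font color does nothing to conceal it from a robot.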

Creating Legitimacy and Originality

Some of the most highly ranked sites on any legitimate search engine get there by having trusted pages that are relevant to their topic matter. For example, Wikipedia results will appear for a wide variety of searches. This is not an endorsement of Wikipedia's accuracy, but instead a reflection of the trust that other web users place in the site as a whole. People also commonly link to Wikipedia articles when they are making a reference to a concept or topic. To get your own site to rank in search engines, it pays to become a trusted resource for your topic. When people refer to your site, and link back to your content, search engines take notice and reward you accordingly. By keeping in mind that the stated goal of Google is to "organize the world's information," you can build your site around this principle so it attracts the kind of traffic you are looking to get.

It should also be noted that search engines are looking for original content. If you have a site that just copies other information, even if you have permission, you should not expect to be featured prominently in search engine rankings. Fresh, timely content or reliable reference material is always going to have an edge over stale information. For topics where a search query is likely looking for the latest information, you may even want to update your site on a frequent basis. In the world of search engines, keywords that are relevant to timely topics are considered to be part of the QDF (Query Deserves Freshness) segment of an algorithm. If you have a blog that writes about celebrities (especially ones who are always up to something interesting) then you are more likely to be dealing in QDF keywords.

Paid Search

All major search engines have a Pay-Per-Click (PPC) option that allows people to run ads next to relevant search results. The search engine only charges the advertiser money when someone clicks on the ad. For people with a new site, or who can't bring their website into a high position on the natural search engine results pages, PPC is a viable option that can get traffic. However, in some lines of business each click can cost a high dollar amount, to the extent that a PPC campaign can be unprofitable for the average merchant. Even if a site owner opts for Pay-Per-Click as an advertising option, a separate search engine spider will likely visit the site to make sure its content is relevant to the keywords being purchased. Therefore, many of the same principles that apply to search engine optimization also apply to PPC, in the sense that you want your ads to land on pages that are specific to the search query. Additionally, you will convert more sales if visitors see keywords on the page that match their original search query.
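Whether a PPC campaign is profitable comes down to simple arithmetic: the average profit a click brings in has to exceed what the click costs. The sketch below shows that calculation. The function names and all of the numbers (a $1.50 click, a 2% conversion rate, $40 of profit per sale) are invented for illustration.

```python
def break_even_cpc(conversion_rate: float, profit_per_sale: float) -> float:
    """Return the highest cost per click at which a campaign breaks even."""
    return conversion_rate * profit_per_sale


def profit_per_click(
    cost_per_click: float, conversion_rate: float, profit_per_sale: float
) -> float:
    """Return the average profit (or loss, if negative) for each paid click."""
    return break_even_cpc(conversion_rate, profit_per_sale) - cost_per_click


if __name__ == "__main__":
    # Hypothetical merchant: 2% of visitors buy, each sale earns $40 profit.
    print("Break-even cost per click:", round(break_even_cpc(0.02, 40.0), 2))
    print("Profit per $1.50 click:", round(profit_per_click(1.50, 0.02, 40.0), 2))
```

With these made-up numbers, the merchant loses money on every click priced above 80 cents, which is why improving the conversion rate of the landing page matters as much as the bid itself.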

Continuous Improvement

Search engines are always working to deliver the best possible results to the user. This is because trusted search results mean that a user will keep coming back, and users will be exposed to more of the paid ads that drive search engine revenue. Engines are always adding new features, and experimenting with ways to show results that are better tailored to the user. Search engines even keep track of a user's search history, or the type of searches coming from a particular IP address, in order to deliver "personalized" results that match up with a user's interests. As a result, your search engine results may be different from those of your neighbor, even when you're looking for the same keyword at the same time. If the search engine is doing its job, you will both get the best results possible.

Optimization and Staying Current with Search Engine Results

Previously, it was much easier for the average webmaster to stay current with search engine trends and modify a site so it got the attention it deserved from search engines. Over the past 5 years, it has become increasingly difficult. This is partially because there are a variety of filters and penalties that can be applied to a site if it inadvertently violates a search engine rule. An SEO consultant also has to be an amateur SEO historian in order to avoid esoteric punishments for site design practices. Furthermore, search engines update their algorithms on a continuous basis, instead of making one big update every 6 months. This means that SEO professionals need to stay up to date on Google Updates, Bing Trends, and changes to Yahoo. They need to review keyword rankings on a monthly basis, or sooner, in order to spot trends that may harm a client's web rankings. All of these factors are guaranteed to keep savvy SEO experts busy over the coming years as they help their customers gain and maintain high search engine rankings.

Written by Patrick Hare, SEO Engineer at Web.com Search Agency