As you read this post, there are 3,263,532,000+ internet users worldwide (yup, that's billions!), and the number is growing every second. That is more than one third of the world's total population of roughly 7.3 billion people. On average, almost 31,500 GB of data is transferred over the internet every second.
Every internet user has their own reasons to be online: some like to listen to songs or watch movies, some read news or books, others buy or sell products, and so on. But do you know the one thing all these internet users have in common? Any guess?
Well, all of these users have, at least once in their lifetime, used a search engine to find something on the internet; I bet you have too, haven't you? The reason I am so confident is that internet users make 50,000+ searches per second across different search engines. I was stunned when I first came across these numbers, and yes, they are true.
Ever wondered how efficient these search engines are? How do they manage to handle all these searches simultaneously? This article is all about the working of a search engine: how they gather information and how they return your desired results on the go. But before going further, let's take a look at some of the big search engines that help us find what we want on the internet:
The big Brands:
For those who don't know what a search engine is: in a nutshell, a search engine is a platform that takes input from users, identifies items in a database that correspond to the keywords or characteristics specified by the user, and returns exact or close matches for the search terms.
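That matching step can be sketched with a toy example. The tiny document set and scoring below are invented purely for illustration; a real search engine's database and ranking are vastly more sophisticated:

```python
# A toy "database" of items, keyed by a document id. Contents are made up.
documents = {
    1: "how to bake chocolate chip cookies",
    2: "latest smartphone reviews and prices",
    3: "chocolate cake recipe for beginners",
}

def search(query):
    """Return ids of documents matching the query, best match first."""
    terms = query.lower().split()
    results = []
    for doc_id, text in documents.items():
        # Score = how many query terms appear in the document text.
        score = sum(1 for t in terms if t in text.split())
        if score > 0:
            results.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(results, reverse=True)]

print(search("chocolate recipe"))  # → [3, 1]: doc 3 matches both terms, doc 1 matches one
```

Even this naive version shows the two essential pieces: identifying candidate items and ordering them by how closely they match.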
There are plenty of search engines on the internet, but the following are the top search engines based on their global search share:
By an exceptional margin in search share, Google is currently the leading brand among all search engines worldwide. So which one is your favorite?
The Google search engine is mainly famous for:
- Simplicity – the look and feel is simple
- Quick response – high speed in addressing any query
- Precise results – Google usually satisfies a user's need on the first attempt to find anything
Some fun facts about Google:
- Back in 1999, when Google had just started its business, it took about a month to crawl and build an index of about 50 million pages.
- In 2012, the same task of crawling and indexing 50 million pages was completed in less than a minute.
- Every day, about 16% to 20% of all queries searched on Google are new to Google, never asked before.
- On average, a query travels about 1,500 miles from the user's computer to one of Google's 12 data centers and back to return the answer to the user.
- Any query searched on Google travels through almost 1,000 computers or servers to return an answer, and all of this happens in a time frame of 0.2 seconds.
So how does Google actually work behind this efficient mechanism? Let's dig in.
Working of Google search engine:
For a search engine to show results for any incoming user query, it must have the information in place first. To collect it, search engines run a web crawler, which crawls from site to site and grabs information.
All the information a web crawler gathers from around the internet is then sorted, categorized, and indexed in data storage. Then comes the hectic part: matching user queries and filtering out the most relevant results. For this, search engines develop a search algorithm, which matches queries against the index and filters the results; the filtered results are then shown to the users.
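The "sort, categorize, and index" step is commonly realized as an inverted index: a mapping from each word to the set of pages that contain it. Here is a minimal sketch; the page URLs and contents are invented for illustration:

```python
from collections import defaultdict

# Invented example pages, standing in for crawled content.
pages = {
    "example.com/a": "fast web crawler design",
    "example.com/b": "web search engine basics",
    "example.com/c": "crawler politeness rules",
}

# Build the inverted index: word -> set of pages containing that word.
inverted_index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        inverted_index[word].add(url)

def lookup(query):
    """Return pages containing every word of the query."""
    posting_sets = [inverted_index[w] for w in query.lower().split()]
    return set.intersection(*posting_sets) if posting_sets else set()

print(lookup("web crawler"))  # → {'example.com/a'}
```

The payoff is that a query is answered by intersecting a few precomputed sets instead of scanning every page, which is what makes sub-second answers over billions of pages feasible.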
Did you notice that it took you about 20 seconds to read this paragraph? In that short period of time, Google has already processed about 1 million search queries.
Now let's discuss the processes and elements involved, one by one.
The Hard Worker (Web Crawler):
A web crawler is also known as a crawling bot or a web spider (curious why it's called a spider?).
These crawlers are systematic programs, developed and used by search engines, that automatically browse the internet. Websites on the World Wide Web are literally interconnected through hypertext links like a spider's web, which is why it's called a "World Wide Web".
A web crawler uses this hypertext linking structure to wander from one webpage to another (now you know why it's called a spider) and grab information about the webpages.
Step by Step working of a web crawler:
The basic functionality of a web crawler is outlined below:
- The crawler needs a starting point for its journey, so a URL is provided to it.
- The crawler sends a request to the server of that URL to fetch its content.
- After getting the content of the URL's HTML file, the crawler extracts the hypertext links from the file.
- It prepares a list of URLs.
- It logs the content of the current URL in the database.
- It grabs one of the URLs from the list it prepared earlier and repeats the process.
- The crawler continues this process until the URL list is exhausted.
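The steps above can be sketched as a short Python loop. This is a naive illustration, not a production crawler: it parses links with a regex, ignores robots.txt, and does no throttling; the `fetch` function is a stand-in you would harden in practice:

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

def fetch(url):
    # Download a page over HTTP; a real crawler would also honor robots.txt.
    return urlopen(url, timeout=5).read().decode("utf-8", "replace")

def crawl(start_url, max_pages=10, fetch=fetch):
    to_visit = [start_url]   # the crawler's list of URLs to work through
    seen = set()
    store = {}               # stands in for the search engine's data storage

    while to_visit and len(store) < max_pages:
        url = to_visit.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = fetch(url)
        except Exception:
            continue         # skip pages that fail to load
        store[url] = html    # log the page content in the "database"
        # Extract hypertext links and queue them for later visits.
        for link in re.findall(r'href="([^"]+)"', html):
            to_visit.append(urljoin(url, link))
    return store
```

Passing a custom `fetch` makes the loop easy to try on fake pages; run against the live web, the same loop would follow real links until it hits `max_pages`.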
A decent search engine must have its hands on all the information available on the internet that a user might ask for. Also, the content of websites keeps changing, so search engines must keep their data up to date. Hence the crawler has to run again and again, revisiting websites to check for and grab any updated content.
From the very first website, launched in 1991, the total number of active websites has now reached nearly 1 billion; this means a massive number of webpages remain in the crawling queue, and the crawler keeps crawling through all of it without even a tea break.
--- End of Part 1---
Note: All facts and numbers provided in the content are dynamic and are subject to change with time.