As you read this post, there are 3,263,532,000+ internet users worldwide (yup, that figure is in the billions!) and the number is growing every second. That is more than one third of the world’s total population of approximately 7.3 billion people. On average, almost 31,500 GB of data is transferred over the internet every second.
Every internet user has their own use for the internet: some like to listen to songs or watch movies, some read news or books, some buy or sell products, and so on. But do you know what all these internet users have in common? Any guess?
Well, all of these users have used a search engine at least once in their lifetime to find something on the internet; I bet you have too, haven’t you? I am so confident about this because internet users make around 50,000 searches per second across different search engines. I was amazed when I first came to know about these numbers, and yes, they are true.
Ever wondered how efficient these search engines are, and how they manage to handle all these searches simultaneously? This article is all about how a search engine works: how it gathers information and how it returns your desired results on the go. But before going further, let’s take a look at some of the big search engines that help us find what we want on the internet.
For those who don’t know what a search engine is: in a nutshell, a search engine is a platform that takes input from users, identifies items in a database that correspond to the keywords or characteristics specified by the user, and returns the exact or close matches for the search terms.
There are plenty of search engines on the internet, but the following are the leading ones based on their global search share:
With an exceptional lead in search share, Google is currently the top brand among all search engines worldwide. So which one is your favorite?
The Google search engine is mainly famous for:
So how does Google actually work behind this efficient mechanism? Let’s dig into it.
For a search engine to show results for any incoming user query, it must have the information in the first place. To collect it, search engines deploy a web crawler, which crawls from site to site and grabs the information.
All the information that a web crawler gathers from around the internet is then sorted, categorized and indexed in data storage. Then comes the hectic part: matching user queries and filtering out the most relevant results. For this, search engines develop what is called a search algorithm, which matches the queries against the index and filters the results; finally, the filtered results are shown to the user.
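To make the index-then-match idea concrete, here is a minimal sketch in Python. It assumes a tiny, made-up set of crawled pages (the URLs and text in `PAGES` are hypothetical) and ranks results simply by how many query words a page contains. Real search engines use far more sophisticated ranking, so treat this only as an illustration of the flow described above.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (a stand-in for real crawler output).
PAGES = {
    "a.example/spiders": "spiders spin a web to catch insects",
    "b.example/crawlers": "a web crawler follows links across the web",
    "c.example/tea": "how to brew a good cup of tea",
}

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs ranked by how many query words they contain."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            scores[url] += 1
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

index = build_index(PAGES)
print(search(index, "web crawler"))
```

Looking words up in a prebuilt index, instead of scanning every page for every query, is what lets the matching step stay fast as the number of pages grows.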
Did you notice that it took you almost 20 seconds to read this paragraph? In that short time, Google has already processed about 1 million search queries.
Now let’s discuss the processes and elements involved in all this, one by one.
A web crawler is also known as a crawling bot or a web spider (curious why it’s called a spider?).
These crawlers are systematic programs developed and used by search engines, which work automatically to browse through the internet. The websites of the World Wide Web are literally interconnected with each other through hypertext links, like a spider’s web, which is why it’s called the “World Wide Web”.
A web crawler uses this hypertext linking structure to wander from one webpage to another (now you know why it’s called a spider) and grab information about the webpages.
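The link-following behaviour described above can be sketched in a few lines of Python. This example does not touch the network: the `LINKS` dictionary is a hypothetical miniature web, and the breadth-first visiting order is just one reasonable choice, not necessarily what any particular search engine does.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the pages it links to.
LINKS = {
    "home": ["news", "shop"],
    "news": ["home", "sports"],
    "shop": ["home"],
    "sports": [],
}

def crawl(seed, links):
    """Visit every page reachable from seed, following links breadth-first."""
    visited = []
    seen = {seed}            # pages already queued, so none is visited twice
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        visited.append(page)  # a real crawler would fetch and store the page here
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

print(crawl("home", LINKS))
```

The `seen` set matters because pages link back to each other; without it the spider would walk around the same loop forever.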
The basic functionality of a web crawler is depicted below:
A decent search engine must have its hands on all the information available on the internet that a user might ask for. The content on websites also keeps changing, so search engines need to keep their data up to date. Hence the crawler has to run again and again, revisiting websites to check for and grab any updated content.
Since the very first website went live in 1991, the total number of active websites has reached nearly 1 billion; this means a massive number of webpages is always waiting in the crawling queue, and the crawler keeps crawling all of it without even a tea break.
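One simple way to picture the "check for updated content" step is to keep a fingerprint of each page from the previous visit and compare it on the next one. The sketch below assumes that approach; the `recrawl` function, the `site` dictionary and its URLs are all hypothetical, and real crawlers may well detect changes differently.

```python
import hashlib

def fingerprint(content):
    """Stable digest of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def recrawl(stored, fetch):
    """Re-fetch every known URL and return those whose content changed.

    stored: dict mapping URL -> fingerprint from the previous crawl (updated in place)
    fetch:  function returning the current content for a URL (hypothetical)
    """
    changed = []
    for url in sorted(stored):
        new_print = fingerprint(fetch(url))
        if new_print != stored[url]:
            stored[url] = new_print
            changed.append(url)
    return changed

# Simulated site whose front page changes between two crawls.
site = {"example.test/": "old headline", "example.test/about": "about us"}
stored = {url: fingerprint(text) for url, text in site.items()}
site["example.test/"] = "new headline"
print(recrawl(stored, site.__getitem__))
```

Only the pages reported as changed would need to be re-indexed, which saves work when most of the web looks the same as it did on the last visit.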
Note: All facts and numbers provided in the content are dynamic and are subject to change with time.