Web search engine

From Saferpedia


A web search engine is a tool designed to search for information on the World Wide Web (WWW). Search results are usually displayed in a list, and the individual results are called hits. The information found may consist of web pages, images and other file types. Unlike web directories, which are maintained by human editors, search engines operate algorithmically or through a mixture of algorithmic processing and human input.

History

The first tool used for searching on the Internet was Archie. The name comes from the word "archive". Archie was created in 1990 by Alan Emtage, a student at McGill University in Montreal. The software downloaded the directory listings of all files located on public FTP sites, creating a database that could be searched by file name. However, Archie did not index the contents of these files.

Gopher appeared next (created in 1991 by Mark McCahill at the University of Minnesota), and it led to two new search programs: Veronica and Jughead. Like Archie, they searched the titles and file names stored in Gopher index systems.

In the summer of 1993, no web search engine existed yet. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that mirrored existing web pages and rewrote them into a standard format. This formed the basis for W3Catalog, the first web search engine, launched on September 2, 1993.

In June 1993 Matthew Gray, then at the Massachusetts Institute of Technology (MIT), produced what is thought to be the first web robot, the Perl-based World Wide Web Wanderer, and used it to generate an index called "Wandex". The Wanderer's purpose was to measure the size of the World Wide Web, which it did until late 1995.

The second web search engine, Aliweb, appeared in November 1993. Aliweb did not use a web robot; instead, it depended on website administrators notifying it of the existence of an index file on each site.

JumpStation (launched in December 1993) used a web robot to find web pages and build an index of them. It was therefore the first resource discovery tool to combine the three essential features of a search engine: crawling, indexing and searching.

One of the first full-text search engines based on crawlers was WebCrawler, launched in 1994. Unlike its predecessors, it allowed users to search for any word in any web page, which has since become the standard for most search engines. It was also the first web search engine to become widely known to the public.

Soon afterwards several search engines appeared and competed for popularity, among them Magellan, Excite, Infoseek, Inktomi, Northern Light and AltaVista. Yahoo! was one of the most popular search engines, but its search function operated mostly on its web directory rather than on full-text copies of web pages. Information seekers could also browse the directory instead of running a keyword search.

Around 2000 the Google search engine rose to prominence. The company achieved better results for many searches with an innovation called PageRank. This algorithm ranks web pages according to the number of pages linking to them and the rank of those linking pages. Google also kept a minimalist interface for its search engine, unlike many competitors, who embedded their search engines in web portals.
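The link-based ranking idea described above can be illustrated with a short sketch. This is a simplified textbook version of PageRank, not Google's production algorithm; the function name, the damping factor of 0.85, the fixed number of iterations and the four-page example graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a score per page for a link graph given as {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with an equal "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "home", "home" links to "about".
example_links = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home"],
    "contact": ["home"],
}
scores = pagerank(example_links)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In this example "home" receives the most incoming links and ends up with the highest score, while "about" scores well because its single incoming link comes from a highly ranked page.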

Microsoft launched its first search engine in the fall of 1998, using search results from Inktomi. In early 1999 the site began to display mixed results from Looksmart and Inktomi. In 2004 Microsoft began the transition to its own search technology, powered by its own web crawler, called msnbot.

Microsoft's search engine was renamed Bing and relaunched on June 1, 2009. On July 29, 2009, Yahoo! and Microsoft finalized an agreement under which Yahoo! search would be powered by Microsoft technology.

How it works

A search engine operates in the following steps:

  1. Web crawling;
  2. Indexing;
  3. Searching.

A web search engine works by storing information about web pages, which it retrieves from their HTML files. These pages are visited by a web crawler (also known as a spider), an automated web browser that follows every link on a site. The content of each page is then analyzed to determine how it should be indexed (for example, words are extracted from titles or from special fields called meta tags). Data about web pages are stored in an index database for use in later queries. The purpose of an index is to allow information to be found as quickly as possible.
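The crawling and indexing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine is implemented: the in-memory FAKE_WEB dictionary stands in for real HTTP downloads, and the names PageParser and crawl_and_index are hypothetical. Only the title and the keywords meta tag are indexed, as in the example given in the text.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML source. A real crawler would
# download pages over HTTP instead of reading them from a dictionary.
FAKE_WEB = {
    "/index.html": (
        "<html><head><title>Search engines</title>"
        '<meta name="keywords" content="search, web"></head>'
        '<body><a href="/history.html">History</a></body></html>'
    ),
    "/history.html": (
        "<html><head><title>History of Archie</title></head>"
        '<body><a href="/index.html">Back</a></body></html>'
    ),
}


class PageParser(HTMLParser):
    """Collects outgoing links plus words from the title and keywords meta tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "keywords":
            content = attrs.get("content") or ""
            self.words += [w.strip().lower() for w in content.split(",") if w.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.words += data.lower().split()


def crawl_and_index(start_url, web):
    """Visit every page reachable from start_url and build a word -> URLs index."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        parser = PageParser()
        parser.feed(web[url])
        parser.close()
        for word in parser.words:
            index[word].add(url)
        for link in parser.links:
            # Follow each link once, and only if the page exists in our fake web.
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl_and_index("/index.html", FAKE_WEB)
print(sorted(index["search"]))
```

The resulting structure maps each word to the set of pages that contain it, which is what lets a later query be answered without rereading the pages themselves.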

Some engines, such as Google, store all or part of each page's source, along with information about the page, while other engines, such as AltaVista, store every word of every page.

When a user enters a query into a search engine (usually as keywords), the engine examines its index and returns a list of the web pages that best match its criteria.
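As a rough illustration of how a keyword query can be answered from an index, the sketch below looks up each query word and keeps only the pages that contain all of them. The INDEX contents and the search function are hypothetical, and requiring every word to match is just one possible matching criterion.

```python
# Hypothetical index in the shape "word -> set of pages containing it".
INDEX = {
    "web": {"a.html", "b.html", "c.html"},
    "search": {"a.html", "c.html"},
    "engine": {"a.html"},
    "archive": {"b.html"},
}


def search(query, index):
    """Return the pages that contain every word of the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    # Start from the pages matching the first word, then narrow down.
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


results = search("Web search", INDEX)
print(results)
```

Because the lookup only touches the index entries for the query words, its cost depends on the number of matching pages rather than on the total number of pages stored.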

The usefulness of a search engine depends on the relevance of the results it returns. Although millions of web pages may include a particular word or phrase, some pages are more relevant than others. Most engines use ranking methods to present the best results first.
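One very simple way to order results by relevance is to count how often the query words occur in each page. The sketch below assumes this term-frequency scoring purely for illustration; real engines combine many more signals, and the PAGES data and the rank_pages function are hypothetical.

```python
from collections import Counter

# Hypothetical page texts; a real engine would read these from its index.
PAGES = {
    "a.html": "search engine basics how a search engine ranks search results",
    "b.html": "cooking recipes with a search box",
    "c.html": "history of the web",
}


def rank_pages(query, pages):
    """Order pages by how often the query words occur in them (most first)."""
    query_words = query.lower().split()
    scored = []
    for url, text in pages.items():
        counts = Counter(text.lower().split())
        score = sum(counts[word] for word in query_words)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]


ranking = rank_pages("search engine", PAGES)
print(ranking)
```

Pages that never mention the query words are left out entirely, and the page that uses them most often is listed first.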
