Building a search engine, or automating Yandex.Server

I'm from Belarus, where the largest Internet service provider is byfly. This provider gives its users free access to all sites hosted within the country (so-called guest resources).

Every byfly user kept a set of files with links to the free resources available. That gave me the idea of creating a search engine for these resources, and in August 2009 it saw its first users. Traffic grew quite quickly and, at the peak of its popularity, the site was visited by about 34 000 unique users a day.



At the heart of the search engine is Yandex.Server. Here is its control panel:


It soon became clear that a sane search engine could not be built with only this functionality (three buttons: search on/off, indexing on/off, and shut down Yandex.Server).
So I did some work, and this is what came out of it:



Numerous additional functions were added, such as:
  • splitting the index into resources, each with its own mask and indexing rules;
  • a system for automatically updating the indexes;
  • tracking abuse by webmasters;
  • automatic cleaning of garbage documents from the index;
  • search for videos and papers;
  • system health monitoring...
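
To make the first item more concrete, here is a minimal sketch of how per-resource settings might be stored and looked up. Everything in it is an assumption made for illustration: the `$resources` layout, the `find_resource()` helper, the example hosts, the `max_depth` rule, and the reading of a "mask" as a shell-style URL pattern. The actual data model of the panel is not shown in this article.

```php
<?php
// Hypothetical registry: one entry per indexed resource.
// 'mask' is assumed to be a shell-style pattern matched against host + path;
// 'rules' stands in for whatever indexing rules the resource needs.
$resources = [
    'news'  => ['mask' => 'news.example.by/*',  'rules' => ['max_depth' => 3]],
    'files' => ['mask' => 'files.example.by/*', 'rules' => ['max_depth' => 1]],
];

// Return the name of the first resource whose mask matches the URL, or null.
function find_resource(array $resources, string $url): ?string {
    // Strip the scheme so masks can be written without "http://".
    $target = preg_replace('#^https?://#i', '', $url);
    foreach ($resources as $name => $resource) {
        // Without flags, "*" in fnmatch() also matches "/" characters.
        if (fnmatch($resource['mask'], $target)) {
            return $name;
        }
    }
    return null;
}

echo find_resource($resources, 'http://news.example.by/2009/08/'), "\n";
```

Keeping the mask and the rules together in one entry means a single lookup tells the panel both which index a URL belongs to and how that index should be built.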

When an administrator takes an action in the new control panel (for example, removes a site from the index), the system generates a configuration file with the required settings:

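// Build the Yandex.Server configuration text for one indexed site.
// $rules holds ready-made directives that are inserted verbatim.
function cfg_file_generate($adr, $option, $timeout, $delay, $rules) {
    global $cfg_useragent; // User-Agent string defined elsewhere in the panel
    // The config text stays flush left so it is emitted exactly as written.
    $cfg = "<Webds>
$rules
<IndexedArea>
HttpPrefix http://$adr
Options $option
<HttpOptions>
Timeout $timeout
Delay $delay
<Headers>
User-Agent: $cfg_useragent
</Headers>
</HttpOptions>
</IndexedArea>
</Webds>";
    return $cfg;
}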

The panel then sends a request to the standard Yandex.Server control panel to update the index. Re-indexing follows the rules specified in the configuration file, and the administrator sees the result of the operation in the modified control panel. All of this happens in the background via AJAX requests.
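
Between generating the configuration and requesting the update, the text has to reach disk. The sketch below shows one possible way to do that safely, so that the server never reads a half-written file. It is not taken from the original panel: the `save_config_atomically()` helper, the file name, and the 0644 mode are assumptions made for illustration.

```php
<?php
// Write $contents to $path atomically: fill a temporary file in the same
// directory, then rename it over the target in a single step.
function save_config_atomically(string $path, string $contents): void {
    $dir = dirname($path);
    $tmp = tempnam($dir, 'cfg');
    if ($tmp === false) {
        throw new RuntimeException("Cannot create a temporary file in $dir");
    }
    if (file_put_contents($tmp, $contents) === false) {
        @unlink($tmp);
        throw new RuntimeException("Cannot write to $tmp");
    }
    // tempnam() creates the file with mode 0600; relaxing it to 0644 is an
    // assumption that the search server runs as a different user.
    chmod($tmp, 0644);
    if (!rename($tmp, $path)) {
        @unlink($tmp);
        throw new RuntimeException("Cannot move $tmp to $path");
    }
}

// Usage with a hypothetical path; in the panel the contents would be the
// string returned by cfg_file_generate().
$path = sys_get_temp_dir() . '/example-site.cfg';
save_config_atomically($path, "<Webds>\n</Webds>");
echo file_get_contents($path), "\n";
```

Creating the temporary file in the target directory keeps both files on the same filesystem, which is what lets `rename()` replace the old configuration in one step.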

At the peak of popularity there were more than 300,000 search queries a day from ~34 000 unique users, and the search index covered more than 2 000 websites. A single-core Intel 2.33 VPS with 1 GB of RAM was always more than enough for all of this.

Enthusiastic users began developing various extras themselves: add-ons for browsers, for search, and even a desktop version. Others wanted so badly to get into the search index that they even resorted to threats :-)

Later the provider abolished the free access, and the popularity slowly started to wane. It now holds at about 3 000 unique users a day.

search.pusk.by was originally meant as a short-term project. I estimated its lifetime at one year, but it has lasted longer :-)

Article based on information from habrahabr.ru
