University of Szeged Klebelsberg Library
The aim of this lesson is to show how internet search engines work and also to provide practical help on how to use them efficiently and effectively.
Today, internet search engine results are used for literature mapping and descriptive research as much as traditional library resources are. However, the effectiveness of an internet search depends heavily on how keywords and operators (such as AND, OR, NOT) are used.
Tarcsi Ádám, Abonyi-Tóth Andor and Horváth Győző (2012): WEB 2-es eszközök a társadalmi és marketingkutatások szolgálatában.
Eötvös Loránd Tudományegyetem Informatikai Kar, Budapest. Tankönyvtár
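The effect of the operators named above (and, or, not) can be illustrated with a small sketch. It is not how any particular search engine is implemented; it only assumes that an engine keeps an index mapping each word to the pages that contain it, so that the three operators become set intersection, union, and difference. The page names and texts are made up for the illustration.

```python
from collections import defaultdict

# Hypothetical mini "web": page identifiers mapped to their text.
PAGES = {
    "page1": "library catalogue search tips",
    "page2": "internet search engine basics",
    "page3": "library opening hours",
    "page4": "search engine advertising",
}


def build_index(pages):
    """Map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search_and(index, a, b):
    """Pages that contain both keywords."""
    return index.get(a, set()) & index.get(b, set())


def search_or(index, a, b):
    """Pages that contain at least one of the keywords."""
    return index.get(a, set()) | index.get(b, set())


def search_not(index, a, b):
    """Pages that contain the first keyword but not the second."""
    return index.get(a, set()) - index.get(b, set())


INDEX = build_index(PAGES)
print(sorted(search_and(INDEX, "library", "search")))  # ['page1']
print(sorted(search_or(INDEX, "library", "engine")))   # ['page1', 'page2', 'page3', 'page4']
print(sorted(search_not(INDEX, "search", "engine")))   # ['page1']
```

In this toy example, and narrows the result list, or widens it, and not removes unwanted hits, which is why the choice of operator matters as much as the choice of keywords.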
Google has dominated the world of internet search engines since the early 2000s. As a search engine, it hardly needs to be described here: the name of the company that provides it has essentially become synonymous with the engine’s core function, and the term “googling” is used in many languages to denote internet searching itself.
Still, other players on the market are worth mentioning, too.
Anyone who wishes to search the internet exhaustively should be aware of the options besides Google, but should also get to know the features of Google itself in order to use it effectively, considering that the number of pages indexed by the search giant is reportedly around 130 trillion.
Undoubtedly, Google dominates the market of internet search engines, and given its reach, there is no way to avoid using it. Therefore, it is important to be familiar with its existing features and to keep learning how to use its new ones. However, it is also useful to be aware of the ‘price’ of using Google (i.e., stored search and keyword data, paid ads, targeted content appearing in search results).
General search engines only reveal a fraction of what is available on the entire internet. According to some analysts, that fraction amounts to less than 10 percent. The primary reason for this is that there are various types of websites with various types of content. These may be categorized based on the level of access they provide to their content.
In this lesson, websites are distinguished in terms of their visibility to search engines as follows.
The surface web is the part of the internet that is visible to regular search engines and is therefore indexed by them.
The deep web is the part of the internet that is not indexed by regular search engines. Content here includes:
The dark web is another part of the internet where content is not indexed; most of the content here is illegal or linked to illegal activities.
The following sections cover a few search engines that can explore certain regions of the deep web in some way. There is also an overview of some special search engines that are limited to searching the surface web but, at the same time, come with unique functions that surpass the functions of regular search engines in terms of exploring content.
Wolfram Alpha is a versatile search engine that uses complex artificial intelligence to tap its knowledge base. Its algorithms analyze search terms on the basis of multiple factors. It is also very popular among students because it can solve complex mathematical problems. Wolfram Alpha “knows the answer to everything”: from data about planes flying above Chicago at any given moment, through the complete works of Newton, to the magnitudes of earthquakes with exact dates and geographic coordinates.
Google Scholar is a search engine that searches only for scientific and scholarly publications. It allows users to search for various types of documents (e.g., articles, theses, books) across an array of disciplines. Google Scholar indexes content in the databases of academic publishers and on the websites or in the databases of academic societies and higher education institutions.
Internet Archive is a freely accessible digital library of images, texts, videos, and audio files.
Since 1996, it has also been archiving websites (i.e., web archiving) as part of a service that has come to be known as the Wayback Machine. The archive stores not only the most recent versions of indexed sites (as Google does, for instance) but also previous versions. This allows users to choose from several so-called save points. Users can navigate among the saved versions with the help of a user-friendly calendar. It is important to note, however, that the Wayback Machine stores mainly textual data, without large images and media content.
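Save points can also be addressed directly by date, without the calendar. The sketch below only builds the addresses and does not contact the archive. It assumes the commonly used snapshot address pattern (web.archive.org/web/ followed by a timestamp and the original address) and the archive's public availability endpoint; the helper names wayback_url and availability_query are hypothetical, and example.com is a placeholder site.

```python
from datetime import date
from urllib.parse import urlencode


def wayback_url(url, when):
    """Address of the saved version closest to the given date.

    Assumption: the archive redirects a date-only timestamp (YYYYMMDD)
    to the nearest save point it actually holds.
    """
    return f"https://web.archive.org/web/{when:%Y%m%d}/{url}"


def availability_query(url, when):
    """Query address asking whether a save point exists near the given date."""
    params = urlencode({"url": url, "timestamp": f"{when:%Y%m%d}"})
    return f"https://archive.org/wayback/available?{params}"


day = date(2010, 1, 15)
print(wayback_url("http://example.com/", day))
# https://web.archive.org/web/20100115/http://example.com/
print(availability_query("example.com", day))
# https://archive.org/wayback/available?url=example.com&timestamp=20100115
```

Opening the first address in a browser should lead to the save point nearest to the requested date, if one exists; the second address is meant for programs and is expected to return a short machine-readable answer rather than a page.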
Carrot2 is similar to Wolfram Alpha in that it indexes public websites and returns search results based on open sources. The main strength of its Java-based search engine lies in the way search results can be represented and grouped. For example, thematic lists, graphs, and other visually appealing representations can be created. In addition, the resulting content can easily be exported in whichever format suits the user best.
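The idea of grouping results into thematic lists can be shown with a deliberately simple sketch. It is not Carrot2's own algorithm, and Carrot2 itself is written in Java; the example merely groups a few made-up result titles by the words they share, under the assumption that a shared word hints at a shared theme.

```python
from collections import defaultdict

# Made-up result titles for an ambiguous query such as "jaguar".
RESULTS = [
    "Jaguar car review",
    "Jaguar habitat and diet",
    "Used car prices",
    "Big cat habitat loss",
]

# Very common words that say nothing about the theme of a result.
STOPWORDS = {"and", "the", "of", "a"}


def group_by_shared_terms(titles, min_size=2):
    """Group titles under every word that occurs in at least min_size titles."""
    groups = defaultdict(list)
    for title in titles:
        for word in set(title.lower().split()) - STOPWORDS:
            groups[word].append(title)
    return {term: members for term, members in groups.items()
            if len(members) >= min_size}


for term, members in sorted(group_by_shared_terms(RESULTS).items()):
    print(term, "->", members)
# car -> ['Jaguar car review', 'Used car prices']
# habitat -> ['Jaguar habitat and diet', 'Big cat habitat loss']
# jaguar -> ['Jaguar car review', 'Jaguar habitat and diet']
```

Even this crude grouping separates the results about cars from those about animals, which is the kind of overview a flat result list does not give.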
In order to carry out effective research, it is important to be aware of the possibilities offered by the various search engines. They can help us to enrich the information collected to produce a work that is as up to date as possible.