University of Szeged Klebelsberg Library
Now that we’ve learned about electronic databases, let’s take a look at eBooks.
In this lesson, we will learn the terminology related to eBooks and look at some eBook formats, the particularities of their use, and copyright.
eBook terminology
An eBook* is like a traditional book, but in digital form. The term ‘eBook’ generally refers to an electronically created and distributed document (containing text and images) that can only be accessed on a digital device using dedicated software. A true eBook has no fixed formatting: its layout adapts to the screen of the device on which it is read (it is responsive*). In addition, the reader can adjust certain aspects of its formatting (font size and type, background color, etc.).
More information: Glossary
The eBook format grew out of two developments: digitization* and online distribution. Hypertext*-based books imitating traditional books existed as early as the 1990s and eventually became available on the internet, but the term ‘eBook’ itself has only been in use since portable devices with displays of suitable size and quality appeared. In terms of hardware, the emergence of eBook readers with e-ink* displays was an important development.
digitization
The transfer of information stored on an analog medium to a computer using a special device that converts the analog signal into digits.
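To make the idea of converting analog readings into digits concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a scanner measures the brightness of each point on a page as a real number between 0.0 (black) and 1.0 (white); digitization then means rounding each measurement to one of a fixed number of integer levels. The function name `digitize` and the 256-level default are our own choices, not part of any scanner's software.

```python
def digitize(samples, levels=256):
    """Map analog brightness readings in [0.0, 1.0] to integer levels 0..levels-1."""
    if levels < 2:
        raise ValueError("at least two levels are needed")
    digits = []
    for value in samples:
        clamped = min(max(value, 0.0), 1.0)  # readings slightly outside the range are clipped
        digits.append(round(clamped * (levels - 1)))
    return digits

print(digitize([0.0, 0.25, 1.0]))       # [0, 64, 255]: 256 shades of gray
print(digitize([0.2, 0.7], levels=2))   # [0, 1]: a pure black-and-white scan
```

More levels preserve finer differences in shading, at the cost of more data per point.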
hypertext
A hypertext is a text in which certain parts are electronically linked to other parts of the same text or to other texts. The World Wide Web, in its current form, is essentially built on such hypertext (HTML) documents.
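The two kinds of links mentioned above can be seen in a few lines of HTML. The sketch below, using only Python's standard `html.parser` module, collects the targets of all links on a small hypothetical page: `#chapter2` points to another part of the same text, while `glossary.html#ocr` points to another text. The page content and the class name `LinkCollector` are invented for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href targets of all <a> elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs from the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links

# A hypothetical hypertext page: one link within the same text, one to another text.
page = """
<p>See <a href="#chapter2">Chapter 2</a> of this text,
or the <a href="glossary.html#ocr">glossary entry on OCR</a>.</p>
"""
print(extract_links(page))  # ['#chapter2', 'glossary.html#ocr']
```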
e-ink
It is a technology specifically developed for displaying eBooks.
It is based on millions of tiny microcapsules that contain positively charged white pigments and negatively charged black pigments, all “floating” in a special transparent liquid. These white and black pigments become visible on the electronic paper depending on the electric charge applied.
Although the definition above is specific and precise, in everyday usage the term ‘eBook’ is also applied to any digital content of roughly book length.
This means that if someone photographs the pages of a book with their phone and saves the images in a single PDF file, they might call that file an eBook. In fact, many people use the term even for large documents created in a word processor. However, such documents meet only some of the criteria of an eBook.
As the above suggests, some distinctions need to be made, especially between formats. To clarify these distinctions, the following sections examine the major eBook formats.
PDF, DjVu, EPUB
The Portable Document Format (PDF) is currently the most common standardized format for electronic documents, primarily designed for formatted, “print-ready” documents. PDF documents can only be edited to a very limited extent, but they can be viewed with a wide range of programs.
If a PDF document is created by scanning an analog (i.e., printed) book, it merely contains a series of images. In some cases, however, special software performs optical character recognition (OCR*) on the scanned images to extract their text. The result is a so-called ‘layered’ PDF document: it contains the scanned images of the book pages, with a text layer beneath them that can be searched, copied and pasted. This text is usually not proofread after OCR, however, so it may contain errors.
Of course, text documents originally created on a computer can be converted to PDF, retaining the graphical properties of the original document while allowing the text to be searched, copied and pasted.
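The structure of a layered PDF can be modeled in a few lines of Python. This is a simplified data model, not how a PDF file is actually encoded: each hypothetical `ScannedPage` pairs the image the reader sees with the OCR text layer underneath, and searching only ever looks at the text layer. The second page contains an uncorrected OCR error ("1ibrary"), so a search for "library" misses it, which is why unedited OCR text can be unreliable.

```python
from dataclasses import dataclass

@dataclass
class ScannedPage:
    image_file: str  # what the reader sees: the scanned image of the page
    text_layer: str  # hidden OCR text beneath the image, used for search and copy-paste

def search(pages, term):
    """Return the 1-based numbers of the pages whose text layer contains `term`."""
    term = term.lower()
    return [number for number, page in enumerate(pages, start=1)
            if term in page.text_layer.lower()]

book = [
    ScannedPage("page001.png", "Chapter 1. The history of the library"),
    ScannedPage("page002.png", "The 1ibrary has a large reading room"),  # uncorrected OCR error
]
print(search(book, "library"))  # [1]: page 2 is missed because of the OCR error
print(search(book, "reading"))  # [2]
```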
Optical Character Recognition
Optical character recognition (OCR) is the computerized conversion of analog (printed or handwritten) text into machine-readable text. OCR software recognizes the shapes of letters and assembles them into words.
EPUB stands for Electronic Publication and is a free and open standard for electronic books. EPUB files have the extension .epub and are actually ZIP packages in which the text content is carried by HTML*/XHTML files, with XML* files describing the structure, plus style sheets and image files. The format is supported by a growing number of publishers and software developers.
If you want to convert RTF, HTML or other documents into such files yourself, you can use a tool such as Calibre.
HTML
stands for ‘HyperText Markup Language’, which is a standard for hypertext documents on the internet.
XML
stands for ‘Extensible Markup Language’, which is designed primarily to allow structured text and information to be shared over the internet. XML files not only contain text but also structural information about the text.
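Because an EPUB file is an ordinary ZIP package, its structure can be explored with general-purpose tools. The following Python sketch, using only the standard library, builds a tiny EPUB in memory and then reads it back: `META-INF/container.xml` points to the package file (`content.opf`), which lists the content files (the manifest) and their reading order (the spine). The book content and the function names are our own invention, and the package is deliberately simplified: a conforming EPUB 3 file also needs a navigation document and a `dcterms:modified` metadata entry, so validators such as EPUBCheck would reject this one.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:12345678-1234-1234-1234-123456789abc</dc:identifier>
    <dc:title>Sample eBook</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title><link rel="stylesheet" href="style.css"/></head>
  <body><h1>Chapter 1</h1><p>The text itself is stored in XHTML files.</p></body>
</html>"""

def build_epub():
    """Pack the parts of a minimal EPUB into a ZIP archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The 'mimetype' entry must come first and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", PACKAGE_OPF)
        zf.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML)
        zf.writestr("OEBPS/style.css", "body { font-family: serif; }")
    return buffer.getvalue()

NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container",
      "opf": "http://www.idpf.org/2007/opf"}

def reading_order(epub_bytes):
    """Follow container.xml to the package file and list content files in spine order."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))
        folder = opf_path.rpartition("/")[0]
        hrefs = {item.get("id"): item.get("href") for item in package.find("opf:manifest", NS)}
        return [f"{folder}/{hrefs[ref.get('idref')]}" if folder else hrefs[ref.get("idref")]
                for ref in package.find("opf:spine", NS)]

epub = build_epub()
with zipfile.ZipFile(io.BytesIO(epub)) as zf:
    print(zf.namelist())
print(reading_order(epub))  # ['OEBPS/chapter1.xhtml']
```

Renaming such a file from .epub to .zip and opening it with any archive tool shows the same parts.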
Interesting fact
Although the DjVu file format has many similarities to PDF, it is less commonly used. Nevertheless, it is useful in many ways. Its name comes from the French expression ‘déjà vu’ (literally “already seen”), referring to the fact that the electronic copy of a book saved in this format looks exactly like the original.
This format is specifically designed for the digital publication and online distribution of books scanned as images. It offers highly sophisticated compression for page images, preserving to some extent even the texture of the paper (e.g., in the case of old books). In addition, it allows searchable text generated by OCR to be layered ‘underneath’ the images. Viewing or printing DjVu files requires a DjVu viewer program or a browser plug-in.
Technology, eBook reader and editor
When it comes to eBooks, it is useful to be aware of certain technological issues related to how eBooks are created and how they are read and used.
E-ink is a technology specifically developed for displaying eBooks. It is based on millions of tiny microcapsules that contain positively charged white pigments and negatively charged black pigments “floating” in a special transparent liquid. Depending on the electric charge applied, either the white or the black pigments become visible on the electronic paper.
Electronic paper consumes electricity only when the displayed content changes. The screen of an eBook reader therefore draws power only when a page is turned; while the same page is being viewed, the display consumes no energy at all.
Another issue to consider is the complexity of turning analog books into eBooks, where the key is OCR (optical character recognition). Technically, OCR is the electronic conversion of typewritten, handwritten or printed (analog) text by a computer or similar device. Working from a scanned document or even a photo of one, an OCR tool, which often includes elements of artificial intelligence, recognizes the letters in the document by their shape and then assembles them into words. In the process, it may also correct errors by consulting dictionaries.
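Dictionary-based correction can be illustrated with a short Python sketch. It is a simple stand-in, not how any particular OCR product works: each recognized word that is not in a (tiny, hypothetical) dictionary is replaced by the most similar dictionary word found by the standard `difflib.get_close_matches` function, provided the similarity reaches a threshold. Typical OCR confusions, such as an “l” read as “I” or an “i” read as “l”, are corrected this way.

```python
import difflib
import re

# A tiny, hypothetical dictionary; a real OCR tool would use a full lexicon.
DICTIONARY = ["the", "electronic", "book", "is", "available", "in", "library"]

def correct_word(word, cutoff=0.75):
    """Keep known words; replace an unknown word with its closest dictionary entry,
    but only if the similarity ratio reaches `cutoff`. Otherwise leave it unchanged."""
    if word.lower() in DICTIONARY:
        return word
    matches = difflib.get_close_matches(word.lower(), DICTIONARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word

def correct_text(text):
    """Apply correct_word to every run of letters; spaces and punctuation are kept."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group(0)), text)

ocr_output = "The electronlc book is avallable in the Iibrary."
print(correct_text(ocr_output))  # The electronic book is available in the library.
```

The cutoff is a trade-off: set too low, correct but rare words get “fixed” into wrong ones; set too high, real errors slip through.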
Most widely used layered PDF documents are created from analog documents by scanning and optical character recognition. Such documents keep the fixed page layout of the original, which brings us back to responsiveness, a concept introduced earlier. ‘Responsiveness’ is a technical term (similar in meaning to adaptive or flexible) used for electronic documents whose layout adapts to the properties of the screen of the device on which they are displayed. In other words, the layout of such a document is flexible and transformable.
The basic distinction between eBook formats is therefore between fixed-layout and dynamic-layout (reflowable) formats. This is one reason why no single device can be singled out as the best for reading eBooks; another is the purpose for which we read them and whether we need to use other programs while doing so.
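The difference between fixed and dynamic layouts can be simulated with Python's standard `textwrap` module. In the sketch below, a paragraph typeset for a 60-character-wide page keeps its line breaks in the fixed-layout case, so on a 30-character-wide screen the ends of the lines are cut off (a real viewer would make you zoom or scroll sideways instead); in the reflowable case, the same text is simply re-wrapped to the new width. Measuring screen width in characters is a simplifying assumption, and the sample text is our own.

```python
import textwrap

# A sample paragraph (our own text) as it might be typeset on a printed page.
PARAGRAPH = ("A real eBook is not bound to a fixed layout: its text is reflowed "
             "to fit the screen of the device on which it is read, whether that "
             "is a phone, a tablet or an eBook reader.")

def reflow(text, chars_per_line):
    """Dynamic layout: re-wrap the whole text to the width of the current screen."""
    return textwrap.wrap(text, width=chars_per_line)

def fixed_layout(page_lines, chars_per_line):
    """Fixed layout: keep the original line breaks and cut off whatever does not fit.
    (A real fixed-layout viewer would make you zoom or scroll sideways instead.)"""
    return [line[:chars_per_line] for line in page_lines]

printed_page = reflow(PARAGRAPH, 60)  # the layout of the "print" version
phone_width = 30

print("Fixed layout on a narrow screen:")
for line in fixed_layout(printed_page, phone_width):
    print(line)

print("\nReflowable layout on the same screen:")
for line in reflow(PARAGRAPH, phone_width):
    print(line)
```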
Let’s look at some options.
An eBook reader is a device that uses e-ink technology and is primarily designed for reading real eBooks. If you do not need to use other programs while reading, such a device can be a good choice thanks to the following advantages:
1. A single charge can last for up to three or four weeks of use.
2. The display looks much more like paper, which makes extended reading easier on the eyes, especially if the page can be illuminated with a built-in front light.
You can use different reading software on different devices, depending on your personal choice. We would like to draw your attention to the eBooks available for reading and borrowing in the SZTE Klebelsberg Library.
If you want to borrow eBooks from the SZTE Klebelsberg Library instead of reading them online, you can use Adobe Digital Editions. You can find out how to use it here.
The first thing to consider when creating an eBook is whether the software used to write the text can save it directly in an eBook format, or only in a format that then has to be converted into an eBook.
Example
The open-source word processor LibreOffice can save files in EPUB format. Mac users are covered too: Pages, the word processor that comes pre-installed on Apple computers, can do the same.
Converting existing text from HTML, XML or Word formats into an eBook is usually straightforward. Difficulties typically arise when the source is a PDF or another fixed-layout format. The best approach may be to run the PDF through a good OCR program, which converts it to text; however, the text then needs time-consuming proofreading, and the document needs editing. Turning a scanned PDF into a real eBook therefore takes a lot of work.
Calibre offers a solution for practically every task involved in working with eBooks.
With the increasing popularity of eBooks, there is a growing demand for software that helps people create, edit and manage high-quality eBooks themselves. This makes Calibre's broad functionality all the more valuable, especially as it makes all of these tasks easy on a wide range of platforms. An introductory video about the software is available.
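Calibre's conversions can also be run from the command line with its `ebook-convert` tool, which guesses the output format from the extension of the output file. The Python sketch below, assuming Calibre is installed and `ebook-convert` is on the system PATH, wraps that call; the helper names and the file name `thesis.rtf` are hypothetical, and Calibre's many optional conversion settings are left out.

```python
import shutil
import subprocess
from pathlib import Path

def build_convert_command(source, target_format="epub"):
    """Return the ebook-convert command line that converts `source` to `target_format`.

    Calibre infers the output format from the output file's extension,
    so the target is simply the source path with a new suffix.
    """
    source = Path(source)
    target = source.with_suffix("." + target_format)
    return ["ebook-convert", str(source), str(target)]

def convert(source, target_format="epub"):
    """Run the conversion; assumes Calibre's command-line tools are installed."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found: is Calibre installed and on PATH?")
    command = build_convert_command(source, target_format)
    subprocess.run(command, check=True)  # raises CalledProcessError if the conversion fails
    return command[-1]  # path of the new eBook

# Usage (requires Calibre): convert("thesis.rtf") would create thesis.epub
```

Keeping the command construction separate from running it makes it easy to check what will be executed before converting a whole folder of files.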
Technology now makes it possible to protect the copyright of electronically published content. Such protection usually relies on a software-based solution which guarantees that only the rightful owner of purchased content can actually use it. These technologies are collectively known as DRM (digital rights management). However, the term ‘digital rights management’ refers not only to the whole range of technological solutions offering technical and legal protection for digital content in general, but also to one specific software-based way of implementing such protection.
The general purpose of digital rights management is to make it possible to identify digital works and manage the rights to them, and even to allow readers to borrow eBooks. DRM solutions therefore ensure that the use of digital content is regulated, royalties are paid, use is monitored and rights are enforced.
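The core idea of a DRM loan license, binding a specific book to a specific reader for a limited period, can be sketched with Python's standard `hmac` module. This is a deliberately simplified toy, not how any real DRM system works: real systems also encrypt the book itself and use public-key signatures, whereas here the reading software would need the same secret key (`SERVER_KEY`, a hypothetical value) that issued the license. The function names and license fields are also our own.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret key of the lending server (toy example only).
SERVER_KEY = b"demo-secret-key"

def _sign(payload):
    return hmac.new(SERVER_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()

def issue_license(book_id, user_id, due):
    """Issue a loan: which book, for which user, until which date, plus a signature.
    Assumes the IDs contain no '|' characters, which is used as the field separator."""
    payload = f"{book_id}|{user_id}|{due.isoformat()}"
    return {"payload": payload, "signature": _sign(payload)}

def may_open(license_, book_id, user_id, today):
    """Check on the reader's side whether this user may open this book today."""
    if not hmac.compare_digest(_sign(license_["payload"]), license_["signature"]):
        return False  # the license was tampered with, e.g. the due date was changed
    lic_book, lic_user, due = license_["payload"].split("|")
    return lic_book == book_id and lic_user == user_id and today <= date.fromisoformat(due)

loan = issue_license("book-42", "reader-7", date(2024, 5, 31))
print(may_open(loan, "book-42", "reader-7", date(2024, 5, 10)))   # True: within the loan period
print(may_open(loan, "book-42", "reader-7", date(2024, 6, 1)))    # False: the loan has expired
print(may_open(loan, "book-42", "reader-99", date(2024, 5, 10)))  # False: a different user
```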
There are two types of DRM for eBooks.
The relevant Wikipedia article provides a sufficient overview of DRM solutions.