University of Szeged Klebelsberg Library
This lesson discusses some basic concepts related to e-books. In addition, the lessons that follow cover how to use them as well as related services.
This lesson focuses on three major topics.
E-books terminology
What is the difference between e-book formats?
Everyone has an idea about what an e-book is. However, if three people talk to each other about e-books, they are likely to notice that each of them has a different understanding of what e-books actually are.
This section offers some guidance on the matter of what an e-book is. Is an e-book a digitized version of a book? Does it basically consist of the images of a paper-based book, like a normal book that has been photocopied? How can we search in an e-book or copy and paste parts of it? How can it fit on our phones and also be displayed on a large screen?
How does an e-book adapt to screen size?
What does it mean that an e-book is responsive*, and what is the importance of the page layout?
Are the books that are read on e-book readers, tablets, computers or phones all e-books? How does a book become an e-book?
What are the advantages of e-books?
What should not be considered as an e-book? And what is it that is close to being an e-book but still isn’t one?
Is it true that, in 20 years, printed books and bookstores won’t even exist anymore? Will the smell of books really be a thing of the past along with nice and colorful book covers and fancy typography?
responsive
A document, including websites and offline electronic documents, that adapts to the screen properties of the device on which it is used.
In order to answer all these questions, some basic concepts have to be clarified first.
E-books terminology
An e-book* is like a traditional book, but in digital form. The term ‘e-book’ generally refers to an electronically created and distributed document (containing text and images) that is only accessible on a digital device and by using dedicated software. A real e-book is not bound in terms of its formatting, with its layout adapting to the screen of the particular device on which it is used. In addition, certain aspects of the formatting of e-books (font size and type, background color, etc.) can be adjusted by the reader.
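The idea of a layout that is not bound to fixed pages can be illustrated with a short Python sketch. It is only a simplified illustration, not how any particular e-book reader works: the function name `reflow` and the two screen widths (measured in characters) are assumptions made for this example. The same paragraph is re-broken into lines for a narrow and a wide display, while the text itself stays unchanged.

```python
import textwrap


def reflow(text: str, screen_width: int) -> list[str]:
    """Break a paragraph into lines no longer than screen_width characters."""
    return textwrap.wrap(text, width=screen_width)


paragraph = (
    "An e-book is like a traditional book, but in digital form. "
    "Its layout adapts to the screen of the device on which it is read."
)

# Hypothetical screen widths, in characters per line.
phone_lines = reflow(paragraph, 30)
tablet_lines = reflow(paragraph, 60)

for label, lines in (("phone", phone_lines), ("tablet", tablet_lines)):
    print(f"--- {label}: {len(lines)} lines ---")
    print("\n".join(lines))
```

The narrow display needs more lines than the wide one, but no text is lost or cut off. A scanned page image, by contrast, can only be scaled or scrolled.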
More information: Glossary
What sparked the creation of the e-book format were two phenomena: digitization* and online distribution. While hypertext*-based books that mimic traditional books had been around since as early as the 1990s, eventually becoming available on the internet, the term ‘e-book’ itself has been in use only since portable devices with displays of the right size and quality appeared on the scene. In terms of hardware, the emergence of e-book readers with e-ink* displays was an important development.
digitization
The transfer of information stored on an analog carrier to a computer by using a special device, which converts traditional analog symbols into digits.
hypertext
A ‘hypertext’ is a text in which certain parts of the text are electronically linked to other parts of the same text or to other texts. The internet itself, in its current form, is essentially based on such hypertext (HTML) documents.
e-ink
It is a technology specifically developed for displaying e-books.
It is based on millions of tiny microcapsules that contain positively charged white pigments and negatively charged black pigments, all “floating” in a special transparent liquid. These white and black pigments become visible on the electronic paper depending on the electric charge applied.
Although the definition given above is considered sufficiently specific and precise, in everyday life the term ‘e-book’ is also used to refer to any digitally accessible content that is roughly as long as a typical book.
This means that if someone takes photos of a book with their phone and saves the images in a single PDF (Portable Document Format) file, then they might call that file an e-book. In fact, many people use the term even when they are simply referring to large documents created in a text editor. However, it is important to note that such documents only meet some of the criteria of being an e-book.
It is perhaps clear from the above that in order to get to the heart of the matter, certain distinctions should be made, especially between the following formats.
In order to help understand the distinctions and differences between various formats, the following sections examine all major e-book formats.
The Portable Document Format (PDF) is currently the most common standardized format for electronic documents, primarily designed for formatted, “print-ready” documents. PDF documents can only be edited to a very limited extent, but they can be viewed with a wide range of programs.
If a PDF document is created from an analog (i.e., printed) book, then it merely contains a series of images. However, in some cases, the scanned images might undergo character recognition, done by special software (OCR*), to extract text from them. This results in a so-called “layered” PDF document, containing the scanned images of the book pages, with a layer of text underneath the images. Even though this second layer is not directly visible, its text can be searched as well as copied and pasted. However, this text usually remains unedited after optical character recognition, and, as a result, it may contain errors.
Of course, text documents originally created on a computer can be converted to PDF, retaining the graphical properties of the original document while allowing the text to be searched, copied and pasted.
Optical Character Recognition
OCR is the computer-based conversion of an image of analog (printed or handwritten) text into machine-readable text. The character recognition tool recognises letters in the image of the document, and then it constructs words using these recognised letters.
Although the DjVu file format bears many similarities to the PDF format, it is less commonly used. Nevertheless, it is useful in many ways. The name itself comes from the French term ‘déjà vu’ (literally “already seen”), and it refers to the fact that the electronic copy of a book saved in this format looks exactly like the original.
This format is specifically designed for the digital publication and online distribution of books scanned as images, offering highly sophisticated compression for images of pages (with even the texture of pages preserved to some extent, e.g., in the case of old books). Additionally, this format allows searchable texts generated by OCR to be layered ‘underneath’ such images. Viewing or printing DjVu files requires a DjVu reader and viewer program, or a browser plug-in.
EPUB is an “industry-standard” file format for e-book reader devices, originally used by Barnes & Noble.
The name EPUB stands for ‘electronic publication’. EPUB is a free, open standard for electronic books. EPUB files have .epub as their extension, but they are, in fact, ZIP packages, which contain HTML*/XHTML files (for textual content) and XML* files (for structural information), coupled with style sheets and image files. This open standard is supported by a growing number of publishers and software developers.
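That an EPUB file is an ordinary ZIP package can be checked with a few lines of Python using only the standard library. The sketch below is a simplified illustration: it builds a tiny EPUB-like package in memory and lists its contents. The file names under `OEBPS/` and the helper names `build_sample_epub` and `list_epub_contents` are made up for this example, and the placeholder file contents are far from a complete, valid publication. The `mimetype` entry and the `META-INF/container.xml` file, however, are part of the EPUB packaging rules.

```python
import io
import zipfile


def build_sample_epub() -> bytes:
    """Create a minimal EPUB-like ZIP package in memory (placeholder content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The 'mimetype' entry comes first and is stored without compression.
        archive.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        # XML files carry the structural information.
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/content.opf", "<package/>")
        # XHTML files carry the text; style sheets control its appearance.
        archive.writestr(
            "OEBPS/chapter1.xhtml", "<html><body><p>Hello</p></body></html>"
        )
        archive.writestr("OEBPS/style.css", "p { margin: 0; }")
    return buffer.getvalue()


def list_epub_contents(data: bytes) -> list[str]:
    """Return the names of all files stored inside the package."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


sample = build_sample_epub()
names = list_epub_contents(sample)
for name in names:
    print(name)
```

To inspect a real e-book instead of the in-memory sample, `zipfile.ZipFile("book.epub")` can be opened in the same way, where `book.epub` stands for any EPUB file on the reader's own computer.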
In its internal structure, the .mobi format is similar to the .epub format, in that it also consists of HTML/XHTML files. Originally used by MobiPocket Reader, it is a highly compressed format, which has significantly grown in recognition and popularity since Amazon.com made it the default format for its Kindle e-book readers.
However, the company has no longer supported the format since November 2023, and Kindle devices now use the EPUB format instead.
HTML
stands for ‘HyperText Markup Language’, which is a standard for hypertext documents on the internet.
XML
stands for ‘Extensible Markup Language’, which is designed primarily to allow structured text and information to be shared over the internet. XML files not only contain text but also structural information about the text.