SUPERSTARS AND OUTSIDERS IN ONLINE MARKETS

 Data and methodology

The term “electronic book” refers both to a title and to a reading device for specific formats, such as the Kindle reader.³⁶ In this article, we use the word e-book for the content. Several formats coexist, proprietary or not: “.azw”, “.epub”, “.pdf”, “.txt”, etc. Each format has its own features (displaying images, printing content, bookmarking, searching indexes, etc.). We use data from the Kindle store owned by Amazon.com, which are readily available on the Internet. Amazon is the leading seller of e-books in the US, and it keeps archives of the monthly top 100 best-selling print books (since 2000) and electronic books (since 2007). We ran automated PHP scripts to collect the characteristics (including rank, author, ISBN and so on) of the print and electronic books that entered the respective monthly top 100 list at least once. Top print books were followed from July 2000 to July 2010, and top e-books from November 2007 (the first available date of the archives) to July 2010. Overall, the dataset covers 121 months for print books and 33 months for e-books. In the rest of the article, we mainly focus on the period in which both print books and e-books are available, i.e. November 2007 to July 2010.
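The two window lengths follow from counting calendar months inclusively, since both the first and the last month of each window are observed. A minimal Python check; the helper name `months_inclusive` is ours and purely illustrative:

```python
from datetime import date


def months_inclusive(start: date, end: date) -> int:
    """Number of calendar months from start to end, counting both endpoints."""
    if end < start:
        raise ValueError("end must not precede start")
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# Observation windows described in the text (the day of month is irrelevant).
PRINT_WINDOW = (date(2000, 7, 1), date(2010, 7, 1))
EBOOK_WINDOW = (date(2007, 11, 1), date(2010, 7, 1))

print(months_inclusive(*PRINT_WINDOW))
print(months_inclusive(*EBOOK_WINDOW))
```

The two calls return 121 and 33, the window lengths stated above; dropping the `+ 1` would undercount each window by one month.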

Data collection

The characteristics of the books are the following: title, author, ISBN (for the print format) or ASIN (Amazon Standard Identification Number, for the electronic format), monthly sales rank (from 1 to 100), genre, publication date, observation date, average customer rating, and price. Several data issues related to the online collection of book characteristics deserve emphasis. First, the same book can have different titles in its print and electronic versions, for instance “Harry Potter 7” and “Harry Potter seven”. These two titles refer to the same book, and we recode them to a single title in our database. Secondly, the author's name may be spelled differently in each format, for instance “J.K. Rowling” and “Joanne Rowling”; we also recode these fields to a single value. Thirdly, the same book exists in different versions, such as paperback, hardcover, limited edition or audiobook editions. These versions have different unique identifiers (ISBN or ASIN) but correspond to the same content. In what follows, we treat these versions as a single informational content, which is what we are ultimately interested in. At this point, each book has a unique identifier given by the combination of the author's name and the title. Fourthly, the publication date is sometimes misreported on Amazon's website: Amazon.com reports the publication date of a specific version of a print book or e-book, which depends on the edition. While this is a minor issue for recent titles, we need to correct the publication dates of books that were reprinted or that exist in several versions. We define the publication date of a title as the earliest publication date among the versions sold on the Amazon website. For books in the public domain, we use the publication date of the first edition collected from Wikipedia.
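The recoding steps above amount to a small record-linkage routine: normalize titles and author names, group every ISBN/ASIN version under an (author, title) key, and keep the earliest publication date. The sketch below is a hypothetical Python illustration, not the PHP scripts used for the article; the alias tables, identifiers and dates are invented for the example, and real recoding would need a much larger, hand-checked alias list.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical alias tables mapping spelling variants to one canonical form.
TITLE_ALIASES = {"harry potter seven": "harry potter 7"}
AUTHOR_ALIASES = {"joanne rowling": "j.k. rowling"}


@dataclass
class Version:
    identifier: str   # ISBN for print, ASIN for Kindle
    title: str
    author: str
    published: date   # publication date reported for this version


def normalize(text: str, aliases: dict[str, str]) -> str:
    """Lower-case, collapse whitespace, then map known variants to one form."""
    key = re.sub(r"\s+", " ", text.strip().lower())
    return aliases.get(key, key)


def merge_versions(versions: list[Version]) -> dict[tuple[str, str], dict]:
    """Group versions under an (author, title) key, keeping the earliest date."""
    books: dict[tuple[str, str], dict] = {}
    for v in versions:
        key = (normalize(v.author, AUTHOR_ALIASES), normalize(v.title, TITLE_ALIASES))
        book = books.setdefault(key, {"identifiers": set(), "published": v.published})
        book["identifiers"].add(v.identifier)
        book["published"] = min(book["published"], v.published)
    return books


# Illustrative identifiers and dates only.
versions = [
    Version("ISBN-0001", "Harry Potter 7", "J.K. Rowling", date(2007, 7, 21)),
    Version("ASIN-0002", "Harry Potter seven", "Joanne Rowling", date(2009, 5, 1)),
    Version("ISBN-0003", "Harry  Potter 7", "J.K. Rowling", date(2008, 1, 15)),
]
books = merge_versions(versions)
print(books)
```

All three versions collapse into a single (author, title) entry carrying three identifiers and the earliest of the three dates, which mirrors the “earliest publication date of existing versions” rule.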
We excluded books published before the 19th century to avoid biases due to a small number of observations; for reference, 7 out of 2,340 unique titles can be considered “historical”. Finally, Amazon assigns books to genre categories. We grouped these genres into six categories for e-books: practical (including self-help and hobbies); essays, science and biography (non-fiction); books for young readers; guidebooks and how-to; fiction; and reference books and textbooks. Print books have an additional category: “Comics and graphic novels”. A title that belongs to several categories is assigned to the category under which it achieved its best sales rank. When Amazon does not assign a category to a title, we use keywords to assign one. In sum, the categories we assigned do not necessarily match those listed on the website.

Descriptive statistics

The data collection process resulted in two datasets: one for e-books and one for print books. Overall, 1,244 e-books appeared at least once in the top 100 list of best-selling titles, versus 1,097 print books. Multiple versions of the same book are rare, since the numbers of unique titles are close to these totals (1,238 for e-books and 1,041 for print books). Because more unique titles entered the e-book list than the print list over the same period, the top 100 list of best-selling electronic titles has a higher turnover. In this paper, we are interested in both the superstar and the long tail properties of electronic commerce, in order to test for cannibalization or market expansion effects. To do so, we merge the e-book and print datasets into a single dataset and create three categories of titles: “superstars”, i.e. titles that enter both the print and the electronic top 100 lists; “print preferred”, i.e. best sellers in print but not as e-books; and “digital outsiders”, i.e. titles that are successful as e-books but are either less successful in print or have no print equivalent.
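Given the sets of unique (author, title) keys found in each top 100 list, the three categories are plain set operations, which is why they are mutually exclusive by construction. The Python sketch below uses hypothetical title keys; the helper `group_sizes` (our name) applies the same logic to counts through inclusion-exclusion.

```python
def classify(print_top: set[str], ebook_top: set[str]) -> dict[str, set[str]]:
    """Split unique title keys into the three mutually exclusive groups."""
    return {
        "superstar": print_top & ebook_top,         # in both top 100 lists
        "print_preferred": print_top - ebook_top,   # print list only
        "digital_outsider": ebook_top - print_top,  # e-book list only
    }


def group_sizes(n_ebook: int, n_print: int, n_superstar: int) -> tuple[int, int, int]:
    """Print-preferred, digital-outsider and total counts via inclusion-exclusion."""
    return n_print - n_superstar, n_ebook - n_superstar, n_ebook + n_print - n_superstar


# Hypothetical (author, title) keys flattened to strings for brevity.
print_top = {"author1|title1", "author2|title2", "author3|title3"}
ebook_top = {"author2|title2", "author3|title3", "author4|title4"}
groups = classify(print_top, ebook_top)
print(groups)
print(group_sizes(1238, 1041, 418))
```

With the 1,238 e-book and 1,041 print titles reported above and the 418 superstar titles reported below, `group_sizes` returns (623, 820, 1861), the category sizes used in the descriptive statistics.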
These three categories are mutually exclusive. Table 1 provides descriptive statistics. First, among the 1,861 unique titles in our dataset, 33.5% are classified as print preferred (623), 44.1% as digital outsiders (820) and 22.4% (418) as superstar titles according to our definition. The extent of cannibalization is thus rather limited, in the sense that 77.6% of all titles are best-selling in only one format (print or electronic). Secondly, we count 79 digital outsiders (4.2% of all unique titles) that are only available in a Kindle edition, without a print equivalent. These 79 titles fall into two categories: classics and pure e-books. The first category (27 e-books) consists of old titles released in print in a specific edition that has no exact counterpart in the Amazon Kindle store. The second category (54 titles) is specific to the Kindle edition and often corresponds to “Harlequin-like” e-books (see Appendix A for a representative sample). Thirdly, the number of books per author is the largest for superstar titles (2.7, against 1.5 for print preferred titles and 2.0 for digital outsiders). Moreover, we define the best rank as the best monthly sales rank achieved by a book during the period of observation.
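The rank-based statistics in Table 1 can be computed per title from its monthly ranks and then averaged within each category. In this hypothetical Python sketch, the best rank follows the definition above (the lowest rank number reached). We assume that the entry and exit ranks are the ranks in the first and last months observed, and that lifespan is the number of months spent in the list; these three definitions are our assumptions, not stated in the text.

```python
from statistics import mean


def rank_metrics(monthly_ranks: dict[str, int]) -> dict[str, int]:
    """Per-title statistics from a {'YYYY-MM': rank} mapping (rank 1 = best)."""
    if not monthly_ranks:
        raise ValueError("title never appears in the top 100")
    months = sorted(monthly_ranks)  # 'YYYY-MM' strings sort chronologically
    return {
        "entry_rank": monthly_ranks[months[0]],    # assumption: first month observed
        "exit_rank": monthly_ranks[months[-1]],    # assumption: last month observed
        "best_rank": min(monthly_ranks.values()),  # lowest number = best position
        "lifespan": len(months),                   # assumption: months in the list
    }


def category_average(titles: list[dict[str, int]], metric: str) -> float:
    """Average one metric over all titles of a category."""
    return mean(rank_metrics(t)[metric] for t in titles)


# Hypothetical rank histories for two titles.
title_a = {"2008-03": 57, "2008-04": 12, "2008-05": 40}
title_b = {"2009-01": 80}
print(rank_metrics(title_a))
print(category_average([title_a, title_b], "best_rank"))
```

If a title leaves the list and later re-enters, counting months present (as here) and measuring the span from first to last month give different lifespans; the choice should follow whatever definition underlies Table 1.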

Table 1: Descriptive statistics

                                      Print       Digital     Superstar          Total
                                      preferred   outsiders   (Print / Kindle)
Format                                –           –           Print / Kindle     –
Number of unique books                623         820         418                1,861
Number of unique authors              523         601         288                –
Number of books per author            1.5         2.0         2.7                –
Average date of release               2004        1999        2007               –
Minimum date of release               1900        1811        1905               –
Average rank of entry                 59.8        53.1        51.1 / 44.4        –
Average rank of exit                  64.6        59.9        59.9 / 65.4        –
Average best rank                     51.3        48.8        38.6 / 35.5        –
Average lifespan (months)             2.7         2.0         3.5 / 3.5          11.7
Average number of comments per book   4.22        3.8         3.9 / 3.9          15.82
Average Amazon price (USD)            16.8        10.3        15 / 12            54.1
Average price, hardcover (n=402)      18.5        –           16.5 / –           35
Average price, paperback (n=184)      12.9        –           9.8 / –            22.7
Categories
  Comics & graphic novels             11          0           0                  11
  Business & investing                88          17          39                 144
  Non-fiction                         136         55          94                 285
  Children's books                    76          31          22                 129
  Fiction                             65          634         226                925
  Professional, technical & reference 38          9           0                  47
  Practical                           201         57          36                 294
  Other                               8           17          1                  26

This best rank is on average lower (better) for superstar titles than for print preferred and digital outsider titles. Similarly, superstar titles have a slightly lower (better) average rank of entry and stay longer in the monthly top 100 list (average lifespan). Fourthly, digital outsiders are on average older, with an average release date of 1999 compared with print preferred titles (2004) and superstar titles (2007). Looking more closely at the oldest titles in each category, we find a digital outsider title first published in 1811 that made it to the monthly top 100 list of best-selling electronic books, earlier than the oldest print preferred title (1900) and the oldest superstar title (1905). Fifthly, print preferred titles and digital outsiders are concentrated in specific genres, print books being spread over more varied genres. The “fiction” category (291 out of 1,041, or 28%) is the main category of print books, followed by “practical” (23%) and “non-fiction” (22%).
Electronic books are predominantly “fiction” (860 out of 1,238, or 69%), followed by “non-fiction” (12%) and “practical” (8%). The category “Professional, technical and reference” is more popular among print preferred books than among digital outsiders. A small fraction of print preferred titles are “Comics”, which are currently unavailable in electronic format. Finally, electronic books are on average cheaper than print books: USD 10.3 for digital outsiders vs. USD 16.8 for print preferred titles. Similarly, superstar titles cost more in print format (USD 15) than in electronic format (USD 10).
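The genre shares quoted above can be recomputed from the category counts in Table 1: print best sellers are the print preferred plus superstar titles, and e-book best sellers are the digital outsiders plus superstar titles. In this Python sketch, the superstar count for each genre is derived as the Total column minus the two other columns, which is our reading of the table layout.

```python
# Category counts from Table 1: (print preferred, digital outsiders, total).
TABLE1_COUNTS = {
    "Fiction": (65, 634, 925),
    "Practical": (201, 57, 294),
    "Non-fiction": (136, 55, 285),
}
N_PRINT_TITLES = 1041   # unique print best sellers
N_EBOOK_TITLES = 1238   # unique e-book best sellers


def format_shares(counts: dict[str, tuple[int, int, int]]) -> dict[str, tuple[float, float]]:
    """Return (share among print titles, share among e-book titles) per genre."""
    shares = {}
    for genre, (print_pref, digital_out, total) in counts.items():
        superstar = total - print_pref - digital_out  # titles in both lists
        shares[genre] = (
            (print_pref + superstar) / N_PRINT_TITLES,
            (digital_out + superstar) / N_EBOOK_TITLES,
        )
    return shares


for genre, (print_share, ebook_share) in format_shares(TABLE1_COUNTS).items():
    print(f"{genre:<12} print {print_share:.0%}  e-book {ebook_share:.0%}")
```

The loop prints 28% and 69% for fiction, 23% and 8% for practical, and 22% and 12% for non-fiction, matching the shares reported in the text.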
