Free Offline Access: A Primer on OA' (OA Prime)
From Peter Suber's May 2011 Issue of the SPARC Open Access Newsletter.
In the shorthand definition I like to use, OA literature is digital, online, free of charge, and free of most copyright and licensing restrictions. Most definitional squabbles focus on the fourth clause. Drop it and you have gratis OA. Keep it and you have libre OA.
Here I want to focus on the second clause. Imagine a body of literature that is OA in every respect except that it's offline. It's still digital, free of charge, and allows unrestricted use, but it's on a thumb drive rather than a network. If you had that thumb drive in your pocket or plugged into your machine, you'd have free *offline* access rather than free *online* access to that literature. If OA literature must be online, then this isn't OA. But it's interesting enough to name and discuss in its own right. Let's call it OA Prime (OA').
Here are 10 fairly obvious ways in which OA' is inferior to OA proper.
1. An OA' corpus won't be as current as the OA corpus. You may have many or most of the articles you care to read. But if the collection on your thumb drive was updated last month, then it won't include literature posted last week, yesterday, or this morning. You might be able to update your collection at will. But with OA', the burden is on you, while with OA proper the burden is distributed among all the providers with something to provide.
2. OA' won't give you all the benefits of dynamic works like wikis, blogs, discussion forums, tag libraries, RSS feeds, articles with comment sections, and many OA textbooks. Either you won't know whether you're reading the latest version or you will know that you're not.
3. Search will be limited to what's in your offline collection. You won't be able to search other offline collections until you get your hands on them. Likewise, if you have many thumb drives, you might have to run many separate searches to find what you already have. OA' literature is less visible than OA literature, even to those who already have copies of what they want to search.
We could solve this problem by indexing offline collections. We could share the offline indices as widely as possible. We could even put them online for online search. The hitch is that while you could find what you wanted, you couldn't immediately read what you found. OA' literature is less retrievable than OA literature.
Notice that the second case creates the interesting twist of providing OA metadata for OA' literature. We'd treat offline digital objects --collections of literature-- the same way we now treat offline analog objects, like archaeological or biological specimens. They would need an online digital representation or metadata simulacrum in order to be made searchable by people who don't have local access.
4. To keep two or more machines in synch, or two or more people, someone must physically carry a physical object from one to the other. We'd give up the considerable benefits of automation and the speed of light and return to effort and sneakernet.
5. Collaboration will be limited to people who share your offline collection. In an email conversation with a colleague about a new article, you couldn't just send a URL and assume that your colleague would be able to click through to the text. In that respect OA' would be like TA. You could attach the article to your email, but that would move at least one step from OA' back to OA itself. OA proper carries the significant benefit that all users with internet connections have real-time access to the same literature, but OA' does not. Hence, OA proper supports more effective alerts, sharing, discussion, and collaboration.
6. Variation: You could only mash up OA' resources if you could pull them together on the same physical device or machine. For example, OA articles in PubMed Central link to relevant parts of other OA databases hosted by NIH. If PMC and those databases were separate OA' collections, this kind of cross-referencing, interlinking, or mashing up would not be possible. For the same reason, the firewalls between OA' collections would inhibit creative thinking about mashups and the synergies of connecting what we already have.
7. If you need something on someone else's thumb drive, you must find the person and negotiate. The info may be OA' for that person. But even if the drive belongs to a friend, your friend may be out of town or busy. Then there are all those non-friends. Content that is OA' for people possessing copies may carry a price for everyone else, or by chance just for an unlucky subset including you. You won't even know until you hunt down a possessor, shake hands, and start talking.
(Digression: I see the point of building the Svalbard Global Seed Vault. I do. I'm glad it exists and I'm glad it's just 800 miles from the north pole. But in a post-apocalyptic world when my neighbors and I need seeds for crops that will grow in our latitude, how do we get to Svalbard? And would the vault give seeds to any road warrior who shows up?)
8. Most usage metrics will become obsolete. What we now call downloads will be mere "reads", and no online turnstile or panopticon will be able to count them. Individual thumb drives could carry software to count local reads, but each tally would be a vast undercount of total usage. Even if the tallies could be aggregated, the uncertainty of aggregating them all (or the certainty of not aggregating them all) would make the totals inaccurate and nearly useless.
9. OA' literature could be modified in desirable or undesirable ways, more or less ad lib. Plagiarists could reprint your articles under their names, and you might never know it. Incompetents and mischief-makers could modify an essay of yours to add or subtract the word "not", and you might never know it. The mangled copies could circulate for indefinitely long times to indefinitely large populations without competition from correct copies. OA proper allows authors to post correct copies accessible to anyone making an effort to find them.
This disadvantage has a flip side that counts as an advantage, at least when it's lawful. You could reprint a PDF-only article in HTML, XML, EPUB, or any other useful format. You could make all texts usable by read-aloud software for the visually impaired. You could translate a text into another language. Users could enjoy the benefits of fair use and escape its intimidating vagueness. Currently this vagueness chills many legitimate exercises of fair use, and in practice makes fair use inseparable from fear of liability and pressure to err on the side of nonuse or pressure to seek permission. OA' could remove the chill.
The same minus and plus could be extended to collections. One OA' collection on biology could underplay or even deny evolution and go uncorrected for its possessors for arbitrarily long. Another OA' collection on biology could focus on the science, omit religion, and go unassailed by creationists for arbitrarily long.
10. You might leave your thumb drive at home when going to work. You might lose it completely. You might drop it in the garbage disposal. Your dog might eat it.
Nothing on this list should be surprising. Think of it as a subset of the reasons why we love the internet and why we've been working for OA proper all these years.
However, it only takes a moment to see that OA' has some strengths of its own. Some may even be surprising. Here are 10 advantages to match the 10 disadvantages above.
1. You won't always have stable or adequate connectivity. You may be in an undeveloped region of the world or an underdeveloped region of the developed world. Offline access can be your deliverance.
Since 2000, WiderNet and the eGranary Digital Library have been delivering OA' on CDs and other physical media to bandwidth-poor parts of the world where OA itself would be impractical or useless.
eGranary is far from obsolete or out of business. It recently delivered 2 TB of OA' literature and software to institutions in Zambia, and installed an OA' library in Liberia running on a 12 volt battery.
2. You won't always have connectivity at all. You may be traveling on a 20th century plane, train, or bus. You may be unable to find a wifi connection, free of charge, in a quiet place, right now. Your ISP tower may be taken down by an ice storm. Your power plant may be shorted out by a tsunami. You may have exceeded your three strikes and been exiled from cyberspace.
Survivalists are already putting important information on CDs for offline access in case a global catastrophe brings down the net.
You may think survivalists are kooks. (All right, some of them are kooks.) But this project could be called offline preservation rather than survivalism. (Are librarians kooks?) Offline preservationists are taking out an insurance policy that covers us all. And there's nothing kooky about planning for a power-plant explosion, oil shortage, brownout, virus, cyberwar, ice storm, or tsunami.
Even today, even in good weather, even on the connected side of the digital divide, not everybody who wants connectivity has connectivity. Think about the public libraries in affluent countries providing internet access to patrons who don't have it elsewhere.
Likewise, there's nothing kooky about planning for censorship from your government, ISP, or employer. (More in #5 below.)
3. You won't always want connectivity. Think about when you go offline precisely to be unavailable. Or when you want no distractions or interruptions but still want to read. Or when skynet wants to pass your coordinates to a cyborg.
4. OA' literature is more secure. Lots of copies keep stuff safe. Of course the LOCKSS advantage also applies online. But there are two points to make here. First, the LOCKSS advantage will reappear offline, since any useful digital file will beget copies. Second, lots of offline copies keep stuff safer than lots of online copies, even if it's harder to find a copy when you need one.
If an OA provider runs out of funding as the result of a recession (NCBI's Sequence Read Archive, Austria's thesis repository) or lobbying (PubScience certainly, PubChem nearly), the same fate needn't befall OA'. On the contrary, OA' is insurance against that kind of defunding. Defunded OA literature can circulate as OA' literature until it can be uploaded again as OA. Even then it can continue as OA' to protect against future budget crises, defunding campaigns, or interruptions of service.
5. Sneakernet may be slower and more cumbersome than online networks, but it allows users to read, copy, and redistribute digital files without leaving digital fingerprints. It allows anonymous inquiry. (Remember print libraries?)
There's a corollary that I won't call an advantage, but I'll list it here for those who would: If your thumb drive includes copyrighted works without permission, your use of them will be undetectable and your swaps with others will be undetectable. Your offline P2P network will be slower than its online counterparts but less visible to tracking and less subject to takedown. If OA' takes the chill off fair use, it also takes the chill off outright infringement. Users would always have libre OA (that is, libre OA'), if not in law then in practice.
A related corollary counts as a compelling advantage: Swapping thumb drives of OA' literature bypasses censors and surveillance in oppressive countries.
6. OA' circulates less widely than OA proper, but in some circumstances this is an advantage. Authors wouldn't deliberately limit the circulation of their work this way unless they were targeting an underground audience. But some authors target an underground audience. New work that goes straight to OA', like movies going straight to video, would bypass network-wide access (OA or TA) and be visible only to the intersecting circles of readers who swap physical devices. If Wikileaks circulated this way, it could reach the major newspapers and other media megaphones before it reached authorities who might want to interfere. (Wikileaks had its own methods for accomplishing the same end.) When I was in school, there was a movement of underground poetry that used the print equivalent of this deliberately limited circulation, even when wider forms of circulation were available. For authors with the standard hope of wide circulation, this aspect of OA' needn't be a disadvantage; they only have to make their new work OA proper at roughly the same time they make it OA'.
7. Every new increment of OA literature can become a new increment of OA' literature. All we need is the connect-time and physical memory for a download. All our work for OA does double duty as work for OA'. The cost of shifting from OA to OA' will always be small compared to the cost of OA itself. Even at scale when the download costs are non-trivial (more below), we can have OA and OA' for close to the price of OA alone. If we're already committed to delivering OA, then OA' is almost a freebie we can put to use at will.
8. OA' literature can be certified free of viruses and malware. It isn't safe by default, but it can be made safe and kept safe. Once clean, OA' literature, data, and software can be kept in virtual quarantine, and read only by machines which themselves never communicate online. It can be both a cause and effect of safe computing.
Similarly, OA' can be free of DRM. It isn't free of DRM by default, but once freed it can be kept free.
9. A study at Johns Hopkins Bayview Medical Center found that medical residents with a thumb drive of links to "landmark scientific articles related to the subspecialties of internal medicine" read more of the articles than residents without the thumb drive. "90% agreed or strongly agreed that the USB syllabus stimulated them to read more primary literature. When asked whether the USB syllabus helped them to take better care of patients, 88% agreed or strongly agreed....Self-reported original articles read by housestaff increased from 3.4 per month at baseline to 4.5 per month by the end of the nine-month study period....45% increased their self-reported reading of original medical papers by more than three articles per month...." For copyright reasons, the investigators couldn't pre-load the thumb drives with the full-text articles. But once connected, users could download full-texts and read them offline.
If offline links can improve reading and research when users must first connect, click through, and download, then it seems that offline full-texts would trigger at least the same level of improvement.
10. Text and data-mining are faster on OA' files than OA files. Your processor and software have faster access to offline files than online files. This is why we already convert OA files to OA' files for large-scale processing even when we have access to fast networks.
For years BioMed Central has allowed users to download its entire corpus of peer-reviewed OA articles for offline use, such as text-mining. It knows that OA is not enough. For serious text-mining of a large corpus, users need file-access speeds only possible offline.
The downloadable BMC corpus acknowledges not only that OA' is superior to OA for intensive user processing, but also that OA easily converts to OA'. I put this OA' advantage right up there with evading surveillance and censorship.
Because we can have OA' essentially at will, once we have OA, we needn't weigh up their strengths and weaknesses as if we had to choose just one form of free access. There are already serious, research-driven OA' projects, from eGranary to the BMC downloadable corpus. It's easy to predict that these projects will only grow, and will appeal to users in every niche where the strengths of OA' outweigh the weaknesses.
OA' isn't for everyone. On the contrary, for most people most of the time, OA will be far more useful. But when we need OA', it won't take much more effort than we've already given to OA, and we won't have to choose between them until we have an end use in mind.
Systematic OA' might have no business model and depend entirely on motivated volunteerism. Or a given project might be kickstarted with a grant and then spread virally. Or it might generate revenue with an efficient update service. It might supplement the update service with a Netflix-style delivery and swapping service. The update and delivery service could be Netflix-style in old postal form or in the new online streaming form.
Something like this is inevitable, driven in part by the advantages of OA' and in part by the steadily shrinking size and cost of memory. You can already carry your favorite 10,000 songs in your pocket. Before long you'll be able to carry your field's canonical literature in your pocket, or the canon plus the additions suggested by your 10,000 closest friends.
Consider BioMed Central's downloadable corpus again. When I checked last week, it consisted of 90,912 articles and the downloadable zip file was 1.7 GB.
Let's play with some numbers. By a common industry estimate, peer-reviewed scholarly journals publish 1.5 million new articles every year. If we assume that the average article is the same size as the average BMC article, then a zipped version of those 1.5 million articles would only require 28 GB. Amazon sells 32 GB thumb drives for about $50.
(BTW, the survivalists I mentioned above originally needed 4 CDs for their 13 GB of data. Today they could put it all on one smallish thumb drive.)
Amazon also sells 128 GB thumb drives for about $250. One of those would hold 1,499,994 zipped research articles. Let's round that off to 1.5 million. That's the total annual world output of peer-reviewed research literature in your pocket, and offline, for the cost of 3-5 hardback books.
Last month Sony released a 1 TB memory card "about the size of an iPhone". It's designed for video cameras, but could be your personal mirror of all the literature you might ever want to read or mine.
The new Sony card would hold more than 35 years' worth of the whole planet's journal literature. Actually it would hold much more, since the annual volume from years past is smaller than the annual volume today. But let's be conservative and ignore that.
If we assume that the literature in your field is only 5% of the total, a high estimate even for polymaths, and if you could download just the literature in your field, then you'd only need 1.4 GB to hold one year's worth. A 1 TB memory card could hold more than 700 years' worth of that literature.
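The back-of-envelope arithmetic above can be rerun as a short Python sketch. It uses only the figures quoted in the text (90,912 BMC articles in a 1.7 GB zip file, 1.5 million new articles a year, a 5% share for one field, and a 1 TB card). It assumes decimal units (1 TB = 1000 GB), which the text does not specify, and the function and variable names are purely illustrative.

```python
# Back-of-envelope storage estimates using the figures quoted in the text.
# Assumption (not stated in the text): decimal units, so 1 TB = 1000 GB.

BMC_ARTICLES = 90_912          # articles in the BMC downloadable corpus
BMC_ZIP_GB = 1.7               # size of the zipped corpus, in GB
ARTICLES_PER_YEAR = 1_500_000  # industry estimate of new articles per year
FIELD_SHARE = 0.05             # one field's assumed share of the total
CARD_GB = 1000.0               # a 1 TB memory card


def gb_per_article() -> float:
    """Average zipped size of one article, assuming BMC's average holds."""
    return BMC_ZIP_GB / BMC_ARTICLES


def yearly_gb(articles_per_year: int = ARTICLES_PER_YEAR) -> float:
    """Storage needed for one year of zipped articles, in GB."""
    return articles_per_year * gb_per_article()


def years_that_fit(capacity_gb: float, gb_per_year: float) -> float:
    """How many years of literature fit in the given capacity."""
    return capacity_gb / gb_per_year


world_gb = yearly_gb()              # one year of the world's journal output
field_gb = world_gb * FIELD_SHARE   # one year of a single field's output

print(f"World output per year: {world_gb:.1f} GB")
print(f"Years of world output on a 1 TB card: {years_that_fit(CARD_GB, world_gb):.1f}")
print(f"One field's output per year: {field_gb:.2f} GB")
print(f"Years of one field on a 1 TB card: {years_that_fit(CARD_GB, field_gb):.0f}")
```

Run as written, the sketch reproduces the rounded figures in the text: about 28 GB for a year of the world's output, more than 35 years of it on a 1 TB card, about 1.4 GB a year for a field that is 5% of the total, and more than 700 years of that field's literature on the same card. Changing the constants shows how sensitive the estimates are to the average article size.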
If you don't know what to do with all that extra memory, you could unzip the files to speed up processing. Or you could supplement the journal literature with book literature. Or you could throw in your favorite 10,000 songs so that you don't have to carry two memory hoards.
And how long will it be before you can put 10 TB or 100 TB of solid-state memory in your pocket? Or as Geoffrey Nunberg put it last year, how long before you can fit everything ever written into your eyeglass frames?
You get the idea. Even today we have the technology, if not the permissions, for you to carry your field's research literature in your pocket. If you want multidisciplinary search and access, you could also carry all the literature from the neighboring half-dozen fields. With periodic updates you could top it off with recent articles, comments, and revisions.
If you're a researcher and could have this for essentially the cost of the thumb drive, would you want one? If you're a library, would you want a set for lending and a set for the vault?
Is it starting to sound useful yet? Of course I'm neglecting the step of actually downloading all that literature when most of it is not OA. We don't have to imagine that the value of OA' depends on violating copyrights. We only have to recognize that the value of OA' rises as more literature becomes libre OA and free for downloading and reuse without infringement.
Properly conceived, an OA' collection isn't a pirated version of TA literature, but an offline version of OA literature. In that sense, permissions are not a problem. Instead, the problem shifts to the actual downloading. If you want just a field-specific selection, then we should add the selection problem to the download problem. These problems are non-trivial but solvable.
Consider the analogous problem of providing OA to work in the public domain. Permissions are not a problem there either. But for work that is in print and offline, the job of digitizing, uploading, and hosting is non-trivial. Around the world we're pouring enormous energy and money into those jobs.
Selecting and downloading OA literature into useful OA' collections is a smaller problem. It's solvable and worth solving. We could solve it in small versions today, at will, each in our own way, and work up to larger and larger solutions collaboratively. When will that start?
* Postscript. I started thinking these thoughts when two streams of rumination converged. I worry about reliable connectivity, and I marvel that the 8 GB memory card in my phone is about as big as my fingernail. I live in a rural village of 900 people on the coast of Maine. Like most people who live here, I consider its beauty and undeveloped character part of its deep appeal, even if I'm often reminded of the trade-offs. Compared to the more developed world, there may be no noise, pollution, crime, or billboards. But broadband is scarce, and even stable dialup connections are scarce. We're within driving distance of every modern convenience, but driving distance from connectivity isn't really connectivity. Service isn't missing here, just uneven. Some neighbors have good broadband. It all depends on what hilltop you can see from your roof, or how close you live to a cable laid to serve larger populations elsewhere. The best connection I can buy is slow every day and down in bad weather. I spend about a week every month traveling to more developed parts of the world where connectivity is fast and trouble-free. If you live in one of those areas, you may not worry about connectivity, let alone apocalypse or skynet. (You may even think yourselves lucky, but you probably don't get to watch a blue heron fishing on the shoreline of a mirror-smooth estuary while you drink your morning coffee.) However, the surprising strengths of OA' don't appeal only to the disconnected. You may worry about surveillance, censorship, or text-mining.
For the work I do, and from the place where I do it, occasional and unpredictable lack of connectivity is a bigger problem than occasional and unpredictable lack of OA to the pieces I want to read. If a service would deliver me an updated, intelligently compiled, multi-GB thumb drive every morning, like a print newspaper, with even 50% of the resources I'm likely to need that day, and if I could afford it, I'd pay for it. Or if I could make a one-time purchase of a thumb drive with all the literature in my field up to, say, 2010, with or without an update service, and if I could afford it, I'd pay for it.