This section explores some of the key areas for data collection and refinement, and suggests possible next steps.
Levels of federally funded R&D activity and publication counts
While there are good data on the levels and types of federal R&D spending, we have not yet been able to fully match these to outputs (i.e. the number of articles produced from federally funded R&D by field and by funding agency), nor have we been able to explore the rate of growth of article outputs by field and by funding agency.
Further work in this area might involve conducting a more thorough review of existing data sources and, where necessary, undertaking targeted consultation with funding agencies to establish more informed estimates. The key issue is to obtain R&D expenditure and article output data relating to the specific agencies affected by the proposed FRPAA open archiving mandate.
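Once expenditure and article counts can be matched by agency and field, the two quantities discussed above (article output relative to R&D spending, and the rate of growth of article output) could be computed along the following lines. This is a minimal sketch only: the `Observation` record, its field names, the unit (millions of US dollars) and the values shown are hypothetical placeholders, not data or methods from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One agency/field/year record (hypothetical structure)."""
    agency: str
    field: str
    year: int
    rd_spending_musd: float  # R&D expenditure, millions of US dollars
    articles: int            # articles attributed to that funding


def articles_per_musd(obs: Observation) -> float:
    """Article output per million dollars of R&D expenditure."""
    if obs.rd_spending_musd <= 0:
        raise ValueError("R&D spending must be positive")
    return obs.articles / obs.rd_spending_musd


def annual_growth_rate(first: Observation, last: Observation) -> float:
    """Compound annual growth rate of article output between two years."""
    years = last.year - first.year
    if years <= 0:
        raise ValueError("last observation must be later than first")
    if first.articles <= 0:
        raise ValueError("first observation needs a positive article count")
    return (last.articles / first.articles) ** (1 / years) - 1


# Placeholder values for illustration only; not real agency data.
start = Observation("Agency A", "Field X", 2000, 100.0, 1000)
end = Observation("Agency A", "Field X", 2010, 200.0, 2000)

output_ratio = articles_per_musd(start)
growth = annual_growth_rate(start, end)
```

Keeping agency and field on every record means the same two functions can be applied to whichever agencies fall within the scope of the proposed mandate.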
Archiving costs and practices
Archiving costs are a key input. Unfortunately, relatively little is known about archiving and preservation costs, and what is known suggests that they vary greatly from case to case (e.g. centralized versus institutional archives, by field of research, etc.). Our initial estimates are based on the most recent and detailed studies available (Ayris et al. 2008, NIH 2008 and arXiv 2010). We have also included a cost for author deposit, based on NIH reporting on NIHMS experiences (NIH 2008) and recent UK studies (Houghton and Oppenheim et al. 2009; Swan 2010a). These costs will also vary with archiving practices (e.g. author deposit versus automated publisher submission processes).
Further efforts in this area might involve better quantifying the costs of offering persistent access to US federally funded research outputs in open archives. This would build on existing information by conducting a more thorough review of published sources on archiving costs around the world, with a special focus on reported costs in the US, and by consulting a representative sample of archive operators and managers in the US. The aim would be to refine the preliminary estimates so that the final estimates are representative of the potential mix of archives that might be required.
Accessibility and efficiency metrics
The potential increases in accessibility and efficiency resulting from an open archiving mandate, such as that proposed by the FRPAA, are also key parameters. Our initial estimates are based on studies reporting access gaps and possible open access citation and download advantages and, to be conservative, we use the lower-bound impacts reported. While these provide some foundation for preliminary estimates, further data collection and the development of additional metrics are required before more robust estimates can be made.
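The conservative choice described above can be illustrated with a short sketch. It assumes that each study reports an impact as a (low, high) range and that "conservative" means taking the smallest lower bound across studies; both are assumptions made for illustration, and the function name and values are hypothetical rather than figures from the studies cited.

```python
from typing import Iterable, Tuple


def conservative_estimate(reported_ranges: Iterable[Tuple[float, float]]) -> float:
    """Return the smallest lower bound among reported (low, high) ranges."""
    ranges = list(reported_ranges)
    if not ranges:
        raise ValueError("at least one reported range is required")
    for low, high in ranges:
        if low > high:
            raise ValueError(f"invalid range: ({low}, {high})")
    return min(low for low, _ in ranges)


# Placeholder ranges for illustration only; not values from the literature.
example_ranges = [(0.10, 0.40), (0.05, 0.25), (0.20, 0.30)]
base_case = conservative_estimate(example_ranges)
```

A less conservative variant could use the midpoint or the median of the reported ranges, which is one way of testing how sensitive the results are to this choice.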
In relation to accessibility, further work might involve: (i) undertaking more focused surveys of research users in the US and elsewhere to better establish the extent and significance of the difficulties and gaps they face in accessing journal articles of the type emerging from federally funded R&D; and (ii) undertaking a fuller review of studies of the possible citation and download advantages resulting from the open online accessibility of research articles, with a focus on what proportion of any such advantage might be sustainable.
In relation to efficiency, further work might focus on identifying examples of efficiency gains through consultation with experts (e.g. rejection of journal and conference papers or funding applications because the work is duplicative, case study examples of the pursuit of blind alleys due to incomplete information, known examples of unnecessarily duplicative research, etc.).
As part of the consultation with archive operators and managers in the US, it would also be worthwhile asking whether they can identify particular cases where the use of openly available research articles and/or data has had an impact and, where possible, following up on those examples to ascertain the extent of the impacts.
While base case values have been sourced from the literature and are well grounded, further work on developing and refining the model might focus on the underlying evidence base for the key parameters discussed above.
System cost impacts
As well as direct cost impacts, enhanced accessibility is likely to have indirect, intended and unintended impacts on the cost of research and scholarly communication activities. These might include such things as increased research library costs as librarians are faced with demands to help users with an additional information channel, and declining publisher revenues if archiving were to lead to subscription cancellations. Hence, in addition to the analysis outlined above, it would be desirable to explore both the direct and indirect, system-wide cost impacts of open archiving.
This might involve building on the activity modeling and costing approach used by Houghton and Oppenheim et al. (2009), and in subsequent studies, to examine the system-wide cost implications of alternative scholarly publishing models. Such an approach requires detailed research activity data and research library and funder information. While there are a number of existing sources for such information, it is likely that some additional data collection would be required.
In each of these cases, to further prioritize efforts, the reader should consider the relative strengths and weaknesses of the sources used in the preliminary analysis presented herein.