In in vivo research, the reporting of core items of study design is persistently poor, limiting assessment of study quality and study reproducibility. This observational cohort study evaluated reporting levels in the veterinary literature across a range of species, journals and research fields. Four items (randomisation, sample size estimation, blinding and data exclusion) were assessed as well as availability of study data in publicly accessible repositories. From five general and five subject-specific journals, 120 consecutively published papers (12 per journal) describing in vivo experimental studies were selected. Item reporting was scored using a published scale (items ranked as fully, partially or not reported) according to completeness of reporting. Papers in subject-specific journals had higher median reporting levels (50.0 per cent vs 33.3 per cent, P=0.007). In subject-specific journals, randomisation (75.0 per cent vs 41.7 per cent, P=0.0002) and sample size estimation (35.0 per cent vs 16.7 per cent, P=0.025) reporting was approximately double that of general journals. Blinding (general 48.3 per cent, subject-specific 50.0 per cent, P=0.86) and data exclusion (general 53.3 per cent, subject-specific 63.3 per cent, P=0.27) were similarly reported. A single paper made study data readily accessible. Incomplete reporting remains prevalent in the veterinary literature irrespective of journal type, research subject or species. This impedes evaluation of study quality and reproducibility, raising concerns regarding wasted financial and animal resources.
Keywords: journal impact factor, reporting guidelines
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
A key component of high-quality studies is complete and transparent reporting.1 Limited reporting impedes interpretation of studies and experimental reproducibility.2–4 Disturbingly, limited reporting of study-design items that carry a risk of bias has been associated with inflated effect sizes, reflecting a link between reporting and study quality. This has contributed to failures of translational research, unnecessary animal use and financial waste.2–8
Bias in research is broadly defined as a systematic error in results or inferences.9 Careful study design attempts to minimise the introduction of factors leading to bias and transparent reporting allows evaluation of such factors.8 Common sources of bias are failure to randomise, lack of blinding and undisclosed or unexplained exclusion of data from analysis.
Proper randomisation provides internal validity and limits selection bias, while blinding prevents detection and performance biases, in which investigators or caregivers may influence observations.10 11 Data handling (decisions about which data to include or exclude) shapes the analysis, results and conclusions of a study and underpins external validity: the generalisability of findings to other populations.11 Therefore, the rationale for including or excluding subjects or data should be explicitly described. Sample size estimation is a critical component of study design. Smaller studies are less precise because of greater sampling variation, and a sample size too small to identify an important treatment effect is more likely to produce a false negative result.9 11 12
Despite the acceptance of these items as indicators of study quality, their reporting is poor in both laboratory animal and veterinary studies.7 13–17 The introduction and widespread endorsement of reporting guidelines has yielded limited improvements in reporting quality and the risk of bias remains high.13 14 16–20 It has been proposed that focusing on a universal set of core reporting standards could increase adoption by users and facilitate study evaluation.1 Such an approach, in conjunction with an editorial policy of enforced adherence to reporting standards, may improve reporting standards.21
Furthermore, focusing on the core items of randomisation, blinding, data exclusions and sample size estimation facilitates comparisons between studies employing different species across research domains.
The primary objectives of this study were: (1) to examine a current cross-section of the veterinary literature, regardless of species or field of research, focusing on key items reflective of the potential for bias (randomisation, blinding, data exclusions) and of completeness of reporting (sample size estimation) and (2) to compare reporting levels between general and subject-specific journals. Secondary objectives were to evaluate the accessibility of study data and to explore the relationship between journal impact factor and item reporting. An observational cohort study was designed to test the hypotheses that items associated with completeness of reporting and risk of bias would be poorly reported overall (<50.0 per cent)22 and that there would be no significant difference between papers published in general and subject-specific journals.
Materials and methods
Literature search methods
An a priori sample size estimate was calculated using commercial software (Sergeant, ESG, 2018, Epitools epidemiological calculators, Ausvet, available at: http://epitools.ausvet.com.au). The calculation was performed for the comparison between journal types, based on detecting a difference between two proportions (difference of 25 percentage points between journal types). Alpha level was set at 0.05 with 80 per cent power using a two-tailed test. Sample size was set at 60 papers per journal type to allow for a whole number of papers to be selected from the predefined list of journals.
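The calculation above can be sketched with the standard formula for comparing two independent proportions. The baseline proportions used below (50 vs 25 per cent, a 25-point difference) are illustrative assumptions only, as the paper does not state the values entered into Epitools:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for detecting a difference between two
    proportions with a two-tailed test (classical normal-approximation formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_b = NormalDist().inv_cdf(power)           # critical value for power
    pbar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * pbar * (1 - pbar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Hypothetical baselines of 50 vs 25 per cent full reporting
print(n_per_group(0.50, 0.25))  # 58 per group
```

Under these assumed baselines the formula yields 58 papers per group, close to the 60 per journal type the authors settled on to obtain a whole number of papers per journal.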
One hundred and twenty papers were selected from 10 veterinary journals: 5 general and 5 subject-specific journals. Journals of interest were selected from the Veterinary Sciences category of the Journal Citation Reports (2017 Journal Citation Reports (Clarivate Analytics, 2017), accessed October 3, 2017), with journals selected semi-objectively, taking into account impact factor and citation counts (preference for journals with higher values of each) and publication of clinical trials. The five general journals were: Equine Veterinary Journal, The Veterinary Journal, Preventive Veterinary Medicine, BMC Veterinary Research and Veterinary Record. The five subject-specific journals were: Journal of Veterinary Internal Medicine, Veterinary Surgery, Veterinary Anaesthesia and Analgesia, Veterinary Dermatology and Journal of Veterinary Emergency and Critical Care.
From each journal, 12 papers were selected, beginning the search with the most recently available (ie, chronological rather than randomised), including those published online as early access (accessed October 5, 2017). MR screened the titles and the abstracts according to predetermined inclusion/exclusion criteria. Full texts were retrieved if there was uncertainty about fulfilment of these criteria. The search continued until 12 qualifying papers were identified for each journal.
Inclusion and exclusion criteria
Papers were included if they were in English and described an in vivo experimental design (parallel or cross-over) with a comparison group (from either client-owned or research animals with natural or induced diseases). Reviews, descriptive/observational studies, case reports or series and in vitro experiments were excluded from analysis. No restrictions were applied to the field of research. As this study was based on published literature, ethical approval was not sought.
Papers were evaluated using a published operationalised checklist assessing the items of interest (randomisation, blinding, sample size estimation and data exclusions).23 Minor adaptations were made to reflect the application to clinical or experimental animals (the original checklist was designed for biomedical (laboratory animal) in vivo and in vitro research, table 1). Randomisation, blinding and sample size estimation were each categorised as fully, partially or not reported. The data exclusion item was evaluated as three subitems (table 1). Papers were not assessed for methodological quality; that is, assessment was limited to evaluating completeness of reporting. For example, blinding would be classified as fully reported if there was a statement that blinding was not possible. In addition to the four core items, the availability of study data was evaluated (‘data deposition’ item, table 1).
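The per-paper scoring can be sketched as follows; the encoding and the three example papers are hypothetical illustrations, not data from the study:

```python
from statistics import median

# Hypothetical encoding of the checklist's three reporting categories
FULL, PARTIAL, NOT = "full", "partial", "not"

ITEMS = ("randomisation", "blinding", "sample_size", "data_exclusion")

def fully_reported_pct(scores: dict) -> float:
    """Percentage of the four core items scored as fully reported in one paper."""
    return 100 * sum(scores[item] == FULL for item in ITEMS) / len(ITEMS)

# Three invented papers, scored item by item
papers = [
    {"randomisation": FULL, "blinding": PARTIAL, "sample_size": NOT, "data_exclusion": FULL},
    {"randomisation": FULL, "blinding": FULL, "sample_size": NOT, "data_exclusion": FULL},
    {"randomisation": NOT, "blinding": NOT, "sample_size": NOT, "data_exclusion": PARTIAL},
]
print(median(fully_reported_pct(p) for p in papers))  # 50.0
```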
A cohort of 15 of the selected papers was initially evaluated independently by two raters (MR and FRB) using the operationalised checklist. Their evaluations were compared in a group meeting with a third investigator (DP) who was blinded to paper authors and journal, and differences resolved by consensus. All remaining papers (full text, including any supplemental material) were then assessed by both raters independently. Raters were not blinded to paper authors or journal. Following review, any differences were resolved by consensus discussion with the third investigator (DP).
Data were tested for normality (D’Agostino-Pearson test) and appropriate parametric or non-parametric analyses applied. To create a picture of overall reporting, reporting of the four items was considered together (for each paper, a proportion was calculated from the number of items reported out of all possible items) and described (median percentage) for all papers combined. Overall reporting levels between journal types (general vs subject-specific) were compared with a Mann-Whitney U test (data not normally distributed). Individual items and subitems were compared between journal types with a z-test. Due to low reporting prevalence, a chi-squared test was used to assess ‘data deposition’ and the subitem ‘pre-establishing exclusion criteria’.
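As a minimal sketch, the z-test for individual items can be reproduced with the standard pooled two-proportion formula. Applied to the full-reporting counts for randomisation implied by the abstract (45/60 subject-specific vs 25/60 general), it recovers the reported P=0.0002; this is illustrative and not necessarily the exact software routine the authors used:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Two-tailed z-test for the difference between two independent proportions,
    using the pooled estimate of the common proportion under the null."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Randomisation fully reported: 45/60 subject-specific vs 25/60 general
z, p = two_proportion_z_test(45, 60, 25, 60)
print(round(z, 2), round(p, 4))  # reproduces the reported P=0.0002
```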
Reporting of individual journals was limited to descriptive statistics as the planned sample size (12 papers/journal) was insufficient to make statistical comparisons. The relationship between journal impact factor and item reporting was evaluated with a Pearson’s correlation coefficient. Statistical software was used for analyses (GraphPad Prism V.6.00 for Windows, GraphPad Software, San Diego, California, USA and SAS V.9.3, SAS Institute, Cary, North Carolina, USA). Values of P<0.05 were considered to be statistically significant and 95 per cent CIs are presented for differences between journal types. The data supporting the results are available in the Harvard Dataverse: https://doi.org/10.7910/DVN/O1XFGR
Results
Overall reporting levels were low (all papers combined, n=120 papers), with a median of 50.0 per cent of items fully reported and 18.4 per cent of items partially reported (figure 1A and B). This reflected the low levels of fully-reported individual items and subitems, which ranged from 4.9 per cent to 68.3 per cent. Levels of partial reporting were also low, less than 40.0 per cent for all items and subitems (figure 1).
Fourteen papers did not fully report any of the four items, but all papers partially reported at least one item. A single paper fully reported all items. The highest and lowest median values of fully-reported items were in the Journal of Veterinary Internal Medicine (75.0 per cent) and BMC Veterinary Research (26.7 per cent), respectively (figure 2).
For all items combined, full reporting occurred more often in papers published in subject-specific journals (50.0 per cent) than those published in general journals (33.3 per cent, P=0.007).
Conversely, partial reporting levels were greater for general journals (20.0 per cent; subject-specific: 16.7 per cent, P=0.048).
In comparing the reporting of individual items between subject-specific and general journals, randomisation and sample size estimation were fully reported approximately twice as often in subject-specific journals. In contrast, blinding was fully reported to a similar degree in both journal types (table 2). Partial reporting levels were similar between journal types for sample size estimation and blinding, whereas partial reporting of randomisation was approximately twice as frequent in general journals (table 2).
Reporting standards were broadly similar in both journal types for data exclusion subitems, with full reporting in approximately half to three-quarters of papers for the subitems ‘exclusion of samples or animals from the analysis’ and ‘defining exclusion criteria’ (table 2). The subitem ‘pre-establishing exclusion criteria’ was reported in fewer than five papers (table 2).
Data deposition was low, with 0.83 per cent (95 per cent CI 0.02 per cent to 4.6 per cent) of all papers meeting the criteria for full reporting and 9.2 per cent (95 per cent CI 4.7 per cent to 15.8 per cent) meeting the criteria for partial reporting. Examining journal types revealed that no papers in subject-specific journals (0/60) and one paper in a general journal (1/60) fully reported data deposition (P=0.99, 95 per cent CI −1.6 per cent to 4.9 per cent). Partial reporting occurred more frequently in general journals (11/60 papers; subject-specific: 0/60 papers, P=0.0006, 95 per cent CI 8.5 per cent to 28.1 per cent).
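The confidence intervals above can be sketched with the standard Wald interval for a difference between two proportions. Applied to the partial data-deposition counts (11/60 general vs 0/60 subject-specific), it reproduces the reported 8.5 to 28.1 per cent interval; this is a sketch of the textbook formula, not necessarily the exact method the authors used:

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95):
    """Wald confidence interval for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # 1.96 for a 95 per cent CI
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Partial reporting of data deposition: 11/60 general vs 0/60 subject-specific
lo, hi = diff_proportion_ci(11, 60, 0, 60)
print(round(lo * 100, 1), round(hi * 100, 1))  # 8.5 28.1
```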
No significant correlation was identified between journal impact factor and the percentage of fully-reported items (r=0.057, r2=0.003, P=0.54, figure 2).
Discussion
This study showed that the frequency of full reporting of items reflecting risk of bias and completeness of reporting sits at the border between poor and moderate, based on a proposed threshold of 50.0 per cent.22 Unexpectedly, statistically significant differences in reporting were identified between general and subject-specific journals. Given the importance of complete reporting, these observed differences should not be overinterpreted in the face of suboptimal reporting levels.
Complete reporting of randomisation, including a description of the allocation method, was similar to the level found in a recent study of randomisation reporting in veterinary clinical trials, in which approximately half of trials identifying themselves as randomised did not report the method of randomisation.10 These findings suggest an improvement from earlier studies of veterinary clinical trials (published between 1989–199014 and 2006–200816 17) in dogs, cats and livestock, in which only 12 per cent–20 per cent of trials applying randomisation reported the allocation method used. Of concern, where purported randomisation methods were described, 13 per cent of trials (8/62) used methods that were non-random.10 This proportion is similar to that reported 10 years earlier, suggesting that many veterinary researchers remain unfamiliar with core concepts of study design and emphasising the importance of explicitly stating randomisation methods.24 The rates observed here were approximately six times higher than those observed in in vivo biomedical studies in which the same assessment scale was applied.21
Full reporting of sample size estimation has improved compared with the low rates (0 per cent–5 per cent) observed in reports from the veterinary literature published during the previous three decades14 16 17; however, the results presented here, similar to those of Giuffrida, highlight that much remains to be done.25 Where study results are negative, the absence of any discussion of sample size estimation prevents interpretation of the findings, greatly limiting the value of such studies alongside the potential waste of resources (financial and animal).12 25
The reporting of blinding in this study approximated the upper end of the range reported previously for the veterinary small animal and livestock clinical trial literature (25.0 per cent–60.0 per cent).14 16 17 Again, this reflects limited improvement despite the repeated demonstration of poor reporting quality.
For data exclusions, the rates observed here were in line with those previously reported, although reporting rates vary, perhaps reflecting differences in study methodology and populations sampled.14 16 17 Reporting of blinding and of data exclusions from analysis was approximately double that reported by Macleod.21 The higher rates observed here for all four items suggest a systematic difference in reporting behaviour between veterinary clinical trials and in vivo biomedical research.21 The reasons for this are unclear but could reflect pressure to focus on novel findings in biomedical research, with emphasis on the substantial volume of data often presented to the detriment of space devoted to reporting methods, or a reluctance to provide detailed supplementary materials.
The obvious consequences of incomplete reporting are to limit reproducibility in research and impede critical evaluation of published work. Additionally, and of particular concern, is evidence that incomplete reporting of items with a risk of bias is associated with inflated effect sizes.3 5 8 26–28 That is, the failure to report an item associated with a risk of bias can be an indication of a deficit in study design and conduct. Evidence for inflated effect sizes is limited in the veterinary literature, although an association between non-reporting of items associated with a risk of bias and an increase in positive results has been reported.16 17 28 This raises important questions regarding the ethical use of animals in research and fiscal responsibility.
Numerous reporting guidelines have been developed to address reporting deficits and they have received widespread support from biomedical and veterinary journals.29
For example, the ARRIVE (Animals in Research: Reporting In Vivo Experiments) and CONSORT (Consolidated Standards of Reporting Trials) guidelines apply to many veterinary studies, and the REFLECT (Reporting Guidelines for Randomized Controlled Trials for Livestock and Food Safety) and STROBE-Vet (Strengthening the Reporting of Observational Studies in Epidemiology-Veterinary) guidelines are specific to veterinary medicine.30–33 Unfortunately, despite the number of guidelines available, adherence to reporting guidelines is low, indicating that journal support or endorsement, without some mechanism to enforce adherence, is insufficient.13 15 34–37 To the authors’ knowledge, the introduction of a mandatory reporting checklist is the only approach that has been shown to improve reporting quality.20 38
Data accessibility reflects transparency in research, supports verification of results and analyses, facilitates systematic reviews and meta-analyses and is increasingly requested by biomedical journals as a condition of publication. There may be instances when data access should be limited (risk of revealing personal or security-related information, or data with commercial value), but these limitations seldom apply to publicly funded research.39 Based on author guidelines (accessed April 23, 2018), four of the journals studied (BMC Veterinary Research, Veterinary Journal, Preventive Veterinary Medicine and Veterinary Record) encouraged data access through the use of repositories, although it was not mandatory. Furthermore, while data repositories can easily be found online, veterinary journals could do more to suggest repositories that meet their data policy requirements, including being recognised and trusted by the scientific community. Approximately 18.0 per cent of papers (all published in BMC Veterinary Research) included a statement that data were available on contacting the author; however, author compliance with requests for data access may be low.40
Journal impact factor is calculated as the ratio of citations received (to articles published in the preceding two years) to the total number of articles published over the same period. It is said to reflect the mean number of citations received by a paper published in that journal, but this misrepresents the skewed citation distribution observed within journals and feeds the common misconception that journal impact factor reflects the quality of individual papers, a case of judging a book’s contents by its cover.41–48 The small sample of papers from each journal limits interpretation; however, no discernible correlation was found between reporting quality and journal impact factor, consistent with the findings of larger studies.41 44 Interestingly, the only paper that fully reported all items was published in the journal with the lowest journal impact factor.
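The skew problem can be illustrated with hypothetical citation counts: a single highly cited paper drags the mean (the impact-factor-style average) far above what a typical paper in the same journal receives:

```python
from statistics import mean, median

# Hypothetical citation counts for 10 papers in one journal over two years;
# one outlier paper accounts for most of the citations
citations = [0, 0, 1, 1, 1, 2, 2, 3, 5, 40]

print(mean(citations))    # 5.5 -> the impact-factor-style average
print(median(citations))  # 1.5 -> what a typical paper actually receives
```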
The four items evaluated in this study represent a minimal requirement, suggested as universally applicable to experimental studies and allowing a rapid assessment of the risk of bias and error. Focusing on these items should not be viewed as detracting from the use of more complete guidelines. The raters were not blinded to author names and institutions when evaluating papers; however, it is unlikely that this knowledge influenced the evaluations, as papers were reviewed independently using the checklist and both raters are trainees in the early stages of their careers, with limited knowledge of the authors and institutions represented.
In studies where treatment effects are markedly different, efforts to blind observers could be limited. Maintaining blinding during data analysis could offset resulting bias, particularly if the person performing the analysis was not involved in data collection. This was not assessed in this study as the authors adhered to the published checklist used.
Two of the 120 selected papers (1.7 per cent) were made available online as uncorrected proofs following acceptance. It is possible that further changes affecting the assessment of reporting were made to these papers before final publication, although it is highly unlikely that they would alter interpretation of the findings. The small sample of papers from each journal and the narrow impact factor range across veterinary medicine may have limited our ability to identify a link between journal impact factor and reporting quality; however, it does not appear that journal impact factor has an important influence on reporting quality.39
The quality of reporting across veterinary medicine remains low, with limited improvements in reporting standards over the last three decades. Within the context of this well-established problem and considering the ready availability of reporting guidelines, the observed differences in reporting between general and subject-specific journals are inconsequential. These findings are concerning as they reveal a considerable lack of transparency in study reporting, the consequences of which are to limit evaluation of published work and attempts to reproduce results. These results should not be interpreted as a comment on the quality of the studies evaluated; however, a potential link between poor reporting and inflated effect sizes deserves further study considering the implications (financial and ethical) of animal research.
The authors would like to thank Dr Guy Beauchamp for statistical support and Vivian Leung for graphical design (Faculty of Veterinary Medicine, Université de Montréal).
Funding Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (ID: 424022-2013, awarded to DSJP). MR receives a stipend from the Fondation J-Louis Lévesque.
Disclaimer The funders had no role in study design, data collection and analysis or decision to publish.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Data are freely available via the link at the end of the methods section.