29/11/2022 - Last update 20/04/2023

Critical evaluation of literature search results

[reading time: 5 minutes]

The ability to judge the reliability, accuracy and quality of a scientific study is of great importance. There are no ready-made solutions: the task calls for multiple indications and strategies, applied case by case and together with common sense.

Many people may need to search the literature: a physician who wants more in-depth information on the therapies used for a specific pathology, researchers writing a systematic review, a patient who wishes to be informed, and so on.

There are many criteria to facilitate the critical evaluation of scientific studies. The most common and generic ones are dictated by common sense: for example, it is appropriate to check that the study is indexed in the search engines and written by known authors who are also cited in the literature.

Other criteria make it possible to rank studies in a hierarchy, separating primary from secondary literature and classifying them according to their methodological rigor. One of the most widespread conceptual schemes is the pyramid of evidence.

A critical evaluation of the evidence gathered from the literature should consider not only the criteria of internal validity, which, in a nutshell, concern the rigor of the method applied, but also other aspects. These include clinical relevance, which estimates the magnitude and precision of the benefit obtained (not described further here); applicability or generalizability, that is, the extent to which the results can be applied to the individual patient; and external validity, which concerns the consistency and reproducibility of the research, that is, whether its results can be confirmed by other studies.

Internal validity

To facilitate a systematic process of critical evaluation and to limit the risk of bias, that is, of confounding or distorting factors, numerous tools have been created, often based on checklists or open questions. Each tool addresses specific aspects of the issue, and none of them provides a universal key. Some of the best known are listed here:

  • AGREE II, for which an Italian translation exists, developed to evaluate the methodological rigor and transparency of guidelines.
  • CASP checklists, developed in Oxford from 1993 onwards and differentiated by type of study (RCTs, systematic reviews, qualitative studies, cohort studies, diagnostic studies, case-control studies and others). For example, the CASP checklist for RCTs consists of 11 questions, each answered “yes”, “no” or “can’t tell”. Section A is about the validity of the study design, section B about the methodology, section C about the results and section D about applicability at the local level.
  • Assessing risk of bias in a randomized trial, a tool provided by the Cochrane Collaboration, the renowned international organization founded in 1993 with the aim of gathering and summarizing accurate and up-to-date scientific evidence on the effects of health interventions. This tool is designed for those who write systematic reviews, and it recommends evaluating the risk of bias for each study included in the review.
    Applying these criteria yields a judgement on three levels: “low risk of bias”, “some concerns” and “high risk of bias”. The domains to be considered include bias arising from the randomization process, bias due to deviations from intended interventions, bias due to missing outcome data, bias in the measurement of the outcome and bias in the selection of the reported result.
  • GRADE, a working group established in 2000 to reduce the confusion caused by the many existing systems and to rate the quality of evidence and the strength of recommendations transparently. Its handbook allows the user to assign one of four levels of quality (high, moderate, low, very low) on the basis of different factors.
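As a rough illustration of the three-level judgement described for the Cochrane tool in the list above, the sketch below combines per-domain judgements into an overall one. It assumes a simplified rule in which the overall judgement equals the worst domain judgement; the tool's published guidance also leaves room for reviewer judgement, which is not modelled here. The names `Risk`, `overall_risk` and the example trial are hypothetical.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Three judgement levels, ordered from best to worst."""
    LOW = 0
    SOME_CONCERNS = 1
    HIGH = 2


LABELS = {
    Risk.LOW: "low risk of bias",
    Risk.SOME_CONCERNS: "some concerns",
    Risk.HIGH: "high risk of bias",
}


def overall_risk(domain_judgements: dict[str, Risk]) -> Risk:
    """Return the worst judgement across domains (simplifying assumption)."""
    if not domain_judgements:
        raise ValueError("at least one domain judgement is required")
    return max(domain_judgements.values())


# Hypothetical judgements for a single trial, one per domain.
trial = {
    "randomization process": Risk.LOW,
    "deviations from intended interventions": Risk.SOME_CONCERNS,
    "missing outcome data": Risk.LOW,
    "measurement of the outcome": Risk.LOW,
    "selection of the reported result": Risk.LOW,
}

print(LABELS[overall_risk(trial)])
```

In a systematic review this kind of judgement would be recorded for each included study.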

In GRADE, the quality of the evidence is assessed for each individual outcome and allows the user to define “to what extent one can trust that the estimate of a benefit/harm can be used in favor of or against recommending the use of an intervention”.

The quality judgment, expressed in the four categories mentioned above, can be upgraded or downgraded by one or more levels depending on the presence of limitations, uncertainty or imprecision in the various categories.
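The mechanics of moving up or down the four levels can be sketched as follows. This is only a minimal illustration: the starting level and the number of levels to move are inputs chosen by the assessor, and the function name `grade_quality` and its parameters are hypothetical, not part of the GRADE handbook.

```python
# The four GRADE quality levels, ordered from worst to best.
LEVELS = ["very low", "low", "moderate", "high"]


def grade_quality(start: str, downgrades: int = 0, upgrades: int = 0) -> str:
    """Move from a starting level by a number of steps, staying within the scale."""
    if start not in LEVELS:
        raise ValueError(f"unknown level: {start!r}")
    if downgrades < 0 or upgrades < 0:
        raise ValueError("step counts must be non-negative")
    index = LEVELS.index(start) - downgrades + upgrades
    # The result cannot fall below "very low" or rise above "high".
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# Hypothetical outcome: starts at "high", downgraded by two levels.
print(grade_quality("high", downgrades=2))  # prints "low"
```

The same call is repeated for each outcome, since the judgement is made outcome by outcome.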

With the GRADE method it is possible to evaluate the overall quality of the evidence on the basis of the outcomes defined as essential, balancing the risks and benefits of the intervention.

The quality review may lead to an estimate of the strength of a recommendation, which can take four values: a recommendation for or against the intervention, each of which may be strong or weak. A weak recommendation expresses uncertainty in the risk/benefit ratio, so the patient’s condition should be carefully considered.


Last but not least, it is useful to consider the concepts of efficacy and effectiveness, that is, the question of whether theoretical results transfer to clinical practice. Studies often have to meet rigorous methodological criteria that are difficult to reproduce in practice, and this has negative repercussions on the applicability of the results to the general clinical context.

For example, if the population has been selected on the basis of very stringent eligibility criteria, the results will be less transferable to patients who have comorbidities, take other drugs, are elderly or belong to minority groups.

Among the numerous tools for evaluating these aspects, the following are worth mentioning:

  • RE-AIM, which encourages greater attention to some essential aspects of programmes, such as external validity, and is also used to transfer research results into practice and to promote the application of programmes in the “real world”.
  • PRECIS-2, which scores (from one to five) nine individual aspects of a study (eligibility, recruitment, setting, organization, flexibility of delivery, flexibility of adherence, follow-up, primary outcome and primary analysis) and displays graphically on a wheel how pragmatic the study is.
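To give an idea of how a wheel of this kind can be built from the PRECIS-2 scores, the sketch below turns one-to-five scores into the corner points of a polygon, one spoke per domain. It is an illustrative assumption rather than the official PRECIS-2 software: the function `wheel_points`, the layout (first domain at twelve o'clock, proceeding clockwise) and the example scores are hypothetical, and the domain list uses the nine PRECIS-2 domains as commonly listed.

```python
import math

# Domain names as commonly listed for PRECIS-2 (treated here as an assumption).
DOMAINS = [
    "eligibility",
    "recruitment",
    "setting",
    "organization",
    "flexibility of delivery",
    "flexibility of adherence",
    "follow-up",
    "primary outcome",
    "primary analysis",
]


def wheel_points(scores: dict[str, int]) -> list[tuple[float, float]]:
    """Convert per-domain scores (1 to 5) into (x, y) corners of the wheel polygon."""
    missing = [d for d in DOMAINS if d not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    points = []
    for i, domain in enumerate(DOMAINS):
        score = scores[domain]
        if not 1 <= score <= 5:
            raise ValueError(f"score for {domain!r} must be between 1 and 5")
        # First spoke points straight up; later spokes proceed clockwise.
        angle = math.pi / 2 - 2 * math.pi * i / len(DOMAINS)
        points.append((score * math.cos(angle), score * math.sin(angle)))
    return points


# Hypothetical scores for one study.
example_scores = {domain: 3 for domain in DOMAINS}
example_scores["eligibility"] = 5

for domain, (x, y) in zip(DOMAINS, wheel_points(example_scores)):
    print(f"{domain}: ({x:.2f}, {y:.2f})")
```

The resulting points can be passed to any plotting library to draw the polygon; a larger area corresponds to higher scores across the domains.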

For a more exhaustive treatment of the topics mentioned here, please refer to the volume edited by Cerritelli and Lanaro [1].


  1. Cerritelli F, Lanaro D. Elementi di ricerca in osteopatia e terapie manuali. Napoli: Edises, 2018.




