A small first simulation study showed the same trend for the comparative ratios. This first study used a large ("5 × 10") matrix. In this setting, high interrater agreement according to the summed PARD approaches corresponded to only fair agreement according to the other measures. The second simulation study was designed to establish the number of dice required to obtain a valid result. In the "3 × 5," "3 × 6" and "3 × 7" matrix scenarios, the estimates were quite robust, particularly when the logistic growth curve model was used, even for the smallest sample sizes. Future research must clarify whether this is because the PARD approach with simulated dice systematically overestimates interrater agreement, or because the other measures systematically underestimate interrater agreement in rankings. The method can, for example, be used to identify assessors who rank candidates differently in assessment centres, or in any setting where objects need to be ranked and agreement needs to be assessed.

We calculated the four measures described above to assess the pairwise agreement between the three ranking metrics in the frequentist setting, and summarised them for each pair of ranking metrics and each agreement measure using the median and the first and third quartiles. The hierarchy was compared with that of its frequentist counterpart to check how often they disagreed.
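The summary step described above, computing an agreement measure for each pair of ranking metrics across networks and summarising with the median and quartiles, can be sketched as follows. Kendall's tau is used here only as a stand-in agreement measure, and the rankings and metric names are invented for illustration; neither is taken from the study.

```python
# Sketch: pairwise agreement between treatment hierarchies produced by
# different ranking metrics, summarised by median and quartiles.
# Illustrative only: the rankings, the metric names and the use of
# Kendall's tau as the agreement measure are assumptions.
from itertools import combinations
from statistics import quantiles

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings (treatment -> rank, no ties)."""
    items = list(rank_a)
    concordant = discordant = 0
    for i, j in combinations(items, 2):
        s = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) / 2
    return (concordant - discordant) / n_pairs

# Hypothetical hierarchies from three ranking metrics, one dict per network;
# here a single four-treatment network is used.
networks = [
    {"metricA": {"A": 1, "B": 2, "C": 3, "D": 4},
     "metricB": {"A": 1, "B": 3, "C": 2, "D": 4},
     "metricC": {"A": 2, "B": 1, "C": 3, "D": 4}},
]

for m1, m2 in combinations(["metricA", "metricB", "metricC"], 2):
    taus = [kendall_tau(net[m1], net[m2]) for net in networks]
    if len(taus) > 1:
        q1, med, q3 = quantiles(taus, n=4)
    else:
        q1 = med = q3 = taus[0]
    print(f"{m1} vs {m2}: median={med:.2f} (Q1={q1:.2f}, Q3={q3:.2f})")
```

With many networks, the per-pair list of agreement values would be summarised by its median and first and third quartiles, as in the text.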
Network meta-analysis (NMA) is increasingly used by policy makers and clinicians to answer one of the key questions in medical decision-making: "Which treatment works best for the condition at hand?"1 2 The relative treatment effects estimated in NMA can be used to derive ranking metrics: measures of the performance of an intervention on the studied outcomes that generate a treatment hierarchy from the most preferred to the least preferred option.3 4 To our knowledge, this is the first empirical study to assess the level of agreement between the treatment hierarchies of different ranking metrics in NMA and to provide insight into the characteristics of the different methods.
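To illustrate how a ranking metric turns relative treatment effects into a hierarchy, the sketch below computes a P-score-like quantity (the average probability that a treatment beats each competitor under a normal approximation). The specific metric, the effect estimates and the common standard error are assumptions made for illustration, not taken from the text.

```python
# Sketch of one common ranking metric, the P-score: the average probability
# that a treatment beats each competitor. Effect estimates and the shared
# standard error are invented; higher effect values are taken as better.
from math import erf, sqrt

def p_better(d_ij, se_ij):
    """P(treatment i beats j) under a normal approximation of d_ij."""
    return 0.5 * (1 + erf(d_ij / (se_ij * sqrt(2))))

effects = {"A": 0.0, "B": 0.3, "C": -0.2}   # effects vs a common reference
se = 0.15                                    # common SE, for simplicity

def p_score(i):
    others = [t for t in effects if t != i]
    return sum(p_better(effects[i] - effects[j], se) for j in others) / len(others)

# Treatment hierarchy: most preferred to least preferred option.
hierarchy = sorted(effects, key=p_score, reverse=True)
print(hierarchy)  # -> ['B', 'A', 'C']
```

Other ranking metrics (for example, the probability of being the best) would order the same effect estimates by a different criterion, which is why their hierarchies need not coincide.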

In this context, it is important to note that neither the objective nor the results of this empirical evaluation imply that the hierarchy from one ranking metric performs better or is more accurate than that from another. Such a comparison cannot be made because each ranking metric answers a different question about the treatment hierarchy.