ISMRM 2015 Tractography challenge - Evaluation

Background information

The original goal of the Challenge was to evaluate all submissions using the exact same metrics and techniques as in the original Tractometer system and paper.

However, during the initial evaluation phase, it became evident that the classical techniques were too restrictive when applied to datasets that realistically simulate real-world conditions. For example, the classical approach uses masks of the bundle endpoints to determine whether streamlines are valid connections. With a high number of closely spaced bundles and endpoint masks only 1 voxel thick, most streamlines were not classified as valid, even when visual inspection showed them to be quite close to the groundtruth bundles.

In order to have evaluation results that more closely match the observed reality of the submitted datasets, we developed an improved "relaxed" scoring technique. All results that are now viewable on this website were obtained with that improved technique. Details of the technique will be discussed later on, and will be presented in an upcoming paper.

Global connectivity metrics definitions

Global connectivity metrics were used to score the submissions. All definitions and descriptions are given in the Cote et al. Tractometer paper. We detail them here, updated for the relaxed scoring technique.

Valid Bundles (VB)
The number of valid bundles that were correctly reconstructed in the contestant's submission and that exist in the groundtruth data. In the context of the challenge, there were 25 groundtruth bundles.

Valid connections (VC)
The percentage of streamlines that were part of the Valid Bundles.

Invalid Bundles (IB)
The number of bundles that seemed realistic but were not matched to a known groundtruth bundle. Those are bundles that can be extracted from the submitted dataset, but do not match any existing bundle.

Invalid connections (IC)
The percentage of streamlines that were part of the Invalid Bundles.

No connection (NC)
The percentage of streamlines that were not assigned to VC or IC. They are normally very short streamlines, or streamlines whose shape and position are unique, meaning that they remain singletons after clustering.

Note that, in the classical Tractometer technique, there was another metric called Valid connections through a wrong path (VCWP). Since the updated technique doesn't rely on masks to classify streamlines, this category cannot be defined. Instead, those streamlines are normally assigned to an IB that reaches both endpoints but has a shape that is too different from the groundtruth bundle.
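As an illustration only, the five global connectivity metrics can be sketched from a per-streamline classification. This is a minimal sketch, not the challenge's scoring code: the function name, the (category, bundle_id) pair layout, and the bundle names in the example are all hypothetical, and the classification step itself (the relaxed scoring technique) is assumed to have already been done.

```python
def connectivity_summary(assignments):
    """Summarise (category, bundle_id) pairs, one pair per streamline.

    category is "VC", "IC" or "NC"; bundle_id is None for "NC".
    The labels and the pair layout are hypothetical, chosen for this sketch.
    """
    total = len(assignments)
    if total == 0:
        raise ValueError("no streamlines to score")

    # VB / IB: number of distinct bundles that received at least one streamline.
    valid_bundles = {b for c, b in assignments if c == "VC"}
    invalid_bundles = {b for c, b in assignments if c == "IC"}

    def percent(category):
        # VC / IC / NC: share of all submitted streamlines, in percent.
        return 100.0 * sum(1 for c, _ in assignments if c == category) / total

    return {
        "VB": len(valid_bundles),
        "IB": len(invalid_bundles),
        "VC": percent("VC"),
        "IC": percent("IC"),
        "NC": percent("NC"),
    }


# Ten made-up streamlines: six valid, three invalid, one unassigned.
assignments = (
    [("VC", "CST_left")] * 4
    + [("VC", "CST_right")] * 2
    + [("IC", "cluster_7")] * 3
    + [("NC", None)]
)
summary = connectivity_summary(assignments)
print(summary)
```

In this toy input, VC, IC and NC sum to 100%, since every streamline falls into exactly one of the three categories.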

Fidelity metrics definitions

Additional metrics were implemented for the challenge evaluation. The so-called Fidelity metrics aim to give a general overview of how well the VB cover their groundtruth counterparts. They complement the global connectivity metrics, since a submission can find a specific valid bundle yet cover it very poorly, for example when only a few streamlines represent that bundle.

Bundle overlap (OL)
Proportion of the voxels within the volume of a ground truth bundle that are traversed by at least one valid streamline associated with the bundle. This value shows how well the tractography result recovers the original volume of the bundle.

Bundle overreach (OR)
Number of voxels outside the volume of a ground truth bundle that are traversed by at least one valid streamline associated with the bundle, divided by the total number of voxels within the ground truth bundle. This value shows how much the valid connections extend beyond the ground truth bundle volume.
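The two definitions above can be sketched with plain voxel sets. This is a minimal sketch under stated assumptions: each bundle is represented as a set of (i, j, k) voxel indices, and the function name is hypothetical. How the voxels traversed by the valid streamlines are obtained is not covered here.

```python
def overlap_and_overreach(gt_voxels, bundle_voxels):
    """Return (OL, OR) for one bundle.

    gt_voxels:     voxels inside the ground truth bundle volume.
    bundle_voxels: voxels traversed by at least one valid streamline
                   associated with that bundle.
    Both are iterables of (i, j, k) index tuples (an assumed representation).
    """
    gt = set(gt_voxels)
    rec = set(bundle_voxels)
    if not gt:
        raise ValueError("ground truth bundle has no voxels")

    overlap = len(gt & rec) / len(gt)    # recovered part of the ground truth volume
    overreach = len(rec - gt) / len(gt)  # voxels outside, relative to ground truth size
    return overlap, overreach


# Toy ground truth: a 2 x 2 x 2 block (8 voxels).
gt = {(x, y, z) for x in (1, 2) for y in (1, 2) for z in (1, 2)}
# Toy reconstruction: one slice at z = 2, extending one voxel too far in y.
rec = {(x, y, 2) for x in (1, 2) for y in (1, 2, 3)}

ol, overreach = overlap_and_overreach(gt, rec)
print(ol, overreach)  # 0.5 0.25
```

Because OR is normalised by the ground truth volume rather than by the reconstructed volume, it can exceed 1 when a reconstruction spills far outside the bundle.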

Angular error scores

  • Local Angular Error
    The mean voxel-wise angular error between the main local tractogram fiber directions and the respective ground truth fiber directions. Missing directions are penalized with the maximum 90-degree error.
    Values are split into 2 categories: voxels containing 1 fiber population, and voxels with crossing fiber populations.
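One possible reading of the local angular error can be sketched as follows. Everything beyond the description above is an assumption of this sketch: directions are unit vectors treated as sign-invariant axes, each ground truth direction is compared with its closest tractogram direction in the same voxel, a voxel with no tractogram direction receives the 90-degree penalty, and all function names are hypothetical. The challenge's actual matching between directions may differ.

```python
import math

MAX_ERROR_DEG = 90.0  # penalty for a missing direction


def angle_deg(u, v):
    """Angle between two unit vectors, ignoring sign (fibers have no polarity)."""
    dot = abs(sum(a * b for a, b in zip(u, v)))
    return math.degrees(math.acos(min(1.0, dot)))  # clamp guards against rounding


def voxel_error(gt_dirs, est_dirs):
    """Mean error over the ground truth directions of one voxel."""
    errors = []
    for g in gt_dirs:
        if est_dirs:
            # Assumed matching rule: closest tractogram direction.
            errors.append(min(angle_deg(g, e) for e in est_dirs))
        else:
            errors.append(MAX_ERROR_DEG)
    return sum(errors) / len(errors)


def local_angular_error(voxels):
    """Return (mean single-fiber error, mean crossing-fiber error) in degrees.

    voxels: iterable of (gt_dirs, est_dirs) pairs, one pair per voxel.
    A category with no voxels yields None.
    """
    single, crossing = [], []
    for gt_dirs, est_dirs in voxels:
        if not gt_dirs:
            continue  # no ground truth fiber population in this voxel
        bucket = single if len(gt_dirs) == 1 else crossing
        bucket.append(voxel_error(gt_dirs, est_dirs))

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(single), mean(crossing)


voxels = [
    ([(1.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]),                   # perfect match
    ([(1.0, 0.0, 0.0)], []),                                  # missing direction
    ([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(1.0, 0.0, 0.0)]),  # one crossing fiber missed
]
single_err, crossing_err = local_angular_error(voxels)
print(single_err, crossing_err)
```

With the closest-direction rule used here, a crossing voxel in which only one population was recovered is still penalized, because the unrecovered ground truth direction is compared with a tractogram direction that points elsewhere.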