Introduction
Framework
Categories
Data
Data format
Submission format
Reference standard
Evaluation
Ranking

Introduction

Today, medical image analysis papers require solid experiments to prove the usefulness of the proposed methods. However, experiments are often performed on data selected by the researchers, which can come from different institutions, scanners and populations. Moreover, researchers often use different evaluation measures, which makes published methods difficult to compare. This has resulted in a growing interest in grand challenges in medical image analysis. The goal of a grand challenge is to compare different algorithms for a particular task on the same (clinically representative) data, using the same evaluation protocol (see also http://www.grand-challenge.org/).

Several (automatic) medical image analysis techniques have been proposed to extract coronary centerlines (http://coronary.bigr.nl/centerlines), but not many methods are available to segment the artery lumen, and even fewer methods are available to robustly detect and quantify lesions in computed tomography angiography (CTA) data. Also, up to now, no standardized evaluation methodology has been published to reliably evaluate and compare the performance of the existing or newly developed coronary artery lumen segmentation, and stenoses detection and quantification algorithms. This evaluation framework will provide such a large-scale standardized evaluation methodology and reference database for the quantitative evaluation of coronary artery lumen segmentation algorithms and coronary artery stenoses detection and quantification algorithms.

Here, 48 multi-center multi-vendor CTA datasets, with corresponding quantitative coronary angiography (QCA) reference standard, are described and made available. Well-defined measures are presented and different methods are made available to extract statistics from the evaluation results.

Framework

It is possible to participate in one of the three following challenges:

  1. Coronary artery stenoses detection
  2. Coronary artery stenoses detection / quantification
  3. Coronary artery stenoses detection / quantification and coronary artery lumen segmentation

The focus of this framework is on stenoses detection; therefore, each team has to participate in this task. Because some methods are expected to output the stenosis grade or the lumen segmentation in addition to the stenosis position, we also provide the option to evaluate the stenosis grade and the coronary lumen segmentation.
This page briefly describes, in order, the tasks to be performed, the data used, the reference standard, the evaluation criteria, and the rankings.

Categories

Cardiac image analysis algorithms need well-defined input to, for example, detect and quantify lesions or to segment vessels. Depending on the amount of user-interaction, we discern two different categories of algorithms:

Category 1: automatic methods
Only the input CTA images are provided to participants.

Category 2: semi-automatic methods
In addition to the CTA images, a list of points is provided for each patient. Two points at the ostia are provided, as well as a point at the end of each vessel; e.g. if there are 6 vessels in the coronary tree, a list of 8 points will be provided.

To demonstrate that (semi-)automatic detection of stenoses is possible, it is very important to allow every team to submit (semi-)automatic results, and not only those who already have a ready-to-use method to extract centerlines from CTA images.

Three teams that participated in the centerline extraction challenge of the Rotterdam Coronary Artery Algorithm Evaluation Framework, and thus developed an automatic algorithm to extract centerlines, agreed to collaborate and to make their centerlines available to the participants of the proposed challenge.
The teams are:

  • The VRVis team (Vienna, Austria)
    Contact: Dr. Katja Bühler, Dr. Erald Vucini
  • The Rcadia team (Haifa, Israel)
    Contact: Dr. Roman Goldenberg
  • The LUMC/Medis team (Leiden, Netherlands)
    Contact: Pieter Kitslaar

All the centerlines are made available to the participants of the proposed challenge. The participants can use these centerlines as input for their method and submit the combined method to category 1 or 2 (depending on the automation of the respective centerline extraction method). The participant must use the same centerline extraction algorithm for all the patients.

Data

Multi-center multi-vendor CTA data

For the workshop, 48 multi-center multi-vendor CTA datasets of symptomatic patients are used. Eighteen of the CTA images, with associated reference standard quantifications obtained from quantitative coronary angiography (QCA) and consensus reading of CTA, are available for optimizing the algorithms (i.e. training set). The remaining thirty datasets are used to validate the algorithms (i.e. testing sets); for this set the reference standard is not shared with the participants.

The 48 datasets were acquired with CT scanners from different vendors and sites, as presented in Table 1.

Table 1 - Information about the 48 multi-center multi-vendor CTA datasets.

Patients selection

Patients were selected such that they would be representative of the population undergoing CTA examination for the assessment of obstructive coronary artery disease.

According to the literature, the guideline is to perform CTA on patients with a low to intermediate pretest likelihood of coronary artery disease:

“Especially in the context of ruling out stenosis in patients with low to intermediate pretest likelihood of disease, CT coronary angiography may develop into a clinically useful tool. CT coronary angiography is reasonable for the assessment of obstructive disease in symptomatic patients.”
Budoff et al., “Assessment of Coronary Artery Disease by Cardiac Computed Tomography: A scientific statement to the AHA”, Circulation 2006.

Moreover, with respect to the alternative diagnostic algorithm of A. Weustink (Figure 1), low-risk patients with a coronary calcium score (CCS, Agatston score) between 0 and 400 are more often referred for a CTA test. In fact, the presence of extensive coronary calcium (CCS>400) seriously limits the reliability of CTA and, consequently, an ischemic test would be more effective than a CTA in patients with a high CCS.

Figure 1 - Alternative diagnostic algorithm
(A.C. Weustink, “The role of multi-slice computed tomography in stable angina management: a current perspective”, Neth Heart J, 2011)

Table 2 – Coronary calcium score categories and risks.

For our study, the patients were thus selected with respect to the CCS, as presented in Table 3.

Table 3 – Coronary calcium score (CCS) selection criterion: ideal distribution (per vendor) of patients over the different CCS categories.

If there were not enough patients in a certain CCS category (Table 2), patients were chosen from one of the adjacent categories.

The coronary lesions, in both the 18 training datasets and the 30 testing datasets, are also distributed over:

  • Four stenosis categories (20%-49%; 50%-69%; 70%-94%; 95%-100%)
    Stenoses of 1%-19% are considered as normal vessel.
  • Three plaque types (calcified, non-calcified, mixed)
  • Four main vessels (LM, LAD, RCA, LCX)

All lesions in vessels with a diameter larger than 1.5mm were included.

Patient exclusion criteria were:

  • previous history of percutaneous coronary stent placement
  • coronary artery bypass surgery
  • pacemaker
  • CTA of non-diagnostic image quality (motion artifacts)

Public availability of data

Teams must register and sign a data-confidentiality form to obtain the data. The participants will only have access to the CTA data. The QCA data will not be shared. The reference standard obtained from this data will only be provided for the training data. The provided data may only be used for the evaluation of the stenoses detection and quantification methods through the evaluation framework. CTA datasets and associated reference standard provided by the organizers may not be given or distributed under any circumstances to persons other than the person registered for the team.

Data format

Directory structure

The training data and testing data are stored in archives with one directory per dataset. The directory names uniquely identify the datasets. The training datasets are numbered 00 to 17 and are stored in the directories dataset00 to dataset17. The testing datasets are stored in the directories dataset18 to dataset47. Each directory datasetXX contains a DICOM folder with the image files (CTA image), named exportXXXX.dcm, and 17 directories, one for each segment of the modified 17-segment American Heart Association (AHA) model; these are named seg01, seg02, seg03, seg04, seg17, seg05… seg16. For the training datasets, these directories also contain reference standard files with lesion positions and quantifications, as well as the reference standard lumen segmentations.

Image data format

All image data are stored in (anonymized) DICOM format.
Note: the pixel type is unsigned short; a gray value (GV) of 0 corresponds to -1024 Hounsfield units (HU) and a GV of 1024 corresponds to 0 HU (i.e. HU(x) = GV(x) - 1024).
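
The gray-value offset in the note above can be applied as in the following minimal Python sketch. The names gv_to_hu and hu_to_gv are illustrative and not part of the framework. Values are converted to plain integers first, so that an unsigned 16-bit container cannot wrap around when 1024 is subtracted.

```python
GV_OFFSET = 1024  # from the note above: HU(x) = GV(x) - 1024


def gv_to_hu(gray_values):
    """Convert stored gray values (unsigned short) to Hounsfield units."""
    # int() first: the result must be allowed to become negative.
    return [int(gv) - GV_OFFSET for gv in gray_values]


def hu_to_gv(hounsfield_units):
    """Inverse mapping, from Hounsfield units back to stored gray values."""
    return [int(hu) + GV_OFFSET for hu in hounsfield_units]


# A gray value of 0 is -1024 HU (air) and a gray value of 1024 is 0 HU (water).
print(gv_to_hu([0, 1024, 1324]))
```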

QCA reference standard input files (for training datasets)

Each directory datasetXX contains a file reference_QCA.txt that contains the QCA stenoses detection and quantification reference standard. For each segment of the modified 17-segment AHA model, the QCA diameter stenosis (G) is reported; it is a number between 0 (healthy) and 100 (occluded).
A typical reference_QCA.txt file looks like this:

seg_01 0
seg_02 0
seg_03 34
seg_04 0
seg_17 -1
seg_05 28
seg_06 63
seg_07 100
seg_08 -1
seg_09 0
seg_10 -1
seg_11 0
seg_12 0
seg_13 0
seg_14 37
seg_15 -1
seg_16 58
Here, -1 refers to segments that have a diameter <1.5mm in either CTA or QCA or that are absent (seg_10, seg_15, seg_17), and to segments that are distal to a complete occlusion (seg_08, which lies distal to the occlusion in seg_07). These segments are excluded from the evaluation.
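
A file of this form can be read as in the following minimal sketch. It assumes one whitespace-separated "seg_XX value" pair per line, as in the example above; the function name parse_reference_qca is illustrative and not part of the framework.

```python
EXCLUDED = -1  # marker for segments that are left out of the evaluation

SAMPLE_QCA = """\
seg_01 0
seg_02 0
seg_03 34
seg_04 0
seg_17 -1
seg_05 28
seg_06 63
seg_07 100
seg_08 -1
seg_09 0
seg_10 -1
seg_11 0
seg_12 0
seg_13 0
seg_14 37
seg_15 -1
seg_16 58
"""


def parse_reference_qca(text):
    """Return ({segment number: QCA grade}, set of excluded segment numbers)."""
    grades, excluded = {}, set()
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, value = fields
        segment = int(name.split("_")[1])  # "seg_03" -> 3
        grade = float(value)
        if grade == EXCLUDED:
            excluded.add(segment)
        else:
            grades[segment] = grade
    return grades, excluded


grades, excluded = parse_reference_qca(SAMPLE_QCA)
print(sorted(excluded))
```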

CTA reference standard input files (for training datasets)

For each dataset, each directory segXX contains a file reference_CTA.txt that contains the CTA stenoses detection and quantification reference standard. The files contain the x-, y-, and z-coordinates (world position) of each path point of the segment centerline, the segment number (between 1 and 17), the stenosis number (between 1 and N, N being the number of stenoses >20%; 0 denotes healthy positions), the stenosis type (0 for no plaque, 1 for soft, 2 for calcified, 3 for mixed plaque), and the category grade G derived from the consensus reading of CTA. Each category is assigned a specific number: healthy (G=0), mild (G=1), moderate (G=2), severe (G=3), occluded (G=4). Every point is on a separate line in the file, starting with the most proximal point and ending with the most distal point of the segment.
A typical reference_CTA.txt file looks like this, with N the number of path points for the considered segment seg_06 of the patient, and G5 and G6 the grades of stenoses 5 and 6:

x1 y1 z1 6 0 0 0
x2 y2 z2 6 0 0 0
x3 y3 z3 6 0 0 0
x4 y4 z4 6 5 2 G5
x6 y6 z6 6 5 2 G5
... ... ... 6 5 2 G5
x25 y25 z25 6 5 2 G5
x26 y26 z26 6 0 0 0
x27 y27 z27 6 0 0 0
... ... ... 6 0 0 0
x58 y58 z58 6 0 0 0
x59 y59 z59 6 6 1 G6
... ... ... 6 6 1 G6
x74 y74 z74 6 6 1 G6
x75 y75 z75 6 0 0 0
... ... ... 6 0 0 0
xN yN zN 6 0 0 0
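
The seven values per path point can be read and grouped per stenosis as in the following minimal sketch. It assumes that the values of one point are whitespace-separated on a single line and that the grade is stored as a number between 0 and 4; the names PathPoint, parse_reference_cta and summarize_stenoses, as well as the sample coordinates, are made up for illustration.

```python
from collections import namedtuple

PathPoint = namedtuple("PathPoint", "x y z segment stenosis plaque grade")

PLAQUE_TYPES = {0: "no plaque", 1: "soft", 2: "calcified", 3: "mixed"}
GRADES = {0: "healthy", 1: "mild", 2: "moderate", 3: "severe", 4: "occluded"}


def parse_reference_cta(text):
    """Return the path points of one segment, proximal to distal."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        x, y, z = (float(v) for v in fields[:3])
        segment, stenosis, plaque, grade = (int(v) for v in fields[3:7])
        points.append(PathPoint(x, y, z, segment, stenosis, plaque, grade))
    return points


def summarize_stenoses(points):
    """Group path points by stenosis number; 0 marks healthy positions."""
    summary = {}
    for p in points:
        if p.stenosis == 0:
            continue
        entry = summary.setdefault(
            p.stenosis,
            {"plaque": PLAQUE_TYPES[p.plaque], "grade": GRADES[p.grade], "n_points": 0},
        )
        entry["n_points"] += 1
    return summary


# Hypothetical coordinates, for illustration only.
SAMPLE_CTA = """\
10.0 20.0 30.0 6 0 0 0
10.5 20.0 30.2 6 5 2 2
11.0 20.1 30.4 6 5 2 2
11.5 20.1 30.6 6 0 0 0
12.0 20.2 30.8 6 6 1 1
"""
print(summarize_stenoses(parse_reference_cta(SAMPLE_CTA)))
```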

Lumen segmentation (for training datasets)

For each of the modified 17-AHA segments having at least one significant stenosis (i.e. >50% diameter reduction), as well as for three random segments having either no or only non-significant (i.e. <50%) stenoses (ideally one each in the LAD, RCA and LCX), three lumen segmentations from three independent, blinded observers are provided and stored in the corresponding segXX directory of the patient. The three reference files reference_lumen_obs1.vtk, reference_lumen_obs2.vtk and reference_lumen_obs3.vtk each contain the 3D geometrical model of a healthy or diseased vessel lumen segment. The information is stored in the VTK polygonal data file format, which stores the 3D points of an object together with their connectivity.

Point files (for submission to Category 2)

If a participant submits to category 2 (semi-automatic), datasetXX will additionally contain:

  • a point_start.txt file in the seg01 and seg05 directories, containing the starting point of the RCA and of the LM artery, respectively.
  • a point_end.txt file in the seg04, seg17, seg08, seg09, seg10, seg12, seg14, seg15 and seg16 directories, containing the ending point of each vessel, if the vessel is present in the considered dataset.

The point_start.txt and point_end.txt files each contain three values, corresponding to the x-, y- and z-(world) coordinates of the respective point.

Centerline input files (optional)

If participants wish to use the centerlines extracted with one of the four available centerline extraction methods, they may additionally download our centerline package.
This package contains two folders: one for automatically extracted centerlines (Automatic/) and one for centerlines extracted with some additional manual refinement (ManuallyCorrected/).
Each Team_XXXX directory contains a folder for each dataset (datasetXX/), and each dataset directory contains as many vessel folders (vesselXX/) as were detected by the team's algorithm, each with a file named result.txt containing the path of that vessel.
The result.txt files contain the x-, y-, and z-coordinates (world position) of each path point. Every point is on a separate line in the file, starting with the most proximal point and ending with the most distal point of the vessel. A typical result.txt file looks as follows, with N the number of points of the path.

x1 y1 z1
x... y... z...
xN yN zN
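
Such a centerline file can be read as in the following minimal sketch, which assumes one whitespace-separated "x y z" triple per line. The names parse_centerline and centerline_length are illustrative; the length computation is only an example of how the path points might be used.

```python
import math


def parse_centerline(text):
    """Return the path points as (x, y, z) tuples, proximal to distal."""
    return [
        tuple(float(v) for v in line.split())
        for line in text.splitlines()
        if line.strip()
    ]


def centerline_length(points):
    """Sum of the straight-line distances between consecutive path points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Hypothetical three-point path, for illustration only.
path = parse_centerline("0 0 0\n3 4 0\n3 4 12\n")
print(len(path), centerline_length(path))
```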

Note: the same centerline extraction method should be used for all the datasets and vessels; it is not allowed to pick centerlines from different extraction packages.

Submission format

Participants should upload an archive (.rar, .tar.gz, .tar or .zip files are supported) containing a directory for each of the 18 training datasets (dataset00 to dataset17) and 30 testing datasets (dataset18 to dataset47) in the first layer of the archive. Please follow the submission format strictly; otherwise, your submission may fail.

Stenosis detection

Each of the dataset directories should contain a file called stenoses.txt that contains the world position of the x-, y-, and z-coordinate of each significant stenosis (i.e. >50% diameter reduction) detected by the algorithm.
A typical stenoses.txt file looks like this, with S the total number of stenoses for this patient:

x1 y1 z1
x... y... z...
xS yS zS

Important note: please report one (central) point per stenosis, and not all the points located along the centerline of the stenosis. S should be smaller than 48; having more than 48 stenoses per patient is not plausible.

Stenosis detection and quantification

Each of the dataset directories should contain a file called stenoses.txt that contains the world position of the x-, y-, and z-coordinate of each stenosis, the estimated CTA diameter stenosis GCTA, as well as the estimated QCA diameter stenosis GQCA. The grades GCTA and GQCA are a number between 0 (healthy) and 100 (occluded).
A typical stenoses.txt file looks like this, with S the total number of stenoses for this patient:

x1 y1 z1 GCTA1 GQCA1
x... y... z... GCTA... GQCA...
xS yS zS GCTAS GQCAS

Important note: please report one (central) point per stenosis, and not all the points located along the centerline of the stenosis. S should be smaller than 48; having more than 48 stenoses per patient is not plausible.
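
The two stenoses.txt layouts can be produced as in the following minimal sketch. The function name format_stenoses and the number of decimals are assumptions, not requirements of the framework; each entry is either (x, y, z) for detection only, or (x, y, z, GCTA, GQCA) for detection and quantification.

```python
def format_stenoses(stenoses, max_count=48):
    """Return the text of a stenoses.txt file, one stenosis per line."""
    if len(stenoses) >= max_count:
        raise ValueError("S should be smaller than %d" % max_count)
    lines = []
    for entry in stenoses:
        x, y, z, *grades = entry
        if len(grades) not in (0, 2):
            raise ValueError("expected (x, y, z) or (x, y, z, GCTA, GQCA)")
        for grade in grades:
            if not 0 <= grade <= 100:
                raise ValueError("grades range from 0 (healthy) to 100 (occluded)")
        fields = ["%.4f" % c for c in (x, y, z)] + ["%g" % g for g in grades]
        lines.append(" ".join(fields))
    return "".join(line + "\n" for line in lines)


# Hypothetical stenosis position and grades, for illustration only.
print(format_stenoses([(90.909, 112.412, 30.5784, 55, 60)]), end="")
```

The returned text would then be written to datasetXX/stenoses.txt for each dataset.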

Stenosis detection and quantification + lumen segmentation

Each of the dataset directories should contain a file called stenoses.txt as described in the previous section, Stenosis detection and quantification.
Lumen segmentation results should also be submitted in VTK Polygonal data file format. For the testing data, it will be unknown which segments will be evaluated; therefore participants should provide one mesh of the complete coronary tree as datasetXX/segmentation.vtk.
A typical segmentation.vtk file looks like this:

# vtk DataFile Version X.Y
vtk output
ASCII
DATASET POLYDATA
POINTS 425 float
90.909 112.412 30.5784
90.8238 112.107 30.083
90.9645 111.606 30.1362
91.1363 111.428 30.5314
91.1949 111.659 30.8882
91.1413 112.206 31.1384
...
153.125 117.118 56.4028
152.652 117.092 56.1709
POLYGONS 811 3244
3 0 6 1
3 1 6 2
3 2 6 7
3 2 7 8
...
3 418 423 424
3 418 424 413
3 413 424 419
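
A file with this layout can be written without any dependency, as in the following minimal sketch. The function name polydata_to_vtk is illustrative, and the version string "3.0" is an assumption (the example above only shows X.Y). The second number on the POLYGONS line is the total count of integers in the polygon list, i.e. one size entry plus the vertex indices per polygon.

```python
def polydata_to_vtk(points, polygons, version="3.0"):
    """Return the text of an ASCII legacy VTK polygonal data file."""
    lines = [
        "# vtk DataFile Version " + version,
        "vtk output",
        "ASCII",
        "DATASET POLYDATA",
        "POINTS %d float" % len(points),
    ]
    lines += ["%g %g %g" % (x, y, z) for x, y, z in points]
    # Each polygon contributes its vertex count plus its vertex indices.
    list_size = sum(len(polygon) + 1 for polygon in polygons)
    lines.append("POLYGONS %d %d" % (len(polygons), list_size))
    for polygon in polygons:
        lines.append(" ".join(str(v) for v in (len(polygon), *polygon)))
    return "\n".join(lines) + "\n"


# Hypothetical tetrahedron, for illustration only.
tetra_points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tetra_triangles = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(polydata_to_vtk(tetra_points, tetra_triangles), end="")
```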

Reference standard

  1. Coronary artery stenoses detection / quantification

    • Quantitative coronary angiography (QCA) data:

      The algorithms are compared to the results of the per-segment QCA analysis.

      The 2D-QCA percentage of diameter stenosis is determined using minimal diameters. QCA involves injection of a contrast agent into the artery, followed by X-ray imaging in multiple planes and assessment of the arterial lumen diameter. First, on each acquired X-ray imaging plane, the vessel boundaries are delineated. Then, the segments of interest (i.e. those containing stenoses) are identified. Finally, quantitatively useful information is derived, such as the percent of diameter-stenosis. The percent of diameter-stenosis reflects the degree of vessel narrowing relative to an assumed “normal” vessel diameter immediately adjacent, which serves as the reference diameter, as follows (for each X-ray imaging plane):

      Sd = (1 - dm / dr) × 100%

      with dm the minimal diameter, and dr the reference “normal” diameter.
      The final percent of diameter-stenosis is the minimal Sd obtained over all the X-ray imaging planes.


      Figure 2 – Quantitative coronary angiography (QCA)
      For each acquired X-ray imaging plane, the minimal luminal diameter (dm) is measured and compared to the diameter of the "normal" vessel immediately adjacent, which serves as the reference diameter (dr). Given the minimal (projected) diameter, the percent stenosis can be calculated.

      One experienced cardiologist (Dr. Koen Nieman, MD, PhD), unaware of the results of CTA, received the coronary artery angiograms (CAGs), and identified and analyzed all coronary segments using a modified 17-segment American Heart Association classification on a workstation at the Erasmus Medical Center Rotterdam. Segments were visually classified as normal (visually <20% narrowing) or as having non-significant or significant coronary obstruction (visually >20% narrowing). The stenoses in segments visually scored as having >20% narrowing were quantified by a validated QCA algorithm (CAAS, Pie Medical, Maastricht, the Netherlands). Stenoses were evaluated in the worst angiographic view and classified as significant if the lumen diameter reduction exceeded 50%.

    • Computed tomography coronary angiography (CTA) data:
      Expert2, Expert3 and Expert4 visually inspected the CTA images of the 48 patients. If a non-significant (>20% narrowing) or significant (>50% narrowing) stenosis was detected visually, the stenosis position, type and degree (5-grade classification: 0-20% healthy, 20%-50% mild, 50%-70% moderate, 70%-94% severe, 95%-100% occluded) were reported.

      Given the 3 observers' grades, a reference standard (RS) is created following the protocol of Figure 3.

      Figure 3 – Protocol to create the CTA stenoses detection and quantification reference standard (RS), given the 3 observers' grades.

  2. Coronary artery lumen segmentation
    • Computed tomography coronary angiography (CTA) data:

      For each patient, Expert2, Expert3 and Expert4 manually annotated the CTA lumen boundary of all diseased segments (i.e. modified 17-AHA segments which contain at least one significant stenosis), as well as 3 healthy segments (i.e. modified 17-AHA segments with either no disease or only non-significant stenoses), distributed over the 3 main arteries. For each segment, the 3 observers used the same centerline.

Evaluation

In this section, we describe the specific evaluation measures that are used to rank the different algorithms for:

  1. the detection of significant coronary artery stenoses.
  2. the quantification of the degree of coronary artery stenoses.
  3. the coronary artery lumen segmentation.

1. Detection of stenoses

Participants report (per patient) the location of all detected significant lesions. Evaluation is performed only for vessels with a diameter >1.5mm.

  • QCA reference standard : segment-based analysis

    Each lesion detected by the participants is assigned to one of the 17-AHA segments; participants do not have to indicate to which segment each lesion belongs.
    A stenosis detected and quantified as being in the “healthy” (0-20%) or “mild” (20-49%) category is assigned to the “non-significant” (NS) detection category.
    A stenosis detected and quantified as being in the “moderate” (50-69%), “severe” (70-94%) or “occluded” (95-100%) category is assigned to the “significant” (S) detection category.
    Then, each of the 17-AHA coronary artery segments is assigned to be:

    • a true negative detection, if no significant stenosis (i.e. >50% diameter reduction) has been detected in the segment by either the algorithm or the reference.
    • a true positive detection, if at least one significant stenosis (i.e. >50% diameter reduction) has been detected in the segment by both the algorithm and the reference.
    • a false negative detection, if no significant stenosis (i.e. >50% diameter reduction) has been detected in the segment by the algorithm while the QCA reference indicates >50% stenosis for the considered segment.
    • a false positive detection, if at least one significant stenosis (i.e. >50% diameter reduction) has been detected in the segment by the algorithm while the QCA reference indicates <50% stenosis for the considered segment.

  • CTA consensus reference standard : lesion-based analysis
    The stenoses considered here are those in the union of the stenoses detected by the participant and those of the CTA reference standard.

    The true/false positives/negatives are defined as follows:

    • true negative detection, if 1) a non-significant reference stenosis (i.e. <50% diameter reduction) has not been detected by the algorithm, or 2) a non-significant stenosis is reported by the algorithm and matches either no reference stenosis or a non-significant reference stenosis.
    • true positive detection, if a significant stenosis (i.e. >50% diameter reduction) has been detected by the algorithm and matches a significant reference stenosis.
    • false negative detection, if a significant reference stenosis (i.e. >50% diameter reduction) has either not been detected by the algorithm or been detected by the algorithm as being non-significant (i.e. <50% diameter reduction).
    • false positive detection, if a significant stenosis (i.e. >50% diameter reduction) has been detected by the algorithm and matches either no reference stenosis or a non-significant (i.e. <50% diameter reduction) reference stenosis.

The metrics used to evaluate the performance of the detection algorithms are the sensitivity and the positive predictive value (PPV), computed over all patients.
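
The segment-based counting against the QCA reference, and the two detection metrics, can be sketched as follows. This is a minimal illustration and not the evaluation code of the framework: the function names are made up, the algorithm output is assumed to be already reduced to one (maximum) grade per segment, and a grade of exactly 50% is treated as significant, following the “moderate” (50-69%) category.

```python
SIGNIFICANT = 50  # percent diameter reduction (boundary handling is an assumption)
EXCLUDED = -1     # reference marker for segments left out of the evaluation


def count_segment_detections(algorithm, reference):
    """Count TP/TN/FP/FN over the evaluated segments of one patient.

    algorithm: {segment: estimated grade}; missing segments mean no lesion found.
    reference: {segment: QCA grade}, with -1 for excluded segments.
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for segment, ref_grade in reference.items():
        if ref_grade == EXCLUDED:
            continue
        detected = algorithm.get(segment, 0) >= SIGNIFICANT
        present = ref_grade >= SIGNIFICANT
        if detected and present:
            counts["TP"] += 1
        elif detected:
            counts["FP"] += 1
        elif present:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def sensitivity_and_ppv(counts):
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP)."""
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


# Reference taken from the reference_QCA.txt example; algorithm output is hypothetical.
reference = {1: 0, 2: 0, 3: 34, 4: 0, 17: -1, 5: 28, 6: 63, 7: 100, 8: -1,
             9: 0, 10: -1, 11: 0, 12: 0, 13: 0, 14: 37, 15: -1, 16: 58}
algorithm = {6: 70, 14: 55, 8: 80}
counts = count_segment_detections(algorithm, reference)
print(counts, sensitivity_and_ppv(counts))
```

To obtain the metrics over all patients, the counts would be summed over the patients before computing the ratios.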

2. Quantification of the degree of stenosis

Participants report the percentage of lumen diameter reduction for each detected lesion. The stenosis grade is a value in the range [0 . . . 100], where 0 implies no stenosis and 100 implies a fully occluded vessel.

  • QCA reference standard : segment-based analysis, discrete degree

    The lumen diameter reduction of each of the 17-AHA segments is the maximum of the lumen diameter reductions of its lesions.
    The metrics used to evaluate the performance of the quantification algorithms are the average absolute difference and the RMS difference between the estimated and true lumen diameter reduction, computed over all patients.

  • CTA consensus reference standard : lesion-based analysis, 5 grades categories
    The stenoses considered here are those in the union of the stenoses detected by the participant and those of the CTA reference standard.
    Each stenosis detected by the participants is assigned to the corresponding stenosis grade (mild, moderate, severe, occluded).
    The metric used to evaluate the performance of the quantification algorithms is the (linearly) weighted kappa value, computed over all patients.
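
The three quantification metrics can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are made up, the grades for the kappa computation are the integers 0 (healthy) to 4 (occluded), and the weighted kappa is Cohen's kappa with linear disagreement weights |i - j|.

```python
import math


def mean_absolute_difference(estimated, reference):
    """Average absolute difference between estimated and reference grades."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)


def rms_difference(estimated, reference):
    """Root mean squared difference between estimated and reference grades."""
    squared = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(squared / len(reference))


def linear_weighted_kappa(grades_a, grades_b, n_categories=5):
    """Cohen's kappa with linear weights for grades 0 .. n_categories - 1."""
    n = len(grades_a)
    k = n_categories
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(grades_a, grades_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Weighted disagreement actually observed, and expected by chance.
    disagreement = sum(abs(i - j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected = sum(abs(i - j) * row_totals[i] * col_totals[j] / n
                   for i in range(k) for j in range(k))
    if expected == 0:
        return float("nan")  # undefined when all grades fall in one category
    return 1.0 - disagreement / expected


# Hypothetical values, for illustration only.
print(mean_absolute_difference([10, 20, 30], [10, 20, 36]))
print(rms_difference([10, 20, 30], [10, 20, 36]))
print(linear_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))
```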

3. Lumen segmentation

Participants provide a 3D geometrical model of the coronaries. A distinction is made between healthy and diseased vessel segments. The accuracy of the segmentation is assessed by comparing the 3D model with a segmentation obtained by averaging the three manual observers, using the following measures:

  • Accuracy measures:
    • Root mean squared (RMS) distance between manual and automatic 3D surfaces.
    • Hausdorff distance between manual and automatic 3D surfaces.
  • Overlap measure:
    Given the 3D geometrical model of the coronaries, the volume overlap (3D Dice coefficient) of the segmentation as compared to the reference standard is computed.
The manual segmentations are created for all stenotic segments (i.e. segments with at least one significant stenosis in CTA) and for three other (healthy) positions in the image. The participants are blinded to the specific sections that are evaluated; therefore, participants should provide one geometric model of the complete coronary tree.
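
The three kinds of measures can be approximated as in the following sketch. It is not the evaluation code of the framework: distances are computed between surface vertices only (a true surface-to-surface distance would also consider points inside the polygons), the Dice coefficient is computed on sets of voxel indices, and all names and sample values are made up for illustration.

```python
import math


def nearest_distances(source, target):
    """For each source vertex, the distance to the closest target vertex."""
    return [min(math.dist(p, q) for q in target) for p in source]


def surface_distances(vertices_a, vertices_b):
    """Return (RMS distance, Hausdorff distance), both taken symmetrically."""
    distances = nearest_distances(vertices_a, vertices_b)
    distances += nearest_distances(vertices_b, vertices_a)
    rms = math.sqrt(sum(d * d for d in distances) / len(distances))
    return rms, max(distances)


def dice(voxels_a, voxels_b):
    """Volume overlap 2|A ∩ B| / (|A| + |B|) of two voxel index sets."""
    a, b = set(voxels_a), set(voxels_b)
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical vertices and voxel indices, for illustration only.
manual = [(0, 0, 0), (1, 0, 0)]
automatic = [(0, 0, 0), (1, 0, 2)]
print(surface_distances(manual, automatic))
print(dice({(0, 0, 0), (0, 0, 1), (0, 1, 1)}, {(0, 0, 1), (0, 1, 1), (1, 1, 1)}))
```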

Ranking

The teams are ranked based on their algorithm performance. For each evaluation measure, the methods are ranked and each method is assigned a number ranging from 1 (best) to N (worst), where N is the number of evaluated algorithms. The average rank (over all the performance measures) will determine the final ranking; a rank of 1.0 means that an algorithm performs best for all measures and all subjects/vessels.
There will be three different rankings:

  • one ranking for detection of significant lesions
  • one ranking for quantification of lesions
  • one ranking for lumen segmentation

Detection of lesions ranking
The detection algorithms are ranked based on the sensitivity and PPV (over all patients) achieved as compared to the CTA (per-lesion) and QCA (per-segment) reference standards, which results in 2x2=4 ranks per method. For each method, an average detection rank is obtained by averaging the 4 different ranks; the method with the lowest average rank is ranked first.

Quantification of lesions ranking
The quantification algorithms are ranked on their average absolute difference and RMS difference of the degree of stenosis (as compared to QCA reference standard, per-segment) and on their weighted kappa coefficient (as compared to CTA consensus reading, per-lesion). Then, an average rank is obtained by averaging the 3 ranks, with a weight of 2 for the weighted kappa rank, in order to compensate for the fact that only one evaluation metric is computed for CTA; the method with the lowest average rank is ranked first.
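
The rank averaging can be sketched as follows, using the quantification ranking as an example. This is a minimal illustration: the function names, method names and scores are hypothetical, tied methods are assumed to share the average of their ranks, and the weighted average is normalized by the sum of the weights (which does not affect the final order).

```python
def rank_methods(scores, higher_is_better):
    """Return {method: rank}, with 1 the best; ties share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for method in ordered[i:j + 1]:
            ranks[method] = shared_rank
        i = j + 1
    return ranks


def weighted_average_rank(rank_tables, weights):
    """Weighted average of several {method: rank} tables."""
    total = sum(weights)
    return {
        method: sum(w * table[method] for table, w in zip(rank_tables, weights)) / total
        for method in rank_tables[0]
    }


# Hypothetical results for three methods.
abs_diff = {"A": 10.0, "B": 12.0, "C": 15.0}  # lower is better
rms_diff = {"A": 14.0, "B": 13.0, "C": 20.0}  # lower is better
kappa = {"A": 0.3, "B": 0.5, "C": 0.6}        # higher is better

tables = [
    rank_methods(abs_diff, higher_is_better=False),
    rank_methods(rms_diff, higher_is_better=False),
    rank_methods(kappa, higher_is_better=True),
]
final = weighted_average_rank(tables, weights=[1, 1, 2])  # weight 2 for kappa
print(sorted(final, key=final.get))
```

For the detection ranking, the same helper would be called with four rank tables and equal weights.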

Lumen segmentation ranking
The segmentation algorithms are ranked based on the overlap, the average absolute radius difference, and the average Hausdorff distance (averaged over the 3 observers' references), while making a distinction between segments having non-significant and significant stenoses. For each method, a per-patient average is obtained by averaging the 3 ranks if the patient does not present any significant lesion, or the 6 ranks if the patient presents at least one significant lesion. Each method will be assigned the average of these numbers and these averages will be ranked to get the final method ranking as follows:

Table_DetectionRankingExample