Introduction
Workshop
Framework
Categories
Data
Data format
Submission format
Reference standard
Evaluation
Ranking

Introduction

Today, medical image analysis papers require solid experiments to prove the usefulness of the proposed methods. However, experiments are often performed on data selected by the researchers, which can come from different institutions, scanners and populations. Moreover, researchers often use different evaluation measures, which makes published methods difficult to compare. This has resulted in a growing interest in grand challenges in medical image analysis. The goal of a grand challenge is to compare different algorithms for a particular task on the same (clinically representative) data, using the same evaluation protocol (see also http://www.grand-challenge.org/).

Several (automatic) medical image analysis techniques have been proposed to extract coronary centerlines (http://coronary.bigr.nl/centerlines), but not many methods are available to segment the artery lumen, and even fewer methods are available to robustly detect and quantify lesions in CTA data. Also, up to now, no standardized evaluation methodology has been published to reliably evaluate and compare the performance of the existing or newly developed coronary artery lumen segmentation, and stenoses detection and quantification algorithms. This evaluation framework will provide such a large-scale standardized evaluation methodology and reference database for the quantitative evaluation of coronary artery lumen segmentation algorithms and coronary artery stenoses detection and quantification algorithms. At this point, we have formalized our ideas about the evaluation framework, but the proposal is still subject to changes and open for discussion.

Here, 48 multi-center multi-vendor CTA datasets, with corresponding quantitative coronary angiography (QCA) reference standard, will be described and made available. Well-defined measures are presented and different methods will be made available to extract statistics from the evaluation results.

The results of this evaluation framework will be reported in:

  1. A technical paper. This paper will be submitted to a high-impact technical journal such as Medical Image Analysis (MedIA) or IEEE Transactions on Medical Imaging (TMI).
    Objectives: to demonstrate the feasibility of dedicated algorithms for 1) (semi-)automated coronary lumen segmentation and 2) (semi-)automated detection and quantification of stenosis on computed tomography angiography (CTA) in comparison with quantitative coronary angiography (QCA) and CTA consensus reading.
  2. A clinical paper. This paper will be submitted to a high-impact clinical journal such as the Journal of the American College of Cardiology (JACC).
    Objectives: to determine the accuracy of (semi-)automated algorithms and visual inspection of CTA to detect and quantify stenoses on computed tomography angiography (CTA), in comparison with quantitative coronary angiography (QCA).

Each team that submits results to the challenges may have up to two authors on the two papers we intend to write.

Figure 1 - Overview of how the results of this evaluation framework will be reported in technical and clinical papers.

Workshop

The evaluation framework will be launched during the 3D Cardiovascular Imaging: a MICCAI segmentation challenge workshop that will be organized during the 15th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), which will be held on October 1st, 2012 in Nice Sophia Antipolis, Côte d'Azur, France. This workshop also includes a challenge on Right Ventricle Segmentation on cardiac MRI.

The workshop will start with the presentation and distribution of the on-site testing datasets, which marks the start of the challenge. For the remainder of the morning session, participants will run their algorithms on this data while a poster session takes place. The organizers will evaluate the results against the reference data during the lunch break. After lunch, we plan a set of selected talks (based on the pre-workshop results) and possibly an invited lecture. Following these presentations, we will have a poster session, where the participants can explain their algorithms, possibly including on-site demos. The afternoon session will conclude with a presentation of the challenge results and a town hall discussion of at most 1 hour.

Important dates (tentative):
Training data (18/48) ready for download: March 19th, 2012
Open for submissions: May 14th, 2012
Testing data (24/48) ready for download: June 4th, 2012
Submission period (testing results and a paper of at most 8 pages describing the method): June 4th, 2012 to June 29th, 2012
Notification of acceptance to participants: July 6th, 2012
On-site testing data (6/48) ready for download: October 1st, 2012
Workshop: October 1st, 2012

Framework

It is possible to participate in one of the three following challenges:

  1. Coronary artery stenoses detection
  2. Coronary artery stenoses detection / quantification
  3. Coronary artery stenoses detection / quantification and coronary artery lumen segmentation

The focus of this framework is on stenoses detection, therefore each team has to participate in this task. Because we expect that some of the methods can also output, next to stenoses position, the stenosis grade or the lumen segmentation, we will also provide the possibility to (optionally) evaluate the stenosis grade, as well as the coronary lumen segmentation.
This page briefly describes, in order, the tasks to be performed, the data used, the reference standard, the evaluation criteria, and the rankings.

Categories

Cardiac image analysis algorithms need well-defined input to, for example, detect and quantify lesions or to segment vessels. Depending on the amount of user-interaction, we discern two different categories of algorithms:

Category 1: automatic methods
Only the input CTA image will be provided to participants.

Category 2: semi-automatic methods
In addition to the CTA images, a list of points will be provided for each patient. Two points at the ostia will be provided, as well as a point at the end of each vessel, i.e. if there are 6 vessels in the coronary tree, a list of 8 points will be provided.

To demonstrate that (semi-)automatic detection of stenoses is possible, it is very important to allow every team to submit (semi-)automatic results, and not only those who already have a ready-to-use method to extract centerlines from CTA images.

Three teams that participated in the centerline extraction challenge of the Rotterdam Coronary Artery Algorithm Evaluation Framework, and that therefore developed an automatic centerline extraction algorithm, agreed to collaborate and to make their centerlines available to the participants of the proposed challenge.
The teams are:

  • The VRVis team (Vienna, Austria)
    Contact: Dr. Katja Bühler, Dr. Erald Vucini
  • The Rcadia team (Haifa, Israel)
    Contact: Dr. Roman Goldenberg
  • The LUMC/Medis team (Leiden, Netherlands)
    Contact: Pieter Kitslaar

All the centerlines will be made available to the participants of the proposed challenge. The participants can use these centerlines as input for their method and submit the combined method to category 1 or 2 (depending on the automation of the respective centerline extraction method). The participant must use the same centerline extraction algorithm for all the patients.

Data

Multi-center multi-vendor CTA data

For the workshop, 48 multi-center multi-vendor CTA datasets of symptomatic patients will be used. Eighteen of the CTA images, with associated reference standard quantifications obtained from quantitative coronary angiography (QCA) and consensus reading of CTA, will be available for optimizing the algorithms (i.e. training set). The remaining thirty datasets (24 for pre-workshop testing and 6 for on-site testing) will be used to validate the algorithms (i.e. testing set); for this set the reference standard will not be shared with the participants.
For the technical journal paper, 48 datasets seem to be a good compromise between the feasibility of the workshop and the significance of (technical) results. For the clinical journal paper, we would like to include more datasets. The number of required datasets will be calculated using statistical power analysis and the results of the workshop. New datasets will be processed after the workshop (end of 2012).

The 48 datasets will be acquired with CT scanners from different vendors and sites, as presented in Table 1.

Table 1 - Information about the 48 multi-center multi-vendor CTA datasets.

Patient selection

Patients were selected such that they would be representative of the population undergoing CTA examination for the assessment of obstructive coronary artery disease.

According to the literature, the guideline is to perform CTA on patients with low to intermediate pretest likelihood of having coronary artery disease:

“Especially in the context of ruling out stenosis in patients with low to intermediate pretest likelihood of disease, CT coronary angiography may develop into a clinically useful tool. CT coronary angiography is reasonable for the assessment of obstructive disease in symptomatic patients.”
Budoff et al., “Assessment of Coronary Artery Disease by Cardiac Computed Tomography: A scientific statement to the AHA”, Circulation 2006.

Moreover, with respect to the alternative diagnostic algorithm of A. Weustink (Figure 2), low-risk patients with a coronary calcium score (CCS, Agatston score) between 0 and 400 are more often referred for a CTA test. In fact, the presence of extensive coronary calcium (CCS>400) seriously limits the reliability of CTA and, consequently, an ischemic test would be more effective than a CTA in patients with a high CCS.

Figure 2 - Alternative diagnostic algorithm (A.C. Weustink, “The role of multi-slice computed tomography in stable angina management: a current perspective”, Neth Heart J, 2011)

Table 2 – Coronary calcium score categories and risks.

For our study, the patients were thus selected with respect to the CCS, as presented in Table 3.

Table 3 – Coronary calcium score (CCS) selection criterion: ideal distribution (per vendor) of patients over the different CCS categories.

If there were not enough patients in a certain CCS category (Table 2), patients were chosen from one of the adjacent categories.

The coronary lesions, in both the 18 training datasets and the 30 testing datasets, will be distributed over:

  • Four stenosis categories (20%-49%; 50%-69%; 70%-94%; 95%-100%)
    1%-19% stenoses are considered as normal vessel.
  • Two plaque types (calcified, non-calcified)
  • Four main vessels (LM, LAD, RCA, LCX)

All lesions in vessels with a diameter greater than 1.5mm will be included.

Patient exclusion criteria were:

  • previous history of percutaneous coronary stent placement
  • coronary artery bypass surgery
  • pacemaker
  • CTA of non-diagnostic image quality (motion artifacts)

Public availability of data

Teams must register and sign a data-confidentiality form to obtain the data. The participants will only have access to the CTA data. The QCA data will not be shared. The reference standard obtained from this data will only be provided for the training data. The provided data may only be used for the evaluation of the stenoses detection and quantification methods through the evaluation framework. CTA datasets and associated reference standard provided by the organizers may not be given or distributed under any circumstances to persons other than the person registered for the team.

Data format

Directory structure

The training data and testing data will be stored in archives with a directory for each dataset. The directory names uniquely identify the datasets. The 18 training datasets will be numbered 00 to 17 and stored in the directories dataset00 to dataset17. The testing set will be stored in the directories dataset18 to dataset41. The on-site testing set of 6 datasets will be stored in the directories dataset42 to dataset47; these images will be provided at the beginning of the workshop and should be processed during the workshop. Each directory datasetXX will contain a DICOM folder with the image files (CTA image), named exportXXXX.dcm, and 17 directories, one for each segment of the modified 17-segment American Heart Association (AHA) model; these are named seg01, seg02, seg03, seg04, seg17, seg05… seg16. For the training datasets, these directories will also contain reference standard files with lesion positions and quantifications, as well as the reference standard lumen segmentations.
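The layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (dataset_split, segment_paths) are hypothetical, and the elided part of the segment list is assumed to run from seg05 through seg16.

```python
from pathlib import Path

# Segment directory names in the order given in the text.
# Assumption: the elided range "seg05… seg16" covers seg05 through seg16.
SEGMENT_DIRS = [f"seg{i:02d}" for i in (1, 2, 3, 4, 17)] + [
    f"seg{i:02d}" for i in range(5, 17)
]


def dataset_split(index: int) -> str:
    """Return which set a dataset number belongs to (hypothetical helper)."""
    if 0 <= index <= 17:
        return "training"
    if 18 <= index <= 41:
        return "testing"
    if 42 <= index <= 47:
        return "on-site testing"
    raise ValueError(f"dataset index out of range: {index}")


def segment_paths(root: str, index: int) -> list:
    """List the 17 segment directories expected inside datasetXX."""
    base = Path(root) / f"dataset{index:02d}"
    return [base / name for name in SEGMENT_DIRS]
```

For example, segment_paths("data", 3) lists data/dataset03/seg01 through data/dataset03/seg16, with seg17 in fifth position.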

Image data format

All image data will be stored in (anonymized) DICOM format.
Note: the pixel type is unsigned short; a gray value (GV) of 0 corresponds to -1024 Hounsfield units (HU) and 1024GV corresponds to 0HU (i.e. HU(x) = GV(x) - 1024).
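The gray-value convention in the note can be written as a pair of small conversion functions. This is a minimal sketch that only restates HU(x) = GV(x) - 1024; the function names are illustrative and not part of the framework.

```python
GV_OFFSET = 1024  # offset stated in the note: HU(x) = GV(x) - 1024


def gray_value_to_hu(gv: int) -> int:
    """Convert a stored gray value (unsigned short) to Hounsfield units."""
    return gv - GV_OFFSET


def hu_to_gray_value(hu: int) -> int:
    """Convert Hounsfield units back to the stored gray value."""
    gv = hu + GV_OFFSET
    if gv < 0:
        raise ValueError("value cannot be stored as an unsigned gray value")
    return gv
```

With this convention, a gray value of 0 maps to -1024 HU and a gray value of 1024 maps to 0 HU.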

QCA reference standard input files (for training datasets)

Each directory datasetXX will contain a file reference_QCA.txt that will contain the QCA stenoses detection and quantification reference standard. For each segment of the modified 17-segment AHA model, the QCA diameter stenosis (G) is reported; it is a number between 0 (healthy) and 100 (occluded).
A typical reference_QCA.txt file looks like this:

seg_01 0
seg_02 0
seg_03 34
seg_04 0
seg_17 -1
seg_05 28
seg_06 63
seg_07 100
seg_08 -1
seg_09 0
seg_10 -1
seg_11 0
seg_12 0
seg_13 0
seg_14 37
seg_15 -1
seg_16 58
where -1 refers to segments that have a diameter <1.5mm in either CTA or QCA or that are absent (seg_10, seg_15, seg_17), and to segments that are distal to a complete occlusion (seg_08). These segments are excluded from the evaluation.
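A file in this format could be read roughly as follows. This sketch assumes one whitespace-separated "name value" pair per line, as in the example above; the function names are hypothetical.

```python
EXCLUDED = -1  # marker for segments that are excluded from the evaluation


def parse_reference_qca(text: str) -> dict:
    """Map each segment name (e.g. 'seg_03') to its QCA diameter stenosis."""
    grades = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, value = parts
        grades[name] = float(value)
    return grades


def evaluated_segments(grades: dict) -> dict:
    """Keep only the segments that take part in the evaluation."""
    return {name: g for name, g in grades.items() if g != EXCLUDED}
```

Applied to the example file above, evaluated_segments would drop seg_17, seg_08, seg_10 and seg_15 and keep the remaining 13 segments.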

CTA reference standard input files (for training datasets)

For each dataset, each directory segXX will contain a file reference_CTA.txt that will contain the CTA stenoses detection and quantification reference standard. The files will contain the world position (x-, y-, and z-coordinate) of each path point of the segment centerline, the segment number (between 1 and 17), the stenosis number (between 1 and N, N being the number of stenoses >20%; 0 denotes healthy positions), the stenosis type (0 for no plaque, 1 for soft, 2 for calcified, 3 for mixed plaque), and the category grade G derived from the consensus reading of CTA. Each category is assigned a specific number: healthy (G=0), mild (G=1), moderate (G=2), severe (G=3), occluded (G=4). Every point is on a different line in the file, starting with the most proximal point and ending with the most distal point of the vessel.
A typical reference_CTA.txt file looks like this, with N the number of path points for the considered segment seg_06 of the patient:

x1 y1 z1 6 0 0 0
x2 y2 z2 6 0 0 0
x3 y3 z3 6 0 0 0
x4 y4 z4 6 5 2 G5
x6 y6 z6 6 5 2 G5
... ... ... 6 5 2 G5
x25 y25 z25 6 5 2 G5
x26 y26 z26 6 0 0 0
x27 y27 z27 6 0 0 0
... ... ... 6 0 0 0
x58 y58 z58 6 0 0 0
x59 y59 z59 6 6 1 G6
... ... ... 6 6 1 G6
x74 y74 z74 6 6 1 G6
x75 y75 z75 6 0 0 0
... ... ... 6 0 0 0
xN yN zN 6 0 0 0
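To make the column layout concrete, the sketch below parses such a file into typed records. It assumes seven whitespace-separated columns per line (x, y, z, segment number, stenosis number, stenosis type, grade) with a numeric grade between 0 and 4; the record type and function names are hypothetical.

```python
from typing import NamedTuple

PLAQUE_TYPES = {0: "no plaque", 1: "soft", 2: "calcified", 3: "mixed"}
GRADE_NAMES = {0: "healthy", 1: "mild", 2: "moderate", 3: "severe", 4: "occluded"}


class PathPoint(NamedTuple):
    x: float
    y: float
    z: float
    segment: int
    stenosis: int      # 0 for healthy positions
    plaque_type: int   # key of PLAQUE_TYPES
    grade: int         # key of GRADE_NAMES


def parse_reference_cta(text: str) -> list:
    """Parse path points, ordered from most proximal to most distal."""
    points = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 7:
            continue  # skip blank or malformed lines
        x, y, z = (float(v) for v in parts[:3])
        segment, stenosis, plaque_type, grade = (int(v) for v in parts[3:])
        points.append(PathPoint(x, y, z, segment, stenosis, plaque_type, grade))
    return points


def stenosis_ids(points: list) -> list:
    """Return the sorted stenosis numbers present on this segment."""
    return sorted({p.stenosis for p in points if p.stenosis != 0})
```

In the example above, stenosis_ids would return the two stenosis numbers 5 and 6 found on segment 6.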

Lumen segmentation (for training datasets)

For each segment of the modified 17-segment AHA model that has at least one significant stenosis, as well as for three random segments with non-significant stenoses (ideally one each in the LAD, RCA and LCX), three lumen segmentations (i.e. from three independent and blinded observers) will be provided and stored in the corresponding segXX directory of the patient. Three reference files reference_lumen_obs1.vtk, reference_lumen_obs2.vtk and reference_lumen_obs3.vtk will contain the 3D geometrical model of a healthy or diseased lumen vessel segment. The information will be stored using the VTK polygonal data file format, which stores the 3D points of an object together with their connectivity.

Point files (for submission to Category 2)

If a participant submits to category 2 (semi-automatic), datasetXX will additionally contain:

  • a point_start.txt file in the seg01 and seg05 directories, containing the starting point of the RCA and of the LM artery, respectively.
  • a point_end.txt file in the seg04, seg17, seg08, seg09, seg10, seg12, seg14, seg15 and seg16 directories, containing the ending point of each vessel.

The point_start.txt and point_end.txt files contain three values, corresponding to the x-, y- and z-(world) coordinates of the respective point.

Centerline input files (optional)

If participants wish to use the centerlines extracted using one of the four available centerline extraction methods, they may additionally download our centerline package.
This package contains two folders: one for automatically extracted centerlines (Automatic/) and one for centerlines extracted using some additional manual refinement (ManuallyCorrected/).
Within these folders, each Team_XXXX directory contains a folder for each dataset (datasetXX/), and each dataset directory contains as many vessel folders (vesselXX/) as were detected by the team's algorithm, each with a file named result.txt that holds the path of that vessel.
The result.txt files will contain the world position (x-, y-, and z-coordinate) of each path point. Every point is on a different line in the file, starting with the most proximal point and ending with the most distal point of the vessel. A typical result.txt file looks as follows, with N the number of points of the path.

x1 y1 z1
x... y... z...
xN yN zN

Note: the same centerline extraction method should be used for all the datasets and vessels; it is not allowed to mix centerlines from different extraction packages.
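As an illustration, a centerline file of this form can be loaded and measured as sketched below. The code assumes three whitespace-separated coordinates per line; the function names are hypothetical, and the path length is only shown as an example of what can be derived from the ordered points.

```python
import math


def parse_centerline(text: str) -> list:
    """Read 'x y z' lines into a list of (x, y, z) tuples, proximal to distal."""
    points = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        points.append(tuple(float(v) for v in parts))
    return points


def path_length(points: list) -> float:
    """Sum of Euclidean distances between consecutive path points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))
```

The same parser also works for the point_start.txt and point_end.txt files, which hold a single point with three values.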

Submission format

Participants should upload an archive (.rar, .tar.gz, .tar or .zip files are supported) containing a directory for each of the 18 training datasets (dataset00 to dataset17) and 24 testing datasets (dataset18 to dataset41).

Stenosis detection

Each of the dataset directories should contain a file called stenoses.txt that contains the world position of the x-, y-, and z-coordinate of each stenosis.
A typical stenoses.txt file looks like this, with S the total number of stenoses for this patient:

x1 y1 z1
x... y... z...
xS yS zS

Stenosis detection and quantification

Each of the dataset directories should contain a file called stenoses.txt that contains the world position of the x-, y-, and z-coordinate of each stenosis, the estimated CTA diameter stenosis GCTA, as well as the estimated QCA diameter stenosis GQCA. The grades GCTA and GQCA are a number between 0 (healthy) and 100 (occluded).
A typical stenoses.txt file looks like this, with S the total number of stenoses for this patient:

x1 y1 z1 GCTA1 GQCA1
x... y... z... GCTA... GQCA...
xS yS zS GCTAS GQCAS
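A stenoses.txt file for the detection and quantification task could be produced along these lines. This is a sketch only: the numeric formatting (three decimals for coordinates) is an assumption, since the required precision is not specified, and format_stenoses is a hypothetical helper.

```python
def format_stenoses(stenoses: list) -> str:
    """Format (x, y, z, g_cta, g_qca) tuples as the lines of stenoses.txt.

    Both grades must lie between 0 (healthy) and 100 (occluded).
    """
    lines = []
    for x, y, z, g_cta, g_qca in stenoses:
        for grade in (g_cta, g_qca):
            if not 0 <= grade <= 100:
                raise ValueError(f"grade out of range [0, 100]: {grade}")
        # Assumption: three decimals for world coordinates, compact grades.
        lines.append(f"{x:.3f} {y:.3f} {z:.3f} {g_cta:g} {g_qca:g}")
    return "\n".join(lines) + "\n"
```

The resulting string can then be written to datasetXX/stenoses.txt, for example with pathlib.Path.write_text.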

Stenosis detection and quantification + lumen segmentation

Each of the dataset directories should contain a file called stenoses.txt, as described in the previous paragraph (Stenosis detection and quantification).
Lumen segmentation results should also be submitted in VTK Polygonal data file format. For the testing data, it will be unknown which segments will be evaluated; therefore participants should provide one mesh of the complete coronary tree as datasetXX/lumen_segmentation.vtk.

Reference standard

  1. Coronary artery stenoses detection / quantification

    • Quantitative coronary angiography (QCA) data:

    • The algorithms will be compared to the results of the per-segment QCA analysis.

      The 2D-QCA percentage of diameter stenosis is determined using minimal diameters. QCA involves injection of contrast agents into the artery, followed by X-ray imaging in multiple planes and assessment of arterial lumen diameter. First, on each acquired X-ray imaging plane, the vessel boundaries are delineated. Then, the segments-of-interest (i.e. containing stenoses) are identified. Finally, quantitatively useful information is derived, such as the percent of diameter-stenosis. The percent of diameter-stenosis reflects the degree of vessel narrowing relative to an assumed “normal” vessel diameter immediately adjacent, which serves as the reference diameter, as follows (for each X-ray imaging plane):

      $$ S_d = \left(1 - \frac{d_m}{d_r}\right) \times 100\% $$

      with dm the minimal diameter, and dr the reference “normal” diameter.
      The final percent of diameter-stenosis is the maximal Sd obtained over all the X-ray imaging planes, i.e. the value measured in the worst angiographic view.


      Figure 3 – Quantitative coronary angiography (QCA)
      For each acquired X-ray imaging plane, the minimal luminal diameter (dm) is measured and compared to the diameter of the "normal" vessel immediately adjacent, which serves as the reference diameter (dr). Given the minimal (projected) diameter, the percent stenosis can be calculated.

      One experienced cardiologist (Dr. Koen Nieman, MD, PhD), unaware of the results of CTA, will receive the coronary artery angiograms (CAGs), and will identify and analyze all coronary segments using a modified 17-segment American Heart Association classification on a workstation at the Erasmus Medical Center Rotterdam. Segments will be visually classified as normal (visually <20% narrowing) or as having non-significant or significant coronary obstruction (visually >20% narrowing). The stenoses in segments visually scored as having >20% narrowing will be quantified by a validated QCA algorithm (CAAS, Pie Medical, Maastricht, the Netherlands). Stenoses will be evaluated in the worst angiographic view and classified as significant if the lumen diameter reduction exceeds 50%.

    • Computed tomography coronary angiography (CTA) data:
    • Expert2, Expert3 and Expert4 will visually inspect the CTA images of the 48 patients. If a non-significant (>20% narrowing) or significant (>50% narrowing) stenosis is detected visually, the stenosis position, type and degree (5-grade classification: 0%-19% healthy, 20%-49% mild, 50%-69% moderate, 70%-94% severe, 95%-100% occluded) are reported.

      Given the three observers' grades, a reference standard (RS) is created following the protocol of Figure 4.

      Figure 4 – Protocol to create the CTA stenoses detection and quantification reference standard (RS), given the three observers' grades.

  2. Coronary artery lumen segmentation
    • Computed tomography coronary angiography (CTA) data:

    • For each patient, Expert2, Expert3 and Expert4 will manually annotate the CTA lumen boundary of all diseased segments (i.e. modified 17-AHA segments which contain at least one significant stenosis), as well as 3 healthy segments (i.e. modified 17-AHA segments with either no disease or only non-significant stenoses), distributed over the 3 main arteries. For each segment, the same centerline will be used by the 3 observers.

Evaluation

In this section, we will describe the specific evaluation measures that will be used to rank the different algorithms for:

  1. the detection of coronary artery stenoses.
  2. the quantification of the degree of coronary artery stenoses.
  3. the coronary artery lumen segmentation.

1. Detection of stenoses

Participants report (per patient) the location of all detected lesions. Evaluation will be performed only in vessels with a diameter >1.5mm.

  • QCA reference standard : segment-based analysis

    Each lesion detected by the participants will be assigned to one of the 17-AHA segments; participants do not have to indicate to which segment each lesion belongs.
    A stenosis detected and quantified as being in the “healthy” (0-20%) or “mild” (20-49%) category is assigned to the “non-significant” (NS) detection category.
    A stenosis detected and quantified as being in the “moderate” (50-69%), “severe” (70-94%) or “occluded” (95-100%) category is assigned to the “significant” (S) detection category.
    Then, each of the 17-AHA coronary artery segments will be graded as being:

    • significantly diseased, if at least one significant stenosis (i.e. >50% diameter reduction stenosis) has been detected in the segment.
    • non-significantly diseased, if no stenosis has been detected in the segment, or if only non-significant stenoses (i.e. <50% diameter reduction) have been detected.

  • CTA consensus reference standard : lesion-based analysis
  • The stenoses considered here are the ones from the union of the stenoses detected by the participant and the ones of the CTA reference standard.

The metrics that will be used to evaluate the performance of the detection algorithms are the sensitivity and the positive predictive value (PPV), computed over all patients.
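The two detection metrics can be sketched as follows, using their standard definitions: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The input layout (two parallel lists of booleans, one entry per segment or lesion, pooled over all patients) and the function name are assumptions made for illustration.

```python
def sensitivity_and_ppv(reference: list, detected: list) -> tuple:
    """Compute (sensitivity, PPV) from paired boolean labels.

    reference[i] is True when item i is significantly diseased according to
    the reference standard; detected[i] is True when the method reports it.
    """
    if len(reference) != len(detected):
        raise ValueError("reference and detected must have the same length")
    tp = sum(1 for r, d in zip(reference, detected) if r and d)
    fn = sum(1 for r, d in zip(reference, detected) if r and not d)
    fp = sum(1 for r, d in zip(reference, detected) if d and not r)
    # Undefined ratios (no positives at all) are reported as NaN.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv
```

Pooling all items before computing the ratios is one reading of "over all patients"; averaging per-patient values would be a different choice and can give different numbers.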

 

2. Quantification of the degree of stenosis

Participants report the percentage of lumen diameter reduction for each detected lesion. The stenosis grade is a value in the range [0 . . . 100], where 0 implies no stenosis and 100 implies a fully occluded vessel.

  • QCA reference standard : segment-based analysis, discrete degree

    The lumen diameter reduction of each of the 17-AHA segments is the maximum of the lumen diameter reductions of its lesions.
    The metrics that will be used to evaluate the performance of the quantification algorithms are the average absolute difference and the root mean squared (RMS) difference between the estimated and the reference lumen diameter reduction, computed over all patients.

  • CTA consensus reference standard: lesion-based analysis, 5 grade categories
  • The stenoses considered here are the ones from the union of the stenoses detected by the participant and the ones of the CTA reference standard.
    Each stenosis detected by the participants will be assigned to the corresponding stenosis grade (mild, moderate, severe, occluded).
    The metric that will be used to evaluate the performance of the quantification algorithms is the weighted kappa value, computed over all patients.
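The quantification metrics named above can be sketched in plain Python. The average absolute difference and RMS difference follow their usual definitions. For the weighted kappa, the weighting scheme is not specified here, so linear weights |i - j| / (k - 1) are assumed; all function names are hypothetical.

```python
import math


def mean_absolute_difference(estimated: list, reference: list) -> float:
    """Average of |estimate - reference| over all evaluated segments."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)


def rms_difference(estimated: list, reference: list) -> float:
    """Root mean squared difference over all evaluated segments."""
    squares = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squares) / len(squares))


def weighted_kappa(a: list, b: list, n_categories: int = 5) -> float:
    """Cohen's weighted kappa for integer grades in 0..n_categories-1.

    Assumption: linear disagreement weights |i - j| / (n_categories - 1).
    """
    k, n = n_categories, len(a)
    observed = [[0] * k for _ in range(k)]
    for i, j in zip(a, b):
        observed[i][j] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            num += w * observed[i][j]          # observed disagreement
            den += w * row[i] * col[j] / n     # disagreement expected by chance
    return 1.0 - num / den if den else float("nan")
```

A kappa of 1 indicates perfect agreement with the consensus grades, and 0 indicates agreement no better than chance.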

3. Lumen segmentation

Participants provide a 3D geometrical model of the coronaries. A distinction will be made between healthy and diseased vessel segments. The accuracy of the segmentation will be assessed by comparing the 3D model with a segmentation obtained by averaging the segmentations of the three manual observers, using the following measures:

  • Accuracy measures:
    • Root mean squared (RMS) distance between manual and automatic 3D surfaces.
    • Hausdorff distance between manual and automatic 3D surfaces.
  • Overlap measure:
    Given the 3D geometrical model of the coronaries, the volume overlap (3D Dice coefficient) of the segmentation as compared to the reference standard will be computed.
The manual segmentations will be made for all stenotic lesions and for three other (healthy) segments in the image. Participants are blinded to the specific sections that will be evaluated; therefore, participants should provide one geometric model of the complete coronary tree.
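Two of the measures above can be illustrated on simplified inputs. The sketch assumes that each segmentation has already been converted to a set of voxel indices (for the Dice coefficient) and that each surface is represented by a list of sampled 3D points (for the Hausdorff distance); the conversion from VTK meshes is not shown, and the function names are hypothetical.

```python
import math


def dice_coefficient(a: set, b: set) -> float:
    """Volume overlap 2|A ∩ B| / (|A| + |B|) between two voxel sets."""
    if not a and not b:
        return 1.0  # two empty volumes are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def directed_hausdorff(p: list, q: list) -> float:
    """Largest distance from a point of p to its nearest point of q."""
    return max(min(math.dist(a, b) for b in q) for a in p)


def hausdorff_distance(p: list, q: list) -> float:
    """Symmetric Hausdorff distance between two sampled surfaces."""
    return max(directed_hausdorff(p, q), directed_hausdorff(q, p))
```

This brute-force Hausdorff computation is quadratic in the number of points, and it only approximates the surface-to-surface distance as closely as the surfaces are sampled.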

Ranking

The teams will be ranked based on the performance of their algorithms. For each evaluation measure, the methods will be ranked and each method will be assigned a number ranging from 1 (best) to N (worst), where N is the number of evaluated algorithms. The average rank (over all the performance measures) will determine the final ranking; a rank of 1.0 means that an algorithm performs best for all measures and all subjects/vessels.
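The rank-averaging scheme can be sketched as follows. Tie handling is not specified here, so the code assumes that tied methods share the best available rank; the function names and the input layout are hypothetical.

```python
def rank_methods(scores: dict, higher_is_better: bool = True) -> dict:
    """Rank methods from 1 (best) to N (worst) for a single measure.

    Assumption: methods with equal scores share the same (best) rank.
    """
    ordered = sorted(scores.values(), reverse=higher_is_better)
    return {method: ordered.index(value) + 1 for method, value in scores.items()}


def average_ranks(measures: list) -> dict:
    """Average the per-measure ranks of each method.

    measures is a list of (scores, higher_is_better) pairs that all cover
    the same set of methods.
    """
    per_measure = [rank_methods(scores, hib) for scores, hib in measures]
    methods = measures[0][0].keys()
    return {m: sum(r[m] for r in per_measure) / len(per_measure) for m in methods}
```

Measures where larger is better (e.g. sensitivity) and measures where smaller is better (e.g. RMS difference) are handled through the higher_is_better flag; the method with the lowest average rank is ranked first.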
There will be three different rankings:

  • one ranking for detection of lesions
  • one ranking for detection and quantification of lesions
  • one ranking for lumen segmentation

Detection of lesions ranking
The detection algorithms will be ranked based on the overall sensitivity and PPV (i.e., no distinction between degree, type and location of the stenosis) achieved as compared to CTA (per-lesion) and QCA (per-segment) reference standard, which results in 2x2=4 ranks per method. For each method, an average detection rank will be obtained by averaging the 4 different ranks. The method with the lowest average rank will be ranked first.

Table 4 – Example of detection ranking results.

Detection/Quantification of lesions ranking
The quantification algorithms will be ranked on their average absolute difference and RMS difference of the degree of stenosis (as compared to QCA reference standard, per-segment) and on their weighted kappa coefficient (as compared to CTA consensus reading, per-lesion), which results in another 2+1=3 ranks per method. For each method, an average quantification rank will be obtained by averaging the 3 different ranks. The method with the lowest average rank will be ranked first.

Table 5 – Example of quantification ranking results.

Lumen segmentation ranking
The segmentation algorithms will be ranked based on the overlap, the average absolute radius difference, and the average Hausdorff distance (averaged over the 3 observers' references), while making a distinction between segments having non-significant and significant stenoses. This leads to 3x2=6 ranks.
For each method, a per-patient average will be obtained by averaging the 6 ranks. Each method will be assigned the average of these numbers and these averages will be ranked to get the final method ranking.

Table 6 – Example of lumen segmentation ranking results.