Code

Participants will work with whole slide images (WSIs) in this challenge. WSIs are quite different from, for example, .jpg or .png images. WSIs are stored in a large pyramidal data structure and comprise about 100k x 100k pixels at full resolution. This data structure contains the image at several magnification levels. Level 0 holds the image at full resolution, and each higher level in the pyramid is usually downsampled by a factor of 2 relative to the level below it.
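As a rough illustration of the pyramid described above, the sketch below computes the pixel dimensions of each level. It assumes an exact downsampling factor of 2 and integer division; the function name is hypothetical, and real files may round differently or use other factors.

```python
def pyramid_dimensions(base_width, base_height, num_levels, factor=2):
    """Return (width, height) for each pyramid level, level 0 first."""
    dims = []
    for level in range(num_levels):
        scale = factor ** level  # level 0 -> 1, level 1 -> 2, level 2 -> 4, ...
        dims.append((base_width // scale, base_height // scale))
    return dims


# A 100k x 100k image, as mentioned in the text, with four levels.
dims = pyramid_dimensions(100_000, 100_000, num_levels=4)
for level, (width, height) in enumerate(dims):
    print(f"level {level}: {width} x {height} pixels")
```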

It is infeasible to annotate all pixels at high resolution due to the size of the image. Moreover, a WSI at full resolution is too large to be fed into a conventional convolutional neural network (CNN), given the memory of a regular GPU. Typically, an expert such as a pathologist annotates sub-regions within the WSI. A popular approach is therefore to extract smaller subregions of these annotations, known as 'patches', and use them to train a CNN to automate image analysis tasks like region segmentation or cell detection.
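To make the size argument concrete, the following sketch estimates the uncompressed size of a 100k x 100k RGB image and lists the top-left corners of non-overlapping patches inside an annotated region. The region size, the patch size, and the function name are illustrative assumptions, not values prescribed by the challenge.

```python
def patch_origins(region_width, region_height, patch_size, stride=None):
    """Yield the top-left (x, y) of every patch that fits fully inside the region."""
    stride = stride or patch_size  # default: non-overlapping patches
    for y in range(0, region_height - patch_size + 1, stride):
        for x in range(0, region_width - patch_size + 1, stride):
            yield x, y


# Uncompressed size of a 100k x 100k RGB image at one byte per channel.
full_res_bytes = 100_000 * 100_000 * 3
print(f"full-resolution WSI: {full_res_bytes / 1e9:.0f} GB uncompressed")

# Hypothetical 1024 x 768 annotated region cut into 256 x 256 patches.
origins = list(patch_origins(1024, 768, patch_size=256))
print(len(origins), "patches, first:", origins[0], "last:", origins[-1])
```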

Several Python libraries are available to extract patches from WSIs, such as ASAP and OpenSlide, both covered under Patch Reading.

Within the context of this TIGER challenge, we have developed an open-source Python package specifically tailored to the types of (hybrid) annotations and tasks of this challenge. This package is named wholeslidedata.

Wholeslidedata makes it easy to work with WSIs, extract patches, and extract annotations as shapely-based polygons and NumPy masks. Furthermore, the package contains a batch iterator capable of producing batches of data, including image (x) and label data (y), that can be directly fed into a CNN. The package also comes with various sampling strategies, preprocessing functionalities, and multi-core capabilities that significantly speed up batch/patch sampling. Because wholeslidedata was specifically designed for hybrid annotations such as those provided in the TIGER training set (tissue compartments and cells), it facilitates data handling and model training in this setting. This page provides examples of reading patches and setting up a batch iterator, and links to examples of writing segmentation results in multiresolution image format using wholeslidedata, ASAP, and OpenSlide functionalities.

You can install the latest version of wholeslidedata with:
pip install git+https://github.com/DIAGNijmegen/pathology-whole-slide-data@main


Patch Reading

Wholeslidedata

With the wholeslidedata package you can extract patches from WSIs and open annotation files in several formats. Most interestingly, it includes a batch iterator that can be used to extract batches of patches with corresponding labels for classification, segmentation, and detection. Batches can be extracted via several sampling strategies, and the package uses multiprocessing to speed up batch creation.

>>> from wholeslidedata.image.wholeslideimage import WholeSlideImage
>>> image = WholeSlideImage('path_to_image.tif')
>>> patch = image.get_patch(x, y, width, height, spacing)
# The wholeslidedata package can also open annotation files in several formats:
>>> from wholeslidedata.annotation.wholeslideannotation import WholeSlideAnnotation
>>> wsa = WholeSlideAnnotation('path_to_annotation.xml')
>>> annotations = wsa.select_annotations(x, y, width, height)
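Note that the wholeslidedata snippet above addresses the image by spacing, whereas the ASAP and OpenSlide snippets on this page take a pyramid level. The sketch below shows one way to relate the two. It is not part of any of these libraries: the helper name, the base spacing of 0.5 (assumed to be micrometers per pixel at level 0), and the fixed factor of 2 are all assumptions for illustration.

```python
import math


def level_for_spacing(target_spacing, base_spacing=0.5, factor=2, num_levels=8):
    """Return the pyramid level whose spacing is closest to target_spacing."""
    # Spacing grows by `factor` at every level: 0.5, 1.0, 2.0, 4.0, ...
    spacings = [base_spacing * factor ** level for level in range(num_levels)]
    # Compare in log space so "closest" means closest in magnification.
    return min(
        range(num_levels),
        key=lambda level: abs(math.log(spacings[level] / target_spacing)),
    )


for spacing in (0.5, 2.0, 8.0):
    print(f"spacing {spacing} -> level {level_for_spacing(spacing)}")
```

In practice the base spacing and the per-level downsampling factors should be read from the image itself rather than assumed.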

ASAP

The multiresolutionimageinterface Python package from ASAP can be used to extract patches from WSIs of various vendors. It can also open annotation files created with the ASAP annotation software, and it includes a WSI writing capability. The basic snippet below extracts a patch from a WSI.

>>> import multiresolutionimageinterface as mir
>>> reader = mir.MultiResolutionImageReader()
>>> wsi = reader.open('path_to_image.tif')
>>> patch = wsi.getUCharPatch(x, y, width, height, level)

Openslide

OpenSlide Python is a Python interface to the OpenSlide library. This package also allows for reading patches from WSIs and supports several formats. The snippet below shows how to extract a patch.

>>> from openslide import OpenSlide
>>> wsi = OpenSlide('path_to_image.tif')
>>> patch = wsi.read_region((x, y), level, (width, height))
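One detail that is easy to miss with OpenSlide: `read_region` expects the location in level-0 coordinates, while the size is given in pixels at the requested level. The sketch below converts a position measured at a given level into the arguments for such a call. The helper names are hypothetical and the factor of 2 per level is an assumption; with a real slide, the downsampling factor can be read from `wsi.level_downsamples[level]` instead.

```python
def to_level0(x, y, level, factor=2):
    """Convert (x, y) expressed at `level` into level-0 coordinates."""
    downsample = factor ** level
    return x * downsample, y * downsample


def read_region_args(x, y, level, width, height):
    """Build the (location, level, size) triple in the order read_region takes it."""
    # location: level-0 frame; size: pixels at the requested level.
    return to_level0(x, y, level), level, (width, height)


# A 256 x 256 patch whose top-left corner is at (1000, 500) in level-2 pixels.
args = read_region_args(x=1000, y=500, level=2, width=256, height=256)
print(args)
```

The triple can then be unpacked into the call, for example `wsi.read_region(*args)`. The returned image is an RGBA PIL image, so a conversion such as `.convert('RGB')` is usually needed before feeding it to a CNN.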


Batch Iterator

As mentioned, the wholeslidedata package comes with a BatchIterator that can be used to sample batches that can be fed into a CNN. It needs to be configured with a configuration file (.yml). The basic usage is shown below; refer to the docs for more details.

>>> from tqdm import tqdm
>>> from wholeslidedata.iterators import create_batch_iterator
>>> training_iterator = create_batch_iterator(mode='training',
...                                           user_config='path_to_user_config.yml',
...                                           number_of_batches=10,
...                                           cpus=4)
>>> for x_batch, y_batch in tqdm(training_iterator):
...     pass
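To give an intuition for what a sampling strategy does inside such an iterator, here is a toy, library-independent sketch of label-balanced sampling. It is not the wholeslidedata implementation: the function name, the sample identifiers, and the round-robin scheme are assumptions chosen for illustration, and a real iterator would yield pixel arrays and label masks rather than strings.

```python
import itertools
import random


def balanced_batches(samples_by_label, batch_size, number_of_batches, seed=0):
    """Yield (x_batch, y_batch), cycling over labels so each is drawn equally often."""
    rng = random.Random(seed)
    label_cycle = itertools.cycle(sorted(samples_by_label))
    for _ in range(number_of_batches):
        x_batch, y_batch = [], []
        for _ in range(batch_size):
            label = next(label_cycle)  # round-robin over labels
            x_batch.append(rng.choice(samples_by_label[label]))  # random sample of that label
            y_batch.append(label)
        yield x_batch, y_batch


# Hypothetical patch identifiers grouped by label; note the unequal group sizes.
samples = {"tumor": ["t0", "t1", "t2"], "stroma": ["s0"], "rest": ["r0", "r1"]}
batches = list(balanced_batches(samples, batch_size=6, number_of_batches=10))
for x_batch, y_batch in batches[:2]:
    print(list(zip(x_batch, y_batch)))
```

Sampling by label first and by annotation second keeps rare classes from being swamped by frequent ones, which is one common reason to choose a balanced strategy.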

Please also check the following notebook tutorials, made specifically for the TIGER challenge:


Example algorithm

In TIGER, developed algorithms will be applied to the test set and are expected to fulfill three tasks, namely 1) tissue segmentation, 2) cell detection, and 3) the calculation of a TILs score per WSI. Please also see the evaluation section and the submission requirements section for more details. To help participants set up their inference pipeline, we have prepared a simple example algorithm that performs the three tasks required by the TIGER challenge. In the example, we apply a simple thresholding operation to mimic a segmentation step, since the main purpose of the example algorithm is to show how to write segmentation masks, detection results, and a TILs score in such a way that the evaluation process on grand-challenge can take these outputs and compute the performance. The example algorithm can be found here:
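For readers unfamiliar with thresholding as a stand-in for segmentation, the sketch below shows the idea on a tiny grayscale patch. It is not the code of the example algorithm: the function name, the threshold value, and the label values are assumptions. It relies only on the observation that slide background is typically bright, so darker pixels are treated as tissue.

```python
def threshold_mask(gray_patch, threshold=200, tissue_label=1, background_label=0):
    """Label pixels darker than `threshold` as tissue and the rest as background."""
    return [
        [tissue_label if pixel < threshold else background_label for pixel in row]
        for row in gray_patch
    ]


# Hypothetical 3 x 3 grayscale patch with values in 0-255.
patch = [
    [250, 245, 120],
    [240, 90, 100],
    [255, 251, 80],
]
mask = threshold_mask(patch)
for row in mask:
    print(row)
```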


Baseline Method

Cyril de Kock has created the baseline method for this challenge. This method consists of the following steps:
  • Segmentation of tumor, tumor-stroma, and rest tissue with the HookNet model
  • Based on the segmentation, a tumor bulk is automatically generated
  • Inside the tumor bulk, the tumor-stroma is selected
  • Faster-RCNN is applied in the selected tumor-stroma region inside the tumor bulk
  • The ratio between the detections and the tumor-stroma is used to compute a TILs score
You can find the baseline method here.
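The last step of the list above, a ratio between detections and tumor-stroma, could be sketched as follows. The exact formula of the baseline is not given on this page, so this is only one plausible reading: it assumes every detected cell covers a fixed area (a hypothetical 16 x 16 pixels) and reports the covered fraction of the tumor-stroma area as a percentage.

```python
def tils_score(num_detections, stroma_area_px, cell_area_px=16 * 16):
    """Percentage of the tumor-stroma area covered by detected cells, capped at 100."""
    if stroma_area_px <= 0:
        return 0.0  # no tumor-stroma found: report the lowest score
    covered_px = num_detections * cell_area_px
    return min(100.0, 100.0 * covered_px / stroma_area_px)


# Hypothetical counts: 500 detections in one million tumor-stroma pixels.
score = tils_score(num_detections=500, stroma_area_px=1_000_000)
print(f"TILs score: {score:.1f}")
```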