Image Processing Pipeline

Sends

Receives

tile.raw

Introduction
Functionality
Nodes
Message Types
Python API
Config File

Introduction

The image processing pipeline is primarily written in C++, with some Python bindings and nodes. It uses the Intel Threading Building Blocks library for parallelism and OpenCV for image processing. The overall data-flow is shown in the diagram below.

Functionality

Receive Tile Filepath

Uses: SubscribeRawTileNode

This node receives the metadata and the path to a raw tile to process from the message broker on the tile.raw topic, and passes the filename to the load tile node.

Load Tile

Uses: IMReadNode

This node receives the metadata and filename, and uses OpenCV to load a tile into CPU memory from SSD based storage.

Transfer to GPU

Uses: ToGPUNode

After the tile is loaded into CPU memory, it is transferred into GPU memory for more efficient processing.

Flip

Uses: FlipNodeGPU

On the GPU, the tile is horizontally flipped.

Flat-Field Correction

Uses: FlatFieldNodeGPU

The flat-field correction is applied to the tile using brightfield and darkfield images stored on the SSD. On each execution, the node checks the modification times of the brightfield and darkfield files to determine whether the images need to be reloaded.
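The reload check described above can be sketched in Python as follows. This is a minimal illustration rather than the node's actual C++ implementation: the class name ReloadingFile and its reload_count attribute are hypothetical, and raw bytes stand in for the decoded brightfield or darkfield image.

```python
import os


class ReloadingFile:
    """Caches a file's bytes and reloads them when the file's mtime advances."""

    def __init__(self, path):
        self.path = path
        self._mtime_ns = None  # modification time of the cached copy
        self._data = None
        self.reload_count = 0  # hypothetical counter, only for illustration

    def get(self):
        mtime_ns = os.stat(self.path).st_mtime_ns
        # Reload on first use, or when the file on disk is newer than the cache.
        if self._mtime_ns is None or mtime_ns > self._mtime_ns:
            with open(self.path, "rb") as handle:
                self._data = handle.read()
            self._mtime_ns = mtime_ns
            self.reload_count += 1
        return self._data
```

Calling get() once per tile keeps the cost of an unchanged file to a single stat call, while a replaced calibration image is picked up on the next tile.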

CLAHE

Uses: CLAHENodeGPU

The Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm is applied to the tile. The resulting image is sent to several branches for further analysis: an FFT for calculating the focus score; calculation of the minimum, maximum, and mean pixel values; a histogram; a clone for UI display; and further processing, starting with the lens correction.

Lens Correction

Uses: LensCorrectionNodeGPU

To remove distortion, a lens correction is applied. This is performed using two images stored on the SSD, which are reloaded whenever their file modification times are newer than the copies held in memory. The output is sent to the tile matcher and to the transfer-to-CPU node (a FromGPUNode) for saving to disk.

Transfer to CPU Memory

Uses: FromGPUNode

In order to save the tile, it must first be transferred to CPU memory.

Save Tile

Uses: IMWriteNode

The processed tile can now be saved to disk.

Send Tile Filepath

Uses: PublishFileNode

The filepath of the processed tile can now be sent via the broker to other services on the tile.processed topic.

Tile Matcher

Uses: MatcherNodeGPU

This node performs template matching of each tile with its neighbors. It outputs both a minimap image (a down-sampled overview of the entire montage) and the metadata from the matching itself.

Send Transform

Uses: PublishTransformNode

This node sends the transform of the matched tile along with other metadata to the tile.transform topic via the broker.

Minimap to CPU Memory

Uses: FromGPUNode

Before it can be saved, the minimap must be moved to CPU memory.

Save Minimap

Uses: IMWriteNode

The minimap is saved to disk.

Send Minimap

Uses: PublishFileNode

The path to the minimap can now be sent via the broker on the tile.minimap topic.

Min, Max, Mean

Uses: MinMaxMeanNodeGPU

This node calculates the minimum, maximum, and mean pixel values.

Send Min, Max, Mean

Uses: PublishMinMaxMeanNode

This node sends the minimum, maximum, and mean pixel values to other services via the broker on the tile.statistics.min_max_mean topic.

FFT

Uses: FFTNodeGPU

This node crops out the center of the tile and computes the Fast Fourier Transform.

Focus Score

Uses: FocusNodeGPU

This node uses the FFT data to create a metric for the quality of the focus.

Send Focus Score

Uses: PublishFocusNode

This node sends the focus score to other services via the broker on the tile.statistics.focus topic.

Clone

Uses: CloneNodeGPU

This node copies the image data, since the following node modifies it in-place.

Down-sample

Uses: ResizeNodeGPU

This node down-samples the partially processed tile for UI display.

Down-sampled Image to CPU Memory

Uses: FromGPUNode

The down-sampled image must be moved to CPU memory before it can be saved to disk.

Save JPEG

Uses: IMWriteNode

The down-sampled image should be saved using JPEG compression for the UI.

Send JPEG Filepath

Uses: PublishFileNode

The path to the JPEG compressed image can be published via the broker on the tile.jpeg topic.

Histogram Transfer to CPU Memory

Uses: FromGPUNode

Transfer the image to CPU memory before computing the histogram.

Calculate Histogram

Uses: HistogramNode

A small histogram image is created using this node.

Save Histogram

Uses: IMWriteNode

This node saves the histogram to SSD storage.

Send Histogram

Uses: PublishFileNode

This node sends the filepath to the histogram to other services using the broker on the tile.statistics.histogram topic.

Nodes

Each node can be provided keyword arguments as described below. Many non-input nodes accept a concurrency argument controlling how many copies of that node may be created. Two predefined values are available in the TEM_graph.consts module: serial and unlimited. All nodes should also allow their name to be set using the name keyword argument.

IMReadNode

Language: C++
Input: str_message
Output: mat_message
Arguments: name (string) = IMReadNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; flags (int) = IMREAD_GRAYSCALE: OpenCV flags for configuring the underlying OpenCV IMRead function.

This node uses OpenCV to read an image from disk. The path to the image must be provided in the input message, and the output message contains the image data. Two flags are provided in the TEM_graph.consts module: IMREAD_GRAYSCALE and IMREAD_ANYDEPTH.

IMWriteNode

Language: C++
Input: mat_message
Output: str_message
Arguments: name (string) = IMWriteNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; output_dir (string) = .: The directory to save images to.; extension (string) = .tiff: The file extension to use when saving images.; params (int vector) = [IMWRITE_TIFF_COMPRESSION, 1]: The parameters to pass to the underlying OpenCV IMWrite function.

This node uses OpenCV to write an image to disk. The input message contains the image data, and the output message will contain the path where the image was written. The image will be saved in the output_dir directory with the tile ID from the metadata as the filename and with the extension supplied in the arguments. The params are passed to the underlying OpenCV IMWrite() function to control different options, with the defaults here saving an uncompressed TIFF image. One parameter is defined in TEM_graph.consts, which is IMWRITE_TIFF_COMPRESSION.
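The naming rule described above can be sketched as a small helper. The function name build_output_path is hypothetical and not part of TEM_graph; it only illustrates how output_dir, the tile ID, and the extension are assumed to combine.

```python
from pathlib import Path


def build_output_path(output_dir, tile_id, extension=".tiff"):
    """Return output_dir/<tile_id><extension>, as described for IMWriteNode."""
    return Path(output_dir) / f"{tile_id}{extension}"
```

For example, build_output_path("/tmp/", "tile_0001") yields the path /tmp/tile_0001.tiff on a POSIX system.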

ToGPUNode

Language: C++
Input: mat_message
Output: gpu_mat_message
Arguments: name (string) = ToGPUNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.

This node transfers OpenCV image data from CPU memory to GPU memory for further processing.

FromGPUNode

Language: C++
Input: gpu_mat_message
Output: mat_message
Arguments: name (string) = ToGPUNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.

This node transfers OpenCV image data from GPU memory to CPU memory, for example so that it can be saved to disk.

CloneNode

Language: C++
Input: mat_message
Output: mat_message
Arguments: name (string) = CloneNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.

This node copies the image data to a new location, so that nodes that modify it in-place do not accidentally cause other nodes to receive the modified data.

CloneNodeGPU

Language: C++
Input: gpu_mat_message
Output: gpu_mat_message
Arguments: name (string) = CloneNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.

This node has identical functionality to the CloneNode, but uses GPU processing.

FlipNode

Language: C++
Input: mat_message
Output: mat_message
Arguments: name (string) = FlipNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; axis (int) = horizontal: The axis to flip the image along.

This node flips an image in-place along a provided axis. The TEM_graph.consts module contains three values: horizontal, vertical, and both.
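The three flip modes can be illustrated on a plain list-of-rows image. This sketch makes several assumptions: the axis is passed as a string rather than the integer constants from TEM_graph.consts, horizontal is taken to mean mirroring left to right, and a flipped copy is returned instead of modifying the image in-place as the node does.

```python
def flip(image, axis="horizontal"):
    """Return a flipped copy of a row-major image given as a list of rows."""
    if axis == "horizontal":  # mirror left-to-right (assumed meaning)
        return [row[::-1] for row in image]
    if axis == "vertical":  # mirror top-to-bottom
        return [list(row) for row in image[::-1]]
    if axis == "both":  # mirror in both directions
        return [row[::-1] for row in image[::-1]]
    raise ValueError(f"unknown axis: {axis!r}")
```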

FlipNodeGPU

Language: C++
Input: gpu_mat_message
Output: gpu_mat_message
Arguments: name (string) = FlipNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; axis (int) = horizontal: The axis to flip the image along.

This node has identical functionality to the FlipNode, but uses GPU processing.

ResizeNode

Language: C++
Input: mat_message
Output: mat_message
Arguments: name (string) = FlipNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; scale (double) = 0.5: The scaling to apply to the image.; interpolation (int) = INTER_AREA: The interpolation method to use.

This node resizes the incoming image in-place. The TEM_graph.consts module contains the following interpolation constants: INTER_NEAREST, INTER_LINEAR, INTER_CUBIC, INTER_AREA, INTER_LANCZOS4, INTER_LINEAR_EXACT, and INTER_NEAREST_EXACT.

ResizeNodeGPU

Language: C++
Input: gpu_mat_message
Output: gpu_mat_message
Arguments: name (string) = FlipNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; scale (double) = 0.5: The scaling to apply to the image.; interpolation (int) = INTER_LINEAR: The interpolation method to use.

This node has identical functionality to the ResizeNode, but uses GPU processing.

FlatFieldNode

Language: C++
Input: mat_message
Output: mat_message
Arguments: name (string) = FlatFieldNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; brightfield_path (string) = brightfield.tiff: The path to a brightfield image.; darkfield_path (string) = darkfield.tiff: The path to a darkfield image.

This node performs flat-field corrections using the supplied brightfield and darkfield images. The incoming image is modified in-place. If the brightfield or darkfield images have a modification date newer than when they were loaded, they are automatically reloaded.

FlatFieldNodeGPU

Language: C++
Input: gpu_mat_message
Output: gpu_mat_message
Arguments: name (string) = FlatFieldNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; brightfield_path (string) = brightfield.tiff: The path to a brightfield image.; darkfield_path (string) = darkfield.tiff: The path to a darkfield image.

This node has identical functionality to the FlatFieldNode, but uses the GPU for processing.

CLAHENode

Language: C++
Input: mat_message
Output: mat_message
Arguments: name (string) = CLAHENode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; clipLimit (double) = 2: The threshold for contrast limiting.; tileGridSize (int) = 16: The number of rows and columns the image will be split into.

This node performs Contrast Limited Adaptive Histogram Equalization in-place on incoming tiles.

CLAHENodeGPU

Language: C++
Input: gpu_mat_message
Output: gpu_mat_message
Arguments: name (string) = CLAHENode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; clipLimit (double) = 2: The threshold for contrast limiting.; tileGridSize (int) = 16: The number of rows and columns the image will be split into.

This node has identical functionality to the CLAHENode, but uses the GPU for processing.

MatcherNode

Language: C++
Input: mat_message
Output: mat_message
Arguments: name (string) = MatcherNode: The name for the node.

This node must be run single-threaded. It receives incoming image tiles and fits them to the montage matching the montage_id in the metadata. It outputs a minimap (a down-sampled image of the entire montage), along with the fit metadata. If the montage_id is a zero-length string, the tile is a preview tile and no matching should be performed. In this case, a zero-size image should be output.

MatcherNodeGPU

Language: C++
Input: gpu_mat_message
Output: gpu_mat_message
Arguments: name (string) = MatcherNode: The name for the node.

This node has identical functionality to the MatcherNode, but uses the GPU for processing.

MinMaxMeanNode

Language: C++
Input: mat_message
Output: int_vec_message
Arguments: name (string) = MinMaxMeanNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.

This node outputs a vector of the minimum, maximum, and mean pixel values of the input image.
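As a rough illustration of the output vector, the following sketch computes the three statistics for a list-of-rows image. The function name is hypothetical, and rounding the mean to the nearest integer is an assumption made so that the result fits an integer vector; the document does not say how the node handles a fractional mean.

```python
def min_max_mean(pixels):
    """Return [minimum, maximum, mean] for a non-empty 2-D list of pixel values."""
    flat = [value for row in pixels for value in row]
    if not flat:
        raise ValueError("image has no pixels")
    # The mean is rounded so the result is all integers (an assumption).
    return [min(flat), max(flat), round(sum(flat) / len(flat))]
```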

MinMaxMeanNodeGPU

Language: C++
Input: gpu_mat_message
Output: int_vec_message
Arguments: name (string) = MinMaxMeanNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.

This node has identical functionality to the MinMaxMeanNode, but uses the GPU for processing.

FFTNode

Language: C++
Input: mat_message
Output: mat_message
Arguments: name (string) = FFTNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; dftSize (int) = 256: The pixel width and height of a square to crop out of the center of the image.

The FFT node calculates the magnitude spectrum of a square section of the incoming tile image that is dftSize wide and tall.
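The cropping step can be sketched on a list-of-rows image as shown here. The helper name center_crop is hypothetical, the transform itself is omitted, and raising an error for images smaller than dftSize is an assumption about edge-case behavior.

```python
def center_crop(image, dft_size=256):
    """Return the dft_size x dft_size square at the center of a list-of-rows image."""
    height = len(image)
    width = len(image[0]) if height else 0
    if dft_size > height or dft_size > width:
        raise ValueError("image is smaller than the requested crop")
    # Offsets of the top-left corner of the centered square.
    top = (height - dft_size) // 2
    left = (width - dft_size) // 2
    return [row[left:left + dft_size] for row in image[top:top + dft_size]]
```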

FFTNodeGPU

Language: C++
Input: gpu_mat_message
Output: gpu_mat_message
Arguments: name (string) = FFTNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; dftSize (int) = 256: The pixel width and height of a square to crop out of the center of the image.

This node has identical functionality to the FFTNode, but uses the GPU for processing.

FocusNode

Language: C++
Input: mat_message
Output: float_message
Arguments: name (string) = FFTNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; dftSize (int) = 256: The width and height of the incoming FFT magnitude spectrum.; frequencyStart (int) = 50: The lower bound of spatial frequencies to evaluate for focus.; frequencyEnd (int) = 251: The upper bound of spatial frequencies to evaluate for focus.

This node calculates a focus score from an FFT magnitude spectrum.

FocusNodeGPU

Language: C++
Input: gpu_mat_message
Output: float_message
Arguments: name (string) = FFTNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; dftSize (int) = 256: The width and height of the incoming FFT magnitude spectrum.; frequencyStart (int) = 50: The lower bound of spatial frequencies to evaluate for focus.; frequencyEnd (int) = 251: The upper bound of spatial frequencies to evaluate for focus.

This node has identical functionality to the FocusNode, but uses the GPU for processing.

HistogramNode

Language: C++
Input: mat_message
Output: mat_message
Arguments: name (string) = HistogramNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; bins (int) = 256: The number of bins to use when calculating the histogram.; width (int) = 512: The width of the resulting histogram image.; height (int) = 200: The height of the resulting histogram image.

This node creates a histogram plot as an image.
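The binning that precedes the plot might look like the sketch below. It covers only the counting step, not the rendering to a width by height image, and it assumes non-negative integer pixel values with a known maximum (255 for 8-bit images); the function name and the max_value parameter are hypothetical.

```python
def histogram_counts(pixels, bins=256, max_value=255):
    """Count pixel values into equal-width bins covering 0..max_value."""
    counts = [0] * bins
    for row in pixels:
        for value in row:
            # Map the value to a bin index; clamp so max_value lands in the last bin.
            index = min(value * bins // (max_value + 1), bins - 1)
            counts[index] += 1
    return counts
```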

SubscribeRawTileNode

Language: Python
Output: str_message
Arguments: host (string) = 127.0.0.1: The host of the message broker.; port (int) = 61616: The port to use to connect to the message broker.; username (string) = None: The username to use when connecting to the message broker.; password (string) = None: The password to use when connecting to the message broker.; wait_interval (float) = 0.1: The amount of time to wait in seconds when a new file is not available to process.

PublishFileNode

Language: Python
Input: str_message
Output: str_message
Arguments: service (string) = None: The service name to provide to the broker. Must not be None.; host (string) = 127.0.0.1: The host of the message broker.; port (int) = 61616: The port to use to connect to the message broker.; username (string) = None: The username to use when connecting to the message broker.; password (string) = None: The password to use when connecting to the message broker.; name (string) = PublishFileNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.; topic (string) = None: The topic to publish the file path on. Must not be None.

This node publishes the path to a file (usually the output of an IMWriteNode) using the TEM_comms library. The node outputs the input data without modification.

PublishFocusNode

Language: Python
Input: float_message
Output: float_message
Arguments: service (string) = None: The service name to provide to the broker. Must not be None.; host (string) = 127.0.0.1: The host of the message broker.; port (int) = 61616: The port to use to connect to the message broker.; username (string) = None: The username to use when connecting to the message broker.; password (string) = None: The password to use when connecting to the message broker.; name (string) = PublishFocusNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.

This node publishes the focus score on the tile.statistics.focus topic using the TEM_comms library. The node outputs the input data without modification.

PublishMinMaxMeanNode

Language: Python
Input: int_vec_message
Output: int_vec_message
Arguments: service (string) = None: The service name to provide to the broker. Must not be None.; host (string) = 127.0.0.1: The host of the message broker.; port (int) = 61616: The port to use to connect to the message broker.; username (string) = None: The username to use when connecting to the message broker.; password (string) = None: The password to use when connecting to the message broker.; name (string) = PublishMinMaxMeanNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.

This node publishes the minimum, maximum, and mean pixel values on the tile.statistics.min_max_mean topic using the TEM_comms library. The node outputs the input data without modification.

PublishTransformNode

Language: Python
Input: mat_message
Output: mat_message
Arguments: service (string) = None: The service name to provide to the broker. Must not be None.; host (string) = 127.0.0.1: The host of the message broker.; port (int) = 61616: The port to use to connect to the message broker.; username (string) = None: The username to use when connecting to the message broker.; password (string) = None: The password to use when connecting to the message broker.; name (string) = PublishTransformNode: The name for the node.; concurrency (int) = unlimited: The maximum number of copies of the node to run.

This node publishes the transform and other matching data from the message metadata on the tile.transform topic using the TEM_comms library. The node outputs the input data without modification.

Message Types

All data types are defined as having some metadata, along with data. The common metadata includes the following keys:

tile_id (string): The unique ID of the tile
montage_id (string): The unique ID of the montage
row (int): The row in the montage where the tile is located
column (int): The column in the montage where the tile is located
overlap (int): The number of pixels of overlap between tiles

str_message

This message type consists of metadata and a string. This is often used for storing file paths. It has the following keys:

metadata (Metadata): The message metadata
data (string): The string data

mat_message

This message type consists of metadata and OpenCV image data. It has the following keys:

metadata (Metadata): The message metadata
data (cv::Mat): Image data

gpu_mat_message

This message type consists of metadata and OpenCV image data residing on the GPU. It has the following keys:

metadata (Metadata): The message metadata
data (cv::cuda::GpuMat): Image data on the GPU

float_message

This message type consists of metadata and a single floating point value. It has the following keys:

metadata (Metadata): The message metadata
data (float): A floating point value

int_vec_message

This message type consists of metadata and a vector of integers. It has the following keys:

metadata (Metadata): The message metadata
data (int vector): A vector of integers
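The shared shape of these message types can be modeled in Python with dataclasses. This is a sketch for reasoning about the data, not the real bindings: the class names Metadata and Message, the is_preview property, and the with_data helper are all hypothetical. The preview rule follows the MatcherNode description, where a zero-length montage_id marks a preview tile.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class Metadata:
    tile_id: str
    montage_id: str
    row: int
    column: int
    overlap: int

    @property
    def is_preview(self):
        # A zero-length montage_id marks a preview tile.
        return self.montage_id == ""


@dataclass
class Message(Generic[T]):
    metadata: Metadata
    data: T  # str, image, float, or list of ints depending on the message type


def with_data(message, data):
    """Return a new message carrying `data` and the same metadata."""
    return Message(metadata=message.metadata, data=data)
```

A node that converts one message type to another, such as an image in and a file path out, would then keep the metadata and swap only the data field.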

Python API

Pipelines can be defined and run using a simple Python API. The library for doing so can be imported using,

import TEM_graph

A graph can then be created using,

graph = TEM_graph.graph()

Nodes are accessible in the TEM_graph.nodes submodule. For example, a CLAHE node could be created using,

CLAHE_node = TEM_graph.nodes.CLAHENode(
    clipLimit=3,
)

Alternatively, if this operation should be performed on a GPU, the GPU version of the node could be created,

CLAHE_node = TEM_graph.nodes.CLAHENodeGPU(
    clipLimit=3,
)

Please note: before this node can be used, the image data must be moved to GPU memory using the ToGPUNode.

Once the desired nodes are created, they can be connected together using the TEM_graph.make_edge() function. For example,

TEM_graph.make_edge(to_GPU_node, CLAHE_node)

would connect the output of the to_GPU_node to the input of the CLAHE_node. Once the graph is assembled, input nodes must be activated using the .activate() method of the input node.

After the graph is running, the graph.wait_for_all() function will wait for processing to be complete. Alternatively, the graph.cancel() function can be used to immediately halt data processing.
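Putting these steps together, a small CPU-only pipeline might be wired as in the sketch below. It uses only the calls described in this section and the node arguments listed under Nodes. How nodes become associated with the graph object is not shown in this document, so the constructors are called exactly as in the examples above. The module is passed in as a parameter, which is a choice made for this sketch so the wiring can be exercised with a stand-in object; the service name publish_processed is hypothetical.

```python
def build_and_run(tem):
    """Wire and run a small pipeline.

    `tem` is the imported TEM_graph module (or a stand-in exposing the same names).
    """
    graph = tem.graph()

    # Create the nodes; unspecified arguments keep their documented defaults.
    subscribe = tem.nodes.SubscribeRawTileNode()
    read = tem.nodes.IMReadNode()
    clahe = tem.nodes.CLAHENode(clipLimit=3)
    write = tem.nodes.IMWriteNode(output_dir="/tmp/")
    publish = tem.nodes.PublishFileNode(
        service="publish_processed",  # hypothetical service name
        topic="tile.processed",
    )

    # Connect each node's output to the next node's input.
    tem.make_edge(subscribe, read)
    tem.make_edge(read, clahe)
    tem.make_edge(clahe, write)
    tem.make_edge(write, publish)

    # Start the input node, then block until processing is complete.
    subscribe.activate()
    graph.wait_for_all()
    return graph
```

With the library installed, this would be called as build_and_run(TEM_graph) after import TEM_graph.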

Config File

To further streamline pipeline creation, a YAML configuration file format was created. As an example, the following pipeline could be created using a configuration file.

The configuration file to create this pipeline is:

nodes: # All nodes must be defined under this key
  input: # This is the name of the first node
    type: SubscribeRawTileNode # This is the name of the class that instantiates the node
    to: read # The name of the node which should receive the output of this node
  read:
    type: IMReadNode
    to: # Multiple node names can also be specified
      - histogram
      - min_max_mean
  histogram:
    type: HistogramNode
    to: save
  save:
    type: IMWriteNode
    args: # Keyword arguments can be defined that will be provided to the node when it is initialized
      output_dir: /tmp/
    to: send_histogram
  send_histogram:
    type: PublishFileNode
    args:
      service: publish_histogram
      topic: tile.statistics.histogram
    # The 'to' key is not required
  min_max_mean:
    type: MinMaxMeanNode
    to: send_min_max_mean
  send_min_max_mean:
    type: PublishMinMaxMeanNode
    args:
      service: publish_min_max_mean

Once a pipeline configuration file has been written, a command line utility can be used to run the pipeline:

TEM_graph pipeline.yaml
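To make the meaning of the to key concrete, the sketch below walks an already-parsed configuration (a Python dict with the same structure as the YAML above) and lists the edges it describes. The function config_edges is hypothetical and is not how the command line utility is known to work; it only illustrates that to may be a single name, a list of names, or absent.

```python
def config_edges(config):
    """Yield (source, target) node-name pairs from a parsed pipeline config."""
    nodes = config["nodes"]
    for name, spec in nodes.items():
        targets = spec.get("to") or []  # the 'to' key is optional
        if isinstance(targets, str):
            targets = [targets]  # a single name is treated as a one-item list
        for target in targets:
            if target not in nodes:
                raise KeyError(f"{name!r} points at undefined node {target!r}")
            yield name, target
```

Each yielded pair corresponds to one TEM_graph.make_edge() call between the two named nodes.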