Seeing Through the Noise: AI Improves Tumor Segmentation with Dynamic Imaging

Author: Denis Avetisyan


A new framework leverages the power of contrast-enhanced imaging and representation disentanglement to deliver more accurate and robust tumor analysis, even with incomplete data.

The research introduces TARDis, a segmentation framework designed to overcome the limitations of conventional U-Net architectures and existing incomplete-modality methods. It processes a flexible number of input modalities ($N$) through a shared encoder, then disentangles the information into modal-agnostic anatomical representations and modal-specific temporal features before decoding, a strategy intended to improve robustness and adaptability beyond systems that rely on fixed modality inputs and random masking techniques.

TARDis utilizes a conditional variational autoencoder to separate static anatomical features from dynamic time-attenuation curves, enhancing multi-modal tumor segmentation and classification.

Accurate tumor segmentation and diagnosis via multi-phase contrast-enhanced CT is often hindered by clinical limitations leading to incomplete data acquisition. To address this, we introduce TARDis (Time Attenuated Representation Disentanglement for Incomplete Multi-Modal Tumor Segmentation and Classification), a novel physics-aware framework that models missing imaging phases as points on a continuous time-attenuation curve. By explicitly disentangling static anatomical features from dynamic perfusion characteristics, TARDis effectively hallucinates missing data and achieves robust performance even with significant data sparsity. Could this approach pave the way for reduced radiation exposure in cancer screening while maintaining diagnostic precision?


The Inevitable Gaps: Why Complete Data is a Myth

Precise tumor segmentation using computed tomography (CT) scans is fundamental to both accurate diagnoses and effective treatment planning, yet the process is frequently compromised by incomplete data acquisition. Clinical realities – such as limitations in scan time, patient discomfort, or the need to minimize radiation exposure – often result in missing or sparsely sampled CT phases. This incomplete data hinders the ability to fully characterize tumor morphology, size, and internal structure, all of which are vital for delineating the tumor boundary with the necessary precision. Consequently, clinicians face challenges in making informed decisions regarding treatment strategies, potentially leading to suboptimal outcomes; the accuracy of radiation therapy planning, for instance, is directly dependent on a reliable tumor delineation derived from complete CT data.

Clinical realities often limit the acquisition of complete multi-phase computed tomography (CT) scans, presenting a considerable obstacle to accurate tumor segmentation. Traditional image analysis techniques, reliant on consistent data across all phases, falter when faced with missing information, leading to imprecise boundary delineations of cancerous tissues. This incompleteness introduces uncertainty into volumetric measurements and hinders precise assessment of tumor characteristics, such as growth rate or response to therapy. Consequently, clinicians may base critical treatment decisions – including radiation planning and surgical approaches – on potentially flawed segmentations, raising concerns about suboptimal patient care and necessitating the development of robust methods capable of handling incomplete CT data.

Multi-phase computed tomography (CT) captures the changing enhancement patterns of tissues over time, revealing crucial dynamic information for accurate tumor segmentation. This temporal dimension is particularly important for differentiating cancerous lesions from surrounding healthy tissue, as tumors often exhibit distinct contrast uptake and washout characteristics. However, incomplete data acquisition – a common occurrence due to clinical time constraints or patient factors – severely compromises the ability to leverage these dynamic features. The resulting loss of temporal context diminishes the effectiveness of segmentation algorithms, leading to inaccurate tumor boundaries and potentially impacting critical treatment planning decisions. Consequently, developing robust methods capable of effectively handling and reconstructing missing dynamic information remains a significant challenge in the field of medical image analysis.

Analysis of CT datasets reveals varying combinations of the imaging modalities non-contrast (N), arterial (A), portal venous (V), and delayed (D) across the ChangHai and KiTS19 datasets.

TARDis: Accepting the Inevitable, Rebuilding the Signal

TARDis addresses limitations in CT image segmentation caused by incomplete data acquisition by decomposing the observed signal into static and dynamic components. The static component represents the underlying anatomical structure, while the dynamic component captures the time-varying contrast enhancement due to iodine uptake. By explicitly modeling and separating these components, TARDis reduces reliance on complete phase coverage; segmentation can be performed even with missing or incomplete data phases because the static anatomical representation remains robust. This disentanglement allows the system to effectively infer missing information and maintain accurate segmentation boundaries, improving performance in scenarios with sparse or irregular sampling patterns.

TARDis addresses the challenges posed by incomplete CT data acquisition by explicitly modeling the time-attenuation curve of contrast agents. This curve, representing the change in X-ray attenuation over time post-injection, is separated from the underlying baseline anatomy. By isolating and modeling this temporal signal, TARDis reduces reliance on complete phase coverage; segmentation performance is maintained even with missing or incomplete phases because the system can infer the expected attenuation values based on the modeled curve, rather than requiring direct observation of all time points. This decoupling of temporal dynamics from static anatomy improves robustness and reduces sensitivity to data gaps.
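The idea of separating a static baseline from a time-varying enhancement signal can be illustrated numerically. The sketch below uses a peak-normalised gamma-variate curve, a common textbook model for contrast enhancement; the paper does not specify this functional form, and all parameter values (arrival time, time-to-peak, peak enhancement, phase times) are illustrative assumptions, not values from the study.

```python
import numpy as np

def enhancement(t, t0=10.0, tp=25.0, alpha=3.0, peak_hu=120.0):
    # Peak-normalised gamma-variate: zero before contrast arrival t0,
    # rising to peak_hu at t = t0 + tp, then washing out. Illustrative only.
    dt = np.maximum(t - t0, 0.0)
    x = dt / tp
    return peak_hu * (x ** alpha) * np.exp(alpha * (1.0 - x))

baseline_hu = 40.0                              # static anatomy (unenhanced attenuation)
times = np.array([0.0, 35.0, 70.0, 180.0])      # hypothetical N, A, V, D phase times (s)
observed = baseline_hu + enhancement(times)     # what each acquired phase would measure
```

Under this decomposition, a missing phase is simply an unobserved point on the curve: if the baseline and curve parameters can be estimated from the phases that were acquired, the attenuation at any other time can be inferred rather than measured.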

The TARDis architecture employs a shared encoder and embedding dictionary to maximize data efficiency in anatomical feature extraction and reconstruction. The shared encoder reduces the number of trainable parameters by processing input CT phases with a single feature extraction pathway. The embedding dictionary, a learned lookup table, then maps these encoded features to a compact anatomical representation. This allows the model to generalize from limited training data and effectively reconstruct complete anatomical features even with incomplete or sparsely sampled time series, minimizing the need for extensive datasets and computational resources. The learned embedding space facilitates the reconstruction of missing phases by interpolating within the anatomical feature space, rather than directly from raw image data.
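A learned embedding dictionary of this kind can be sketched as a vector-quantization-style codebook lookup: each encoded feature vector is replaced by its nearest entry in a learned table. The codebook size, feature dimension, and the Euclidean nearest-neighbour rule below are assumptions for illustration; the paper's exact lookup mechanism may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 16))   # 64 hypothetical anatomical embeddings, dim 16

def quantize(features, codebook):
    # Map each feature vector (rows of `features`) to its nearest codebook
    # entry under squared Euclidean distance, returning the quantized
    # vectors and the selected indices.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

feats = rng.normal(size=(5, 16))
quantized, idx = quantize(feats, codebook)
```

Because the quantized output always lies in the learned anatomical space, a decoder conditioned on it sees a consistent representation regardless of which input phases were actually acquired.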

TARDis leverages a shared encoder to process input volumes, combining a modal-agnostic path that queries an embedding dictionary with a modal-specific path that uses a Conditional Variational Autoencoder to regress relative time and reconstruct dynamic features; a U-Net decoder then aggregates these representations into the final output.

Deconstructing the Signal: A Generative Approach

TARDis employs a Conditional Variational Autoencoder (CVAE) for the reconstruction of dynamic features within medical imaging data. This CVAE architecture is specifically designed to generate representations of temporal changes while being conditioned on both anatomical context and time. The training process leverages KL Divergence minimization, a technique that encourages the learned latent space distribution to remain close to a prior distribution, thereby promoting generalization and stable reconstructions. By conditioning the generative process on anatomy and time, the CVAE learns to predict how features evolve over time based on the underlying anatomical structure, allowing for the synthesis of plausible dynamic patterns.
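The KL-divergence term mentioned above has a closed form when, as is standard in (C)VAE training, the approximate posterior is a diagonal Gaussian and the prior is a standard normal. The sketch below computes that standard regularizer; it reflects common VAE practice rather than implementation details confirmed by the paper.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over
    # latent dimensions. Minimizing it pulls the learned latent
    # distribution toward the prior, stabilizing generation.
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)
```

The term vanishes exactly when the posterior matches the prior (zero mean, unit variance) and grows as the encoder's output drifts away from it, which is what encourages the stable, generalizable reconstructions described above.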

Disentanglement Loss within the TARDis framework operates by minimizing the correlation between static anatomical features and dynamic, time-varying features during the reconstruction process. This is achieved through a specifically designed loss function that penalizes the presence of static information within the dynamic latent space and vice versa. By enforcing this separation, the model learns to represent anatomical structure and temporal changes as independent components, leading to more accurate reconstruction of dynamic features at specific time points. This approach improves the model’s ability to generalize to unseen data and reduces artifacts that might arise from confounding static and dynamic information during the reconstruction process.
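One minimal way to penalize correlation between two feature sets, in the spirit of the disentanglement loss described above, is to standardize each feature over the batch and penalize the entries of the cross-correlation matrix. This is a generic sketch of the idea, not the paper's actual loss function, whose exact form is not given here.

```python
import numpy as np

def cross_correlation_penalty(static, dynamic, eps=1e-8):
    # Standardize each feature column over the batch, form the
    # cross-correlation matrix between the two sets, and return the mean
    # squared entry. The penalty is near zero when the static and dynamic
    # batches are empirically uncorrelated.
    s = (static - static.mean(0)) / (static.std(0) + eps)
    d = (dynamic - dynamic.mean(0)) / (dynamic.std(0) + eps)
    c = s.T @ d / len(s)
    return float((c ** 2).mean())
```

Driving this penalty toward zero pushes the model to encode anatomy and temporal dynamics in statistically independent channels, which is the separation the framework relies on when reconstructing features at unobserved time points.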

Training of the TARDis framework employs both Cross-Entropy Loss and the Dice Similarity Coefficient to maximize segmentation performance. Cross-Entropy Loss functions as a pixel-wise classification loss, while the Dice Similarity Coefficient directly optimizes for overlap between predicted segmentations and ground truth annotations. In single-modality evaluation scenarios, this combined loss function has yielded a Segmentation Dice score of up to 0.86, indicating a high degree of accuracy in identifying and delineating anatomical structures. The Dice score, calculated as $2|X \cap Y| / (|X| + |Y|)$, quantifies the similarity between the predicted segmentation ($X$) and the ground truth ($Y$).
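The Dice formula quoted above translates directly into code for binary masks. This is the standard definition, with a small epsilon (an implementation convenience, not from the paper) to avoid division by zero on empty masks.

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    # Dice similarity coefficient 2|X ∩ Y| / (|X| + |Y|) for boolean masks:
    # 1.0 for perfect overlap, 0.0 for disjoint regions.
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))
```

For example, two masks that each cover two pixels but share only one score 0.5, while identical masks score 1.0, matching the reported ceiling of 0.86 as a high but imperfect overlap.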

t-SNE plots demonstrate that disentangling features through separate reconstruction branches and the application of ranking and deformation loss constraints effectively separates dynamic and static features, and further clusters multi-modal dynamic features.

Beyond the Horizon: Embracing Multi-Modality and the Inevitable Next Framework

The architecture of TARDis is intentionally designed not as a rigid, standalone system, but as a flexible foundation for incorporating advancements in multi-modal learning. This allows seamless integration with state-of-the-art techniques such as RFNet, known for its efficient feature extraction, and the transformer-based mmFormer, which excels at capturing long-range dependencies between data modalities. Furthermore, TARDis readily accommodates more complex models like M3AE, facilitating deeper multi-modal feature learning, and M2FTrans, which leverages transformers for enhanced data fusion. By providing a versatile framework, TARDis encourages the exploration and implementation of novel approaches, ensuring its continued relevance as the field of multi-modal analysis evolves and new techniques emerge – until, of course, the next revolutionary framework arrives.

The incorporation of Magnetic Resonance Imaging (MRI) data represents a significant advancement in the precision of image segmentation, bolstering the robustness of the TARDis framework. By leveraging the detailed anatomical information provided by MRI scans, the system achieves a more nuanced understanding of complex structures, leading to improved delineation of boundaries and enhanced accuracy in identifying regions of interest. This integration proves particularly valuable in medical imaging applications, where precise segmentation is crucial for diagnosis, treatment planning, and monitoring disease progression. The enhanced accuracy facilitated by MRI data ultimately translates to more reliable and clinically relevant results, offering the potential for improved patient outcomes and more efficient healthcare delivery.

A significant advancement offered by TARDis lies in its capacity to generate reliable results even with incomplete imaging data, a capability poised to revolutionize clinical workflows. By requiring less comprehensive scans, patient comfort is markedly improved and scan times are substantially reduced – critical factors in broader accessibility and efficient healthcare delivery. Validation across multiple datasets demonstrates the efficacy of this approach; TARDis achieves an average Screening AUC of 0.979 on the Changhai dataset, indicating a high degree of accuracy in initial assessments. Furthermore, the system exhibits robust segmentation performance, reaching a Dice score of 0.825 on the C4KC-KiTS dataset and 0.860 for Whole Tumor Segmentation on the challenging BraTS18 dataset, solidifying its potential as a valuable diagnostic tool – a tool, naturally, that will one day be superseded.

Performance metrics on the BraTS18 dataset reveal that different combinations of modalities, as detailed in Table 4, significantly impact overall results.

The pursuit of elegant solutions in medical imaging invariably meets the harsh realities of clinical practice. This TARDis framework, with its attempt to disentangle static anatomy from dynamic contrast enhancement, feels…familiar. It’s a clever approach to handling incomplete multi-modal data, aiming for robustness where simpler methods falter. As David Marr observed, “Representation is the key.” This paper is, at its core, another attempt to represent complex biological realities in a way a machine can understand, hoping to avoid the inevitable pitfalls when production data deviates from idealized conditions. One can’t help but suspect that even successful disentanglement will eventually reveal new, unforeseen sources of error. Everything new is just the old thing with worse docs.

What’s Next?

The TARDis framework, with its attempt to separate the unchanging from the fleeting in tumor imaging, represents a familiar pattern. Elegant disentanglement is always appealing, until production data arrives. The promise of robustness against missing modalities feels less revolutionary upon encountering the sheer variety of acquisition failures, reconstruction artifacts, and operator inconsistencies that characterize real-world clinical scans. One suspects the ‘time-attenuation curve’ will prove more sensitive to scanner calibration than to subtle tumor characteristics.

Future work will inevitably focus on expanding the modalities integrated into this architecture-more data streams mean more opportunities for misalignment and more parameters to tune. The current emphasis on contrast enhancement is understandable, but ignores the fact that many clinical protocols minimize contrast agent use. A truly robust system will need to perform well with minimal input, not just when everything is optimal.

It is reasonable to anticipate that the disentangled representations, while theoretically appealing, will ultimately serve as expensive ways to complicate existing feature engineering pipelines. If code looks perfect, no one has deployed it yet. The true test of TARDis, and frameworks like it, will not be its performance on curated datasets, but its ability to survive the inevitable onslaught of edge cases and unforeseen data pathologies.


Original article: https://arxiv.org/pdf/2512.04576.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2025-12-07 18:41