Multimodal Data Integration
Alignment of cellular profiles from two different modalities
3 datasets · 5 methods · 2 control methods · 2 metrics
Cellular function is regulated by the complex interplay of different types of biological molecules (DNA, RNA, proteins, etc.), which determine the state of a cell. Several recently described technologies allow for simultaneous measurement of different aspects of cellular state. For example, sci-CAR jointly profiles RNA expression and chromatin accessibility in the same cell, and CITE-seq measures surface protein abundance and RNA expression from each cell. These technologies enable us to better understand cellular function; however, such datasets are still rare, and profiling multiple modalities at once comes with tradeoffs.
Joint methods can be more expensive, lower in throughput, or noisier than measuring a single modality at a time. It is therefore useful to develop methods capable of integrating measurements of the same biological system obtained with different technologies on different cells.
Here the goal is to learn a latent space in which cells profiled by different technologies in different modalities are matched if they have the same state. We use jointly profiled data as ground truth, so that we can evaluate whether observations of the same cell acquired in different modalities end up close together. In a perfect result, each pair of observations shares the same coordinates in the latent space.
Results
Results table of the scores per method, dataset, and metric (after scaling). The “Overall mean” dataset is the mean value across all datasets.
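The “Overall mean” entry is a plain average over datasets. A minimal Python sketch, assuming each result record is a dict with the hypothetical fields `dataset_id`, `method_id`, and `scaled_scores` (a mapping from metric ID to scaled score); the IDs and scores in the usage lines are made up for illustration:

```python
from collections import defaultdict
from statistics import mean


def overall_mean(results):
    """Average each (method, metric) scaled score across all datasets.

    Assumes hypothetical record keys 'dataset_id', 'method_id' and
    'scaled_scores' ({metric_id: score}).
    """
    scores = defaultdict(list)
    for record in results:
        for metric_id, score in record["scaled_scores"].items():
            scores[(record["method_id"], metric_id)].append(score)
    # One mean per (method, metric) pair, pooled over datasets
    return {key: mean(values) for key, values in scores.items()}


# Usage with made-up scores for one method on two datasets
results = [
    {"dataset_id": "dataset_a", "method_id": "method_x", "scaled_scores": {"knn_auc": 0.2}},
    {"dataset_id": "dataset_b", "method_id": "method_x", "scaled_scores": {"knn_auc": 0.6}},
]
print(overall_mean(results))
```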
Dataset info
CITE-seq Cord Blood Mononuclear Cells
8k cord blood mononuclear cells sequenced by CITE-seq, a multimodal addition to the 10x scRNA-seq platform that allows simultaneous measurement of RNA and protein (Stoeckius et al. 2017).
sciCAR Cell Lines
5k cells from a time-series of dexamethasone treatment sequenced by sci-CAR, a combinatorial indexing-based co-assay that jointly profiles chromatin accessibility and mRNA (Cao et al. 2018).
sciCAR Mouse Kidney
11k cells from adult mouse kidney sequenced by sci-CAR, a combinatorial indexing-based co-assay that jointly profiles chromatin accessibility and mRNA (Cao et al. 2018).
Method info
Harmonic Alignment (log scran)
Repository · Source Code · Container · v1.0.0
Harmonic alignment embeds cellular data from each modality into a common space by computing a mapping between the 100-dimensional diffusion maps of each modality. The mapping is an isometric transformation of the eigenmaps, and the resulting diffusion maps are concatenated into a joint 200-dimensional space. This joint diffusion map space is used as output for the task (Stanley et al. 2020).
Harmonic Alignment (sqrt CP10k)
Repository · Source Code · Container · v1.0.0
Harmonic alignment embeds cellular data from each modality into a common space by computing a mapping between the 100-dimensional diffusion maps of each modality. The mapping is an isometric transformation of the eigenmaps, and the resulting diffusion maps are concatenated into a joint 200-dimensional space. This joint diffusion map space is used as output for the task (Stanley et al. 2020).
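Harmonic alignment starts from a diffusion map of each modality. The NumPy sketch below covers only that first step; the step that relates the two eigenbases to each other is not shown. It assumes a dense Gaussian kernel with a single global bandwidth (the median squared distance), and the function name `diffusion_map` is hypothetical:

```python
import numpy as np


def diffusion_map(X, n_components=100, t=1, epsilon=None):
    """Diffusion map of the rows (cells) of X.

    Assumptions: dense Gaussian kernel with one global bandwidth `epsilon`
    (median squared distance by default). Returns the first `n_components`
    non-trivial diffusion coordinates.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)  # squared distances
    if epsilon is None:
        epsilon = np.median(d2)
    K = np.exp(-d2 / epsilon)
    d = K.sum(axis=1)
    # Symmetric conjugate of the Markov matrix P = D^-1 K: same eigenvalues,
    # but it lets us use the symmetric solver np.linalg.eigh
    A = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(d)[:, None]  # right eigenvectors of P
    # Skip the trivial constant eigenvector (eigenvalue 1)
    keep = slice(1, n_components + 1)
    return psi[:, keep] * eigvals[keep] ** t


rng = np.random.default_rng(0)
cells = rng.normal(size=(300, 50))  # stand-in for one modality's normalised data
embedding = diffusion_map(cells, n_components=100)
print(embedding.shape)  # (300, 100)
```

Working with the symmetric matrix rather than `P` directly is a common design choice: it gives real eigenvalues and orthogonal eigenvectors, which are then mapped back to the eigenvectors of `P`.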
Mutual Nearest Neighbors (log CP10k)
Repository · Source Code · Container · v1.0.0
Mutual nearest neighbors (MNN) embeds cellular data from each modality into a common space by computing a mapping between modality-specific 100-dimensional SVD embeddings. The embeddings are integrated using the FastMNN version of the MNN algorithm, which generates an embedding of the second modality mapped to the SVD space of the first. This corrected joint SVD space is used as output for the task (Haghverdi et al. 2018).
Mutual Nearest Neighbors (log scran)
Repository · Source Code · Container · v1.0.0
Mutual nearest neighbors (MNN) embeds cellular data from each modality into a common space by computing a mapping between modality-specific 100-dimensional SVD embeddings. The embeddings are integrated using the FastMNN version of the MNN algorithm, which generates an embedding of the second modality mapped to the SVD space of the first. This corrected joint SVD space is used as output for the task (Haghverdi et al. 2018).
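The core MNN idea can be sketched in a few lines of NumPy. This is a simplified, brute-force version in the spirit of Haghverdi et al. (Gaussian-weighted averaging of correction vectors), not FastMNN itself. It assumes both embeddings already have the same number of dimensions with comparable coordinates and that at least one MNN pair exists; the function names are hypothetical:

```python
import numpy as np


def mnn_pairs(A, B, k=20):
    """Return (i, j) pairs where B[j] is among the k nearest neighbours of A[i]
    in B and A[i] is among the k nearest neighbours of B[j] in A."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)  # |A| x |B|
    nn_in_b = np.argsort(d2, axis=1)[:, :k]    # k nearest B rows for each A row
    nn_in_a = np.argsort(d2, axis=0)[:k, :].T  # k nearest A rows for each B row
    return [(i, int(j)) for i in range(A.shape[0]) for j in nn_in_b[i] if i in nn_in_a[j]]


def mnn_correct(A, B, k=20, sigma=1.0):
    """Move B towards A using Gaussian-weighted averages of MNN correction vectors.

    Simplified sketch: assumes at least one MNN pair exists.
    """
    pairs = mnn_pairs(A, B, k)
    ia = np.array([i for i, _ in pairs])
    jb = np.array([j for _, j in pairs])
    vectors = A[ia] - B[jb]  # one correction vector per MNN pair
    # Weight each pair, for every B cell, by a Gaussian on the distance to the
    # pair's B-side cell; subtracting the row minimum keeps weights from underflowing
    d2 = np.sum((B[:, None, :] - B[jb][None, :, :]) ** 2, axis=2)
    d2 -= d2.min(axis=1, keepdims=True)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    return B + w @ vectors
```

Smoothing the correction vectors over nearby cells, rather than applying each pair's vector only to its own cell, lets cells without an MNN partner still be corrected.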
Procrustes superimposition
Repository · Source Code · Container · v1.0.0
Procrustes superimposition embeds cellular data from each modality into a common space by aligning their 100-dimensional SVD embeddings with the translation, rotation, and uniform scaling that minimize the root mean squared distance between paired points. The unmodified SVD embedding of the first modality and the transformed embedding of the second modality are used as output for the task (Gower 1975).
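For two embeddings whose rows are paired cells, the optimal alignment has a closed form via one SVD. A minimal NumPy sketch (the function name is hypothetical) that keeps the first embedding fixed and transforms only the second, as described above; like the classical formulation, it allows reflections:

```python
import numpy as np


def procrustes_align(X, Y):
    """Align Y to X (rows = paired cells) with translation, an orthogonal
    transform and uniform scaling, minimising squared distances between rows."""
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    # The orthogonal R maximising trace(R^T Yc^T Xc) is U V^T from the SVD of Yc^T Xc
    U, S, Vt = np.linalg.svd(Yc.T @ Xc)
    R = U @ Vt
    # Optimal uniform scale for the rotated, centred Y
    s = S.sum() / np.sum(Yc ** 2)
    # Map Y into X's coordinate frame; X itself is left unmodified
    return s * Yc @ R + mu_x
```

SciPy's `scipy.spatial.procrustes` performs a related computation but standardizes both inputs, so it also rescales the first embedding.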
Control method info
Random Features
Repository · Source Code · Container · v1.0.0
A 20-dimensional SVD is computed on the first modality and then randomly permuted twice, once for each modality's output, producing random features with no correlation between modalities (Open Problems for Single Cell Analysis Consortium 2022).
True Features
Repository · Source Code · Container · v1.0.0
A 20-dimensional SVD is computed on the first modality, and this same embedding is used as output for both modalities, producing perfectly aligned features (Open Problems for Single Cell Analysis Consortium 2022).
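Both control methods can be sketched with NumPy. The function names are hypothetical, the SVD is taken without centring, and the random permutation is assumed to shuffle cells (rows), which is what breaks the pairing between the two outputs:

```python
import numpy as np


def svd_embedding(X, n_components=20):
    """Rank-`n_components` SVD scores (U * S) of X, without centring (an assumption)."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def true_features(X_mode1, n_components=20):
    """Positive control: the same embedding is returned for both modalities."""
    emb = svd_embedding(X_mode1, n_components)
    return emb, emb.copy()


def random_features(X_mode1, n_components=20, seed=0):
    """Negative control: two independent row (cell) permutations of the embedding,
    so matched rows no longer correspond between the two outputs."""
    rng = np.random.default_rng(seed)
    emb = svd_embedding(X_mode1, n_components)
    return rng.permutation(emb), rng.permutation(emb)
```

Because both controls reuse one embedding, they bracket the achievable range: the true features score as well as possible on every metric, and the random features score at chance.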
Metric info
kNN Area Under the Curve
Let f(i) ∈ F be the scRNA-seq measurement of cell i, and g(i) ∈ G be the scATAC-seq measurement of cell i. kNN-AUC calculates the average percentage overlap between the nearest-neighbor neighborhood of f(i) in F and the nearest-neighbor neighborhood of g(i) in G. Higher is better (Stanley et al. 2020).
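A brute-force NumPy sketch of the overlap computation, using the F and G notation above (row i of each array is cell i). How the reference implementation chooses the range of neighborhood sizes and integrates the curve is not stated here, so this sketch assumes k = 1..`max_k` and a unit-spaced area normalized to [0, 1]; the function names are hypothetical:

```python
import numpy as np


def knn_indices(Z, k):
    """Indices of the k nearest neighbours of each row of Z (excluding itself)."""
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]


def knn_auc(F, G, max_k=10):
    """Mean neighbourhood overlap between F and G, averaged over k = 1..max_k.

    Row i of F and row i of G must be the two measurements of cell i.
    """
    nn_f, nn_g = knn_indices(F, max_k), knn_indices(G, max_k)
    curve = []
    for k in range(1, max_k + 1):
        overlap = [len(set(nn_f[i, :k]) & set(nn_g[i, :k])) / k for i in range(F.shape[0])]
        curve.append(np.mean(overlap))
    # Unit-spaced area under the overlap-vs-k curve, normalised to [0, 1]
    return float(np.mean(curve))
```

Identical embeddings give an overlap of 1 at every k, while shuffling the cells of one embedding drives the overlap towards chance level.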
Mean squared error
Mean squared error (MSE) is the average squared distance between the two observations of each cell (one per modality) in the learned latent space. Lower is better (Lance et al. 2022).
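A minimal sketch, assuming F and G are the two aligned embeddings with row i of each holding cell i, and that MSE is the per-cell squared Euclidean distance averaged over cells. An implementation might instead average over individual entries or rescale the embeddings first, which changes the value but not the ranking logic:

```python
import numpy as np


def mse(F, G):
    """Average over cells of the squared Euclidean distance between row i of F
    and row i of G (the two embeddings of cell i)."""
    F, G = np.asarray(F, dtype=float), np.asarray(G, dtype=float)
    return float(np.mean(np.sum((F - G) ** 2, axis=1)))


print(mse([[0.0, 0.0], [1.0, 1.0]], [[3.0, 4.0], [1.0, 1.0]]))  # 12.5
```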
Quality control results
✓ All checks succeeded!
References
Cao, Junyue, Darren A. Cusanovich, Vijay Ramani, Delasa Aghamirzaie, Hannah A. Pliner, Andrew J. Hill, Riza M. Daza, et al. 2018. “Joint Profiling of Chromatin Accessibility and Gene Expression in Thousands of Single Cells.” Science 361 (6409): 1380–85. https://doi.org/10.1126/science.aau0730.
Gower, J. C. 1975. “Generalized Procrustes Analysis.” Psychometrika 40 (1): 33–51. https://doi.org/10.1007/bf02291478.
Haghverdi, Laleh, Aaron T L Lun, Michael D Morgan, and John C Marioni. 2018. “Batch Effects in Single-Cell RNA-Sequencing Data Are Corrected by Matching Mutual Nearest Neighbors.” Nature Biotechnology 36 (5): 421–27. https://doi.org/10.1038/nbt.4091.
Lance, Christopher, Malte D. Luecken, Daniel B. Burkhardt, Robrecht Cannoodt, Pia Rautenstrauch, Anna Laddach, Aidyn Ubingazhibov, et al. 2022. “Multimodal Single Cell Data Integration Challenge: Results and Lessons Learned.” bioRxiv. https://doi.org/10.1101/2022.04.11.487796.
Open Problems for Single Cell Analysis Consortium. 2022. “Open Problems.” https://openproblems.bio.
Stanley, Jay S., Scott Gigante, Guy Wolf, and Smita Krishnaswamy. 2020. “Harmonic Alignment.” In Proceedings of the 2020 SIAM International Conference on Data Mining, 316–24. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611976236.36.
Stoeckius, Marlon, Christoph Hafemeister, William Stephenson, Brian Houck-Loomis, Pratip K Chattopadhyay, Harold Swerdlow, Rahul Satija, and Peter Smibert. 2017. “Simultaneous Epitope and Transcriptome Measurement in Single Cells.” Nature Methods 14 (9): 865–68. https://doi.org/10.1038/nmeth.4380.