CeNGEN – Open Problems in Single Cell Analysis

Info

openproblems_v1/cengen
Hammarlund et al. (2018)
778.17 MiB
02-02-2024
100955 cells × 22469 genes

Used in

Label Projection

Description

Approximately 100,000 FACS-isolated C. elegans neurons from 17 experiments, sequenced on the 10x Genomics platform.

Preview

`dataset` is an AnnData object with n_obs × n_vars = 100955 × 22469 and the following slots:

obs: cell_type, tissue, batch, size_factors
var: feature_name, hvg, hvg_score
obsp: knn_connectivities, knn_distances
obsm: X_pca
varm: pca_loadings
layers: counts, normalized
uns: dataset_description, dataset_id, dataset_name, dataset_organism, dataset_reference, dataset_summary, dataset_url, knn, normalization_id, pca_variance
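
The slot listing above can double as a checklist after loading the file. The following is a minimal sketch, not part of the Open Problems tooling: `missing_slots` and `EXPECTED_SLOTS` are hypothetical names, and a plain stand-in object replaces the real dataset so the example runs without the file. With the real data, the object returned by `anndata.read_h5ad` should work in place of the stand-in, since its slots support the `in` operator on keys.

```python
from types import SimpleNamespace

# Slot names copied from the preview above.
EXPECTED_SLOTS = {
    "obs": ["cell_type", "tissue", "batch", "size_factors"],
    "var": ["feature_name", "hvg", "hvg_score"],
    "obsp": ["knn_connectivities", "knn_distances"],
    "obsm": ["X_pca"],
    "varm": ["pca_loadings"],
    "layers": ["counts", "normalized"],
    "uns": [
        "dataset_description", "dataset_id", "dataset_name",
        "dataset_organism", "dataset_reference", "dataset_summary",
        "dataset_url", "knn", "normalization_id", "pca_variance",
    ],
}


def missing_slots(dataset, expected=EXPECTED_SLOTS):
    """Return 'slot.key' strings for every expected key that is absent."""
    missing = []
    for slot, keys in expected.items():
        container = getattr(dataset, slot, None)
        for key in keys:
            if container is None or key not in container:
                missing.append(f"{slot}.{key}")
    return missing


# Hypothetical stand-in for the loaded dataset: each slot is a plain dict.
stub = SimpleNamespace(
    **{slot: dict.fromkeys(keys) for slot, keys in EXPECTED_SLOTS.items()}
)
del stub.uns["knn"]  # simulate one missing entry

print(missing_slots(stub))  # ['uns.knn']
```

An empty list means every slot named in the preview is present; the check looks only at names, not at shapes or data types.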

Reference

Name	Description	Type	Data type	Size
obs
`batch`	A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc.	`vector`	`category`	100955
`cell_type`	Classification of the cell type based on its characteristics and function within the tissue or organism.	`vector`	`category`	100955
`size_factors`	The size factors created by the normalisation method, if any.	`vector`	`float32`	100955
`tissue`	Specific tissue from which the cells were derived, key for context and specificity in cell studies.	`vector`	`category`	100955
var
`feature_name`	A human-readable name for the feature, usually a gene symbol.	`vector`	`object`	22469
`hvg`	Whether or not the feature is considered to be a ‘highly variable gene’.	`vector`	`bool`	22469
`hvg_score`	A score used to rank the features by variability.	`vector`	`float64`	22469
obsp
`knn_connectivities`	K nearest neighbors connectivities matrix.	`sparsematrix`	`float32`	100955 × 100955
`knn_distances`	K nearest neighbors distance matrix.	`sparsematrix`	`float64`	100955 × 100955
obsm
`X_pca`	The resulting PCA embedding.	`densematrix`	`float32`	100955 × 50
varm
`pca_loadings`	The PCA loadings matrix.	`densematrix`	`float32`	22469 × 50
layers
`counts`	Raw counts.	`sparsematrix`	`float32`	100955 × 22469
`normalized`	Normalised expression values.	`sparsematrix`	`float32`	100955 × 22469
uns
`dataset_description`	Long description of the dataset.	`atomic`	`str`	1
`dataset_id`	A unique identifier for the dataset. This is different from the `obs.dataset_id` field, which is the identifier for the dataset from which the cell data is derived.	`atomic`	`str`	1
`dataset_name`	A human-readable name for the dataset.	`atomic`	`str`	1
`dataset_organism`	The organism of the sample in the dataset.	`atomic`	`str`	1
`dataset_reference`	Bibtex reference of the paper in which the dataset was published.	`atomic`	`str`	1
`dataset_summary`	Short description of the dataset.	`atomic`	`str`	1
`dataset_url`	Link to the original source of the dataset.	`atomic`	`str`	1
`knn`	Supplementary K nearest neighbors data.	`dict`		3
`normalization_id`	The normalization method that was used.	`atomic`	`str`	1
`pca_variance`	The PCA variance objects.	`dict`		2
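
The shapes and data types in the table give a rough sense of why some slots are stored as sparse matrices. The sketch below computes how many bytes each matrix would need if held densely; `dense_nbytes` is a hypothetical helper, and the figures ignore container overhead. The actual size of the sparse slots depends on the number of stored entries, which this page does not state.

```python
# Bytes per element for the data types listed in the table.
ITEMSIZE = {"float32": 4, "float64": 8}


def dense_nbytes(shape, dtype):
    """Bytes needed to hold a matrix of this shape densely, with no overhead."""
    n_rows, n_cols = shape
    return n_rows * n_cols * ITEMSIZE[dtype]


n_obs, n_vars = 100955, 22469

sizes = {
    "obsm['X_pca']": dense_nbytes((n_obs, 50), "float32"),
    "varm['pca_loadings']": dense_nbytes((n_vars, 50), "float32"),
    "layers['counts'] if dense": dense_nbytes((n_obs, n_vars), "float32"),
    "obsp['knn_distances'] if dense": dense_nbytes((n_obs, n_obs), "float64"),
}

for name, nbytes in sizes.items():
    print(f"{name}: {nbytes / 2**20:,.1f} MiB")
```

Under these assumptions the PCA embedding needs about 19 MiB, while a dense `counts` layer alone would need roughly 8.5 GiB and a dense `knn_distances` matrix roughly 76 GiB, both far above the 778.17 MiB listed for the whole dataset.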

Slot crossref data

`dataset.layers['counts']`

In R: dataset$layers[["counts"]]

Type: sparsematrix, data type: float32, shape: 100955 × 22469

Raw counts.

`dataset.layers['normalized']`

In R: dataset$layers[["normalized"]]

Type: sparsematrix, data type: float32, shape: 100955 × 22469

Normalised expression values.

`dataset.obs['cell_type']`

In R: dataset$obs[["cell_type"]]

Type: vector, data type: category, shape: 100955

Classification of the cell type based on its characteristics and function within the tissue or organism.

`dataset.obs['tissue']`

In R: dataset$obs[["tissue"]]

Type: vector, data type: category, shape: 100955

Specific tissue from which the cells were derived, key for context and specificity in cell studies.

`dataset.obs['batch']`

In R: dataset$obs[["batch"]]

Type: vector, data type: category, shape: 100955

A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc.

`dataset.obs['size_factors']`

In R: dataset$obs[["size_factors"]]

Type: vector, data type: float32, shape: 100955

The size factors created by the normalisation method, if any.

`dataset.obsm['X_pca']`

In R: dataset$obsm[["X_pca"]]

Type: densematrix, data type: float32, shape: 100955 × 50

The resulting PCA embedding.

`dataset.obsp['knn_connectivities']`

In R: dataset$obsp[["knn_connectivities"]]

Type: sparsematrix, data type: float32, shape: 100955 × 100955

K nearest neighbors connectivities matrix.

`dataset.obsp['knn_distances']`

In R: dataset$obsp[["knn_distances"]]

Type: sparsematrix, data type: float64, shape: 100955 × 100955

K nearest neighbors distance matrix.

`dataset.uns['dataset_description']`

In R: dataset$uns[["dataset_description"]]

Type: atomic, data type: str, shape: 1

Long description of the dataset.

`dataset.uns['dataset_id']`

In R: dataset$uns[["dataset_id"]]

Type: atomic, data type: str, shape: 1

A unique identifier for the dataset. This is different from the `obs.dataset_id` field, which is the identifier for the dataset from which the cell data is derived.

`dataset.uns['dataset_name']`

In R: dataset$uns[["dataset_name"]]

Type: atomic, data type: str, shape: 1

A human-readable name for the dataset.

`dataset.uns['dataset_organism']`

In R: dataset$uns[["dataset_organism"]]

Type: atomic, data type: str, shape: 1

The organism of the sample in the dataset.

`dataset.uns['dataset_reference']`

In R: dataset$uns[["dataset_reference"]]

Type: atomic, data type: str, shape: 1

Bibtex reference of the paper in which the dataset was published.

`dataset.uns['dataset_summary']`

In R: dataset$uns[["dataset_summary"]]

Type: atomic, data type: str, shape: 1

Short description of the dataset.

`dataset.uns['dataset_url']`

In R: dataset$uns[["dataset_url"]]

Type: atomic, data type: str, shape: 1

Link to the original source of the dataset.

`dataset.uns['knn']`

In R: dataset$uns[["knn"]]

Type: dict, shape: 3

Supplementary K nearest neighbors data.

`dataset.uns['normalization_id']`

In R: dataset$uns[["normalization_id"]]

Type: atomic, data type: str, shape: 1

The normalization method that was used.

`dataset.uns['pca_variance']`

In R: dataset$uns[["pca_variance"]]

Type: dict, shape: 2

The PCA variance objects.

`dataset.var['feature_name']`

In R: dataset$var[["feature_name"]]

Type: vector, data type: object, shape: 22469

A human-readable name for the feature, usually a gene symbol.

`dataset.var['hvg']`

In R: dataset$var[["hvg"]]

Type: vector, data type: bool, shape: 22469

Whether or not the feature is considered to be a ‘highly variable gene’.

`dataset.var['hvg_score']`

In R: dataset$var[["hvg_score"]]

Type: vector, data type: float64, shape: 22469

A score used to rank the features by variability.

`dataset.varm['pca_loadings']`

In R: dataset$varm[["pca_loadings"]]

Type: densematrix, data type: float32, shape: 22469 × 50

The PCA loadings matrix.

References

Hammarlund, Marc, Oliver Hobert, David M. Miller, and Nenad Sestan. 2018. “The CeNGEN Project: The Complete Gene Expression Map of an Entire Nervous System.” Neuron 99 (3): 430–33. https://doi.org/10.1016/j.neuron.2018.07.042.