Exploring Cellxgene data in Cellenics®
Cellxgene is an online resource that hosts hundreds of publicly available single cell RNA-seq datasets that can easily be visualized and explored using online tools. Cellxgene Discover is one such tool that provides a practical way to explore these datasets. However, its functionality is restricted to exploring the dataset, and it lacks more complex analysis options that would allow a deeper dive into the data, including filtering, integration, subsetting, and advanced downstream analysis and plotting. These tasks can easily be performed in Cellenics®, a user-friendly cloud-based tool for single cell RNA-seq data analysis and visualization.
In this post I'll show you how to download a dataset from Cellxgene and upload it into the Biomage-hosted community instance of Cellenics® for further exploration and analysis.
The first step is to download a dataset that you’re interested in from the Cellxgene database. You can find a list of the available datasets hosted by Cellxgene in the Dataset tab. Cellxgene provides two types of data download: AnnData v0.8 (.h5ad) and Seurat v4 (.rds). Internally, Cellenics® uses Seurat v4, so it is much easier to get data into the platform if we use the Seurat v4 format. Download the Seurat v4 format and follow along with the blog post. For this post, I’ve renamed the downloaded dataset to seurat.rds and moved it into the same directory as the R script.
Load the file into a variable using the commands below.
# Load libraries
library("Seurat")
# Set working directory
setwd('/path/to/script/directory')
# Read dataset
data.seurat <- readRDS("seurat.rds")
# Set default assay
DefaultAssay(data.seurat) <- "RNA"
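The later steps in this post rely on a donor_id column in the cell metadata and a name column in the gene metadata, so it can be worth confirming they exist before going further. The following is a minimal sketch in base R; find_missing_columns is a hypothetical helper written for this post (not a Seurat or Cellenics® function), and the toy data frames stand in for the real metadata tables.

```r
# Hypothetical helper: report which required columns are absent from a table
find_missing_columns <- function(meta, required) {
  setdiff(required, colnames(meta))
}

# Toy stand-ins for the cell metadata and the gene (feature) metadata
toy_cell_meta <- data.frame(donor_id = c("D1", "D2"), cell_type = c("T", "B"))
toy_gene_meta <- data.frame(feature_biotype = c("gene", "gene"))

missing_cell_cols <- find_missing_columns(toy_cell_meta, "donor_id")
missing_gene_cols <- find_missing_columns(toy_gene_meta, "name")

print(missing_cell_cols)  # empty: donor_id is present in the toy table
print(missing_gene_cols)  # "name": this toy table lacks the column
```

On the real object, the same helper could be applied to data.seurat@meta.data and data.seurat[['RNA']]@meta.features. If the gene symbol column turns out to have a different name in your download, adjust the later commands accordingly.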
The Seurat upload requires that the uploaded Seurat object has a samples slot. The closest matching information in Cellxgene data is donor_id, so let’s use that. Run the code below to insert donor_id as samples.
data.seurat$samples <- data.seurat$donor_id
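Before uploading, it may help to look at how many cells ended up in each sample, since a donor with very few cells or an unexpected NA will become a sample of its own. Below is a small base R sketch; summarize_samples is a hypothetical helper, and toy_meta is made-up data standing in for data.seurat@meta.data.

```r
# Hypothetical helper: count cells per sample in a metadata table
summarize_samples <- function(meta, sample_col = "samples") {
  if (!sample_col %in% colnames(meta)) {
    stop("Metadata column not found: ", sample_col)
  }
  # as.character() drops unused factor levels; useNA keeps missing values visible
  counts <- table(as.character(meta[[sample_col]]), useNA = "ifany")
  data.frame(
    sample = names(counts),
    n_cells = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Toy metadata: three cells from two donors
toy_meta <- data.frame(
  samples = c("D1", "D1", "D2"),
  row.names = c("cell1", "cell2", "cell3")
)
sample_summary <- summarize_samples(toy_meta)
print(sample_summary)
```

With the real data, calling summarize_samples(data.seurat@meta.data) should give one row per donor.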
The uploaded Seurat object also needs to have had FindVariableFeatures run on it. Let’s run that on the data.
data.seurat <- FindVariableFeatures(data.seurat)
Some Cellenics® functions (e.g. ScType) require genes to be identified by gene symbol, whereas Cellxgene uses ENSEMBL IDs. Run the code below to substitute the ENSEMBL IDs with gene symbols.
dimnames(data.seurat[['RNA']]@counts) = list(data.seurat[['RNA']]@meta.features$name, colnames(data.seurat))
dimnames(data.seurat[['RNA']]@data) = list(data.seurat[['RNA']]@meta.features$name, colnames(data.seurat))
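One thing to watch for with this substitution is that gene symbols are not guaranteed to be unique or present for every ENSEMBL ID, and duplicated or empty row names can cause trouble downstream. The sketch below shows one possible way to build safe row names in base R. make_gene_names is a hypothetical helper, the toy vectors are invented for illustration, and falling back to the ENSEMBL ID is my own assumption about a reasonable default rather than something Cellenics® requires.

```r
# Hypothetical helper: build unique, non-empty gene names.
# Missing or empty symbols fall back to the ENSEMBL ID; duplicates get a numeric suffix.
make_gene_names <- function(symbols, ensembl_ids) {
  stopifnot(length(symbols) == length(ensembl_ids))
  symbols <- as.character(symbols)  # also handles factor columns
  missing <- is.na(symbols) | symbols == ""
  symbols[missing] <- as.character(ensembl_ids[missing])
  make.unique(symbols)
}

# Toy input: one empty symbol, one duplicated symbol, one NA
toy_symbols <- c("TP53", "", "ACTB", "ACTB", NA)
toy_ids <- c("ENSG01", "ENSG02", "ENSG03", "ENSG04", "ENSG05")

gene_names <- make_gene_names(toy_symbols, toy_ids)
print(gene_names)
```

On the real object, the inputs would be data.seurat[['RNA']]@meta.features$name and rownames(data.seurat[['RNA']]), captured before any renaming, and the resulting vector could be used in place of the name column in the two dimnames assignments above.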
We’re done! Save the resulting data with the command below.
saveRDS(data.seurat, "data.rds")
Now you can upload the resulting data into the Biomage-hosted community instance of Cellenics® using the Seurat upload functionality in the Data Management module. Check out the Biomage community forum if you have any questions.
If you’d like to learn more about scRNAseq, check out this blog post about our online course for analyzing scRNAseq data using Cellenics®.