Single-cell RNA sequencing data analysis: 6 tools you need to know
Can you imagine yourself on the brink of a ground-breaking discovery, except you don’t know it yet?
Those of you who work in the single cell RNA-sequencing (scRNA-seq) field may be familiar with this. Having laid the groundwork, prepared your samples, and sequenced your libraries, you now have your sequencing data. In front of you are thousands, if not millions, of cells, and the data analysis process feels more daunting than climbing Mount Everest. What can you do with this data if you're not proficient in programming or don't have time to learn bioinformatics?
Fortunately, there is an increasing number of user-friendly single cell data analysis platforms available to you. In this article, we have compiled a list of the best single cell RNA-seq analysis software that will help you move your single cell research forward.
Which factors should you consider when choosing a data analysis tool?
In recent years, scRNA-seq has proven to be one of the most successful methods in academic and clinical research. However, much of the biological insight remains locked away in sequencing data. As a result, there is a growing need for user-friendly bioinformatics software. But how do you know which one is a good fit for you?
You should evaluate the type of application (e.g. cloud-based or locally installed) and consider the computing capabilities of your workstation.
Depending on your experience with scRNA-seq data analysis, you might consider how intuitive the platform is.
Think about your specific requirements for data analysis. What input formats and single cell technologies are accepted by the software? What options do you need for filtering and integration algorithms? Do you need single cell analysis software with automatic cell type prediction? What data plotting and customization options are you looking for?
You should also consider price and licensing. Some scRNA-seq data analysis tools are free, while others charge licensing fees. Users should not overlook the many robust free solutions available. The most expensive software doesn't necessarily have the most features.
In this article, we look at the common and unique features across the most popular scRNA-seq analysis tools currently out there to help you select a tool that best fits your needs.
Feature comparison overview
Our picks for the best user-friendly tools to analyze scRNA-seq data are:
Trailmaker
BBrowserX
Loupe Browser
Partek Flow
CELLxGENE
ROSALIND
A feature comparison table for these 6 tools is presented below:
Best scRNA-seq data analysis tools
Trailmaker™
Pros
- Free for all academic users and all Parse Biosciences customers
- Supports multiple scRNA-seq technologies
- Very user-friendly - clear and automated workflow; no coding required
- Hosted on the cloud
- Wide variety of fully-customizable plots for generating publication-ready figures
Cons
- Doesn't support multi-omics technologies
Trailmaker is a cloud-based scRNA-seq analysis platform from Parse Biosciences that allows you to explore and analyze your scRNA-seq dataset from any technology without prior programming knowledge. Because it runs in the cloud, users can analyze their datasets from anywhere in the world at any time without needing a powerful workstation.
Trailmaker’s user-friendly interface, automated workflow, and dedicated support resources mean that your single cell data analysis journey is fully supported.
Trailmaker’s Pipeline module processes Parse Biosciences Evercode™ WT FASTQ files. The wizard guides you through entering experimental details, uploading FASTQ files (via the user interface or the command line), and selecting a genome (custom genomes can be added). The pipeline then runs in the cloud and sends a notification when it’s complete. Pipeline output files, including reports and count matrices, can be viewed and downloaded, and the data are automatically sent to the Insights module for downstream processing and analysis.
Trailmaker’s Insights module supports the exploration and visualization of scRNA-seq data in a technology- and species-agnostic manner. The necessary filtering and integration steps are automated and fully adjustable. Exploration tools include automatic annotation using ScType, differential expression analysis, pathway analysis, and trajectory analysis. A wide range of plots is available, each fully customizable to your design preferences, so you can generate and export publishable figures.
Data import. For FASTQ file processing, only Parse Biosciences Evercode WT data is supported. However, for downstream analysis and visualization using the Insights module, multiple data types are supported including: count matrices in the Parse Biosciences, 10x Genomics (barcodes.tsv, features.tsv and matrix.mtx) or BD Rhapsody (expression_data.st) formats, H5 files (matrix.h5), and Seurat objects (.rds). Trailmaker currently doesn't support multi-omics technologies.
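For readers curious about what a 10x-style count matrix actually contains, here is a minimal Python sketch that parses the matrix.mtx part. It assumes an uncompressed file in Matrix Market coordinate format with integer values, and the usual 10x convention that rows are features (features.tsv) and columns are cell barcodes (barcodes.tsv). The function name read_mtx_counts is our own illustration, not part of Trailmaker or any other tool discussed here; in practice, libraries such as scipy.io.mmread do this for you.

```python
import io
from typing import Dict, TextIO, Tuple


def read_mtx_counts(handle: TextIO) -> Tuple[int, int, Dict[Tuple[int, int], int]]:
    """Parse a Matrix Market coordinate file into (n_rows, n_cols, entries).

    entries maps 0-based (row, col) pairs to integer counts. In 10x-style
    output, rows are features and columns are cell barcodes.
    """
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate file")

    # Skip comment lines (starting with '%') until the size line.
    line = handle.readline()
    while line.startswith("%"):
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(x) for x in line.split())

    entries: Dict[Tuple[int, int], int] = {}
    for _ in range(n_entries):
        row, col, value = handle.readline().split()
        # Matrix Market indices are 1-based; store them 0-based.
        entries[(int(row) - 1, int(col) - 1)] = int(value)
    return n_rows, n_cols, entries


# Tiny example: 3 features x 2 barcodes with 3 non-zero counts.
example = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "% a comment line\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
n_features, n_barcodes, counts = read_mtx_counts(example)
print(n_features, n_barcodes, counts)
```

Only non-zero counts are stored, which is why this format stays compact even for datasets with many thousands of cells.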
Data processing. Trailmaker’s data processing workflow enables advanced data filtering and integration using automatically determined default parameters that users can adjust. Filtering options include the removal of background/empty droplets, dead and dying cells (based on mitochondrial content), doublets and multiplets, and poor-quality cells. Data are filtered on a per-sample basis, with all quality control plots shown. Other available methods include Harmony, Seurat v4, and fastMNN for data integration; log-transformation or SCTransform for data normalization; and user control over dimensionality reduction. UMAP and t-SNE projections are available for embedding, with Leiden or Louvain clustering. All quality control plots are customizable.
Data exploration and plotting. Trailmaker has a wide variety of data exploration features. Custom cell sets can be created using selection tools or based on the expression of one or more genes. It's easy to rename clusters or recolor by sample, metadata, or gene. Standard analysis outputs such as the marker heatmap and UMAP are pre-loaded. Automatic cluster annotation is available for human and mouse data. Users can calculate differential expression between cell sets within a sample/group or compare a cell set between samples and groups. Differential expression results can be filtered further, for example, by selecting only upregulated genes. Users can perform pathway analysis on the list of differentially expressed genes using the external services PANTHER or Enrichr. Trajectory analysis is also available in the Plots and Tables module.
Available plots: categorical and continuous embeddings (e.g. UMAP or t-SNE showing gene expression, clusters, or groups), frequency plot for cell set composition, volcano plot, dot (or bubble) plot, violin plot, heatmaps, and trajectory plot showing pseudotime. All plots are fully customizable and can be downloaded as publication-ready SVG or PNG files.
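To illustrate what log-transformation normalization usually means, here is a minimal sketch of library-size normalization followed by log1p, in the style of Seurat's LogNormalize (which uses a default scale factor of 10,000). Whether Trailmaker implements it exactly this way is an assumption on our part, and the function name log_normalize is hypothetical.

```python
import numpy as np


def log_normalize(counts: np.ndarray, scale_factor: float = 10_000.0) -> np.ndarray:
    """Library-size normalization followed by a natural-log transform.

    counts: cells x genes matrix of raw counts. Each cell is divided by its
    total count, multiplied by scale_factor, then transformed with log1p.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Cells with zero total counts stay at zero instead of dividing by zero.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * scale_factor)


raw = np.array([[1, 3], [0, 0], [5, 5]])
normalized = log_normalize(raw)
print(normalized.round(3))
```

Dividing by each cell's total removes differences in sequencing depth between cells, and the log transform compresses the long tail of highly expressed genes.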
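Trailmaker hands pathway analysis to external services, which apply their own statistics. As a general illustration of over-representation analysis, the sketch below computes a one-sided hypergeometric p-value: the probability of seeing at least this many differentially expressed genes in a pathway by chance. The function and argument names are hypothetical, and real services add further steps such as multiple-testing correction.

```python
from math import comb


def overrepresentation_pvalue(n_universe: int, n_pathway: int, n_de: int, n_overlap: int) -> float:
    """P(X >= n_overlap) for X ~ Hypergeometric(N=n_universe, K=n_pathway, n=n_de).

    n_universe: genes in the background (e.g. all detected genes)
    n_pathway:  background genes annotated to the pathway
    n_de:       differentially expressed genes submitted
    n_overlap:  DE genes that fall in the pathway
    """
    upper = min(n_pathway, n_de)
    total = comb(n_universe, n_de)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


p = overrepresentation_pvalue(n_universe=20, n_pathway=5, n_de=4, n_overlap=4)
print(f"p = {p:.5f}")
```

A small p-value means the pathway contains more of your differentially expressed genes than you would expect from random draws out of the background.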
Data export and sharing. For a processed project, users can download the count matrices, Seurat object, data processing settings (as a .txt file), differentially expressed gene lists and normalized expression matrices (as CSV files). Users can share their analyses, enabling multiple users to collaborate.
Pricing. Trailmaker is available free of charge to all academic researchers and to all Parse Biosciences customers. This includes unlimited storage and processing of an unlimited number of projects. Industry users who are not current Parse customers should contact Parse directly.
BBrowserX
Pros
- Supports the import of a variety of data and technology types
- Supports multi-omics data: antibody-derived tags and TCR/BCR
Cons
- Limited data filtering and quality assessment plots
- Limited data integration
- Paid software
BioTuring’s BBrowserX is an analytics platform for large-scale single cell datasets, designed for scientists without coding experience.
Previous versions of BBrowser were desktop tools with specific system requirements, so powerful workstations were needed to analyze datasets of >100,000 cells. The more recent BBrowserX is a cloud-based application.
Data import. BBrowserX supports five formats of single-cell data: CellRanger matrix output, CellRanger .h5 output, Scanpy object, Seurat object and TSV/CSV/TXT full matrix.
One of the unique features of BBrowserX is its library of public datasets from the latest publications, so users can analyze these data and compare multiple datasets.
Data processing. The processing steps from quality control to dimensionality reduction are applied to public and in-house datasets imported as count matrices, H5 files, or TSV/CSV/TXT files. If the data have already been processed (Seurat or Scanpy objects), filtering and batch effect correction are not applied.
For imported data, BBrowserX allows users to define cut-offs for filtering, such as the minimum and maximum number of reads, the minimum and maximum number of detected genes, and the maximum percentage of mitochondrial genes. The only normalization option is log transformation. Quality assessment is therefore limited in BBrowserX: the few options, together with the lack of plots showing data quality and thresholds, are a clear drawback, especially for users with bioinformatics knowledge or those who need specific filtering or normalization methods. Users can also skip the filtering steps if preferred.
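Cut-off filtering of this kind comes down to a boolean mask over per-cell metrics. This is a minimal sketch, assuming the metrics (total counts, detected genes, and mitochondrial percentage) have already been computed for each cell; the function name qc_mask and the example thresholds are illustrative only, not BBrowserX defaults.

```python
import numpy as np


def qc_mask(
    total_counts: np.ndarray,
    n_genes: np.ndarray,
    pct_mito: np.ndarray,
    *,
    min_counts: float,
    max_counts: float,
    min_genes: float,
    max_genes: float,
    max_pct_mito: float,
) -> np.ndarray:
    """Return a boolean mask that is True for cells passing every cut-off."""
    return (
        (total_counts >= min_counts)
        & (total_counts <= max_counts)
        & (n_genes >= min_genes)
        & (n_genes <= max_genes)
        & (pct_mito <= max_pct_mito)
    )


# Four example cells with precomputed metrics (illustrative values).
total_counts = np.array([500, 3000, 60000, 4000])
n_genes = np.array([200, 1200, 7000, 1500])
pct_mito = np.array([2.0, 4.5, 3.0, 25.0])

keep = qc_mask(
    total_counts, n_genes, pct_mito,
    min_counts=1000, max_counts=50000,
    min_genes=500, max_genes=6000,
    max_pct_mito=10.0,
)
print(keep)  # only the second cell passes all cut-offs
```

The upper bounds on counts and genes are there to catch likely doublets, while the mitochondrial cap targets dead or dying cells.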
Batch correction is available using the Harmony method only. Both t-SNE and UMAP are available for visualization, and clustering by the Louvain method is available.
Data exploration and plotting. BBrowserX supports automatic cell type prediction by matching cell type markers against BioTuring's database of over a hundred million single cells. BBrowserX can find differentially expressed and marker genes, and can perform pathway analysis using the WikiPathways database, with enrichment scores calculated from AUCell scores. The tool also supports trajectory analysis and pseudotime plotting.
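AUCell scores a gene set within each cell: it ranks the cell's genes by expression and measures how early the gene-set genes appear in that ranking (the area under a "recovery curve"). The sketch below is a simplified illustration of that idea, not BBrowserX's implementation and not a faithful port of the published AUCell method. The function name, the normalization by the best achievable area, and the tie handling are all assumptions of this sketch.

```python
import numpy as np


def aucell_like_score(expression, gene_names, gene_set, max_rank):
    """Simplified AUCell-style enrichment score for a single cell.

    expression: this cell's expression values, aligned with gene_names.
    gene_set:   names of the genes in the pathway.
    max_rank:   only the cell's top-ranked genes up to this rank are used.
    Returns a value between 0 (no gene-set genes near the top) and 1.
    """
    expression = np.asarray(expression, dtype=float)
    gene_set = set(gene_set)
    max_rank = min(max_rank, expression.size)

    # Rank genes from highest to lowest expression (ties keep input order).
    top = np.argsort(-expression, kind="stable")[:max_rank]
    hits = np.array([gene_names[i] in gene_set for i in top])

    # Recovery curve: number of gene-set genes recovered at each rank.
    auc = np.cumsum(hits).sum()

    # Normalize by the best case, where all gene-set genes rank first.
    n_possible = min(len(gene_set & set(gene_names)), max_rank)
    if n_possible == 0:
        return 0.0
    best = np.cumsum(np.arange(max_rank) < n_possible).sum()
    return float(auc / best)


genes = ["A", "B", "C", "D", "E"]
cell = [5.0, 4.0, 3.0, 2.0, 1.0]
print(aucell_like_score(cell, genes, {"A", "C"}, max_rank=3))
print(aucell_like_score(cell, genes, {"D", "E"}, max_rank=3))
```

Because the score depends only on ranks within a cell, it is insensitive to differences in overall sequencing depth between cells.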
The cell search feature allows you to validate and extend your analysis by browsing through the entire database of millions of cells and pulling out the cell populations that share similar gene-expression profiles with your selected population.
Available plots: UMAP/t-SNE, feature plots in 2D and 3D, co-expression plots, heatmaps, bubble heatmap, violin plot, bar chart, Circos plot and pseudotime plot (using Monocle2).
There are good options for customizing the main plots (scatter and t-SNE/UMAP), such as changing point opacity and size, color scheme, and so on. Most other plots are fixed in design and layout, except that the color scheme can be changed before making a plot; otherwise, you need to export the plot's data to TSV and reconstruct it outside BBrowserX. BBrowserX supports exporting graphs in PNG or SVG format. Some plots can be exported as SVG files from the gene query and differential expression analysis dashboards.
Data export. Users can export many types of data to a TSV file: annotations, clonotypes, graph data, a list of marker genes, and differentially expressed genes. The expression matrix can be exported as a folder containing the matrix.mtx, features.tsv, and barcodes.tsv files or a Seurat object (.rds) or Scanpy object (.h5ad).
Pricing. BBrowserX is paid single-cell analysis software. Pricing is available on request.
Loupe Browser
Pros
- Free for analyzing 10x Genomics Chromium™ datasets
- Supports integration with ATAC-seq, CITE-seq, and VDJ (TCR/BCR) sequencing data
Cons
- Data import is limited to .cloupe files for Chromium data
- Doesn’t support essential data processing steps such as filtering and integration
- Doesn’t support trajectory analysis
- Plots have limited customization options
Loupe Browser by 10x Genomics™ is a desktop tool that enables anyone to analyze and visualize Chromium datasets for free. Users can install the latest Loupe Browser on macOS or 64-bit Windows. The software requires a minimum of 4 GB of RAM, and 16 GB for datasets over 100,000 cells. Loupe Browser has a simple interface centered around the main view panel.
Data import. The user can open any .cloupe file generated by other 10x Genomics software (such as Cell Ranger or 10x Cloud) to visualize their data. The required .cloupe files can also be generated in Seurat. Loupe Browser supports integration with ATAC-seq, CITE-seq, and VDJ sequencing data.
Data processing. It’s important to note that Loupe Browser does not offer data processing (filtering, integration, dimensionality reduction, etc.) and requires users to pre-process raw data files with other 10x Genomics tools, such as Cell Ranger or 10x Cloud, and/or code-based tools such as Seurat. As a result, Loupe Browser is relatively constrained in its data analysis capabilities.
Regarding clustering, Loupe Browser provides three ways to display clusters – graph-based clustering, k-means clustering, and custom-created categories. A useful function of this software is the ability to split the projections (like t-SNE) by clusters in a single view.
When the user subsets the cells, it’s possible to recluster them. Reclustering involves setting thresholds for UMI counts, number of features, and mitochondrial fraction; note that the user must input these thresholds manually. Reclustering produces new t-SNE and UMAP projections. In Loupe Browser, reclustering is only possible for datasets with fewer than 100,000 cells.
Data exploration and plotting. Users can perform differential expression analysis between clusters. The globally distinguishing method identifies features that distinguish a selected cluster from all other cells in the dataset. The locally distinguishing method finds features highly expressed within the chosen clusters compared with other designated groups. Loupe Browser also allows creating cell subsets by expression filtering or by manual selection.
Both automated and manual cell type annotations are supported in Loupe Browser. Users can also view the expression of particular genes in the projections, with configurable parameters such as a minimum UMI count for displaying expression.
Available plots are limited: UMAP/t-SNE, heatmap, violin plot, and feature plot. Trajectory or pseudotime analysis is not supported; however, users can import custom trajectory projections generated by third-party tools. The plots are not fully customizable, so it’s impossible to generate publication-ready figures. The user can adjust the color scheme and point size.
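The difference between the "global" and "local" comparisons is simply which cells form the reference group. The sketch below shows that idea with a per-gene log2 fold change of mean expression. It deliberately leaves out the statistical test Loupe Browser applies, so it is an illustration of the comparison design only; the function name and the pseudocount are our own assumptions.

```python
import numpy as np


def log2_fold_change(expr, labels, target, reference=None, pseudocount=1.0):
    """Per-gene log2 ratio of mean expression, target cluster vs reference cells.

    expr:      cells x genes matrix (e.g. normalized expression).
    labels:    cluster label for each cell.
    reference: None compares the target with all other cells ("global");
               a list of labels restricts the comparison to those clusters ("local").
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    in_target = labels == target
    in_ref = ~in_target if reference is None else np.isin(labels, reference)
    mean_target = expr[in_target].mean(axis=0)
    mean_ref = expr[in_ref].mean(axis=0)
    return np.log2((mean_target + pseudocount) / (mean_ref + pseudocount))


expr = np.array([[7, 1], [7, 1], [1, 1], [1, 1], [3, 7]])
labels = ["a", "a", "b", "b", "c"]
global_lfc = log2_fold_change(expr, labels, "a")
local_lfc = log2_fold_change(expr, labels, "a", reference=["b"])
print(global_lfc, local_lfc)
```

The same cluster can look more or less distinctive depending on the reference: here the first gene has a larger fold change against cluster "b" alone than against all other cells.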
UMAP and t-SNE coordinates can be exported as CSV. Feature plots, UMAP, and t-SNE can also be exported as images.
Data export and sharing. Users can export categories in a CSV file with a list of barcodes and their associated cluster labels. The significant gene table and the currently active (selected) features can also be exported as CSV.
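Once categories are exported, a few lines of code can summarize them outside Loupe Browser, for example counting cells per cluster. This is a minimal sketch; the column names "Barcode" and "Cluster" are assumptions, so check the header row of your own export and pass the matching names. The function name cells_per_category is hypothetical.

```python
import csv
import io
from collections import Counter


def cells_per_category(handle, barcode_col="Barcode", category_col="Cluster"):
    """Count barcodes per category in an exported barcode-to-category CSV.

    The default column names are assumptions; pass the headers used in your file.
    """
    counts = Counter()
    for row in csv.DictReader(handle):
        if row[barcode_col]:  # skip rows without a barcode
            counts[row[category_col]] += 1
    return counts


export = io.StringIO(
    "Barcode,Cluster\n"
    "AAACCTGAGAAACCAT-1,Cluster 1\n"
    "AAACCTGAGAAACCGC-1,Cluster 2\n"
    "AAACCTGAGAAACCTA-1,Cluster 1\n"
)
print(cells_per_category(export))
```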
Loupe Browser has no built-in data sharing, as files are self-contained in the user’s local environment. However, users can share datasets by sharing the .cloupe file.
Pricing. Free for analyzing Chromium data.
Partek™ Flow™
Pros
- Supports the import of a variety of data types
- Flexible data analysis pipeline entry points
- Offers a wide range of processing algorithms and data analysis options
Cons
- Requires powerful hardware to run
- Not intuitive: steep learning curve for non-bioinformaticians
- Paid software
Partek Flow is software for analyzing next-generation sequencing data, including scRNA-seq data. It is a web-based application that users can install on a desktop computer or a computer cluster. Users can also run the software on the Amazon Web Services cloud, which requires additional technical knowledge to set up. Wherever the server is, the user interacts with Partek Flow through a web browser. Google Chrome, Mozilla Firefox, Microsoft Edge, and Apple Safari are currently supported.
Still, Partek Flow requires powerful hardware to run. Even for datasets with fewer than 100,000 cells, the system requirements are 64 GB of RAM, >2 TB of storage for data, and >100 GB of storage for the root partition. Partek Flow recommends provisioning 3-5 times more storage than required, which is an important consideration to keep in mind.
Because of the range of supported sequencing data types, the Partek Flow interface can feel overwhelming and non-intuitive. The interface presents the user with appropriate task options, so first-time users without bioinformatics knowledge can import their data and perform downstream analysis. Partek Flow includes many statistical algorithms and various options for data processing. However, these are organized in a task graph of data and task nodes, and analysis requires moving back and forth between the task graph and the data viewer, as well as some knowledge of analysis processes and methods. This means that users might need bioinformatics support to perform downstream analysis confidently.
Data import. Partek Flow supports importing a wide range of file formats, including count matrices, FASTQ, Seurat objects (.rds), and H5 files. The software also supports the multi-omics technologies scATAC-seq and CITE-seq in specific data analysis tasks, such as finding multimodal neighbors.
An advantage of Partek Flow is the ability to enter the data analysis pipeline from various points – the user can start with raw data, aligned reads, count data, or normalized counts.
Data processing. Partek Flow offers several quality control tools. There are four quality assessment plots – counts per cell, detected features per cell, and percentage of mitochondrial and ribosomal counts per cell. Partek Flow allows users to filter data by features, groups, barcodes, and more. The thresholds for these are not set automatically and must be configured by the user. Again, bioinformatics knowledge is required to set appropriate thresholds.
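The four quality assessment metrics listed above are straightforward to compute from a count matrix. Here is a minimal sketch, assuming mitochondrial and ribosomal genes can be recognized by their symbol prefixes (MT- for mitochondrial, RPS/RPL for ribosomal), a convention that fits common human and mouse annotations but not every reference. The function name qc_metrics is hypothetical, not Partek Flow's API.

```python
import numpy as np


def qc_metrics(counts, gene_names, mito_prefix="MT-", ribo_prefixes=("RPS", "RPL")):
    """Per-cell QC metrics from a cells x genes count matrix.

    Mitochondrial and ribosomal genes are identified by name prefix, an
    assumption that fits common human/mouse gene symbols (e.g. MT-CO1, mt-Co1, RPL3).
    """
    counts = np.asarray(counts, dtype=float)
    upper = [g.upper() for g in gene_names]
    is_mito = np.array([g.startswith(mito_prefix) for g in upper], dtype=bool)
    is_ribo = np.array([g.startswith(ribo_prefixes) for g in upper], dtype=bool)

    total = counts.sum(axis=1)
    safe_total = np.where(total > 0, total, 1.0)  # avoid dividing by zero
    return {
        "total_counts": total,
        "n_features": (counts > 0).sum(axis=1),
        "pct_mito": 100 * counts[:, is_mito].sum(axis=1) / safe_total,
        "pct_ribo": 100 * counts[:, is_ribo].sum(axis=1) / safe_total,
    }


genes = ["MT-CO1", "RPL3", "ACTB", "GAPDH"]
counts = [[10, 20, 30, 40],
          [0, 0, 5, 0]]
metrics = qc_metrics(counts, genes)
for name, values in metrics.items():
    print(name, values)
```

Plotting each of these metrics as a distribution across cells is what helps you choose sensible thresholds before filtering.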
The software supports several normalization methods, including log-transformation and SCTransform. It also provides general linear model, Harmony, and Seurat v3 methods for data integration. The platform provides graph-based and k-means clustering, with scatter plots, PCA, t-SNE, and UMAP visualizations. It also offers hierarchical clustering heatmaps and violin plots with configurable parameters. Additionally, users can perform trajectory analysis with interactive 2D and 3D plots and calculate pseudotime from chosen trajectory starting points with the Monocle3 method.
Data exploration and plotting. Users can perform gene set enrichment analysis with groups defined by Gene Ontology or another imported gene ontology source. Differential pathway expression analysis is also available with interactive KEGG pathway maps for additional information. Automatic cell type prediction is not available. However, if the user has attribute information about the cells in the dataset, they can use this to annotate cells. Marker genes for each cluster can be calculated using ANOVA.
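To show what an ANOVA-based marker test measures, here is a minimal sketch of the one-way ANOVA F statistic for a single gene. One plausible framing for marker detection is to compare "this cluster" against "all other cells", as in the usage example; whether Partek Flow frames its test exactly this way is an assumption. The function name anova_f is hypothetical; scipy.stats.f_oneway computes the same statistic along with a p-value.

```python
import numpy as np


def anova_f(values, labels):
    """One-way ANOVA F statistic for one gene's expression across groups."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = [values[labels == g] for g in np.unique(labels)]
    k, n = len(groups), values.size
    grand_mean = values.mean()
    # Between-group and within-group sums of squares.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# One gene: expression in cells of "this cluster" vs "all other cells".
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
group = ["other", "other", "other", "cluster", "cluster", "cluster"]
f_stat = anova_f(expression, group)
print(f_stat)
```

A large F means the difference between group means is big relative to the spread within groups, which is what makes a gene a good marker candidate.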
Available plots: UMAP/t-SNE, heatmap, bubble map, scatter plot, dot plot, and volcano plot. Visualizations can be downloaded as publication-quality SVG files with customizable image size and resolution.
Data export and sharing. Users can export count matrix data (including filtered and normalized counts and more) to h5ad files.
Partek Flow supports multiple users on a server, and each user can be classified as an administrator or a regular user. This allows for direct data sharing.
Pricing. Partek Flow is paid software. Pricing is available on request.
CELLxGENE
Pros
- Free of charge for academic users
- Open-source scRNA-seq data analysis software
- Can support very large datasets with millions of cells
- User-friendly cloud-based exploration of public datasets
Cons
- Primarily a data visualization tool with very limited data processing capabilities
- Exploring your own dataset requires the desktop version, which needs programming knowledge to install and run
CELLxGENE is an open-source, free-to-use scRNA-seq data analysis tool. There are several versions of the tool, depending on your use case. CELLxGENE Discover is available for searching and downloading any of the curated public datasets in the extensive database, while CELLxGENE Explorer is available for exploring and analyzing any of the individual public datasets.
If you want to analyze and explore your own dataset, you have to install the self-hosted desktop version of CELLxGENE by following its complicated installation and launch instructions. To install CELLxGENE, you need Python 3.6+ and a Google Chrome 61+, Edge 15+, or Firefox 60+ browser. The CELLxGENE desktop explorer is launched from the command line. Without coding knowledge, installation and launch might pose a significant challenge.
CELLxGENE could be very useful to computational biologists, who can write their own code through a mini Jupyter notebook-like interface that provides capabilities beyond the tool's built-in plotting functions. For biologists seeking a user-friendly tool to analyze their own data, CELLxGENE might not be the best choice.
However, it’s essential to note that CELLxGENE is almost entirely a data exploration and visualization tool. Data processing (e.g. filtering, integration, and normalization) is not supported.
Data import. As input, CELLxGENE takes an h5ad file that contains a pre-computed embedding and, optionally, additional metadata. Data upload (converting data into the correct format and data import) requires programming knowledge.
Multi-omics technologies are not supported, although multi-omic public datasets are available in the Discover and Explorer versions of the tool. Most of the datasets in the CELLxGENE database are human datasets.
Data processing. CELLxGENE doesn’t offer data processing steps. Users must perform data processing such as filtering, integration, dimensionality reduction, and normalization outside the CELLxGENE platform. This again might require previous bioinformatics knowledge or time with a single-cell bioinformatics expert.
However, quality control metrics can be uploaded as metadata and used for filtering. To exclude outliers, all continuous quality control data can be clipped to a percentile range.
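Percentile clipping is easy to reproduce when preparing QC metadata yourself. This is a minimal sketch of the general idea, not CELLxGENE's code; how CELLxGENE displays clipped cells is its own behavior. The function name percentile_clip and the default 1st-99th percentile range are assumptions chosen for illustration.

```python
import numpy as np


def percentile_clip(values, lower_pct=1.0, upper_pct=99.0):
    """Clip values to a percentile range and flag which cells were inside it."""
    values = np.asarray(values, dtype=float)
    low, high = np.percentile(values, [lower_pct, upper_pct])
    inside = (values >= low) & (values <= high)
    return np.clip(values, low, high), inside


# A QC metric for 100 cells, with one extreme outlier at the end.
qc_values = np.array(list(range(1, 100)) + [10_000], dtype=float)
clipped, inside = percentile_clip(qc_values)
print(clipped.max(), int(inside.sum()))
```

Using percentiles rather than fixed cut-offs means the same setting adapts to datasets with very different overall quality.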
Data exploration and plotting. The desktop version offers the ability to annotate cells and recompute a new embedding based on a selection of cells. Cells can be selected and subset via the selection on the embedding, gene expression cut-offs, or based on categorical metadata such as time point or sex. Users can also compute differentially expressed and marker genes. Cell annotation is available as a new experimental feature, in the form of a PyPI package.
Data visualizations in the hosted version are limited to a few plots, such as UMAP/t-SNE and bivariate plots. Genes and gene sets can be used to color the embedding. The plot customization and export options are also limited. Trajectory analysis is not supported.
Many more visualization options are available with CELLxGENE VIP - an interactive visualization plugin.
Data sharing. CELLxGENE desktop is meant to be used by researchers on their local workstations. However, private links can be shared with collaborators or manuscript reviewers who haven’t installed CELLxGENE.
Pricing. Free.
ROSALIND
Pros
- Deep data exploration available via 50+ knowledge bases
- Virtual rooms allow real-time collaboration between researchers
- Many visualization options available
Cons
- Primarily supports Chromium datasets
- Some limitations in data processing options
- Paid software
ROSALIND is a cloud-based bioinformatics platform designed for life science researchers. ROSALIND connects researchers to their data, their team members, and knowledge bases to aid interpretation. ROSALIND provides intuitive workflows and analysis interfaces for single cell data, but it is not exclusively scRNA-seq data analysis software.
Data import. ROSALIND accepts raw FASTQ files and processed count data. It is optimized for 10x Genomics Chromium single-cell library kits. ROSALIND also supports the analysis of cell clusters created in the 10x Loupe Browser. The platform doesn’t support other single-cell technologies. However, comparisons with multi-omic data (ATAC-seq and ChIP-seq) are possible. ROSALIND also allows the import of public data from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) and Gene Expression Omnibus (GEO).
It’s worth noting that users need to define several settings before uploading data, such as the sample kit model, sample attributes, and analysis parameters.
Data processing. The quality control pipeline covers automatic contamination detection, Q30 scores, ribosomal content, duplicate rates, gene coverage, sample correlation, and multidimensional scaling. Information on the number of cells and the average and median reads per cell is also available. ROSALIND provides Cell Ranger, Seurat, and k-means clustering methods.
However, while quality control plots are available, there are no apparent options for filtering or adjusting quality control parameters. The data processing step can therefore only be used to verify the experiment and each sample before beginning interpretation.
Data exploration and plotting. The platform has integrated knowledge bases that allow for the exploration of pathways, cell types, and gene ontology. ROSALIND allows users to compare cluster proportions and identify differentially expressed genes. The software also offers assisted cell type identification based on detected marker genes. Trajectory analysis is not available.
Note that analyzing samples requires Analysis Units, which are included in specific subscriptions or can be purchased separately ($35 per sample, in 50-packs).
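For budgeting, the arithmetic on pack-based pricing can be sketched as below. This assumes, purely for illustration, that no Analysis Units are included in your subscription and that extra units can only be bought in whole 50-packs at $35 per sample; check ROSALIND's current pricing before relying on it. The function name is hypothetical.

```python
import math


def analysis_unit_cost(n_samples, price_per_sample=35, pack_size=50):
    """Packs and total cost (USD) if units are bought only in whole packs."""
    packs = math.ceil(n_samples / pack_size)
    return packs, packs * pack_size * price_per_sample


for n in (10, 50, 120):
    packs, cost = analysis_unit_cost(n)
    print(f"{n} samples -> {packs} pack(s), ${cost:,}")
```

Under these assumptions, a single 50-pack costs $1,750, so even a small study pays for a full pack.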
Many customizable visualization options are available – UMAP/t-SNE, heatmap, volcano plot, MA plot, box plots, and more.
Data export and sharing. The main advantage of this software is its collaborative functions. ROSALIND Spaces enables collaboration through virtual data rooms in which researchers can interactively explore shared experiments. Every update is instantly available to each participant. Real-time activity feeds and historical reports are also available.
All plots, diagrams, source files, and result files can be downloaded from ROSALIND.
Pricing. ROSALIND is paid software with a free trial available. Pricing is listed on its website.
Wrap up
Even though scRNA-seq analysis can seem daunting, it’s easy for anyone without coding experience to work with single-cell data when you have an appropriate tool. In doing so, you can unlock biological insight from your datasets within a few hours, and take the next step to advance your research project quickly and easily.
Importantly, there is no universal answer when choosing single-cell data analysis software, as every tool has its pros and cons. However, we hope you have gained some insight into the best scRNA-seq data analysis tools currently available and feel more confident choosing one for your single-cell research.
Last updated: 22nd October 2024.