Analysis Framework for single-cell data

FONTANESI, MARTINA
2024/2025

Abstract

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful technology for studying gene expression at cellular resolution, enabling the characterization of heterogeneous biological systems that cannot be resolved using bulk transcriptomic approaches. However, the complexity, high dimensionality, and sparsity of single-cell data introduce significant computational challenges, making dedicated, carefully designed data processing pipelines essential for meaningful downstream analysis. This thesis presents a computational framework for preprocessing and analyzing droplet-based scRNA-seq data, focusing on the transformation of raw sequencing output into structured gene-by-cell expression matrices. Starting from FASTQ files, the proposed pipeline uses a standardized preprocessing stage covering read alignment, cell barcode assignment, UMI-based deduplication, and quality control, yielding a reproducible and modular workflow for single-cell data processing. The framework is applied to a real-world scRNA-seq dataset derived from mouse retinal tissue, a well-established model system with pronounced cellular heterogeneity. Multiple experimental conditions are considered, allowing the pipeline to be evaluated in a biologically complex and realistic setting. The resulting processed data support downstream analyses including cell-type characterization, differential gene expression, and pathway-level interpretation. Rather than emphasizing specific biological findings, this work highlights the central role of computational pipelines in shaping the quality and interpretability of scRNA-seq analyses.
By framing scRNA-seq data analysis as a data engineering problem, the proposed framework contributes to reproducible, scalable, and transparent single-cell transcriptomic workflows and provides a solid foundation for future extensions toward more advanced analytical and machine-learning-based approaches in biomedical research.
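To make the preprocessing steps named in the abstract more concrete, the following Python sketch shows how aligned reads might be collapsed into a gene-by-cell UMI count matrix and then filtered with a basic quality-control threshold. It is an illustrative toy under stated assumptions, not the implementation used in this thesis: the `AlignedRead` record, the barcode whitelist, the `min_genes` cutoff, and the example reads are all hypothetical. UMIs are matched exactly here, whereas dedicated preprocessing tools typically also correct sequencing errors in barcodes and UMIs.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical record: one read after alignment and barcode/UMI extraction."""
    cell_barcode: str
    umi: str
    gene: str


def count_matrix(reads: Iterable[AlignedRead], barcode_whitelist: set[str]):
    """Collapse reads into unique (gene, cell, UMI) molecules and count them.

    Returns (genes, cells, matrix), where matrix[i][j] is the number of
    distinct UMIs observed for genes[i] in cells[j]. UMIs are compared
    exactly; no error correction is attempted in this sketch.
    """
    molecules: dict[tuple[str, str], set[str]] = defaultdict(set)
    for read in reads:
        # Cell barcode assignment: discard reads whose barcode is not whitelisted.
        if read.cell_barcode not in barcode_whitelist:
            continue
        # UMI-based deduplication: PCR copies of one molecule share gene, cell and UMI,
        # so storing UMIs in a set counts each molecule once.
        molecules[(read.gene, read.cell_barcode)].add(read.umi)

    genes = sorted({gene for gene, _ in molecules})
    cells = sorted({cell for _, cell in molecules})
    gene_idx = {g: i for i, g in enumerate(genes)}
    cell_idx = {c: j for j, c in enumerate(cells)}
    matrix = [[0] * len(cells) for _ in genes]
    for (gene, cell), umis in molecules.items():
        matrix[gene_idx[gene]][cell_idx[cell]] = len(umis)
    return genes, cells, matrix


def filter_cells(genes, cells, matrix, min_genes: int):
    """Basic quality control (hypothetical threshold): keep cells detecting >= min_genes genes."""
    keep = [
        j for j in range(len(cells))
        if sum(1 for i in range(len(genes)) if matrix[i][j] > 0) >= min_genes
    ]
    return [cells[j] for j in keep], [[row[j] for j in keep] for row in matrix]


# Toy usage with made-up reads.
reads = [
    AlignedRead("AAAC", "UMI1", "Rho"),
    AlignedRead("AAAC", "UMI1", "Rho"),    # PCR duplicate of the read above
    AlignedRead("AAAC", "UMI2", "Rho"),
    AlignedRead("AAAC", "UMI3", "Gnat1"),
    AlignedRead("TTTG", "UMI1", "Rho"),
    AlignedRead("GGGG", "UMI9", "Rho"),    # barcode not in the whitelist
]
genes, cells, m = count_matrix(reads, {"AAAC", "TTTG"})
# genes == ["Gnat1", "Rho"], cells == ["AAAC", "TTTG"], m == [[1, 0], [2, 1]]
kept_cells, kept = filter_cells(genes, cells, m, min_genes=2)
# kept_cells == ["AAAC"], kept == [[1], [2]]
```

Keeping a set of UMIs per (gene, cell) pair makes deduplication a side effect of insertion; a production pipeline would typically store the result as a sparse matrix rather than nested lists, given how sparse single-cell data are.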
2024
Keywords: single-cell data; analysis framework; sequencing; processing; gene-expression
Files in this item:
Fontanesi.Martina.pdf (restricted access), Adobe PDF, 3.47 MB

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14251/4743