Basic Usage ================ Workflow --------- .. image:: ./../_static/mainpipeline.BMP :alt: Title figure :width: 700px :align: center Usage -------------- The package provides functions for loading data, preprocessing data, reconstructing gene network, and visualizing the inferred GRNs. The main functions are: * Load and process data * Compute TF-gene similarities * Create modules * Perform motif enrichment and determine regulons * Calculate regulon activity level across cells * Visualize network and other results Example workflow ++++++++++++++++++++++ .. code-block:: from spagrn import InferRegulatoryNetwork as irn if __name__ == '__main__': #notice: to avoid concurrent bugs, please do not ignore this line! database_fn='mouse.feather' motif_anno_fn='mouse.tbl' tfs_fn='mouse_TFs.txt' # load Ligand-receptor data niches = pd.read_csv('niches.csv') # Load data data = irn.read_file('data.h5ad') # Preprocess data data = irn.preprocess(data) # Initialize gene regulatory network grn = irn(data) # run main pipeline grn.infer(database_fn, motif_anno_fn, tfs_fn, niche_df=niches, num_workers=cpu_count(), cache=False, save_tmp=True, c_threshold=0.2, layers=None, latent_obsm_key='spatial', model='danb', n_neighbors=30, weighted_graph=False, cluster_label='celltype', method='spg', prefix='project', noweights=False) All results will be save in a h5ad file, default file name is `spagrn.h5ad`. Visualization ++++++++++++++++++++++ SpaGRN offers a wide range of data visualization methods. 1. Heatmap ************ read data from previous analysis: ~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: data = irn.read_file('spagrn.h5ad') auc_mtx = data.obsm['auc_mtx'] plot: ~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: prn.auc_heatmap(data, auc_mtx, cluster_label='annotation', rss_fn='regulon_specificity_scores.txt', topn=10, subset=False, save=True, fn='clusters_heatmap_top10.pdf', legend_fn="rss_celltype_legend_top10.pdf") .. image:: ./../_static/E14-16h_hotspot_clusters_heatmap_top5.png :alt: Title figure :width: 400px :align: center 2. Spatial Plots ************ Plot spatial distribution map of a regulon on a 2D plane: ~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: from spagrn import plot as prn prn.plot_2d_reg(data, 'spatial', auc_mtx, reg_name='Egr3') .. image:: ./../_static/Egr3.png :alt: Title figure :width: 300px :align: center If one wants to display their 3D data in a three-dimensional fashion: ~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: prn.plot_3d_reg(data, 'spatial', auc_mtx, reg_name='grh', vmin=0, vmax=4, alpha=0.3) .. image:: ./../_static/grh_L3.png :alt: Title figure :width: 300px :align: center Hyperparameters -------------- =============================== ================================== =============================== ========= ============== =============== ========================================================================================= spatial co-expression methods spatial autocorrelation methods # nearest neighbor cells (K) # SVGs # Regulons # Target genes Detected TF list =============================== ================================== =============================== ========= ============== =============== ========================================================================================= Ixy Intersection of Hx, Ix, Cx, Gx 5 855 24 497 Dlx1, Dlx6, Emx1, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2, Ikzf1, Klf7, Lhx6, Lhx8,... Ixy Intersection of Hx, Ix, Cx, Gx 10 978 27 529 Alx4, Dlx1, Dlx6, Emx1, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2, Ikzf1, Isl1, Klf7,... Ixy Intersection of Hx, Ix, Cx, Gx 15 1177 27 529 Dlx1, Dlx6, Emx1, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2, Ikzf1, Klf7, Lhx6, Lhx8,... Ixy Intersection of Ix, Cx, Gx 5 974 23 518 Dlx1, Dlx6, Emx1, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2, Ikzf1, Klf7, Lhx6, Lhx8,... Ixy Intersection of Ix, Cx, Gx 10 1056 26 575 Dlx1, Dlx6, Emx1, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2, Ikzf1, Klf7, Lhx6, Lhx8,... Ixy Intersection of Ix, Cx, Gx 15 1286 30 615 Creb3l1, Dlx1, Dlx6, Emx1, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2, Ikzf1, Klf7,... Cxy Intersection of Hx, Ix, Cx, Gx 5 856 23 482 Dlx1, Dlx6, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2, Ikzf1, Klf7, Lhx6, Lhx8, Lmx1a,... Cxy Intersection of Hx, Ix, Cx, Gx 10 978 25 536 Dlx1, Dlx6, Emx1, Eomes, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2, Ikzf1, Klf7,Lhx6,... Cxy Intersection of Hx, Ix, Cx, Gx 15 1177 28 612 Alx4, Dlx1, Dlx6, Emx1, Eomes, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2, Ikzf1, Klf7,,... Cxy Intersection of Ix, Cx, Gx 5 974 27 560 Alx4, Dlx1, Dlx6, Eomes, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2, Ikzf1, Klf7, Lhx6,... Cxy Intersection of Ix, Cx, Gx 10 1056 28 585 Dlx1, Dlx6, Emx1, Eomes, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2, Ikzf1, Klf7, Lhx6,... Cxy Intersection of Ix, Cx, Gx 15 1286 31 669 Alx4, Creb3l1, Dlx1, Dlx6, Emx1, Eomes, Erg, Ets1, Etv4, Fli1, Gbx1, Hivep2,,... =============================== ================================== =============================== ========= ============== =============== ========================================================================================= We have provided a detailed discussion of the three most important hyperparameters and their impact on the results. The number of nearest neighbor cells (K) used in the Gaussian kernel for capturing local heterogeneity and reducing the influence of high-order neighbors in spatial autocorrelation and co-expression: We evaluated three values for K: 5, 10, and 30. Our observations indicate the following effects: a) Increasing K leads to a larger number of regulons, but without significant differences in their significance. b) The results are relatively stable across different values, although the number of detected regulons increases with increasing K values. c) Considering the biological context, where each cell is typically surrounded by approximately 10 neighboring cells, we have fixed the pipeline to use K =10 as the default value. The choice of spatial autocorrelation methods: We compared the performance of the intersection gene set using different methods, Hx, Ix, Cx, and Gx. Our analysis revealed the following: a) The majority of the SVGs detected by these methods are shared, indicating a certain level of agreement among the methods. However, each method also uniquely detected some SVGs, suggesting that they may capture different aspects of spatial variation. b) The intersection of all four methods yields more precise regulon boundaries, resulting in a smaller number of regulons and clearer spatial patterns, compared to the intersection of three methods in most cases. Based on these findings, we have chosen to use the intersection of all four methods in our fixed pipeline. The choice of spatial co-expression methods: We compared the results obtained using two co-expression methods: bivariate Moran’s I Ixy and bivariate Geary’s C Cxy. Our analysis demonstrated the following: The differences in the output results between Moran’s I and Geary’s C are minimal. Specifically, the number of regulons, the TFs identified in the regulons, and the composition of target genes within regulons with the same TF are all similar. Based on these findings, we default to the Morlan’s I method for co-expression network inference in our fixed pipeline. Warning -------------- Note that it is recommended to utilize the intersection set of spatially specific genes generated by five different gene autocorrelation detection algorithms by default. The intersection strategy ensures a more robust and reliable identification of spatially specific genes. Throughout the manuscript, we have consistently employed the intersection of gene sets unless explicitly stated otherwise. To mitigate the potential overshadowing effect of large-sample cell types or functional regions on those with a small number of spots or cells, we strongly recommend adopting Moran's I co-expression method as the default approach, especially for complex organ and tissue structures. This method has proven to be effective in generating spatial GRNs specifically expressed in rare cell types or regions using various datasets. Additionally, users have the option to crop the area of interest, which can increase the sample size of cell types or functional regions with limited spots or cells. This approach has the potential to improve the ranking of specific co-expressed targets, further enhancing the accuracy of the analysis.