Journal: bioRxiv
Article Title: PVT1 splicing activity predicts genome-wide gene expression with miRNA regulatory signatures
doi: 10.1101/2025.07.25.666741
Figure Legend Snippet: PVT1 splicing activity predicts gene expression in elastic net models. (A) Left panel, example elastic net fit plot of observed versus elastic net-predicted gene expression values (for SEC61B, ENSG00000106803.10). Elastic net models for all 30,887 genes were trained with 10-fold cross-validation, using the splicing efficiency values at 34 PVT1 3’ splice sites as predictor variables. Right panel, boxplot distributions of the glmnet coefficients of the 34 PVT1 3’ splice sites across the 10 cross-validation folds for this gene expression model. A positive glmnet coefficient at a PVT1 3’ splice site means that splicing efficiency at that site contributes positively to target gene expression; a negative coefficient indicates a negative contribution. The larger the absolute value of a glmnet coefficient, the greater the impact (positive or negative) of splicing efficiency at that PVT1 3’ splice site on target gene expression. (B) Empirical cumulative distribution function (ECDF) of elastic net (glmnet) performance metrics for all 30,887 assayed genes, with models trained on the splicing efficiencies of the 34 PVT1 3’ splice sites. Left panel: ECDF of root-mean-square error (RMSE) values; blue line, mean RMSE across the 10 cross-validation folds; red line, minimum RMSE (designating the best fold) per gene expression model. Right panel: ECDF of coefficient of determination (R²) values; blue line, mean R² across the 10 folds; red line, R² of the best fold (defined as the fold with the minimum RMSE). Genes that satisfied (i) best-fold R² ≥ 0.2, (ii) a negative correlation between R² and RMSE, and (iii) 10-fold mean RMSE < Q₃ + 1.5 × IQR (Q₃, third quartile; IQR, interquartile range) were considered significant, yielding a total of 365 genes predicted as significant when the elastic net models were trained across all tumor samples.
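The three-part significance filter in (B) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: per-fold RMSE and R² are assumed to be computed on held-out folds, R² is assumed to be 1 − SS_res/SS_tot, the "negative correlation" is assumed to be a Pearson correlation between each gene's per-fold R² and RMSE values, and the Q₃ + 1.5 × IQR fence is assumed to be taken over the 10-fold mean RMSE of all genes. The names `GeneCV`, `fold_metrics`, `pearson` and `select_significant` are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class GeneCV:
    """Hypothetical container: per-fold CV metrics for one gene's model."""
    rmse: list[float]
    r2: list[float]


def fold_metrics(observed: list[float], predicted: list[float]) -> tuple[float, float]:
    """RMSE and R^2 (assumed 1 - SS_res / SS_tot) on one held-out fold."""
    n = len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_obs = statistics.fmean(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return rmse, r2


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; NaN if either input has zero variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def select_significant(models: dict[str, GeneCV], r2_min: float = 0.2) -> list[str]:
    """Apply the three cutoffs (assumed interpretation) and return passing genes."""
    mean_rmse = {g: statistics.fmean(m.rmse) for g, m in models.items()}
    # Quartiles by linear interpolation (same convention as R's default type 7).
    q1, _, q3 = statistics.quantiles(mean_rmse.values(), n=4, method="inclusive")
    upper_fence = q3 + 1.5 * (q3 - q1)
    selected = []
    for gene, m in models.items():
        best = min(range(len(m.rmse)), key=lambda i: m.rmse[i])  # min-RMSE fold
        # A NaN correlation compares False, so such genes are excluded.
        if (m.r2[best] >= r2_min
                and pearson(m.r2, m.rmse) < 0
                and mean_rmse[gene] < upper_fence):
            selected.append(gene)
    return selected
```

Keeping the fence computation separate from the per-gene checks reflects that criterion (iii) depends on the whole population of genes, whereas (i) and (ii) depend only on one gene's folds.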
(C) miR-200 target genes are significantly enriched among the 365 elastic net-predicted genes (Fisher’s exact test p-value 1.51e-14, odds ratio 3.12). Right panels, GO term enrichment analysis using clusterProfiler: translation-related terms are enriched among the 365 elastic net-predicted genes. (D) K-means clustering of the glmnet coefficients from the PVT1 splicing-based models. The 365 elastic net-predicted genes were grouped into 6 clusters based on their glmnet coefficients. miR-200 target genes are significantly enriched in clusters 1 and 3 (Fisher’s exact test p-value 8.14e-07, odds ratio 2.28), and translation-related terms are enriched in clusters 4 and 6 (defined by negative contributions of splice sites ss10, ss13, ss14 and ss15, and positive contributions of splice sites ss2, ss4, ss6, ss11, ss12 and ss29). (E) K-means clustering of the 471 genes predicted in the Basal subtype, using the glmnet coefficients of the 34 splice sites from the PVT1 splicing-based elastic net models as clustering features. These 471 genes were predicted as significant (passing the predefined cutoffs) by PVT1 splicing-based elastic net models trained across 155 Basal subtype samples. Translation-related terms were enriched in clusters 1, 3 and 5, defined by positive contributions of splice sites ss4, ss6, ss7 and ss33. miR-200 target genes were significantly enriched in clusters 4 (Fisher’s exact test p-value 2.6e-05, odds ratio 3.63) and 6 (Fisher’s exact test p-value 0.0001556, odds ratio 2.57). Clusters 4 and 6 share positive contributions of splice sites ss13, ss14 and ss15. No other specific terms were enriched in clusters 2, 4 and 6.
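The clustering and enrichment steps in (D) and (E) can be sketched as follows: genes are grouped by their 34-dimensional glmnet coefficient vectors with k-means, and each cluster is then tested for over-representation of miR-200 targets with a one-sided Fisher's exact test. This is a hypothetical pure-Python illustration, not the authors' pipeline. It swaps in Lloyd's algorithm with a deterministic farthest-point initialisation (R's `kmeans` defaults to Hartigan-Wong, so assignments need not match), and the one-sided alternative, the background gene set, and the names `kmeans`, `fisher_enrichment` and `cluster_enrichment` are all assumptions. The returned odds ratio is the sample cross-product ratio; R's `fisher.test` reports a conditional maximum-likelihood estimate, so values can differ slightly.

```python
import math
from collections.abc import Sequence


def kmeans(points: Sequence[Sequence[float]], k: int, n_iter: int = 100) -> list[int]:
    """Lloyd's k-means; returns one cluster label per point."""
    # Hypothetical deterministic init: first point, then repeatedly the point
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fisher_enrichment(in_set_hits: int, in_set_total: int,
                      bg_hits: int, bg_total: int) -> tuple[float, float]:
    """One-sided (over-representation) Fisher's exact test on a 2x2 table.

    bg_* counts describe the whole background, including the tested set.
    Returns (p_value, sample_odds_ratio).
    """
    a = in_set_hits                      # in set, target
    b = in_set_total - a                 # in set, non-target
    c = bg_hits - a                      # outside set, target
    d = (bg_total - in_set_total) - c    # outside set, non-target
    n, drawn, successes = a + b + c + d, a + b, a + c
    # P(X >= a) for X ~ Hypergeometric(n, successes, drawn); exact integer sums.
    tail = sum(math.comb(successes, x) * math.comb(n - successes, drawn - x)
               for x in range(a, min(drawn, successes) + 1))
    p = tail / math.comb(n, drawn)
    odds = (a * d) / (b * c) if b * c else math.inf
    return p, odds


def cluster_enrichment(labels: list[int], genes: list[str], targets: set[str],
                       bg_total: int, bg_hits: int) -> dict[int, tuple[float, float]]:
    """Test every cluster for target over-representation against the background."""
    results = {}
    for cluster in sorted(set(labels)):
        members = [g for g, lab in zip(genes, labels) if lab == cluster]
        hits = sum(g in targets for g in members)
        results[cluster] = fisher_enrichment(hits, len(members), bg_hits, bg_total)
    return results
```

Exact integer arithmetic in the hypergeometric tail avoids overflow for large backgrounds such as ~30,000 genes; very small p-values may still round to 0.0 when converted to a float.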
Article Snippet: Then, nested PCR reactions were performed with primers “PVT1 Com Fw” and either “PVT1 Mid Exon Rv (R4)” or “PVT1 Last Exon Rv (R10)” (NEB, M0273), and the products were analysed by 1% agarose gel electrophoresis.
Techniques: Activity Assay, Gene Expression, Biomarker Discovery, Targeted Gene Expression