Publications
Book
2018
- S. Bezjak, A. Clyburne-Sherin, P. Conzett, P. Fernandes, E. Görögh, K. Helbig, B. Kramer, I. Labastida, K. Niemeyer, F. Psomopoulos, T. Ross-Hellauer, R. Schneider, J. Tennant, E. Verbakel, H. Brinken, and L. Heller, Open Science Training Handbook. Zenodo, 2018.
Book Chapter
2023
- F. Psomopoulos, C. Goble, L. J. Castro, J. Harrow, and S. C. E. Tosatto, “A Roadmap for Defining Machine Learning Standards in Life Sciences,” in Artificial Intelligence for Science, World Scientific, 2023, pp. 399–410.
2022
- C. Galigalidou, L. Zaragoza-Infante, A. Chatzidimitriou, K. Stamatopoulos, F. Psomopoulos, and A. Agathangelidis, “Purpose-Built Immunoinformatics for BcR IG/TR Repertoire Data Analysis,” in Immunogenetics: Methods and Protocols, A. W. Langerak, Ed. New York, NY: Springer US, 2022, pp. 585–603.
The study of antigen receptor gene repertoires using next-generation sequencing (NGS) technologies has disclosed an unprecedented depth of complexity, requiring novel computational and analytical solutions. Several bioinformatics workflows have been developed to this end, including the T-cell receptor/immunoglobulin profiler (TRIP), a web application implemented in R shiny, specifically designed for the purposes of comprehensive repertoire analysis, which is the focus of this chapter. TRIP has the potential to perform robust immunoprofiling analysis through the extraction and processing of the IMGT/HighV-Quest output, via a series of functions, ensuring the analysis of high-quality, biologically relevant data through a multilevel process of data filtering. Subsequently, it provides in-depth analysis of antigen receptor gene rearrangements, including (a) clonality assessment; (b) extraction of variable (V), diversity (D), and joining (J) gene repertoires; (c) CDR3 characterization at both the nucleotide and amino acid level; and (d) somatic hypermutation analysis, in the case of immunoglobulin gene rearrangements. Relevant to mention, TRIP enables a high level of customization through the integration of various options in key aspects of the analysis, such as clonotype definition and computation, hence allowing for flexibility without compromising on accuracy.
2019
- A. Agathangelidis, F. Psomopoulos, and K. Stamatopoulos, “Stereotyped B Cell Receptor Immunoglobulins in B Cell Lymphomas,” in Lymphoma: Methods and Protocols, R. Küppers, Ed. New York, NY: Springer New York, 2019, pp. 139–155.
Comprehensive analysis of the clonotypic B cell receptor immunoglobulin (BcR IG) gene rearrangement sequences in patients with mature B cell neoplasms has led to the identification of significant repertoire restrictions, culminating in the discovery of subsets of patients expressing highly similar, stereotyped BcR IG. This finding strongly supports selection by common epitopes or classes of structurally similar epitopes in the ontogeny of these tumors. BcR IG stereotypy was initially described in chronic lymphocytic leukemia (CLL), where the stereotyped fraction of the disease accounts for a remarkable one-third of patients. However, subsequent studies showed that stereotyped BcR IG are also present in other neoplasms of mature B cells, including mantle cell lymphoma (MCL) and splenic marginal zone lymphoma (SMZL). Subsequent cross-entity comparisons led to the conclusion that stereotyped IG are mostly “disease-specific,” implicating distinct immunopathogenetic processes. Interestingly, mounting evidence suggests that a molecular subclassification of lymphomas based on BcR IG stereotypy is biologically and clinically relevant. Indeed, particularly in CLL, patients assigned to the same subset due to expressing a particular stereotyped BcR IG display remarkably consistent biological background and clinical course, at least for major and well-studied subsets. Thus, the robust assignment to stereotyped subsets may assist in the identification of mechanisms underlying disease onset and progression, while also refining risk stratification. In this book chapter, we provide an overview of the recent BcR IG stereotypy studies in mature B cell malignancies and outline previous and current methodological approaches used for the identification of stereotyped IG.
Journals
2024
- O. A. Attafi et al., “DOME Registry: implementing community-wide recommendations for reporting supervised machine learning in biology,” GigaScience, vol. 13, p. giae094, Dec. 2024, doi: 10.1093/gigascience/giae094.
Supervised machine learning (ML) is used extensively in biology and deserves closer scrutiny. The Data Optimization Model Evaluation (DOME) recommendations aim to enhance the validation and reproducibility of ML research by establishing standards for key aspects such as data handling and processing, optimization, evaluation, and model interpretability. The recommendations help to ensure that key details are reported transparently by providing a structured set of questions. Here, we introduce the DOME registry (URL: registry.dome-ml.org), a database that allows scientists to manage and access comprehensive DOME-related information on published ML studies. The registry uses external resources like ORCID, APICURON, and the Data Stewardship Wizard to streamline the annotation process and ensure comprehensive documentation. By assigning unique identifiers and DOME scores to publications, the registry fosters a standardized evaluation of ML methods. Future plans include continuing to grow the registry through community curation, improving the DOME score definition and encouraging publishers to adopt DOME standards, and promoting transparency and reproducibility of ML in the life sciences.
- S.-C. Fragkouli, D. Solanki, L. Castro, F. Psomopoulos, N. Queralt-Rosinach, D. Cirillo, and L. Crossman, “Synthetic data: how could it be used in infectious disease research?,” Future Microbiology, vol. 0, no. 0, pp. 1–6, 2024, doi: 10.1080/17460913.2024.2400853.
- N. Pechlivanis, G. Karakatsoulis, K. Kyritsis, M. Tsagiopoulou, S. Sgardelis, I. Kappas, and F. Psomopoulos, “Microbial co-occurrence network demonstrates spatial and climatic trends for global soil diversity,” Scientific Data, vol. 11, no. 1, p. 672, 2024, doi: 10.1038/s41597-024-03528-1.
- S. G. Sutcliffe et al., “Tracking SARS-CoV-2 variants of concern in wastewater: an assessment of nine computational tools using simulated genomic data,” Microbial Genomics, vol. 10, no. 5, 2024, doi: 10.1099/mgen.0.001249.
Wastewater-based surveillance (WBS) is an important epidemiological and public health tool for tracking pathogens across the scale of a building, neighbourhood, city, or region. WBS gained widespread adoption globally during the SARS-CoV-2 pandemic for estimating community infection levels by qPCR. Sequencing pathogen genes or genomes from wastewater adds information about pathogen genetic diversity, which can be used to identify viral lineages (including variants of concern) that are circulating in a local population. Capturing the genetic diversity by WBS sequencing is not trivial, as wastewater samples often contain a diverse mixture of viral lineages with real mutations and sequencing errors, which must be deconvoluted computationally from short sequencing reads. In this study we assess nine different computational tools that have recently been developed to address this challenge. We simulated 100 wastewater sequence samples consisting of SARS-CoV-2 BA.1, BA.2, and Delta lineages, in various mixtures, as well as a Delta–Omicron recombinant and a synthetic ‘novel’ lineage. Most tools performed well in identifying the true lineages present and estimating their relative abundances and were generally robust to variation in sequencing depth and read length. While many tools identified lineages present down to 1 % frequency, results were more reliable above a 5 % threshold. The presence of an unknown synthetic lineage, which represents an unclassified SARS-CoV-2 lineage, increases the error in relative abundance estimates of other lineages, but the magnitude of this effect was small for most tools. The tools also varied in how they labelled novel synthetic lineages and recombinants. While our simulated dataset represents just one of many possible use cases for these methods, we hope it helps users understand potential sources of error or bias in wastewater sequencing analysis and to appreciate the commonalities and differences across methods.
- G. I. Gavriilidis, V. Vasileiou, A. Orfanou, N. Ishaque, and F. Psomopoulos, “A mini-review on perturbation modelling across single-cell omic modalities,” Computational and Structural Biotechnology Journal, vol. 23, pp. 1886–1896, Dec. 2024, doi: 10.1016/j.csbj.2024.04.058.
- V. Makarov, C. Chabbert, E. Koletou, F. Psomopoulos, N. Kurbatova, S. Ramirez, C. Nelson, P. Natarajan, and B. Neupane, “Good machine learning practices: Learnings from the modern pharmaceutical discovery enterprise,” Computers in Biology and Medicine, vol. 177, p. 108632, Jul. 2024, doi: 10.1016/j.compbiomed.2024.108632.
2023
- I. Gkekas, A.-C. Vagiona, N. Pechlivanis, G. Kastrinaki, K. Pliatsika, S. Iben, K. Xanthopoulos, F. E. Psomopoulos, M. A. Andrade-Navarro, and S. Petrakis, “Intranuclear inclusions of polyQ-expanded ATXN1 sequester RNA molecules,” Frontiers in Molecular Neuroscience, vol. 16, Dec. 2023, doi: 10.3389/fnmol.2023.1280546.
- E. Sofou, G. Gkoliou, N. Pechlivanis, K. Pasentsis, K. Chatzistamatiou, F. Psomopoulos, T. Agorastos, and K. Stamatopoulos, “High risk HPV-positive women cervicovaginal microbial profiles in a Greek cohort: a retrospective analysis of the GRECOSELF study,” Frontiers in Microbiology, vol. 14, Nov. 2023, doi: 10.3389/fmicb.2023.1292230.
- K. A. Kyritsis, N. Pechlivanis, and F. Psomopoulos, “Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL),” Frontiers in Bioinformatics, vol. 3, Nov. 2023, doi: 10.3389/fbinf.2023.1275593.
- A. Iatrou et al., “N-Glycosylation of the Ig Receptors Shapes the Antigen Reactivity in Chronic Lymphocytic Leukemia Subset #201,” The Journal of Immunology, vol. 211, no. 5, pp. 743–754, Jul. 2023, doi: 10.4049/jimmunol.2300330.
- E. A. Huerta et al., “FAIR for AI: An interdisciplinary and international community building perspective,” Scientific Data, vol. 10, no. 1, Jul. 2023, doi: 10.1038/s41597-023-02298-6.
- E. Sofou, E. Vlachonikola, L. Zaragoza-Infante, M. Brüggemann, N. Darzentas, P. J. T. A. Groenen, M. Hummel, E. A. Macintyre, F. Psomopoulos, F. Davi, A. W. Langerak, and K. Stamatopoulos, “Clonotype definitions for immunogenetic studies: proposals from the EuroClonality NGS Working Group,” Leukemia, vol. 37, no. 8, pp. 1750–1752, Jun. 2023, doi: 10.1038/s41375-023-01952-7.
- A. Sachinidis, M. Trachana, A. Taparkou, G. Gavriilidis, P. Verginis, F. Psomopoulos, C. Adamichou, D. Boumpas, and A. Garyfallos, “Investigating the Role of T-bet+ B Cells (ABCs/DN) in the Immunopathogenesis of Systemic Lupus Erythematosus,” Mediterranean Journal of Rheumatology, vol. 34, no. 1, p. 117, 2023, doi: 10.31138/mjr.34.1.117.
- M. Tsagiopoulou, V. Chapaprieta, N. Russiñol, B. García-Torre, N. Pechlivanis, F. Nadeu, N. Papakonstantinou, N. Stavroyianni, A. Chatzidimitriou, F. Psomopoulos, E. Campo, K. Stamatopoulos, and J. I. Martin-Subero, “Chromatin activation profiling of stereotyped chronic lymphocytic leukemias reveals a subset #8 specific signature,” Blood, Mar. 2023, doi: 10.1182/blood.2022016587.
- E. Vlachonikola et al., “T cell receptor gene repertoire profiles in subgroups of patients with chronic lymphocytic leukemia bearing distinct genomic aberrations,” Frontiers in Oncology, vol. 13, Feb. 2023, doi: 10.3389/fonc.2023.1097942.
- S. Hiltemann et al., “Galaxy Training: A powerful framework for teaching!,” PLOS Computational Biology, vol. 19, no. 1, p. e1010752, Jan. 2023, doi: 10.1371/journal.pcbi.1010752.
- R. M. Waterhouse, A.-F. Adam-Blondon, B. Balech, E. Barta, K. F. Heil, G. M. Hughes, L. S. Jermiin, M. Kalaš, J. Lanfear, E. Pafilis, A. C. Papageorgiou, F. Psomopoulos, N. Raes, J. Burgin, and T. Gabaldón, “The ELIXIR Biodiversity Community: Understanding short- and long-term changes in biodiversity,” F1000Research, vol. 12, p. 499, May 2023, doi: 10.12688/f1000research.133724.1.
2022
- M. Barker, N. P. Chue Hong, D. S. Katz, A.-L. Lamprecht, C. Martinez-Ortiz, F. Psomopoulos, J. Harrow, L. J. Castro, M. Gruenpeter, P. A. Martinez, and T. Honeyman, “Introducing the FAIR Principles for research software,” Scientific Data, vol. 9, no. 1, p. 622, Oct. 2022, doi: 10.1038/s41597-022-01710-x.
Research software is a fundamental and vital part of research, yet significant challenges to discoverability, productivity, quality, reproducibility, and sustainability exist. Improving the practice of scholarship is a common goal of the open science, open source, and FAIR (Findable, Accessible, Interoperable and Reusable) communities and research software is now being understood as a type of digital object to which FAIR should be applied. This emergence reflects a maturation of the research community to better understand the crucial role of FAIR research software in maximising research value. The FAIR for Research Software (FAIR4RS) Working Group has adapted the FAIR Guiding Principles to create the FAIR Principles for Research Software (FAIR4RS Principles). The contents and context of the FAIR4RS Principles are summarised here to provide the basis for discussion of their adoption. Examples of implementation by organisations are provided to share information on how to maximise the value of research outputs, and to encourage others to amplify the importance and impact of this work.
- L. Zaragoza-Infante, V. Junet, N. Pechlivanis, S.-C. Fragkouli, S. Amprachamian, T. Koletsa, A. Chatzidimitriou, M. Papaioannou, K. Stamatopoulos, A. Agathangelidis, and F. Psomopoulos, “IgIDivA: immunoglobulin intraclonal diversification analysis,” Briefings in Bioinformatics, Aug. 2022, doi: 10.1093/bib/bbac349.
Intraclonal diversification (ID) within the immunoglobulin (IG) genes expressed by B cell clones arises due to ongoing somatic hypermutation (SHM) in a context of continuous interactions with antigen(s). Defining the nature and order of appearance of SHMs in the IG genes can assist in improved understanding of the ID process, shedding light into the ontogeny and evolution of B cell clones in health and disease. Such endeavor is empowered thanks to the introduction of high-throughput sequencing in the study of IG gene repertoires. However, few existing tools allow the identification, quantification and characterization of SHMs related to ID, all of which have limitations in their analysis, highlighting the need for developing a purpose-built tool for the comprehensive analysis of the ID process. In this work, we present the immunoglobulin intraclonal diversification analysis (IgIDivA) tool, a novel methodology for the in-depth qualitative and quantitative analysis of the ID process from high-throughput sequencing data. IgIDivA identifies and characterizes SHMs that occur within the variable domain of the rearranged IG genes and studies in detail the connections between identified SHMs, establishing mutational pathways. Moreover, it combines established and new graph-based metrics for the objective determination of ID level, combined with statistical analysis for the comparison of ID level features for different groups of samples. Of importance, IgIDivA also provides detailed visualizations of ID through the generation of purpose-built graph networks. Beyond the method design, IgIDivA has been also implemented as an R Shiny web application. IgIDivA is freely available at https://bio.tools/igidiva
- S. Laidou, D. Grigoriadis, S. Papanikolaou, S. Foutadakis, S. Ntoufa, M. Tsagiopoulou, G. Vatsellas, A. Anagnostopoulos, A. Kouvatsi, N. Stavroyianni, F. Psomopoulos, A. M. Makris, M. Agelopoulos, D. Thanos, A. Chatzidimitriou, N. Papakonstantinou, and K. Stamatopoulos, “The TAp63/BCL2 axis represents a novel mechanism of clinical aggressiveness in chronic lymphocytic leukemia,” Blood Advances, vol. 6, no. 8, pp. 2646–2656, Apr. 2022, doi: 10.1182/bloodadvances.2021006348.
The TA-isoform of the p63 transcription factor (TAp63) has been reported to contribute to clinical aggressiveness in chronic lymphocytic leukemia (CLL) in a hitherto elusive way. Here, we sought to further understand and define the role of TAp63 in the pathophysiology of CLL. First, we found that elevated TAp63 expression levels are linked with adverse clinical outcomes, including disease relapse and shorter time-to-first treatment and overall survival. Next, prompted by the fact that TAp63 participates in an NF-κB/TAp63/BCL2 antiapoptotic axis in activated mature, normal B cells, we explored molecular links between TAp63 and BCL2 also in CLL. We documented a strong correlation at both the protein and the messenger RNA (mRNA) levels, alluding to the potential prosurvival role of TAp63. This claim was supported by inducible downregulation of TAp63 expression in the MEC1 CLL cell line using clustered regularly interspaced short palindromic repeats (CRISPR) system, which resulted in downregulation of BCL2 expression. Next, using chromatin immunoprecipitation (ChIP) sequencing, we examined whether BCL2 might constitute a transcriptional target of TAp63 and identified a significant binding profile of TAp63 in the BCL2 gene locus, across a genomic region previously characterized as a super enhancer in CLL. Moreover, we identified high-confidence TAp63 binding regions in genes mainly implicated in immune response and DNA-damage procedures. Finally, we found that upregulated TAp63 expression levels render CLL cells less responsive to apoptosis induction with the BCL2 inhibitor venetoclax. On these grounds, TAp63 appears to act as a positive modulator of BCL2, hence contributing to the antiapoptotic phenotype that underlies clinical aggressiveness and treatment resistance in CLL.
- N. Pechlivanis, M. Tsagiopoulou, M. C. Maniou, A. Togkousidis, E. Mouchtaropoulou, T. Chassalevris, S. C. Chaintoutis, M. Petala, M. Kostoglou, T. Karapantsios, S. Laidou, E. Vlachonikola, A. Chatzidimitriou, A. Papadopoulos, N. Papaioannou, C. I. Dovas, A. Argiriou, and F. Psomopoulos, “Detecting SARS-CoV-2 lineages and mutational load in municipal wastewater and a use-case in the metropolitan area of Thessaloniki, Greece,” Scientific Reports, vol. 12, no. 1, p. 2659, Feb. 2022, doi: 10.1038/s41598-022-06625-6.
The COVID-19 pandemic represents an unprecedented global crisis necessitating novel approaches for, amongst others, early detection of emerging variants relating to the evolution and spread of the virus. Recently, the detection of SARS-CoV-2 RNA in wastewater has emerged as a useful tool to monitor the prevalence of the virus in the community. Here, we propose a novel methodology, called lineagespot, for the monitoring of mutations and the detection of SARS-CoV-2 lineages in wastewater samples using next-generation sequencing (NGS). Our proposed method was tested and evaluated using NGS data produced by the sequencing of 14 wastewater samples from the municipality of Thessaloniki, Greece, covering a 6-month period. The results showed the presence of SARS-CoV-2 variants in wastewater data. lineagespot was able to record the evolution and rapid domination of the Alpha variant (B.1.1.7) in the community, and allowed the correlation between the mutations evident through our approach and the mutations observed in patients from the same area and time periods. lineagespot is an open-source tool, implemented in R, and is freely available on GitHub and registered on bio.tools.
- The Galaxy Community, “The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update,” Nucleic Acids Research, vol. 50, no. W1, pp. W345–W351, Apr. 2022, doi: 10.1093/nar/gkac247.
Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.
- M. Tsagiopoulou, N. Pechlivanis, M. C. Maniou, and F. Psomopoulos, “InterTADs: integration of multi-omics data on topologically associated domains, application to chronic lymphocytic leukemia,” NAR Genomics and Bioinformatics, vol. 4, no. 1, Jan. 2022, doi: 10.1093/nargab/lqab121.
The integration of multi-omics data can greatly facilitate the advancement of research in Life Sciences by highlighting new interactions. However, there is currently no widespread procedure for meaningful multi-omics data integration. Here, we present a robust framework, called InterTADs, for integrating multi-omics data derived from the same sample, and considering the chromatin configuration of the genome, i.e. the topologically associating domains (TADs). Following the integration process, statistical analysis highlights the differences between the groups of interest (normal versus cancer cells) relating to (i) independent and (ii) integrated events through TADs. Finally, enrichment analysis using KEGG database, Gene Ontology and transcription factor binding sites and visualization approaches are available. We applied InterTADs to multi-omics datasets from 135 patients with chronic lymphocytic leukemia (CLL) and found that the integration through TADs resulted in a dramatic reduction of heterogeneity compared to individual events. Significant differences for individual events and on TADs level were identified between patients differing in the somatic hypermutation status of the clonotypic immunoglobulin genes, the core biological stratifier in CLL, attesting to the biomedical relevance of InterTADs. In conclusion, our approach suggests a new perspective towards analyzing multi-omics data, by offering reasonable execution time, biological benchmarking and potentially contributing to pattern discovery through TADs.
- A. Nicolaidis and F. Psomopoulos, “DNA coding and Gödel numbering,” Physica A: Statistical Mechanics and its Applications, vol. 594, p. 127053, 2022, doi: 10.1016/j.physa.2022.127053.
We consider a DNA strand as a mathematical statement. Inspired by the work of Kurt Gödel, we attach to each DNA strand a Gödel’s number, a product of prime numbers raised to appropriate powers. To each DNA chain corresponds a single Gödel’s number G, and inversely given a Gödel’s number G, we can specify the DNA chain it stands for. Next, considering a single DNA strand composed of N bases, we study the statistical distribution of g, the logarithm of G. Our assumption is that the choice of the mth term is random and with equal probability for the four possible outcomes. The ‘experiment’, to some extent, is similar to throwing N times a four-faces die. Through the moment generating function we obtain the discrete and then the continuum distribution of g. There is an excellent agreement between our formalism and simulated data. At the end we compare our formalism to actual data, to specify the presence of non-random fluctuations.
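The encoding described in this abstract can be illustrated with a minimal Python sketch. The base-to-exponent mapping (A=1, C=2, G=3, T=4) and all function names are assumptions made for illustration; the paper's actual convention may differ. The sketch assigns the m-th prime to the m-th base, raises it to the exponent of that base, and multiplies the results to obtain G; decoding factors G back into the strand, and g is computed as the logarithm of G.

```python
import math

# Hypothetical mapping of bases to exponents (an assumption, not from the paper).
BASE_TO_EXP = {"A": 1, "C": 2, "G": 3, "T": 4}
EXP_TO_BASE = {exp: base for base, exp in BASE_TO_EXP.items()}


def primes():
    """Yield 2, 3, 5, ... using trial division against earlier primes."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found if p * p <= candidate):
            found.append(candidate)
            yield candidate
        candidate += 1


def godel_encode(strand):
    """Return G: the product over positions m of (m-th prime) ** exponent."""
    number = 1
    for prime, base in zip(primes(), strand):
        number *= prime ** BASE_TO_EXP[base]
    return number


def godel_decode(number):
    """Recover the strand by reading off the exponent of each successive prime."""
    bases = []
    for prime in primes():
        if number == 1:
            break
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        # Raises KeyError if the number is not a valid encoding under this mapping.
        bases.append(EXP_TO_BASE[exponent])
    return "".join(bases)


def log_godel(strand):
    """Return g = log(G), computed as a sum to avoid building a huge integer."""
    return sum(
        BASE_TO_EXP[base] * math.log(prime)
        for prime, base in zip(primes(), strand)
    )


strand = "ACGT"
G = godel_encode(strand)          # 2**1 * 3**2 * 5**3 * 7**4
print(G, godel_decode(G), log_godel(strand))
```

Because every base maps to an exponent of at least 1, each position contributes a prime factor, which is what makes the decoding unambiguous under this assumed mapping.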
- O. Giraldo, R. Alves, D. Bampalikis, J. Fernandez, E. Martin del Pico, F. Psomopoulos, A. Via, and L. J. Castro, “A FAIRification roadmap for ELIXIR Software Management Plans,” Research Ideas and Outcomes, vol. 8, p. e94608, 2022, doi: 10.3897/rio.8.e94608.
- C. Martinez-Ortiz, C. Goble, D. Katz, T. Honeyman, P. Martinez, M. Barker, L. J. Castro, N. Chue Hong, M. Gruenpeter, and J. Harrow, “How does software fit into the FDO landscape?,” Research Ideas and Outcomes, vol. 8, p. e95724, 2022, doi: 10.3897/rio.8.e95724.
2021
- M. Osathanunkul, N. Sawongta, W. Pheera, N. Pechlivanis, F. Psomopoulos, and P. Madesis, “Exploring plant diversity through soil DNA in Thai national parks for influencing land reform and agriculture planning,” PeerJ, vol. 9, p. e11753, Aug. 2021, doi: 10.7717/peerj.11753.
- A. C. Dimopoulos, K. Koukoutegos, F. E. Psomopoulos, and P. Moulos, “Combining Multiple RNA-Seq Data Analysis Algorithms Using Machine Learning Improves Differential Isoform Expression Analysis,” Methods and Protocols, vol. 4, no. 4, 2021, doi: 10.3390/mps4040068.
RNA sequencing has become the standard technique for high resolution genome-wide monitoring of gene expression. As such, it often comprises the first step towards understanding complex molecular mechanisms driving various phenotypes, spanning organ development to disease genesis, monitoring and progression. An advantage of RNA sequencing is its ability to capture complex transcriptomic events such as alternative splicing which results in alternate isoform abundance. At the same time, this advantage remains algorithmically and computationally challenging, especially with the emergence of even higher resolution technologies such as single-cell RNA sequencing. Although several algorithms have been proposed for the effective detection of differential isoform expression from RNA-Seq data, no widely accepted golden standards have been established. This fact is further compounded by the significant differences in the output of different algorithms when applied on the same data. In addition, many of the proposed algorithms remain scarce and poorly maintained. Driven by these challenges, we developed a novel integrative approach that effectively combines the most widely used algorithms for differential transcript and isoform analysis using state-of-the-art machine learning techniques. We demonstrate its usability by applying it on simulated data based on several organisms, and using several performance metrics; we conclude that our strategy outperforms the application of the individual algorithms. Finally, our approach is implemented as an R Shiny application, with the underlying data analysis pipelines also available as docker containers.
- M. Tsagiopoulou, A. Togkousidis, N. Pechlivanis, M. C. Maniou, A. Batsali, A. Matheakakis, C. Pontikoglou, and F. Psomopoulos, “miRkit: R Framework Analyzing miRNA PCR Array Data,” BMC Research Notes, vol. 14, p. 376, Sep. 2021, doi: 10.1186/s13104-021-05788-1.
- I. Walsh et al., “DOME: recommendations for supervised machine learning validation in biology,” Nature Methods, Jul. 2021, doi: 10.1038/s41592-021-01205-4.
- S. Ntoufa, M. Gerousi, S. Laidou, F. Psomopoulos, G. Tsiolas, T. Moysiadis, N. Papakonstantinou, L. Mansouri, A. Anagnostopoulos, N. Stavrogianni, S. Pospisilova, K. Plevova, A. M. Makris, R. Rosenquist, and K. Stamatopoulos, “RPS15 mutations rewire RNA translation in chronic lymphocytic leukemia,” Blood Advances, vol. 5, no. 13, pp. 2788–2792, Jul. 2021, doi: 10.1182/bloodadvances.2020001717.
- N. Pechlivanis, A. Togkousidis, M. Tsagiopoulou, S. Sgardelis, I. Kappas, and F. Psomopoulos, “A Computational Framework for Pattern Detection on Unaligned Sequences: An Application on SARS-CoV-2 Data,” Frontiers in Genetics, vol. 12, May 2021, doi: 10.3389/fgene.2021.618170.
- M. Tsagiopoulou, M. C. Maniou, N. Pechlivanis, A. Togkousidis, M. Kotrová, T. Hutzenlaub, I. Kappas, A. Chatzidimitriou, and F. Psomopoulos, “UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction,” Frontiers in Genetics, vol. 12, May 2021, doi: 10.3389/fgene.2021.660366.
- K. Gemenetzi, F. Psomopoulos, A. A. Carriles, M. Gounari, C. Minici, K. Plevova, L.-A. Sutton, M. Tsagiopoulou, P. Baliakas, K. Pasentsis, A. Anagnostopoulos, R. Sandaltzopoulos, R. Rosenquist, F. Davi, S. Pospisilova, P. Ghia, K. Stamatopoulos, M. Degano, and A. Chatzidimitriou, “Higher-order immunoglobulin repertoire restrictions in CLL: the illustrative case of stereotyped subsets 2 and 169,” Blood, vol. 137, no. 14, pp. 1895–1904, Apr. 2021, doi: 10.1182/blood.2020005216.
- M. Velegraki, N. Papakonstantinou, L. Kalaitzaki, S. Ntoufa, S. Laidou, M. Tsagiopoulou, N. Bizymi, A. Damianaki, I. Mavroudi, C. Pontikoglou, and H. A. Papadaki, “Increased proportion and altered properties of intermediate monocytes in the peripheral blood of patients with lower risk Myelodysplastic Syndrome,” Blood Cells, Molecules, and Diseases, vol. 86, p. 102507, Feb. 2021, doi: 10.1016/j.bcmd.2020.102507.
- M. Gerousi, F. Psomopoulos, K. Kotta, M. Tsagiopoulou, N. Stavroyianni, A. Anagnostopoulos, A. Anastasiadis, M. Gkanidou, I. Kotsianidis, S. Ntoufa, and K. Stamatopoulos, “The Calcitriol/Vitamin D Receptor System Regulates Key Immune Signaling Pathways in Chronic Lymphocytic Leukemia,” Cancers, vol. 13, no. 2, 2021, doi: 10.3390/cancers13020285.
It has been proposed that vitamin D may play a role in prevention and treatment of cancer while epidemiological studies have linked vitamin D insufficiency to adverse disease outcomes in various B cell malignancies, including chronic lymphocytic leukemia (CLL). In this study, we sought to obtain deeper biological insight into the role of vitamin D and its receptor (VDR) in the pathophysiology of CLL. To this end, we performed expression analysis of the vitamin D pathway molecules; complemented by RNA-Sequencing analysis in primary CLL cells that were treated in vitro with calcitriol, the biologically active form of vitamin D. In addition, we examined calcitriol effects ex vivo in CLL cells cultured in the presence of microenvironmental signals, namely anti-IgM/CD40L, or co-cultured with the supportive HS-5 cells; and, CLL cells from patients under ibrutinib treatment. Our study reports that the calcitriol/VDR system is functional in CLL regulating signaling pathways critical for cell survival and proliferation, including the TLR and PI3K/AKT pathways. Moreover, calcitriol action is likely independent of the microenvironmental signals in CLL, since it was not significantly affected when combined with anti-IgM/CD40L or in the context of the co-culture system. This finding was also supported by our finding of preserved calcitriol signaling capacity in CLL patients under ibrutinib treatment. Overall, our results indicate a relevant biological role for vitamin D in CLL pathophysiology and allude to the potential clinical utility of vitamin D supplementation in patients with CLL.
2020
- A. Agathangelidis, C. Galigalidou, L. Scarfò, T. Moysiadis, A. Rovida, M. Gounari, F. Psomopoulos, P. Ranghetti, A. Galanis, F. Davi, K. Stamatopoulos, A. Chatzidimitriou, and P. Ghia, “Infrequent ‘chronic lymphocytic leukemia-specific’ immunoglobulin stereotypes in aged individuals with or without low-count monoclonal B-cell lymphocytosis,” Haematologica, vol. 106, no. 4, pp. 1178–1181, Jun. 2020, doi: 10.3324/haematol.2020.247908.
- A. Vardi et al., “T-Cell Dynamics in Chronic Lymphocytic Leukemia under Different Treatment Modalities,” Clinical Cancer Research, vol. 26, no. 18, pp. 4958–4969, 2020, doi: 10.1158/1078-0432.CCR-19-3827.
Purpose: Using next-generation sequencing (NGS), we recently documented T-cell oligoclonality in treatment-naïve chronic lymphocytic leukemia (CLL), with evidence indicating T-cell selection by restricted antigens. Experimental Design: Here, we sought to comprehensively assess T-cell repertoire changes during treatment in relation to (i) treatment type [fludarabine-cyclophosphamide-rituximab (FCR) versus ibrutinib (IB) versus rituximab-idelalisib (R-ID)], and (ii) clinical response, by combining NGS immunoprofiling, flow cytometry, and functional bioassays. Results: T-cell clonality significantly increased at (i) 3 months in the FCR and R-ID treatment groups, and (ii) over deepening clinical response in the R-ID group, with a similar trend detected in the IB group. Notably, in contrast to FCR that induced T-cell repertoire reconstitution, B-cell receptor signaling inhibitors (BcRi) preserved pretreatment clones. Extensive comparisons both within CLL as well as against T-cell receptor sequence databases showed little similarity with other entities, but instead revealed major clonotypes shared exclusively by patients with CLL, alluding to selection by conserved CLL-associated antigens. We then evaluated the functional effect of treatments on T cells and found that (i) R-ID upregulated the expression of activation markers in effector memory T cells, and (ii) both BcRi improved antitumor T-cell immune synapse formation, in marked contrast to FCR. Conclusions: Taken together, our NGS immunoprofiling data suggest that BcRi retain T-cell clones that may have developed against CLL-associated antigens. Phenotypic and immune synapse bioassays support a concurrent restoration of functionality, mostly evident for R-ID, arguably contributing to clinical response.
- C. C. Austin et al., “Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group [version 1; peer review: 1 approved, 2 approved with reservations],” Wellcome Open Research, vol. 5, no. 267, 2020, doi: 10.12688/wellcomeopenres.16378.1.
- M. T. Kotouza, K. Gemenetzi, C. Galigalidou, E. Vlachonikola, N. Pechlivanis, A. Agathangelidis, R. Sandaltzopoulos, P. A. Mitkas, K. Stamatopoulos, A. Chatzidimitriou, and F. E. Psomopoulos, “TRIP - T cell receptor/immunoglobulin profiler,” BMC Bioinformatics, vol. 21, no. 422, Sep. 2020, doi: 10.1186/s12859-020-03669-1.
- A.-C. Vagiona, M. A. Andrade-Navarro, F. Psomopoulos, and S. Petrakis, “Dynamics of a Protein Interaction Network Associated to the Aggregation of polyQ-Expanded Ataxin-1,” Genes, vol. 11, no. 10, p. 1129, Sep. 2020, doi: 10.3390/genes11101129.
- F. E. Psomopoulos, J. van Helden, C. Médigue, A. Chasapi, and C. A. Ouzounis, “Ancestral state reconstruction of metabolic pathways across pangenome ensembles,” 2020, doi: 10.1099/mgen.0.000429.
As genome sequencing efforts are unveiling the genetic diversity of the biosphere with an unprecedented speed, there is a need to accurately describe the structural and functional properties of groups of extant species whose genomes have been sequenced, as well as their inferred ancestors, at any given taxonomic level of their phylogeny. Elaborate approaches for the reconstruction of ancestral states at the sequence level have been developed, subsequently augmented by methods based on gene content. While these approaches of sequence or gene-content reconstruction have been successfully deployed, there has been less progress on the explicit inference of functional properties of ancestral genomes, in terms of metabolic pathways and other cellular processes. Herein, we describe PathTrace, an efficient algorithm for parsimony-based reconstructions of the evolutionary history of individual metabolic pathways, pivotal representations of key functional modules of cellular function. The algorithm is implemented as a five-step process through which pathways are represented as fuzzy vectors, where each enzyme is associated with a taxonomic conservation value derived from the phylogenetic profile of its protein sequence. The method is evaluated with a selected benchmark set of pathways against collections of genome sequences from key data resources. By deploying a pangenome-driven approach for pathway sets, we demonstrate that the inferred patterns are largely insensitive to noise, as opposed to gene-content reconstruction methods. In addition, the resulting reconstructions are closely correlated with the evolutionary distance of the taxa under study, suggesting that a diligent selection of target pangenomes is essential for maintaining cohesiveness of the method and consistency of the inference, serving as an internal control for an arbitrary selection of queries. 
The PathTrace method is a first step towards the large-scale analysis of metabolic pathway evolution and our deeper understanding of functional relationships reflected in emerging pangenome collections.
- K. T. Gurwitz et al., “A framework to assess the quality and impact of bioinformatics training across ELIXIR,” PLOS Computational Biology, vol. 16, no. 7, pp. 1–12, Jul. 2020, doi: 10.1371/journal.pcbi.1007976.
ELIXIR is a pan-European intergovernmental organisation for life science that aims to coordinate bioinformatics resources in a single infrastructure across Europe; bioinformatics training is central to its strategy, which aims to develop a training community that spans all ELIXIR member states. In an evidence-based approach for strengthening bioinformatics training programmes across Europe, the ELIXIR Training Platform, led by the ELIXIR EXCELERATE Quality and Impact Assessment Subtask in collaboration with the ELIXIR Training Coordinators Group, has implemented an assessment strategy to measure quality and impact of its entire training portfolio. Here, we present ELIXIR’s framework for assessing training quality and impact, which includes the following: specifying assessment aims, determining what data to collect in order to address these aims, and our strategy for centralised data collection to allow for ELIXIR-wide analyses. In addition, we present an overview of the ELIXIR training data collected over the past 4 years. We highlight the importance of a coordinated and consistent data collection approach and the relevance of defining specific metrics and answer scales for consortium-wide analyses as well as for comparison of data across iterations of the same course.
- L. Garcia et al., “Ten simple rules for making training materials FAIR,” PLOS Computational Biology, vol. 16, no. 5, pp. 1–9, May 2020, doi: 10.1371/journal.pcbi.1007854.
Author summary: Everything we do today is becoming more and more reliant on the use of computers. The field of biology is no exception; but most biologists receive little or no formal preparation for the increasingly computational aspects of their discipline. In consequence, informal training courses are often needed to plug the gaps; and the demand for such training is growing worldwide. To meet this demand, some training programs are being expanded, and new ones are being developed. Key to both scenarios is the creation of new course materials. Rather than starting from scratch, however, it’s sometimes possible to repurpose materials that already exist. Yet finding suitable materials online can be difficult: They’re often widely scattered across the internet or hidden in their home institutions, with no systematic way to find them. This is a common problem for all digital objects. The scientific community has attempted to address this issue by developing a set of rules (which have been called the Findable, Accessible, Interoperable and Reusable [FAIR] principles) to make such objects more findable and reusable. Here, we show how to apply these rules to help make training materials easier to find, (re)use, and adapt, for the benefit of all.
- S. Laidou et al., “Nuclear inclusions of pathogenic ataxin-1 induce oxidative stress and perturb the protein synthesis machinery,” Redox Biology, vol. 32, p. 101458, 2020, doi: 10.1016/j.redox.2020.101458.
Spinocerebellar ataxia type-1 (SCA1) is caused by an abnormally expanded polyglutamine (polyQ) tract in ataxin-1. These expansions are responsible for protein misfolding and self-assembly into intranuclear inclusion bodies (IIBs) that are somehow linked to neuronal death. However, owing to lack of a suitable cellular model, the downstream consequences of IIB formation are yet to be resolved. Here, we describe a nuclear protein aggregation model of pathogenic human ataxin-1 and characterize IIB effects. Using an inducible Sleeping Beauty transposon system, we overexpressed the ATXN1(Q82) gene in human mesenchymal stem cells that are resistant to the early cytotoxic effects caused by the expression of the mutant protein. We characterized the structure and the protein composition of insoluble polyQ IIBs which gradually occupy the nuclei and are responsible for the generation of reactive oxygen species. In response to their formation, our transcriptome analysis reveals a cerebellum-specific perturbed protein interaction network, primarily affecting protein synthesis. We propose that insoluble polyQ IIBs cause oxidative and nucleolar stress and affect the assembly of the ribosome by capturing or down-regulating essential components. The inducible cell system can be utilized to decipher the cellular consequences of polyQ protein aggregation. Our strategy provides a broadly applicable methodology for studying polyQ diseases.
- M. Tsagiopoulou, V. Chapaprieta, M. Duran-Ferrer, T. Moysiadis, F. Psomopoulos, P. Kollia, N. Papakonstantinou, E. Campo, K. Stamatopoulos, and J. I. Martin-Subero, “Chronic lymphocytic leukemias with trisomy 12 show a distinct DNA methylation profile linked to altered chromatin activation,” Haematologica, 2020, doi: 10.3324/haematol.2019.240721.
- A. Agathangelidis, C. Galigalidou, L. Scarfò, T. Moysiadis, A. Rovida, E. Vlachonikola, E. Sofou, F. Psomopoulos, A. Vardi, P. Ranghetti, A. Siorenta, A. Galanis, K. Stamatopoulos, A. Chatzidimitriou, and P. Ghia, “High-throughput analysis of the T cell receptor gene repertoire in low-count monoclonal B cell lymphocytosis reveals a distinct profile from chronic lymphocytic leukemia,” Haematologica, 2020, doi: 10.3324/haematol.2019.221275.
- E. Gavriilaki et al., “Pretransplant Genetic Susceptibility: Clinical Relevance in Transplant-Associated Thrombotic Microangiopathy,” Thrombosis and Haemostasis, vol. 120, no. 04, pp. 638–646, 2020, doi: 10.1055/s-0040-1702225.
- M. T. Kotouza, F. E. Psomopoulos, and P. A. Mitkas, “A dockerized framework for hierarchical frequency-based document clustering on cloud computing infrastructures,” Journal of Cloud Computing, vol. 9, no. 2, pp. 1–17, 2020, doi: 10.1186/s13677-019-0150-y.
Scalable big data analysis frameworks are of paramount importance in the modern web society, which is characterized by a huge number of resources, including electronic text documents. Document clustering is an important field in text mining and is commonly used for document organization, browsing, summarization and classification. Hierarchical clustering methods construct a hierarchy structure that, combined with the produced clusters, can be useful in managing documents, thus making the browsing and navigation process easier and quicker, and providing only relevant information to the users’ queries by leveraging the structure relationships. Nevertheless, the high computational cost and memory usage of baseline hierarchical clustering algorithms render them inappropriate for the vast number of documents that must be handled daily. In this paper, we propose a new scalable hierarchical clustering framework, which uses the frequency of the topics in the documents to overcome these limitations. Our work consists of a binary tree construction algorithm that creates a hierarchy of the documents using three metrics (Identity, Entropy, Bin Similarity), and a branch breaking algorithm which composes the final clusters by applying thresholds to each branch of the tree. The clustering algorithm is followed by a meta-clustering module which makes use of graph theory to obtain insights in the leaf clusters’ connections. The feature vectors representing each document derive from topic modeling. At the implementation level, the clustering method has been dockerized in order to facilitate its deployment on cloud computing infrastructures. Finally, the proposed framework is evaluated on several datasets of varying size and content, achieving significant reduction in both memory consumption and computational time over existing hierarchical clustering algorithms. The experiments also include performance testing on cloud resources using different setups and the results are promising.
2019
- A.-L. Lamprecht, L. Garcia, M. Kuzak, C. Martinez, R. Arcila, E. Martin Del Pico, V. Dominguez Del Angel, S. van de Sandt, J. Ison, P. A. Martinez, P. McQuilton, A. Valencia, J. Harrow, F. Psomopoulos, J. L. Gelpi, N. Chue Hong, C. Goble, and S. Capella-Gutierrez, “Towards FAIR principles for research software,” Data Science, vol. 2, no. 2, pp. 1–23, 2019, doi: 10.3233/DS-190026.
The FAIR Guiding Principles, published in 2016, aim to improve the findability, accessibility, interoperability and reusability of digital research objects for both humans and machines. Until now the FAIR principles have been mostly applied to research data. The ideas behind these principles are, however, also directly relevant to research software. Hence there is a distinct need to explore how the FAIR principles can be applied to software. In this work, we aim to summarize the current status of the debate around FAIR and software, as basis for the development of community-agreed principles for FAIR research software in the future. We discuss what makes software different from data with regard to the application of the FAIR principles, and which desired characteristics of research software go beyond FAIR. Then we present an analysis of where the existing principles can directly be applied to software, where they need to be adapted or reinterpreted, and where the definition of additional principles is required. Here interoperability has proven to be the most challenging principle, calling for particular attention in future discussions. Finally, we outline next steps on the way towards definite FAIR principles for research software.
- M. Kuzak, J. Harrow, P. A. Martinez, F. E. Psomopoulos, and A. Via, “ELIXIR Europe on the Road to Sustainable Research Software,” Biodiversity Information Science and Standards, vol. 3, p. e37677, 2019, doi: 10.3897/biss.3.37677.
ELIXIR (ELIXIR Europe 2019a) is an intergovernmental organization that brings together life science resources across Europe. These resources include databases, software tools, training materials, cloud storage, and supercomputers. One of the goals of ELIXIR is to coordinate these resources so that they form a single infrastructure. This infrastructure makes it easier for scientists to find and share data, exchange expertise, and agree on best practices. ELIXIR’s activities are divided into the following five areas: Data, Tools, Interoperability, Compute and Training, each known as “platform”. The ELIXIR Tools Platform works to improve the discovery, quality and sustainability of software resources. The Software Development Best Practices task of the Tools Platform aims to raise the quality and sustainability of research software by producing, adopting, and promoting information standards and best practices relevant to the software development life cycle. We have published four (4OSS) simple recommendations to encourage best practices in research software (Jiménez et al. 2017) and the Top 10 metrics for recommended life science software practices (Artaza et al. 2016). The 4OSS simple recommendations are as follows: (1) Develop a publicly accessible open source code from day one, (2) Make software easy to discover by providing software metadata via a popular community registry, (3) Adopt a license and comply with the licenses of third-party dependencies, and (4) Have clear and transparent contribution, governance and communication processes. In order to encourage researchers and developers to adopt the 4OSS recommendations and build FAIR (Findable, Accessible, Interoperable and Reusable) software, the best practices group, in partnership with the ELIXIR Training platform, The Carpentries (Carpentries 2019, ELIXIR Europe 2019b), and other communities, are creating a collection of training materials (Kuzak et al. 2019). 
The next step is to adopt, promote, and recognise these information standards and best practices. The group will address this by (i) developing comprehensive guidelines for software curation, (ii) through training researchers and developers towards the adoption of software best practices and (iii) improvement of the usability of Tools Platform products. Additionally, a direct outcome of this task will be a software management plan template, connected to a concise description of the guidelines for open research software; and production of a white paper for the software development management plan for ELIXIR, which can be consequently used to produce training materials. We will work with the newly formed ReSA (Research Software Alliance) to facilitate the adoption of this plan for the broader community.
- F. F. Parlapani, S. Michailidou, D. A. Anagnostopoulos, S. Koromilas, K. Kios, K. Pasentsis, F. Psomopoulos, A. Argiriou, S. A. Haroutounian, and I. S. Boziaris, “Bacterial communities and potential spoilage markers of whole blue crab (Callinectes sapidus) stored under commercial simulated conditions,” Food Microbiology, vol. 82, pp. 325–333, 2019, doi: 10.1016/j.fm.2019.03.011.
Bacterial community composition, determined by 16S Next Generation Sequencing (NGS), and the Volatile Organic Compounds (VOCs) profile were analysed in whole blue crabs (Callinectes sapidus) stored at 4 and 10 °C (proper and abuse temperature), simulating real storage conditions. Conventional microbiological and chemical analyses (Total Volatile Base-Nitrogen/TVB-N and Trimethylamine-Nitrogen/TMA-N) were also carried out. The rejection time point was 10 and 6 days for the whole crabs stored at 4 and 10 °C, respectively, as determined by development of unpleasant odors, which coincided with crab death. Initially, the Aerobic Plate Count (APC) was 4.87 log cfu/g and increased by 3 logs at the rejection time. The 16S NGS analysis of DNA extracted directly from the crab tissue (culture-independent method) showed that the initial microbiota of the blue crab mainly consisted of Candidatus Bacilloplasma, while potential pathogens, e.g. Listeria monocytogenes, Pseudomonas aeruginosa and Acinetobacter baumannii, were also found. At the rejection point, bacteria of the Rhodobacteraceae family (52%) and Vibrio spp. (40.2%) dominated at 4 and 10 °C, respectively. TVB-N and TMA-N also increased, reaching higher values at the higher storage temperature. The relative concentrations of some VOCs such as 1-octen-3-ol, trans-2-octenal, trans,trans-2,4-heptadienal, 2-butanone, 3-butanone, 2-heptanone, ethyl isobutyrate, ethyl acetate, ethyl-2-methylbutyrate, ethyl isovalerate, hexanoic acid ethyl ester and indole exhibited an increasing trend during crab storage, making them promising spoilage markers. The composition of microbial communities at different storage temperatures was examined by 16S amplicon meta-barcoding analysis. This kind of analysis, in conjunction with the volatile profile, can be used to explore the microbiological quality and further assist towards the application of the appropriate strategies to extend crab shelf-life and protect consumers’ health.
- A. M. Kintsakis, F. E. Psomopoulos, and P. A. Mitkas, “Reinforcement Learning based scheduling in a workflow management system,” Engineering Applications of Artificial Intelligence, vol. 81, pp. 94–106, 2019, doi: 10.1016/j.engappai.2019.02.013.
Any computational process from simple data analytics tasks to training a machine learning model can be described by a workflow. Many workflow management systems (WMS) exist that undertake the task of scheduling workflows across distributed computational resources. In this work, we introduce a WMS that leverages machine learning to predict workflow task runtime and the probability of failure of task assignments to execution sites. The expected runtime of workflow tasks can be used to approximate the weight of the workflow graph branches in respect to the total workflow workload and the ability to anticipate task failures can discourage task assignments that are unlikely to succeed. We demonstrate that the proposed machine learning models can lead to significantly more informed scheduling decisions that minimize task failures and utilize execution sites more efficiently, thus leading to reduced workflow runtime. Additionally, we train a modified sequence-to-sequence neural network architecture via reinforcement learning to perform scheduling decisions as part of a WMS. Our approach introduces a WMS that can drastically improve its scheduling performance by independently learning over time, without external intervention or reliance on any specific heuristic or optimization technique. Finally, we test our approach in real-world scenarios utilizing computationally demanding and data intensive workflows and evaluate its performance against existing scheduling methodologies traditionally used in WMSes. The performance evaluation outcome confirms that the proposed approach significantly outperforms the other scheduling algorithms in a consistent manner and achieves the best execution runtime with the lowest number of failed tasks and communication costs.
- A. Agathangelidis, F. Psomopoulos, and K. Stamatopoulos, “Stereotyped B Cell Receptor Immunoglobulins in B Cell Lymphomas,” Methods in Molecular Biology: "Lymphoma: Methods and Protocols", pp. 139–155, 2019, doi: 10.1007/978-1-4939-9151-8_7.
Comprehensive analysis of the clonotypic B cell receptor immunoglobulin (BcR IG) gene rearrangement sequences in patients with mature B cell neoplasms has led to the identification of significant repertoire restrictions, culminating in the discovery of subsets of patients expressing highly similar, stereotyped BcR IG. This finding strongly supports selection by common epitopes or classes of structurally similar epitopes in the ontogeny of these tumors. BcR IG stereotypy was initially described in chronic lymphocytic leukemia (CLL), where the stereotyped fraction of the disease accounts for a remarkable one-third of patients. However, subsequent studies showed that stereotyped BcR IG are also present in other neoplasms of mature B cells, including mantle cell lymphoma (MCL) and splenic marginal zone lymphoma (SMZL). Subsequent cross-entity comparisons led to the conclusion that stereotyped IG are mostly “disease-specific,” implicating distinct immunopathogenetic processes. Interestingly, mounting evidence suggests that a molecular subclassification of lymphomas based on BcR IG stereotypy is biologically and clinically relevant. Indeed, particularly in CLL, patients assigned to the same subset due to expressing a particular stereotyped BcR IG display remarkably consistent biological background and clinical course, at least for major and well-studied subsets. Thus, the robust assignment to stereotyped subsets may assist in the identification of mechanisms underlying disease onset and progression, while also refining risk stratification. In this book chapter, we provide an overview of the recent BcR IG stereotypy studies in mature B cell malignancies and outline previous and current methodological approaches used for the identification of stereotyped IG.
- M. Wu, F. Psomopoulos, S. J. Khalsa, and A. de Waard, “Data Discovery Paradigms: User Requirements and Recommendations for Data Repositories,” Data Science Journal, vol. 18, no. 1, p. 13, 2019, doi: 10.5334/dsj-2019-003.
As data repositories make more data openly available it becomes challenging for researchers to find what they need either from a repository or through web search engines. This study attempts to investigate data users’ requirements and the role that data repositories can play in supporting data discoverability by meeting those requirements. We collected 79 data discovery use cases (or data search scenarios), from which we derived nine functional requirements for data repositories through qualitative analysis. We then applied usability heuristic evaluation and expert review methods to identify best practices that data repositories can implement to meet each functional requirement. We propose the following ten recommendations for data repository operators to consider for improving data discoverability and user’s data search experience: 1. Provide a range of query interfaces to accommodate various data search behaviours. 2. Provide multiple access points to find data. 3. Make it easier for researchers to judge relevance, accessibility and reusability of a data collection from a search summary. 4. Make individual metadata records readable and analysable. 5. Enable sharing and downloading of bibliographic references. 6. Expose data usage statistics. 7. Strive for consistency with other repositories. 8. Identify and aggregate metadata records that describe the same data object. 9. Make metadata records easily indexed and searchable by major web search engines. 10. Follow API search standards and community adopted vocabularies for interoperability.
Conferences and Announcements
2024
- N. Pechlivanis, A. Anastasiadou, A. Papageorgiou, E. Pafilis, and F. Psomopoulos, “Odyssey: an Interactive R Shiny App Approach to explore Molecular Biodiversity in Greece,” Sep. 2024, doi: 10.5281/ZENODO.14186452.
- S.-C. Fragkouli, N. Pechlivanis, A. Anastasiadou, G. Karakatsoulis, A. Orfanou, P. Kollia, A. Agathangelidis, and F. Psomopoulos, “Synth4bench: generating synthetic genomics data for the evaluation of somatic variant callers,” Sep. 2024, doi: 10.5281/ZENODO.14186509.
2023
- G. Gkoliou, N. Pechlivanis, S. Chatzileontiadou, C. Xydopoulou, C. Frouzaki, G. Karakatsoulis, E. Vlachonikola, M. Gerousi, F. Psomopoulos, M. Papaioannou, K. Chlichlia, K. Stamatopoulos, E. Hatjiharissi, and A. Chatzidimitriou, “Distinct T Cell Receptor Gene Repertoires and T Cell Subset Distribution in Peripheral Blood and Bone Marrow of Patients with Multiple Myeloma,” in Blood, Nov. 2023, vol. 142, no. Supplement 1, pp. 4677–4677, doi: 10.1182/blood-2023-185872.
- M. Gerousi, G. Gavriilidis, S. Keisaris, A. Kourouni, A. Orfanou, A. Iatrou, A. Pseftogkas, G. Mosialos, E. Theodosiou, A. Chatzidimitriou, F. Psomopoulos, P. Ghia, K. Stamatopoulos, and K. Xanthopoulos, “The Deubiquitinase CYLD Acts As an Oncogene in a Cellular Model of Chronic Lymphocytic Leukemia,” in Blood, Nov. 2023, vol. 142, no. Supplement 1, pp. 3265–3265, doi: 10.1182/blood-2023-188983.
- G. Gkoliou, N. Pechlivanis, S. Chatzileontiadou, C. Xydopoulou, C. Frouzaki, G. Karakatsoulis, E. Vlachonikola, M. Gerousi, F. Psomopoulos, A. Siorenta, M. Papaioannou, K. Chlichlia, K. Stamatopoulos, E. Hatjiharissi, and A. Chatzidimitriou, “P835: IN SILICO PREDICTION REVEALS PUTATIVE T-CELL CLASS I/II NEOEPITOPES WITHIN THE CLONOTYPIC IMMUNOGLOBULIN HEAVY AND LIGHT CHAINS IN PATIENTS WITH MULTIPLE MYELOMA,” in HemaSphere, Aug. 2023, vol. 7, no. S3, p. e2734671, doi: 10.1097/01.hs9.0000970244.27346.71.
- A. Iatrou, E. Sofou, E. Kotroni, L. Ann Sutton, M. Frenquelli, R. Sandaltzopoulos, I. Sakellari, N. Stavrogianni, F. Psomopoulos, P. Ghia, R. Rosenquist, A. Agathangelidis, A. Chatzidimitriou, and K. Stamatopoulos, “P605: IMMUNOGENETICS AND ANTIGEN REACTIVITY PROFILING CONTRIBUTE TO UNRAVELLING THE ONTOGENY OF CLL STEREOTYPED SUBSET #4,” in HemaSphere, Aug. 2023, vol. 7, no. S3, p. e7074446, doi: 10.1097/01.hs9.0000969324.70744.46.
- G. Gavriilidis, S.-C. Fragkouli, E. Theodosiou, V. Vasileiou, S. Keisaris, and F. Psomopoulos, “SCell-wise fluxomics of Chronic Lymphocytic Leukemia single-cell data reveal novel metabolic adaptations to Ibrutinib therapy,” in 31st Conference on Intelligent Systems for Molecular Biology and the 22nd European Conference on Computational Biology (ISMB-ECCB23), Jul. 2023, doi: TBA.
- S.-C. Fragkouli, N. Pechlivanis, A. Agathangelidis, and F. Psomopoulos, “Synthetic Genomics Data Generation and Evaluation for the Use Case of Benchmarking Somatic Variant Calling Algorithms,” Jul. 2023, doi: 10.7490/f1000research.1119575.1.
2022
- S.-C. Fragkouli, A. Agathangelidis, and F. E. Psomopoulos, “Shedding Light on Somatic Variant Calling,” 2022, doi: 10.13140/RG.2.2.12701.18402.
- K. Kyritsis, N. Pechlivanis, V. Vasileiou, A. Magklara, A. Kougioumtzi, P. Dafopoulos, E. Ntzioni, E. Tsarouchi, D. Sakellariou, M. Kotoulas, S. Arampatzis, P. Chatzikamaris, E. Siomou, M. Argyraki, D. Botskaris, I. Talianidis, and F. Psomopoulos, “Poster 3657: GENOPTICS: An Intuitive Platform of Visual Analytics for Integrative Analysis of Large-scale Multi-omics Data,” Sep. 2022, doi: 10.7490/f1000research.1119204.1.
- K. Kyritsis, G.-N. Kartanos, V. Siarkou, and F. Psomopoulos, “Poster 9113: k-mer and GWAS Approaches to Identify Host-Specific Genomic Determinants in Klebsiella Pneumoniae,” Sep. 2022, doi: 10.7490/f1000research.1119205.1.
- G. Gavriilidis, S. Dimitsaki, V. Vasileiou, and F. Psomopoulos, “Biologically informed neural network identifies unfolded protein response as key pathway in critical COVID-19,” 2022, doi: 10.7490/F1000RESEARCH.1119199.1.
- N. Pechlivanis, G. Karakatsoulis, S. Sgardelis, I. Kappas, and F. Psomopoulos, “Poster 3126: Microbial Co-occurrence Network Reveals Climate and Geographic Patterns for Soil Diversity on the Planet,” Sep. 2022.
- A. Mitsigkolas, N. Pechlivanis, and F. Psomopoulos, “Poster 8471: Assessing SARS-CoV-2 Evolution Through the Analysis of Emerging Mutations,” Sep. 2022.
- A. Mitsigkolas, N. Pechlivanis, and F. Psomopoulos, “Assessing SARS-CoV-2 evolution through the analysis of emerging mutations,” Oct. 2022, doi: 10.1101/2022.10.25.513701.
- N. Pechlivanis, M. Tsagiopoulou, M.-C. Maniou, A. Togkousidis, E. Mouchtaropoulou, T. Chassalevris, S. C. Chaintoutis, M. Petala, M. Kostoglou, T. Karapantsios, S. Laidou, E. Vlachonikola, A. Chatzidimitriou, A. Papadopoulos, N. Papaioannou, C. I. Dovas, A. Argiriou, and F. Psomopoulos, “lineagespot: Detecting SARS-CoV-2 lineages and mutational load in municipal wastewater,” Jul. 2022, doi: 10.7490/f1000research.1119052.1.
2021
- N. Pechlivanis, A. Togkousidis, M. C. Maniou, M. Tsagiopoulou, and F. Psomopoulos, “Developing a novel feature space for sequence data analysis; a use-case on SARS-CoV-2 data,” 2021, doi: 10.5281/ZENODO.4897477.
- D. S. Katz, M. Barker, L. J. Garcia Castro, N. P. Chue Hong, M. Gruenpeter, J. L. Harrow, C. Martinez Ortiz, P. A. Martinez, and F. Psomopoulos, “FAIR Research Software and Science Gateways,” May 2021, doi: 10.5281/zenodo.4923124.
In recent years, the scholarly community has examined its culture and practices, and found a set of overlapping areas in which to improve, including open science (making both the outputs and processes of scholarly research available), reproducibility (increasing trust in scholarly results by making them repeatable by others), and FAIR (making scholarly outputs, specifically data, findable, accessible, interoperable, and reusable). While the scholarly community is generally supportive of all of these efforts, the degree of support wanes with both the amount of extra work that is needed and the lack of clear details on how to achieve them, along with misaligned incentives. In this lightning talk, we will initially focus on FAIR and the details of how it can be applied to research software. This leads to a number of distinct challenges, including scope (defining both software and research software), principles (defining what findable, accessible, interoperable, and reusable mean for research software), implementation (developing guidelines and instructions for how to make research software FAIR), and metrics (providing a means to measure the FAIRness of research software). Science gateways include a number of different types of software, for example, the frameworks used to construct the gateways themselves, tools provided by the community that run in the gateway, and software implemented as services with which the gateways interact. The second part of this lightning talk will discuss how FAIR principles for research software apply to each of these types of software common in science gateways. We will close by explaining how members of the science gateway community can become more involved in the FAIR for research software process, to learn, to contribute, or to champion.
2020
- D. S. Katz, M. Barker, N. Chue Hong, L. J. Garcia-Castro, M. Gruenpeter, J. Harrow, M. Kuzak, P. Martinez Villegas, and F. E. Psomopoulos, “Toward defining and implementing FAIR for research software,” in AGU Fall Meeting Abstracts, Dec. 2020, vol. 2020, pp. IN037–01, doi: 10.5281/zenodo.4085311.
- L. J. Garcia Castro, M. Barker, N. P. Chue Hong, F. Psomopoulos, J. Harrow, D. S. Katz, M. Kuzak, P. A. Martinez, and A. Via, “Software as a first-class citizen in research,” Nov. 2020, doi: 10.4126/FRL01-006423290.
In recent years the importance of software in research has become increasingly recognized by the research community. This journey still has a long way to go. Research data is currently backed by a variety of efforts to implement and make FAIR principles a reality, complemented by Data Management Plans. Both FAIR data principles and management plans offer elements that could be useful for research software but none of them can be directly applied; in both cases there is a need for adaptation and then adoption. In this position paper we discuss current efforts around FAIR for research software that will also support the advancement of Software Management Plans. In turn, use of SMPs encourages researchers to make their datasets FAIR.
2019
- M. T. Kotouza, F. E. Psomopoulos, and P. A. Mitkas, “A Dockerized String Analysis Workflow for Big Data,” in 23rd European Conference on Advances in Databases and Information Systems, ADBIS 2019, Bled, Slovenia, September 8-11, 2019, pp. 564–569, doi: 10.1007/978-3-030-30278-8_55.
- A. Vardi, E. Vlachonikola, S. Mourati, F. Psomopoulos, N. Pantouloufos, A. Kouvatsi, N. Stavroyianni, A. Anagnostopoulos, K. Stamatopoulos, and A. Hadzidimitriou, “PS1131 High-Throughput B-Cell immunoprofiling at diagnosis and relapse offers further evidence of functional selection throughout the natural history of chronic lymphocytic leukemia,” in HemaSphere, 2019, vol. 3, no. S1, p. 512, doi: 10.1097/01.HS9.0000562808.48237.52.
- K. Gemenetzi, A. Agathangelidis, F. Psomopoulos, K. Pasentsis, E. Koravou, M. Iskas, N. Stavroyianni, A. Anagnostopoulos, R. Sandaltzopoulos, K. Stamatopoulos, and A. Chatzidimitriou, “VH CDR3-Focused Somatic Hypermutation in CLL IGHV-IGHD-IGHJ Gene Rearrangements with 100% IGHV Germline Identity,” in Blood, Nov. 2019, vol. 134, no. Supplement_1, pp. 4277–4277, doi: 10.1182/blood-2019-127979.
Classification of patients with chronic lymphocytic leukemia (CLL) based on the immunoglobulin heavy variable (IGHV) gene somatic hypermutation (SHM) status has established predictive and prognostic relevance. The SHM status is assessed based on the number of mutations within the sequence of the rearranged IGHV gene excluding the VH CDR3. This is mostly due to the difficulty in discriminating actual SHM from random nucleotides added between the recombined IGHV, IGHD and IGHJ genes. Hence, this approach may underestimate the true impact of SHM, in fact overlooking the most critical region for antigen-antibody interactions, i.e. the VH CDR3. Relevant to mention in this respect, studies from our group in CLL with mutated IGHV genes (M-CLL), particularly subset #4, have revealed considerable intra-VH CDR3 diversity attributed to ongoing SHM. Prompted by these findings, here we investigated whether SHM may also be present in cases bearing ‘truly unmutated’ IGHV genes (i.e. 100% germline identity across VH FR1-VH FR3), focusing on two well characterized stereotyped subsets, i.e. subset #1 (IGHV clan I/IGHD6-19/IGHJ4) and subset #6 (IGHV1-69/IGHD3-16/IGHJ3). These subsets carry germline-encoded amino acid (aa) motifs within the VH CDR3, namely QWL and YDYVWGSY, originating from the IGHD6-19 and IGHD3-16 gene, respectively. However, in both subsets, cases exist with variations in these motifs that could potentially represent SHM. The present study included 12 subset #1 and 5 subset #6 patients with clonotypic IGHV genes lacking any SHM (100% germline identity). IGHV-IGHD-IGHJ gene rearrangements were RT-PCR amplified by subgroup-specific leader primers and a high-fidelity polymerase in order to ensure high data quality. RT-PCR products were subjected to paired-end NGS on the MiSeq platform. Sequence annotation was performed with IMGT/HighV-QUEST and metadata analysis was undertaken using an in-house purpose-built bioinformatics pipeline. 
Rearrangements with the same IGHV gene and identical VH CDR3 aa sequences were defined as clonotypes. Overall, we obtained 1,570,668 productive reads with V-region identity 99-100%; of these, 1,232,958 (mean: 102,746, range: 20,796-242,519) concerned subset #1 while 337,710 (mean: 67,542, range: 50,403-79,683) concerned subset #6. On average, 64.4% (range: 1.7-77.5%) of subset #1 reads and 49.2% (range: 0.7-70%) of subset #6 reads corresponded to rearrangements with IGHV genes lacking any SHM (100% germline identity). Clonotype computation revealed 1,831 and 1,048 unique clonotypes for subset #1 and #6, respectively. Subset #1 displayed a mean of 157 distinct clonotypes per sample (range: 74-267), with the dominant clonotype having a mean frequency of 96.9% (range: 96-98.2%). Of note, 44 clonotypes were shared between different patients (albeit at varying frequencies), including the dominant clonotype of 11/12 cases, which was present in 2-6 additional subset #1 patients. Subset #6 cases carried a higher number of distinct clonotypes per sample (mean: 219, range: 189-243) while the dominant clonotype had a mean frequency of 95.6% (range: 94.5-96.5%). Shared clonotypes (n=30) were identified also in subset #6 and the dominant clonotype of each subset #6 case was present in 3-5 additional subset #6 patients. Focusing on the VH CDR3, in particular the IGHD-encoded part, the following observations were made: (1) in both subsets, extensive intra-VH CDR3 variation was detected at certain positions within the IGHD gene; (2) in most cases, the observed aa substitutions were conservative, i.e. concerned aa sharing similar physicochemical properties. 
Particularly noteworthy in this respect were the observations in subset #6 that: (i) the valine residue (V) in the D-derived YDYVWGSY motif was very frequently mutated to another aliphatic residue (A, I, L); (ii) in cases where the predominant clonotype carried I (also in the Sanger-derived sequence), several minor clonotypes carried the germline-encoded V, compelling evidence that the observed substitution concerned true SHM. In conclusion, we provide immunogenetic evidence for intra-VH CDR3 variations, very likely attributed to SHM, in CLL patients carrying ‘truly unmutated’ IGHV genes. While the prognostic/predictive relevance of this observation is beyond the scope of the present work, our findings highlight the possible need to reappraise definitions (‘semantics’) regarding SHM status in CLL. Disclosures: Stamatopoulos: Janssen: Honoraria, Research Funding; Abbvie: Honoraria, Research Funding. Chatzidimitriou: Janssen: Honoraria.
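The clonotype definition used in the abstract above (same IGHV gene, identical VH CDR3 amino acid sequence) can be sketched in a few lines of Python. This is only an illustration, not the authors' in-house pipeline: the record fields `v_gene` and `cdr3_aa`, the function name `compute_clonotypes`, and the toy reads are all assumptions made for the example.

```python
from collections import Counter


def compute_clonotypes(reads):
    """Group reads into clonotypes keyed by (IGHV gene, CDR3 aa sequence).

    `reads` is an iterable of dicts with the hypothetical keys
    "v_gene" and "cdr3_aa". Returns clonotypes sorted by read count,
    each with its relative frequency among all reads.
    """
    counts = Counter((r["v_gene"], r["cdr3_aa"]) for r in reads)
    total = sum(counts.values())
    # most_common() sorts by descending count, so the dominant
    # clonotype (if any) is the first element of the result.
    return [
        {"v_gene": v, "cdr3_aa": cdr3, "reads": n, "frequency": n / total}
        for (v, cdr3), n in counts.most_common()
    ]


# Toy, invented reads (not data from the study): three reads carry an
# I at the motif position, one carries the germline-like V.
toy_reads = [
    {"v_gene": "IGHV1-69", "cdr3_aa": "ARYDYIWGSYAFDI"},
    {"v_gene": "IGHV1-69", "cdr3_aa": "ARYDYIWGSYAFDI"},
    {"v_gene": "IGHV1-69", "cdr3_aa": "ARYDYIWGSYAFDI"},
    {"v_gene": "IGHV1-69", "cdr3_aa": "ARYDYVWGSYAFDI"},
]
clonotypes = compute_clonotypes(toy_reads)
dominant = clonotypes[0]
print(dominant["cdr3_aa"], dominant["frequency"])  # ARYDYIWGSYAFDI 0.75
```

Keying on the pair rather than on the CDR3 alone keeps rearrangements that share a CDR3 but use different IGHV genes in separate clonotypes, which matches the definition quoted in the abstract.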
- M. Gerousi, F. Psomopoulos, K. Kotta, N. Stavroyianni, A. Anagnostopoulos, I. Kotsianidis, S. Ntoufa, and K. Stamatopoulos, “Functional Calcitriol/Vitamin D Receptor Signaling in Chronic Lymphocytic Leukemia,” in Blood, Nov. 2019, vol. 134, no. Supplement_1, pp. 3019–3019, doi: 10.1182/blood-2019-127910.
Calcitriol, the biologically active form of vitamin D, modulates a plethora of cellular processes following ligation of its receptor, the vitamin D receptor (VDR), a nuclear transcription factor that regulates the transcription of diverse genes. It has been proposed that vitamin D may play a role in prevention and treatment of cancer while epidemiological studies have linked vitamin D insufficiency to adverse disease outcome in chronic lymphocytic leukemia (CLL). Recently, we reported that VDR is functional in CLL cells after calcitriol supplementation, as well as after stimulation through both the calcitriol/VDR signaling system and other prosurvival pathways triggered from the tumor microenvironment. In this study, we aimed at investigating key molecules and signaling pathways that are altered after calcitriol treatment and are known to play a relevant role in CLL pathophysiology. CD19+ primary CLL cells were negatively selected from peripheral blood samples of patients that were treatment naïve at the time of sample collection. CLL cells were cultured in vitro with calcitriol or co-cultured with the HS-5 mesenchymal cell line for 24 hours. VDR+, CYP24A1+, phospho-ERK+ and phospho-NF-κB p65+ cells were determined by Flow Cytometry (FC). Total RNA was extracted from calcitriol-treated and non-treated CLL cells, while mRNA selection was performed using NEBNext Poly(A) mRNA Magnetic Isolation Module. Library preparation for RNA-Sequencing (RNA-Seq) analysis was conducted with the NEBNext Ultra II Directional RNA Library Prep Kit. The libraries were paired-end sequenced on the NextSeq 500 Illumina platform. Differential expression analysis was performed using DESeq2; genes with |log2FC|>1 and P≤0.05 were considered as differentially expressed. RNA-Seq analysis (n=6) confirmed our previous findings that the CYP24A1 gene is significantly upregulated by calcitriol, being the top upregulated gene, whereas the VDR gene remains unaffected by this treatment. 
Overall, 85 genes were differentially expressed in unstimulated versus calcitriol-treated cells, of which 28 were overexpressed in the latter, whereas the remaining 57 showed the opposite pattern. Pathway enrichment and gene ontology (GO) analysis of the differentially expressed genes revealed significant enrichment in PI3K-Akt pathway and Toll-like receptor cascades, as well as in vitamin D metabolism and inflammatory response pathways. Additionally, flow cytometric analysis showed that calcitriol-treated CLL cells displayed increased pERK levels (FD=1.3, p<0.05) and, in contrast, decreased pNF-κB levels (FD=2.7, p<0.05), highlighting active VDR signaling in CLL. Aiming at placing our findings in a more physiological context, we co-cultured CLL cells with the HS-5 cell line. Based on our previous finding that co-cultured CLL cells showed induced CYP24A1 levels, we evaluated pNF-κB expression. pNF-κB levels were found to be increased in co-cultured CLL cells (FD=4.2, p<0.05), while the addition of calcitriol downregulated pNF-κB (FD=1.5, p<0.05). Moreover, ex vivo calcitriol exposure of CLL cells from patients under ibrutinib treatment (at baseline, +1 and +3-6 months, n=7) resulted in significant upregulation of pERK (FD=1.6, p<0.01; FD=1.4, p<0.01; FD=1.9, p<0.01; for each timepoint respectively) but significant downregulation of pNF-κB (FD=3.4, p<0.01; FD=3, p<0.05; FD=2.3, p<0.05; for each timepoint respectively), indicating preserved calcitriol/VDR signaling capacity. In conclusion, we provide evidence that the calcitriol/VDR system is active in CLL, modulating NF-κB and MAPK signaling as well as the expression of the CYP24A1 target gene. This observation is further supported by RNA-Seq analysis that confirms CYP24A1 upregulation and highlights new signaling pathways that need to be validated. 
Interestingly, the calcitriol/VDR system appears relatively unaffected by either stimulation or inhibition (ibrutinib) of microenvironmental signals that promote CLL cell survival and/or proliferation, indicating context-independent signaling capacity. Disclosures: Kotsianidis: Celgene: Research Funding. Stamatopoulos: Janssen: Honoraria, Research Funding; Abbvie: Honoraria, Research Funding.
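The selection rule stated in the abstract above (absolute log2 fold change above 1 and P at most 0.05) amounts to a simple filter over a differential expression results table. The sketch below is a hedged illustration in Python rather than the authors' DESeq2 workflow: the function name, the tuple layout, and the placeholder gene names and values are assumptions, and the abstract does not say whether P is the raw or the adjusted p-value.

```python
def split_differentially_expressed(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (upregulated, downregulated) gene lists.

    `results` is an iterable of (gene, log2_fold_change, p_value)
    tuples, a hypothetical layout chosen for this example. A gene
    passes when |log2FC| > lfc_cutoff and p_value <= p_cutoff.
    """
    up, down = [], []
    for gene, log2fc, p_value in results:
        if abs(log2fc) > lfc_cutoff and p_value <= p_cutoff:
            # The sign of the fold change decides the direction.
            (up if log2fc > 0 else down).append(gene)
    return up, down


# Placeholder rows with invented values (not results from the study).
toy_results = [
    ("GENE_A", 3.2, 0.001),   # passes, upregulated
    ("GENE_B", -1.4, 0.03),   # passes, downregulated
    ("GENE_C", 0.8, 0.0001),  # fold change too small
    ("GENE_D", 2.1, 0.20),    # not significant
]
up, down = split_differentially_expressed(toy_results)
print(up, down)  # ['GENE_A'] ['GENE_B']
```

Applied to the study's numbers, the two lists would correspond to the 28 genes overexpressed after calcitriol treatment and the 57 showing the opposite pattern.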
- K. Gemenetzi, A. Agathangelidis, F. Psomopoulos, K. Plevova, L.-A. Sutton, K. Pasentsis, A. Anagnostopoulos, R. Sandaltzopoulos, R. Rosenquist, F. Davi, S. Pospisilova, K. Stamatopoulos, and A. Chatzidimitriou, “Higher Order Restrictions of the Immunoglobulin Repertoire in CLL: The Illustrative Case of Stereotyped Subsets #2 and #169,” in Blood, Nov. 2019, vol. 134, no. Supplement_1, pp. 5453–5453, doi: 10.1182/blood-2019-128017.
Stereotyped subset #2 (IGHV3-21/IGLV3-21) is the largest subset in CLL (~3% of all patients). Membership in subset #2 is clinically relevant since these patients experience an aggressive disease irrespective of the somatic hypermutation (SHM) status of the clonotypic immunoglobulin heavy variable (IGHV) gene. Low-throughput evidence suggests that stereotyped subset #169, a minor CLL subset (~0.2% of all CLL), resembles subset #2 at the immunogenetic level. More specifically: (i) the clonotypic heavy chain (HC) of subset #169 is encoded by the IGHV3-48 gene which is closely related to the IGHV3-21 gene; (ii) both subsets carry VH CDR3s comprising 9 amino acids (aa) with a conserved aspartic acid (D) at VH CDR3 position 3; (iii) both subsets bear light chains (LC) encoded by the IGLV3-21 gene with a restricted VL CDR3; and, (iv) both subsets have borderline SHM status. Here we comprehensively assessed the ontogenetic relationship between CLL subsets #2 and #169 by analyzing their immunogenetic signatures. Utilizing next-generation sequencing (NGS) we studied the HC and LC gene rearrangements of 6 subset #169 patients and 20 subset #2 cases. In brief, IGHV-IGHD-IGHJ and IGLV-IGLJ gene rearrangements were RT-PCR amplified using subgroup-specific leader primers as well as IGHJ and IGLC primers, respectively. Libraries were sequenced on the MiSeq Illumina instrument. IG sequence annotation was performed with IMGT/HighV-QUEST and metadata analysis conducted using an in-house, validated bioinformatics pipeline. Rearrangements with identical CDR3 aa sequences were herein defined as clonotypes, whereas clonotypes with different aa substitutions within the V-domain were defined as subclones. For the HC analysis of subset #169, we obtained 894,849 productive sequences (mean: 127,836, range: 87,509-208,019). 
On average, each analyzed sample carried 54 clonotypes (range: 44-68); the dominant clonotype had a mean frequency of 99.1% (range: 98.8-99.2%) and displayed considerable intraclonal heterogeneity with a mean of 2,641 subclones/sample (range: 1,566-6,533). For the LCs of subset #169, we obtained 2,096,728 productive sequences (mean: 299,533, range: 186,637-389,258). LCs carried a higher number of distinct clonotypes/sample compared to their partner HCs (mean: 148, range: 110-205); the dominant clonotype had a mean frequency of 98.1% (range: 97.2-98.6%). Intraclonal heterogeneity was also observed in the LCs, with a mean of 6,325 subclones/sample (range: 4,651-11,444), hence more pronounced than in their partner HCs. Viewing each of the cumulative VH and VL CDR3 sequence datasets as a single entity branching through diversification enabled the identification of common sequences. In particular, 2 VH clonotypes were present in 3/6 cases, while a single VL clonotype was present in all 6 cases, albeit at varying frequencies; interestingly, this VL CDR3 sequence was also detected in all subset #2 cases, underscoring the molecular similarities between the two subsets. 
Focusing on SHM, the following observations were made: (i) the frequent 3-nucleotide (AGT) deletion evidenced in the VH CDR2 of subset #2 (leading to the deletion of one of 5 consecutive serine residues) was also detected in all subset #169 cases at subclonal level (average: 6% per sample, range: 0.1-10.8%); of note, the 5-serine stretch is also present in the germline VH CDR2 of the IGHV3-48 gene; (ii) the R-to-G substitution at the VL-CL linker, a ubiquitous SHM in subset #2 and previously reported as critical for IG self-association leading to cell autonomous signaling in this subset, was present in all subset #169 samples as a clonal event with a mean frequency of 98.3%; and, finally, (iii) the S-to-G substitution at position 6 of the VL CDR3, present in all subset #2 cases (mean: 44.2%, range: 6.3-87%), was also found in all #169 samples, representing a clonal event in 1 case (97.2% of all clonotypes) and a subclonal event in the remaining 5 cases (mean: 0.6%, range: 0.4-1.1%). In conclusion, the present high-throughput sequencing data cements the immunogenetic relatedness of CLL stereotyped subsets #2 and #169, further highlighting the role of antigen selection throughout their natural history. These findings also argue for a similar pathophysiology for these subsets that could also be reflected in a similar clonal behavior, with implications for risk stratification. Disclosures: Sutton: Abbvie: Honoraria; Gilead: Honoraria; Janssen: Honoraria. Stamatopoulos: Abbvie: Honoraria, Research Funding; Janssen: Honoraria, Research Funding. Chatzidimitriou: Janssen: Honoraria.
- M. Tsagiopoulou, V. Chapaprieta, N. Russiñol, F. Psomopoulos, N. Papakonstantinou, N. Stavroyianni, A. Anagnostopoulos, P. Kollia, E. Campo, K. Stamatopoulos, and J. I. Martin-Subero, “Genome-Wide Histone Acetylation Profiling in Chronic Lymphocytic Leukemia Reveals a Distinctive Signature in Stereotyped Subset #8,” in Blood, Nov. 2019, vol. 134, no. Supplement_1, pp. 1241–1241, doi: 10.1182/blood-2019-127817.
In CLL, subsets of patients carrying stereotyped B cell receptors (BcR) share similar biological and clinical features independently of IGHV gene somatic hypermutation status. Although the chromatin landscape of CLL as a whole has been recently characterized, it remains largely unexplored in stereotyped cases. Here, we analyzed the active chromatin regulatory landscape of 3 major CLL stereotyped subsets associated with clinical aggressiveness. We performed chromatin-immunoprecipitation followed by sequencing (ChIP-Seq) with an antibody for the H3K27ac histone mark in sorted CLL cells from 19 cases, including clinically aggressive subsets #1 [clan I genes/IGKV1(D)-39, IG-unmutated CLL (U-CLL) (n=3)], #2 [IGHV3-21/IGLV3-21, IG-mutated CLL (M-CLL) (n=3)] and #8 [IGHV4-39/IGKV1(D)-39, U-CLL (n=3)] which we compared to non-stereotyped CLL cases [5 M-CLL|5 U-CLL]. In addition, a series of 15 normal B cell samples from different stages of B-cell differentiation were analyzed [naive B cells from peripheral blood (n=3), tonsillar naive B cells (n=3), germinal centre (GC) B cells (n=3), memory B cells (n=3), tonsillar plasma cells (n=3)]. Initial unsupervised principal component analysis (PCA) disclosed a distinct chromatin acetylation pattern in CLL, regardless of stereotypy status, versus normal B cells. CLL as a whole was found to be closer to naive and memory B cells rather than GC B cells and plasma cells. Detailed analysis of individual principal components (PC) revealed that PC4, which accounts for 5% of the total variability, segregated subset #8 cases and GC B cells from other CLLs and normal B cell subpopulations. 
Although PC4 accounts for only a small part of the total variability (5%), this suggests that subset #8 cases may share some chromatin features with proliferating GC B cells, in line with the fact that subset #8 BcR are IgG-switched. We also investigated whether stereotyped CLLs have different chromatin acetylation features compared to non-stereotyped CLLs matched by IGHV somatic hypermutation status and identified 878 Differential Regions (DR) in subset #8 vs. U-CLL, 84 DR in subset #1 vs. U-CLL and 66 DR in subset #2 vs. M-CLL. As subset #8 cases seemed to have the most distinct profile, we further characterized the detected regions. The 435 and 443 regions gaining and losing activation, respectively, mostly targeted promoters (29.5%) and regulatory elements located in introns (31%) and distal intergenic regions (21.8%). Hierarchical clustering based on the 878 DRs enabled the clear discrimination of subset #8 cases from U-CLL and normal B cells; however, it is worth noting that for several of these 878 DRs the acetylation patterns were shared between subset #8 and normal B cell subpopulations rather than subset #8 and U-CLL. Of note, 11/435 regions gaining activity in subset #8 were found within the gene encoding the EBF1 transcription factor (TF); additional regions were associated with genes significant to CLL pathogenesis, e.g. TCF4 and E2F1. Moreover, 3 DRs losing activity in subset #8 were located within the CTLA4 gene and 2 DRs within the IL21R gene, which we have recently reported as hypermethylated and not expressed in subset #8. Next, we performed TF binding site analysis with the MEME/AME suite, separately for regions gaining or losing activity, and identified significant enrichment (adj-p<0.001) for TFs such as AP-1, FOX, GATA, IRF. The regions losing activity in subset #8 showed a higher number of enriched TFs versus those gaining activity (165 vs 93 TFs), particularly displaying enrichment for many HOX family members. 
However, a cluster of TFs with enrichment on TF binding site analysis, such as FOXO1, FOXP1, MEF2D, PRDM1, RUNX1, RXRA, STAT6, were also located within the 878 DRs discriminating subset #8 from either U-CLL or normal B cell subpopulations. Taken together, subset #8 cases have a distinct chromatin acetylation signature which includes both loss and gain of active elements, shared features with proliferating GC B cells, and specific changes in chromatin activity of several genes and TFs relevant to B cell/CLL biology. These findings further underscore the concept that BcR stereotypy defines subsets of patients with a consistent biological profile, while they may also be relevant to the particular clinical behavior of subset #8, known to be associated with the highest risk of Richter’s transformation amongst all CLL. Disclosures: Stamatopoulos: Abbvie: Honoraria, Research Funding; Janssen: Honoraria, Research Funding.
Other
2024
- O. A. Attafi et al., “DOME Registry: Implementing community-wide recommendations for reporting supervised machine learning in biology.” 2024, [Online]. Available at: https://arxiv.org/abs/2408.07721.
- F. Psomopoulos, E. Capriotti, N. Rosinach, D. Cirillo, L. Castro, S. Tosatto, and the ELIXIR ML Focus Group members, “The impact of the ELIXIR community in Machine Learning.” ELIXIR All Hands Meeting, Jun. 2024, doi: 10.7490/f1000research.1119794.1.
- S.-C. Fragkouli, N. Pechlivanis, A. Anastasiadou, G. Karakatsoulis, A. Orfanou, P. Kollia, A. Agathangelidis, and F. E. Psomopoulos, “Exploring Somatic Variant Callers’ Behavior: A Synthetic Genomics Feature Space Approach.” ELIXIR All Hands Meeting, Jun. 2024, doi: 10.7490/f1000research.1119793.1.
- S.-C. Fragkouli, D. Solanki, L. J. Castro, F. E. Psomopoulos, N. Queralt-Rosinach, D. Cirillo, and L. C. Crossman, “Synthetic data: How could it be used for infectious disease research?” 2024, [Online]. Available at: https://arxiv.org/abs/2407.06211.
- F. Psomopoulos, “FAIR for Machine Learning; Building on the Lessons from FAIR Software.” Zenodo, 2024, doi: 10.5281/ZENODO.10953108.
- S.-C. Fragkouli, N. Pechlivanis, A. Anastasiadou, G. Karakatsoulis, A. Orfanou, P. Kollia, A. Agathangelidis, and F. E. Psomopoulos, “Synth4bench: a framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms.” 2024, doi: 10.1101/2024.03.07.582313.
- F. Adriano, E. Parkinson, D. Bianchini, F. Psomopoulos, M. Varadi, M. Andrabi, S.-C. Fragkouli, and U. Vadadokhau, “RDMkit, Your Domain, Machine Learning.” 2024, [Online]. Available at: https://rdmkit.elixir-europe.org/machine_learning.
- F. Liberante, F. Psomopoulos, G. Farrell, S. Suchánek, P. Lieby, M. Maccallum, S. Gundersen, and W. Nyberg Åkerström, “ELIXIR Report of the 21st Plenary of the RDA, October 2023.” Zenodo, 2024, doi: 10.5281/ZENODO.10721761.
- S.-C. Fragkouli, A. Agathangelidis, and F. E. Psomopoulos, “10 Synthetic Genomics Datasets.” Feb. 2024, doi: 10.5281/zenodo.10683211.
- G. I. Gavriilidis, V. Vasileiou, S. Dimitsaki, G. Karakatsoulis, A. Giannakakis, G. A. Pavlopoulos, and F. Psomopoulos, “APNet, an explainable sparse deep learning model to discover differentially active drivers of severe COVID-19.” bioRxiv, Jan. 2024, doi: 10.1101/2024.01.11.575161.
- F. Psomopoulos, S. Capella-Gutierrez, L. Portell-Silva, and N. Pechlivanis, “EOSC-EVERSE: Paving the way towards a European Virtual Institute for Research Software Excellence.” Zenodo, 2024, doi: 10.5281/ZENODO.10526785.
2023
- G. I. Gavriilidis, Sofoklis, Thomas, Konstantinos, and Fotis, “PertFlow: A cloud-based workflow to facilitate perturbational modeling on single-cell transcriptomics for pharmacological research.” Zenodo, 2023, doi: 10.5281/ZENODO.8350620.
- F. Psomopoulos, G. Juckeland, G. A. Stewart, S. Roiser, S. Capella-Gutierrez, L. Portell-Silva, P. Bos, J. Maassen, T. Vuillaume, N. Chue Hong, D. Garijo, J. Tedds, C. Doglioni, and C. Goble, “EOSC EVERSE: Paving the way towards a European Virtual Institute for Research Software Excellence.” Zenodo, 2023, doi: 10.5281/ZENODO.10183077.
- S.-C. Fragkouli, N. Pechlivanis, A. Orfanou, A. Anastasiadou, A. Agathangelidis, and F. Psomopoulos, “Synth4bench: a framework for generating synthetic genomics data for the evaluation of somatic variant calling algorithms,” 17th Conference of Hellenic Society for Computational Biology and Bioinformatics (HSCBB). Oct. 2023, doi: 10.5281/zenodo.8432060.
- F. Psomopoulos, “From FAIR data to FAIR Research Software, towards FAIR Machine Learning.” Zenodo, 2023, doi: 10.5281/ZENODO.10212280.
- S. G. Sutcliffe et al., “Tracking SARS-CoV-2 variants of concern in wastewater: an assessment of nine computational tools using simulated genomic data,” bioRxiv. Cold Spring Harbor Laboratory, Dec. 2023, doi: 10.1101/2023.12.20.572426.
Wastewater-based surveillance (WBS) is an important epidemiological and public health tool for tracking pathogens across the scale of a building, neighbourhood, city, or region. WBS gained widespread adoption globally during the SARS-CoV-2 pandemic for estimating community infection levels by qPCR. Sequencing pathogen genes or genomes from wastewater adds information about pathogen genetic diversity which can be used to identify viral lineages (including variants of concern) that are circulating in a local population. Capturing the genetic diversity by WBS sequencing is not trivial, as wastewater samples often contain a diverse mixture of viral lineages with real mutations and sequencing errors, which must be deconvoluted computationally from short sequencing reads. In this study we assess nine different computational tools that have recently been developed to address this challenge. We simulated 100 wastewater sequence samples consisting of SARS-CoV-2 BA.1, BA.2, and Delta lineages, in various mixtures, as well as a Delta-Omicron recombinant and a synthetic “novel” lineage. Most tools performed well in identifying the true lineages present and estimating their relative abundances, and were generally robust to variation in sequencing depth and read length. While many tools identified lineages present down to 1% frequency, results were more reliable above a 5% threshold. The presence of an unknown synthetic lineage, which represents an unclassified SARS-CoV-2 lineage, increases the error in relative abundance estimates of other lineages, but the magnitude of this effect was small for most tools. The tools also varied in how they labelled novel synthetic lineages and recombinants. While our simulated dataset represents just one of many possible use cases for these methods, we hope it helps users understand potential sources of noise or bias in wastewater sequencing data and to appreciate the commonalities and differences across methods.
- F. E. Psomopoulos, K. A. Kyritsis, I. Topolsky, B. Batut, A. H. Fitzpatrick, and G. Leoni, “Exploring the landscape of the genomic wastewater surveillance ecosystem: a roadmap towards standardization.” Center for Open Science, Nov. 2023, doi: 10.37044/osf.io/rtgk9.
- R. M. Waterhouse, A.-F. Adam-Blondon, B. Balech, E. Barta, K. F. Heil, G. M. Hughes, L. S. Jermiin, M. Kalaš, J. Lanfear, E. Pafilis, A. C. Papageorgiou, F. Psomopoulos, N. Raes, J. Burgin, and T. Gabaldón, “The ELIXIR Biodiversity Community: Understanding short- and long-term changes in biodiversity.” F1000Research, https://f1000research.com/articles/12-499, 2023.
- B. Batut, F. E. Psomopoulos, A. Via, and P. Palagi, “Teaching and Hosting Galaxy training / Hands-on: Training techniques to enhance learner participation and engagement.” https://usegalaxy.no/training-material/topics/teaching/tutorials/learner_participation_engagement/tutorial.html, 2023.
- B. Batut, F. E. Psomopoulos, A. Via, P. Palagi, and C. Gallardo, “Contributing to the Galaxy Training Material / Hands-on: Principles of learning and how they apply to training and teaching.” https://training.galaxyproject.org/training-material/topics/contributing/tutorials/learning-principles/tutorial.html, 2023.
- B. Batut, F. E. Psomopoulos, A. Via, and P. Palagi, “Contributing to the Galaxy Training Material / Hands-on: Design and plan session, course, materials.” https://training.galaxyproject.org/training-material/topics/contributing/tutorials/design/tutorial.html, 2023.
- F. E. Psomopoulos and Gallantries, “Statistics and machine learning / Hands-on: Introduction to Machine Learning using R.” https://training.galaxyproject.org/training-material/topics/statistics/tutorials/intro-to-ml-with-r/tutorial.html, 2023.
- S.-C. Fragkouli, A. Agathangelidis, and F. E. Psomopoulos, “TP53 synthetic genomics data for benchmarking variant callers.” Jun. 2023, doi: 10.5281/zenodo.8095898.
- G. Nilsonne, G. O’Neill, S. Dahle, V. Gaillard, J. Priess-Buchheit, S. Birgit, and F. Psomopoulos, “EOSC as an Enabler of Research Assessment Reform: Position Paper from Task Force on Research Careers, Recognition, and Credit.” Zenodo, 2023, doi: 10.5281/ZENODO.10417069.
- F. Psomopoulos, F. Al-Shahrour, and C. van Gelder, “Establishing the EOSC4Cancer Network of Experts.” Zenodo, 2023, doi: 10.5281/ZENODO.8073708.
- F. Psomopoulos, F. Al-Shahrour, C. van Gelder, S. Morgan, M. Andrabi, K. Majcen, and F. Schoots, “A guidance document for the EOSC4Cancer learning pathway.” Zenodo, 2023, doi: 10.5281/ZENODO.10200523.
- L. J. Castro, F. Beuttenmüller, Z. Chen, S. Efeoglu, D. Garijo, F. Psomopoulos, B. Serrano-Solano, K. B. Shiferaw, D. Solanki, B. Wentzel, and Y. Zhang, “Towards metadata for machine learning - Crosswalk tables.” Zenodo, 2023, doi: 10.5281/ZENODO.10407320.
- L. J. Castro, F. Psomopoulos, B. Serrano-Solano, C. Sharma, K. B. Shiferaw, D. Solanki, and Y. Zhang, “Lifecycle for FAIR Machine Learning.” Zenodo, 2023, doi: 10.5281/ZENODO.10407265.
Short articles
2022
- J. Tedds, S. Capella-Gutierrez, J. Clark-Casey, F. Coppens, G. Farrell, C. Van Gelder, B. Grüning, K. Heil, J. Lindvall, P. Maccallum, L. Matyska, F. Psomopoulos, P. Ruch, and S.-A. Sansone, “ELIXIR EOSC Strategy 2022.” Zenodo, 2022, doi: 10.5281/ZENODO.7120997.
- A. Mitsigkolas, N. Pechlivanis, and F. Psomopoulos, “Assessing SARS-CoV-2 evolution through the analysis of emerging mutations.” Cold Spring Harbor Laboratory, Oct. 2022, doi: 10.1101/2022.10.25.513701.
- E. A. Huerta et al., “FAIR for AI: An interdisciplinary, international, inclusive, and diverse community building perspective.” arXiv, 2022, doi: 10.48550/ARXIV.2210.08973.
- N. P. Chue Hong et al., “FAIR Principles for Research Software (FAIR4RS Principles).” Zenodo, May 2022, doi: 10.15497/RDA00068.
2021
- M. C. Maniou, N. Pechlivanis, A. Togkousidis, and F. Psomopoulos, “k – taxatree: An alignment-free multi-label classification workflow for efficient taxonomic assignment of metagenomic NGS data.” Zenodo, 2021, doi: 10.5281/zenodo.5769944.
Annotating NGS sequences by assigning taxa labels is a key component for the majority of metagenomic studies, and is often a prerequisite in effectively assessing biodiversity in a given environment. In this work we introduce k-taxatree, an alignment-free machine learning method that enables robust assignment of taxonomic labels to short reads, utilizing a multi-label Random Forest approach as the underlying model. We demonstrate the effectiveness of the method by applying it to data from the V4 hypervariable region of 16S rRNA reads, retrieved from the Earth Microbiome Project, displaying accuracy scores over 95% in the validation set. The workflow has been fully developed in R and is freely available at https://github.com/BiodataAnalysisGroup/k-taxatree.
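Alignment-free approaches commonly represent each read as a fixed-length vector of k-mer counts that a classifier such as a Random Forest can consume. The abstract above does not spell out the exact features k-taxatree uses, so the Python sketch below is only an assumed illustration of that general idea (the actual workflow is written in R); the function name `kmer_counts` and its parameters are hypothetical.

```python
from itertools import product


def kmer_counts(read, k=3, alphabet="ACGT"):
    """Count overlapping k-mers in `read` as a fixed-length vector.

    The vector has len(alphabet) ** k entries, ordered
    lexicographically by k-mer. Windows containing characters
    outside `alphabet` (for example N) are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vector = [0] * len(kmers)
    for start in range(len(read) - k + 1):
        position = index.get(read[start:start + k])
        if position is not None:
            vector[position] += 1
    return vector


# A short invented read; with k=2 the vector has 16 entries.
features = kmer_counts("ACGTACGT", k=2)
print(len(features), sum(features))  # 16 7
```

Because every read maps to a vector of the same length regardless of read length, such vectors can be stacked into a feature matrix without any alignment step.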
- N. Pechlivanis, M. Tsagiopoulou, M. C. Maniou, A. Togkousidis, E. Mouchtaropoulou, T. Chassalevris, S. Chaintoutis, C. Dovas, M. Petala, M. Kostoglou, T. Karapantsios, S. Laidou, E. Vlachonikola, A. Chatzidimitriou, A. Papadopoulos, N. Papaioannou, A. Argiriou, and F. Psomopoulos, “Detecting SARS-CoV-2 lineages and mutational load in municipal wastewater a use-case in the metropolitan area of Thessaloniki, Greece.” Cold Spring Harbor Laboratory, Mar. 2021, doi: 10.1101/2021.03.17.21252673.
- R. Alves, D. Bampalikis, L. J. Castro, J. M. Fernández, J. Harrow, M. Kuzak, E. Martin, F. Psomopoulos, and A. Via, “ELIXIR Software Management Plan for Life Sciences.” BioHackrXiv, 2021, doi: 10.37044/osf.io/k8znb.
Data Management Plans are now considered a key element of Open Science. They describe the data management life cycle for the data to be collected, processed and/or generated within the lifetime of a particular project or activity. A Software Management Plan (SMP) plays the same role but for software. Beyond its management perspective, the main advantage of an SMP is that it both provides clear context to the software that is being developed and raises awareness. Although there are a few SMPs already available, most of them require significant technical knowledge to be effectively used. ELIXIR has developed a low-barrier SMP, specifically tailored for life science researchers, aligned to the FAIR Research Software principles. Starting from the Four Recommendations for Open Source Software, the ELIXIR SMP was iteratively refined by surveying the practices of the community and incorporating the received feedback. Currently available as a survey, future plans of the ELIXIR SMP include a human- and machine-readable version, that can be automatically queried and connected to relevant tools and metrics within the ELIXIR Tools ecosystem and beyond.
2020
- F. Ballesio, A. H. Bangash, D. Barradas Bautista, J. Barton, A. Guarracino, L. Heumos, A. Panoli, M. Pietrosanto, A. Togkousidis, P. Davis, and F. E. Psomopoulos, “Determining a novel feature-space for SARS-CoV-2 sequence data.” Center for Open Science, 2020, doi: 10.37044/osf.io/xt7gw.
- F. Psomopoulos, C. W. G. van Gelder, P. Kahlem, B. Leskošek, and J. Lindvall, “ELIXIR Training Platform Task 2: Gap analysis, training materials development and training delivery,” F1000Research, vol. 9. 2020, doi: 10.7490/f1000research.1117955.1.
- M. Tsagiopoulou, N. Pechlivanis, and F. Psomopoulos, “InterTADs: Integration of Multi-Omics Data on Topological Associated Domains.” Aug. 2020, doi: 10.21203/rs.3.rs-54194/v1.
- RDA COVID-19 Working Group, “Recommendations and Guidelines on data sharing,” Research Data Alliance. 2020, doi: 10.15497/rda00052.
- S. Athanasiou et al., “National Plan for Open Science.” Zenodo, Jun. 2020, doi: 10.5281/zenodo.3908953.
2019
- A. Nicolaidis and F. Psomopoulos, “DNA coding and Gödel numbering.” 2019, doi: 10.48550/arXiv.1909.13574.
- E. A. Becker et al., “datacarpentry/wrangling-genomics: Data Carpentry: Genomics data wrangling and processing, June 2019.” Jun. 2019, doi: 10.5281/zenodo.3260609.