• Dodson Preston opublikował 5 miesięcy, 2 tygodnie temu

    The methods presented in the current work are available through the following software https//github.com/labgem/PPanGGOLiN. Detailed results and scripts to compute the benchmark metrics are available at https//github.com/axbazin/panrgp_supdata.

    The methods presented in the current work are available through the following software https//github.com/labgem/PPanGGOLiN. Detailed results and scripts to compute the benchmark metrics are available at https//github.com/axbazin/panrgp_supdata.

    The discovery of protein-ligand-binding sites is a major step for elucidating protein function and for investigating new functional roles. Detecting protein-ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites was proposed as they can be scalable, fast and present low cost.

    We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made compatible or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered as state-of-the-art. Also, our method achieved better results than the method from CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable as it took 10-20 s on average to predict the binding site for a protein complex whereas the state-of-the-art residue-centric method takes 2-5 h on average.

    The source code and datasets are available at https//github.com/charles-abreu/GRaSP.

    Supplementary data are available at Bioinformatics online.

    Supplementary data are available at Bioinformatics online.

    The novel coronavirus (SARS-CoV-2) currently spreads worldwide, causing the disease COVID-19. The number of infections increases daily, without any approved antiviral therapy. The recently released viral nucleotide sequence enables the identification of therapeutic targets, e.g. by analyzing integrated human-virus metabolic models. Investigations of changed metabolic processes after virus infections and the effect of knock-outs on the host and the virus can reveal new potential targets.

    We generated an integrated host-virus genome-scale metabolic model of human alveolar macrophages and SARS-CoV-2. Analyses of stoichiometric and metabolic changes between uninfected and infected host cells using flux balance analysis (FBA) highlighted the different requirements of host and virus. Consequently, alterations in the metabolism can have different effects on host and virus, leading to potential antiviral targets. One of these potential targets is guanylate kinase (GK1). In FBA analyses, the knock-out of the GK1 decreased the growth of the virus to zero, while not affecting the host. As GK1 inhibitors are described in the literature, its potential therapeutic effect for SARS-CoV-2 infections needs to be verified in in-vitro experiments.

    The computational model is accessible at https//identifiers.org/biomodels.db/MODEL2003020001.

    The computational model is accessible at https//identifiers.org/biomodels.db/MODEL2003020001.

    microRNAs (miRNAs) are essential components of gene expression regulation at the post-transcriptional level. miRNAs have a well-defined molecular structure and this has facilitated the development of computational and high-throughput approaches to predict miRNAs genes. However, due to their short size, miRNAs have often been incorrectly annotated in both plants and animals. Consequently, published miRNA annotations and miRNA databases are enriched for false miRNAs, jeopardizing their utility as molecular information resources. To address this problem, we developed MirCure, a new software for quality control, filtering and curation of miRNA candidates. MirCure is an easy-to-use tool with a graphical interface that allows both scoring of miRNA reliability and browsing of supporting evidence by manual curators.

    Given a list of miRNA candidates, MirCure evaluates a number of miRNA-specific features based on gene expression, biogenesis and conservation data, and generates a score that can be used to discard poorly supported miRNA annotations. MirCure can also curate and adjust the annotation of the 5p and 3p arms based on user-provided small RNA-seq data. We evaluated MirCure on a set of manually curated animal and plant miRNAs and demonstrated great accuracy. Moreover, we show that MirCure can be used to revisit previous bona fide miRNAs annotations to improve miRNA databases.

    The MirCure software and all the additional scripts used in this project are publicly available at https//github.com/ConesaLab/MirCure. A Docker image of MirCure is available at https//hub.docker.com/r/conesalab/mircure.

    Supplementary data are available at Bioinformatics online.

    Supplementary data are available at Bioinformatics online.

    The relationship between gene co-expression and chromatin conformation is of great biological interest. Thanks to high-throughput chromosome conformation capture technologies (Hi-C), researchers are gaining insights on the tri-dimensional organization of the genome. Given the high complexity of Hi-C data and the difficult definition of gene co-expression networks, the development of proper computational tools to investigate such relationship is rapidly gaining the interest of researchers. One of the most fascinating questions in this context is how chromatin topology correlates with gene co-expression and which physical interaction patterns are most predictive of co-expression relationships.

    To address these questions, we developed a computational framework for the prediction of co-expression networks from chromatin conformation data. We first define a gene chromatin interaction network where each gene is associated to its physical interaction profile; then, we apply two graph embedding techniques to extract a low-dimensional vector representation of each gene from the interaction network; finally, we train a classifier on gene embedding pairs to predict if they are co-expressed. Both graph embedding techniques outperform previous methods based on manually designed topological features, highlighting the need for more advanced strategies to encode chromatin information. We also establish that the most recent technique, based on random walks, is superior. Overall, our results demonstrate that chromatin conformation and gene regulation share a non-linear relationship and that gene topological embeddings encode relevant information, which could be used also for downstream analysis.

    The source code for the analysis is available at https//github.com/marcovarrone/gene-expression-chromatin.

    Supplementary data are available at Bioinformatics online.

    Supplementary data are available at Bioinformatics online.

    Exploring metabolic pathways is one of the key techniques for developing highly productive microbes for the bioproduction of chemical compounds. To explore feasible pathways, not only examining a combination of well-known enzymatic reactions but also finding potential enzymatic reactions that can catalyze the desired structural changes are necessary. To achieve this, most conventional techniques use manually predefined-reaction rules, however, they cannot sufficiently find potential reactions because the conventional rules cannot comprehensively express structural changes before and after enzymatic reactions. Evaluating the feasibility of the explored pathways is another challenge because there is no way to validate the reaction possibility of unknown enzymatic reactions by these rules. Therefore, a technique for comprehensively capturing the structural changes in enzymatic reactions and a technique for evaluating the pathway feasibility are still necessary to explore feasible metabolic pathways.

    We devels technique, an enzymatic reaction is regarded as a difference vector between the main substrate and the main product in chemical latent space acquired from the generative model. Features of the enzymatic reaction are embedded into the fixed-dimensional vector, and it is possible to express structural changes of enzymatic reactions comprehensively. The technique also involves differential-evolution-based reaction selection to design feasible candidate pathways and pathway scoring using neural-network-based reaction-possibility prediction. The proposed technique was applied to the non-registered pathways relevant to the production of 2-butanone, and successfully explored feasible pathways that include such reactions.

    Human microbes get closely involved in an extensive variety of complex human diseases and become new drug targets. In silico methods for identifying potential microbe-drug associations provide an effective complement to conventional experimental methods, which can not only benefit screening candidate compounds for drug development but also facilitate novel knowledge discovery for understanding microbe-drug interaction mechanisms. On the other hand, the recent increased availability of accumulated biomedical data for microbes and drugs provides a great opportunity for a machine learning approach to predict microbe-drug associations. We are thus highly motivated to integrate these data sources to improve prediction accuracy. In addition, it is extremely challenging to predict interactions for new drugs or new microbes, which have no existing microbe-drug associations.

    In this work, we leverage various sources of biomedical information and construct multiple networks (graphs) for microbes and drugs. Then, wery data are available at Bioinformatics online.

    In de novo sequence assembly, a standard pre-processing step is k-mer counting, which computes the number of occurrences of every length-k sub-sequence in the sequencing reads. Sequencing errors can produce many k-mers that do not appear in the genome, leading to the need for an excessive amount of memory during counting. This issue is particularly serious when the genome to be assembled is large, the sequencing depth is high, or when the memory available is limited.

    Here, we propose a fast near-exact k-mer counting method, CQF-deNoise, which has a module for dynamically removing noisy false k-mers. It automatically determines the suitable time and number of rounds of noise removal according to a user-specified wrong removal rate. We tested CQF-deNoise comprehensively using data generated from a diverse set of genomes with various data properties, and found that the memory consumed was almost constant regardless of the sequencing errors while the noise removal procedurehad minimal effects on counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consistently performed the best in terms of memory usage, consuming 49-76% less memory than thesecond best method. When counting the k-mers from a human dataset with around 60× coverage, the peakmemory usage of CQF-deNoise was only 10.9GB (gigabytes) for k = 28 and 21.5GB for k = 55. De novo assembly of 106× human sequencing data using CQF-deNoise for k-mer counting required only 2.7 h and 90GB peak memory.

    The source codes of CQF-deNoise and SH-assembly are available at https//github.com/Christina-hshi/CQF-deNoise.git and https//github.com/Christina-hshi/SH-assembly.git, respectively, both under the BSD 3-Clause license.

    The source codes of CQF-deNoise and SH-assembly are available at https//github.com/Christina-hshi/CQF-deNoise.git and https//github.com/Christina-hshi/SH-assembly.git, respectively, both under the BSD 3-Clause license.

Szperamy.pl
Logo
Enable registration in settings - general
Compare items
  • Total (0)
Compare
0