Conference poster

CroMaSt: A workflow for domain family curation through cross-mapping of structural instances between protein domain databases

Hrishikesh Dhondge 1 Isaure Chauvot de Beauchêne 1 Marie-Dominique Devignes 1 
1 CAPSID - Computational Algorithms for Protein Structures and Interactions
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
Abstract: Protein domains can be viewed as building blocks, essential for understanding structure-function relationships in proteins. However, each domain database classifies protein domains using its own methodology. As a result, the boundaries between domains or families often differ from one database to another, raising the question of how domains should be defined and enumerated. This question cannot be answered from a single database. Rather, expert integration and curation of several databases are required to refine the contours of a domain of interest, in a domain-centric approach. Here, we illustrate the role of 3-D structure in clarifying domain definition with the help of CroMaSt (“Cross-Mapper for Structural Domains”), a fully automated workflow that classifies all structural instances of a given domain into three categories (core, true and domain-like). CroMaSt is developed in Common Workflow Language (CWL) and takes advantage of two well-known and widely used domain databases, Pfam (sequence-based) and CATH (structure-based). It uses the domain definitions from Pfam and CATH, together with the SIFTS resource, to cross-map structural instances between the two databases. Structural alignments generated by Kpax make it possible to identify the false-positive instances from each domain database. We tested CroMaSt on the RNA Recognition Motif (RRM), the most prevalent and diverse RNA-binding domain. Starting from the PF00076 and 3.30.70.330 domain families from Pfam and CATH respectively, our workflow identifies 882 core, 966 true and 344 domain-like structural instances. The information generated by this method will play a crucial role in machine learning methods applied to domain-specific synthetic biology.
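The cross-mapping idea in the abstract can be illustrated with a small Python sketch. It is not the CroMaSt implementation (which is a CWL workflow), and the abstract does not define the three categories precisely. The sketch assumes that an instance is "core" when both databases annotate overlapping residue ranges on the same PDB chain, and that a structural alignment score against the core separates "true" from "domain-like" among the remaining instances. The `Instance` type, the `classify` function, the overlap measure and both thresholds are hypothetical names and values chosen for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Instance(NamedTuple):
    """A structural instance: a residue range on one PDB chain (hypothetical type)."""
    pdb_id: str
    chain: str
    start: int
    end: int


def overlap_fraction(a: Instance, b: Instance) -> float:
    """Shared residues divided by the length of the shorter instance."""
    if (a.pdb_id, a.chain) != (b.pdb_id, b.chain):
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.end - a.start, b.end - b.start) + 1
    return shared / shorter


def classify(
    pfam: Iterable[Instance],
    cath: Iterable[Instance],
    scores: Dict[Instance, float],
    min_overlap: float = 0.5,  # assumed threshold, not taken from the abstract
    min_score: float = 0.5,    # assumed threshold, not taken from the abstract
) -> Dict[Instance, str]:
    """Label every instance as 'core', 'true' or 'domain-like'."""
    pfam, cath = list(pfam), list(cath)
    labels: Dict[Instance, str] = {}
    # Instances annotated by both databases on overlapping ranges are 'core'.
    for source, other in ((pfam, cath), (cath, pfam)):
        for inst in source:
            if any(overlap_fraction(inst, o) >= min_overlap for o in other):
                labels[inst] = "core"
    # Remaining instances are split by a precomputed structural alignment
    # score (for example from a Kpax run; computing it is out of scope here).
    for inst in pfam + cath:
        if inst not in labels:
            aligned = scores.get(inst, 0.0) >= min_score
            labels[inst] = "true" if aligned else "domain-like"
    return labels
```

A call such as `classify(pfam_instances, cath_instances, alignment_scores)` returns one label per instance. Counting the labels would then give totals comparable in kind to the core, true and domain-like counts reported for the RRM, although the actual workflow may apply different criteria.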

https://hal.archives-ouvertes.fr/hal-03789541
Contributor : Hrishikesh Dhondge
Submitted on : Tuesday, September 27, 2022 - 2:59:55 PM
Last modification on : Tuesday, October 25, 2022 - 4:25:07 PM

File

CroMaSt_ECCB_RNAct_final.pdf
Files produced by the author(s)

Citation

Hrishikesh Dhondge, Isaure Chauvot de Beauchêne, Marie-Dominique Devignes. CroMaSt: A workflow for domain family curation through cross-mapping of structural instances between protein domain databases. ECCB2022 - 21st European Conference on Computational Biology, Sep 2022, Sitges, Spain. ⟨10.48546/WORKFLOWHUB.WORKFLOW.390.1⟩. ⟨hal-03789541⟩
