Machine learning project
Computational analysis, prediction, and modeling play an increasingly important role in understanding biology. They have made important contributions to the identification of genes in genomic sequence, the prediction of RNA splicing and alternative splicing patterns, the identification of regulatory sites (e.g. promoters and transcription factor binding sites) in genomic sequence, the identification of protein sequence motifs and their association with function, the prediction of protein secondary structure, and many other problems.

With the advent of genomic sequencing, computational approaches have drawn increased attention, and this has generated a high degree of interest among computer scientists, many of whom have relatively little background in molecular biology. Many modeling approaches used in other fields, particularly in the machine learning and artificial intelligence communities, could contribute to computational biology. However, it is difficult for researchers in these communities to identify appropriate data sets and biological systems on which to test their ideas.

The goal of this project is to assemble key elements that serve as signals for various biological processes and to analyze them systematically using machine learning approaches. The assembled sets will serve as a learning space for building sensors that can be integrated to achieve the best prediction of the presence of a particular biological signal. The resource should be valuable to computational biologists and machine learning researchers developing new methods, and to biologists seeking to derive new biological knowledge.

This project has just started. I welcome suggestions and comments, and I am happy to join hands with anyone interested in working on it ([email protected]).