What is Bioinformatics: A Primer
Bioinformatics is the broad field of using math, computer science, and statistics to study biology. It includes protein structure analysis, genome sequence assembly, and programming for data analysis and storage; it is a hub where a vast number of skills converge. The broader one's knowledge and background in these skills, the better situated one is in the field. Although there are too many subsets of skills to itemize in detail, as a general rule of thumb I like to think of the field as the convergence of three broad categories: biology, computer science, and statistics.
Biological Sciences
At the center of bioinformatics is, of course, the biological sciences. This category can be further broken down into genetics, molecular biology, chemistry, structural biology, cancer biology, and a whole lot more. Biology is the center of the hub into which the other skills outlined below plug. A strong knowledge of the subject is an important skill in bioinformatics: it facilitates communication with biologists and helps in understanding the underlying problem being investigated. While a deep knowledge of a particular field may not be required, a fundamental understanding of the Central Dogma of Molecular Biology is extremely important, as is a grasp of the concepts of organic chemistry.
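To make the Central Dogma concrete, here is a minimal Python sketch of the flow from DNA to RNA to protein. It assumes the input is the coding strand of the DNA, uses the standard genetic code, and ignores details such as introns, reading-frame detection, and alternative codon tables. The function names `transcribe` and `translate` are my own choices for illustration.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA from its first base, stopping at the first stop codon."""
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna, translate(mrna))
```

Real sequences are messier than this (ambiguous bases, multiple reading frames, both strands), which is exactly where the biological background matters.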
Computers and Programming
Bioinformatics is typically about accomplishing tasks that are difficult (or impossible) to do by hand, which is where computers enter the equation. Knowing how to 'script' is an essential skill, and knowing how to write production software is an added bonus.
The biological sciences have entered the realm of 'big data', typically in the form of 'omics' techniques (genomics, metabolomics, proteomics, transcriptomics), which produce huge amounts of data. The size of the data creates a downstream requirement: the necessity to analyze, store, and visualize the results. None of these is a trivial task, and all require knowledge of how to use software as well as how to deal with large data sets. This means knowing how data is structured for storage (typically in a SQL database), how to parse the data (scripting and database communication), and how to visualize the data (user interface design and/or knowledge of statistical tools and software).
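As a rough illustration of the storage side, the sketch below uses Python's built-in `sqlite3` module to put a few sequences into a SQL table and query them. The table layout, the column names, and the example records are all made up for this illustration; a real project would design its schema around the data at hand.

```python
import sqlite3


def build_gene_db(records):
    """Create an in-memory database holding (name, organism, sequence) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genes (name TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
    )
    # Parameterized inserts keep the data separate from the SQL text.
    conn.executemany("INSERT INTO genes VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def genes_longer_than(conn, min_length):
    """Return (name, length) for every sequence longer than min_length."""
    cursor = conn.execute(
        "SELECT name, LENGTH(sequence) FROM genes "
        "WHERE LENGTH(sequence) > ? ORDER BY name",
        (min_length,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Hypothetical records, invented for this example.
    example_records = [
        ("geneA", "E. coli", "ATGAAATAG"),
        ("geneB", "S. cerevisiae", "ATGTAG"),
    ]
    conn = build_gene_db(example_records)
    print(genes_longer_than(conn, 6))
    conn.close()
```

The same pattern carries over to larger database servers; only the connection line and some SQL dialect details change.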
Of course, the basics of computer science are important: data structures, algorithms, code syntax, command line control, etc. Bioinformatics is not a single-language field: while historically the language of choice has been Perl due to its easy-to-use text parsing capabilities, any suitable language is often sufficient (Python, Java, R, C, etc.), and given the many different problems one may face, often the more languages the better.
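Text parsing is the bread and butter of day-to-day scripting. As a small example, here is a sketch of a parser for the FASTA format, where each record starts with a `>` header line followed by one or more lines of sequence. It is deliberately minimal: it keeps only the first word of each header as the record name and does no validation of the sequence characters. The helper names `parse_fasta` and `gc_content` are my own.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word after '>' as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks[name] = []
        elif name is not None:
            # Sequence lines are collected and joined once at the end.
            chunks[name].append(line.upper())
    return {record: "".join(parts) for record, parts in chunks.items()}


def gc_content(sequence):
    """Return the fraction of bases in the sequence that are G or C."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


if __name__ == "__main__":
    # A made-up two-record file for demonstration.
    example = ">seq1 example record\nATGC\nGGCC\n>seq2\nAATT\n"
    for name, sequence in parse_fasta(example).items():
        print(name, len(sequence), gc_content(sequence))
```

For real work, an established library such as Biopython is usually the better choice, but writing a parser like this once is a good exercise.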
Statistics
Statistics is inherent to bioinformatic data analysis, and virtually required for experimental design and interpretation in the biological sciences. While formal training in it may not always be a requirement, it is a very useful skill. The basics should be known (mean, standard deviation). More advanced topics (linear regression, Bayesian probability) may be harder to learn but are also valuable. As biology enters the realm of 'big data', machine learning, statistical learning, and data mining are all becoming important areas as well.
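The basics mentioned above take only a few lines of Python. The sketch below computes a mean and sample standard deviation with the standard `statistics` module, and fits a straight line by ordinary least squares written out by hand so each step is visible. The numbers in the usage section are invented purely for demonstration.

```python
import statistics


def summarize(values):
    """Return the mean and the sample standard deviation of the values."""
    return statistics.mean(values), statistics.stdev(values)


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Invented measurements, e.g. a signal read at four concentrations.
    concentrations = [1, 2, 3, 4]
    signals = [3, 5, 7, 9]
    print(summarize(signals))
    print(fit_line(concentrations, signals))
```

In practice R, NumPy, or SciPy would handle this, but knowing what the functions compute makes their output much easier to interpret.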
Where to Start?
Ten years ago there were very few courses in bioinformatics. These days, many academic institutions offer complete programs in the subject. However, these are often graduate-level programs requiring a good understanding of one or more of the categories above as a prerequisite. As with any subject, the best way to learn is to practice. This could involve a personal project, a project as an intern or together with an academic institution, or an open source project with a team. To use myself as an example, I started in bioinformatics by writing scripts that helped my research as a graduate student. I had been interested in programming since I was a child, and I applied that interest directly to my work. Over time, the scripts coalesced into my own project: GeneCoder, now production-level software for molecular biology.