Data and Workflow Management for Bioinformatics

Course Description

This course explains where large data sets come from and how they are stored and managed. It also examines data sizes, approaches to data access, and how data are transformed for use by AI systems. You will examine the challenges and considerations involved in choosing data for training sets.

By the end of the course, you will understand the types of data used in bioinformatics; how the data are collected, stored, managed, and searched; and how they are transformed for further processing and analysis. You will also learn to aggregate and normalize data for use in machine learning and AI training sets.

Topics

Pipeline design
Workflow management systems and workflow analysis with open-source tools
Documentation skills / proof of concept with foresight
Using SQL for bioinformatics data
Data lakes (e.g., Databricks, Redshift, and/or Snowflake)
Large data sets
Databases: how to store and move data, and how to decide which AI models to use

Additional Information

AI* - This course teaches students how to write bioinformatics programs by using AI for parsing and normalization of biological data.
