Unlocking Big Genetic Data Sets

Published By: Columbia University, 11/9/2016


Researchers at Columbia University are studying the use of a machine-learning algorithm, TeraStructure, to predict the ancestry of individuals. They plan to use the algorithm to track genetic trends in populations for medical use.

Extended Discussion Questions

  • How does this story demonstrate the impact of scientific computing on scientific progress?
    • What are some positive impacts the TeraStructure algorithm could have on medicine?
    • Can you think of any other examples of how being able to crunch massive amounts of data has changed medical treatments?
  • Are there any potential negative impacts from this innovation?
    • Is there any way the data could be used against the patients? (Aiming at: privacy violations, effects on medical insurance premiums…)
    • What kind of information or warnings do you think doctors should give patients? Prompts: For example, about prediction accuracy, about who else might see the data…?
  • What kinds of people — and from where — are most likely to have had their genes sequenced?
    • What effects might this have on the accuracy of the results?
    • If you’ve discussed statistical norming: What could the researchers do to reduce bias in the results?

Relating This Story to the CSP Curriculum Framework

Global Impact Learning Objectives:

  • LO 7.2.1 Explain how computing has impacted innovations in other fields.
  • LO 7.3.1 Analyze the beneficial and harmful effects of computing.

Global Impact Essential Knowledge:

  • EK 7.2.1A Machine learning and data mining have enabled innovation in medicine, business, and science.
  • EK 7.2.1B Scientific computing has enabled innovation in science and business.
  • EK 7.2.1E Open and curated scientific databases have benefited scientific researchers.

Other CSP Big Ideas:

  • Idea 4 Algorithms

Banner Image: “Network Visualization – Violet – Offset Crop”, derivative work by ICSI. New license: CC BY-SA 4.0. Based on “Social Network Analysis Visualization” by Martin Grandjean. Original license: CC BY-SA 3.0
