Class 11: AlphaFold

Patrick Nguyen (ID: A17680785)

Background

In this hands-on session we will utilize AlphaFold to predict protein structure from sequence (Jumper et al. 2021).

Without the aid of such approaches, it can take years of expensive laboratory work to determine the structure of just one protein. With AlphaFold we can now accurately compute a typical protein structure in as little as ten minutes.

The PDB (the main repository of experimental structures) only has ~250 thousand structures (we saw this in the last lab). The main protein sequence database has over 200 million sequences! Only 0.125% of known sequences have a known structure - this is called the “structure knowledge gap”.

(250000 / 200000000)*100

[1] 0.125

Structures are much harder to determine than sequences
They are expensive (on average ~$1 million each)
They take on average 3-5 years to solve!

EBI AlphaFold Database

The EBI has a database of pre-computed AlphaFold (AF) models called AFDB. It is growing all the time, so it is worth checking before running AF ourselves.

Running AlphaFold

We can download and run AlphaFold locally (on our own computers), but we need a GPU. Or we can use “cloud computing” to run it on someone else’s computers :)

We will use ColabFold <https://github.com/sokrypton/ColabFold>

We previously found there was no AFDB entry for our HIV sequence:

>HIV-Pr-Dimer
PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD
QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF:PQITLWQRPLVTIKIGGQLK
EALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPT
PVNIIGRNLLTQIGCTLNF
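
The `:` in the query above is how ColabFold separates the chains of a multimer. Before submitting, it can help to check that the sequence splits into the chains we expect. The following is a minimal sketch in base R; the variable names (`seq_lines`, `query`, `chains`) are just illustrative, and the sequence lines are copied by hand from the entry above.

```r
# Sequence lines copied from the FASTA entry above (header line omitted)
seq_lines <- c(
  "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD",
  "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF:PQITLWQRPLVTIKIGGQLK",
  "EALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPT",
  "PVNIIGRNLLTQIGCTLNF"
)

# Join the wrapped lines into one query string
query <- paste(seq_lines, collapse = "")

# Split on the literal ":" chain separator
chains <- strsplit(query, ":", fixed = TRUE)[[1]]

length(chains)                    # number of chains in the query
nchar(chains)                     # residues per chain
identical(chains[1], chains[2])   # TRUE if the two chains are the same sequence
```

For this query the check should report two identical chains of 99 residues each, i.e. a homodimer.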

Here we will use the AlphaFold2_mmseqs2 notebook.