CDC Genome Annotation

A couple of months ago the CDC sent Dr. Upton 3 pox virus genomes (Skunkpox, Vole pox and Raccoon pox) to annotate. I ran them all through GATU (SOP3) using Cowpox Brighton Red as the reference genome, then went through all the genes and marked them good or bad. I went through all the iffy ones with Dr. Upton and compiled a list of them to send to the CDC so they can check for sequencing errors. Now we’re just waiting for them to get back to us.

For the moment we decided to ignore the genes in roughly the first and last 20 kb of Skunk and Vole, and just the last 20 kb of Raccoon, because they are located in the ITR region. There seems to be something fishy going on with the ITR region: a bunch of genes that should be at the 5′ end are found at the 3′ end, so one possibility is that the sequence was assembled wrong. Raccoon is also completely missing the 5′ ITR, and genes 3 and 25 are fused, with everything in between missing. Again, we think this is likely a sequencing error, so we’re waiting to hear back from the CDC about this too.

Once the CDC gets back to us, and if the ITR region has been completely redone, you’ll likely have to run the genomes through GATU again. I’ve updated SOP3 to try to make it fairly clear what to do. Once you get a table from GATU, I would suggest pasting it into an Excel file and comparing the new table with my old one to see what the differences are. If the reference gene and the length of a gene are the same, you can probably stick with my notes/comments on it. Any genes that are different you’ll want to flag and take a quick look at in the alignment (you’ll have to make an alignment with the new genome files and CPXV BR). You could even do an alignment with the GenBank file from the old genome and the GenBank file from the new genome to help you visualize what has changed.
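If you’d rather not compare the two tables by eye, the check described above (same reference gene and same length means my old notes probably still apply) could be sketched roughly like this in Python. This is only a sketch: it assumes both tables have been saved as CSV, and the column names gene, reference_gene and length, as well as the gene names and numbers in the sample data, are made up for illustration. They are not what GATU or the Excel file actually use, so adjust them to the real headers.

```python
import csv
import io


def load_table(csv_text):
    """Parse CSV text into {gene_name: row_dict}. Column names are hypothetical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["gene"].strip(): row for row in reader}


def compare_tables(old, new):
    """Label each gene as 'unchanged', 'changed', 'new' or 'missing'.

    A gene counts as unchanged only if both the reference gene and the
    length match between the old and the new table.
    """
    status = {}
    for gene, new_row in new.items():
        old_row = old.get(gene)
        if old_row is None:
            status[gene] = "new"
        elif (old_row["reference_gene"].strip() == new_row["reference_gene"].strip()
              and old_row["length"].strip() == new_row["length"].strip()):
            status[gene] = "unchanged"
        else:
            status[gene] = "changed"
    for gene in old:
        if gene not in new:
            status[gene] = "missing"
    return status


# Made-up sample data, only to show the shape of the input.
OLD_TABLE = """gene,reference_gene,length
gene_A,ref_1,450
gene_B,ref_2,1200
gene_C,ref_3,900
"""

NEW_TABLE = """gene,reference_gene,length
gene_A,ref_1,450
gene_B,ref_2,1188
gene_D,ref_4,300
"""

if __name__ == "__main__":
    result = compare_tables(load_table(OLD_TABLE), load_table(NEW_TABLE))
    for gene, label in sorted(result.items()):
        print(gene, label)
```

Anything that comes out as changed, new or missing is what you’d flag and look at in the alignment; the unchanged ones can probably keep my old comments.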

So just to reiterate: once you get the new genomes, you’ll want to go through all the genes to check whether they’re still the same, and also check the ones I’ve marked in blue to see whether my comments have been addressed. If a gene looks good, you can change it to green; if it’s still iffy, keep it blue and ask Dr. Upton about all your blue ones at the end. A few genes are marked blue just as a reminder to make sure the gene is annotated at a different start codon than the one GATU found. You’ll have to do this manually (I’m not sure of the exact process because we never got that far; it’s probably just a matter of manually editing the GenBank file before sending it back to the CDC).

The other thing you’ll have to do once we get the new genomes is go through all the new ORFs again. I compiled a list and went through most of them, but if the ITR has changed a lot, you’re probably better off copying the table from GATU again and just using my list as a reference. Once you eliminate all the ORFs that overlap existing genes (which are therefore unlikely to be genes), you’re left with potential new genes. Other than BLAST and PDBalert (if it is even working currently), I’m not sure what other analysis to do for them, so you’ll have to talk to Dr. Upton.
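The overlap elimination step could be sketched in Python along these lines. It is a sketch under a few assumptions of mine, not the procedure we actually used: coordinates are 1-based and inclusive, strand is ignored, and any overlap of at least min_overlap bases (a hypothetical parameter, default 1) is enough to throw an ORF out. The ORF names and coordinates in the example are made up.

```python
def overlap_length(a, b):
    """Number of bases shared by two (start, end) spans, 1-based inclusive.

    Spans may be given as (end, start) for complement-strand features;
    they are normalized first.
    """
    a_lo, a_hi = min(a), max(a)
    b_lo, b_hi = min(b), max(b)
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


def candidate_new_genes(orfs, genes, min_overlap=1):
    """Keep only the ORFs that overlap no annotated gene.

    orfs:  {orf_name: (start, end)}
    genes: list of (start, end) spans for already annotated genes
    """
    return {
        name: span
        for name, span in orfs.items()
        if all(overlap_length(span, gene) < min_overlap for gene in genes)
    }


if __name__ == "__main__":
    # Made-up coordinates, only to show the shape of the input.
    orfs = {"orf_a": (100, 400), "orf_b": (900, 1200), "orf_c": (2500, 2100)}
    genes = [(350, 800), (2000, 2200)]
    print(candidate_new_genes(orfs, genes))
```

Whatever survives this filter is the list of potential new genes to take to BLAST and to Dr. Upton. Raising min_overlap would let ORFs with only a small overlap through, but whether that is appropriate is a question for Dr. Upton.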

The main document you’ll be interested in is the Excel file called CDC Annotations v7. Most of the working files you’ll need are in the folders called Alignments and “Fastas + Gbks.” Old versions of the table, if you’re interested, are in the Tables folder. The other document you might find useful is the long VGO alignment printout that I taped together, although again, if the ITR regions have changed a lot, you’ll want to look into making a new one. I wouldn’t worry about the rest of the folders; they were just for looking into questions that Dr. Upton had about the genomes at the time.

