Bioinformatics Module – Metagenomics Annotathon Assignment Report Template

Please use the attached template and the sequence assigned to me. Answer all the questions from Sections 1 to 4.

Section 1. Finding the Open Reading Frame (ORF)

a. Genomic Sequence:

b. Translation Used:

c. Translated Sequence

d. Commentary on Translation
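
As an illustration only, the six-frame translation behind parts b and c can be sketched in a few lines of Python. The sketch assumes the standard genetic code and reports the longest stop-free stretch rather than a strict start-to-stop ORF, because a metagenomic fragment may lack the start or stop codon. The function names are hypothetical and a web translation tool will serve equally well.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from position 0; ambiguous codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_open_stretch(dna):
    """Return (frame, peptide) for the longest stretch without a stop codon."""
    best = ("", "")
    for frame, protein in six_frames(dna).items():
        for piece in protein.split("*"):
            if len(piece) > len(best[1]):
                best = (frame, piece)
    return best
```

The frame label returned by `longest_open_stretch` is what would be reported under "Translation Used", and the peptide under "Translated Sequence".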

Section 2. BLAST Data and Sequence Alignment

a. Summary of BLAST Data (Maximum 2 pages of figures and text)

Be selective about what you show here – it might be worth summarizing your BLAST output as a table. Think about what information might be relevant to a reader. There is no point in having 10 pages of raw data showing pair-wise alignments. It would be better to find the closest homologues and provide a multiple alignment (see part b) with your sequence against their full sequences.

You will need to follow the Entrez links from your BLAST output to get these sequences in FASTA format and perform a multiple alignment. Show the minimum number of alignments to the nearest homologues to demonstrate what your protein relates to.

It is possible that some of your top BLAST hits are the same protein from the same species, just sequenced by different researchers or in a different context (for example, as synthetic constructs). Be wary of this: it is far more informative to be selective and show an alignment with proteins from different species if possible.
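
One possible way to build such a summary table is sketched below. It assumes the hits were exported in the 12-column tabular layout of BLAST+ (`-outfmt 6`), and that you supply your own accession-to-species mapping (here the hypothetical `species_of` dictionary), since that layout does not carry species names. The E-value cutoff is an arbitrary illustrative choice.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_hits(text):
    """Parse tab-separated BLAST hits into a list of dictionaries."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def best_hit_per_species(hits, species_of, max_evalue=1e-5):
    """Keep the highest-scoring hit per species, best first."""
    best = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue
        # Fall back to the accession when no species is known for it.
        species = species_of.get(hit["sseqid"], hit["sseqid"])
        if species not in best or hit["bitscore"] > best[species]["bitscore"]:
            best[species] = hit
    return sorted(best.values(), key=lambda h: h["bitscore"], reverse=True)


def as_table(hits, species_of):
    """Format hits as a plain-text table for the report."""
    rows = [f"{'Accession':<12}{'Species':<26}{'%id':>7}{'E-value':>10}{'Bits':>8}"]
    for h in hits:
        species = species_of.get(h["sseqid"], "unknown")
        rows.append(
            f"{h['sseqid']:<12}{species:<26}{h['pident']:>7.1f}"
            f"{h['evalue']:>10.1e}{h['bitscore']:>8.0f}"
        )
    return "\n".join(rows)
```

Collapsing hits to one per species removes the duplicate entries described above before the table is drawn up.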

b. Multiple Sequence Alignment

Alignments are most useful if you can align your sequence against the full-length versions of the homologues. Click on the accession numbers in your BLAST output to access the full sequences and other information regarding the homologues. Highlight any regions of high homology.
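
To help decide which regions to highlight, a conservation line can be computed from the aligned sequences, similar in spirit to the marker row that alignment programs print under each block. The sketch below is a simplified version that marks only fully identical, gap-free columns. It assumes the sequences have already been aligned to equal length, and the function names are hypothetical.

```python
def conservation_line(aligned):
    """Return a string with '*' under columns where every sequence agrees."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        identical = len(residues) == 1 and "-" not in residues
        marks.append("*" if identical else " ")
    return "".join(marks)


def conserved_blocks(line, min_len=3):
    """Return 1-based (start, end) spans of at least min_len consecutive '*'."""
    blocks, start = [], None
    # The appended space closes a run that reaches the end of the line.
    for i, mark in enumerate(line + " ", start=1):
        if mark == "*" and start is None:
            start = i
        elif mark != "*" and start is not None:
            if i - start >= min_len:
                blocks.append((start, i - 1))
            start = None
    return blocks
```

The spans returned by `conserved_blocks` are alignment column numbers, so they can be used directly to box or shade regions in the figure.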

c. Commentary on the BLAST data and Alignment (Maximum 100 words)

How confident are you about the BLAST data – remember to quote relevant scores and E-values. Has this protein ever been sequenced before? What is the percentage identity to the nearest homologues? If your putative protein is incomplete, can you predict how much might be missing by examining the sequences of the homologues?
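
A rough estimate of how much of an incomplete protein is missing can be read off the alignment coordinates of the nearest homologue. The sketch below assumes a protein-versus-protein search, so that all coordinates are in residues, and it ignores gaps within the alignment. The function and parameter names are hypothetical, and the numbers in the usage example are invented for illustration.

```python
def estimate_missing(qlen, qstart, qend, slen, sstart, send):
    """Estimate residues missing from each end of the query.

    qlen, slen: lengths of the query and the homologue (subject).
    qstart, qend, sstart, send: 1-based alignment coordinates.
    """
    # Homologue residues before the alignment, minus unaligned query residues.
    n_missing = max(0, (sstart - 1) - (qstart - 1))
    # Homologue residues after the alignment, minus unaligned query residues.
    c_missing = max(0, (slen - send) - (qlen - qend))
    coverage = 100.0 * (send - sstart + 1) / slen
    return {
        "n_terminal_missing": n_missing,
        "c_terminal_missing": c_missing,
        "homologue_coverage_percent": coverage,
    }


# Invented example: a 120-residue fragment aligning to residues 51-170
# of a 300-residue homologue.
example = estimate_missing(qlen=120, qstart=1, qend=120,
                           slen=300, sstart=51, send=170)
```

Comparing the estimate across several homologues gives a sense of how reliable it is, since homologues differ in length.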

Section 3. Investigating Conserved Domains

a. Protein Domains and Structure (Maximum 2 pages of figures and text)

This section will need to be completed with judicious use of supporting figures and text. Indicate which websites/databases were used to make any predictions.

Can you infer anything about conserved functional domains of your putative protein either directly by analysing the primary sequence and/or by comparison to related proteins? Does your protein belong to any families or super-families of proteins based on conserved domains?

What is known about the structure? If you have a complete protein, what is the MW and isoelectric point? If your protein is incomplete, what are the values for the nearest homologue? If your protein is incomplete, could we predict what is missing based on the sequence of nearest homologues? Is there a cellular localization signal – can you predict where in the cell this protein belongs? (Be careful about localization if it is likely prokaryotic!)
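
For orientation, the molecular weight and isoelectric point can be approximated directly from the primary sequence. The sketch below uses average residue masses and an assumed set of pKa values. Published pKa sets differ between tools, so the pI it gives may differ by a few tenths of a pH unit from a web server, and the server you actually used should be the one cited. The function names are hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# Assumed pKa values for ionisable groups; other tools use other sets.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(protein):
    """Average molecular weight in Da; raises KeyError on unknown residues."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def net_charge(protein, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    protein = protein.upper()
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "Nterm" else protein.count(group)
        charge += count / (1 + 10 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "Cterm" else protein.count(group)
        charge -= count / (1 + 10 ** (pka - ph))
    return charge


def isoelectric_point(protein, tol=1e-4):
    """Find the pH of zero net charge by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(protein, mid) > 0:
            low = mid   # still positive, so the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2
```

Bisection works here because the net charge only decreases as pH rises, so there is a single crossing point.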

Optional – Assuming its structure has not been resolved – what can we infer about the 3D structure of your putative protein if anything? Note – remember the primary sequence database is many orders of magnitude larger than the tertiary structure database so this may not be applicable for all putative proteins.

Section 4. Proposed Biological Function and Taxonomy

a. Proposed Biological Function (Maximum 500 words plus supporting figures)

What does your putative protein do? When your ORF’s homologues have known functions, or if the ORF presents known conserved domains you may be able to predict a possible biological function. What can you conclude from the BLAST analysis and any conserved functional domains? You will need to search the relevant literature regarding these domains or the homologues themselves.

b. Proposed Taxonomy


You may need to construct a basic phylogenetic tree using your BLAST output data. What can you infer about the species of origin of your DNA? What data does your phylogenetic tree provide? Does it make sense – would this class of organism be likely to be found in the sample pool?

c. Assigning a Gene Ontology (GO) Term

Assign a relevant GO term from the simplified list of GO terms on the LMS from either (or both if applicable) the Molecular Processes or Biological Processes list.


Papers cited

Note: List the URLs of online tools used either in the figure captions or in text when referring to the relevant data.

A Note on General Scholarship

This report involves being judicious about what data you show. Remember that you are making a case for a proposed function and origin for your putative protein. You need to show the relevant data that makes this case for you.

Remember to include figure captions and legends and include references. Avoid simply cutting and pasting data from websites.

You will need to work on your figures to highlight regions/sequences of interest and to add labels which increase the clarity of your points. For example, highlight the coding portion of your genomic DNA or all I will see is a nucleotide sequence with no context.
