= Practical 1 =
== Introduction ==
You have been assigned a specific gene from a specific organism from this [[/genes|list of genes]]. Your task is to find out about the organism, the gene, the enzyme it encodes and the reaction that the enzyme catalyses.<<BR>>

The following instructions are not very specific about where on the relevant web pages you will find the items you need. This is because the page designs keep changing and it is difficult to keep instructions up to date. You'll just have to hunt! Also, because of the cross-linking between data items, there's more than one route to the same item, so it's quite easy to go round in circles. Finally, the types and amount of information available differ from gene to gene, so some of the web pages can look different depending on what is listed. This means that the routes given below are only suggestions; you may find the information by following a different route.

== Exercise 1: basic information about your gene and organism ==
Here you will be conducting database searches to find links to your gene and its protein product. The gene designation I have given you is likely to consist of an organism designation (three or four letters) followed by a number (usually a sequence number for the coding sequence in the genome annotation). However, within a particular database, the associated data is likely to be linked to unique and specific identifiers or accession numbers. Look out for (and note for future reference):

 * Gene id
 * Protein id

Sometimes one database will include cross-references to the corresponding ids in other databases. One exception to these 'local' designations concerns enzymes: if your gene encodes an enzyme, the enzyme activity is represented by an internationally agreed designation in the format EC ''n''.''n''.''n''.''n'', which is common across all the databases.
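
Because the EC designation has a fixed four-field shape, it is easy to recognise in text copied from a database page. The following is a minimal Python sketch, not part of any database's own tooling: the function name `parse_ec_number` and the pattern are illustrative assumptions. It accepts the number with or without the "EC" prefix, and allows "-" in the later fields, which databases use when an activity is only partially classified.

```python
import re

# Matches e.g. "EC 1.1.1.1", "EC1.1.1.1" or "1.1.1.1".
# The last three fields may be "-" for a partially classified activity.
EC_PATTERN = re.compile(r"^(?:EC\s*)?(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)$")


def parse_ec_number(text):
    """Return the four EC fields as a tuple of strings, or None if malformed."""
    match = EC_PATTERN.match(text.strip())
    return match.groups() if match else None


if __name__ == "__main__":
    # The last entry is a made-up gene designation, which should be rejected.
    for candidate in ["EC 1.1.1.1", "2.7.1.-", "abc1234"]:
        print(candidate, "->", parse_ec_number(candidate))
```

The first field is the enzyme class and each later field narrows the classification, so keeping the fields separate makes it simple to compare two enzymes at a chosen level.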

=== Databases to consult ===
 * The National Center for Biotechnology Information, NCBI (http://www.ncbi.nlm.nih.gov/). This resource in the USA includes a number of well-known databases, including !PubMed and [[http://www.ncbi.nlm.nih.gov/Genbank/index.html|GenBank]]. The global [[http://www.ncbi.nlm.nih.gov/sites/gquery|Query]] page is the point from which you can find all database references to your gene. The two sections of the NCBI site to be consulted today, reached from the Query page or from the links below, are:

  * [[http://www.ncbi.nlm.nih.gov/gene/|Gene]], where you can enter your gene id.

  * [[http://www.ncbi.nlm.nih.gov/blast/Blast.cgi|Blast]], where the starting point for the sequence comparison searches can be found.

You can also consult:

 * [[http://www.genome.jp/kegg/|KEGG]], the Kyoto Encyclopedia of Genes and Genomes in Japan, which links genome annotations to functional information, in particular enzymes and metabolism.


 * [[http://www.expasy.ch/|ExPASy]], in Switzerland, which focuses on proteins and enzymes and tools to examine them. In particular:

  * [[http://www.uniprot.org/|UniProtKB]] - containing Swiss-Prot (a manually curated database of information about known proteins) and TrEMBL (automatic annotations from the EBI, not curated and therefore more speculative);

  * [[http://www.expasy.ch/tools/#proteome|Proteomics tools]] - for a range of calculations of protein properties;

  * [[http://www.expasy.ch/enzyme/|Enzyme]] - one of the copies of the enzyme catalogue (to be explored later).

 * [[http://www.ebi.ac.uk/|EBI]], the European Bioinformatics Institute, based in England, which has mirrors of some of the other databases, but also has a number of its own databases related to functional interpretation of genomic information.


A more extensive list of links is available [[.//Meetings/Riga2013/Practicals/Practical_1/Links|here]]

=== Task 1: Identify your gene ===
Go to [[http://www.ncbi.nlm.nih.gov/sites/gquery|NCBI Query]] and enter your gene id. The table on the page should refresh and indicate where there are relevant links. Look for:

 * Genome. This links to the organism genome.

  * ''What can you find out about your organism and its classification?''

  * Re-enter your gene in the search box to see its chromosomal position and its neighbours. Some of the information on the map is cryptic; you have to click on the gene markers for it to pop up.

  * ''Do nearby genes have related functions?''

 * Gene. This will give the gene_id for the NCBI databases. '''Note it down for future reference'''. At this point, you should be able to find the nucleotide sequence of your gene. It's useful to make a copy of this for pasting into other applications such as similarity searches. For this, the FASTA format (the one-letter base sequence, uninterrupted by numbering etc.) is most widely used, so look for a link that leads you to this.

  * ''What is the functional annotation of your gene? If an enzyme, is there an EC number?''
 * Protein. '''Note down the protein_id of the protein product''' (generally the translated nucleotide sequence). Take a copy of the protein sequence; again, the FASTA format (uninterrupted one-letter amino acid codes) is the most useful. If there is information about the protein product, you may also find it on UniProtKB. There are two ways of finding the corresponding protein: inserting the FASTA-format protein sequence in the search box on UniProtKB, or trying to get a match between gene or protein ids you already have via the ID mapping search.

 * Go to [[http://www.genome.jp/kegg/|KEGG]] and search using your gene name and/or enzyme EC number to expand on the information above, especially the metabolic context of an enzyme.
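
If you want to check a copied FASTA record before pasting it into another tool, the format is simple enough to read with a few lines of code. The sketch below is in Python and is only an illustration: the function name `read_fasta` and the example record (header and bases) are made up, not taken from any database. A FASTA record is a header line starting with ">" followed by one or more lines of sequence, which are joined together.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up example record, for illustration only.
example = """>hypothetical_gene_1 made-up example record
ATGGCTAAAG
GTTAA
"""

if __name__ == "__main__":
    for name, sequence in read_fasta(example).items():
        print(name, len(sequence), sequence)
```

A quick check of the sequence length against the length reported on the database page is a simple way to confirm that nothing was lost when copying.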


=== Task 2: Similarity searching (BLAST) ===
Go to [[http://www.ncbi.nlm.nih.gov/blast/Blast.cgi|NCBI Blast]].

 * Select nucleotide blast and enter the gene_id or the FASTA nucleotide sequence in the search box. Also '''select the non-redundant (nr) nucleotide collection to search against, but do not enter your organism name at this point – leave the organism box blank'''. Start the search and wait for the (very large) results page to appear. (The default number of matches reported is 100; this can be increased if you want to see matches to more distantly related sequences.) The summary of the top matches is shown in a diagram near the top of the results. Information appears when you roll the mouse over the diagram, and clicking on it jumps you to the related information. Note that the 'score' relates to the quality of the match: the higher the score, the better it is. The 'E value' is the number of matches of equivalent or better quality that you would expect to find purely by chance when searching a database of the size you have searched; the lower the value, the better the match. An E value of 0.1 or greater indicates the match could well have occurred merely by chance.
  * ''What do you notice about the similar sequences reported? Are they for genes of the same function, or do they differ?''
  * ''Are the best matches with genes in closely related organisms?'' (There is an option to see the matches arranged as a distance tree; this is like a phylogeny, but not exactly the same, as the distances are all relative to the reference sequence rather than all ''v.'' all.)

 * Repeat the nucleotide blast, but this time confine the search to the genome of your organism.

  * ''Do similar sequences to your gene occur elsewhere within the genome of the organism? If so, are these to genes of the same or related function?''

 * Repeat the above exercises, but this time comparing protein sequences. Similarity can be pursued over greater distances with the protein sequences because some nucleotide substitutions are synonymous at the protein level, and the similarity scoring takes account of whether or not substitutions are for closely related amino acids. The protein blast can be performed using blastp, entering the FASTA protein sequence or protein_id as query, or using blastx, entering the gene sequence/id; in that case make sure you select the appropriate genetic code for your organism. With both options, search against the non-redundant protein database.

  * ''Have you found more distant, good quality matches (assessed by E values) using a protein search?'' (You might need to increase the number of reported matches to tell.)
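
To get a feel for how the score, the size of the database and the E value fit together, the relationship can be sketched numerically. The Python example below assumes the standard relation between a normalised (bit) score S' and the E value, E = m × n × 2^(-S'), where m is the query length and n is the total length of the database. The function name `expected_chance_hits` and the lengths used are illustrative assumptions; BLAST itself uses adjusted "effective" lengths, so the figures on a real results page will not match this sketch exactly.

```python
def expected_chance_hits(bit_score, query_length, database_length):
    """Approximate E value for a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    query_length = 1000         # hypothetical query of 1000 bases
    small_database = 10 ** 7    # hypothetical single-genome search
    large_database = 10 ** 11   # hypothetical whole-collection search

    for bit_score in (30, 50, 100):
        print(
            bit_score,
            expected_chance_hits(bit_score, query_length, small_database),
            expected_chance_hits(bit_score, query_length, large_database),
        )
```

Two things follow from the formula: each extra bit of score halves the E value, and the same score gives a larger E value in a larger database. This is worth bearing in mind when comparing the whole-collection search with the search confined to your organism's genome.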


== Next ==
Now go to [[/Ex2|Exercise 2]].