IgBLAST tool

Enter Query Sequence

Enter sequence(s)

A sequence can be entered as raw sequence, an accession number, or a gi number (example accession: Y14934). You may include a definition line starting with ">" at the top, conforming to FASTA format. You can also load sequences from a local file (make sure it is a plain text file). If the sequence is already in GenBank, you can just enter its accession or gi number.

Multiple query sequences may be submitted. Each sequence must have a unique identifier. Avoid whitespace in the identifier, because any characters after the whitespace will be excluded.
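The identifier rules above can be checked before submission. The following is a minimal sketch, not part of IgBLAST: the function name `check_fasta_ids` and the example records are hypothetical, and it assumes the identifier is the text between ">" and the first whitespace.

```python
def check_fasta_ids(fasta_text):
    """Return (ids, problems) for the FASTA records in fasta_text.

    Hypothetical helper: assumes the identifier is the header text
    up to the first whitespace character.
    """
    ids = []
    problems = []
    seen = set()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a definition line
        header = line[1:].strip()
        parts = header.split(None, 1)
        seq_id = parts[0] if parts else ""
        if len(parts) > 1:
            # Text after the whitespace would not be part of the identifier.
            problems.append(
                f"'{header}': text after whitespace is dropped, leaving '{seq_id}'"
            )
        if seq_id in seen:
            problems.append(f"'{seq_id}': duplicate identifier")
        seen.add(seq_id)
        ids.append(seq_id)
    return ids, problems


# Made-up records: "clone 1" and "clone 2" collapse to the same identifier.
example = """>clone 1
CAGGTGCAGCTGGTG
>clone 2
GAGGTGCAGCTGTTG
>clone_3
CAGGTTCAGCTGGTG
"""

ids, problems = check_fasta_ids(example)
for problem in problems:
    print(problem)
```

Replacing the whitespace with an underscore (as in `clone_3`) keeps the identifiers distinct.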

Or, upload local sequence file

Germline gene databases

Organism for query sequence

Specify the organism that the query sequence comes from. This allows the program to properly report the V domain delineation, the V-J frame status (e.g., in-frame or out-of-frame) and the translation of the query nucleotide sequence.

Germline V gene Database

All IMGT germline databases are from the IMGT/V-QUEST reference directory sets. Sequences from several categories are available, including functional genes (F), open reading frame genes (ORF), pseudogenes whose protein translation frames are intact (in-frame P) and orphon genes that lie outside the normal immunoglobulin or T cell receptor gene loci.

All UNSWIg gene databases are from the UNSWIg germline repertoire.

NCBI human V genes: This database consists of the "IMGT human V genes (F+ORF+in-frame P) including orphons" database plus a few pseudogenes that the IMGT database did not include. It contains the same human sequences as the "Ig germline V genes" database in the previous version of IgBLAST.

NCBI human V genes (old): This is our earliest version of the human Ig germline V gene database, from before the human germline sequences from the IMGT database were added. It is the same as the "Ig germline V genes (old)" database in the previous version of IgBLAST.

NCBI mouse V genes, NCBI mouse D genes and NCBI mouse J genes: These are mouse germline sequences independently collected by NCBI.

Rhesus monkey germline V and J genes are from Sundling C et al, 2012.

See NCBI germline genes for details on NCBI germline gene collections.

Custom: You can search your own database. Your database should contain sequences in FASTA format.

Germline D gene Database

Germline J gene Database

Search Parameters

Program

Choose blastp for protein sequences and blastn for nucleotide sequences.

Min D gene nucleotide matches

This controls the threshold for D gene detection. You can set the minimum number of consecutive nucleotide matches required between the query sequence and the D genes, based on your own criteria. Note that the matches do not include overlapping matches at V-D or D-J junctions. The default value is 5 nucleotides.
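As a rough illustration of a consecutive-match threshold, the sketch below finds the longest run of identical positions between two equal-length, already aligned segments and compares it with the minimum. It is a simplification under stated assumptions, not IgBLAST's detection procedure: the function names and the example segments are hypothetical, and junction overlaps are not modeled.

```python
DEFAULT_MIN_D_MATCHES = 5  # default stated in the help text above


def longest_match_run(query_segment, d_segment):
    """Length of the longest run of consecutive identical positions.

    Assumes both segments are already aligned position by position
    without gaps (a simplification for illustration).
    """
    best = 0
    run = 0
    for q_base, d_base in zip(query_segment, d_segment):
        run = run + 1 if q_base == d_base else 0
        best = max(best, run)
    return best


def passes_d_threshold(query_segment, d_segment, min_matches=DEFAULT_MIN_D_MATCHES):
    """True if the longest consecutive match run reaches the threshold."""
    return longest_match_run(query_segment, d_segment) >= min_matches


# Made-up segments: 4 matches, one mismatch, then 5 matches.
query_part = "GGTATAGCAG"
d_gene_part = "GGTACAGCAG"

print(longest_match_run(query_part, d_gene_part))
print(passes_d_threshold(query_part, d_gene_part))
print(passes_d_threshold(query_part, d_gene_part, min_matches=6))
```

Raising the minimum from 5 to 6 rejects this example, because the single mismatch splits the match into runs of 4 and 5.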

D gene mismatch penalty

A higher mismatch penalty (that is, a more negative value such as -4) favors detecting D gene matches with higher similarity to the query sequence, but such matched regions are not necessarily long. A lower mismatch penalty (for example, -1) favors detecting longer D gene matches that do not necessarily have high similarity to the query sequence. In general, a higher penalty works better if your sequence has few or no somatic mutations. If your sequence has significant mutations (>5%), choose a lower penalty to accommodate the low similarity introduced by mutations.
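The trade-off can be seen with a small worked example. This sketch assumes a simple additive score with a match reward of +1. The reward value, the function names and the two candidate matches are assumptions made for illustration and are not taken from IgBLAST.

```python
MATCH_REWARD = 1  # assumption: reward per matching base, not stated in the text


def alignment_score(matches, mismatches, mismatch_penalty):
    """Additive score: reward for matches plus (negative) penalty for mismatches."""
    return matches * MATCH_REWARD + mismatches * mismatch_penalty


# Hypothetical candidate D gene matches: (matching bases, mismatching bases).
candidates = {
    "short_exact": (8, 0),
    "long_with_mismatches": (14, 2),
}


def best_candidate(mismatch_penalty):
    """Name of the candidate with the highest score under the given penalty."""
    return max(
        candidates,
        key=lambda name: alignment_score(*candidates[name], mismatch_penalty),
    )


for penalty in (-4, -1):
    scores = {
        name: alignment_score(*counts, penalty)
        for name, counts in candidates.items()
    }
    print(penalty, scores, "->", best_candidate(penalty))
```

With a penalty of -4 the short exact match scores 8 against 6 and wins. With a penalty of -1 the longer match scores 12 against 8 and wins, which is the behavior described for mutated sequences.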

Extend alignment at 5' end

If your sequence has too many mutations at the 5' end of the V gene, the IgBLAST result may not show that part of the V gene, because IgBLAST uses a local alignment algorithm. If you would like to see those missed bases/residues, enable this option to direct IgBLAST to perform a simple gapless alignment extension (up to 30 bases/residues) into that region.
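A gapless extension simply pairs positions upstream of the local alignment one to one, without inserting gaps. The following sketch shows the idea under stated assumptions: the function name, the 0-based start coordinates and the example sequences are hypothetical, and IgBLAST's own extension logic may differ.

```python
MAX_EXTENSION = 30  # upper limit stated in the help text above


def extend_5prime(query, germline, q_start, g_start, max_ext=MAX_EXTENSION):
    """Extend an alignment toward the 5' end without gaps.

    q_start and g_start are the 0-based positions where the local
    alignment begins in the query and germline sequences. Returns the
    newly paired query bases, germline bases and the mismatch count.
    """
    # The extension cannot run past the start of either sequence.
    n = min(q_start, g_start, max_ext)
    q_ext = query[q_start - n:q_start]
    g_ext = germline[g_start - n:g_start]
    mismatches = sum(1 for q_base, g_base in zip(q_ext, g_ext) if q_base != g_base)
    return q_ext, g_ext, mismatches


# Made-up sequences: the first 6 bases of the query are heavily mutated,
# so a local alignment is assumed to start at position 6 in both.
germline_v = "CAGGTGCAGCTGGTGCAGTCTGG"
query_seq = "CTGCTCCAGCTGGTGCAGTCTGG"

q_ext, g_ext, mismatches = extend_5prime(query_seq, germline_v, 6, 6)
print(q_ext)
print(g_ext)
print(mismatches)
```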

Allow V(D)J genes to overlap

Notice: the default value has changed for this option.

Enabling this option allows V(D)J genes to overlap at the rearranging junctions (i.e., there might be a stretch of nucleotide homology that is shared between the V(D)J gene segment ends, which mostly happens when there are no N nucleotide additions). By default, the program does not allow V, D, J genes to overlap when assigning V, D, J gene matches. While this option might change the results for D gene matches in some cases, it has no effect on results for V gene matches (or on the related match statistics). Its effect on J gene matches is minimal and occurs only in rare cases. Note that this option is active only when the D and J gene mismatch penalties are set to -4 and -3, respectively.

This option was made available on April 6, 2017. Prior to this date, our web IgBLAST allowed V(D)J genes to overlap (i.e., this option was enabled internally). If you prefer the old behavior of IgBLAST, please enable this option.

Min required V gene length

Results are shown only if the query sequence matches a germline V gene over at least the specified minimum length (i.e., number of bases or amino acids). Otherwise, "No hits found" is reported.

Min required J gene length

Results are shown only if the query sequence matches a germline J gene over at least the specified minimum length (i.e., number of bases). Otherwise, "No hits found" is reported.

J gene mismatch penalty

A higher mismatch penalty (that is, a more negative value such as -3) favors detecting J gene matches with higher similarity to the query sequence, but such matched regions are not necessarily long. A lower mismatch penalty (for example, -1) favors detecting longer matches that do not necessarily have high similarity to the query sequence. In general, a higher penalty works better if your sequence has few or no somatic mutations. If your sequence has significant mutations (>5%), choose a lower penalty to accommodate the low similarity introduced by mutations.

Note: Parameter values that differ from the default are highlighted in yellow and marked with ♦

Show results in a new window

Formatting Options

Number of germline gene

V gene D gene J gene

Amino acid translation

Show amino acid translation

This will translate your query as well as the top germline sequence and align each amino acid to the second base of its codon. Mismatched amino acids in the germline sequence will be colored.
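The layout described above, with each amino acid placed at the second base of its codon, can be sketched as follows. This is an illustration only: the function name `translation_track` and the example sequence are hypothetical, the standard genetic code is assumed, and the coloring of mismatches is not reproduced.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translation_track(nt_seq, frame=0):
    """Return a string as long as nt_seq with each amino acid placed
    at the second base of its codon and spaces everywhere else.

    Codons that are not in the table (for example, with N) become 'X'.
    """
    track = [" "] * len(nt_seq)
    for i in range(frame, len(nt_seq) - 2, 3):
        codon = nt_seq[i:i + 3].upper()
        track[i + 1] = CODON_TABLE.get(codon, "X")
    return "".join(track)


# Made-up query fragment.
query_nt = "CAGGTGCAGCTG"
print(translation_track(query_nt))
print(query_nt)
```

Printing the track above the nucleotide line shows Q, V, Q and L over the middle base of CAG, GTG, CAG and CTG.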

V domain delineation system

The V domain can be delineated using either the IMGT system (Lefranc et al., 2003) or the Kabat system (Kabat et al., 1991, Sequences of Proteins of Immunological Interest, National Institutes of Health Publication No. 91-3242, 5th ed., United States Department of Health and Human Services, Bethesda, MD). Domain annotation of the query sequence is based on pre-annotated domain information for the best matched germline hit.

Number of clonotypes to show

Number of top clonotypes to show. Note that this option applies only to multiple queries.

Alignment format

Additional databases

Database

nr: GenBank+EMBL+DDBJ+PDB+RefSeq sequences, but excludes EST, STS, GSS, WGS, TSA, patent sequences as well as phase 0, 1, and 2 HTGS sequences.

refseq_genomes: NCBI RefSeq genome sequences.

pdb: Sequences from the Protein Data Bank (PDB).

patent: Nucleotide sequences derived from the Patent division of GenBank.

mammalian genomes: NCBI mammal genomes.

Organism limit (optional)

Enter the organism's common name, binomial, or tax id. Only the top 20 taxa will be shown.

Start typing in the text box, then select your taxid. The search will be restricted to the sequences in the database that correspond to your limited subset.

Search V gene only

This option finds the best matches for the V gene in your query sequence in additional non-germline databases (e.g., nr or genome databases). It has no effect on searches against germline gene databases (see the explanation below).

A typical rearranged query sequence includes a leader and the V, D and J genes (sometimes the C region is also included). When a sequence is submitted for a BLAST search, similarity matching is performed over the entire query sequence. Unlike the germline V gene database, which contains only V gene sequences, other databases such as nr contain many rearranged sequences that also include a leader and the V, D, J and C genes. As a result, the best hit from these databases does not necessarily have the best match to the query V gene; rather, it has the best match over the entire query sequence (for example, it may have very high similarity to the leader, D, J or C genes in a query sequence but only a low match to the V gene). This is not a problem if the goal is to find the best overall matches to a query sequence. However, if the goal is to find the best matches to the V gene of a query sequence, then one needs to isolate the V gene part manually from the query sequence and use it for the search.

With this option on, the V gene part from a query sequence is automatically isolated (based on comparison to hits from the germline V gene database) and then used for search against additional databases like nr. This option should be disabled, however, if the search intention is to find best hits based on overall matches.
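Doing the same isolation by hand amounts to cutting the query at the coordinates of its alignment to the germline V gene. A minimal sketch, assuming the V alignment is reported as 1-based inclusive query coordinates; the function name, the coordinates and the example sequence are hypothetical.

```python
def isolate_v_region(query, v_start, v_end):
    """Return the part of the query covered by the germline V alignment.

    v_start and v_end are assumed to be 1-based, inclusive positions
    on the query sequence.
    """
    if not (1 <= v_start <= v_end <= len(query)):
        raise ValueError("V gene coordinates fall outside the query sequence")
    return query[v_start - 1:v_end]


# Made-up rearranged query: leader, V gene part, then the remainder.
leader = "ATGGAC"
v_part = "CAGGTGCAGCTG"
remainder = "GGGTACTTTGAC"
rearranged_query = leader + v_part + remainder

# The V alignment is assumed to cover query positions 7 through 18.
v_only = isolate_v_region(rearranged_query, 7, 18)
print(v_only)
```

The isolated fragment, rather than the full rearranged sequence, is what would then be used as the query against a database such as nr.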

Number of alignments

Expect

This is the statistical significance threshold for reporting matches against database sequences. Lower EXPECT thresholds are more stringent and report only high similarity matches. Choose a higher EXPECT value (for example, 1 or more) if you expect low identity between your query sequence and the targets. Note that this option applies only to the additional database search (it has no effect on the germline gene database search).
