Interpol Hashing

Requirements:

  1. Operating system(s): all
  2. Any restrictions to use by non-academics: none

Download (R)

Please cite:
TBA

License:
MIT License

Usage:

The zip file contains three files:

  1. README.txt
  2. InterpolHashing.R
  3. Interpol_1.3.2.tar.gz

IMPORTANT

1) Make sure that you have installed Interpol 1.3.2 from the bundled Interpol_1.3.2.tar.gz:

install.packages("Interpol_1.3.2.tar.gz", repos = NULL, type = "source")

2) Install seqinr from CRAN:

install.packages("seqinr")
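
To confirm that the packages from steps 1 and 2 are available before continuing, a check along the following lines may help. This is a minimal sketch in base R; the helper name has_version is hypothetical and not part of Interpol or InterpolHashing.R.

```r
# Hypothetical helper (not part of Interpol or InterpolHashing.R):
# returns TRUE if `pkg` is installed at exactly `version`, FALSE otherwise.
has_version <- function(pkg, version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) == version
}

# TRUE only if Interpol 1.3.2 is installed.
print(has_version("Interpol", "1.3.2"))

# TRUE if seqinr is installed, in any version.
print(requireNamespace("seqinr", quietly = TRUE))
```

If the first check prints FALSE, repeating step 1 with the tarball from the zip file is the likely fix.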

3) Download a FASTA file to use as the database, e.g., Swiss-Prot from https://www.uniprot.org/help/downloads

4) Source the InterpolHashing.R file:

source("InterpolHashing.R")

5) Create the Interpol Hashing database with the following commands:

database <- seqinr::read.fasta("uniprot_sprot.fasta.gz", as.string = TRUE, forceDNAtolower = FALSE)

database <- createDatabase(database)

database <- encodeDatabase(database, length_factor = 300)
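
To try the workflow before downloading a full database, a very small FASTA file can stand in for Swiss-Prot. The sketch below writes a one-record file (the record is the hit from the example output in this README) and reads it with the same seqinr call as in step 5. It assumes, without confirmation from this README, that createDatabase and encodeDatabase accept such a small list in the same way; those two calls are therefore left as comments.

```r
# Write a one-record FASTA file to a temporary location.
# The record is the hit shown in this README's example output.
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">sp|P58689|21DD_HETMG", "ALAGTIIAGASLTFKILDEV"), fasta_path)

# Count records with base R: every FASTA record starts with ">".
n_records <- sum(startsWith(readLines(fasta_path), ">"))
print(n_records)

# Read the file the same way as the full database (only if seqinr is installed).
if (requireNamespace("seqinr", quietly = TRUE)) {
  small_db <- seqinr::read.fasta(fasta_path, as.string = TRUE,
                                 forceDNAtolower = FALSE)
  print(names(small_db))
  # Assumption: the functions from InterpolHashing.R accept this list
  # just like the Swiss-Prot one.
  # small_db <- createDatabase(small_db)
  # small_db <- encodeDatabase(small_db, length_factor = 300)
}
```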

6) Now you can search for sequences. Define the variable "query" as a protein sequence, e.g.,

query <- "ALGATIIAGASLTFKILDEV"
getSequence(query, database, percentage = 0.01)

The result should look like this:

identifier            sequence              length  score     avgScore  p
sp|P58689|21DD_HETMG  ALAGTIIAGASLTFKILDEV  20      2.549747  13.21368  1.831873e-218
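
As printed above, the returned sequence is not identical to the query from step 6. A short base R sketch to see where the two strings differ (the variable names hit and mismatches are chosen here for illustration only):

```r
query <- "ALGATIIAGASLTFKILDEV"   # query from step 6
hit   <- "ALAGTIIAGASLTFKILDEV"   # sequence column of the example result

# Split both strings into single residues and compare position by position.
# Both strings have 20 characters, so a direct element-wise comparison works.
q_chars <- strsplit(query, "")[[1]]
h_chars <- strsplit(hit, "")[[1]]
mismatches <- which(q_chars != h_chars)

print(mismatches)   # 3 4
```

The two strings differ only at positions 3 and 4, where the residues G and A are swapped.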