This module contains all important definitions used by dinopy:
Custom types (like basenumbers and two_bit, see Note about dtype)
Dictionaries used for conversions
Constants for the byte values of DNA- and FASTA-/FASTQ-characters
Return types (NamedTuple)
Note
Some of the defined variables, like the conversion dicts, are
declared with cdef and can only be accessed from Cython.
To convert between sequence types please use the conversion methods from
dinopy.conversion.
Dictionary subclass that returns the 2bit representation for an IUPAC character.
If an IUPAC ambiguity code is encountered, one of the bases satisfying the code
is chosen uniformly at random.
Example
>>> irrd = dinopy.definitions.IUPACRandomReplacementDict()
>>> irrd["A"]
0
>>> irrd["C"]
1
>>> irrd["R"]  # R = A or G
0
>>> irrd["R"]  # R = A or G
2
>>> irrd["N"]  # N = A or C or G or T
3
>>> irrd["N"]  # N = A or C or G or T
1
>>> irrd["N"]  # N = A or C or G or T
2
>>> irrd["N"]  # N = A or C or G or T
1
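The lookup behaviour described above can be approximated in pure Python. The following is a minimal sketch, not dinopy's implementation: the class name RandomReplacementSketch and the two helper tables are hypothetical, and the table of ambiguity codes follows the standard IUPAC nucleotide nomenclature.

```python
import random

# Hypothetical helper tables, not taken from dinopy.
# Each IUPAC code maps to the bases that satisfy it.
IUPAC_CHOICES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# 2bit values as documented for two_bit: A=0, C=1, G=2, T=3.
TWO_BIT = {"A": 0, "C": 1, "G": 2, "T": 3}


class RandomReplacementSketch(dict):
    """Dict subclass whose lookups resolve ambiguity codes at random."""

    def __getitem__(self, code):
        # Unambiguous bases have exactly one candidate, so the result
        # is deterministic for A, C, G and T.
        candidates = IUPAC_CHOICES[code]
        return TWO_BIT[random.choice(candidates)]


sketch = RandomReplacementSketch()
print(sketch["A"])  # always 0
print(sketch["R"])  # 0 or 2, chosen at random on each lookup
```

Because the choice is made on every lookup, repeated accesses with the same ambiguity code can return different values, as in the example above.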
Four bit encoding type for encoding parameters.
For convenience, four_bit behaves a lot like int in that it can either be used as a type
(for example in dinopy.processors.qgrams, see Note about dtype for more information),
or used as a conversion function (as in four_bit("ACGT") == 0b0001001001001000) [1].
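To make the example value concrete, here is a minimal pure-Python sketch of packing a sequence into four bits per base. The per-base values are inferred from the single example above (one bit per base: A=0001, C=0010, G=0100, T=1000); the names FOUR_BIT and pack_four_bit are hypothetical, and the encoding of IUPAC ambiguity codes is not covered here.

```python
# Per-base values inferred from four_bit("ACGT") == 0b0001001001001000.
# This table is an assumption for illustration, not dinopy's own data.
FOUR_BIT = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000}


def pack_four_bit(seq):
    """Pack a string of A/C/G/T into an int, four bits per base."""
    value = 0
    for base in seq:
        # Shift the bases packed so far and append the next one.
        value = (value << 4) | FOUR_BIT[base]
    return value


print(bin(pack_four_bit("ACGT")))  # 0b1001001001000 (leading zeros dropped)
```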
Two bit encoding type for encoding parameters.
For convenience, two_bit behaves a lot like int in that it can either be used as a type
(for example in dinopy.processors.qgrams, see Note about dtype for more information),
or used as a conversion function (as in two_bit("ACGT") == 0b00011011) [2].
Bases ‘A’, ‘C’, ‘G’ and ‘T’ get replaced according to the following mapping:
A → 0b00
C → 0b01
G → 0b10
T → 0b11
Note that the bit complement of A is T and the bit complement of C is G.
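The mapping and the complement property can be checked with a short pure-Python sketch. The helper names pack_two_bit, unpack_two_bit and complement are hypothetical and only illustrate the encoding; they are not part of dinopy.

```python
# Mapping as documented above: A=0b00, C=0b01, G=0b10, T=0b11.
TWO_BIT = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index i holds the base encoded as i


def pack_two_bit(seq):
    """Pack a string of A/C/G/T into an int, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | TWO_BIT[base]
    return value


def unpack_two_bit(value, length):
    """Decode `length` bases from a packed int, first base first."""
    return "".join(
        BASES[(value >> (2 * i)) & 0b11] for i in reversed(range(length))
    )


def complement(value, length):
    """Complement every base by inverting all 2 * length bits."""
    mask = (1 << (2 * length)) - 1
    return ~value & mask


packed = pack_two_bit("ACGT")
print(packed == 0b00011011)                   # True
print(unpack_two_bit(complement(packed, 4), 4))  # TGCA
```

Since the complement of a base is its bitwise inverse, complementing a whole packed sequence takes a single inversion plus a mask, with no per-base lookup.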