dinopy.definitions module

This module contains all important definitions used by dinopy:

  • Custom types (like basenumbers and two_bit, see Note about dtype)

  • Dictionaries used for conversions

  • Constants for the byte values of DNA- and FASTA-/FASTQ-characters

  • Return types (NamedTuple)

Note

Some of the defined variables, such as the conversion dicts, are declared with cdef and can only be accessed from Cython. To convert between sequence types, use the conversion functions from dinopy.conversion.
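The return types listed in this module expose their fields both as attributes and through `__getitem__`, in the manner of a named tuple. The following is a minimal stand-in built with `collections.namedtuple`, not dinopy's own class. The name `FastqReadSketch` is hypothetical, and the field order (name, sequence, quality) is an assumption made for illustration, since the attribute listings in this reference do not state it.

```python
from collections import namedtuple

# Hypothetical stand-in for dinopy.definitions.FastqReadC.
# The field order is an assumption, not taken from dinopy.
FastqReadSketch = namedtuple("FastqReadSketch", ["name", "sequence", "quality"])

read = FastqReadSketch(name=b"read_1", sequence=b"ACGT", quality=b"IIII")

# Attribute access does not depend on the field order.
by_attribute = (read.name, read.sequence, read.quality)

# Index access (what __getitem__ provides) does depend on it.
by_index = (read[0], read[1], read[2])
```

Because index access depends on the field order, attribute access is the safer choice whenever the order has not been checked against the installed version.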

class dinopy.definitions.FastaChromosomeC

Bases: object

__getitem__(key, /)

Return self[key].

length
name
sequence

class dinopy.definitions.FastaChromosomeInfoC

Bases: object

__getitem__(key, /)

Return self[key].

interval
length
name

class dinopy.definitions.FastaEntryC

Bases: object

__getitem__(key, /)

Return self[key].

interval
length
name
sequence

class dinopy.definitions.FastaGenomeC

Bases: object

__getitem__(key, /)

Return self[key].

info
sequence

class dinopy.definitions.FastaReadC

Bases: object

__getitem__(key, /)

Return self[key].

name
sequence

class dinopy.definitions.FastqLineC

Bases: object

__getitem__(key, /)

Return self[key].

type
value

class dinopy.definitions.FastqReadC

Bases: object

__getitem__(key, /)

Return self[key].

name
quality
sequence

class dinopy.definitions.FastqReadWithoutNameC

Bases: object

__getitem__(key, /)

Return self[key].

quality
sequence

class dinopy.definitions.FastqReadWithoutQVC

Bases: object

__getitem__(key, /)

Return self[key].

name
sequence

class dinopy.definitions.IUPACRandomReplacementDict(args, kwargs)

Bases: dict

Dictionary subclass that returns the 2bit representation of an IUPAC character. If an IUPAC ambiguity code is encountered, a base satisfying the code is chosen uniformly at random.

Example

>>> irrd = dinopy.definitions.IUPACRandomReplacementDict()
>>> irrd["A"]
0
>>> irrd["C"]
1
>>> irrd["R"]   # R = A or G
0
>>> irrd["R"]   # R = A or G
2
>>> irrd["N"]   # N = A or C or G or T
3
>>> irrd["N"]   # N = A or C or G or T
1
>>> irrd["N"]   # N = A or C or G or T
2
>>> irrd["N"]   # N = A or C or G or T
1

__getitem__()

Return a 2bit representation for key.

Parameters:

key (str, bytes or int) – An IUPAC codepoint.

Returns:

A base satisfying the entered IUPAC codepoint.
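The lookup behaviour described above can be sketched in pure Python. This is a hypothetical analogue, not dinopy's implementation: the class name `RandomReplacementSketch`, the `rng` parameter, and the way keys are normalised are all assumptions. Only the standard IUPAC nucleotide codes and the 2bit mapping (A = 0, C = 1, G = 2, T = 3) are taken as given.

```python
import random

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
TWO_BIT = {"A": 0, "C": 1, "G": 2, "T": 3}


class RandomReplacementSketch(dict):
    """Hypothetical pure-Python analogue; not dinopy's implementation."""

    def __init__(self, rng=None):
        # Store, for every IUPAC code, the 2bit values of the allowed bases.
        super().__init__(
            (code, tuple(TWO_BIT[base] for base in bases))
            for code, bases in IUPAC_BASES.items()
        )
        self._rng = rng or random.Random()

    def __getitem__(self, key):
        # Accept str, bytes, or int (byte value) keys.
        if isinstance(key, bytes):
            key = key.decode("ascii")
        elif isinstance(key, int):
            key = chr(key)
        candidates = super().__getitem__(key)
        # Unambiguous codes have one candidate; ambiguity codes have several.
        return self._rng.choice(candidates)


sketch = RandomReplacementSketch(random.Random(0))
unambiguous = [sketch[c] for c in "ACGT"]          # always [0, 1, 2, 3]
draws_for_r = {sketch["R"] for _ in range(200)}    # only 0 (A) or 2 (G) can occur
```

Passing a seeded `random.Random` makes the draws reproducible, which is useful in tests. Whether the dinopy class offers a comparable hook is not stated in this reference.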

class dinopy.definitions.basenumbers

Bases: object

Basenumbers type for dtype parameters.

  • A → 0

  • C → 1

  • G → 2

  • T → 3

Represents bases as integer numbers (saved as bytes). See Note about dtype for details.
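As an illustration of the mapping above, the following sketch encodes a DNA string as one byte per base. The helper `to_basenumbers_sketch` is hypothetical and is not part of dinopy; for real conversions, use the functions in dinopy.conversion.

```python
# Mapping stated above: A -> 0, C -> 1, G -> 2, T -> 3.
BASENUMBERS = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_basenumbers_sketch(sequence):
    """Hypothetical helper: encode a DNA string as one byte per base."""
    return bytes(BASENUMBERS[base] for base in sequence)


encoded = to_basenumbers_sketch("GATTACA")
# list(encoded) is [2, 0, 3, 3, 0, 1, 0]
```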

class dinopy.definitions.bit_base

Bases: object

class dinopy.definitions.four_bit(args)

Bases: bit_base

Four bit encoding type for encoding parameters. For convenience, four_bit behaves much like int: it can be used either as a type (for example in dinopy.processors.qgrams; see Note about dtype for more information) or as a conversion function (as in four_bit("ACGT") == 0b0001001001001000) [1].

  • A → 0b0001

  • C → 0b0010

  • G → 0b0100

  • T → 0b1000

  • N → 0b1111

four_bit.__new__(cls, args)
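The conversion result quoted above can be reproduced by shifting in four bits per base. The following is a minimal sketch under that assumption; `pack_four_bit_sketch` is a hypothetical helper and not dinopy's implementation.

```python
# Mapping stated above; each base sets exactly one bit.
FOUR_BIT = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000, "N": 0b1111}


def pack_four_bit_sketch(sequence):
    """Hypothetical helper: pack a DNA string into an int, four bits per base."""
    value = 0
    for base in sequence:
        # Make room for the next base, then add its code.
        value = (value << 4) | FOUR_BIT[base]
    return value


packed = pack_four_bit_sketch("ACGT")
# packed == 0b0001001001001000

# N is the bitwise OR of the four single-base codes.
n_from_bases = FOUR_BIT["A"] | FOUR_BIT["C"] | FOUR_BIT["G"] | FOUR_BIT["T"]
```

Since every base occupies its own bit, the code for N equals the union of the four base codes, which is what the last line checks.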

class dinopy.definitions.two_bit(args)

Bases: bit_base

Two bit encoding type for encoding parameters. For convenience, two_bit behaves much like int: it can be used either as a type (for example in dinopy.processors.qgrams; see Note about dtype for more information) or as a conversion function (as in two_bit("ACGT") == 0b00011011) [2].

The bases ‘A’, ‘C’, ‘G’ and ‘T’ are encoded according to the following mapping:

  • A → 0b00

  • C → 0b01

  • G → 0b10

  • T → 0b11

Note that the bitwise complement of each code is the code of the complementary base: A (0b00) and T (0b11) are bit complements of each other, as are C (0b01) and G (0b10).

two_bit.__new__(cls, args)
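The packing and the complement property noted above can be sketched in pure Python. The helpers `pack_two_bit_sketch`, `complement_sketch` and `unpack_two_bit_sketch` are hypothetical and are not dinopy's implementation. The sequence length is passed explicitly because leading A bases are encoded as zero bits and would otherwise be lost.

```python
# Mapping stated above.
TWO_BIT = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"  # index i holds the base whose code is i


def pack_two_bit_sketch(sequence):
    """Hypothetical helper: pack a DNA string into an int, two bits per base."""
    value = 0
    for base in sequence:
        value = (value << 2) | TWO_BIT[base]
    return value


def complement_sketch(packed, length):
    """Complement every base by inverting the lowest 2 * length bits."""
    mask = (1 << (2 * length)) - 1
    return ~packed & mask


def unpack_two_bit_sketch(packed, length):
    """Decode an int back into a DNA string of the given length."""
    return "".join(
        DECODE[(packed >> (2 * (length - 1 - i))) & 0b11] for i in range(length)
    )


packed = pack_two_bit_sketch("ACGT")          # 0b00011011
complemented = complement_sketch(packed, 4)   # 0b11100100
decoded = unpack_two_bit_sketch(complemented, 4)  # "TGCA"
```

Reversing the decoded string would give the reverse complement; the sketch stops at the complement because that is the property the mapping is chosen for.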