dinopy.sambam module

Common declarations for SAM/BAM handling, most prominently AlignmentRecord, which corresponds to a single line in a standard SAM file.

class dinopy.sambam.AlignmentRecord

An AlignmentRecord represents a single line in a SAM file and consists of 11 columns, or 12 if the optional column is present:

query_name

flag

rname

pos

mapping_quality

cigar

rnext

pnext

template_length

seq

qual

(optional)

AlignmentRecords are not orderable but can be compared for equality and inequality: two AlignmentRecords are equal iff all of their fields are equal. For example:

ar1 = AlignmentRecord.fromvalues('r004', 0, 'ref', 16, 30, '6M14N5M', '*', 0, 0, 'ATAGCTTCAGC', '*', None)
ar2 = AlignmentRecord.fromvalues('r003', 0, 'ref', 16, 30, '6M14N5M', '*', 0, 0, 'ATAGCTTCAGC', '*', None)
assert ar1 != ar2  # holds: the query names differ ('r004' vs. 'r003')

cigar

Type: unicode

flag

Type: Flag

classmethod fromdict(cls, dict d, bool fill_missing=True)

Create a new AlignmentRecord from a given dictionary. Positions are 0-based.

Parameters:
  • d (dict) – dictionary containing necessary information to create an AlignmentRecord, i.e. {'query_name': a, 'flag': b, 'rname': c, 'pos': d, 'mapping_quality': e, 'cigar': f, 'rnext': g, 'pnext': h, 'template_length': i, 'query_sequence': j, 'qual': k, 'optional': l} where some of the fields may be omitted (see fill_missing).

  • fill_missing (bool) –

    If False, missing fields will raise a KeyError. If True, missing fields will be assigned their respective default value. (Default: True).

    The following fields, if omitted, will be replaced by their respective default values:

    Field              Default value
    query_name         '*'
    rname              '*'
    cigar              '*'
    rnext              '*'
    qual               '*'
    seq                '*'
    mapping_quality    255
    pos                0
    pnext              0
    template_length    0

Other validations performed by this method:

  • length of qual and seq must match

  • if qual and seq are given and qual is not '*', seq must not be '*'

Other operations performed by this method:

  • The value of flag is stored in AlignmentRecord.flag.integer_representation and is also split into its parts (i.e. all of the other fields in AlignmentRecord.flag).

Examples

  1. No optional column:

    from dinopy.sambam import AlignmentRecord
    ar = AlignmentRecord.fromdict({'query_name': 'r001', 'flag': 99, 'rname': 'ref', 'pos': 7, 'mapping_quality': 30, 'cigar': '8M2I4M1D3M', 'rnext': '=', 'pnext': 37, 'template_length': 39, 'query_sequence': 'TTAGATAAAGGATACTG', 'qual': '*', 'optional': None})
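
The default filling and the qual/seq checks described above can be sketched in plain Python. This is only an illustration of the documented behaviour, not dinopy's implementation: the helper name fill_record_fields is hypothetical, the key 'seq' follows the defaults table, and treating a length mismatch as an error only when qual is not '*' is an assumption.

```python
# Illustrative only: mirrors the documented defaults, not dinopy's actual code.
DEFAULTS = {
    "query_name": "*",
    "rname": "*",
    "cigar": "*",
    "rnext": "*",
    "qual": "*",
    "seq": "*",
    "mapping_quality": 255,
    "pos": 0,
    "pnext": 0,
    "template_length": 0,
}


def fill_record_fields(d, fill_missing=True):
    """Return a copy of d with the documented defaults applied (hypothetical helper)."""
    record = dict(d)
    for field, default in DEFAULTS.items():
        if field not in record:
            if not fill_missing:
                # Documented behaviour: a missing field raises a KeyError.
                raise KeyError(field)
            record[field] = default
    seq, qual = record["seq"], record["qual"]
    if qual != "*":
        # Assumed reading of the two documented validations.
        if seq == "*":
            raise ValueError("qual is given but seq is '*'")
        if len(seq) != len(qual):
            raise ValueError("qual and seq differ in length")
    return record


filled = fill_record_fields({"query_name": "r001", "flag": 99, "pos": 7})
print(filled["mapping_quality"], filled["cigar"])
```

With fill_missing=False the same call raises a KeyError for the first field that is absent from the dictionary.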
    
classmethod fromkeywords(cls, unicode qname=u'*', int flag=0, unicode rname=u'*', int pos=0, int mapq=255, unicode cigar=u'*', unicode rnext=u'*', int pnext=0, int tlen=0, unicode seq=u'*', unicode qual=u'*', dict optional=None)

Same as AlignmentRecord.fromvalues, but every argument is a keyword argument with a default value. Positions are 0-based.

Parameters:
  • qname (str) – Default "*".

  • flag (int) – Default 0.

  • rname (str) – Default "*".

  • pos (int) – Default 0.

  • mapq (int) – Default 255.

  • cigar (str) – Default "*".

  • rnext (str) – Default "*".

  • pnext (int) – Default 0.

  • tlen (int) – Default 0.

  • seq (str) – Default "*".

  • qual (str) – Default "*".

  • optional (dict) – Default None.

Examples

  1. Specifying everything:

    from dinopy.sambam import AlignmentRecord
    # r001  99      ref     7       30      8M2I4M1D3M      =       37      39      TTAGATAAAGGATACTG       *
    ar = AlignmentRecord.fromkeywords(qname="r001", flag=99, rname="ref", pos=7, mapq=30, cigar="8M2I4M1D3M", rnext="=", pnext=37, tlen=39, seq="TTAGATAAAGGATACTG", qual="*", optional=None)
    
  2. Only specify arguments that do not have the default values:

    from dinopy.sambam import AlignmentRecord
    # r002  0       ref     9       30      3S6M1P1I4M      *       0       0       AAAAGATAAGGATA  *
    ar = AlignmentRecord.fromkeywords(qname="r002", rname="ref", pos=9, mapq=30, cigar="3S6M1P1I4M", seq="AAAAGATAAGGATA")
    
classmethod fromstr(cls, unicode s)

Create a new AlignmentRecord from a given string (as found in SAM files, i.e. 11+ tab delimited columns), with 1-based positions. AlignmentRecords are 0-based internally, only SAM string representations use 1-based positions (as per SAM specifications).

Parameters:

s (str) – A string describing an AlignmentRecord (as found in SAM files, i.e. 11+ tab delimited columns)

Examples

  1. No optional column, literal tabs:

    from dinopy.sambam import AlignmentRecord
    ar = AlignmentRecord.fromstr("r001      99      ref     7       30      8M2I4M1D3M      =       37      39      TTAGATAAAGGATACTG       *")
    
  2. No optional column, escaped tabs:

    from dinopy.sambam import AlignmentRecord
    ar = AlignmentRecord.fromstr("r004\t0\tref\t16\t30\t6M14N5M\t*\t0\t0\tATAGCTTCAGC\t*")
    
  3. With optional column, literal tabs:

    from dinopy.sambam import AlignmentRecord
    ar = AlignmentRecord.fromstr("r003      2064    ref     29      17      6H5M    *       0       0       TAGGC   *       SA:Z:ref,9,+,5S6M,30,1;")
    
classmethod fromvalues(cls, unicode qname, int flag, unicode rname, int pos, int mapq, unicode cigar, unicode rnext, int pnext, int tlen, unicode seq, unicode qual, dict optional)

Create a new AlignmentRecord using the specified arguments. Positions are 0-based. optional is a dictionary of the form {'XY' : Z, }, where 'XY' is a two-character tag and Z its value; the type usually found in SAM files is omitted because it can be inferred. The dictionary holds any optional columns; for more information on these, see the SAM specification. Also see create_flag for a convenient way to calculate the flag argument.

Examples

  1. No optional column:

    from dinopy.sambam import AlignmentRecord
    ar = AlignmentRecord.fromvalues('r004', 0, 'ref', 16, 30, '6M14N5M', '*', 0, 0, 'ATAGCTTCAGC', '*', None)
    
  2. With optional column:

    from dinopy.sambam import AlignmentRecord
    ar = AlignmentRecord.fromvalues('r003', 0, 'ref', 9, 30, '5S6M', '*', 0, 0, 'GCCTAAGCTAA', '*', {'SA': 'ref,29,-,6H5M,17,0;'})
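
To illustrate why the type can be left out of the optional dictionary, the following sketch infers a SAM type letter from the Python value and renders TAG:TYPE:VALUE fields. The helpers infer_sam_type and format_optional are hypothetical, and the mapping (int to i, float to f, str to Z) is an assumption; the rules dinopy actually applies are not documented here.

```python
def infer_sam_type(value):
    """Guess a SAM type letter from a Python value (assumed mapping)."""
    if isinstance(value, int):
        return "i"
    if isinstance(value, float):
        return "f"
    if isinstance(value, str):
        return "Z"
    raise TypeError("no SAM type known for {!r}".format(value))


def format_optional(optional):
    """Render an optional dict as tab-delimited TAG:TYPE:VALUE fields."""
    return "\t".join(
        "{}:{}:{}".format(tag, infer_sam_type(value), value)
        for tag, value in optional.items()
    )


print(format_optional({"SA": "ref,29,-,6H5M,17,0;", "NM": 1}))
```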
    

Important

Does not perform any validation whatsoever.

get_sam_repr(self) → unicode

Returns the AlignmentRecord's representation as seen in SAM files, i.e. 11 TAB-delimited values if optional is None, 12 TAB-delimited values otherwise. Note that positions are stored 0-based internally but written as 1-based positions, as per the SAM specification.
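
The 0-based/1-based convention can be shown with a small round trip that does not depend on dinopy. The helpers parse_sam_line and format_sam_line are hypothetical; shifting pnext in the same way as pos is an assumption, and the handling of the SAM value 0 (position unavailable) is deliberately left out.

```python
INT_COLUMNS = (1, 3, 4, 7, 8)   # flag, pos, mapq, pnext, tlen
POSITION_COLUMNS = (3, 7)       # pos and pnext; shifting pnext is an assumption


def parse_sam_line(line):
    """Split a SAM line into its 11 mandatory values with 0-based positions."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 11:
        raise ValueError("expected at least 11 tab-delimited columns")
    values = [int(c) if i in INT_COLUMNS else c
              for i, c in enumerate(columns[:11])]
    for i in POSITION_COLUMNS:
        values[i] -= 1          # SAM text is 1-based
    return values


def format_sam_line(values):
    """Join 11 values into a SAM line, shifting positions back to 1-based."""
    shifted = [v + 1 if i in POSITION_COLUMNS else v
               for i, v in enumerate(values)]
    return "\t".join(str(v) for v in shifted)


line = "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
values = parse_sam_line(line)
print(values[3], values[7])              # 0-based pos and pnext
print(format_sam_line(values) == line)   # the round trip restores the line
```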

mapping_quality

Type: int

optional

Type: dict

optional_raw

pnext

Type: int

pos

Type: int

qual

Type: unicode

query_name

Type: unicode

query_sequence

Type: unicode

rname

Type: unicode

rnext

Type: unicode

template_length

Type: int

dinopy.sambam.cigar_str_from_pysam_cigartuples(list tuples) → unicode
dinopy.sambam.cigar_str_from_tuples(list tuples) → unicode
dinopy.sambam.create_flag(bool template_having_multiple_segments_in_sequencing=False, bool each_segment_properly_aligned=False, bool segment_unmapped=False, bool next_segment_in_template_unmapped=False, bool reverse_complemented=False, bool next_segment_reverse_complemented=False, bool first_segment=False, bool last_segment=False, bool secondary_alignment=False, bool not_passing_filters=False, bool pcr_or_optical_duplicate=False, bool supplementary_alignment=False) → int

Calculates the integer representation (“Flag”) of the combination of the given boolean attributes.

Parameters:
  • template_having_multiple_segments_in_sequencing

  • each_segment_properly_aligned

  • segment_unmapped

  • next_segment_in_template_unmapped

  • reverse_complemented

  • next_segment_reverse_complemented

  • first_segment

  • last_segment

  • secondary_alignment

  • not_passing_filters

  • pcr_or_optical_duplicate

  • supplementary_alignment

Returns:

the integer representation (“Flag”) of the combination of given boolean attributes

Examples

  1. Integer describing the flag when reverse_complemented and first_segment are set:

    from dinopy.sambam import create_flag
    create_flag(reverse_complemented=True, first_segment=True)  # 80 == 16 + 64 == 0b1000000 | 0b0010000
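
The arithmetic in the comment above follows from the SAM flag bits, each of which is a distinct power of two. A minimal sketch of that combination, independent of dinopy (combine_flag is a hypothetical stand-in, not create_flag itself; the bit values are those of the SAM specification):

```python
# Bit values as defined by the SAM specification.
FLAG_BITS = {
    "template_having_multiple_segments_in_sequencing": 0x1,
    "each_segment_properly_aligned": 0x2,
    "segment_unmapped": 0x4,
    "next_segment_in_template_unmapped": 0x8,
    "reverse_complemented": 0x10,
    "next_segment_reverse_complemented": 0x20,
    "first_segment": 0x40,
    "last_segment": 0x80,
    "secondary_alignment": 0x100,
    "not_passing_filters": 0x200,
    "pcr_or_optical_duplicate": 0x400,
    "supplementary_alignment": 0x800,
}


def combine_flag(**attributes):
    """OR together the bits of all attributes that are set to True."""
    flag = 0
    for name, is_set in attributes.items():
        if is_set:
            flag |= FLAG_BITS[name]
    return flag


print(combine_flag(reverse_complemented=True, first_segment=True))  # 80
```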
    
dinopy.sambam.get_flag_description(AlignmentRecord al) → unicode

Returns a human-readable version of an AlignmentRecord's flag attribute.

Parameters:

al (AlignmentRecord) – the record whose flag is described

Returns:

a human-readable version of the AlignmentRecord flag attribute. A flag value of 17 == 0x11 == 0x10 | 0x1 corresponds to template_having_multiple_segments_in_sequencing and reverse_complemented, so the resulting string will equal ‘template_having_multiple_segments_in_sequencing, reverse_complemented’.
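
The decoding in the other direction can be sketched as follows. This is an illustration that works on a plain integer rather than on an AlignmentRecord; describe_flag is a hypothetical helper, and it relies on the attribute names of create_flag being listed in bit order (0x1 first, 0x800 last), as in the SAM specification.

```python
# Attribute names in bit order: index i corresponds to bit value 1 << i.
FLAG_NAMES = [
    "template_having_multiple_segments_in_sequencing",
    "each_segment_properly_aligned",
    "segment_unmapped",
    "next_segment_in_template_unmapped",
    "reverse_complemented",
    "next_segment_reverse_complemented",
    "first_segment",
    "last_segment",
    "secondary_alignment",
    "not_passing_filters",
    "pcr_or_optical_duplicate",
    "supplementary_alignment",
]


def describe_flag(flag):
    """Return the names of all set bits, joined by ', '."""
    return ", ".join(
        name for i, name in enumerate(FLAG_NAMES) if (flag >> i) & 1
    )


print(describe_flag(17))
```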

dinopy.sambam.pysam_cigartuples_from_cigar_str(unicode cigar) → list
dinopy.sambam.tuples_from_cigar_str(unicode cigar) → list
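
The four CIGAR conversion functions are listed without a description. As a rough idea of what such conversions involve, the sketch below splits a CIGAR string into (length, operation) pairs and converts them to pysam-style (operation code, length) tuples. All helper names are hypothetical, and the tuple layout returned by dinopy's own functions is not documented here, so (length, operation) is an assumption; the numeric operation codes are those used by pysam and the BAM format.

```python
import re

CIGAR_OPERATIONS = "MIDNSHP=X"   # index == numeric operation code in BAM/pysam
CIGAR_PATTERN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Return (length, operation) pairs, e.g. '6H5M' -> [(6, 'H'), (5, 'M')]."""
    if cigar == "*":
        return []
    pairs = [(int(n), op) for n, op in CIGAR_PATTERN.findall(cigar)]
    # Rebuilding the string catches characters the pattern skipped.
    if cigar_from_pairs(pairs) != cigar:
        raise ValueError("malformed CIGAR string: {!r}".format(cigar))
    return pairs


def cigar_from_pairs(pairs):
    """Inverse of parse_cigar for non-empty pair lists."""
    return "".join("{}{}".format(n, op) for n, op in pairs)


def to_pysam_style(pairs):
    """Convert (length, operation) pairs to (operation code, length) tuples."""
    return [(CIGAR_OPERATIONS.index(op), n) for n, op in pairs]


pairs = parse_cigar("8M2I4M1D3M")
print(pairs)
print(to_pysam_style(pairs))
```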