dinopy.sambam module
Common declarations for SAM/BAM handling, most prominently AlignmentRecord,
which corresponds to a single line in a standard SAM file.
- class dinopy.sambam.AlignmentRecord
An AlignmentRecord represents a single line in a SAM file and consists of 11 mandatory columns plus an optional 12th:
query_name
flag
rname
pos
mapping_quality
cigar
rnext
pnext
template_length
seq
qual
(optional)
AlignmentRecords are not orderable but can be compared for (in)equality: two AlignmentRecords are equal iff all of their fields are equal. For example:
from dinopy.sambam import AlignmentRecord
ar1 = AlignmentRecord.fromvalues('r004', 0, 'ref', 16, 30, '6M14N5M', '*', 0, 0, 'ATAGCTTCAGC', '*', None)
ar2 = AlignmentRecord.fromvalues('r003', 0, 'ref', 16, 30, '6M14N5M', '*', 0, 0, 'ATAGCTTCAGC', '*', None)
assert ar1 != ar2  # passes: the query names differ
- cigar
- Type:
unicode
- flag
- Type:
Flag
- classmethod fromdict(cls, dict d, bool fill_missing=True)
Create a new AlignmentRecord from a given dictionary. Positions are 0-based.
- Parameters:
d (dict) – Dictionary containing the information necessary to create an AlignmentRecord, i.e.
{'query_name': a, 'flag': b, 'rname': c, 'pos': d, 'mapping_quality': e, 'cigar': f, 'rnext': g, 'pnext': h, 'template_length': i, 'query_sequence': j, 'qual': k, 'optional': l}
where some of the fields may be omitted (see fill_missing).
fill_missing (bool) – If False, missing fields will raise a KeyError. If True, missing fields will be assigned their respective default value. (Default: True).
The following fields, if omitted, will be replaced by their respective default values:
Field             Default value
query_name        '*'
rname             '*'
cigar             '*'
rnext             '*'
qual              '*'
seq               '*'
mapping_quality   255
pos               0
pnext             0
template_length   0
Other validations performed by this method:
length of qual and seq must match
if qual and seq are given and qual is not ‘*’, seq must not be ‘*’
Other operations performed by this method:
The value of flag is both stored in AlignmentRecord.flag.integer_representation and split into its parts (i.e. all of the other fields in AlignmentRecord.flag).
Examples
No optional column:
from dinopy.sambam import AlignmentRecord
ar = AlignmentRecord.fromdict({
    'query_name': 'r001',
    'flag': 99,
    'rname': 'ref',
    'pos': 7,
    'mapping_quality': 30,
    'cigar': '8M2I4M1D3M',
    'rnext': '=',
    'pnext': 37,
    'template_length': 39,
    'query_sequence': 'TTAGATAAAGGATACTG',
    'qual': '*',
    'optional': None,
})
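The default-filling and validation rules listed for fromdict can be sketched in plain Python. This is an illustration only, not dinopy's implementation: the helper name fill_record_fields is hypothetical, it raises ValueError for failed validations (the exception dinopy raises is not stated here), it assumes the field listed as seq in the defaults table is the query_sequence key, and it assumes the length check only applies when qual is not '*'.

```python
# Illustrative sketch only; not dinopy's actual implementation.
DEFAULTS = {
    'query_name': '*',
    'rname': '*',
    'cigar': '*',
    'rnext': '*',
    'qual': '*',
    'query_sequence': '*',  # assumed to be the field listed as "seq"
    'mapping_quality': 255,
    'pos': 0,
    'pnext': 0,
    'template_length': 0,
}


def fill_record_fields(d, fill_missing=True):
    """Return a copy of d with the documented defaults applied (hypothetical helper)."""
    fields = dict(d)
    for key, default in DEFAULTS.items():
        if key not in fields:
            if not fill_missing:
                raise KeyError(key)
            fields[key] = default
    seq, qual = fields['query_sequence'], fields['qual']
    if qual != '*':
        # A given quality string needs a sequence of the same length.
        if seq == '*':
            raise ValueError("qual is given, so seq must not be '*'")
        if len(seq) != len(qual):
            raise ValueError('qual and seq must have the same length')
    return fields


record = fill_record_fields(
    {'query_name': 'r002', 'pos': 9, 'query_sequence': 'AAAAGATAAGGATA'}
)
print(record['mapping_quality'], record['qual'])
```

Copying the input dictionary keeps the caller's data untouched, which makes the helper safe to call repeatedly on the same input.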
- classmethod fromkeywords(cls, unicode qname=u'*', int flag=0, unicode rname=u'*', int pos=0, int mapq=255, unicode cigar=u'*', unicode rnext=u'*', int pnext=0, int tlen=0, unicode seq=u'*', unicode qual=u'*', dict optional=None)
Same as AlignmentRecord.fromvalues but with keyword arguments instead. Positions are 0-based.
- Parameters:
qname (str) – Default "*".
flag (int) – Default 0.
rname (str) – Default "*".
pos (int) – Default 0.
mapq (int) – Default 255.
cigar (str) – Default "*".
rnext (str) – Default "*".
pnext (int) – Default 0.
tlen (int) – Default 0.
seq (str) – Default "*".
qual (str) – Default "*".
optional (dict) – Default None.
Examples
Specifying everything:
from dinopy.sambam import AlignmentRecord
# r001 99 ref 7 30 8M2I4M1D3M = 37 39 TTAGATAAAGGATACTG *
ar = AlignmentRecord.fromkeywords(qname="r001", flag=99, rname="ref", pos=7, mapq=30, cigar="8M2I4M1D3M", rnext="=", pnext=37, tlen=39, seq="TTAGATAAAGGATACTG", qual="*", optional=None)
Only specify arguments that do not have the default values:
from dinopy.sambam import AlignmentRecord
# r002 0 ref 9 30 3S6M1P1I4M * 0 0 AAAAGATAAGGATA *
ar = AlignmentRecord.fromkeywords(qname="r002", rname="ref", pos=9, mapq=30, cigar="3S6M1P1I4M", seq="AAAAGATAAGGATA", optional=None)
- classmethod fromstr(cls, unicode s)
Create a new AlignmentRecord from a given string (as found in SAM files, i.e. 11+ TAB-delimited columns), with 1-based positions. AlignmentRecords are 0-based internally; only SAM string representations use 1-based positions (as per the SAM specification).
- Parameters:
s (str) – A string describing an AlignmentRecord (as found in SAM files, i.e. 11+ TAB-delimited columns)
Examples
No optional column, literal tabs:
from dinopy.sambam import AlignmentRecord
# the columns below are separated by literal TAB characters
ar = AlignmentRecord.fromstr("r001	99	ref	7	30	8M2I4M1D3M	=	37	39	TTAGATAAAGGATACTG	*")
No optional column, escaped tabs:
from dinopy.sambam import AlignmentRecord
ar = AlignmentRecord.fromstr("r004\t0\tref\t16\t30\t6M14N5M\t*\t0\t0\tATAGCTTCAGC\t*")
With optional column, literal tabs:
from dinopy.sambam import AlignmentRecord
# the columns below are separated by literal TAB characters
ar = AlignmentRecord.fromstr("r003	2064	ref	29	17	6H5M	*	0	0	TAGGC	*	SA:Z:ref,9,+,5S6M,30,1;")
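The conversion from a 1-based SAM line to 0-based fields can be sketched without dinopy. The function name parse_sam_line and the returned dictionary layout are hypothetical and only illustrate the idea; in particular, PNEXT is kept as written because this page does not say whether dinopy shifts it, and optional columns are returned unparsed.

```python
# Illustrative sketch only; not dinopy's actual parser.
def parse_sam_line(line):
    """Split a SAM line into named fields, shifting POS to 0-based."""
    columns = line.rstrip('\n').split('\t')
    if len(columns) < 11:
        raise ValueError('expected at least 11 TAB-delimited columns')
    (qname, flag, rname, pos, mapq, cigar,
     rnext, pnext, tlen, seq, qual) = columns[:11]
    return {
        'query_name': qname,
        'flag': int(flag),
        'rname': rname,
        'pos': int(pos) - 1,  # SAM text is 1-based, the record is 0-based
        'mapping_quality': int(mapq),
        'cigar': cigar,
        'rnext': rnext,
        'pnext': int(pnext),  # kept as written (assumption, see lead-in)
        'template_length': int(tlen),
        'query_sequence': seq,
        'qual': qual,
        'optional_raw': columns[11:],  # unparsed TAG:TYPE:VALUE strings
    }


parsed = parse_sam_line("r004\t0\tref\t16\t30\t6M14N5M\t*\t0\t0\tATAGCTTCAGC\t*")
print(parsed['pos'])
```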
- classmethod fromvalues(cls, unicode qname, int flag, unicode rname, int pos, int mapq, unicode cigar, unicode rnext, int pnext, int tlen, unicode seq, unicode qual, dict optional)
Create a new AlignmentRecord using the specified arguments. optional is a dictionary of the form {'XY': Z, } (where 'XY' is a two-character tag and Z its value; the type usually found in SAM files is omitted because it can be inferred) holding any optional columns. For more information on optional columns, see the SAM specification. Also see create_flag for a convenient way to calculate the flag argument. Positions are 0-based.
Examples
No optional column:
from dinopy.sambam import AlignmentRecord
ar = AlignmentRecord.fromvalues('r004', 0, 'ref', 16, 30, '6M14N5M', '*', 0, 0, 'ATAGCTTCAGC', '*', None)
With optional column:
from dinopy.sambam import AlignmentRecord
ar = AlignmentRecord.fromvalues('r003', 0, 'ref', 9, 30, '5S6M', '*', 0, 0, 'GCCTAAGCTAA', '*', {'SA': 'ref,29,-,6H5M,17,0;'})
Important
Does not perform any validation whatsoever.
- get_sam_repr(self) → unicode
Returns the AlignmentRecord's representation as seen in SAM files, i.e. 11 TAB-delimited values if the optional column is None, 12 TAB-delimited values otherwise. Note that positions are stored 0-based internally but displayed as 1-based positions, as per the SAM specification.
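The reverse direction, from 0-based fields to a SAM line, might look like the sketch below. The function name sam_line, the field dictionary, and the type inference for optional tags (int to i, float to f, anything else to Z) are assumptions made for illustration; dinopy's own inference rules are not described here.

```python
# Illustrative sketch only; not dinopy's actual get_sam_repr.
def sam_line(fields, optional=None):
    """Join record fields into a TAB-delimited SAM line with 1-based POS."""
    columns = [
        fields['query_name'],
        str(fields['flag']),
        fields['rname'],
        str(fields['pos'] + 1),  # 0-based internally, 1-based in SAM text
        str(fields['mapping_quality']),
        fields['cigar'],
        fields['rnext'],
        str(fields['pnext']),  # written unchanged (assumption)
        str(fields['template_length']),
        fields['query_sequence'],
        fields['qual'],
    ]
    if optional is not None:
        for tag, value in optional.items():
            # Assumed type inference: int -> i, float -> f, otherwise Z.
            if isinstance(value, int):
                type_code = 'i'
            elif isinstance(value, float):
                type_code = 'f'
            else:
                type_code = 'Z'
            columns.append(f'{tag}:{type_code}:{value}')
    return '\t'.join(columns)


fields = {
    'query_name': 'r004', 'flag': 0, 'rname': 'ref', 'pos': 15,
    'mapping_quality': 30, 'cigar': '6M14N5M', 'rnext': '*', 'pnext': 0,
    'template_length': 0, 'query_sequence': 'ATAGCTTCAGC', 'qual': '*',
}
print(sam_line(fields))
```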
- mapping_quality
- Type:
int
- optional
- Type:
dict
- optional_raw
- pnext
- Type:
int
- pos
- Type:
int
- qual
- Type:
unicode
- query_name
- Type:
unicode
- query_sequence
- Type:
unicode
- rname
- Type:
unicode
- rnext
- Type:
unicode
- template_length
- Type:
int
- dinopy.sambam.cigar_str_from_pysam_cigartuples(list tuples) → unicode
- dinopy.sambam.cigar_str_from_tuples(list tuples) → unicode
- dinopy.sambam.create_flag(bool template_having_multiple_segments_in_sequencing=False, bool each_segment_properly_aligned=False, bool segment_unmapped=False, bool next_segment_in_template_unmapped=False, bool reverse_complemented=False, bool next_segment_reverse_complemented=False, bool first_segment=False, bool last_segment=False, bool secondary_alignment=False, bool not_passing_filters=False, bool pcr_or_optical_duplicate=False, bool supplementary_alignment=False) → int
Calculates the integer representation (“Flag”) of the combination of given boolean attributes.
- Parameters:
template_having_multiple_segments_in_sequencing –
each_segment_properly_aligned –
segment_unmapped –
next_segment_in_template_unmapped –
reverse_complemented –
next_segment_reverse_complemented –
first_segment –
last_segment –
secondary_alignment –
not_passing_filters –
pcr_or_optical_duplicate –
supplementary_alignment –
- Returns:
the integer representation (“Flag”) of the combination of given boolean attributes
Examples
Integer describing the flag when reverse_complemented and first_segment are set:
from dinopy.sambam import create_flag
create_flag(reverse_complemented=True, first_segment=True)  # 80 == 16 + 64 == 0b1000000 | 0b0010000
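The arithmetic behind the flag can be reproduced with the bit values from the SAM specification. The sketch below is a standalone illustration, not dinopy's code; the function name flag_from_attributes is hypothetical, while the attribute names are the create_flag parameters.

```python
# Illustrative sketch only; bit values follow the SAM specification.
FLAG_BITS = {
    'template_having_multiple_segments_in_sequencing': 0x1,
    'each_segment_properly_aligned': 0x2,
    'segment_unmapped': 0x4,
    'next_segment_in_template_unmapped': 0x8,
    'reverse_complemented': 0x10,
    'next_segment_reverse_complemented': 0x20,
    'first_segment': 0x40,
    'last_segment': 0x80,
    'secondary_alignment': 0x100,
    'not_passing_filters': 0x200,
    'pcr_or_optical_duplicate': 0x400,
    'supplementary_alignment': 0x800,
}


def flag_from_attributes(**attributes):
    """OR together the bits of every attribute that is set to True."""
    flag = 0
    for name, is_set in attributes.items():
        if is_set:
            flag |= FLAG_BITS[name]  # unknown names raise KeyError
    return flag


print(flag_from_attributes(reverse_complemented=True, first_segment=True))
```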
- dinopy.sambam.get_flag_description(AlignmentRecord al) → unicode
Returns a human-readable version of an AlignmentRecord's flag attribute.
- Parameters:
al – the AlignmentRecord whose flag is described
- Returns:
a human-readable version of the AlignmentRecord's flag attribute. A flag value of 17 == 0x11 == 0x10 | 0x1 corresponds to template_having_multiple_segments_in_sequencing and reverse_complemented, so the resulting string will equal ‘template_having_multiple_segments_in_sequencing, reverse_complemented’.
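Decoding a flag into attribute names is the inverse bit test. The sketch below works on a plain integer rather than an AlignmentRecord, and the function name describe_flag is hypothetical; it is meant only to show how the value 17 yields the two names in the description above.

```python
# Illustrative sketch only; operates on an integer, not an AlignmentRecord.
# Names are listed in bit order: index 0 is 0x1, index 11 is 0x800.
FLAG_NAMES = (
    'template_having_multiple_segments_in_sequencing',
    'each_segment_properly_aligned',
    'segment_unmapped',
    'next_segment_in_template_unmapped',
    'reverse_complemented',
    'next_segment_reverse_complemented',
    'first_segment',
    'last_segment',
    'secondary_alignment',
    'not_passing_filters',
    'pcr_or_optical_duplicate',
    'supplementary_alignment',
)


def describe_flag(flag):
    """Return the comma-separated names of all bits set in flag."""
    return ', '.join(
        name for bit, name in enumerate(FLAG_NAMES) if flag & (1 << bit)
    )


print(describe_flag(17))
```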
- dinopy.sambam.pysam_cigartuples_from_cigar_str(unicode cigar) → list
- dinopy.sambam.tuples_from_cigar_str(unicode cigar) → list
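The CIGAR helpers are listed without a description, so the following is only a guess at the kind of conversion involved. It uses the pysam convention, in which a CIGAR string is a list of (operation, length) pairs and the operations MIDNSHP=X are numbered 0 to 8. Both function names are hypothetical, the handling of '*' as an empty list is an assumption, and the tuple layout dinopy uses for its non-pysam variants is not covered.

```python
# Illustrative sketch only; follows the pysam cigartuples convention.
import re

CIGAR_OPS = 'MIDNSHP=X'  # pysam operation codes 0..8, in this order
_CIGAR_TOKEN = re.compile(r'(\d+)([MIDNSHP=X])')


def cigar_to_pysam_tuples(cigar):
    """Convert e.g. '8M2I' into [(0, 8), (1, 2)]."""
    if cigar == '*':
        return []  # assumption: an unavailable CIGAR maps to an empty list
    tokens = _CIGAR_TOKEN.findall(cigar)
    # Reject strings with characters that no token accounted for.
    if ''.join(length + op for length, op in tokens) != cigar:
        raise ValueError(f'malformed CIGAR string: {cigar!r}')
    return [(CIGAR_OPS.index(op), int(length)) for length, op in tokens]


def pysam_tuples_to_cigar(tuples):
    """Convert e.g. [(0, 8), (1, 2)] back into '8M2I'."""
    if not tuples:
        return '*'
    return ''.join(f'{length}{CIGAR_OPS[op]}' for op, length in tuples)


print(cigar_to_pysam_tuples('8M2I4M1D3M'))
```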