dinopy.fai_io module
Small module to read and write .fa.fai files.
A .fa.fai file contains one line for each chromosome in the corresponding FASTA file. Each line consists of five tab-separated columns:
name of the chromosome
length of the chromosome in bytes
starting position of the chromosome in the FASTA file (in bytes)
length of a line in the FASTA file (in characters)
length of a line in the FASTA file (in bytes), including the trailing \n
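To make the column layout concrete, the following minimal sketch parses one index line by hand and uses the five fields to locate a base in the FASTA file. It does not use dinopy: `FaiRecord`, `parse_fai_line`, and `byte_offset` are hypothetical names introduced for illustration, and the sample line is made up.

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    """Hypothetical container for the five columns of one .fai line."""
    name: str
    length: int        # chromosome length
    start: int         # byte offset of the first base in the FASTA file
    line_length: int   # characters per line
    line_bytes: int    # bytes per line, including the trailing newline


def parse_fai_line(line: str) -> FaiRecord:
    """Split one tab-separated .fai line into typed fields."""
    name, length, start, line_length, line_bytes = line.rstrip("\n").split("\t")
    return FaiRecord(name, int(length), int(start), int(line_length), int(line_bytes))


def byte_offset(rec: FaiRecord, pos: int) -> int:
    """Byte offset in the FASTA file of the 0-based base `pos` of `rec`."""
    if not 0 <= pos < rec.length:
        raise IndexError("position outside chromosome")
    # Every complete line before `pos` costs line_bytes (bases plus newline).
    full_lines, rest = divmod(pos, rec.line_length)
    return rec.start + full_lines * rec.line_bytes + rest


record = parse_fai_line("chr1\t200\t6\t80\t81\n")
print(record)
print(byte_offset(record, 80))  # first base of the second sequence line
```

The difference between the last two columns is what allows random access: skipping a full line costs `line_bytes`, while a position within a line is counted in characters.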
- dinopy.fai_io.chromosome_info_to_fai(chr_info, line_length=80)
Convert dinopy chromosome info to fai-lines.
- Parameters:
chr_info (list) – Chromosome info entries, each in the format:
chr_name, chr_length, (chr_start, chr_stop)
line_length (int) – Line length in the FASTA file.
- Returns:
List containing a valid fai-entry for each chromosome.
- Return type:
list
- dinopy.fai_io.fai_entry_to_chromosome_info_entry(fai_entry)
Convert a fai-entry to dinopy chromosome info format.
Converts from
[chr_name, chr_len, chr_start, line_length, line_length_bytes]
(file-view) to
[chr_name, chr_len, (chr_start, chr_stop)]
entries (genome-array-view, i.e. without names, newlines, and ‘>’).
- Parameters:
fai_entry (list) – A valid fai entry
(chr_name, chr_len, chr_start, line_length, line_length_bytes)
- Returns:
A valid chromosome info entry.
- Return type:
list
- dinopy.fai_io.fai_to_chromosome_info(fai_entries)
Convert the given fai-entries to dinopy chromosome info format.
Converts from a list of
[chr_name, chr_len, chr_start, line_length, line_length_bytes]
(file-view) to a list of
[chr_name, chr_len, (chr_start, chr_stop)]
entries (genome-array-view, i.e. without names, newlines, and ‘>’).
- Parameters:
fai_entries (Iterable) – An iterable of valid fai-entries.
- Returns:
A list containing valid chromosome info entries.
- Return type:
list
- dinopy.fai_io.is_valid_fai(fai_entries)
Check if the given list of potential fai entries is valid.
- Parameters:
fai_entries (list) – List of fai entries that will be validated.
- Returns:
True if all entries in the list are valid, False otherwise.
- Return type:
bool
Note
A valid fai entry has the following structure:
name of the chromosome
length of the chromosome in bytes
starting position of the chromosome in the FASTA file (in bytes)
length of a line in the FASTA file (in characters)
length of a line in the FASTA file (in bytes), including the trailing \n
- dinopy.fai_io.is_valid_fai_entry(fai_entry)
Check if the given fai entry is valid.
- Parameters:
fai_entry (collection) – Collection that will be checked for fulfillment of all prerequisites of a fai-entry.
- Returns:
True if the entry is valid, False if not.
- Return type:
bool
Note
A valid fai entry has the following structure:
name of the chromosome
length of the chromosome in bytes
starting position of the chromosome in the FASTA file (in bytes)
length of a line in the FASTA file (in characters)
length of a line in the FASTA file (in bytes), including the trailing \n
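The structure described in the note can be checked mechanically. The sketch below is an illustration only and is not dinopy's implementation: `looks_like_fai_entry` is a hypothetical helper, and the specific rules it enforces (non-empty name, non-negative integers, byte length not smaller than character length) are assumptions derived from the column descriptions, so `is_valid_fai_entry` may accept or reject different inputs.

```python
def looks_like_fai_entry(entry) -> bool:
    """Hypothetical check: five fields, a non-empty name, four consistent integers."""
    try:
        name, length, start, line_length, line_bytes = entry
        length, start = int(length), int(start)
        line_length, line_bytes = int(line_length), int(line_bytes)
    except (TypeError, ValueError):
        # Wrong number of fields, or a numeric column that is not an integer.
        return False
    if not isinstance(name, str) or not name:
        return False
    # Offsets and lengths cannot be negative; a line holds at least one character.
    if length < 0 or start < 0 or line_length <= 0:
        return False
    # The byte count includes the trailing newline, so it is never smaller
    # than the character count.
    return line_bytes >= line_length


candidates = [
    ("chr1", "200", "6", "80", "81"),   # all columns as strings
    ("chr1", "200", "6", "81", "80"),   # byte length smaller than character length
    ("chr1", "200", "6", "80"),         # only four columns
]
print([looks_like_fai_entry(c) for c in candidates])
```

A whole list of entries could then be checked with `all(looks_like_fai_entry(e) for e in entries)`.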
- dinopy.fai_io.read_fai(path)
Read and parse a .fa.fai (FASTA annotation index) file.
- Parameters:
path (str) – Path to a .fa.fai file
- Returns:
A list of all fai entries as tuples, each containing (name, length in bytes, startpos (bytes), line_length, line_length_bytes) as strings.
- Return type:
list
- dinopy.fai_io.write_chromosomes_as_fai(path, chromosomes, line_length)
Write a fai file from a given chromosome list.
- Parameters:
path (str) – Path where the fai-file will be written to.
chromosomes (list) – Each item should contain information about one chromosome in dinopy format (name, length, (start, stop)).
line_length (int) – Length of the lines (in characters) in the FASTA file.
line_bytes (int) – Length of the lines (in bytes) in the FASTA file.
- dinopy.fai_io.write_fai(target, fai_entries)
Write the specified fai entries to the given target.
- Parameters:
target (str) – Target where the fai-file will be written to.
fai_entries (list) – Each item should contain all needed tokens for a valid fai-line. (name, length, start index, line length (in characters), line length (in bytes)).
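As an illustration of the on-disk layout these entries map to, the following sketch serializes a list of entries into tab-separated lines. It is written without dinopy and is not the library's implementation: `format_fai_lines` is a hypothetical helper, the entries are made-up values, and an in-memory buffer stands in for the target file. The exact output of `write_fai` may differ.

```python
import io


def format_fai_lines(fai_entries):
    """Join each five-field entry with tabs, one line per chromosome."""
    return ["\t".join(str(token) for token in entry) + "\n" for entry in fai_entries]


# Made-up entries: (name, length, start index, line length, line length in bytes)
entries = [("chr1", 200, 6, 80, 81), ("chr2", 120, 215, 80, 81)]

buffer = io.StringIO()  # stands in for an opened .fa.fai file
buffer.writelines(format_fai_lines(entries))
print(buffer.getvalue(), end="")
```

Converting every token with `str` means the same helper works whether the entries hold integers or the string tokens described for `read_fai`.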