dinopy.fai_io module

Small module to read and write .fa.fai files.

A .fa.fai file contains one line for each chromosome in the corresponding FASTA file. Each line consists of five tab-separated columns:

  1. name of the chromosome

  2. length of the chromosome in bytes

  3. starting position of the chromosome in the FASTA file (in bytes)

  4. length of a line in the FASTA file (in characters)

  5. length of a line in the FASTA file (in bytes); this includes the trailing \n
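The column layout above can be illustrated with a small parser in plain Python. This is a minimal sketch, not dinopy code: `FaiRecord` and `parse_fai_line` are hypothetical names introduced here, and the sample line is made up for illustration.

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    """Hypothetical container for one .fa.fai line (not part of dinopy)."""
    name: str
    length: int
    start: int
    line_length: int
    line_length_bytes: int


def parse_fai_line(line: str) -> FaiRecord:
    """Split one tab-separated index line into its five typed columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    name, *numbers = fields
    # Columns 2 to 5 are integers; column 1 stays a string.
    return FaiRecord(name, *(int(n) for n in numbers))


# Made-up example line: 60 characters per line plus one newline byte.
record = parse_fai_line("chr1\t248956422\t6\t60\t61\n")
```

With `\n` line endings and one byte per character, column 5 is column 4 plus one, which is why the sample line uses 60 and 61.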

dinopy.fai_io.chromosome_info_to_fai(chr_info, line_length=80)[source]

Convert dinopy chromosome info to fai-lines.

Parameters:
  • chr_info (list) – Containing chromosome info entries in the format: chr_name, chr_length, (chr_start, chr_stop)

  • line_length (int) – Line length in the FASTA file.

Returns:

List containing a valid fai-entry for each chromosome.

Return type:

list

dinopy.fai_io.fai_entry_to_chromosome_info_entry(fai_entry)[source]

Convert a fai-entry to dinopy chromosome info format.

Converts from [chr_name, chr_len, chr_start, line_length, line_length_bytes] (file-view) to [chr_name, chr_len, (chr_start, chr_stop)] entries (genome-array-view, i.e. without names, newlines, and ‘>’).

Parameters:

fai_entry (list) – Containing a valid fai entry (chr_name, chr_len, chr_start, line_length, line_length_bytes)

Returns:

A valid chromosome info entry.

Return type:

list

dinopy.fai_io.fai_to_chromosome_info(fai_entries)[source]

Convert the given fai-entries to dinopy chromosome info format.

Converts from a list of [chr_name, chr_len, chr_start, line_length, line_length_bytes] (file-view) to a list of [chr_name, chr_len, (chr_start, chr_stop)] entries (genome-array-view, i.e. without names, newlines, and ‘>’).

Parameters:

fai_entries (Iterable) – An iterable of valid fai-entries.

Returns:

A list containing valid chromosome info entries.

Return type:

list
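The difference between the two views can be sketched in plain Python. This is only an illustration under stated assumptions, not dinopy's implementation: `sketch_to_chromosome_info` is a hypothetical name, and the sketch assumes that the genome array concatenates chromosomes without separators and that (chr_start, chr_stop) is a half-open interval.

```python
def sketch_to_chromosome_info(fai_entries):
    """Map file-view fai entries to [name, length, (start, stop)] entries.

    Assumptions (not confirmed by the dinopy docs): chromosomes follow each
    other directly in the genome array, and intervals are half-open.
    """
    info = []
    position = 0  # start of the current chromosome in the genome array
    for name, length, _start, _line_length, _line_bytes in fai_entries:
        length = int(length)  # entries read from a file may hold strings
        info.append([name, length, (position, position + length)])
        position += length
    return info


# Made-up entries: file offsets 6 and 25 are dropped in the genome-array view.
converted = sketch_to_chromosome_info(
    [("chr1", "10", "6", "4", "5"), ("chr2", "5", "25", "4", "5")]
)
```

Under these assumptions the file offsets and line lengths play no role in the result, since header lines and newlines are not part of the genome array.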

dinopy.fai_io.is_valid_fai(fai_entries)[source]

Check if the given list of potential fai entries is valid.

Parameters:

fai_entries (list) – List of fai entries that will be validated.

Returns:

True if all entries in the list are valid, False otherwise.

Return type:

bool

Note

A valid fai entry has the following structure:

  1. name of the chromosome

  2. length of the chromosome in bytes

  3. starting position of the chromosome in the FASTA file (in bytes)

  4. length of a line in the FASTA file (in characters)

  5. length of a line in the FASTA file (in bytes); this includes the trailing \n

dinopy.fai_io.is_valid_fai_entry(fai_entry)[source]

Check if the given fai entry is valid.

Parameters:

fai_entry (collection) – Collection that will be checked for fulfillment of all prerequisites of a fai-entry.

Returns:

True if the entry is valid, False if not.

Return type:

bool

Note

A valid fai entry has the following structure:

  1. name of the chromosome

  2. length of the chromosome in bytes

  3. starting position of the chromosome in the FASTA file (in bytes)

  4. length of a line in the FASTA file (in characters)

  5. length of a line in the FASTA file (in bytes); this includes the trailing \n
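A structural check along the lines of the note above might look like the following. This is a hedged sketch, not the rules dinopy actually applies: `looks_like_fai_entry` is a hypothetical name, and the requirement that the byte length exceeds the character length assumes single-byte characters plus at least one newline byte.

```python
def looks_like_fai_entry(entry):
    """Return True if entry has five columns with plausible values."""
    if len(entry) != 5:
        return False
    name, *numbers = entry
    if not isinstance(name, str) or not name:
        return False
    try:
        # Accept ints as well as numeric strings, e.g. as read from a file.
        length, start, line_length, line_bytes = (int(n) for n in numbers)
    except (TypeError, ValueError):
        return False
    # Assumption: bytes per line = characters per line + newline byte(s).
    return length >= 0 and start >= 0 and 0 < line_length < line_bytes


ok = looks_like_fai_entry(("chr1", "100", "6", "60", "61"))
too_short = looks_like_fai_entry(("chr1", 100, 6, 60))
```

A list-level check can then be written as `all(looks_like_fai_entry(e) for e in entries)`.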

dinopy.fai_io.read_fai(path)[source]

Read and parse a .fa.fai (FASTA annotation index) file.

Parameters:

path (str) – Path to a .fa.fai file

Returns:

A list of tuples, one per fai entry, each containing (name, length in bytes, startpos (bytes), line_length, line_length_bytes) as strings.

Return type:

list

dinopy.fai_io.write_chromosomes_as_fai(path, chromosomes, line_length)[source]

Write a fai file from a given chromosome list.

Parameters:
  • path (str) – Path where the fai-file will be written to.

  • chromosomes (list) – Each item should contain information about one chromosome in dinopy format (name, length, (start, stop)).

  • line_length (int) – length of the lines (in characters) in the FASTA file.

  • line_bytes (int) – length of the lines (in bytes) in the FASTA file.

dinopy.fai_io.write_fai(target, fai_entries)[source]

Write the specified fai entries to the given target.

Parameters:
  • target (str) – Target where the fai-file will be written to.

  • fai_entries (list) – Each item should contain all needed tokens for a valid fai-line. (name, length, start index, line length (in characters), line length (in bytes)).
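Serializing such entries amounts to joining the five tokens of each entry with tabs, one entry per line. The sketch below shows this with an in-memory buffer. It is not dinopy's implementation: `sketch_write_fai` is a hypothetical helper, and it writes to an open text handle rather than to a path.

```python
import io


def sketch_write_fai(handle, fai_entries):
    """Write one tab-separated line per entry to an open text handle."""
    for entry in fai_entries:
        handle.write("\t".join(str(token) for token in entry) + "\n")


# Made-up entries, written to a string buffer instead of a real file.
buffer = io.StringIO()
sketch_write_fai(buffer, [("chr1", 10, 6, 4, 5), ("chr2", 5, 25, 4, 5)])
text = buffer.getvalue()
```

Converting every token with `str` lets the same helper accept integer entries as well as entries that were read back from a file as strings.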