dinopy.fasta_writer module
- class dinopy.fasta_writer.FastaWriter(target, write_fai=False, force_overwrite=False, append=False)
FastaWriter for writing genomes (or reads) to disk in FASTA format.
- Parameters:
target (str, bytes, file or sys.stdout) – Path where the file will be written to. If the path ends with the suffix .gz, a gzipped file will be created.
force_overwrite (bool) – If set to True, overwrites existing FASTA files with the same name. (Default: False)
append (bool) – If set to True, existing files will not be overwritten. Reads will be appended to the end of the file. (Default: False)
write_fai (bool, bytes or str) – If write_fai denotes a path, write the FASTA annotation information file to the specified path. If write_fai is True (and does not resemble a path, i.e. is not an instance of str or bytes), annotation information will be written to filepath + '.fai', which is the default behaviour. Note that force_overwrite and append apply to fai files as well. (Default: False)
line_width (int) – The maximum number of characters per line, excluding newlines. (Default: 80)
- Raises:
ValueError – If the filename is invalid.
ValueError – If contradicting parameters are passed (force_overwrite=True and append=True).
TypeError – If target is neither a file, nor a path nor stdout.
IOError – If target is a file opened in the wrong mode.
FileExistsError – If the target file (FASTA or fai) already exists and neither force_overwrite nor append has been specified.
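The parameter rules above can be sketched in plain Python. The helper `resolve_fai_path` below is hypothetical and not part of dinopy; it only mirrors the documented behaviour of write_fai, force_overwrite and append, and it assumes that target is a path given as str.

```python
def resolve_fai_path(target, write_fai=False, force_overwrite=False, append=False):
    """Hypothetical helper (not part of dinopy) mirroring the documented rules.

    Returns the path the fai file would be written to, or None if no
    fai file is requested. Assumes target is a str path.
    """
    # Contradicting flags are documented to raise a ValueError.
    if force_overwrite and append:
        raise ValueError("force_overwrite and append are mutually exclusive")
    # A str or bytes value is treated as an explicit fai path.
    if isinstance(write_fai, (str, bytes)):
        return write_fai
    # True means: write the fai file next to the FASTA file.
    if write_fai is True:
        return target + ".fai"
    # False (the default): no fai file is written.
    return None


print(resolve_fai_path("genome.fasta", write_fai=True))
print(resolve_fai_path("genome.fasta", write_fai="index/genome.fai"))
print(resolve_fai_path("genome.fasta"))
```

The actual constructor performs further checks (file mode, existing files) that this sketch leaves out.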
Methods intended for public use are:
write_genome(): Write a whole genome to file.
write_entries(): Write a list of tuples containing entry sequences and entry names. The tuples have to be in the format (entry_sequence, entry_name). An entry can be a chromosome or a read.
write_entry(): Write a single entry to the opened file.
write_chromosomes(): Write a list of chromosomes to file. Each chromosome must be a tuple of the form (chromosome_sequence, chromosome_name).
write_chromosome(): Write a single chromosome to file.
Examples
Write a genome of three chromosomes from a single sequence:
import dinopy

seq = b"ACGTAACCGGTTAAACCCGGGTTT"
# Each tuple is (name, length, (start, stop)).
chr_info = [
    (b"single", 4, (0, 4)),
    (b"double", 8, (4, 12)),
    (b"triple", 12, (12, 24)),
]
with dinopy.FastaWriter('somefile.fasta') as faw:
    faw.write_genome(seq, chr_info)
Write a genome of three chromosomes from separate chromosomes:
import dinopy

# Each tuple is (sequence, name); sequences are str here, names are bytes.
chromosomes = [
    ('ACGTACGT', b'chr1'),
    ('GCGTAGGATGGGCCTATCGA', b'chr2'),
    ('CCATAGGATAGACCANNACAGATCAN', b'chr3'),
]
with dinopy.FastaWriter('somefile.fasta') as faw:
    faw.write_chromosomes(chromosomes, dtype=str)
- close(self)
Close the file (after writing).
Note
This should only be used if the exact number of files is not known at development time. Otherwise, use of the with environment is encouraged, as it makes it much harder to ‘forget’ closing an opened file.
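A case where the number of output files is only known at run time is splitting entries into one file per group. The following is a minimal sketch of that pattern, not a dinopy function: `demultiplex`, `key_of` and `writer_factory` are names made up for this example, and by default the writer is dinopy.FastaWriter, called with an output path as in the examples above.

```python
def demultiplex(entries, key_of, writer_factory=None):
    """Write each (seq, name) entry to a FASTA file chosen by key_of(name).

    writer_factory is a hypothetical hook for this sketch; by default it is
    dinopy.FastaWriter, called with the output path.
    """
    if writer_factory is None:
        import dinopy  # imported lazily so the sketch loads without dinopy
        writer_factory = dinopy.FastaWriter
    writers = {}
    try:
        for seq, name in entries:
            key = key_of(name)
            if key not in writers:
                # A new group appeared, so another output file is opened.
                writers[key] = writer_factory(key + ".fasta")
            writers[key].write_entry((seq, name), dtype=bytes)
    finally:
        # Close every writer that was opened, even if writing failed.
        for writer in writers.values():
            writer.close()
    return sorted(writers)
```

Because the writers are created while iterating, a single with block cannot cover them; the try/finally plays the role of the with environment and guarantees that close() is called on each one.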
- write_chromosome(self, chromosome, type dtype=bytes)
Write a single chromosome to the opened FASTA file.
Note: Alias for write_entry().
- Parameters:
chromosome (tuple) – Containing chromosome sequence (as dtype) and chromosome name (bytes).
dtype (type) – Type of the sequence. (See dtype; Default: bytes)
- write_chromosomes(self, chromosomes, type dtype=bytes)
Write chromosomes to the specified filepath.
Note: Alias for write_entries(entries, dtype).
- Parameters:
chromosomes (iterable) – Iterable of (seq, name) tuples, where seq is the sequence of the chromosome (as dtype) and name is the chromosome name as bytes.
dtype (type) – Type of the sequence. (See dtype; Default: bytes)
- Raises:
IOError – If no output FASTA file has been opened.
Note
This method is used to write a list of separate chromosomes to file. To split up a long sequence into chromosomes, please use write_genome(genome, chr_info, ...), where chr_info is a list of tuples that contain the name, the length and the (start, stop) interval for each chromosome, or just a name (as str/bytes) if the organism only has one chromosome.
- write_entries(self, entries, type dtype=bytes)
Write entries to the specified filepath.
- Parameters:
entries (iterable) – Iterable of (seq, name) tuples, where seq is the sequence of the entry (as dtype) and name is the entry’s name as bytes.
dtype (type) – Type of the sequence. (See dtype; Default: bytes)
- Raises:
IOError – If no output FASTA file has been opened.
Note
This method is used to write a list of separate entries to file. To split up a long sequence into chromosomes, please use write_genome(genome, chr_info, ...), where chr_info is a list of tuples that contain the name, the length and the (start, stop) interval for each chromosome, or just a name (as str/bytes) if the organism only has one chromosome.
- write_entry(self, entry, type dtype=bytes)
Write a single entry to the opened FASTA file.
- Parameters:
entry (tuple) – Containing entry sequence (as dtype) and entry name (bytes).
dtype (type) – Type of the sequence. (See dtype; Default: bytes)
- Raises:
IOError – If no output FASTA file has been opened.
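To make the role of line_width concrete, here is a rough sketch of how a single (sequence, name) entry is laid out in FASTA format: a header line starting with ">", followed by the sequence wrapped at line_width characters. `format_fasta_entry` is a hypothetical helper for illustration only; the exact bytes dinopy writes may differ.

```python
def format_fasta_entry(seq, name, line_width=80):
    """Hypothetical helper: render one (seq, name) entry as FASTA bytes."""
    # The header line consists of '>' followed by the entry name.
    lines = [b">" + name]
    # Wrap the sequence so that no line exceeds line_width characters.
    for start in range(0, len(seq), line_width):
        lines.append(seq[start:start + line_width])
    return b"\n".join(lines) + b"\n"


print(format_fasta_entry(b"ACGTACGTAC", b"chr1", line_width=4).decode())
```

With line_width=4, the ten-base sequence is split into lines of 4, 4 and 2 characters below the header.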
- write_genome(self, genome, chromosome_info=None, type dtype=bytes)
Write a genome to the specified filepath.
- Parameters:
genome (dtype) – Genome sequence to be written to file as a single iterable of dtype.
chromosome_info (list of tuples, str or bytes) – Chromosome names and borders in the format (chr_name[str], length[int], chr_interval[tuple of two ints]), or a single (byte)string. If a single (byte)string is encountered, it will be used as the genome name and the whole genome sequence will be written as a “single chromosome”.
dtype (type) – Type of the sequence. (See dtype; Default: bytes)
- Raises:
IOError – If no output FASTA file has been opened.
Note
The separation of the genome is handled according to the given chromosome info. If the sequence is already split up into chromosomes, please use write_chromosome / write_entry, which do not need chromosome info to be specified.
If chromosome_info is a string or bytes, the genome is treated as a single chromosome with the string as name. If multiple chromosomes are to be written, chromosome_info has to be a list of tuples in the format (chr_name[str], length[int], chr_interval[tuple of two ints]).
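The relationship between write_genome and write_chromosomes can be illustrated with a small sketch that turns a genome plus chromosome info into (sequence, name) tuples. `split_genome` is a hypothetical helper, not part of dinopy, and it assumes that chr_interval is a half-open (start, stop) interval, which is consistent with the three-chromosome example above.

```python
def split_genome(genome, chromosome_info):
    """Hypothetical helper: split a genome into (sequence, name) tuples."""
    # A single (byte)string names the whole genome as one chromosome.
    if isinstance(chromosome_info, (str, bytes)):
        return [(genome, chromosome_info)]
    entries = []
    for name, length, (start, stop) in chromosome_info:
        # Assumption: the interval is half-open, so stop - start == length.
        if stop - start != length:
            raise ValueError("length does not match interval for %r" % (name,))
        entries.append((genome[start:stop], name))
    return entries


seq = b"ACGTAACCGGTTAAACCCGGGTTT"
chr_info = [
    (b"single", 4, (0, 4)),
    (b"double", 8, (4, 12)),
    (b"triple", 12, (12, 24)),
]
for sequence, name in split_genome(seq, chr_info):
    print(name.decode(), sequence.decode())
```

Under that assumption, the resulting list is in the (sequence, name) format that write_chromosomes and write_entries expect.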