dragonn.simulations module

dragonn.simulations.get_distribution(GC_fraction)
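get_distribution has no docstring. Its signature only shows that the background is controlled by GC_fraction. Below is a minimal sketch of one common way to turn a GC fraction into a zero-order (independent, identically distributed) nucleotide background. It assumes that G and C each receive GC_fraction / 2 and that A and T each receive (1 - GC_fraction) / 2. gc_distribution and sample_background are hypothetical helpers, not dragonn functions, and the real get_distribution may return a different kind of object.

```python
import random


def gc_distribution(gc_fraction):
    """Per-base probabilities for a GC fraction (assumed symmetric split)."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must lie in [0, 1]")
    gc = gc_fraction / 2.0          # assumption: G and C share the GC mass equally
    at = (1.0 - gc_fraction) / 2.0  # assumption: A and T share the rest equally
    return {"A": at, "C": gc, "G": gc, "T": at}


def sample_background(length, gc_fraction, rng=None):
    """Draw each base independently from the GC-parameterized distribution."""
    if rng is None:
        rng = random.Random()
    dist = gc_distribution(gc_fraction)
    bases = list(dist)
    weights = [dist[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


# Usage: a 500 bp background with an expected GC content of 40%.
background = sample_background(500, 0.4, random.Random(0))
```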

dragonn.simulations.motif_density(motif_name, seq_length, num_seqs, min_counts, max_counts, GC_fraction, central_bp=None)

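Returns num_seqs sequences of length seq_length, each with between min_counts and max_counts instances of motif_name embedded in a background of the given GC_fraction. If central_bp is set, the instances are restricted to the central central_bp base pairs.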

dragonn.simulations.simple_motif_embedding(motif_name, seq_length, num_seqs, GC_fraction)

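Returns an array of num_seqs sequence strings of length seq_length, each with motif_name embedded in a background of the given GC_fraction.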

dragonn.simulations.simulate_differential_accessibility(pos_motif_names, neg_motif_names, seq_length, min_num_motifs, max_num_motifs, num_pos, num_neg, GC_fraction)

Generates data for the differential accessibility task.

Parameters:
pos_motif_names : list
List of motif names (strings).
neg_motif_names : list
List of motif names (strings).
seq_length : int
min_num_motifs : int
max_num_motifs : int
num_pos : int
num_neg : int
GC_fraction : float

Returns:
sequence_arr : 1darray
Contains sequence strings.
y : 1darray
Contains labels.
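A minimal sketch of how such a labeled dataset can be assembled. It assumes, as the parameter names suggest (the docstring does not say so), that positives receive motifs drawn from pos_motif_names, negatives receive motifs drawn from neg_motif_names, and min_num_motifs/max_num_motifs bound how many distinct motifs each sequence receives. Literal strings stand in for motif instances, the background is uniform ACGT (GC_fraction is ignored), and embed_subset and simulate_differential are hypothetical names.

```python
import random

BASES = "ACGT"  # uniform background; GC_fraction is ignored in this sketch


def embed_subset(motifs, length, min_num, max_num, rng):
    """Embed a random subset of distinct motifs, sized in [min_num, max_num], without overlap."""
    width = max(len(m) for m in motifs)
    slots = length // width  # one motif per width-sized slot, so instances never overlap
    k = rng.randint(min_num, max_num)
    if k > min(len(motifs), slots):
        raise ValueError("cannot fit that many distinct motifs")
    seq = [rng.choice(BASES) for _ in range(length)]
    for motif, slot in zip(rng.sample(motifs, k), rng.sample(range(slots), k)):
        seq[slot * width:slot * width + len(motif)] = motif
    return "".join(seq)


def simulate_differential(pos_motifs, neg_motifs, length, min_num, max_num,
                          num_pos, num_neg, seed=0):
    """Positives (label 1) use pos_motifs; negatives (label 0) use neg_motifs."""
    rng = random.Random(seed)
    pos = [embed_subset(pos_motifs, length, min_num, max_num, rng) for _ in range(num_pos)]
    neg = [embed_subset(neg_motifs, length, min_num, max_num, rng) for _ in range(num_neg)]
    return pos + neg, [1] * num_pos + [0] * num_neg
```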

dragonn.simulations.simulate_heterodimer_grammar(motif1, motif2, seq_length, min_spacing, max_spacing, num_pos, num_neg, GC_fraction)
Simulates two classes of sequences with motif1 and motif2:
  • Positive class sequences with motif1 and motif2 separated by a spacing between min_spacing and max_spacing
  • Negative class sequences with motif1 and motif2 positioned independently anywhere in the sequence, not as a heterodimer grammar

Parameters:
motif1 : str
ENCODE motif name
motif2 : str
ENCODE motif name
seq_length : int
length of sequence
min_spacing : int
minimum inter-motif spacing
max_spacing : int
maximum inter-motif spacing
num_pos : int
number of positive class sequences
num_neg : int
number of negative class sequences
GC_fraction : float
GC fraction in background sequence

Returns:
sequence_arr : 1darray
Array with sequence strings.
y : 1darray
Array with positive/negative class labels.
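A minimal sketch of the placement rule that separates the two classes. Assumptions: fixed literal strings stand in for motif instances, spacing is counted from the last base of motif1 to the first base of motif2 with motif1 always upstream, the background is uniform ACGT (GC_fraction is ignored), and labels are 1/0. All function names are hypothetical.

```python
import random

BASES = "ACGT"  # uniform background; GC_fraction is ignored in this sketch


def embed(seq, motif, pos):
    """Overwrite seq at pos with motif, keeping the length unchanged."""
    return seq[:pos] + motif + seq[pos + len(motif):]


def random_background(length, rng):
    return "".join(rng.choice(BASES) for _ in range(length))


def positive_example(m1, m2, length, min_spacing, max_spacing, rng):
    """m1 upstream of m2, gap drawn uniformly from [min_spacing, max_spacing]."""
    spacing = rng.randint(min_spacing, max_spacing)
    span = len(m1) + spacing + len(m2)
    if span > length:
        raise ValueError("sequence too short for this spacing")
    start = rng.randint(0, length - span)
    seq = embed(random_background(length, rng), m1, start)
    return embed(seq, m2, start + len(m1) + spacing)


def negative_example(m1, m2, length, rng):
    """m1 and m2 placed independently anywhere; m2 may overwrite part of m1."""
    seq = embed(random_background(length, rng), m1, rng.randint(0, length - len(m1)))
    return embed(seq, m2, rng.randint(0, length - len(m2)))


def simulate_heterodimer(m1, m2, length, min_spacing, max_spacing,
                         num_pos, num_neg, seed=0):
    rng = random.Random(seed)
    seqs = [positive_example(m1, m2, length, min_spacing, max_spacing, rng)
            for _ in range(num_pos)]
    seqs += [negative_example(m1, m2, length, rng) for _ in range(num_neg)]
    return seqs, [1] * num_pos + [0] * num_neg
```

Only the positives are guaranteed to respect the spacing constraint; a negative can land in the allowed spacing range by chance.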

dragonn.simulations.simulate_motif_counting(motif_name, seq_length, pos_counts, neg_counts, num_pos, num_neg, GC_fraction)

Generates data for the motif counting task.

Parameters:
motif_name : str
seq_length : int
pos_counts : list
(min_counts, max_counts) for the positive set.
neg_counts : list
(min_counts, max_counts) for the negative set.
num_pos : int
num_neg : int
GC_fraction : float

Returns:
sequence_arr : 1darray
Contains sequence strings.
y : 1darray
Contains labels.
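A minimal sketch of the counting task. It assumes each sequence's count is drawn uniformly from its class's (min_counts, max_counts) range, inclusive of both ends, and that copies must not overlap. Copies are placed on a grid of motif-length slots to guarantee that, a simplification the real module may not share. Literal strings stand in for motif instances, the background is uniform ACGT (GC_fraction is ignored), and embed_n_copies and simulate_counting are hypothetical names.

```python
import random

BASES = "ACGT"  # uniform background; GC_fraction is ignored in this sketch


def embed_n_copies(length, motif, n, rng):
    """Uniform background with n non-overlapping copies of motif."""
    slots = length // len(motif)  # motif-length slots; at most one copy per slot
    if n > slots:
        raise ValueError("too many copies for this sequence length")
    seq = [rng.choice(BASES) for _ in range(length)]
    for slot in rng.sample(range(slots), n):
        start = slot * len(motif)
        seq[start:start + len(motif)] = motif
    return "".join(seq)


def simulate_counting(motif, length, pos_counts, neg_counts, num_pos, num_neg, seed=0):
    """pos_counts/neg_counts are (min_counts, max_counts); counts are drawn inclusively."""
    rng = random.Random(seed)
    seqs, labels = [], []
    for (lo, hi), num, label in ((pos_counts, num_pos, 1), (neg_counts, num_neg, 0)):
        for _ in range(num):
            seqs.append(embed_n_copies(length, motif, rng.randint(lo, hi), rng))
            labels.append(label)
    return seqs, labels
```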

dragonn.simulations.simulate_motif_density_localization(motif_name, seq_length, center_size, min_motif_counts, max_motif_counts, num_pos, num_neg, GC_fraction)
Simulates two classes of sequences:
  • Positive class sequences with multiple motif instances in center of the sequence.
  • Negative class sequences with multiple motif instances anywhere in the sequence.

The number of motif instances is uniformly sampled between minimum and maximum motif counts.

Parameters:
motif_name : str
ENCODE motif name
seq_length : int
length of sequence
center_size : int
length of central part of the sequence where motifs can be positioned
min_motif_counts : int
minimum number of motif instances
max_motif_counts : int
maximum number of motif instances
num_pos : int
number of positive class sequences
num_neg : int
number of negative class sequences
GC_fraction : float
GC fraction in background sequence

Returns:
sequence_arr : 1darray
Contains sequence strings.
y : 1darray
Contains labels.
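A minimal sketch of the density-localization setup. It assumes the "center" is a window of center_size bases centered in the sequence, that counts are drawn uniformly and inclusively from [min_motif_counts, max_motif_counts], and that copies do not overlap (enforced here with motif-length slots). Literal strings stand in for motif instances, the background is uniform ACGT (GC_fraction is ignored), and place_copies and simulate_localization are hypothetical names.

```python
import random

BASES = "ACGT"  # uniform background; GC_fraction is ignored in this sketch


def place_copies(length, motif, n, window, rng):
    """n non-overlapping copies of motif, each lying fully inside [window[0], window[1])."""
    lo, hi = window
    slots = (hi - lo) // len(motif)
    if n > slots:
        raise ValueError("window too small for n copies")
    seq = [rng.choice(BASES) for _ in range(length)]
    for slot in rng.sample(range(slots), n):
        start = lo + slot * len(motif)
        seq[start:start + len(motif)] = motif
    return "".join(seq)


def simulate_localization(motif, length, center_size, min_counts, max_counts,
                          num_pos, num_neg, seed=0):
    rng = random.Random(seed)
    center_start = (length - center_size) // 2
    center = (center_start, center_start + center_size)  # assumed centered window
    seqs, labels = [], []
    for _ in range(num_pos):  # positives: copies confined to the center
        seqs.append(place_copies(length, motif, rng.randint(min_counts, max_counts), center, rng))
        labels.append(1)
    for _ in range(num_neg):  # negatives: copies anywhere in the sequence
        seqs.append(place_copies(length, motif, rng.randint(min_counts, max_counts), (0, length), rng))
        labels.append(0)
    return seqs, labels
```

Both classes share the same count distribution, so under these assumptions only the position of the instances distinguishes them.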

dragonn.simulations.simulate_multi_motif_embedding(motif_names, seq_length, min_num_motifs, max_num_motifs, num_seqs, GC_fraction)

Generates data for the multi-motif recognition task.

Parameters:
motif_names : list
List of motif names (strings).
seq_length : int
min_num_motifs : int
max_num_motifs : int
num_seqs : int
GC_fraction : float

Returns:
sequence_arr : 1darray
Contains sequence strings.
y : ndarray
Contains labels for each motif.
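The labels are an ndarray with one entry per motif, which points to a multi-label setup. A minimal sketch, assuming each sequence receives a random subset of distinct motifs whose size is drawn from [min_num_motifs, max_num_motifs], each chosen motif is embedded once, and y[i][j] is 1 exactly when motif j was embedded in sequence i. Literal strings stand in for motif instances, the background is uniform ACGT (GC_fraction is ignored), labels are nested lists rather than an ndarray, and simulate_multi_motif is a hypothetical name.

```python
import random

BASES = "ACGT"  # uniform background; GC_fraction is ignored in this sketch


def simulate_multi_motif(motifs, length, min_num, max_num, num_seqs, seed=0):
    """Return sequences and a num_seqs x len(motifs) 0/1 label matrix."""
    rng = random.Random(seed)
    width = max(len(m) for m in motifs)
    slots = length // width  # one motif per width-sized slot, so instances never overlap
    if max_num > min(len(motifs), slots):
        raise ValueError("max_num exceeds available motifs or slots")
    seqs, labels = [], []
    for _ in range(num_seqs):
        seq = [rng.choice(BASES) for _ in range(length)]
        chosen = rng.sample(range(len(motifs)), rng.randint(min_num, max_num))
        for j, slot in zip(chosen, rng.sample(range(slots), len(chosen))):
            start = slot * width
            seq[start:start + len(motifs[j])] = motifs[j]
        seqs.append("".join(seq))
        # Row i, column j is 1 if motif j was embedded in sequence i.
        labels.append([1 if j in chosen else 0 for j in range(len(motifs))])
    return seqs, labels
```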

dragonn.simulations.simulate_single_motif_detection(motif_name, seq_length, num_pos, num_neg, GC_fraction)
Simulates two classes of sequences:
  • Positive class sequences with a motif embedded anywhere in the sequence
  • Negative class sequences without the motif

Parameters:
motif_name : str
ENCODE motif name
seq_length : int
length of sequence
num_pos : int
number of positive class sequences
num_neg : int
number of negative class sequences
GC_fraction : float
GC fraction in background sequence

Returns:
sequence_arr : 1darray
Array with sequence strings.
y : 1darray
Array with positive/negative class labels.
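A minimal sketch of the two classes, assuming one motif copy per positive sequence at a uniformly random position, a literal string as a stand-in for the motif instance, and a uniform ACGT background (GC_fraction is ignored). background and simulate_single_motif are hypothetical names. Negatives are pure background, so with short real motifs they can still contain a motif-like substring by chance.

```python
import random

BASES = "ACGT"  # uniform background; GC_fraction is ignored in this sketch


def background(length, rng):
    return "".join(rng.choice(BASES) for _ in range(length))


def simulate_single_motif(motif, length, num_pos, num_neg, seed=0):
    """Positives (label 1): one motif copy at a random position. Negatives (label 0): background only."""
    rng = random.Random(seed)
    seqs = []
    for _ in range(num_pos):
        seq = background(length, rng)
        start = rng.randint(0, length - len(motif))
        seqs.append(seq[:start] + motif + seq[start + len(motif):])
    seqs += [background(length, rng) for _ in range(num_neg)]
    return seqs, [1] * num_pos + [0] * num_neg
```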