dragonn.simulations module
-
dragonn.simulations.motif_density(motif_name, seq_length, num_seqs, min_counts, max_counts, GC_fraction, central_bp=None)
Returns sequences embedded with the motif at a density set by min_counts and max_counts.
-
dragonn.simulations.simple_motif_embedding(motif_name, seq_length, num_seqs, GC_fraction)
Returns a sequence array.
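As a rough illustration of what embedding a motif in a GC-controlled background involves, the sketch below uses only the standard library. It is not dragonn's implementation: `random_background`, `embed_motif`, and the literal motif string are hypothetical stand-ins, and the real function looks motifs up by name rather than taking a fixed string.

```python
import random


def random_background(seq_length, gc_fraction, rng):
    """Draw a background sequence whose expected GC content is gc_fraction."""
    at = (1.0 - gc_fraction) / 2.0  # probability of A (and of T)
    gc = gc_fraction / 2.0          # probability of C (and of G)
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=seq_length))


def embed_motif(background, motif, rng):
    """Overwrite a randomly chosen window of the background with the motif."""
    start = rng.randrange(len(background) - len(motif) + 1)
    return background[:start] + motif + background[start + len(motif):]


rng = random.Random(0)
motif = "GATAAG"  # hypothetical consensus string, used only for illustration
seqs = [embed_motif(random_background(50, 0.4, rng), motif, rng) for _ in range(5)]
```

Overwriting a window keeps every sequence at exactly `seq_length` bases, which is the property a fixed-length model input needs.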
-
dragonn.simulations.simulate_differential_accessibility(pos_motif_names, neg_motif_names, seq_length, min_num_motifs, max_num_motifs, num_pos, num_neg, GC_fraction)
Generates data for the differential accessibility task.
Parameters
- pos_motif_names : list
- List of strings.
- neg_motif_names : list
- List of strings.
- seq_length : int
- min_num_motifs : int
- max_num_motifs : int
- num_pos : int
- num_neg : int
- GC_fraction : float
Returns
- sequence_arr : 1darray
- Contains sequence strings.
- y : 1darray
- Contains labels.
-
dragonn.simulations.simulate_heterodimer_grammar(motif1, motif2, seq_length, min_spacing, max_spacing, num_pos, num_neg, GC_fraction)
Simulates two classes of sequences with motif1 and motif2:
- Positive class sequences with motif1 and motif2 positioned between min_spacing and max_spacing apart
- Negative class sequences with independent motif1 and motif2 positioned anywhere in the sequence, not as a heterodimer grammar
Parameters
- seq_length : int
- length of sequence
- GC_fraction : float
- GC fraction in background sequence
- num_pos : int
- number of positive class sequences
- num_neg : int
- number of negative class sequences
- motif1 : str
- ENCODE motif name
- motif2 : str
- ENCODE motif name
- min_spacing : int
- minimum inter-motif spacing
- max_spacing : int
- maximum inter-motif spacing
Returns
- sequence_arr : 1darray
- Array with sequence strings.
- y : 1darray
- Array with positive/negative class labels.
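The spacing constraint for the positive class can be sketched as follows. This is a minimal standard-library illustration, not the library's code: `heterodimer_positive` and the motif strings are hypothetical, and it assumes motif1 always precedes motif2 and that both spacing bounds are inclusive, neither of which the documentation above confirms.

```python
import random


def heterodimer_positive(motif1, motif2, seq_length, min_spacing, max_spacing,
                         gc_fraction, rng):
    """Return (sequence, start, gap) with motif2 placed `gap` bases after motif1."""
    gap = rng.randint(min_spacing, max_spacing)  # inclusive on both ends
    span = len(motif1) + gap + len(motif2)
    if span > seq_length:
        raise ValueError("motif pair does not fit in the sequence")
    at, gc = (1.0 - gc_fraction) / 2.0, gc_fraction / 2.0
    seq = rng.choices("ACGT", weights=[at, gc, gc, at], k=seq_length)
    start = rng.randrange(seq_length - span + 1)  # where motif1 begins
    second = start + len(motif1) + gap            # where motif2 begins
    seq[start:start + len(motif1)] = motif1
    seq[second:second + len(motif2)] = motif2
    return "".join(seq), start, gap


rng = random.Random(0)
m1, m2 = "GATAAG", "CACGTG"  # hypothetical consensus strings
positives = [heterodimer_positive(m1, m2, 100, 2, 5, 0.4, rng) for _ in range(10)]
```

A negative example would instead draw the two start positions independently, so the gap between the motifs is unconstrained.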
-
dragonn.simulations.simulate_motif_counting(motif_name, seq_length, pos_counts, neg_counts, num_pos, num_neg, GC_fraction)
Generates data for the motif counting task.
Parameters
- motif_name : str
- seq_length : int
- pos_counts : list
- (min_counts, max_counts) for positive set.
- neg_counts : list
- (min_counts, max_counts) for negative set.
- num_pos : int
- num_neg : int
- GC_fraction : float
Returns
- sequence_arr : 1darray
- Contains sequence strings.
- y : 1darray
- Contains labels.
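To make the role of the two count ranges concrete, here is a hedged sketch of a counting dataset built with the standard library only. `counting_dataset` and `non_overlapping_starts` are hypothetical helpers, the motif is a literal string, not a named motif, and the non-overlapping placement is an assumption of this sketch and is not taken from dragonn.

```python
import random


def non_overlapping_starts(seq_length, motif_length, count, rng):
    """Pick `count` start positions whose motif windows never overlap."""
    slack = seq_length - count * motif_length  # bases left over for gaps
    if slack < 0:
        raise ValueError("too many motif instances for this sequence length")
    offsets = sorted(rng.randint(0, slack) for _ in range(count))
    return [offset + i * motif_length for i, offset in enumerate(offsets)]


def counting_dataset(motif, seq_length, pos_counts, neg_counts,
                     num_pos, num_neg, gc_fraction, seed=0):
    """Return (sequences, labels, embedded_counts) for both classes."""
    rng = random.Random(seed)
    at, gc = (1.0 - gc_fraction) / 2.0, gc_fraction / 2.0
    seqs, labels, counts = [], [], []
    for label, (low, high), n in ((1, pos_counts, num_pos), (0, neg_counts, num_neg)):
        for _ in range(n):
            count = rng.randint(low, high)  # instances embedded in this sequence
            seq = rng.choices("ACGT", weights=[at, gc, gc, at], k=seq_length)
            for start in non_overlapping_starts(seq_length, len(motif), count, rng):
                seq[start:start + len(motif)] = motif
            seqs.append("".join(seq))
            labels.append(label)
            counts.append(count)
    return seqs, labels, counts


seqs, labels, counts = counting_dataset("GATAAG", 100, (3, 5), (0, 2), 4, 4, 0.4)
```

The background can contain the motif by chance, so the observed number of occurrences in a sequence is a lower bound check and not an exact match for the embedded count.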
-
dragonn.simulations.simulate_motif_density_localization(motif_name, seq_length, center_size, min_motif_counts, max_motif_counts, num_pos, num_neg, GC_fraction)
Simulates two classes of sequences:
- Positive class sequences with multiple motif instances in the center of the sequence.
- Negative class sequences with multiple motif instances anywhere in the sequence.
The number of motif instances is uniformly sampled between minimum and maximum motif counts.
Parameters
- motif_name : str
- ENCODE motif name
- seq_length : int
- length of sequence
- center_size : int
- length of central part of the sequence where motifs can be positioned
- min_motif_counts : int
- minimum number of motif instances
- max_motif_counts : int
- maximum number of motif instances
- num_pos : int
- number of positive class sequences
- num_neg : int
- number of negative class sequences
- GC_fraction : float
- GC fraction in background sequence
Returns
- sequence_arr : 1darray
- Contains sequence strings.
- y : 1darray
- Contains labels.
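The difference between the two classes comes down to where motif start positions may fall. The sketch below shows only that position sampling, under stated assumptions: `center_window` and `sample_starts` are hypothetical helpers, the central region is assumed to be symmetric around the midpoint, and motif windows are assumed not to overlap.

```python
import random


def center_window(seq_length, center_size):
    """Bounds [low, high) of the central region of width center_size."""
    low = (seq_length - center_size) // 2
    return low, low + center_size


def sample_starts(motif_length, count, low, high, rng):
    """Non-overlapping starts with every motif window inside [low, high)."""
    slack = (high - low) - count * motif_length
    if slack < 0:
        raise ValueError("motif instances do not fit in the window")
    offsets = sorted(rng.randint(0, slack) for _ in range(count))
    return [low + offset + i * motif_length for i, offset in enumerate(offsets)]


rng = random.Random(0)
seq_length, center_size, motif_length = 200, 60, 6
low, high = center_window(seq_length, center_size)

# Uniform draw between the minimum and maximum motif counts (here 2 and 4).
count = rng.randint(2, 4)

pos_starts = sample_starts(motif_length, count, low, high, rng)      # positive class
neg_starts = sample_starts(motif_length, count, 0, seq_length, rng)  # negative class
```

The sketch reuses one `count` for both classes for brevity; drawing a fresh count per sequence is the more natural reading of the description above.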
-
dragonn.simulations.simulate_multi_motif_embedding(motif_names, seq_length, min_num_motifs, max_num_motifs, num_seqs, GC_fraction)
Generates data for the multi-motif recognition task.
Parameters
- motif_names : list
- List of strings.
- seq_length : int
- min_num_motifs : int
- max_num_motifs : int
- num_seqs : int
- GC_fraction : float
Returns
- sequence_arr : 1darray
- Contains sequence strings.
- y : ndarray
- Contains labels for each motif.
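One plausible shape for the per-motif labels is a binary matrix with one row per sequence and one column per motif. The sketch below builds such a matrix with plain lists. It is an assumption-laden illustration: `multi_motif_labels` is hypothetical, the motif names are placeholders, and picking distinct motifs per sequence is a guess about the sampling scheme.

```python
import random


def multi_motif_labels(motif_names, min_num_motifs, max_num_motifs, num_seqs, seed=0):
    """Return (chosen, y): motifs picked per sequence and a binary label matrix."""
    rng = random.Random(seed)
    chosen, y = [], []
    for _ in range(num_seqs):
        k = rng.randint(min_num_motifs, max_num_motifs)
        picked = rng.sample(motif_names, k)  # assumption: no repeats within a sequence
        chosen.append(picked)
        # Column j is 1 when motif j was picked for this sequence.
        y.append([1 if name in picked else 0 for name in motif_names])
    return chosen, y


names = ["motif_a", "motif_b", "motif_c"]  # placeholder names, not real motif names
chosen, y = multi_motif_labels(names, 0, 3, num_seqs=6)
```

Each row of `y` can carry several ones at once, which is what makes this a multi-label task and not a single-class one.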
-
dragonn.simulations.simulate_single_motif_detection(motif_name, seq_length, num_pos, num_neg, GC_fraction)
Simulates two classes of sequences:
- Positive class sequence with a motif embedded anywhere in the sequence
- Negative class sequence without the motif
Parameters
- motif_name : str
- ENCODE motif name
- seq_length : int
- length of sequence
- num_pos : int
- number of positive class sequences
- num_neg : int
- number of negative class sequences
- GC_fraction : float
- GC fraction in background sequence
Returns
- sequence_arr : 1darray
- Array with sequence strings.
- y : 1darray
- Array with positive/negative class labels.