simdna package¶
simdna.simulations module¶
- 
simdna.simulations.get_distribution(GC_fraction)¶
- 
simdna.simulations.motif_density(motif_name, seq_length, num_seqs, min_counts, max_counts, GC_fraction, central_bp=None)¶
- returns sequences with motif density. 
- 
simdna.simulations.simple_motif_embedding(motif_name, seq_length, num_seqs, GC_fraction)¶
- returns sequence array 
- 
simdna.simulations.simulate_differential_accessibility(pos_motif_names, neg_motif_names, seq_length, min_num_motifs, max_num_motifs, num_pos, num_neg, GC_fraction)¶
- Generates data for differential accessibility task. - Parameters: - pos_motif_names (list) – List of strings.
- neg_motif_names (list) – List of strings.
- seq_length (int) –
- min_num_motifs (int) –
- max_num_motifs (int) –
- num_pos (int) –
- num_neg (int) –
- GC_fraction (float) –
 - Returns: - sequence_arr (1darray) – Contains sequence strings.
- y (1darray) – Contains labels.
 
- 
simdna.simulations.simulate_heterodimer_grammar(motif1, motif2, seq_length, min_spacing, max_spacing, num_pos, num_neg, GC_fraction)¶
- Simulates two classes of sequences with motif1 and motif2:
- Positive class sequences with motif1 and motif2 separated by an inter-motif spacing between min_spacing and max_spacing
- Negative class sequences with motif1 and motif2 positioned independently anywhere in the sequence, not as a heterodimer grammar
 - Parameters: - seq_length (int, length of sequence) –
- GC_fraction (float, GC fraction in background sequence) –
- num_pos (int, number of positive class sequences) –
- num_neg (int, number of negative class sequences) –
- motif1 (str, encode motif name) –
- motif2 (str, encode motif name) –
- min_spacing (int, minimum inter motif spacing) –
- max_spacing (int, maximum inter motif spacing) –
 - Returns: - sequence_arr (1darray) – Array with sequence strings.
- y (1darray) – Array with positive/negative class labels.
 
- 
simdna.simulations.simulate_motif_counting(motif_name, seq_length, pos_counts, neg_counts, num_pos, num_neg, GC_fraction)¶
- Generates data for motif counting task. - Parameters: - motif_name (str) –
- seq_length (int) –
- pos_counts (list) – (min_counts, max_counts) for positive set.
- neg_counts (list) – (min_counts, max_counts) for negative set.
- num_pos (int) –
- num_neg (int) –
- GC_fraction (float) –
 - Returns: - sequence_arr (1darray) – Contains sequence strings.
- y (1darray) – Contains labels.
 
- 
simdna.simulations.simulate_motif_density_localization(motif_name, seq_length, center_size, min_motif_counts, max_motif_counts, num_pos, num_neg, GC_fraction)¶
- Simulates two classes of sequences:
- Positive class sequences with multiple motif instances in center of the sequence.
- Negative class sequences with multiple motif instances anywhere in the sequence.
 
 - The number of motif instances is uniformly sampled between minimum and maximum motif counts. - Parameters: - motif_name (str) – encode motif name
- seq_length (int) – length of sequence
- center_size (int) – length of central part of the sequence where motifs can be positioned
- min_motif_counts (int) – minimum number of motif instances
- max_motif_counts (int) – maximum number of motif instances
- num_pos (int) – number of positive class sequences
- num_neg (int) – number of negative class sequences
- GC_fraction (float) – GC fraction in background sequence
 - Returns: - sequence_arr (1darray) – Contains sequence strings.
- y (1darray) – Contains labels.
 
- 
simdna.simulations.simulate_multi_motif_embedding(motif_names, seq_length, min_num_motifs, max_num_motifs, num_seqs, GC_fraction)¶
- Generates data for multi motif recognition task. - Parameters: - motif_names (list) – List of strings.
- seq_length (int) –
- min_num_motifs (int) –
- max_num_motifs (int) –
- num_seqs (int) –
- GC_fraction (float) –
 - Returns: - sequence_arr (1darray) – Contains sequence strings.
- y (ndarray) – Contains labels for each motif.
 
- 
simdna.simulations.simulate_single_motif_detection(motif_name, seq_length, num_pos, num_neg, GC_fraction)¶
- Simulates two classes of sequences:
- Positive class sequence with a motif embedded anywhere in the sequence
- Negative class sequence without the motif
 
 - Parameters: - motif_name (str) – encode motif name
- seq_length (int) – length of sequence
- num_pos (int) – number of positive class sequences
- num_neg (int) – number of negative class sequences
- GC_fraction (float) – GC fraction in background sequence
 - Returns: - sequence_arr (1darray) – Array with sequence strings.
- y (1darray) – Array with positive/negative class labels.
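As a rough illustration of the kind of data the single-motif detection task describes, the following standalone sketch builds backgrounds with a chosen GC fraction and embeds a fixed string in the positive class. It does not call simdna: `make_background`, `embed_at_random_position`, `simulate_single_motif_sketch`, and the literal string `GATAAG` are hypothetical stand-ins, whereas the functions above take an ENCODE motif name.

```python
import random

def make_background(seq_length, gc_fraction, rng):
    """Sample a background where G and C each have probability gc_fraction / 2."""
    at = (1.0 - gc_fraction) / 2.0
    gc = gc_fraction / 2.0
    return rng.choices("ACGT", weights=[at, gc, gc, at], k=seq_length)

def embed_at_random_position(background, motif, rng):
    """Overwrite a uniformly chosen window of the background with the motif."""
    start = rng.randrange(len(background) - len(motif) + 1)
    background[start:start + len(motif)] = motif
    return start

def simulate_single_motif_sketch(motif, seq_length, num_pos, num_neg,
                                 gc_fraction, seed=0):
    """Return (sequences, labels): positives carry the motif, negatives do not get one."""
    rng = random.Random(seed)
    sequences, labels = [], []
    for label in [1] * num_pos + [0] * num_neg:
        chars = make_background(seq_length, gc_fraction, rng)
        if label == 1:
            embed_at_random_position(chars, motif, rng)
        sequences.append("".join(chars))
        labels.append(label)
    return sequences, labels

sequences, labels = simulate_single_motif_sketch("GATAAG", 50, 3, 2, 0.4)
```

Note that a negative sequence can still contain the motif string by chance in this sketch; it only guarantees that nothing was embedded deliberately.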
 
simdna.synthetic module¶
- 
class simdna.synthetic.AbstractApplySingleMutationFromSet(setOfMutations, name=None)¶
- Bases: - simdna.synthetic.AbstractTransformation- Class for applying a single mutation from a set of mutations; used to transform substrings generated by another method - Parameters: - setOfMutations – instance of AbstractSetOfMutations
- name – see DefaultNameMixin.
 - 
getJsonableObject()¶
- See superclass. 
 - 
selectMutation()¶
- Chooses a mutation from the set of mutations to apply. - Returns: - an instance of - Mutation
 - 
transform(stringArr)¶
- See superclass. 
 
- 
class simdna.synthetic.AbstractBackgroundGenerator¶
- Bases: - object- Returns the sequence that AbstractEmbeddable objects are to be embedded into. - 
generateBackground()¶
- Returns a sequence that is the background. 
 - 
getJsonableObject()¶
- Get JSON object representation. - Returns: - A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated. 
 
- 
- 
class simdna.synthetic.AbstractEmbeddable¶
- Bases: - object- Represents a thing which can be embedded. - An - AbstractEmbeddable+ a position = an- Embedding- 
canEmbed(priorEmbeddedThings, startPos)¶
- Checks whether embedding is possible at a given position. - Accepts an instance of AbstractPriorEmbeddedThings and a startPos, and checks whether startPos is viable given the contents of priorEmbeddedThings. - Parameters: - priorEmbeddedThings – instance of AbstractPriorEmbeddedThings
- startPos – int; the position at which you are considering embedding self
 - Returns: - A boolean indicating whether self can be embedded at startPos, given the things that have already been embedded. 
 - 
embedInBackgroundStringArr(priorEmbeddedThings, backgroundStringArr, startPos)¶
- Embed self in a background string. - Will embed self at startPos in backgroundStringArr, and will update priorEmbeddedThings accordingly. - Parameters: - priorEmbeddedThings – instance of AbstractPriorEmbeddedThings
- backgroundStringArr – an array of characters representing the background
- startPos – integer; the position to embed self at
 
 - 
classmethod fromString(theString)¶
- Generate an instance of the embeddable from the provided string. 
 - 
getDescription()¶
- Return a concise description of the embeddable. - This should be concise and shouldn’t contain spaces. It will often be used when generating the __str__ representation of the embeddable. 
 
- 
- 
class simdna.synthetic.AbstractEmbeddableGenerator(name)¶
- Bases: - simdna.synthetic.DefaultNameMixin- Generates an embeddable, usually for embedding in a background sequence. - 
generateEmbeddable()¶
- Generate an embeddable object. - Returns: - An instance of - AbstractEmbeddable
 - 
getJsonableObject()¶
- Get JSON object representation. - Returns: - A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated. 
 
- 
- 
class simdna.synthetic.AbstractEmbedder(name)¶
- Bases: - simdna.synthetic.DefaultNameMixin- Produces AbstractEmbeddable objects and embeds them in a sequence. - 
embed(backgroundStringArr, priorEmbeddedThings, additionalInfo=None)¶
- Embeds things in the provided backgroundStringArr. - Modifies backgroundStringArr to include whatever has been embedded. - Parameters: - backgroundStringArr – array of characters representing the background string
- priorEmbeddedThings – instance of AbstractPriorEmbeddedThings
- additionalInfo – instance of AdditionalInfo; allows the embedder to send back info about what it did
 - Returns: - The modified backgroundStringArr
 - 
getJsonableObject()¶
- Get JSON object representation. - Returns: - A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated. 
 
- 
- 
class simdna.synthetic.AbstractLoadedMotifs(fileName, pseudocountProb=0.0, background=OrderedDict([('A', 0.27), ('C', 0.23), ('G', 0.23), ('T', 0.27)]))¶
- Bases: - object- Class representing loaded PWMs. - A class that contains instances of pwm.PWM loaded from a file. The PWMs can be accessed by name. - Parameters: - fileName – string, the path to the file to load
- pseudocountProb – if some of the PWMs have 0 probability for some of the positions, will add the specified pseudocountProb to the rows of the PWM and renormalise.
- background – a dictionary with ACGT as the keys and the frequency as the values. Defaults to util.DEFAULT_BACKGROUND_FREQ
 - 
getJsonableObject()¶
- Get JSON object representation. - Returns: - A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated. 
 - 
getPwm(name)¶
- Get a specific PWM. - Returns: - The pwm.PWM instance with the specified name.
 - 
getReadPwmAction(recordedPwms)¶
- Action performed when each line of the pwm text file is read in. - This function is to be overridden by a specific implementation. It is executed on each line of the file when it is read in, and when PWMs are ready they will get inserted into recordedPwms. - Parameters: - recordedPwms – an OrderedDict that will be filled with PWMs. The keys will be the names of the PWMs and the values will be instances of pwm.PWM
 
- 
class simdna.synthetic.AbstractPositionGenerator(name)¶
- Bases: - simdna.synthetic.DefaultNameMixin- Generate a start position at which to embed something - Given the length of the background sequence and the length of the substring you are trying to embed, will return a start position to embed the substring at. - 
generatePos(lenBackground, lenSubstring, additionalInfo=None)¶
- Generate the position to embed in. - Parameters: - lenBackground – int, length of background sequence
- lenSubstring – int, length of substring to embed
- additionalInfo – optional, instance of AdditionalInfo. Is used to leave a trace that this positionGenerator was called
 - Returns: - An integer which is the start index to embed in. 
 - 
getJsonableObject()¶
- Get JSON object representation. - Returns: - A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated. 
 
- 
- 
class simdna.synthetic.AbstractPriorEmbeddedThings¶
- Bases: - object- Keeps track of what has already been embedded in a sequence. - 
addEmbedding(startPos, what)¶
- Records the embedding of an AbstractEmbeddable. - Embeds what from startPos to startPos+len(what). Creates an Embedding object. - Parameters: - startPos – int, the starting position at which to embed.
- what – instance of AbstractEmbeddable
 
 - 
canEmbed(startPos, endPos)¶
- Test whether startPos-endPos is available for embedding. - Parameters: - startPos – int, starting index
- endPos – int, ending index+1 (same semantics as array-slicing)
 - Returns: - True if startPos:endPos is available for embedding 
 - 
getEmbeddings()¶
- Returns: - A collection of Embedding objects 
 - 
getNumOccupiedPos()¶
- Returns: - Number of positions that are filled with some kind of embedding 
 - 
getTotalPos()¶
- Returns: - Total number of positions (occupied and unoccupied) available to embed things in. 
 
- 
- 
class simdna.synthetic.AbstractQuantityGenerator(name)¶
- Bases: - simdna.synthetic.DefaultNameMixin- Class for sampling values from a distribution. - 
generateQuantity()¶
- Sample a quantity from a distribution. - Returns: - The sampled value. 
 - 
getJsonableObject()¶
- Get JSON object representation. - Returns: - A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated. 
 
- 
- 
class simdna.synthetic.AbstractSequenceSetGenerator¶
- Bases: - object- A generator for a collection of generated sequences. - 
generateSequences()¶
- The generator; implementation should have a yield. - Called as generatedSequences = sequenceSetGenerator.generateSequences(); generatedSequences can then be iterated over. - Returns: - A generator of GeneratedSequence objects 
 - 
getJsonableObject()¶
- Get JSON object representation. - Returns: - A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated. 
 
- 
- 
class simdna.synthetic.AbstractSetOfMutations(mutationsArr)¶
- Bases: - object- Represents a collection of Mutation objects. - Parameters: - mutationsArr – array of Mutation objects - 
getJsonableObject()¶
- Get JSON object representation. - Returns: - A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated. 
 - 
getMutationsArr()¶
- Returns - self.mutationsArr- Returns: - self.mutationsArr
 
- 
- 
class simdna.synthetic.AbstractSingleSequenceGenerator(namePrefix=None)¶
- Bases: - object- Generate a single sequence. - Parameters: - namePrefix – the GeneratedSequence object has a field for the object’s name; this is the prefix associated with that name. The suffix is the value of a counter that is incremented every time - 
generateSequence()¶
- Generate the sequence. - Returns: - An instance of - GeneratedSequence
 - 
getJsonableObject()¶
- Get JSON object representation. - Returns: - A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated. 
 
- 
- 
class simdna.synthetic.AbstractSubstringGenerator(name)¶
- Bases: - simdna.synthetic.DefaultNameMixin- Generates a substring, usually for embedding in a background sequence. - 
generateSubstring()¶
- Returns: - A tuple of (string, stringDescription); the result can be wrapped in an instance of StringEmbeddable. stringDescription is a short descriptor that does not contain spaces and may be prefixed in front of string when generating the __str__ representation for StringEmbeddable.
 - 
getJsonableObject()¶
- Get JSON object representation. - Returns: - A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated. 
 
- 
- 
class simdna.synthetic.AbstractTransformation(name)¶
- Bases: - simdna.synthetic.DefaultNameMixin- Class representing a transformation applied to a character array. - Takes an array of characters, applies some transformation. - 
getJsonableObject()¶
- Get JSON object representation. - Returns: - A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated. 
 - 
transform(stringArr)¶
- Applies a transformation to stringArr. - Parameters: - stringArr – an array of characters. - Returns: - An array of characters that has the transformation applied. - May mutate - stringArr
 
- 
- 
class simdna.synthetic.AdditionalInfo¶
- Bases: - object- Used to keep track of which embedders/ops were called and how many times. - An instance of AdditionalInfo is meant to be an attribute of a GeneratedSequence object. It keeps track of things like embedders, position generators, etc. It has self.trace, a dictionary from operatorName to int that records the operations called in the process of embedding things in the sequence. At the time of writing, operatorName is typically just the name of the embedder.
 - 
isInTrace(operatorName)¶
- Return True if operatorName has been called on the sequence. 
 - 
updateAdditionalInfo(operatorName, value)¶
- Can be used to store any additional information on operatorName. 
 - 
updateTrace(operatorName)¶
- Increment count for the number of times operatorName was called. 
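The trace bookkeeping described for AdditionalInfo can be pictured with a small standalone sketch. `TraceSketch` and `labels_from_trace` are hypothetical names, and storing the extra information in a plain dictionary is an assumption; the sketch only mirrors the documented behaviour (a name-to-count mapping plus a membership test) and shows how 0/1 labels, in the spirit of IsInTraceLabelGenerator, could be read off it.

```python
from collections import defaultdict

class TraceSketch:
    """Hypothetical stand-in for the trace kept by AdditionalInfo."""

    def __init__(self):
        self.trace = defaultdict(int)  # operatorName -> number of calls
        self.additional_info = {}      # operatorName -> stored value (assumed layout)

    def update_trace(self, operator_name):
        self.trace[operator_name] += 1

    def update_additional_info(self, operator_name, value):
        self.additional_info[operator_name] = value

    def is_in_trace(self, operator_name):
        # Membership tests do not create entries in a defaultdict.
        return operator_name in self.trace

def labels_from_trace(info, label_names):
    """1 if the named operator was called on the sequence, else 0."""
    return [int(info.is_in_trace(name)) for name in label_names]

info = TraceSketch()
info.update_trace("embedderA")
info.update_trace("embedderA")
labels = labels_from_trace(info, ["embedderA", "embedderB"])
```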
 
- 
class simdna.synthetic.AllEmbedders(embedders, name=None)¶
- Bases: - simdna.synthetic.AbstractEmbedder- Wrapper around a list of embedders that calls each one in turn. - Useful to nest under a RandomSubsetOfEmbedders. - Parameters: - embedders – an iterable of AbstractEmbedder objects. - 
getJsonableObject()¶
- See superclass. 
 
- 
- 
class simdna.synthetic.BernoulliQuantityGenerator(prob, name=None)¶
- Bases: - simdna.synthetic.AbstractQuantityGenerator- Generates 1 or 0 according to a bernoulli distribution. - Parameters: - prob – probability of 1 - 
generateQuantity()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
 
- 
- 
class simdna.synthetic.BestHitPwm(pwm, bestHitMode='pwmProb', name=None)¶
- Bases: - simdna.synthetic.AbstractSubstringGenerator- Always return the best possible match to a pwm.PWM when called. - Parameters: - pwm – an instance of pwm.PWM
- bestHitMode – one of the values in pwm.BEST_HIT_MODE. If pwmProb, the best match is determined according to what is most likely to be sampled from the pwm matrix (this is the default). If logOdds, the best match is determined according to the log-odds matrix (so, taking the background into account).
- name – see DefaultNameMixin
 - 
generateSubstring()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
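To make the difference between the two bestHitMode values concrete, here is a minimal standalone sketch. It does not use pwm.PWM: the `best_hit` function, the list-of-dictionaries PWM layout, and the example probabilities are all assumptions chosen for illustration. The background frequencies are the defaults shown in the signatures above.

```python
import math

# Default background frequencies, as shown in the class signatures above.
BACKGROUND = {"A": 0.27, "C": 0.23, "G": 0.23, "T": 0.27}

def best_hit(pwm_rows, mode="pwmProb", background=BACKGROUND):
    """Return the highest-scoring string for a list of {base: probability} rows."""
    def score(base, prob):
        if mode == "pwmProb":
            return prob
        if mode == "logOdds":
            # Zero-probability bases can never be the best log-odds match.
            return math.log(prob / background[base]) if prob > 0 else float("-inf")
        raise ValueError("unknown mode: " + mode)

    return "".join(max(row, key=lambda b: score(b, row[b])) for row in pwm_rows)

# Made-up two-position PWM in which the first position separates the two modes.
rows = [
    {"A": 0.30, "C": 0.28, "G": 0.22, "T": 0.20},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]
by_prob = best_hit(rows, "pwmProb")
by_log_odds = best_hit(rows, "logOdds")
```

At the first position A has the highest probability, but C has the highest ratio to its background frequency (0.28/0.23 versus 0.30/0.27), so the two modes pick different bases.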
 
- 
class simdna.synthetic.BestHitPwmFromLoadedMotifs(loadedMotifs, motifName, bestHitMode='pwmProb', name=None)¶
- Bases: - simdna.synthetic.BestHitPwm- Instantiates BestHitPwm using a LoadedMotifs file. Analogous to PwmSamplerFromLoadedMotifs. - 
getJsonableObject()¶
- See superclass. 
 
- 
- 
class simdna.synthetic.ChooseMutationAtRandom(setOfMutations, name=None)¶
- Bases: - simdna.synthetic.AbstractApplySingleMutationFromSet- Selects a mutation at random from self.setOfMutations to apply. - 
selectMutation()¶
 
- 
- 
class simdna.synthetic.ChooseValueFromASet(setOfPossibleValues, name=None)¶
- Bases: - simdna.synthetic.AbstractQuantityGenerator- Randomly samples a particular value from a set of values. - Parameters: - setOfPossibleValues – array of values that will be randomly sampled from.
- name – see DefaultNameMixin.
 - 
generateQuantity()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
 
- 
- 
class simdna.synthetic.DefaultNameMixin(name)¶
- Bases: - object- Basic functionality for classes that have a self.name attribute. - The self.name attribute is typically used to leave a trace in an instance of - AdditionalInfo- Parameters: - name – string - 
getDefaultName()¶
 
- 
- 
class simdna.synthetic.EmbedInABackground(backgroundGenerator, embedders, namePrefix=None)¶
- Bases: - simdna.synthetic.AbstractSingleSequenceGenerator- Generate a background sequence and embed smaller sequences in it. - Takes a backgroundGenerator and a series of embedders. Will generate the background and then call each of the embedders in succession. Then returns the result. - Parameters: - backgroundGenerator – instance of AbstractBackgroundGenerator
- embedders – array of instances of AbstractEmbedder
- namePrefix – see parent
 - 
generateSequence()¶
- Produce the sequence. - Generates a background using self.backgroundGenerator, splits it into an array, and passes it to each of self.embedders in turn for embedding things. - Returns: - An instance of - GeneratedSequence
 - 
getJsonableObject()¶
- See superclass. 
 
- 
class simdna.synthetic.EmbeddableEmbedder(embeddableGenerator, positionGenerator=<simdna.synthetic.UniformPositionGenerator object>, name=None)¶
- Bases: - simdna.synthetic.AbstractEmbedder- Embeds an instance of AbstractEmbeddable at a sampled position. - Embeds instances of AbstractEmbeddable within the background sequence, at a position sampled from a distribution. Only embeds at unoccupied positions. - Parameters: - embeddableGenerator – instance of AbstractEmbeddableGenerator
- positionGenerator – instance of AbstractPositionGenerator
 - 
getJsonableObject()¶
- See superclass. 
 
- 
class simdna.synthetic.Embedding(what, startPos)¶
- Bases: - object- Represents something that has been embedded in a sequence. - Think of this as a combination of an embeddable + a start position. - Parameters: - what – object representing the thing that has been embedded. Should have __str__ and __len__ defined. Often is an instance of AbstractEmbeddable
- startPos – int, the position relative to the start of the parent sequence at which what has been embedded
 - 
classmethod fromString(string, whatClass=None)¶
- Recreate an Embedding object from a string. - Parameters: - string – assumed to have format: description[-|_]startPos[-|_]whatString, where whatString will be provided to whatClass
- whatClass – the class (usually an AbstractEmbeddable) that will be used to instantiate the what from the whatString
 - Returns: - The Embedding class called with what=whatClass.fromString(whatString) and startPos=int(startPos)
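The string format accepted by fromString can be illustrated with a small standalone parser. This is a sketch under stated assumptions, not the library's code: the regular expression is one reading of `description[-|_]startPos[-|_]whatString`, the function name `parse_embedding_string` is hypothetical, and the example strings are made up.

```python
import re

# Assumed reading of "description[-|_]startPos[-|_]whatString":
# a description without "-" or "_", an integer start position, then the rest.
EMBEDDING_PATTERN = re.compile(
    r"^(?P<description>[^-_]+)[-_](?P<start>\d+)[-_](?P<what>.*)$"
)

def parse_embedding_string(string):
    """Split an embedding string into (description, startPos, whatString)."""
    match = EMBEDDING_PATTERN.match(string)
    if match is None:
        raise ValueError("not an embedding string: " + repr(string))
    return (
        match.group("description"),
        int(match.group("start")),
        match.group("what"),
    )

parsed = parse_embedding_string("pos-6_GATAAG")
```

In the documented method, the `whatString` part would then be handed to `whatClass.fromString` to rebuild the embedded object.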
 
- 
class simdna.synthetic.FixedQuantityGenerator(quantity, name=None)¶
- Bases: - simdna.synthetic.AbstractQuantityGenerator- Returns a fixed number every time generateQuantity is called. - Parameters: - quantity – the value to return when generateQuantity is called. - 
generateQuantity()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
 
- 
- 
class simdna.synthetic.FixedSubstringGenerator(fixedSubstring, name=None)¶
- Bases: - simdna.synthetic.AbstractSubstringGenerator- Generates the same string every time. - When generateSubstring() is called, always returns the same string. The string also serves as its own description - Parameters: - fixedSubstring – the string to be generated
- name – see DefaultNameMixin
 - 
generateSubstring()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
 
- 
class simdna.synthetic.GenerateSequenceNTimes(singleSetGenerator, N)¶
- Bases: - simdna.synthetic.AbstractSequenceSetGenerator- Call an AbstractSingleSequenceGenerator N times. - Parameters: - singleSetGenerator – an instance of AbstractSingleSequenceGenerator
- N – integer, the number of times to call singleSetGenerator
 - 
generateSequences()¶
- A generator that calls self.singleSetGenerator N times. - Returns: - a generator that will call self.singleSetGenerator N times. 
 - 
getJsonableObject()¶
- See superclass. 
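The "call a single-sequence generator N times" pattern is a plain Python generator. The sketch below is standalone and hypothetical: `CountingGenerator` stands in for an AbstractSingleSequenceGenerator and returns only a name built from a prefix and a counter, and the exact name format is an assumption.

```python
import itertools

class CountingGenerator:
    """Hypothetical single-sequence generator that names outputs with a counter."""

    def __init__(self, name_prefix="synth"):
        self.name_prefix = name_prefix
        self.counter = itertools.count()

    def generate_sequence(self):
        # The prefix-plus-counter format is assumed for illustration.
        return "%s%d" % (self.name_prefix, next(self.counter))

def generate_n_times(single_generator, n):
    """Lazily yield n results; nothing is generated until iteration starts."""
    for _ in range(n):
        yield single_generator.generate_sequence()

names = list(generate_n_times(CountingGenerator(), 3))
```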
 
- 
class simdna.synthetic.GeneratedSequence(seqName, seq, embeddings, additionalInfo)¶
- Bases: - object- An object representing a sequence that has been generated. - Parameters: - seqName – string representing the name/id of the sequence
- seq – string representing the final generated sequence
- embeddings – an array of Embedding objects.
- additionalInfo – an instance of AdditionalInfo
 
- 
class simdna.synthetic.InsideCentralBp(centralBp, name=None)¶
- Bases: - simdna.synthetic.AbstractPositionGenerator- For embedding within only the central region of a background. - Returns a position within the central region of a background
- sequence, sampled uniformly at random
 - Parameters: - centralBp – int, the number of bp, centered in the middle of the background, from which to sample the position. Is NOT +/- centralBp around the middle (is +/- centralBp/2 around the middle). If the length of the background sequence is even and centralBp is odd, the shorter region will go on the left.
- name – string - see DefaultNameMixin
 - 
getJsonableObject()¶
- See superclass. 
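The centralBp semantics can be made concrete with a short standalone sketch. The integer-division formula for the window start is an assumption chosen to match the description above (a window of centralBp positions centered on the middle, with the shorter half on the left when the lengths have different parity); the function names are hypothetical.

```python
import random

def central_region_bounds(len_background, central_bp):
    """Return (start, end) of the central window, with end exclusive."""
    if central_bp > len_background:
        raise ValueError("central region is longer than the background")
    start = len_background // 2 - central_bp // 2
    return start, start + central_bp

def sample_inside_central_bp(len_background, len_substring, central_bp, rng=random):
    """Sample a start index so the substring lies entirely inside the window."""
    start, end = central_region_bounds(len_background, central_bp)
    if len_substring > central_bp:
        raise ValueError("substring does not fit in the central region")
    # Uniform over every start index that keeps the substring inside the window.
    return rng.randrange(start, end - len_substring + 1)
```

For a background of length 10 and centralBp of 3, this gives the window (4, 7): one position to the left of the midpoint and two to the right.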
 
- 
class simdna.synthetic.IsInTraceLabelGenerator(labelNames)¶
- Bases: - simdna.synthetic.LabelGenerator- LabelGenerator where labels match which embedders are called. - A special kind of LabelGenerator where the names of the labels
- are the names of embedders, and the label is 1 if a particular embedder has been called on the sequence and 0 otherwise.
 
- 
class simdna.synthetic.LabelGenerator(labelNames, labelsFromGeneratedSequenceFunction)¶
- Bases: - object- Generate labels for a generated sequence. - Parameters: - labelNames – an array of strings that are the names of the labels
- labelsFromGeneratedSequenceFunction – function that accepts an instance of GeneratedSequence and returns an array of the labels (e.g. an array of ones and zeros indicating if the criteria for various labels are met)
 - 
generateLabels(generatedSequence)¶
- calls self.labelsFromGeneratedSequenceFunction. - Parameters: - generatedSequence – an instance of - GeneratedSequence
 
- 
class simdna.synthetic.LoadedEncodeMotifs(fileName, pseudocountProb=0.0, background=OrderedDict([('A', 0.27), ('C', 0.23), ('G', 0.23), ('T', 0.27)]))¶
- Bases: - simdna.synthetic.AbstractLoadedMotifs- A class for reading in a motifs file in the ENCODE motifs format. - This class is specifically for reading files in the encode motif format - specifically the motifs.txt file that contains Pouya’s motifs (http://compbio.mit.edu/encode-motifs/motifs.txt) - Basically, the motif declarations start with a >, the first characters after > until the first space are taken as the motif name, the lines after the line with a > have the format: “<ignored character> <prob of A> <prob of C> <prob of G> <prob of T>” - 
getReadPwmAction(recordedPwms)¶
- See superclass. 
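The file layout described for LoadedEncodeMotifs can be parsed in a few lines. The following standalone sketch returns plain lists rather than pwm.PWM objects; `parse_encode_motifs`, the motif name `EXAMPLE_motif1`, and the probabilities are hypothetical and serve only to show the layout.

```python
def parse_encode_motifs(lines):
    """Parse the layout described above into {name: [[pA, pC, pG, pT], ...]}."""
    motifs = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is everything after ">" up to the first space.
            name = line[1:].split(" ", 1)[0]
            current = motifs.setdefault(name, [])
        elif current is not None:
            fields = line.split()
            # fields[0] is the ignored character; the rest are A, C, G, T.
            current.append([float(x) for x in fields[1:5]])
    return motifs

# Made-up example content, not taken from the real motifs.txt file.
example = """>EXAMPLE_motif1 extra annotation that is ignored
C 0.10 0.70 0.10 0.10
T 0.05 0.05 0.05 0.85
""".splitlines()
parsed_motifs = parse_encode_motifs(example)
```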
 
- 
- 
class simdna.synthetic.MinMaxWrapper(quantityGenerator, theMin=None, theMax=None, name=None)¶
- Bases: - simdna.synthetic.AbstractQuantityGenerator- Compress a distribution to lie within a min and a max. - Wrapper that restricts a distribution to only return values between the min and the max. If a value outside the range is returned, resamples until it obtains a value within the range. Warns every time it tries to resample 10 times without successfully finding a value in the correct range. - Parameters: - quantityGenerator – instance of AbstractQuantityGenerator. Used to draw samples from the distribution to truncate
- theMin – can be None; if so will be ignored.
- theMax – can be None; if so will be ignored.
 - 
generateQuantity()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
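The resampling behaviour described for MinMaxWrapper is a form of rejection sampling. The sketch below is a standalone illustration under assumptions: `sample_within_bounds` is a hypothetical name, the bounds are treated as inclusive, and the warning text is invented.

```python
import warnings

def sample_within_bounds(draw, the_min=None, the_max=None, warn_every=10):
    """Redraw from draw() until the value lies in [the_min, the_max]."""
    tries = 0
    while True:
        value = draw()
        tries += 1
        # A bound of None is ignored, as in the parameter description above.
        too_low = the_min is not None and value < the_min
        too_high = the_max is not None and value > the_max
        if not (too_low or too_high):
            return value
        if tries % warn_every == 0:
            warnings.warn(
                "%d draws fell outside [%s, %s]" % (tries, the_min, the_max)
            )

# Deterministic stand-in for a quantity generator: 12 and -3 are rejected.
_values = iter([12, -3, 5])
accepted = sample_within_bounds(lambda: next(_values), the_min=0, the_max=10)
```

If the bounds exclude nearly all of the underlying distribution, a loop like this can run for a very long time, which is presumably why the wrapper warns after repeated failures.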
 
- 
- 
class simdna.synthetic.Mutation(index, previous, new, parentLength=None)¶
- Bases: - object- Represent a single bp mutation in a motif sequence. - Useful for creating simulations involving SNPs. - Parameters: - index – the position idx within the motif of the mutation
- previous – character, the previous base at this position
- new – character, the new base at this position after the mutation
- parentLength – optional; length of the motif. Used for assertion checks.
 - 
applyMutation(stringArr)¶
- Set the base at the position of the mutation to the mutated value. - Modifies stringArr which is an array of characters. - Parameters: - stringArr – an array of characters, which gets modified. 
 - 
parentLengthAssertionCheck(stringArr)¶
- Checks that stringArr is consistent with parentLength if defined. 
 - 
revert(stringArr)¶
- Set the base at the position of the mutation to the unmutated value. - Modifies stringArr which is an array of characters. - Parameters: - stringArr – an array of characters, which gets modified. 
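Applying and reverting a single-base mutation on a character array can be sketched as follows. `MutationSketch` is a hypothetical standalone class; the checks that the current base matches the expected one are an assumption added for illustration, and the motif string is made up.

```python
class MutationSketch:
    """Hypothetical stand-in for a single-base mutation in a motif."""

    def __init__(self, index, previous, new, parent_length=None):
        self.index = index
        self.previous = previous
        self.new = new
        self.parent_length = parent_length

    def _check_length(self, string_arr):
        # Only checked when a parent length was supplied.
        if self.parent_length is not None:
            assert len(string_arr) == self.parent_length

    def apply(self, string_arr):
        """Modify string_arr in place so it carries the mutated base."""
        self._check_length(string_arr)
        assert string_arr[self.index] == self.previous  # assumed sanity check
        string_arr[self.index] = self.new

    def revert(self, string_arr):
        """Modify string_arr in place so it carries the original base again."""
        self._check_length(string_arr)
        assert string_arr[self.index] == self.new  # assumed sanity check
        string_arr[self.index] = self.previous

motif = list("GATAAG")
snp = MutationSketch(index=2, previous="T", new="C", parent_length=6)
snp.apply(motif)
mutated = "".join(motif)
snp.revert(motif)
restored = "".join(motif)
```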
 
- 
class simdna.synthetic.OutsideCentralBp(centralBp, name=None)¶
- Bases: - simdna.synthetic.AbstractPositionGenerator- For embedding only OUTSIDE a central region of a background seq. - Returns a position OUTSIDE the central region of a background sequence,
- sampled uniformly at random. Complement of InsideCentralBp.
 - Parameters: - centralBp – int, the centralBp to avoid embedding in. See the docs for InsideCentralBp for more details (this is the complement). - 
getJsonableObject()¶
- See superclass. 
 
- 
class simdna.synthetic.PairEmbeddable(embeddable1, embeddable2, separation, embeddableDescription='', nothingInBetween=True)¶
- Bases: - simdna.synthetic.AbstractEmbeddable- Embed two embeddables with some separation. - Parameters: - embeddable1 – instance of AbstractEmbeddable. First embeddable to be embedded. If a string is provided, will be wrapped in StringEmbeddable
- embeddable2 – second embeddable to be embedded. Type information similar to that of embeddable1
- separation – int of distance separating embeddable1 and embeddable2
- embeddableDescription – a concise descriptive string prefixed in front when generating a __str__ representation of the embeddable. Should not contain a hyphen.
- nothingInBetween – if true, then nothing else is allowed to be embedded in the gap between embeddable1 and embeddable2.
 - 
canEmbed(priorEmbeddedThings, startPos)¶
- See superclass. 
 - 
embedInBackgroundStringArr(priorEmbeddedThings, backgroundStringArr, startPos)¶
- See superclass. - If self.nothingInBetween, then all the intervening positions between the two embeddables will be marked as occupied. Otherwise, only the positions occupied by the embeddables will be marked as occupied.
 - 
getDescription()¶
- See superclass. 
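The effect of nothingInBetween on which positions count as occupied can be shown with a small standalone function. It assumes that separation is the number of positions between the end of the first embeddable and the start of the second; the function name and the example numbers are hypothetical.

```python
def pair_occupied_positions(start_pos, len1, len2, separation,
                            nothing_in_between=True):
    """Return the set of positions marked occupied when a pair is embedded."""
    start2 = start_pos + len1 + separation  # assumed meaning of separation
    end2 = start2 + len2
    if nothing_in_between:
        # The gap is blocked too, so the whole span is occupied.
        return set(range(start_pos, end2))
    # Only the two embeddables themselves are occupied; the gap stays free.
    return set(range(start_pos, start_pos + len1)) | set(range(start2, end2))

blocked_gap = pair_occupied_positions(10, 3, 2, 4, nothing_in_between=True)
open_gap = pair_occupied_positions(10, 3, 2, 4, nothing_in_between=False)
```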
 
- 
class simdna.synthetic.PairEmbeddableGenerator(embeddableGenerator1, embeddableGenerator2, separationGenerator, name=None)¶
- Bases: - simdna.synthetic.AbstractEmbeddableGenerator- Embed a pair of embeddables with some separation. - Parameters: - embeddableGenerator1 – instance of AbstractEmbeddableGenerator. If an AbstractSubstringGenerator is provided, will be wrapped in an instance of SubstringEmbeddableGenerator.
- embeddableGenerator2 – same type information as for embeddableGenerator1
- separationGenerator – instance of AbstractQuantityGenerator
- name – string, see DefaultNameMixin
 - 
generateEmbeddable()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
 
- 
- 
class simdna.synthetic.PoissonQuantityGenerator(mean, name=None)¶
- Bases: - simdna.synthetic.AbstractQuantityGenerator- Generates values according to a poisson distribution. - Parameters: - mean – the mean of the poisson distribution - 
generateQuantity()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
 
- 
- 
class simdna.synthetic.PriorEmbeddedThings_numpyArrayBacked(seqLen)¶
- Bases: - simdna.synthetic.AbstractPriorEmbeddedThings- A numpy-array based implementation of - AbstractPriorEmbeddedThings.- Uses a numpy array where positions are set to 1 if they are occupied, to determine which positions are occupied and which are not. See superclass for more documentation. - Parameters: - seqLen – integer indicating length of the sequence you are embedding in - 
addEmbedding(startPos, what)¶
- See superclass. 
 - 
canEmbed(startPos, endPos)¶
- See superclass. 
 - 
getEmbeddings()¶
- See superclass. 
 - 
getNumOccupiedPos()¶
- See superclass. 
 - 
getTotalPos()¶
- See superclass. 
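The occupancy bookkeeping behind this class can be sketched without numpy. The standalone class below uses a `bytearray` in place of a numpy array to stay dependency-free; `OccupancySketch` and its method names are hypothetical, and refusing overlapping embeddings in `add_embedding` is an assumption made for the sketch.

```python
class OccupancySketch:
    """Hypothetical occupancy tracker: 1 marks an occupied position."""

    def __init__(self, seq_len):
        self.occupied = bytearray(seq_len)  # all zeros: every position is free
        self.embeddings = []                # (start_pos, what) pairs

    def can_embed(self, start_pos, end_pos):
        # end_pos is exclusive, matching array-slicing semantics.
        if start_pos < 0 or end_pos > len(self.occupied):
            return False
        return not any(self.occupied[start_pos:end_pos])

    def add_embedding(self, start_pos, what):
        end_pos = start_pos + len(what)
        if not self.can_embed(start_pos, end_pos):
            raise ValueError("positions are out of range or already occupied")
        self.occupied[start_pos:end_pos] = b"\x01" * len(what)
        self.embeddings.append((start_pos, what))

    def num_occupied(self):
        return sum(self.occupied)

    def total_pos(self):
        return len(self.occupied)

tracker = OccupancySketch(20)
tracker.add_embedding(5, "GATAAG")  # occupies positions 5 through 10
```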
 
- 
- 
class simdna.synthetic.PwmSampler(pwm, name=None)¶
- Bases: - simdna.synthetic.AbstractSubstringGenerator- Samples from a pwm by calling self.pwm.sampleFromPwm. - Parameters: - pwm – an instance of pwm.PWM
- name – see DefaultNameMixin
 - 
generateSubstring()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
 
- 
class simdna.synthetic.PwmSamplerFromLoadedMotifs(loadedMotifs, motifName, name=None)¶
- Bases: - simdna.synthetic.PwmSampler- Instantiates a PwmSampler from a LoadedEncodeMotifs file. - Convenience wrapper class for instantiating PwmSampler by pulling the pwm.PWM object using the provided name from an AbstractLoadedMotifs object. - Parameters: - loadedMotifs – instance of AbstractLoadedMotifs
- motifName – string, name of a motif in AbstractLoadedMotifs
- name – see DefaultNameMixin
 - 
getJsonableObject()¶
- See superclass. 
 
- 
class simdna.synthetic.RandomSubsetOfEmbedders(quantityGenerator, embedders, name=None)¶
- Bases: - simdna.synthetic.AbstractEmbedder - Call some random subset of supplied embedders. - Takes a quantity generator that generates a quantity of embedders, and executes that many embedders from a supplied set, in sequence. - Parameters: - quantityGenerator – instance of AbstractQuantityGenerator
- embedders – a list of AbstractEmbedder objects
 - 
getJsonableObject()¶
- See superclass. 
 
- 
class simdna.synthetic.RepeatedEmbedder(embedder, quantityGenerator, name=None)¶
- Bases: - simdna.synthetic.AbstractEmbedder - Call an embedder multiple times. - Wrapper around an embedder to call it multiple times according to samples from a distribution. First calls self.quantityGenerator to get the quantity, then calls self.embedder a number of times equal to the value returned. - Parameters: - embedder – instance of AbstractEmbedder
- quantityGenerator – instance of AbstractQuantityGenerator
 - 
getJsonableObject()¶
- See superclass. 
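
The two-step behaviour of RepeatedEmbedder (sample a quantity, then call the embedder that many times) can be sketched with plain callables. The names `repeat_embedder`, `embed_once` and `sample_quantity` are hypothetical and stand in for the embedder and quantity generator objects; this is not the library's code.

```python
def repeat_embedder(embed_once, sample_quantity):
    """Sample a quantity, then call embed_once that many times."""
    quantity = sample_quantity()
    for _ in range(quantity):
        embed_once()
    return quantity


calls = []
# A fixed quantity of 3 stands in for a draw from a quantity generator.
num_calls = repeat_embedder(lambda: calls.append("motif"), lambda: 3)
```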
 
- 
class simdna.synthetic.RepeatedSubstringBackgroundGenerator(substringGenerator, repetitions)¶
- Bases: - simdna.synthetic.AbstractBackgroundGenerator - Repeatedly call a substring generator and concatenate the result. - Can be used to generate variable-length sequences. - Parameters: - substringGenerator – instance of AbstractSubstringGenerator
- repetitions – instance of AbstractQuantityGenerator. If an int is passed, a FixedQuantityGenerator will be created from it. This will be called to determine the number of times to generate a substring from self.substringGenerator
 - Returns: - The concatenation of all the calls to self.substringGenerator - 
generateBackground()¶
 - 
getJsonableObject()¶
- See superclass. 
 
- 
class simdna.synthetic.ReverseComplementWrapper(substringGenerator, reverseComplementProb=0.5, name=None)¶
- Bases: - simdna.synthetic.AbstractSubstringGenerator - Reverse complements a string with a specified probability. - Wrapper around an instance of AbstractSubstringGenerator that reverse complements the generated string with a specified probability. - Parameters: - substringGenerator – instance of AbstractSubstringGenerator
- reverseComplementProb – probability of reverse complementation. Defaults to 0.5.
- name – see DefaultNameMixin.
 - 
generateSubstring()¶
 - 
getJsonableObject()¶
- See superclass. 
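
A minimal standalone sketch of the idea behind ReverseComplementWrapper: reverse complement a generated string with a given probability. The function names below are hypothetical and the code does not use simdna; it only illustrates the behaviour described above for plain A/C/G/T strings.

```python
import random

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def maybe_reverse_complement(seq, prob=0.5, rng=random):
    """Return the reverse complement with probability prob, else seq unchanged."""
    if rng.random() < prob:
        return reverse_complement(seq)
    return seq


flipped = reverse_complement("AACG")  # "CGTT"
```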
 
- 
class simdna.synthetic.RevertToReference(setOfMutations, name=None)¶
- Bases: - simdna.synthetic.AbstractTransformation - For a series of mutations, reverts the supplied character to the reference (“unmutated”) string. - Parameters: - setOfMutations – instance of AbstractSetOfMutations
- name – see DefaultNameMixin.
 - 
getJsonableObject()¶
- See superclass. 
 - 
transform(stringArr)¶
- See superclass. 
 
- 
class simdna.synthetic.SampleFromDiscreteDistributionSubstringGenerator(discreteDistribution)¶
- Bases: - simdna.synthetic.AbstractSubstringGenerator - Generate a substring by sampling from a distribution. - If the “substrings” are single characters (A/C/G/T), can be used in conjunction with RepeatedSubstringBackgroundGenerator to generate sequences with a certain GC content. - Parameters: - discreteDistribution – instance of util.DiscreteDistribution - 
generateSubstring()¶
 - 
getJsonableObject()¶
- See superclass. 
 
- 
- 
class simdna.synthetic.StringEmbeddable(string, stringDescription='')¶
- Bases: - simdna.synthetic.AbstractEmbeddable - A string that is to be embedded in a background. - Represents a string (such as a sampling from a pwm) that is to be embedded in a background. See docs for superclass. - Parameters: - string – the core string to be embedded
- stringDescription – a short descriptor prefixed before the __str__ representation of the embeddable. Should not contain a hyphen. Defaults to “”.
 - 
canEmbed(priorEmbeddedThings, startPos)¶
- See superclass. 
 - 
embedInBackgroundStringArr(priorEmbeddedThings, backgroundStringArr, startPos)¶
- See superclass. 
 - 
classmethod fromString(theString)¶
- Generates a StringEmbeddable from the provided string. - Parameters: - theString – string of the format stringDescription-coreString. Will then return: StringEmbeddable(string=coreString, stringDescription=stringDescription) - Returns: - An instance of StringEmbeddable
 - 
getDescription()¶
- See superclass. 
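
The `stringDescription-coreString` format accepted by fromString can be illustrated with a small parser. This is a sketch under assumptions, not the library's parsing logic: the helper name `split_embeddable_string` is hypothetical, the split is made at the last hyphen (the core string is assumed to contain none), and a string without a hyphen is treated as a bare core string with an empty description.

```python
def split_embeddable_string(the_string):
    """Split 'description-coreString' into (description, core_string)."""
    # rpartition returns ('', '', the_string) when no hyphen is present,
    # so a bare core string yields an empty description.
    description, _, core_string = the_string.rpartition("-")
    return description, core_string


# "motifA" is a made-up description used only for illustration.
parsed = split_embeddable_string("motifA-ACGTTG")  # ("motifA", "ACGTTG")
```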
 
- 
class simdna.synthetic.SubstringEmbeddableGenerator(substringGenerator, name=None)¶
- Bases: - simdna.synthetic.AbstractEmbeddableGenerator - Generates a StringEmbeddable. - Calls substringGenerator, wraps the result in a StringEmbeddable and returns it. - Parameters: - substringGenerator – instance of AbstractSubstringGenerator - 
generateEmbeddable()¶
 - 
getJsonableObject()¶
- See superclass. 
 
- 
- 
class simdna.synthetic.SubstringEmbedder(substringGenerator, positionGenerator=<simdna.synthetic.UniformPositionGenerator object>, name=None)¶
- Bases: - simdna.synthetic.EmbeddableEmbedder - Used to embed substrings. - Embeds a single generated substring within the background sequence, at a position sampled from a distribution. Only embeds at unoccupied positions. - Parameters: - substringGenerator – instance of AbstractSubstringGenerator
- positionGenerator – instance of AbstractPositionGenerator
- name – see DefaultNameMixin.
 
- 
class simdna.synthetic.TransformedSubstringGenerator(substringGenerator, transformations, transformationsDescription='transformations', name=None)¶
- Bases: - simdna.synthetic.AbstractSubstringGenerator - Generates a substring and applies a series of transformations. - Takes a substringGenerator and a set of AbstractTransformation objects, and applies the transformations to the generated substring. - Parameters: - substringGenerator – instance of AbstractSubstringGenerator
- transformations – an iterable of AbstractTransformation
- transformationsDescription – a string that will be prefixed in front of substringDescription (generated by substringGenerator.generateSubstring()) to produce the stringDescription.
- name – see DefaultNameMixin.
 - 
generateSubstring()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
 
- 
class simdna.synthetic.UniformIntegerGenerator(minVal, maxVal, name=None)¶
- Bases: - simdna.synthetic.AbstractQuantityGenerator - Randomly samples an integer from minVal to maxVal, inclusive. - Parameters: - minVal – minimum integer that can be sampled
- maxVal – maximum integer that can be sampled
- name – See superclass.
 - 
generateQuantity()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
 
- 
class simdna.synthetic.UniformPositionGenerator(name=None)¶
- Bases: - simdna.synthetic.AbstractPositionGenerator - Sample position uniformly at random. - Samples a start position to embed the substring in uniformly at random; does not return positions that are too close to the end of the background sequence to embed the full substring.
 - Parameters: - name – string, see DefaultNameMixin - 
getJsonableObject()¶
- See superclass. 
 
- 
class simdna.synthetic.XOREmbedder(embedder1, embedder2, probOfFirst, name=None)¶
- Bases: - simdna.synthetic.AbstractEmbedder - Calls exactly one of the supplied embedders. - Parameters: - embedder1 – instance of AbstractEmbedder
- embedder2 – instance of AbstractEmbedder
- probOfFirst – probability of calling the first embedder
 - 
getJsonableObject()¶
- See superclass. 
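
The either-or choice made by XOREmbedder can be sketched with two plain callables. The function name `call_exactly_one` is hypothetical; the snippet illustrates the documented behaviour (the first embedder is called with probability probOfFirst, otherwise the second) and is not the library's code.

```python
import random


def call_exactly_one(embedder1, embedder2, prob_of_first, rng=random):
    """Call embedder1 with probability prob_of_first, otherwise embedder2."""
    chosen = embedder1 if rng.random() < prob_of_first else embedder2
    return chosen()


rng = random.Random(0)  # seeded only to make the sketch repeatable
picks = [call_exactly_one(lambda: "first", lambda: "second", 0.8, rng)
         for _ in range(1000)]
fraction_first = picks.count("first") / len(picks)  # roughly 0.8
```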
 
- 
class simdna.synthetic.ZeroInflater(quantityGenerator, zeroProb, name=None)¶
- Bases: - simdna.synthetic.AbstractQuantityGenerator - Inflate a particular distribution with zeros. - Wrapper that inflates the number of zeros returned. Flips a coin; if positive, will return zero; otherwise will sample from the wrapped distribution (which may still return 0). - Parameters: - quantityGenerator – an instance of AbstractQuantityGenerator; represents the distribution to sample from with probability 1-zeroProb
- zeroProb – the probability of just returning 0 without sampling from quantityGenerator
- name – see DefaultNameMixin.
 - 
generateQuantity()¶
- See superclass. 
 - 
getJsonableObject()¶
- See superclass. 
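
The coin-flip logic described for ZeroInflater can be sketched as a standalone function. The name `zero_inflated_sample` and the callable `sample_fn` are hypothetical stand-ins for the wrapper and the wrapped quantity generator; this is an illustration, not simdna's code.

```python
import random


def zero_inflated_sample(sample_fn, zero_prob, rng=random):
    """Return 0 with probability zero_prob, otherwise delegate to sample_fn."""
    if rng.random() < zero_prob:
        return 0
    # The wrapped distribution may itself return 0.
    return sample_fn()


rng = random.Random(0)  # seeded only to make the sketch repeatable
draws = [zero_inflated_sample(lambda: rng.randint(1, 3), 0.5, rng)
         for _ in range(1000)]
zero_fraction = draws.count(0) / len(draws)  # roughly 0.5
```

Because the wrapped distribution in this sketch never returns 0, the observed fraction of zeros approximates `zero_prob` directly; with a wrapped distribution that can return 0, the fraction would be higher.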
 
- 
class simdna.synthetic.ZeroOrderBackgroundGenerator(seqLength, discreteDistribution=<simdna.util.DiscreteDistribution object>)¶
- Bases: - simdna.synthetic.RepeatedSubstringBackgroundGenerator - Returns a sequence with a certain GC content. - Each base is sampled independently. - Parameters: - seqLength – int, length of the background
- discreteDistribution – instance of util.DiscreteDistribution, defaults to util.DEFAULT_BASE_DISCRETE_DISTRIBUTION
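
Sampling each base independently to reach a target GC content can be sketched as below. This is a standalone illustration, not the library's implementation: the function `zero_order_background` and its `gc_fraction` parameter are hypothetical, and the sketch assumes G and C are equally likely, as are A and T.

```python
import random


def zero_order_background(seq_length, gc_fraction=0.5, rng=random):
    """Sample seq_length bases independently with the given GC fraction."""
    at_weight = (1.0 - gc_fraction) / 2.0  # probability of A, and of T
    gc_weight = gc_fraction / 2.0          # probability of C, and of G
    weights = [at_weight, gc_weight, gc_weight, at_weight]
    return "".join(rng.choices("ACGT", weights=weights, k=seq_length))


rng = random.Random(0)  # seeded only to make the sketch repeatable
background = zero_order_background(2000, gc_fraction=0.4, rng=rng)
observed_gc = sum(base in "GC" for base in background) / len(background)
```

The observed GC fraction fluctuates around the target for any finite sequence, since each base is an independent draw.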
 
- 
simdna.synthetic.getEmbeddingsFromString(string)¶
- Get a series of Embedding objects from a string. - Splits the string on commas, and then passes the comma-separated vals to Embedding.fromString()
 - Parameters: - string – The string to turn into an array of Embedding objects - Returns: - an array of Embedding objects
- 
simdna.synthetic.printSequences(outputFileName, sequenceSetGenerator, includeEmbeddings=False, labelGenerator=None, includeFasta=False, prefix=None)¶
- Print a series of synthetic sequences. - Given an output filename and an instance of AbstractSequenceSetGenerator, will call the sequenceSetGenerator and print the generated sequences to the output file. Will also create a file “info_outputFileName.txt” in the same directory as outputFileName that contains all the information about sequenceSetGenerator.
 - Parameters: - outputFileName – string
- sequenceSetGenerator – instance of AbstractSequenceSetGenerator
- includeEmbeddings – a boolean indicating whether to print a column that lists the embeddings
- labelGenerator – optional instance of LabelGenerator
- includeFasta – optional boolean indicating whether to also print out the generated sequences in fasta format (the file will be produced with a .fa extension)
- prefix – string - this will be prefixed in front of the generated sequence ids, followed by a hyphen
 
- 
simdna.synthetic.sampleIndexWithinRegionOfLength(length, lengthOfThingToEmbed)¶
- Uniformly at random samples integers from 0 to length - lengthOfThingToEmbed. - Parameters: - length – length of full region that could be embedded in
- lengthOfThingToEmbed – length of thing being embedded in larger region
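
The range described above can be illustrated with a short standalone sketch. The function name `sample_start_index` is hypothetical, and the upper bound `length - lengthOfThingToEmbed` is assumed to be inclusive, since a thing of that length still fits when it starts there.

```python
import random


def sample_start_index(length, length_of_thing_to_embed, rng=random):
    """Sample a start index such that the embedded thing fits in the region."""
    if length_of_thing_to_embed > length:
        raise ValueError("thing to embed is longer than the region")
    # randint is inclusive on both ends, matching the assumed range.
    return rng.randint(0, length - length_of_thing_to_embed)


rng = random.Random(0)  # seeded only to make the sketch repeatable
starts = [sample_start_index(20, 6, rng) for _ in range(500)]
```

Every sampled start satisfies `start + 6 <= 20`, so the embedded item never runs past the end of the region.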