simdna package¶
simdna.simulations module¶
-
simdna.simulations.get_distribution(GC_fraction)¶
-
simdna.simulations.motif_density(motif_name, seq_length, num_seqs, min_counts, max_counts, GC_fraction, central_bp=None)¶ Returns sequences with motif density.
-
simdna.simulations.simple_motif_embedding(motif_name, seq_length, num_seqs, GC_fraction)¶ Returns a sequence array.
-
simdna.simulations.simulate_differential_accessibility(pos_motif_names, neg_motif_names, seq_length, min_num_motifs, max_num_motifs, num_pos, num_neg, GC_fraction)¶ Generates data for differential accessibility task.
Parameters: - pos_motif_names (list) – List of strings.
- neg_motif_names (list) – List of strings.
- seq_length (int) –
- min_num_motifs (int) –
- max_num_motifs (int) –
- num_pos (int) –
- num_neg (int) –
- GC_fraction (float) –
Returns: - sequence_arr (1darray) – Contains sequence strings.
- y (1darray) – Contains labels.
-
simdna.simulations.simulate_heterodimer_grammar(motif1, motif2, seq_length, min_spacing, max_spacing, num_pos, num_neg, GC_fraction)¶ Simulates two classes of sequences with motif1 and motif2:
- Positive class sequences with motif1 and motif2 positioned between min_spacing and max_spacing apart.
- Negative class sequences with independent motif1 and motif2 positioned anywhere in the sequence, not as a heterodimer grammar.
Parameters: - seq_length (int) – length of sequence
- GC_fraction (float) – GC fraction in background sequence
- num_pos (int) – number of positive class sequences
- num_neg (int) – number of negative class sequences
- motif1 (str) – ENCODE motif name
- motif2 (str) – ENCODE motif name
- min_spacing (int) – minimum inter-motif spacing
- max_spacing (int) – maximum inter-motif spacing
Returns: - sequence_arr (1darray) – Array with sequence strings.
- y (1darray) – Array with positive/negative class labels.
-
simdna.simulations.simulate_motif_counting(motif_name, seq_length, pos_counts, neg_counts, num_pos, num_neg, GC_fraction)¶ Generates data for motif counting task.
Parameters: - motif_name (str) –
- seq_length (int) –
- pos_counts (list) – (min_counts, max_counts) for positive set.
- neg_counts (list) – (min_counts, max_counts) for negative set.
- num_pos (int) –
- num_neg (int) –
- GC_fraction (float) –
Returns: - sequence_arr (1darray) – Contains sequence strings.
- y (1darray) – Contains labels.
-
simdna.simulations.simulate_motif_density_localization(motif_name, seq_length, center_size, min_motif_counts, max_motif_counts, num_pos, num_neg, GC_fraction)¶ Simulates two classes of sequences:
- Positive class sequences with multiple motif instances in the center of the sequence.
- Negative class sequences with multiple motif instances anywhere in the sequence.
The number of motif instances is uniformly sampled between the minimum and maximum motif counts.
Parameters: - motif_name (str) – ENCODE motif name
- seq_length (int) – length of sequence
- center_size (int) – length of central part of the sequence where motifs can be positioned
- min_motif_counts (int) – minimum number of motif instances
- max_motif_counts (int) – maximum number of motif instances
- num_pos (int) – number of positive class sequences
- num_neg (int) – number of negative class sequences
- GC_fraction (float) – GC fraction in background sequence
Returns: - sequence_arr (1darray) – Contains sequence strings.
- y (1darray) – Contains labels.
-
simdna.simulations.simulate_multi_motif_embedding(motif_names, seq_length, min_num_motifs, max_num_motifs, num_seqs, GC_fraction)¶ Generates data for multi motif recognition task.
Parameters: - motif_names (list) – List of strings.
- seq_length (int) –
- min_num_motifs (int) –
- max_num_motifs (int) –
- num_seqs (int) –
- GC_fraction (float) –
Returns: - sequence_arr (1darray) – Contains sequence strings.
- y (ndarray) – Contains labels for each motif.
-
simdna.simulations.simulate_single_motif_detection(motif_name, seq_length, num_pos, num_neg, GC_fraction)¶ Simulates two classes of sequences:
- Positive class sequences with a motif embedded anywhere in the sequence
- Negative class sequences without the motif
Parameters: - motif_name (str) – ENCODE motif name
- seq_length (int) – length of sequence
- num_pos (int) – number of positive class sequences
- num_neg (int) – number of negative class sequences
- GC_fraction (float) – GC fraction in background sequence
Returns: - sequence_arr (1darray) – Array with sequence strings.
- y (1darray) – Array with positive/negative class labels.
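As a rough illustration of the kind of dataset simulate_single_motif_detection describes, the sketch below builds one in plain Python without calling simdna. The helper names (random_background, embed, toy_single_motif_detection) and the fixed motif string are hypothetical; simdna works with named ENCODE motifs rather than a literal string, so treat this as a picture of the idea and not the library's implementation.

```python
import random

def random_background(seq_length, gc_fraction, rng):
    """Sample background characters with the requested expected GC fraction."""
    at = (1.0 - gc_fraction) / 2.0
    gc = gc_fraction / 2.0
    return rng.choices("ACGT", weights=[at, gc, gc, at], k=seq_length)

def embed(background, motif, rng):
    """Overwrite a uniformly chosen window of the background with the motif."""
    start = rng.randrange(len(background) - len(motif) + 1)
    background[start:start + len(motif)] = motif
    return "".join(background)

def toy_single_motif_detection(motif, seq_length, num_pos, num_neg,
                               gc_fraction, seed=0):
    """Return (sequences, labels): positives contain the motif, negatives are background."""
    rng = random.Random(seed)
    pos = [embed(random_background(seq_length, gc_fraction, rng), motif, rng)
           for _ in range(num_pos)]
    # Negatives are plain background; this sketch does not filter chance matches.
    neg = ["".join(random_background(seq_length, gc_fraction, rng))
           for _ in range(num_neg)]
    return pos + neg, [1] * num_pos + [0] * num_neg

seqs, y = toy_single_motif_detection("GATAAG", 50, 3, 2, 0.4)
```

Here seqs plays the role of sequence_arr and y the role of the label array, as plain lists instead of 1darrays.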
simdna.synthetic module¶
-
class
simdna.synthetic.AbstractApplySingleMutationFromSet(setOfMutations, name=None)¶ Bases: simdna.synthetic.AbstractTransformation
Class for applying a single mutation from a set of mutations; used to transform substrings generated by another method.
Parameters: - setOfMutations – instance of AbstractSetOfMutations
- name – see DefaultNameMixin.
-
getJsonableObject()¶ See superclass.
-
selectMutation()¶ Chooses a mutation from the set of mutations to apply.
Returns: an instance of Mutation
-
transform(stringArr)¶ See superclass.
-
class
simdna.synthetic.AbstractBackgroundGenerator¶ Bases: object
Returns the sequence that AbstractEmbeddable objects are to be embedded into.
-
generateBackground()¶ Returns a sequence that is the background.
-
getJsonableObject()¶ Get JSON object representation.
Returns: A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated.
-
-
class
simdna.synthetic.AbstractEmbeddable¶ Bases: object
Represents a thing which can be embedded.
An AbstractEmbeddable + a position = an Embedding
-
canEmbed(priorEmbeddedThings, startPos)¶ Checks whether embedding is possible at a given pos.
Accepts an instance of AbstractPriorEmbeddedThings and a startPos, and checks if startPos is viable given the contents of priorEmbeddedThings.
Parameters: - priorEmbeddedThings – instance of AbstractPriorEmbeddedThings
- startPos – int; the position you are considering embedding self at
Returns: A boolean indicating whether self can be embedded at startPos, given the things that have already been embedded.
-
embedInBackgroundStringArr(priorEmbeddedThings, backgroundStringArr, startPos)¶ Embed self in a background string.
Will embed self at startPos in backgroundStringArr, and will update priorEmbeddedThings accordingly.
Parameters: - priorEmbeddedThings – instance of AbstractPriorEmbeddedThings
- backgroundStringArr – an array of characters representing the background
- startPos – integer; the position to embed self at
-
classmethod
fromString(theString)¶ Generate an instance of the embeddable from the provided string.
-
getDescription()¶ Return a concise description of the embeddable.
This should be concise and shouldn’t contain spaces. It will often be used when generating the __str__ representation of the embeddable.
-
-
class
simdna.synthetic.AbstractEmbeddableGenerator(name)¶ Bases: simdna.synthetic.DefaultNameMixin
Generates an embeddable, usually for embedding in a background sequence.
-
generateEmbeddable()¶ Generate an embeddable object.
Returns: An instance of AbstractEmbeddable
-
getJsonableObject()¶ Get JSON object representation.
Returns: A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated.
-
-
class
simdna.synthetic.AbstractEmbedder(name)¶ Bases: simdna.synthetic.DefaultNameMixin
Produces AbstractEmbeddable objects and embeds them in a sequence.
-
embed(backgroundStringArr, priorEmbeddedThings, additionalInfo=None)¶ Embeds things in the provided backgroundStringArr.
Modifies backgroundStringArr to include whatever has been embedded.
Parameters: - backgroundStringArr – array of characters representing the background string
- priorEmbeddedThings – instance of AbstractPriorEmbeddedThings
- additionalInfo – instance of AdditionalInfo; allows the embedder to send back info about what it did
Returns: The modified backgroundStringArr
-
getJsonableObject()¶ Get JSON object representation.
Returns: A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated.
-
-
class
simdna.synthetic.AbstractLoadedMotifs(fileName, pseudocountProb=0.0, background=OrderedDict([('A', 0.27), ('C', 0.23), ('G', 0.23), ('T', 0.27)]))¶ Bases: object
Class representing loaded PWMs.
A class that contains instances of pwm.PWM loaded from a file. The PWMs can be accessed by name.
Parameters: - fileName – string, the path to the file to load
- pseudocountProb – if some of the PWMs have 0 probability for some of the positions, will add the specified pseudocountProb to the rows of the PWM and renormalise.
- background – a dictionary with ACGT as the keys and the frequency as the values. Defaults to util.DEFAULT_BACKGROUND_FREQ
-
getJsonableObject()¶ Get JSON object representation.
Returns: A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated.
-
getPwm(name)¶ Get a specific PWM.
Returns: The pwm.PWM instance with the specified name.
-
getReadPwmAction(recordedPwms)¶ Action performed when each line of the pwm text file is read in.
This function is to be overridden by a specific implementation. It is executed on each line of the file when it is read in, and when PWMs are ready they will get inserted into recordedPwms.
Parameters: recordedPwms – an OrderedDict that will be filled with PWMs. The keys will be the names of the PWMs and the values will be instances of pwm.PWM
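The pseudocountProb parameter of AbstractLoadedMotifs is described as adding a pseudocount to the rows of a PWM and renormalising. A minimal sketch of that arithmetic for one row follows; the function name add_pseudocount is hypothetical, and whether simdna adds the pseudocount to every entry in exactly this way is an assumption.

```python
def add_pseudocount(row, pseudocount_prob):
    """Add pseudocount_prob to every entry of one PWM row, then renormalise to sum to 1."""
    shifted = [p + pseudocount_prob for p in row]
    total = sum(shifted)
    return [p / total for p in shifted]

row = [1.0, 0.0, 0.0, 0.0]  # probabilities of A, C, G, T at one position
smoothed = add_pseudocount(row, 0.001)
```

After smoothing, no base has probability exactly zero, which avoids taking the logarithm of zero if log-odds scores are computed later.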
-
class
simdna.synthetic.AbstractPositionGenerator(name)¶ Bases: simdna.synthetic.DefaultNameMixin
Generate a start position at which to embed something.
Given the length of the background sequence and the length of the substring you are trying to embed, will return a start position to embed the substring at.
-
generatePos(lenBackground, lenSubstring, additionalInfo=None)¶ Generate the position to embed in.
Parameters: - lenBackground – int, length of background sequence
- lenSubstring – int, length of substring to embed
- additionalInfo – optional, instance of AdditionalInfo. Is used to leave a trace that this positionGenerator was called
Returns: An integer which is the start index to embed in.
-
getJsonableObject()¶ Get JSON object representation.
Returns: A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated.
-
-
class
simdna.synthetic.AbstractPriorEmbeddedThings¶ Bases: object
Keeps track of what has already been embedded in a sequence.
-
addEmbedding(startPos, what)¶ Records the embedding of an AbstractEmbeddable.
Embeds what from startPos to startPos+len(what). Creates an Embedding object.
Parameters: - startPos – int, the starting position at which to embed.
- what – instance of AbstractEmbeddable
-
canEmbed(startPos, endPos)¶ Test whether startPos-endPos is available for embedding.
Parameters: - startPos – int, starting index
- endPos – int, ending index+1 (same semantics as array-slicing)
Returns: True if startPos:endPos is available for embedding
-
getEmbeddings()¶ Returns: A collection of Embedding objects
-
getNumOccupiedPos()¶ Returns: Number of positions that are filled with some kind of embedding
-
getTotalPos()¶ Returns: Total number of positions (occupied and unoccupied) available to embed things in.
-
-
class
simdna.synthetic.AbstractQuantityGenerator(name)¶ Bases: simdna.synthetic.DefaultNameMixin
Class for sampling values from a distribution.
-
generateQuantity()¶ Sample a quantity from a distribution.
Returns: The sampled value.
-
getJsonableObject()¶ Get JSON object representation.
Returns: A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated.
-
-
class
simdna.synthetic.AbstractSequenceSetGenerator¶ Bases: object
A generator for a collection of generated sequences.
-
generateSequences()¶ The generator; implementation should have a yield.
Called as generatedSequences = sequenceSetGenerator.generateSequences(); generatedSequences can then be iterated over.
Returns: A generator of GeneratedSequence objects
-
getJsonableObject()¶ Get JSON object representation.
Returns: A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated.
-
-
class
simdna.synthetic.AbstractSetOfMutations(mutationsArr)¶ Bases: object
Represents a collection of Mutation objects.
Parameters: mutationsArr – array of Mutation objects
-
getJsonableObject()¶ Get JSON object representation.
Returns: A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated.
-
getMutationsArr()¶ Returns self.mutationsArr.
Returns: self.mutationsArr
-
-
class
simdna.synthetic.AbstractSingleSequenceGenerator(namePrefix=None)¶ Bases: object
Generate a single sequence.
Parameters: namePrefix – the GeneratedSequence object has a field for the object’s name; this is the prefix associated with that name. The suffix is the value of a counter that is incremented every time
-
generateSequence()¶ Generate the sequence.
Returns: An instance of GeneratedSequence
-
getJsonableObject()¶ Get JSON object representation.
Returns: A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated.
-
-
class
simdna.synthetic.AbstractSubstringGenerator(name)¶ Bases: simdna.synthetic.DefaultNameMixin
Generates a substring, usually for embedding in a background sequence.
-
generateSubstring()¶ Returns: A tuple of (string, stringDescription); the result can be wrapped in an instance of StringEmbeddable. stringDescription is a short descriptor that does not contain spaces and may be prefixed in front of string when generating the __str__ representation for StringEmbeddable.
-
getJsonableObject()¶ Get JSON object representation.
Returns: A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated.
-
-
class
simdna.synthetic.AbstractTransformation(name)¶ Bases: simdna.synthetic.DefaultNameMixin
Class representing a transformation applied to a character array.
Takes an array of characters, applies some transformation.
-
getJsonableObject()¶ Get JSON object representation.
Returns: A JSON-friendly object (built of dictionaries, lists and Python primitives), which can be converted to JSON to record the exact details of what was simulated.
-
transform(stringArr)¶ Applies a transformation to stringArr.
Parameters: stringArr – an array of characters.
Returns: An array of characters that has the transformation applied. May mutate stringArr
-
-
class
simdna.synthetic.AdditionalInfo¶ Bases: object
Used to keep track of which embedders/ops were called and how many times.
An instance of AdditionalInfo is meant to be an attribute of a GeneratedSequence object. It keeps track of things like embedders, position generators, etc.
Has self.trace, which is a dictionary from operatorName->int and which records operations that were called in the process of embedding things in the sequence. At the time of writing, operatorName is typically just the name of the embedder.
-
isInTrace(operatorName)¶ Return True if operatorName has been called on the sequence.
-
updateAdditionalInfo(operatorName, value)¶ Can be used to store any additional information on operatorName.
-
updateTrace(operatorName)¶ Increment count for the number of times operatorName was called.
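The trace described for AdditionalInfo can be pictured as a counter keyed by operator name. The sketch below is a hypothetical stand-in (ToyAdditionalInfo, with snake_case method names), not the simdna class; the embedder names are made up. It also shows how 0/1 labels could be derived from the trace, in the spirit of a label generator that checks which embedders were called.

```python
from collections import defaultdict

class ToyAdditionalInfo:
    """Hypothetical stand-in for AdditionalInfo: counts calls per operator name."""

    def __init__(self):
        self.trace = defaultdict(int)   # operatorName -> number of calls
        self.additional_info = {}       # operatorName -> arbitrary stored value

    def is_in_trace(self, operator_name):
        # Membership test does not create a new key in a defaultdict.
        return operator_name in self.trace

    def update_trace(self, operator_name):
        self.trace[operator_name] += 1

    def update_additional_info(self, operator_name, value):
        self.additional_info[operator_name] = value

info = ToyAdditionalInfo()
info.update_trace("GATA_embedder")
info.update_trace("GATA_embedder")

# Label is 1 if the named embedder was called on the sequence, else 0.
label_names = ["GATA_embedder", "TAL1_embedder"]
labels = [1 if info.is_in_trace(name) else 0 for name in label_names]
```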
-
class
simdna.synthetic.AllEmbedders(embedders, name=None)¶ Bases: simdna.synthetic.AbstractEmbedder
Wrapper around a list of embedders that calls each one in turn.
Useful to nest under a RandomSubsetOfEmbedders
Parameters: embedders – an iterable of AbstractEmbedder objects.
-
getJsonableObject()¶ See superclass.
-
-
class
simdna.synthetic.BernoulliQuantityGenerator(prob, name=None)¶ Bases: simdna.synthetic.AbstractQuantityGenerator
Generates 1 or 0 according to a Bernoulli distribution.
Parameters: prob – probability of 1
-
generateQuantity()¶ See superclass.
-
getJsonableObject()¶ See superclass.
-
-
class
simdna.synthetic.BestHitPwm(pwm, bestHitMode='pwmProb', name=None)¶ Bases: simdna.synthetic.AbstractSubstringGenerator
Always return the best possible match to a pwm.PWM when called.
Parameters: - pwm – an instance of pwm.PWM
- bestHitMode – one of the values in pwm.BEST_HIT_MODE. If pwmProb, then the best match will be determined according to what is most likely to be sampled from the pwm matrix (this is the default). If logOdds, then the best match will be determined according to what would result in the best match according to the log-odds matrix (so, taking the background into account).
- name – see DefaultNameMixin
-
generateSubstring()¶ See superclass.
-
getJsonableObject()¶ See superclass.
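The two bestHitMode values can give different answers when a base is probable but not enriched over background. The sketch below illustrates the distinction on a made-up two-position PWM; best_hit and the matrix values are hypothetical, the background frequencies are the defaults shown in the AbstractLoadedMotifs signature, and the log-odds score is assumed to be log(p / background).

```python
import math

BASES = "ACGT"

def best_hit(pwm_rows, background, mode="pwmProb"):
    """Return the highest-scoring base at each position of a toy PWM."""
    hit = []
    for row in pwm_rows:
        if mode == "pwmProb":
            scores = row  # most likely base to be sampled
        elif mode == "logOdds":
            # Enrichment over background; zero probabilities score -infinity.
            scores = [math.log(p / background[b]) if p > 0 else float("-inf")
                      for p, b in zip(row, BASES)]
        else:
            raise ValueError("unknown mode: " + mode)
        best_index = max(range(len(BASES)), key=lambda i: scores[i])
        hit.append(BASES[best_index])
    return "".join(hit)

background = {"A": 0.27, "C": 0.23, "G": 0.23, "T": 0.27}
pwm_rows = [
    [0.40, 0.38, 0.12, 0.10],  # A is most probable; C is more enriched over background
    [0.05, 0.05, 0.85, 0.05],  # G wins under both modes
]
by_prob = best_hit(pwm_rows, background, "pwmProb")
by_log_odds = best_hit(pwm_rows, background, "logOdds")
```

At the first position A has the larger probability (0.40 vs 0.38), but C has the larger ratio to its background frequency (0.38/0.23 vs 0.40/0.27), so the two modes disagree there.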
-
class
simdna.synthetic.BestHitPwmFromLoadedMotifs(loadedMotifs, motifName, bestHitMode='pwmProb', name=None)¶ Bases: simdna.synthetic.BestHitPwm
Instantiates BestHitPwm using a LoadedMotifs file. Analogous to PwmSamplerFromLoadedMotifs.
-
getJsonableObject()¶ See superclass.
-
-
class
simdna.synthetic.ChooseMutationAtRandom(setOfMutations, name=None)¶ Bases: simdna.synthetic.AbstractApplySingleMutationFromSet
Selects a mutation at random from self.setOfMutations to apply.
-
selectMutation()¶
-
-
class
simdna.synthetic.ChooseValueFromASet(setOfPossibleValues, name=None)¶ Bases: simdna.synthetic.AbstractQuantityGenerator
Randomly samples a particular value from a set of values.
Parameters: - setOfPossibleValues – array of values that will be randomly sampled from.
- name – see DefaultNameMixin.
-
generateQuantity()¶ See superclass.
-
getJsonableObject()¶ See superclass.
-
-
class
simdna.synthetic.DefaultNameMixin(name)¶ Bases: object
Basic functionality for classes that have a self.name attribute.
The self.name attribute is typically used to leave a trace in an instance of AdditionalInfo
Parameters: name – string
-
getDefaultName()¶
-
-
class
simdna.synthetic.EmbedInABackground(backgroundGenerator, embedders, namePrefix=None)¶ Bases: simdna.synthetic.AbstractSingleSequenceGenerator
Generate a background sequence and embed smaller sequences in it.
Takes a backgroundGenerator and a series of embedders. Will generate the background and then call each of the embedders in succession. Then returns the result.
Parameters: - backgroundGenerator – instance of AbstractBackgroundGenerator
- embedders – array of instances of AbstractEmbedder
- namePrefix – see parent
-
generateSequence()¶ Produce the sequence.
Generates a background using self.backgroundGenerator, splits it into an array, and passes it to each of self.embedders in turn for embedding things.
Returns: An instance of GeneratedSequence
-
getJsonableObject()¶ See superclass.
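The flow described for generateSequence (generate a background, split it into an array, hand it to each embedder in turn) can be sketched without simdna as follows. All names here (generate_background, make_fixed_embedder, embed_in_a_background) are hypothetical, embedders are modelled as plain functions, and occupancy is tracked with a simple list of booleans; this is an assumption-laden illustration, not the library's code.

```python
import random

def generate_background(length, rng):
    """Uniform random background string over A/C/G/T."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def make_fixed_embedder(substring, rng):
    """Return an embedder that writes `substring` at a random unoccupied position."""
    def embedder(string_arr, occupied):
        width = len(substring)
        candidates = [s for s in range(len(string_arr) - width + 1)
                      if not any(occupied[s:s + width])]
        if not candidates:
            return  # nowhere left to embed; this sketch silently skips
        start = rng.choice(candidates)
        string_arr[start:start + width] = substring
        occupied[start:start + width] = [True] * width
    return embedder

def embed_in_a_background(length, embedders, rng):
    string_arr = list(generate_background(length, rng))  # split into characters
    occupied = [False] * length                           # positions already used
    for embedder in embedders:                            # call each embedder in turn
        embedder(string_arr, occupied)
    return "".join(string_arr)

rng = random.Random(0)
seq = embed_in_a_background(
    40,
    [make_fixed_embedder("GATAAG", rng), make_fixed_embedder("CACGTG", rng)],
    rng,
)
```

Because the second embedder only considers unoccupied windows, it cannot overwrite what the first one placed.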
-
class
simdna.synthetic.EmbeddableEmbedder(embeddableGenerator, positionGenerator=<simdna.synthetic.UniformPositionGenerator object>, name=None)¶ Bases: simdna.synthetic.AbstractEmbedder
Embeds an instance of AbstractEmbeddable at a sampled pos.
Embeds instances of AbstractEmbeddable within the background sequence, at a position sampled from a distribution. Only embeds at unoccupied positions.
Parameters: - embeddableGenerator – instance of AbstractEmbeddableGenerator
- positionGenerator – instance of AbstractPositionGenerator
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.Embedding(what, startPos)¶ Bases: object
Represents something that has been embedded in a sequence.
Think of this as a combination of an embeddable + a start position.
Parameters: - what – object representing the thing that has been embedded. Should have __str__ and __len__ defined. Often is an instance of AbstractEmbeddable
- startPos – int, the position relative to the start of the parent sequence at which seq has been embedded
-
classmethod
fromString(string, whatClass=None)¶ Recreate an Embedding object from a string.
Parameters: - string – assumed to have format: description[-|_]startPos[-|_]whatString, where whatString will be provided to whatClass
- whatClass – the class (usually an AbstractEmbeddable) that will be used to instantiate the what from the whatString
Returns: The Embedding class called with what=whatClass.fromString(whatString) and startPos=int(startPos)
-
class
simdna.synthetic.FixedQuantityGenerator(quantity, name=None)¶ Bases: simdna.synthetic.AbstractQuantityGenerator
Returns a fixed number every time generateQuantity is called.
Parameters: quantity – the value to return when generateQuantity is called.
-
generateQuantity()¶ See superclass.
-
getJsonableObject()¶ See superclass.
-
-
class
simdna.synthetic.FixedSubstringGenerator(fixedSubstring, name=None)¶ Bases: simdna.synthetic.AbstractSubstringGenerator
Generates the same string every time.
When generateSubstring() is called, always returns the same string. The string also serves as its own description.
Parameters: - fixedSubstring – the string to be generated
- name – see DefaultNameMixin
-
generateSubstring()¶ See superclass.
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.GenerateSequenceNTimes(singleSetGenerator, N)¶ Bases: simdna.synthetic.AbstractSequenceSetGenerator
Call an AbstractSingleSequenceGenerator N times.
Parameters: - singleSetGenerator – an instance of AbstractSequenceSetGenerator
- N – integer, the number of times to call singleSetGenerator
-
generateSequences()¶ A generator that calls self.singleSetGenerator N times.
Returns: a generator that will call self.singleSetGenerator N times.
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.GeneratedSequence(seqName, seq, embeddings, additionalInfo)¶ Bases: object
An object representing a sequence that has been generated.
Parameters: - seqName – string representing the name/id of the sequence
- seq – string representing the final generated sequence
- embeddings – an array of Embedding objects.
- additionalInfo – an instance of AdditionalInfo
-
class
simdna.synthetic.InsideCentralBp(centralBp, name=None)¶ Bases: simdna.synthetic.AbstractPositionGenerator
For embedding within only the central region of a background.
Returns a position within the central region of a background sequence, sampled uniformly at random.
Parameters: - centralBp – int, the number of bp, centered in the middle of the background, from which to sample the position. Is NOT +/- centralBp around the middle (is +/- centralBp/2 around the middle). If the length of the background sequence is even and centralBp is odd, the shorter region will go on the left.
- name – string - see DefaultNameMixin
-
getJsonableObject()¶ See superclass.
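To make the centralBp convention concrete (a window of centralBp positions centered on the middle, not centralBp on each side), here is a small sketch of sampling a start position so the substring lies fully inside that window. The function name is hypothetical, and the integer rounding used for odd/even lengths is an assumption that may not match simdna's tie-breaking.

```python
import random

def sample_inside_central_bp(len_background, len_substring, central_bp, rng):
    """Sample a start index so the substring lies fully inside the central window."""
    if central_bp > len_background or len_substring > central_bp:
        raise ValueError("substring must fit in the window, and the window in the background")
    # Window of central_bp positions centered on the middle (rounding is an assumption).
    region_start = len_background // 2 - central_bp // 2
    return region_start + rng.randrange(central_bp - len_substring + 1)

rng = random.Random(0)
starts = [sample_inside_central_bp(100, 5, 20, rng) for _ in range(200)]
```

With a background of length 100 and centralBp of 20, the window covers indices 40 to 59, so a 5 bp substring can start anywhere from 40 to 55.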
-
class
simdna.synthetic.IsInTraceLabelGenerator(labelNames)¶ Bases: simdna.synthetic.LabelGenerator
LabelGenerator where labels match which embedders are called.
A special kind of LabelGenerator where the names of the labels are the names of embedders, and the label is 1 if a particular embedder has been called on the sequence and 0 otherwise.
-
class
simdna.synthetic.LabelGenerator(labelNames, labelsFromGeneratedSequenceFunction)¶ Bases: object
Generate labels for a generated sequence.
Parameters: - labelNames – an array of strings that are the names of the labels
- labelsFromGeneratedSequenceFunction – function that accepts an instance of GeneratedSequence and returns an array of the labels (e.g. an array of ones and zeros indicating if the criteria for various labels are met)
-
generateLabels(generatedSequence)¶ calls self.labelsFromGeneratedSequenceFunction.
Parameters: generatedSequence – an instance of GeneratedSequence
-
class
simdna.synthetic.LoadedEncodeMotifs(fileName, pseudocountProb=0.0, background=OrderedDict([('A', 0.27), ('C', 0.23), ('G', 0.23), ('T', 0.27)]))¶ Bases: simdna.synthetic.AbstractLoadedMotifs
A class for reading in a motifs file in the ENCODE motifs format.
This class is for reading files in the ENCODE motif format, specifically the motifs.txt file that contains Pouya’s motifs (http://compbio.mit.edu/encode-motifs/motifs.txt).
Motif declarations start with a >. The characters after the > up to the first space are taken as the motif name. The lines after the > line have the format: “<ignored character> <prob of A> <prob of C> <prob of G> <prob of T>”
-
getReadPwmAction(recordedPwms)¶ See superclass.
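The file layout described for LoadedEncodeMotifs is simple enough to parse by hand. The following sketch reads that layout into an ordered mapping from motif name to rows of probabilities; parse_encode_motifs is a hypothetical helper, the example motif is made up and not taken from motifs.txt, and the real class builds pwm.PWM objects where this sketch builds nested lists.

```python
from collections import OrderedDict

def parse_encode_motifs(lines):
    """Parse ENCODE-style motif text into {name: [[pA, pC, pG, pT], ...]}."""
    motifs = OrderedDict()
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Motif name: characters after '>' up to the first space.
            name = line[1:].split(" ")[0]
            current = motifs.setdefault(name, [])
        elif current is not None:
            fields = line.split()
            # fields[0] is the ignored character; the next four are A, C, G, T.
            current.append([float(x) for x in fields[1:5]])
    return motifs

example = """>TOY_motif_1 made-up description
G 0.1 0.1 0.7 0.1
A 0.8 0.1 0.05 0.05
""".splitlines()
pwms = parse_encode_motifs(example)
```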
-
-
class
simdna.synthetic.MinMaxWrapper(quantityGenerator, theMin=None, theMax=None, name=None)¶ Bases: simdna.synthetic.AbstractQuantityGenerator
Compress a distribution to lie within a min and a max.
Wrapper that restricts a distribution to only return values between the min and the max. If a value outside the range is returned, resamples until it obtains a value within the range. Warns every time it tries to resample 10 times without successfully finding a value in the correct range.
Parameters: - quantityGenerator – instance of AbstractQuantityGenerator. Used to draw samples from the distribution to truncate
- theMin – can be None; if so will be ignored.
- theMax – can be None; if so will be ignored.
-
generateQuantity()¶ See superclass.
-
getJsonableObject()¶ See superclass.
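The resampling behaviour described for MinMaxWrapper amounts to rejection sampling with a periodic warning. A minimal sketch, assuming inclusive bounds and modelling the wrapped generator as a zero-argument function (sample_within and its parameters are hypothetical names):

```python
import random
import warnings

def sample_within(sample_fn, the_min=None, the_max=None, warn_every=10):
    """Resample from sample_fn until the value lies within [the_min, the_max]."""
    tries = 0
    while True:
        value = sample_fn()
        tries += 1
        above_min = the_min is None or value >= the_min  # None means no lower bound
        below_max = the_max is None or value <= the_max  # None means no upper bound
        if above_min and below_max:
            return value
        if tries % warn_every == 0:
            warnings.warn("%d samples drawn without landing in range" % tries)

rng = random.Random(0)
value = sample_within(lambda: rng.randint(0, 20), the_min=3, the_max=6)
```

Note that rejection sampling like this changes the shape of the distribution near the bounds and can loop for a long time if the allowed range is rarely hit, which is what the warning is meant to flag.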
-
-
class
simdna.synthetic.Mutation(index, previous, new, parentLength=None)¶ Bases: object
Represent a single bp mutation in a motif sequence.
Useful for creating simulations involving SNPs.
Parameters: - index – the position idx within the motif of the mutation
- previous – character, the previous base at this position
- new – character, the new base at this position after the mutation
- parentLength – optional; length of the motif. Used for assertion checks.
-
applyMutation(stringArr)¶ Set the base at the position of the mutation to the mutated value.
Modifies stringArr which is an array of characters.
Parameters: stringArr – an array of characters, which gets modified.
-
parentLengthAssertionCheck(stringArr)¶ Checks that stringArr is consistent with parentLength if defined.
-
revert(stringArr)¶ Set the base at the position of the mutation to the unmutated value.
Modifies stringArr which is an array of characters.
Parameters: stringArr – an array of characters, which gets modified.
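The applyMutation/revert pair can be pictured as in-place edits of a character array at a fixed index. The class below is a hypothetical stand-in (ToyMutation), not the simdna class; the sanity assertions on the current base are an addition of this sketch and are not claimed to exist in the library.

```python
class ToyMutation:
    """Hypothetical stand-in for Mutation: a single-base change at a fixed index."""

    def __init__(self, index, previous, new):
        self.index = index
        self.previous = previous
        self.new = new

    def apply_mutation(self, string_arr):
        # Sketch-only check: the array should currently hold the reference base.
        assert string_arr[self.index] == self.previous, "unexpected reference base"
        string_arr[self.index] = self.new

    def revert(self, string_arr):
        # Sketch-only check: the array should currently hold the mutated base.
        assert string_arr[self.index] == self.new, "base is not in the mutated state"
        string_arr[self.index] = self.previous

motif = list("GATAAG")
snp = ToyMutation(index=2, previous="T", new="C")
snp.apply_mutation(motif)
mutated = "".join(motif)
snp.revert(motif)
restored = "".join(motif)
```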
-
class
simdna.synthetic.OutsideCentralBp(centralBp, name=None)¶ Bases: simdna.synthetic.AbstractPositionGenerator
For embedding only OUTSIDE a central region of a background seq.
Returns a position OUTSIDE the central region of a background sequence, sampled uniformly at random. Complement of InsideCentralBp.
Parameters: centralBp – int, the centralBp to avoid embedding in. See the docs for InsideCentralBp for more details (this is the complement).
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.PairEmbeddable(embeddable1, embeddable2, separation, embeddableDescription='', nothingInBetween=True)¶ Bases: simdna.synthetic.AbstractEmbeddable
Embed two embeddables with some separation.
Parameters: - embeddable1 – instance of AbstractEmbeddable. First embeddable to be embedded. If a string is provided, will be wrapped in StringEmbeddable
- embeddable2 – second embeddable to be embedded. Type information similar to that of embeddable1
- separation – int of distance separating embeddable1 and embeddable2
- embeddableDescription – a concise descriptive string prefixed in front when generating a __str__ representation of the embeddable. Should not contain a hyphen.
- nothingInBetween – if true, then nothing else is allowed to be embedded in the gap between embeddable1 and embeddable2.
-
canEmbed(priorEmbeddedThings, startPos)¶ See superclass.
-
embedInBackgroundStringArr(priorEmbeddedThings, backgroundStringArr, startPos)¶ See superclass.
If self.nothingInBetween, then all the intervening positions between the two embeddables will be marked as occupied. Otherwise, only the positions occupied by the embeddables will be marked as occupied.
-
getDescription()¶ See superclass.
-
class
simdna.synthetic.PairEmbeddableGenerator(embeddableGenerator1, embeddableGenerator2, separationGenerator, name=None)¶ Bases: simdna.synthetic.AbstractEmbeddableGenerator
Embed a pair of embeddables with some separation.
Parameters: - embeddableGenerator1 – instance of AbstractEmbeddableGenerator. If an AbstractSubstringGenerator is provided, will be wrapped in an instance of SubstringEmbeddableGenerator.
- embeddableGenerator2 – same type information as for embeddableGenerator1
- separationGenerator – instance of AbstractQuantityGenerator
- name – string, see DefaultNameMixin
-
generateEmbeddable()¶ See superclass.
-
getJsonableObject()¶ See superclass.
-
-
class
simdna.synthetic.PoissonQuantityGenerator(mean, name=None)¶ Bases: simdna.synthetic.AbstractQuantityGenerator
Generates values according to a Poisson distribution.
Parameters: mean – the mean of the Poisson distribution
-
generateQuantity()¶ See superclass.
-
getJsonableObject()¶ See superclass.
-
-
class
simdna.synthetic.PriorEmbeddedThings_numpyArrayBacked(seqLen)¶ Bases: simdna.synthetic.AbstractPriorEmbeddedThings
A numpy-array based implementation of AbstractPriorEmbeddedThings.
Uses a numpy array where positions are set to 1 if they are occupied, to determine which positions are occupied and which are not. See superclass for more documentation.
Parameters: seqLen – integer indicating length of the sequence you are embedding in
-
addEmbedding(startPos, what)¶ See superclass.
-
canEmbed(startPos, endPos)¶ See superclass.
-
getEmbeddings()¶ See superclass.
-
getNumOccupiedPos()¶ See superclass.
-
getTotalPos()¶ See superclass.
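The occupancy-array idea behind PriorEmbeddedThings_numpyArrayBacked can be sketched with a plain Python list in place of a numpy array. ToyPriorEmbeddedThings is a hypothetical stand-in with snake_case method names; it stores (startPos, what) tuples where the real class creates Embedding objects, and the bounds checks are an addition of this sketch.

```python
class ToyPriorEmbeddedThings:
    """List-backed occupancy tracker; the real class is described as numpy-backed."""

    def __init__(self, seq_len):
        self.occupied = [0] * seq_len   # 1 marks an occupied position
        self.embeddings = []

    def can_embed(self, start_pos, end_pos):
        # end_pos is exclusive, matching array-slicing semantics.
        if start_pos < 0 or end_pos > len(self.occupied):
            return False
        return sum(self.occupied[start_pos:end_pos]) == 0

    def add_embedding(self, start_pos, what):
        end_pos = start_pos + len(what)
        if not self.can_embed(start_pos, end_pos):
            raise ValueError("positions %d:%d are not available" % (start_pos, end_pos))
        self.occupied[start_pos:end_pos] = [1] * len(what)
        self.embeddings.append((start_pos, what))

    def get_num_occupied_pos(self):
        return sum(self.occupied)

    def get_total_pos(self):
        return len(self.occupied)

tracker = ToyPriorEmbeddedThings(20)
tracker.add_embedding(5, "GATAAG")  # occupies indices 5 through 10
```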
-
-
class
simdna.synthetic.PwmSampler(pwm, name=None)¶ Bases: simdna.synthetic.AbstractSubstringGenerator
Samples from a pwm by calling self.pwm.sampleFromPwm
Parameters: - pwm – an instance of pwm.PWM
- name – see DefaultNameMixin
-
generateSubstring()¶ See superclass.
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.PwmSamplerFromLoadedMotifs(loadedMotifs, motifName, name=None)¶ Bases: simdna.synthetic.PwmSampler
Instantiates a PwmSampler from a LoadedEncodeMotifs file.
Convenience wrapper class for instantiating PwmSampler by pulling the pwm.PWM object using the provided name from an AbstractLoadedMotifs object.
Parameters: - loadedMotifs – instance of AbstractLoadedMotifs
- motifName – string, name of a motif in AbstractLoadedMotifs
- name – see DefaultNameMixin
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.RandomSubsetOfEmbedders(quantityGenerator, embedders, name=None)¶ Bases: simdna.synthetic.AbstractEmbedder
Call some random subset of supplied embedders.
Takes a quantity generator that generates a quantity of embedders, and executes that many embedders from a supplied set, in sequence.
Parameters: - quantityGenerator – instance of AbstractQuantityGenerator
- embedders – a list of AbstractEmbedder objects
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.RepeatedEmbedder(embedder, quantityGenerator, name=None)¶ Bases: simdna.synthetic.AbstractEmbedder
Call an embedder multiple times.
Wrapper around an embedder to call it multiple times according to samples from a distribution. First calls self.quantityGenerator to get the quantity, then calls self.embedder a number of times equal to the value returned.
Parameters: - embedder – instance of AbstractEmbedder
- quantityGenerator – instance of AbstractQuantityGenerator
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.RepeatedSubstringBackgroundGenerator(substringGenerator, repetitions)¶ Bases: simdna.synthetic.AbstractBackgroundGenerator
Repeatedly call a substring generator and concatenate the result.
Can be used to generate variable-length sequences.
Parameters: - substringGenerator – instance of AbstractSubstringGenerator
- repetitions – instance of AbstractQuantityGenerator. If an int is passed, a FixedQuantityGenerator will be created from it. This will be called to determine the number of times to generate a substring from self.substringGenerator
Returns: The concatenation of all the calls to self.substringGenerator
-
generateBackground()¶
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.ReverseComplementWrapper(substringGenerator, reverseComplementProb=0.5, name=None)¶ Bases:
simdna.synthetic.AbstractSubstringGenerator
Reverse complements a string with a specified probability.
Wrapper around an instance of AbstractSubstringGenerator that reverse complements the generated string with a specified probability.
Parameters: - substringGenerator – instance of AbstractSubstringGenerator
- reverseComplementProb – probability of reverse complementation. Defaults to 0.5.
- name – see DefaultNameMixin.
-
generateSubstring()¶
-
getJsonableObject()¶ See superclass.
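As an illustration of what reverse complementing with a specified probability means, here is a standalone sketch that does not call simdna. `reverse_complement` and `maybe_reverse_complement` are hypothetical helper names, and only the bases A, C, G, and T (in either case) are handled.

```python
import random

# Complement table for the four DNA bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def maybe_reverse_complement(seq, reverse_complement_prob=0.5, rng=random):
    """Reverse complement seq with the given probability, else return it unchanged."""
    if rng.random() < reverse_complement_prob:
        return reverse_complement(seq)
    return seq
```

For example, `reverse_complement("AACG")` gives `"CGTT"`: the complement is `"TTGC"`, which is then reversed.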
-
class
simdna.synthetic.RevertToReference(setOfMutations, name=None)¶ Bases:
simdna.synthetic.AbstractTransformation
For a series of mutations, reverts the supplied character to the reference (“unmutated”) string.
Parameters: - setOfMutations – instance of AbstractSetOfMutations
- name – see DefaultNameMixin.
-
getJsonableObject()¶ See superclass.
-
transform(stringArr)¶ See superclass.
-
class
simdna.synthetic.SampleFromDiscreteDistributionSubstringGenerator(discreteDistribution)¶ Bases:
simdna.synthetic.AbstractSubstringGenerator
Generate a substring by sampling from a distribution.
If the “substrings” are single characters (A/C/G/T), can be used in conjunction with RepeatedSubstringBackgroundGenerator to generate sequences with a certain GC content.
Parameters: discreteDistribution – instance of util.DiscreteDistribution
-
generateSubstring()¶
-
getJsonableObject()¶ See superclass.
-
-
class
simdna.synthetic.StringEmbeddable(string, stringDescription='')¶ Bases:
simdna.synthetic.AbstractEmbeddable
A string that is to be embedded in a background.
Represents a string (such as a sampling from a pwm) that is to be embedded in a background. See docs for superclass.
Parameters: - string – the core string to be embedded
- stringDescription – a short descriptor prefixed before the __str__ representation of the embeddable. Should not contain a hyphen. Defaults to “”.
-
canEmbed(priorEmbeddedThings, startPos)¶ See superclass.
-
embedInBackgroundStringArr(priorEmbeddedThings, backgroundStringArr, startPos)¶ See superclass.
-
classmethod
fromString(theString)¶ Generates a StringEmbeddable from the provided string.
Parameters: theString – string of the format stringDescription-coreString. Will then return: StringEmbeddable(string=coreString, stringDescription=stringDescription)
Returns: An instance of StringEmbeddable
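The stringDescription-coreString format can be illustrated with a small standalone parser. This is a sketch, not simdna's parsing code: `parse_embeddable_string` is a hypothetical name, splitting on the last hyphen is an assumption, and so is treating a string without a hyphen as having an empty description.

```python
def parse_embeddable_string(the_string):
    """Split 'stringDescription-coreString' into (description, core).

    Assumptions: the split happens at the last hyphen, and a string with
    no hyphen is treated as a core string with an empty description.
    """
    description, hyphen, core = the_string.rpartition("-")
    if not hyphen:
        # No hyphen present: rpartition puts the whole string in `core`.
        return "", core
    return description, core
```

The returned pair corresponds to the stringDescription and string arguments of the StringEmbeddable constructor documented above.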
-
getDescription()¶ See superclass.
-
class
simdna.synthetic.SubstringEmbeddableGenerator(substringGenerator, name=None)¶ Bases:
simdna.synthetic.AbstractEmbeddableGenerator
Generates a StringEmbeddable.
Calls substringGenerator, wraps the result in a StringEmbeddable and returns it.
Parameters: substringGenerator – instance of AbstractSubstringGenerator
-
generateEmbeddable()¶
-
getJsonableObject()¶ See superclass.
-
-
class
simdna.synthetic.SubstringEmbedder(substringGenerator, positionGenerator=<simdna.synthetic.UniformPositionGenerator object>, name=None)¶ Bases:
simdna.synthetic.EmbeddableEmbedder
Used to embed substrings.
Embeds a single generated substring within the background sequence, at a position sampled from a distribution. Only embeds at unoccupied positions.
Parameters: - substringGenerator – instance of AbstractSubstringGenerator
- positionGenerator – instance of AbstractPositionGenerator
- name – see DefaultNameMixin.
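One way to picture embedding only at unoccupied positions is the standalone sketch below. It is not simdna's implementation: `embed_substring`, the `occupied` set of indices, and the retry limit `max_tries` are all assumptions made for illustration, and positions are drawn uniformly at random.

```python
import random


def embed_substring(background, substring, occupied, rng=random, max_tries=100):
    """Write substring into background at a random unoccupied position.

    background: list of single characters, modified in place.
    occupied: set of indices already taken by earlier embeddings,
        updated in place.
    Returns the start index used, or None if no free position was found
    within max_tries attempts (the retry limit is an assumption).
    """
    last_start = len(background) - len(substring)
    if last_start < 0:
        raise ValueError("substring is longer than the background")
    for _ in range(max_tries):
        start = rng.randint(0, last_start)  # inclusive on both ends
        span = range(start, start + len(substring))
        if occupied.isdisjoint(span):
            background[start:start + len(substring)] = substring
            occupied.update(span)
            return start
    return None
```

Because the background keeps its length and the occupied set grows with each call, repeated calls never overwrite an earlier embedding.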
-
class
simdna.synthetic.TransformedSubstringGenerator(substringGenerator, transformations, transformationsDescription='transformations', name=None)¶ Bases:
simdna.synthetic.AbstractSubstringGenerator
Generates a substring and applies a series of transformations.
Takes a substringGenerator and a set of AbstractTransformation objects, and applies the transformations to the generated substring.
Parameters: - substringGenerator – instance of AbstractSubstringGenerator
- transformations – an iterable of AbstractTransformation
- transformationsDescription – a string that will be prefixed in front of substringDescription (generated by substringGenerator.generateSubstring()) to produce the stringDescription.
- name – see DefaultNameMixin.
-
generateSubstring()¶ See superclass.
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.UniformIntegerGenerator(minVal, maxVal, name=None)¶ Bases:
simdna.synthetic.AbstractQuantityGenerator
Randomly samples an integer from minVal to maxVal, inclusive.
Parameters: - minVal – minimum integer that can be sampled
- maxVal – maximum integer that can be sampled
- name – See superclass.
-
generateQuantity()¶ See superclass.
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.UniformPositionGenerator(name=None)¶ Bases:
simdna.synthetic.AbstractPositionGenerator
Sample position uniformly at random.
Samples a start position to embed the substring in uniformly at random; does not return positions that are too close to the end of the background sequence to embed the full substring.
Parameters: name – string, see DefaultNameMixin
-
getJsonableObject()¶ See superclass.
-
class
simdna.synthetic.XOREmbedder(embedder1, embedder2, probOfFirst, name=None)¶ Bases:
simdna.synthetic.AbstractEmbedder
Calls exactly one of the supplied embedders.
Parameters: - embedder1 – instance of AbstractEmbedder
- embedder2 – instance of AbstractEmbedder
- probOfFirst – probability of calling the first embedder
-
getJsonableObject()¶ See superclass.
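The exactly-one choice described for XOREmbedder can be sketched as a single weighted coin flip. The sketch below is independent of simdna; `call_exactly_one` and the zero-argument callables standing in for the two embedders are hypothetical.

```python
import random


def call_exactly_one(embedder1, embedder2, prob_of_first, rng=random):
    """Call embedder1 with probability prob_of_first, otherwise embedder2.

    Both embedders are zero-argument callables (hypothetical stand-ins
    for AbstractEmbedder objects). Exactly one of them is called.
    """
    chosen = embedder1 if rng.random() < prob_of_first else embedder2
    return chosen()
```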
-
class
simdna.synthetic.ZeroInflater(quantityGenerator, zeroProb, name=None)¶ Bases:
simdna.synthetic.AbstractQuantityGenerator
Inflate a particular distribution with zeros.
Wrapper that inflates the number of zeros returned. Flips a coin; if positive, will return zero; otherwise will sample from the wrapped distribution (which may still return 0).
Parameters: - quantityGenerator – an instance of AbstractQuantityGenerator; represents the distribution to sample from with probability 1-zeroProb
- zeroProb – the probability of just returning 0 without sampling from quantityGenerator
- name – see DefaultNameMixin.
-
generateQuantity()¶ See superclass.
-
getJsonableObject()¶ See superclass.
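The coin flip described for ZeroInflater can be sketched in a few lines, independent of simdna. `zero_inflated_sample` and `sample_quantity` are hypothetical names. Under this scheme the overall chance of a zero is zeroProb + (1 - zeroProb) * p0, where p0 is the chance that the wrapped distribution itself returns 0.

```python
import random


def zero_inflated_sample(sample_quantity, zero_prob, rng=random):
    """Return 0 with probability zero_prob, else a draw from sample_quantity().

    sample_quantity: zero-argument callable (hypothetical stand-in for
        the wrapped quantity generator); it may itself return 0.
    """
    if rng.random() < zero_prob:
        return 0
    return sample_quantity()
```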
-
class
simdna.synthetic.ZeroOrderBackgroundGenerator(seqLength, discreteDistribution=<simdna.util.DiscreteDistribution object>)¶ Bases:
simdna.synthetic.RepeatedSubstringBackgroundGeneratorReturns a sequence with a certain GC content.
Each base is sampled independently.
Parameters: - seqLength – int, length of the background
- discreteDistribution – instance of util.DiscreteDistribution, defaults to util.DEFAULT_BASE_DISCRETE_DISTRIBUTION
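A zero-order background, in which each base is drawn independently, can be sketched with the standard library alone. This is not simdna code: `base_distribution` and `sample_background` are hypothetical names, and splitting the GC fraction evenly between G and C (and the remainder evenly between A and T) is an assumption.

```python
import random


def base_distribution(gc_fraction):
    """Per-base probabilities for a given GC fraction.

    Assumption: G and C are equally likely, as are A and T.
    """
    gc = gc_fraction / 2.0
    at = (1.0 - gc_fraction) / 2.0
    return {"A": at, "C": gc, "G": gc, "T": at}


def sample_background(seq_length, gc_fraction, rng=None):
    """Sample seq_length bases independently from the base distribution."""
    rng = rng or random.Random()
    dist = base_distribution(gc_fraction)
    bases = list(dist)
    weights = [dist[base] for base in bases]
    return "".join(rng.choices(bases, weights=weights, k=seq_length))
```

Because the bases are sampled independently, the observed GC content of any single sequence only approximates `gc_fraction`, more closely for longer sequences.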
-
simdna.synthetic.getEmbeddingsFromString(string)¶ Get a series of Embedding objects from a string.
Splits the string on commas, and then passes the comma-separated values to Embedding.fromString().
Parameters: string – The string to turn into an array of Embedding objects
Returns: an array of Embedding objects
-
simdna.synthetic.printSequences(outputFileName, sequenceSetGenerator, includeEmbeddings=False, labelGenerator=None, includeFasta=False, prefix=None)¶ Print a series of synthetic sequences.
Given an output filename and an instance of AbstractSequenceSetGenerator, will call the sequenceSetGenerator and print the generated sequences to the output file. Will also create a file “info_outputFileName.txt” in the same directory as outputFileName that contains all the information about sequenceSetGenerator.
Parameters: - outputFileName – string
- sequenceSetGenerator – instance of AbstractSequenceSetGenerator
- includeEmbeddings – a boolean indicating whether to print a column that lists the embeddings
- labelGenerator – optional instance of LabelGenerator
- includeFasta – optional boolean indicating whether to also print out the generated sequences in fasta format (the file will be produced with a .fa extension)
- prefix – string; this will be prefixed in front of the generated sequence ids, followed by a hyphen
-
simdna.synthetic.sampleIndexWithinRegionOfLength(length, lengthOfThingToEmbed)¶ Uniformly at random samples integers from 0 to length-lengthOfThingToEmbed.
Parameters: - length – length of full region that could be embedded in
- lengthOfThingToEmbed – length of thing being embedded in larger region
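The range of valid start indices can be made concrete with a short standalone sketch. It does not call simdna: `sample_start_index` is a hypothetical name, and treating the upper bound as inclusive is an assumption about how the range above should be read.

```python
import random


def sample_start_index(length, length_of_thing_to_embed, rng=random):
    """Sample a start index so the embedded thing fits inside the region.

    Assumption: both ends of the range [0, length - length_of_thing_to_embed]
    are valid start positions.
    """
    if length_of_thing_to_embed > length:
        raise ValueError("thing to embed is longer than the region")
    return rng.randint(0, length - length_of_thing_to_embed)
```

With a region of length 10 and a thing of length 4, the possible start indices are 0 through 6, so the last embedded position never exceeds index 9.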