dragonn.synthetic.synthetic module

class dragonn.synthetic.synthetic.AbstractApplySingleMutationFromSet(setOfMutations, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractTransformation

Class for applying a single mutation from a set of mutations; used to transform substrings generated by another method

getClassName()[source]
getJsonableObject()[source]
selectMutation()[source]
transform(stringArr)[source]
class dragonn.synthetic.synthetic.AbstractBackgroundGenerator[source]

Bases: object

Returns the sequence that the embeddings are subsequently inserted into.

generateBackground()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.AbstractEmbeddable[source]

Bases: object

Represents a thing which can be embedded. Note that an Embeddable + a position = an embedding.

canEmbed(priorEmbeddedThings, startPos)[source]

priorEmbeddedThings: instance of AbstractPriorEmbeddedThings
startPos: the position you are considering embedding self at
Returns a boolean indicating whether self can be embedded at startPos, given the things that have already been embedded.

embedInBackgroundStringArr(priorEmbeddedThings, backgroundStringArr, startPos)[source]

Will embed self at startPos in backgroundStringArr, and will update priorEmbeddedThings.
priorEmbeddedThings: instance of AbstractPriorEmbeddedThings
backgroundStringArr: an array of characters representing the background
startPos: the position to embed self at

getDescription()[source]
class dragonn.synthetic.synthetic.AbstractEmbeddableGenerator(name)[source]

Bases: dragonn.synthetic.synthetic.DefaultNameMixin

Generates an embeddable, usually for embedding in a background sequence.

generateEmbeddable()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.AbstractEmbedder(name)[source]

Bases: dragonn.synthetic.synthetic.DefaultNameMixin

class that is used to embed things in a sequence

embed(backgroundStringArr, priorEmbeddedThings, additionalInfo=None)[source]

backgroundStringArr: array of characters representing the background string
priorEmbeddedThings: instance of AbstractPriorEmbeddedThings
additionalInfo: instance of AdditionalInfo; allows the embedder to send back info about what it did
Modifies: backgroundStringArr, to include whatever this class has embedded

getJsonableObject()[source]
class dragonn.synthetic.synthetic.AbstractLoadedMotifs(fileName, pseudocountProb=0.0, background=OrderedDict([('A', 0.27), ('C', 0.23), ('G', 0.23), ('T', 0.27)]))[source]

Bases: object

A class that contains instances of pwm.PWM loaded from a file. The pwms can be accessed by name.

getJsonableObject()[source]
getPwm(name)[source]

returns the pwm.PWM instance with the specified name.

getReadPwmAction(recordedPwms)[source]

This is the action that is to be performed on each line of the file when it is read in. recordedPwms is an OrderedDict that stores instances of pwm.PWM

class dragonn.synthetic.synthetic.AbstractPositionGenerator(name)[source]

Bases: dragonn.synthetic.synthetic.DefaultNameMixin

Given the length of the background sequence and the length of the substring you are trying to embed, will return a start position to embed the substring at.

generatePos(lenBackground, lenSubstring, additionalInfo=None)[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.AbstractPriorEmbeddedThings[source]

Bases: object

class that is used to keep track of what has already been embedded in a sequence

addEmbedding(startPos, what)[source]

embeds “what” from startPos to startPos+len(what). Creates an Embedding object

canEmbed(startPos, endPos)[source]

returns a boolean indicating whether the region from startPos to endPos is available for embedding

getEmbeddings()[source]

returns a collection of Embedding objects

getNumOccupiedPos()[source]

returns the number of positions that are filled with some kind of embedding

getTotalPos()[source]

returns the total number of positions available to embed things in

class dragonn.synthetic.synthetic.AbstractQuantityGenerator(name)[source]

Bases: dragonn.synthetic.synthetic.DefaultNameMixin

class to sample according to a distribution

generateQuantity()[source]

returns the sampled value

getJsonableObject()[source]
class dragonn.synthetic.synthetic.AbstractSequenceSetGenerator[source]

Bases: object

class that is used to return a generator for a collection of generated sequences.

generateSequences()[source]

returns a generator of GeneratedSequence objects

getJsonableObject()[source]

returns an object representing the details of this, which can be converted to json.

class dragonn.synthetic.synthetic.AbstractSetOfMutations(mutationsArr)[source]

Bases: object

Represents a collection of pwm.Mutation objects

getJsonableObject()[source]
getMutationsArr()[source]
class dragonn.synthetic.synthetic.AbstractSingleSequenceGenerator(namePrefix=None)[source]

Bases: object

When called, generates a single sequence

generateSequence()[source]

returns GeneratedSequence object

getJsonableObject()[source]

returns an object representing the details of this, which can be converted to json.

class dragonn.synthetic.synthetic.AbstractSubstringGenerator(name)[source]

Bases: dragonn.synthetic.synthetic.DefaultNameMixin

Generates a substring, usually for embedding in a background sequence.

generateSubstring()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.AbstractTransformation(name)[source]

Bases: dragonn.synthetic.synthetic.DefaultNameMixin

takes an array of characters, applies some transformation, returns an array of characters (may be the same (mutated) one or a different one)

getJsonableObject()[source]
transform(stringArr)[source]

stringArr is an array of characters. Returns an array of characters that has the transformation applied. May mutate stringArr

class dragonn.synthetic.synthetic.AdditionalInfo[source]

Bases: object

isInTrace(operatorName)[source]
updateAdditionalInfo(operatorName, value)[source]
updateTrace(operatorName)[source]
class dragonn.synthetic.synthetic.AllEmbedders(embedders, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractEmbedder

Wrapper around a list of embedders to make sure all are called. Useful in conjunction with RandomSubsetOfEmbedders.

getJsonableObject()[source]
class dragonn.synthetic.synthetic.BernoulliQuantityGenerator(prob, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractQuantityGenerator

Generates 1 or 0 according to a bernoulli distribution

generateQuantity()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.BestHitPwm(pwm, bestHitMode='pwmProb', name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractSubstringGenerator

always returns the best possible match to the pwm in question when called

generateSubstring()[source]
getJsonableObject()[source]
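
As a rough illustration of what a "best possible match" to a PWM can mean, the sketch below picks the highest-probability base at each position. It does not use dragonn: `best_hit` and the list-of-dicts PWM layout are hypothetical, and the library's bestHitMode options may score matches differently.

```python
def best_hit(pwm_rows):
    """Return the string made of the most probable base at each position.

    pwm_rows is a hypothetical layout: one dict of base -> probability
    per motif position.
    """
    return "".join(max(row, key=row.get) for row in pwm_rows)


example_pwm = [
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.10, "G": 0.60, "T": 0.20},
    {"A": 0.25, "C": 0.05, "G": 0.05, "T": 0.65},
]
best = best_hit(example_pwm)
```

Because the choice is an argmax, the same string is returned on every call, which matches the "always returns" wording above.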
class dragonn.synthetic.synthetic.BestHitPwmFromLoadedMotifs(loadedMotifs, motifName, bestHitMode='pwmProb', name=None)[source]

Bases: dragonn.synthetic.synthetic.BestHitPwm

convenience wrapper class for instantiating parent by pulling the pwm given the name from an AbstractLoadedMotifs object (it basically extracts the pwm for you)

getJsonableObject()[source]
class dragonn.synthetic.synthetic.ChooseMutationAtRandom(setOfMutations, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractApplySingleMutationFromSet

Selects a mutation at random from self.setOfMutations to apply; see parent docs.

getClassName()[source]
selectMutation()[source]
class dragonn.synthetic.synthetic.ChooseValueFromASet(setOfPossibleValues, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractQuantityGenerator

Randomly samples a particular value from a set of values

generateQuantity()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.DefaultNameMixin(name)[source]

Bases: object

getDefaultName()[source]
class dragonn.synthetic.synthetic.EmbedInABackground(backgroundGenerator, embedders, namePrefix=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractSingleSequenceGenerator

Takes a backgroundGenerator and a series of embedders. Will generate the background and then call each of the embedders in succession. Then returns the result.

generateSequence()[source]

generates a background using self.backgroundGenerator, splits it into an array, and passes it to each of self.embedders in turn for embedding things. returns an instance of GeneratedSequence

getJsonableObject()[source]

see parent
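
The generate-then-embed flow described for generateSequence() can be sketched without the library. Everything here is illustrative: `embed_in_background` and `make_fixed_embedder` are hypothetical names, embedders are modelled as plain callables, and a list of tuples stands in for the prior-embedded-things tracker.

```python
def embed_in_background(background, embedders):
    """Split the background into characters and let each embedder modify it."""
    arr = list(background)   # mutable array of characters
    placed = []              # stand-in record of what was embedded where
    for embedder in embedders:
        embedder(arr, placed)
    return "".join(arr), placed


def make_fixed_embedder(substring, start_pos):
    """Build a toy embedder that always writes substring at start_pos."""
    def embedder(arr, placed):
        arr[start_pos:start_pos + len(substring)] = list(substring)
        placed.append((start_pos, substring))
    return embedder


seq, placed = embed_in_background(
    "NNNNNNNNNN",
    [make_fixed_embedder("ACG", 1), make_fixed_embedder("TT", 6)],
)
```

The embedders run in list order, so a later embedder sees whatever earlier ones wrote into the array.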

class dragonn.synthetic.synthetic.EmbeddableEmbedder(embeddableGenerator, positionGenerator=<dragonn.synthetic.synthetic.UniformPositionGenerator object>, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractEmbedder

Embeds instances of AbstractEmbeddable within the background sequence, at a position sampled from a distribution. Only embeds at unoccupied positions

getJsonableObject()[source]
class dragonn.synthetic.synthetic.Embedding(what, startPos)[source]

Bases: object

Represents something that has been embedded in a sequence

classmethod fromString(string, whatClass=None)[source]
class dragonn.synthetic.synthetic.FixedQuantityGenerator(quantity, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractQuantityGenerator

returns a fixed number every time generateQuantity is called

generateQuantity()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.FixedSubstringGenerator(fixedSubstring, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractSubstringGenerator

When generateSubstring() is called, always returns the same string. The string also serves as its own description

generateSubstring()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.GenerateSequenceNTimes(singleSetGenerator, N)[source]

Bases: dragonn.synthetic.synthetic.AbstractSequenceSetGenerator

If you just want to use a generator of a single sequence and call it N times, use this class.

generateSequences()[source]

calls singleSetGenerator N times.

getJsonableObject()[source]
class dragonn.synthetic.synthetic.GeneratedSequence(seqName, seq, embeddings, additionalInfo)[source]

Bases: object

An object representing a sequence that has been generated.

class dragonn.synthetic.synthetic.InsideCentralBp(centralBp, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractPositionGenerator

returns a position within the central region of a background sequence, sampled uniformly at random

getJsonableObject()[source]
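
One plausible reading of "within the central region" is sketched below: centre a window of centralBp positions on the middle of the background and sample a start so the whole substring stays inside that window. The function name and the exact rounding are assumptions, not the library's code.

```python
import random


def sample_inside_central(len_background, len_substring, central_bp, rng=random):
    """Sample a start position that keeps the substring inside the central window."""
    if central_bp > len_background:
        raise ValueError("central region is larger than the background")
    if len_substring > central_bp:
        raise ValueError("substring does not fit in the central region")
    # Assumed placement of the window: centred using integer division.
    region_start = len_background // 2 - central_bp // 2
    # randint is inclusive, so the largest offset still fits the substring.
    offset = rng.randint(0, central_bp - len_substring)
    return region_start + offset


rng = random.Random(0)
starts = [sample_inside_central(100, 5, 20, rng) for _ in range(500)]
```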
class dragonn.synthetic.synthetic.IsInTraceLabelGenerator(labelNames)[source]

Bases: dragonn.synthetic.synthetic.LabelGenerator

class dragonn.synthetic.synthetic.LabelGenerator(labelNames, labelsFromGeneratedSequenceFunction)[source]

Bases: object

generateLabels(generatedSequence)[source]
class dragonn.synthetic.synthetic.LoadedEncodeMotifs(fileName, pseudocountProb=0.0, background=OrderedDict([('A', 0.27), ('C', 0.23), ('G', 0.23), ('T', 0.27)]))[source]

Bases: dragonn.synthetic.synthetic.AbstractLoadedMotifs

This class is specifically for reading files in the encode motif format - specifically the motifs.txt file that contains Pouya’s motifs

getReadPwmAction(recordedPwms)[source]
class dragonn.synthetic.synthetic.MinMaxWrapper(quantityGenerator, theMin=None, theMax=None, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractQuantityGenerator

Wrapper that restricts a distribution to only return values between the min and the max. If a value outside the range is returned, resamples until it obtains a value within the range. Warns if it resamples too many times.

generateQuantity()[source]
getJsonableObject()[source]
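
The resample-until-in-range behaviour described for MinMaxWrapper is a form of rejection sampling. A minimal sketch follows, assuming inclusive bounds and a hypothetical `warn_after` threshold; neither detail is taken from the library.

```python
import random
import warnings


def sample_within_range(sampler, the_min=None, the_max=None, warn_after=10):
    """Call sampler() until the value lies in [the_min, the_max]."""
    tries = 0
    while True:
        value = sampler()
        tries += 1
        above_min = the_min is None or value >= the_min
        below_max = the_max is None or value <= the_max
        if above_min and below_max:
            return value
        # Warn periodically so a badly chosen range does not fail silently.
        if tries % warn_after == 0:
            warnings.warn("resampled %d times without success" % tries)


rng = random.Random(1)
values = [sample_within_range(lambda: rng.randint(0, 10), 3, 7) for _ in range(200)]
```

Note that this loops forever if the wrapped sampler can never produce a value in range, which is why a warning is useful.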
class dragonn.synthetic.synthetic.OutsideCentralBp(centralBp, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractPositionGenerator

Returns a position OUTSIDE the central region of a background sequence, sampled uniformly at random. Complement of InsideCentralBp.

getJsonableObject()[source]
class dragonn.synthetic.synthetic.PairEmbeddable(string1, string2, separation, embeddableDescription, nothingInBetween=True)[source]

Bases: dragonn.synthetic.synthetic.AbstractEmbeddable

Represents a pair of strings that are embedded with some separation. Used for motif grammars. See superclass docs.

canEmbed(priorEmbeddedThings, startPos)[source]
embedInBackgroundStringArr(priorEmbeddedThings, backgroundStringArr, startPos)[source]
getDescription()[source]
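
To make the geometry of a pair concrete, the sketch below writes two strings into a character array with a gap between them. It assumes that separation counts the positions between the end of the first string and the start of the second; `embed_pair` is a hypothetical helper, not a dragonn function.

```python
def embed_pair(background_arr, string1, string2, separation, start_pos):
    """Write string1 at start_pos and string2 `separation` positions after it."""
    second_start = start_pos + len(string1) + separation
    end = second_start + len(string2)
    if start_pos < 0 or end > len(background_arr):
        raise ValueError("pair does not fit in the background")
    background_arr[start_pos:start_pos + len(string1)] = list(string1)
    background_arr[second_start:end] = list(string2)
    return background_arr


result = "".join(embed_pair(list("NNNNNNNNNN"), "AC", "GT", 3, 1))
```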
class dragonn.synthetic.synthetic.PairEmbeddableGenerator(substringGenerator1, substringGenerator2, separationGenerator, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractEmbeddableGenerator

generateEmbeddable()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.PairEmbeddableGenerator_General(embeddableGenerator1, embeddableGenerator2, separationGenerator, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractEmbeddableGenerator

generateEmbeddable()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.PairEmbeddable_General(embeddable1, embeddable2, separation, embeddableDescription, nothingInBetween=True)[source]

Bases: dragonn.synthetic.synthetic.AbstractEmbeddable

Embeds two Embeddable objects with some separation between them.

canEmbed(priorEmbeddedThings, startPos)[source]
embedInBackgroundStringArr(priorEmbeddedThings, backgroundStringArr, startPos)[source]
getDescription()[source]
class dragonn.synthetic.synthetic.PoissonQuantityGenerator(mean, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractQuantityGenerator

Generates values according to a poisson distribution

generateQuantity()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.PriorEmbeddedThings_numpyArrayBacked(seqLen)[source]

Bases: dragonn.synthetic.synthetic.AbstractPriorEmbeddedThings

Uses a numpy array in which positions are set to 1 if they are occupied, to determine which positions are occupied and which are not. See parent for more documentation.

addEmbedding(startPos, what)[source]

what: instance of Embeddable

canEmbed(startPos, endPos)[source]
getEmbeddings()[source]
getNumOccupiedPos()[source]
getTotalPos()[source]
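
The occupancy-array idea can be sketched in a few lines. This version uses a plain Python list instead of numpy so that it runs without dependencies, and it assumes the region passed to the availability check is half-open, [startPos, endPos). The class and method names are illustrative only.

```python
class OccupancyTracker:
    """Illustrative stand-in for an array-backed embedding tracker."""

    def __init__(self, seq_len):
        self.occupied = [0] * seq_len   # 1 marks an occupied position
        self.embeddings = []            # (start_pos, what) pairs

    def can_embed(self, start_pos, end_pos):
        # The region must lie inside the sequence and be entirely free.
        if start_pos < 0 or end_pos > len(self.occupied):
            return False
        return sum(self.occupied[start_pos:end_pos]) == 0

    def add_embedding(self, start_pos, what):
        for i in range(start_pos, start_pos + len(what)):
            self.occupied[i] = 1
        self.embeddings.append((start_pos, what))

    def num_occupied(self):
        return sum(self.occupied)


tracker = OccupancyTracker(10)
tracker.add_embedding(2, "ACG")
```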
class dragonn.synthetic.synthetic.PwmSampler(pwm, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractSubstringGenerator

samples from the pwm by calling self.pwm.sampleFromPwm

generateSubstring()[source]
getJsonableObject()[source]
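
Sampling from a PWM amounts to drawing one base per position according to that position's probabilities. The sketch below shows the idea with the standard library only; `sample_from_pwm` and the list-of-dicts layout are hypothetical and are not the pwm.PWM interface.

```python
import random


def sample_from_pwm(pwm_rows, rng=random):
    """Draw one base per position, weighted by that position's probabilities."""
    sampled = []
    for row in pwm_rows:
        bases = list(row)
        weights = [row[base] for base in bases]
        sampled.append(rng.choices(bases, weights=weights, k=1)[0])
    return "".join(sampled)


# A degenerate PWM whose only possible sample is "ACG".
certain_pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
]
sample = sample_from_pwm(certain_pwm, random.Random(0))
```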
class dragonn.synthetic.synthetic.PwmSamplerFromLoadedMotifs(loadedMotifs, motifName, name=None)[source]

Bases: dragonn.synthetic.synthetic.PwmSampler

convenience wrapper class for instantiating parent by pulling the pwm given the name from an AbstractLoadedMotifs object (it basically extracts the pwm for you)

getJsonableObject()[source]
class dragonn.synthetic.synthetic.RandomSubsetOfEmbedders(quantityGenerator, embedders, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractEmbedder

Takes a quantity generator that generates a quantity of embedders, and executes that many embedders from a supplied set, in sequence

getJsonableObject()[source]
class dragonn.synthetic.synthetic.RepeatedEmbedder(embedder, quantityGenerator, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractEmbedder

Wrapper around an embedder to call it multiple times according to sampling from a distribution.

getJsonableObject()[source]
class dragonn.synthetic.synthetic.RepeatedSubstringBackgroundGenerator(substringGenerator, repetitions)[source]

Bases: dragonn.synthetic.synthetic.AbstractBackgroundGenerator

generateBackground()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.ReverseComplementWrapper(substringGenerator, reverseComplementProb=0.5, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractSubstringGenerator

Wrapper around an AbstractSubstringGenerator that reverse complements the generated substring with the specified probability.

generateSubstring()[source]
getJsonableObject()[source]
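
For readers unfamiliar with the operation, a reverse complement reverses the string and swaps A with T and C with G. The sketch below applies it with a given probability; the helper names are illustrative and only the four uppercase DNA bases are handled.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse the sequence and complement each base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def maybe_reverse_complement(seq, prob=0.5, rng=random):
    """Return the reverse complement with probability prob, else seq unchanged."""
    if rng.random() < prob:
        return reverse_complement(seq)
    return seq


flipped = reverse_complement("AACG")
```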
class dragonn.synthetic.synthetic.RevertToReference(setOfMutations, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractTransformation

for a series of mutations, reverts the supplied string to the reference (“unmutated”) string

getJsonableObject()[source]
transform(stringArr)[source]
class dragonn.synthetic.synthetic.SampleFromDiscreteDistributionSubstringGenerator(discreteDistribution)[source]

Bases: dragonn.synthetic.synthetic.AbstractSubstringGenerator

generateSubstring()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.StringEmbeddable(string, stringDescription='')[source]

Bases: dragonn.synthetic.synthetic.AbstractEmbeddable

represents a string (such as a sampling from a pwm) that is to be embedded in a background. See docs for superclass.

canEmbed(priorEmbeddedThings, startPos)[source]
embedInBackgroundStringArr(priorEmbeddedThings, backgroundStringArr, startPos)[source]
classmethod fromString(theString)[source]
getDescription()[source]
class dragonn.synthetic.synthetic.SubstringEmbeddableGenerator(substringGenerator, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractEmbeddableGenerator

generateEmbeddable()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.SubstringEmbedder(substringGenerator, positionGenerator=<dragonn.synthetic.synthetic.UniformPositionGenerator object>, name=None)[source]

Bases: dragonn.synthetic.synthetic.EmbeddableEmbedder

embeds a single generated substring within the background sequence, at a position sampled from a distribution. Only embeds at unoccupied positions

class dragonn.synthetic.synthetic.TopNMutationsFromPwmRelativeToBestHit(pwm, N, bestHitMode)[source]

Bases: dragonn.synthetic.synthetic.AbstractSetOfMutations

See docs for parent; here, the collection of mutations are the top N strongest mutations for a PWM as compared to the best match for that pwm.

getJsonableObject()[source]
class dragonn.synthetic.synthetic.TopNMutationsFromPwmRelativeToBestHit_FromLoadedMotifs(loadedMotifs, pwmName, N, bestHitMode)[source]

Bases: dragonn.synthetic.synthetic.TopNMutationsFromPwmRelativeToBestHit

Like parent, except extracts the pwm.PWM object from an AbstractLoadedMotifs object, saving you a few lines of code.

getJsonableObject()[source]
class dragonn.synthetic.synthetic.TransformedSubstringGenerator(substringGenerator, transformations, transformationsDescription='transformations', name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractSubstringGenerator

Takes a substringGenerator and a set of AbstractTransformation objects, applies the transformations to the generated substring

generateSubstring()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.UniformIntegerGenerator(minVal, maxVal, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractQuantityGenerator

Randomly samples an integer from minVal to maxVal, inclusive.

generateQuantity()[source]
getJsonableObject()[source]
class dragonn.synthetic.synthetic.UniformPositionGenerator(name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractPositionGenerator

samples a start position to embed the substring in uniformly at random; does not return positions that are too close to the end of the background sequence to embed the full substring.

getJsonableObject()[source]
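
The constraint that the full substring must fit can be expressed by capping the largest allowed start position. A minimal sketch, with a hypothetical function name and no dependency on dragonn:

```python
import random


def sample_uniform_start(len_background, len_substring, rng=random):
    """Return a start index at which the whole substring fits in the background."""
    if len_substring > len_background:
        raise ValueError("substring is longer than the background")
    # randint is inclusive at both ends, so the last valid start is
    # len_background - len_substring.
    return rng.randint(0, len_background - len_substring)


rng = random.Random(0)
starts = [sample_uniform_start(20, 5, rng) for _ in range(1000)]
```

With a background of length 20 and a substring of length 5, valid starts run from 0 to 15.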
class dragonn.synthetic.synthetic.XOREmbedder(embedder1, embedder2, probOfFirst, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractEmbedder

calls exactly one of the supplied embedders

getJsonableObject()[source]
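
Calling exactly one of two embedders comes down to a single weighted coin flip. In the sketch below the embedders are modelled as zero-argument callables and `call_exactly_one` is a hypothetical name; it assumes probOfFirst is the probability of choosing the first embedder.

```python
import random


def call_exactly_one(first, second, prob_of_first, rng=random):
    """Invoke first with probability prob_of_first, otherwise invoke second."""
    chosen = first if rng.random() < prob_of_first else second
    return chosen()


calls = []
call_exactly_one(lambda: calls.append("first"),
                 lambda: calls.append("second"),
                 prob_of_first=0.5,
                 rng=random.Random(0))
```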
class dragonn.synthetic.synthetic.ZeroInflater(quantityGenerator, zeroProb, name=None)[source]

Bases: dragonn.synthetic.synthetic.AbstractQuantityGenerator

Wrapper that inflates the number of zeros returned. Flips a coin; if positive, will return zero - otherwise will sample from the wrapped distribution (which may still return 0)

generateQuantity()[source]
getJsonableObject()[source]
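
The coin-flip logic described for ZeroInflater is short enough to sketch directly. The function name is illustrative, and the wrapped distribution is modelled as a zero-argument callable rather than a quantity generator object.

```python
import random


def zero_inflated(sampler, zero_prob, rng=random):
    """Return 0 with probability zero_prob, otherwise a draw from sampler()."""
    if rng.random() < zero_prob:
        return 0
    return sampler()   # may itself return 0


rng = random.Random(0)
draws = [zero_inflated(lambda: 5, 0.3, rng) for _ in range(100)]
```

The overall chance of a zero is therefore zero_prob plus the remaining probability mass times the wrapped distribution's own chance of returning zero.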
class dragonn.synthetic.synthetic.ZeroOrderBackgroundGenerator(seqLength, discreteDistribution=<dragonn.synthetic.util.DiscreteDistribution object>)[source]

Bases: dragonn.synthetic.synthetic.RepeatedSubstringBackgroundGenerator

returns a sequence with 40% GC content. Each base is sampled independently.

dragonn.synthetic.synthetic.generateString(options)[source]
dragonn.synthetic.synthetic.generateString_zeroOrderMarkov(length, discreteDistribution=<dragonn.synthetic.util.DiscreteDistribution object>)[source]

discreteDistribution: instance of util.DiscreteDistribution
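
A zero-order Markov string is one in which every character is drawn independently from the same distribution. The sketch below shows this with the standard library. The function name is hypothetical, and the example frequencies are borrowed from the background argument shown in the AbstractLoadedMotifs signature; they are not necessarily the default distribution of this function.

```python
import random


def zero_order_sequence(length, base_probs, rng=random):
    """Sample each base independently according to base_probs."""
    bases = list(base_probs)
    weights = [base_probs[base] for base in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


example_probs = {"A": 0.27, "C": 0.23, "G": 0.23, "T": 0.27}
background = zero_order_sequence(50, example_probs, random.Random(0))
```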

dragonn.synthetic.synthetic.getEmbeddingsFromString(string)[source]
dragonn.synthetic.synthetic.getFileNamePieceFromOptions(options)[source]
dragonn.synthetic.synthetic.getGenerationOption(string)[source]
dragonn.synthetic.synthetic.getParentArgparse()[source]
dragonn.synthetic.synthetic.printSequences(outputFileName, sequenceSetGenerator, includeEmbeddings=False, labelGenerator=None, includeFasta=False)[source]

Given an output filename and an instance of AbstractSequenceSetGenerator, will call the sequence set generator and print the generated sequences to the output file. Will also create a file “info_outputFileName.txt” in the same directory as outputFileName that contains all the information about sequenceSetGenerator.

outputFileName: string
sequenceSetGenerator: instance of AbstractSequenceSetGenerator
includeEmbeddings: a boolean indicating whether to print a column that lists the embeddings
labelGenerator: instance of LabelGenerator

dragonn.synthetic.synthetic.printSequencesTransformationPosNeg(outputFileNamePos, outputFileNameNeg, sequenceSetGenerator, transformation)[source]

outputFileName: string
sequenceSetGenerator: instance of AbstractSequenceSetGenerator
generatedSequences: the sequences that have been generated by sequenceSetGenerator

Given an output filename, and an instance of AbstractSequenceSetGenerator, will print the generated sequences to the output file. Will also create a file “info_outputFileName.txt” in the same directory as outputFileName that contains all the information about sequenceSetGenerator.

dragonn.synthetic.synthetic.sampleIndexWithinRegionOfLength(length, lengthOfThingToEmbed)[source]

Samples an integer uniformly at random from 0 to length - lengthOfThingToEmbed.