Creative Biolabs offers clients statistical and probabilistic analysis, as well as tests for experimental bias, for protein library design and screening.
In protein library design and screening, the capacity to build a large diversity of protein variants must be balanced against the physical limits of what can actually be screened. We offer the main statistical and probabilistic analyses for protein library creation and for library-based screening and selection strategies. The statistical and probabilistic questions pertaining to library representation are important and vary with the application. We also provide tests for experimental bias, which help assess library quality and detect bias before or after selection. Computing these criteria throughout an experimental protein engineering project enables better design and evaluation of protein variant libraries.
Figure 1. The important role played by statistical and probabilistic analysis in protein library creation, selection and screening.
Several important parameters must be determined during protein library design and screening: first, the desired library size; second, the library representation required for a given application; and third, the constraints imposed by the screening strategy. These parameters can be addressed intuitively, but intuition can lead to conceptual or experimental errors. Mathematical methods are available for better planning and execution of library-based experimentation.
Suppose a library contains n different theoretical variants and is sampled randomly m times. To characterize how well the library is represented, the following questions have to be answered: how many of the n theoretical variants are expected not to appear among the m variants chosen; what is the probability that at least one of the n theoretical variants has not been sampled among the m variants chosen; what is the probability that at most a certain number of the theoretical variants have not been sampled; and how many times a specific variant is expected to appear in the sample or, more generally, what is the probability that it appears r times. Variants in the library can be selected with either equal or unequal probabilities, and the answers to these questions may differ between the two scenarios.
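For the simpler of the two scenarios, equal selection probabilities, several of these questions have standard closed-form answers. The Python sketch below is an illustration only, not Creative Biolabs' method: it assumes each of the m draws picks one of the n variants independently and uniformly, and all function names are hypothetical. It does not cover the unequal-probability case or the "at most a certain number missing" question.

```python
import math


def expected_missing(n: int, m: int) -> float:
    """Expected number of the n variants that never appear in m uniform draws."""
    return n * (1.0 - 1.0 / n) ** m


def prob_r_copies(n: int, m: int, r: int) -> float:
    """Binomial probability that one specific variant appears exactly r times."""
    p = 1.0 / n
    return math.comb(m, r) * p ** r * (1.0 - p) ** (m - r)


def prob_at_least_one_missing(n: int, m: int) -> float:
    """Probability that at least one variant is never drawn.

    Uses inclusion-exclusion for P(all n variants seen). The alternating sum
    loses floating-point precision for large n, so this is only meant for
    small libraries.
    """
    p_all_seen = sum(
        (-1) ** k * math.comb(n, k) * (1.0 - k / n) ** m for k in range(n + 1)
    )
    return 1.0 - p_all_seen


def draws_for_coverage(n: int, fraction: float) -> int:
    """Smallest m whose expected coverage 1 - (1 - 1/n)**m reaches `fraction`."""
    return math.ceil(math.log(1.0 - fraction) / math.log(1.0 - 1.0 / n))


if __name__ == "__main__":
    n = 20 ** 3  # e.g. three fully randomized amino acid positions
    m = draws_for_coverage(n, 0.95)
    print("draws for 95% expected coverage:", m)
    print("expected missing variants:", expected_missing(n, m))
    print("P(a given variant is absent):", prob_r_copies(n, m, 0))
```

Under these assumptions, reaching 95% expected coverage takes roughly three times as many draws as there are theoretical variants, which illustrates why the library size and the screening capacity have to be planned together.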
A significant bias can arise when a library is screened. Such a bias may indicate a faulty codon distribution (e.g., from flawed oligonucleotide synthesis), a positional bias in a random mutagenesis scheme (such as a bias caused by the native sequence), or a certain amount of unintended selection, in which case it may be necessary to switch to an alternate system.
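The text does not name a specific bias test. One common choice for checking a codon or nucleotide distribution is a chi-square goodness-of-fit test, sketched below in Python. The function name and the counts are hypothetical and purely illustrative, and the example assumes a position designed to carry the four nucleotides at equal frequency.

```python
def chi_square_statistic(observed: dict, expected_freqs: dict) -> float:
    """Pearson chi-square statistic for observed counts vs. expected frequencies."""
    total = sum(observed.values())
    stat = 0.0
    for category, freq in expected_freqs.items():
        expected_count = total * freq
        stat += (observed.get(category, 0) - expected_count) ** 2 / expected_count
    return stat


# Critical value of the chi-square distribution for 3 degrees of freedom
# (4 categories - 1) at a significance level of 0.05.
CRITICAL_DF3_ALPHA_005 = 7.815

if __name__ == "__main__":
    # Hypothetical nucleotide counts at one randomized position in 100 clones.
    observed = {"A": 30, "C": 20, "G": 28, "T": 22}
    expected = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

    stat = chi_square_statistic(observed, expected)
    print("chi-square statistic:", stat)
    print("significant bias at 0.05:", stat > CRITICAL_DF3_ALPHA_005)
```

A statistic above the critical value suggests that the observed distribution departs from the design. The test alone does not say whether synthesis, mutagenesis, or unintended selection is the cause.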