In high-throughput scenarios, we
often don’t know a priori whether a given dataset represents a sigmoidal
curve, a double-sigmoidal curve, or neither. In these cases, we need to
fit multiple different models to the data and then determine which model
fits the best. This process is normally done automatically by the
function sicegar::fitAndCategorize()
. In this vignette, we
describe how this process works and how it can be done manually.
We will demonstrate this process for an artificial dataset representing a double-sigmoidal function. We first generate the data:
time <- seq(3, 24, 0.5)
noise_parameter <- 0.2
intensity_noise <- runif(n = length(time), min = 0, max = 1) * noise_parameter
intensity <- doublesigmoidalFitFormula(time,
finalAsymptoteIntensityRatio = .3,
maximum = 4,
slope1Param = 1,
midPoint1Param = 7,
slope2Param = 1,
midPointDistanceParam = 8)
intensity <- intensity + intensity_noise
dataInput <- data.frame(time, intensity)
We need to fit the sigmoidal and double-sigmoidal models to this
dataset, using multipleFitFunction()
. This requires first
normalizing the data:
normalizedInput <- normalizeData(dataInput = dataInput,
dataInputName = "doubleSigmoidalSample")
# Fit sigmoidal model
sigmoidalModel <- multipleFitFunction(dataInput = normalizedInput,
model = "sigmoidal",
n_runs_min = 20,
n_runs_max = 500,
showDetails = FALSE)
# Fit double-sigmoidal model
doubleSigmoidalModel <- multipleFitFunction(dataInput = normalizedInput,
model = "doublesigmoidal",
n_runs_min = 20,
n_runs_max = 500,
showDetails = FALSE)
We also need to perform the additional parameter calculations, as
these are required by the categorize()
function we use
below.
# Calculate additional parameters
sigmoidalModel <- parameterCalculation(sigmoidalModel)
# Calculate additional parameters
doubleSigmoidalModel <- parameterCalculation(doubleSigmoidalModel)
This is what the two fits look like:
Clearly the sigmoidal fit is not appropriate but the double-sigmoidal
one is. Next we demonstrate how to arrive at this conclusion
computationally, using the function categorize()
. It takes
as input the two fitted models as well as a number of parameters that
are used in the decision process (explained below under “The decision
process”).
# now we can categorize the fits
decisionProcess <- categorize(threshold_minimum_for_intensity_maximum = 0.3,
threshold_intensity_range = 0.1,
threshold_t0_max_int = 0.05,
parameterVectorSigmoidal = sigmoidalModel,
parameterVectorDoubleSigmoidal = doubleSigmoidalModel)
The object returned by categorize()
contains extensive
information about the decision process, but the key component is the
decision
variable. Here, it states that the data fits the
double-sigmoidal model:
## [1] "double_sigmoidal"
(The possible values here are “no_signal”, “sigmoidal”, “double_sigmoidal”, and “ambiguous”.)
The decision process consists of two parts. First, the
categorize()
function checks whether all provided input
data are valid. The steps of this verification are as follows:
categorize()
function provided with
sigmoidal and double_sigmoidal models as input?sicegar::sigmoidalFitFunctions
?sicegar::doublesigmoidalFitFunctions
?sicegar::parameterCalculation()
?sicegar::parameterCalculation()
?After these steps, the primary decision process begins. It takes a list of four possible outcomes (“no_signal”, “sigmoidal”, “double_sigmoidal”, “ambiguous”) and systematically removes options until only one remains.
First, the algorithm checks if the provided data includes a signal or not.
threshold_minimum_for_intensity_maximum
; otherwise, the
data is labeled with "no_signal"
.threshold_intensity_range
; otherwise, the data is labeled
with "no_signal"
."no_signal"
, then the data can not be labeled with
"no signal"
anymore.Next the algorithm checks if the sigmoidal and double sigmoidal models make sense.
"sigmoidal"
."double_sigmoidal"
.threshold_AIC
; otherwise, the data can not be labeled with
"sigmoidal"
.threshold_AIC
; otherwise, the data cannot be labeled
with "double_sigmoidal"
.startPoint_x
for the sigmoidal model
must be a positive number; otherwise, the data cannot be labeled with
"sigmoidal"
.startPoint_x
for the
double-sigmoidal model must be a positive number; otherwise, the data
cannot be labeled with "double_sigmoidal"
.start_intensity
for the sigmoidal
model must be smaller than threshold_t0_max_int
; otherwise,
the data cannot be labeled with "sigmoidal"
.start_intensity
for the
double-sigmoidal model must be smaller than
threshold_t0_max_int
; otherwise, the data cannot be labeled
with "double_sigmoidal"
.threshold_dsm_tmax_IntensityRatio
; otherwise, the data
cannot be labeled with "double_sigmoidal"
.threshold_sm_tmax_IntensityRatio
; otherwise, the data
cannot be labeled with "sigmoidal"
.In step eight, the algorithm checks whether the data should be
labelled as "ambiguous"
or not.
"sigmoidal"
or "double_sigmoidal"
,
then the data cannot be labeled with "ambiguous"
.In the last step; the algorithm checks whether the data should be
labeled as "sigmoidal"
or
"double_sigmoidal"
.
"sigmoidal"
and "double_sigmoidal"
options,
then the choice will be made based on the AIC scores of those models and
value of threshold_bonus_sigmoidal_AIC
. If
sigmoidalAIC + threshold_bonus_sigmoidal_AIC < doublesigmoidalAIC
,
then the data cannot be labeled with "double_sigmoidal"
. If
sigmoidalAIC + threshold_bonus_sigmoidal_AIC > doublesigmoidalAIC
,
then the data cannot be labeled with "sigmoidal"
.The only option that is left at this point will be the label of the data and thus the final decision.