Sometimes it can be useful to fit only specific models to a dataset rather than fit multiple models and run the decision algorithm. For this purpose, sicegar provides the function multipleFitFunction(). This function fits a chosen model to the input dataset. It calls the fitting algorithm multiple times with different, randomly generated start parameters, which makes the fit more robust and reliable than a single attempt from one start point.
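The idea of repeated fitting from random start points can be sketched in base R. This is a simplified illustration, not sicegar's implementation. It uses stats::nls() with its default Gauss-Newton algorithm on simulated data, and it assumes a logistic form maximum / (1 + exp(-slopeParam * (x - midPoint))) for the sigmoidal curve. The start-value ranges and the stopping rule (stop after n_success successful fits or max_attempts attempts) are our own choices, and the names sigmoid, multiStartFit and simData are hypothetical. The counters loosely mimic the betterFit, correctFit and totalFit quantities described later in this section.

```r
# Simplified multi-start fitting sketch (not sicegar's implementation).
set.seed(1)

# Assumed logistic form of the sigmoidal curve (hypothetical helper)
sigmoid <- function(x, maximum, slopeParam, midPoint) {
  maximum / (1 + exp(-slopeParam * (x - midPoint)))
}

# Simulated data on a normalized [0, 1] time scale with small Gaussian noise
x <- seq(0, 1, length.out = 50)
y <- sigmoid(x, maximum = 0.9, slopeParam = 15, midPoint = 0.5) +
  rnorm(length(x), sd = 0.02)
simData <- data.frame(x = x, y = y)

multiStartFit <- function(data, n_success = 20, max_attempts = 100) {
  best_fit <- NULL
  best_aic <- Inf
  better_fit <- 0   # number of times the best AIC improved
  correct_fit <- 0  # number of successful fits
  total_fit <- 0    # number of fit attempts
  while (correct_fit < n_success && total_fit < max_attempts) {
    total_fit <- total_fit + 1
    # Random start vector; the ranges are arbitrary choices for this sketch
    start <- list(maximum = runif(1, 0.5, 1.5),
                  slopeParam = runif(1, 1, 30),
                  midPoint = runif(1, 0.2, 0.8))
    # nls() signals an error when it fails to converge; count that as a failed attempt
    fit <- tryCatch(
      nls(y ~ sigmoid(x, maximum, slopeParam, midPoint),
          data = data, start = start),
      error = function(e) NULL)
    if (is.null(fit)) next
    correct_fit <- correct_fit + 1
    aic <- AIC(fit)
    if (aic < best_aic) {  # keep the fit with the lowest AIC so far
      best_aic <- aic
      best_fit <- fit
      better_fit <- better_fit + 1
    }
  }
  list(fit = best_fit, AIC = best_aic, betterFit = better_fit,
       correctFit = correct_fit, totalFit = total_fit)
}

result <- multiStartFit(simData)
coef(result$fit)
result[c("betterFit", "correctFit", "totalFit")]
```

Failed attempts are simply skipped, so a few unlucky start points do not prevent a good fit as long as some attempts converge; comparing successful fits by AIC then selects the best one.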
We will demonstrate the use of this function on double-sigmoidal data, generated by adding some noise to the double-sigmoidal curve used for fitting. The curve used for fitting is implemented as doublesigmoidalFitFormula(), and thus can be used to generate model data.
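library(sicegar)

# Time points from 3 to 24 in steps of 0.5
time <- seq(3, 24, 0.5)
# Uniform noise in [0, noise_parameter); results differ between runs unless a seed is set
noise_parameter <- 0.2
intensity_noise <- runif(n = length(time), min = 0, max = 1) * noise_parameter
# Noise-free double-sigmoidal curve
intensity <- doublesigmoidalFitFormula(time,
                                       finalAsymptoteIntensityRatio = .3,
                                       maximum = 4,
                                       slope1Param = 1,
                                       midPoint1Param = 7,
                                       slope2Param = 1,
                                       midPointDistanceParam = 8)
# Add the noise and collect time and intensity in a data frame
intensity <- intensity + intensity_noise
dataInput <- data.frame(time, intensity)
head(dataInput) # the generated input data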
## time intensity
## 1 3.0 0.2342345
## 2 3.5 0.2634529
## 3 4.0 0.2884316
## 4 4.5 0.3297332
## 5 5.0 0.5435398
## 6 5.5 0.7838986
Before we can perform the fit, we need to normalize the data appropriately. All sicegar fit functions work on normalized data, in which both time and intensity lie in the interval from 0 to 1. Sicegar provides a convenient normalization function, normalizeData(), that normalizes the data while storing the information required to transform fitted parameters back into non-normalized coordinates:
normalizedInput <- normalizeData(dataInput = dataInput,
dataInputName = "doubleSigmoidalSample")
head(normalizedInput$timeIntensityData) # the normalized time and intensity data
## time intensity
## 1 0.1250000 0.000000000
## 2 0.1458333 0.007491139
## 3 0.1666667 0.013895288
## 4 0.1875000 0.024484372
## 5 0.2083333 0.079300988
## 6 0.2291667 0.140925160
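The data scaling parameters and the data input name are stored as well:
normalizedInput$dataScalingParameters # the data scaling parameters
normalizedInput$dataInputName # the data input name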
## timeRange intensityMin intensityMax intensityRange
## 24.0000000 0.2342345 4.1346318 3.9003973
## [1] "doubleSigmoidalSample"
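Note that normalizeData() normalizes time with respect to the maximum value the time parameter takes:
timeRange <- max(time)
timeNormalized <- time/timeRange # normalized time values
Intensity, on the other hand, is normalized with respect to its range (maximum minus minimum):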
intensityMin <- min(intensity)
intensityMax <- max(intensity)
intensityRange <- intensityMax - intensityMin
intensityNormalized <- (intensity-intensityMin)/intensityRange # normalized intensity values
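Inverting these two normalizations maps fitted parameters back to the raw scale. Since exp(-s_N * (t/timeRange - m_N)) equals exp(-(s_N/timeRange) * (t - m_N * timeRange)), a slope parameter fitted on normalized time is divided by timeRange and a mid-point is multiplied by it, while intensity-valued parameters are rescaled as estimate * intensityRange + intensityMin. The sketch below assumes these rules, which are consistent with the sigmoidal fit output printed later in this section; the helper name backTransform is hypothetical and not a sicegar function.

```r
# Hypothetical helper (not part of sicegar): convert normalized sigmoidal
# estimates back to the raw scale.
backTransform <- function(maximum_N, slopeParam_N, midPoint_N,
                          timeRange, intensityMin, intensityRange) {
  c(maximum    = maximum_N * intensityRange + intensityMin, # intensity scale
    slopeParam = slopeParam_N / timeRange,                   # 1/time scale
    midPoint   = midPoint_N * timeRange)                     # time scale
}

# Normalized estimates and scaling parameters copied from the sigmoidal fit output
rawEstimates <- backTransform(maximum_N = 0.5817715,
                              slopeParam_N = 56.69416,
                              midPoint_N = 0.254053,
                              timeRange = 24,
                              intensityMin = 0.2342345,
                              intensityRange = 3.900397)
print(rawEstimates)
```

Up to rounding, the three values agree with maximum_Estimate, slopeParam_Estimate and midPoint_Estimate in the sigmoidal fit output.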
To fit a model to the data using the function multipleFitFunction(), we provide it as input the normalized data and the model type to be fitted, which can be "sigmoidal" or "doublesigmoidal". Here we fit both models to the same input data:
# Do the sigmoidal fit
sigmoidalModel <- multipleFitFunction(dataInput=normalizedInput,
model="sigmoidal")
# Do the double-sigmoidal fit
doubleSigmoidalModel <- multipleFitFunction(dataInput=normalizedInput,
model="doublesigmoidal")
The two resulting model objects contain a large number of computed quantities. They are printed below, first for the sigmoidal and then for the double-sigmoidal fit, and described in detail afterwards.
## [,1]
## maximum_N_Estimate "0.5817715"
## maximum_Std_Error "0.0471064"
## maximum_t_value "12.35016"
## maximum_Pr_t "3.144059e-15"
## slopeParam_N_Estimate "56.69416"
## slopeParam_Std_Error "62.9224"
## slopeParam_t_value "0.9010171"
## slopeParam_Pr_t "0.3729724"
## midPoint_N_Estimate "0.254053"
## midPoint_Std_Error "0.02232296"
## midPoint_t_value "11.38079"
## midPoint_Pr_t "4.075252e-14"
## residual_Sum_of_Squares "2.947177"
## log_likelihood "-3.38678"
## AIC_value "14.77356"
## BIC_value "21.81836"
## isThisaFit "TRUE"
## startVector.maximum "0.4293649"
## startVector.slopeParam "17.12084"
## startVector.midPoint "-0.2478345"
## dataScalingParameters.timeRange "24"
## dataScalingParameters.intensityMin "0.2342345"
## dataScalingParameters.intensityMax "4.134632"
## dataScalingParameters.intensityRange "3.900397"
## model "sigmoidal"
## additionalParameters "FALSE"
## maximum_Estimate "2.503374"
## slopeParam_Estimate "2.362257"
## midPoint_Estimate "6.097271"
## dataInputName "doubleSigmoidalSample"
## betterFit "4"
## correctFit "20"
## totalFit "28"
## [,1]
## finalAsymptoteIntensityRatio_N_Estimate "0.2783729"
## finalAsymptoteIntensityRatio_Std_Error "0.005673176"
## finalAsymptoteIntensityRatio_t_value "49.06827"
## finalAsymptoteIntensityRatio_Pr_t "2.80299e-35"
## maximum_N_Estimate "0.9890303"
## maximum_Std_Error "0.006744761"
## maximum_t_value "146.6368"
## maximum_Pr_t "9.166125e-53"
## slope1Param_N_Estimate "26.72541"
## slope1Param_Std_Error "0.9129164"
## slope1Param_t_value "29.27476"
## slope1Param_Pr_t "3.431544e-27"
## midPoint1Param_N_Estimate "0.2942993"
## midPoint1Param_Std_Error "0.001583065"
## midPoint1Param_t_value "185.9048"
## midPoint1Param_Pr_t "1.427291e-56"
## slope2Param_N_Estimate "25.36587"
## slope2Param_Std_Error "1.253865"
## slope2Param_t_value "20.23014"
## slope2Param_Pr_t "1.33991e-21"
## midPointDistanceParam_N_Estimate "0.334362"
## midPointDistanceParam_Std_Error "0.003221369"
## midPointDistanceParam_t_value "103.795"
## midPointDistanceParam_Pr_t "3.171517e-47"
## residual_Sum_of_Squares "0.01073605"
## log_likelihood "117.3356"
## AIC_value "-220.6713"
## BIC_value "-208.3429"
## isThisaFit "TRUE"
## startVector.finalAsymptoteIntensityRatio "0.7374777"
## startVector.maximum "0.6159543"
## startVector.slope1Param "86.40114"
## startVector.midPoint1Param "0.5335278"
## startVector.slope2Param "179.7345"
## startVector.midPointDistanceParam "0.212152"
## dataScalingParameters.timeRange "24"
## dataScalingParameters.intensityMin "0.2342345"
## dataScalingParameters.intensityMax "4.134632"
## dataScalingParameters.intensityRange "3.900397"
## model "doublesigmoidal"
## additionalParameters "FALSE"
## finalAsymptoteIntensityRatio_Estimate "0.2783729"
## maximum_Estimate "4.091846"
## slope1Param_Estimate "1.113559"
## midPoint1Param_Estimate "7.063182"
## slope2Param_Estimate "1.056911"
## midPointDistanceParam_Estimate "8.024689"
## dataInputName "doubleSigmoidalSample"
## betterFit "5"
## correctFit "20"
## totalFit "37"
The calculated quantities fall into six groups:
1. Information about the fitting process
model: String indicating the type of the model, "sigmoidal" for the sigmoidal model and "doublesigmoidal" for the double-sigmoidal model.
isThisaFit: A boolean that equals TRUE if at least one fit was successful and FALSE otherwise.
betterFit: The number of times that the minimum AIC score improved with a subsequent fitting attempt. In other words, this counts the number of times the multiple fit attempts increased fit quality.
correctFit: The total number of successful fits.
totalFit: The total number of fit attempts.
2. Estimates of the fitted parameters
These estimates have been converted from the normalized scale back to the scale of the raw data, and are the main quantities of interest to the user. They depend on the type of model, sigmoidal vs. double-sigmoidal.
Estimates for the sigmoidal model are:
maximum_Estimate: Maximum intensity estimate for the raw data.
slopeParam_Estimate: Slope parameter estimate for the raw data. Note that the slope parameter is related to but not equal to the slope. See the vignette on additional parameters for details.
midPoint_Estimate: Mid-point estimate (the time at which intensity reaches 1/2 of the maximum) for the raw data.
Estimates for the double-sigmoidal model are:
maximum_Estimate: Maximum intensity estimate for the raw data.
slope1Param_Estimate: Slope 1 parameter estimate for the raw data. Note that the slope 1 parameter is related to but not equal to the slope. See the vignette on additional parameters for details.
midPoint1Param_Estimate: Mid-point 1 parameter estimate for the raw data. It approximates the time at which intensity reaches 1/2 of the maximum, but the exact time needs numerical correction. See the vignette on additional parameters for details.
slope2Param_Estimate: Slope 2 parameter estimate for the raw data. Note that the slope 2 parameter is related to but not equal to the slope. See the vignette on additional parameters for details.
midPointDistanceParam_Estimate: Distance between mid-point 1 and mid-point 2, where mid-point 2 is the time at which intensity decreases to the mean of the final asymptote intensity and the maximum value. The exact value needs numerical correction. See the vignette on additional parameters for details.
finalAsymptoteIntensityRatio_Estimate: The ratio between the final asymptote intensity and the maximum intensity of the fitted curve.
3. Quantities measuring the overall quality of fit
residual_Sum_of_Squares: Residual sum of squares; smaller values indicate a better fit.
log_likelihood: Log-likelihood; larger values indicate a better fit.
AIC_value: Akaike Information Criterion; smaller values indicate a better fit.
BIC_value: Bayesian Information Criterion; smaller values indicate a better fit.
4. Start point of the gradient descent algorithm
Each time a fit is attempted, the likelihood maximization algorithm starts from a random initiation point and finds the final parameter estimates by gradient descent. The start vector for the best fit is returned in the form of variables whose names start with startVector., followed by the name of the estimated parameter. Since fitting is performed on normalized data, these values are in the normalized scale. For example:
startVector.maximum: Value of the maximum parameter at the initiation point.
5. Parameters related to the normalization step
dataScalingParameters.timeRange: Maximum of the raw time data.
dataScalingParameters.intensityMin: Minimum of the raw intensity data.
dataScalingParameters.intensityMax: Maximum of the raw intensity data.
dataScalingParameters.intensityRange: Maximum minus minimum of the raw intensity data.
6. Error estimates for fitted parameters
For each estimated parameter listed under point 2, the algorithm provides additional statistical quantities: the estimate in the normalized scale, the standard error (also in the normalized scale), the t value, and the P value. For example, for the maximum estimate, these are:
maximum_N_Estimate: Estimate in normalized scale.
maximum_Std_Error: Standard error, in normalized scale.
maximum_t_value: t value.
maximum_Pr_t: P value.
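Several of these quantities are linked by standard formulas, which can be checked against the printed output. The sketch below assumes Gaussian errors with the variance estimated by maximum likelihood, as in R's logLik() for nls fits, so that the parameter count k used in AIC and BIC includes the residual variance. It also assumes that t values and two-sided P values are computed as in summary() for nls fits, with n - p degrees of freedom, where p is the number of curve parameters. The helper names logLikFromRSS and informationCriteria are hypothetical; n = 43 is the number of time points in seq(3, 24, 0.5).

```r
# Hypothetical helpers: recompute fit-quality and error quantities from the
# printed output, assuming Gaussian errors as in R's logLik() for nls fits.
logLikFromRSS <- function(rss, n) {
  # Gaussian log-likelihood with the ML variance estimate rss / n
  -n / 2 * (log(2 * pi) + 1 + log(rss / n))
}
informationCriteria <- function(ll, k, n) {
  c(AIC = -2 * ll + 2 * k,
    BIC = -2 * ll + log(n) * k)
}

n <- length(seq(3, 24, 0.5))  # 43 time points

# Double-sigmoidal fit: 6 curve parameters + 1 residual variance
ll <- logLikFromRSS(rss = 0.01073605, n = n)
ic <- informationCriteria(ll, k = 6 + 1, n = n)
print(c(logLik = ll, ic))  # compare with log_likelihood, AIC_value, BIC_value

# Sigmoidal fit, slope parameter: t value and two-sided P value (3 curve parameters)
estimate <- 56.69416
stdError <- 62.9224
tValue <- estimate / stdError
pValue <- 2 * pt(abs(tValue), df = n - 3, lower.tail = FALSE)
print(c(t = tValue, P = pValue))  # compare with slopeParam_t_value, slopeParam_Pr_t
```

The same relations explain why the double-sigmoidal model is strongly preferred here: its AIC_value (about -220.7) is far below that of the sigmoidal model (about 14.8).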