Fitting individual models

Sometimes it can be useful to fit only specific models to a dataset rather than fit multiple models and run the decision algorithm. For this purpose, sicegar provides the function multipleFitFunction(). This function fits a chosen model to the input dataset. It calls the fitting algorithm multiple times with different, randomly generated start parameters, to guarantee robust and reliable fitting.

We will demonstrate the use of this function on double-sigmoidal data, generated by adding some noise to the double-sigmoidal curve used for fitting. The curve used for fitting is implemented as doublesigmoidalFitFormula(), and thus can be used to generate model data.

time <- seq(3, 24, 0.5)
noise_parameter <- 0.2
intensity_noise <- runif(n = length(time), min = 0, max = 1) * noise_parameter
intensity <- doublesigmoidalFitFormula(time,
                                       finalAsymptoteIntensityRatio = .3,
                                       maximum = 4,
                                       slope1Param = 1,
                                       midPoint1Param = 7,
                                       slope2Param = 1,
                                       midPointDistanceParam = 8)
intensity <- intensity + intensity_noise
dataInput <- data.frame(time, intensity)
head(dataInput) # the generated input data

##   time intensity
## 1  3.0 0.2342345
## 2  3.5 0.2634529
## 3  4.0 0.2884316
## 4  4.5 0.3297332
## 5  5.0 0.5435398
## 6  5.5 0.7838986

Before we can perform the fit, we need to normalize the data appropriately. All sicegar fit functions work on normalized data, where time and intensity are normalized to the interval from 0 to 1. Sicegar provides a convenient normalization function normalizeData() that normalizes data appropriately while storing the required information to transform fitted parameters back into non-normalized coordinates:

normalizedInput <- normalizeData(dataInput = dataInput, 
                                 dataInputName = "doubleSigmoidalSample")
head(normalizedInput$timeIntensityData) # the normalized time and intensity data

##        time   intensity
## 1 0.1250000 0.000000000
## 2 0.1458333 0.007491139
## 3 0.1666667 0.013895288
## 4 0.1875000 0.024484372
## 5 0.2083333 0.079300988
## 6 0.2291667 0.140925160

The data scaling paratmers and the data input name are stored as well:

normalizedInput$dataScalingParameters # scaling parameters

##      timeRange   intensityMin   intensityMax intensityRange 
##     24.0000000      0.2342345      4.1346318      3.9003973

normalizedInput$dataInputName # data input name

## [1] "doubleSigmoidalSample"

Note that normalizeData() normalizes time with respect to the maximum value the time parameter takes:

timeRange <- time
timeNormalized <- time/timeRange # normalized time values

Whereas intensity is normalized with respect to the intensity interval:

intensityMin <- min(intensity)
intensityMax <- max(intensity)
intensityRange <- intensityMax - intensityMin

intensityNormalized <- (intensity-intensityMin)/intensityRange # normalized intensity values

Fitting the models to the data

To fit a model to the data using the function multipleFitFunction(), we provide it as input the normalized data and the model type to be fitted, which can be "sigmoidal" or "doublesigmoidal". Here we are fitting both models to the same input data:

# Do the sigmoidal fit
sigmoidalModel <- multipleFitFunction(dataInput=normalizedInput,
                                      model="sigmoidal")


# Do the double-sigmoidal fit
doubleSigmoidalModel <- multipleFitFunction(dataInput=normalizedInput,
                                            model="doublesigmoidal")

The two generated model objects contain a large number of computed parameters, described in detail in the following.

t(sigmoidalModel)

##                                      [,1]                   
## maximum_N_Estimate                   "0.5817715"            
## maximum_Std_Error                    "0.0471064"            
## maximum_t_value                      "12.35016"             
## maximum_Pr_t                         "3.144059e-15"         
## slopeParam_N_Estimate                "56.69416"             
## slopeParam_Std_Error                 "62.9224"              
## slopeParam_t_value                   "0.9010171"            
## slopeParam_Pr_t                      "0.3729724"            
## midPoint_N_Estimate                  "0.254053"             
## midPoint_Std_Error                   "0.02232296"           
## midPoint_t_value                     "11.38079"             
## midPoint_Pr_t                        "4.075252e-14"         
## residual_Sum_of_Squares              "2.947177"             
## log_likelihood                       "-3.38678"             
## AIC_value                            "14.77356"             
## BIC_value                            "21.81836"             
## isThisaFit                           "TRUE"                 
## startVector.maximum                  "0.4293649"            
## startVector.slopeParam               "17.12084"             
## startVector.midPoint                 "-0.2478345"           
## dataScalingParameters.timeRange      "24"                   
## dataScalingParameters.intensityMin   "0.2342345"            
## dataScalingParameters.intensityMax   "4.134632"             
## dataScalingParameters.intensityRange "3.900397"             
## model                                "sigmoidal"            
## additionalParameters                 "FALSE"                
## maximum_Estimate                     "2.503374"             
## slopeParam_Estimate                  "2.362257"             
## midPoint_Estimate                    "6.097271"             
## dataInputName                        "doubleSigmoidalSample"
## betterFit                            "4"                    
## correctFit                           "20"                   
## totalFit                             "28"

t(doubleSigmoidalModel)

##                                          [,1]                   
## finalAsymptoteIntensityRatio_N_Estimate  "0.2783729"            
## finalAsymptoteIntensityRatio_Std_Error   "0.005673176"          
## finalAsymptoteIntensityRatio_t_value     "49.06827"             
## finalAsymptoteIntensityRatio_Pr_t        "2.80299e-35"          
## maximum_N_Estimate                       "0.9890303"            
## maximum_Std_Error                        "0.006744761"          
## maximum_t_value                          "146.6368"             
## maximum_Pr_t                             "9.166125e-53"         
## slope1Param_N_Estimate                   "26.72541"             
## slope1Param_Std_Error                    "0.9129164"            
## slope1Param_t_value                      "29.27476"             
## slope1Param_Pr_t                         "3.431544e-27"         
## midPoint1Param_N_Estimate                "0.2942993"            
## midPoint1Param_Std_Error                 "0.001583065"          
## midPoint1Param_t_value                   "185.9048"             
## midPoint1Param_Pr_t                      "1.427291e-56"         
## slope2Param_N_Estimate                   "25.36587"             
## slope2Param_Std_Error                    "1.253865"             
## slope2Param_t_value                      "20.23014"             
## slope2Param_Pr_t                         "1.33991e-21"          
## midPointDistanceParam_N_Estimate         "0.334362"             
## midPointDistanceParam_Std_Error          "0.003221369"          
## midPointDistanceParam_t_value            "103.795"              
## midPointDistanceParam_Pr_t               "3.171517e-47"         
## residual_Sum_of_Squares                  "0.01073605"           
## log_likelihood                           "117.3356"             
## AIC_value                                "-220.6713"            
## BIC_value                                "-208.3429"            
## isThisaFit                               "TRUE"                 
## startVector.finalAsymptoteIntensityRatio "0.7374777"            
## startVector.maximum                      "0.6159543"            
## startVector.slope1Param                  "86.40114"             
## startVector.midPoint1Param               "0.5335278"            
## startVector.slope2Param                  "179.7345"             
## startVector.midPointDistanceParam        "0.212152"             
## dataScalingParameters.timeRange          "24"                   
## dataScalingParameters.intensityMin       "0.2342345"            
## dataScalingParameters.intensityMax       "4.134632"             
## dataScalingParameters.intensityRange     "3.900397"             
## model                                    "doublesigmoidal"      
## additionalParameters                     "FALSE"                
## finalAsymptoteIntensityRatio_Estimate    "0.2783729"            
## maximum_Estimate                         "4.091846"             
## slope1Param_Estimate                     "1.113559"             
## midPoint1Param_Estimate                  "7.063182"             
## slope2Param_Estimate                     "1.056911"             
## midPointDistanceParam_Estimate           "8.024689"             
## dataInputName                            "doubleSigmoidalSample"
## betterFit                                "5"                    
## correctFit                               "20"                   
## totalFit                                 "37"

The calculated quantities can be grouped into several different groups of parameters:

1. Information about the fitting process

model: String indicating the type of the model, "sigmoidal" for the sigmoidal model and "doublesigmoidal" for the double-sigmoidal model.
isThisaFit: A boolean that equals to TRUE if at least one fit was successful and FALSE otherwise.
betterFit: The number of times that the minimum AIC score improved with a subsequent fitting attempt. In other words, this counts the number of times the multiple fit attempts increased fit quality.
correctFit: The total number of successfull fits.
totalFit: The total number of fit attempts.

2. Estimates of the fitted parameters

These estimates have been converted from the normalized data to the original raw data, and are the main quantities of interest to the user. They depend on the type of the model, sigmoidal vs. double-sigmoidal.

Estimates for the sigmoidal model are:

maximum_Estimate: Maximum intensity estimate for the raw data.
slopeParam_Estimate: Slope parameter estimate for the raw data. Note that the slope parameter is related to but not equal to the slope. See the vignette on additional parameters for details.
midPoint_Estimate: Mid-point estimate (time the intensity reaches 1/2 of maximum) for the raw data.

Estimates for the double-sigmoidal model are:

maximum_Estimate: Maximum intensity estimate for the raw data.
slope1Param_Estimate: Slope 1 parameter estimate for the raw data. Note that the slope 1 parameter is related to but not equal to the slope. See the vignette on additional parameters for details.
midPoint1Param_Estimate: Mid-point 1 estimate (time the intensity reaches 1/2 of maximum) for the raw data. Needs numerical correction. See the vignette on additional parameters for details.
slope2Param_Estimate: Slope 2 parameter estimate for the raw data. Note that the slope 2 parameter is related to but not equal to the slope. See the vignette on additional parameters for details.
midPointDistanceParam_Estimate: Distance between mid-point 1 and mid-point 2, where mid-point 2 is the time at which intensity decreases to the mean between the final asymptote intensity and the maximum value. Needs numerical correction. See the vignette on additional parameters for details.
finalAsymptoteIntensityRatio_Estimate: This is the ratio between asymptote intensity and maximum intensity of the fitted curve.

3. Quantities measuring the overall quality of fit

residual_Sum_of_Squares: Residual sum of squares, smaller values indicate a better fit.
log_likelihood: Negative log likelihood, larger values indicate a better fit.
AIC_value: Akaike Information Criterion, smaller values indicate a better fit.
BIC_value: Bayesian Information Criterion, smaller values indicate a better fit.

4. Start point of the gradient descent algorithm

Each time a fit is attempted, the likelihood maximization algorithm starts from a random initiation point and finds the final parameter estimates by gradient descent. The start vector for the best fit is returned in the form of variables whose name starts with startVector., followed by the name of the estimated parameter. For example:

startVector.maximum: Value of the maximum parameter at the initiation point.

5. Parameters related to the normalization step

dataScalingParameters.timeRange: Maximum of raw time data.
dataScalingParameters.intensityMin: Minimum of raw intensity data.
dataScalingParameters.intensityMax: Maximum of raw intensity data.
dataScalingParameters.intensityRange: Maximum - Minimum of intensity data.

6. Error estimates for fitted parameters

For each estimated parameter listed under point 2, the algorithm provides additional statistical parameters, such as the estimate in the normalized scale, the standard error (also in normalized scale), the t value, and the P value. For example, for the maximum estimate, these are:

maximum_N_Estimate: Estimate in normalized scale.
maximum_Std_Error: Standard error, in normalized scale.
maximum_t_value: t value
maximum_Pr_t: P value

- Fitting the models to the data