--- title: "Fitting individual models" author: "Umut Caglar, Claus O. Wilke" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Fitting individual models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` Sometimes it can be useful to fit only specific models to a dataset rather than fit multiple models and run the decision algorithm. For this purpose, **sicegar** provides the function `multipleFitFunction()`. This function fits a chosen model to the input dataset. It calls the fitting algorithm multiple times with different, randomly generated start parameters, to guarantee robust and reliable fitting. ```{r install packages, echo=FALSE, warning=FALSE, results='hide',message=FALSE} ###***************************** seedNo=14159 set.seed(seedNo) ###***************************** ###***************************** require("sicegar") require("dplyr") require("ggplot2") ###***************************** ``` We will demonstrate the use of this function on double-sigmoidal data, generated by adding some noise to the double-sigmoidal curve used for fitting. The curve used for fitting is implemented as `doublesigmoidalFitFormula()`, and thus can be used to generate model data. ```{r generate data for double - sigmoidal} time <- seq(3, 24, 0.5) noise_parameter <- 0.2 intensity_noise <- runif(n = length(time), min = 0, max = 1) * noise_parameter intensity <- doublesigmoidalFitFormula(time, finalAsymptoteIntensityRatio = .3, maximum = 4, slope1Param = 1, midPoint1Param = 7, slope2Param = 1, midPointDistanceParam = 8) intensity <- intensity + intensity_noise dataInput <- data.frame(time, intensity) head(dataInput) # the generated input data ``` Before we can perform the fit, we need to normalize the data appropriately. All **sicegar** fit functions work on normalized data, where time and intensity are normalized to the interval from 0 to 1. Sicegar provides a convenient normalization function `normalizeData()` that normalizes data appropriately while storing the required information to transform fitted parameters back into non-normalized coordinates: ```{r normalize_data} normalizedInput <- normalizeData(dataInput = dataInput, dataInputName = "doubleSigmoidalSample") head(normalizedInput$timeIntensityData) # the normalized time and intensity data ``` The data scaling paratmers and the data input name are stored as well: ```{r normalized_data_output} normalizedInput$dataScalingParameters # scaling parameters normalizedInput$dataInputName # data input name ``` Note that `normalizeData()` normalizes time with respect to the maximum value the time parameter takes: ```{r time normalization, eval=FALSE} timeRange <- time timeNormalized <- time/timeRange # normalized time values ``` Whereas intensity is normalized with respect to the intensity interval: ```{r intensity normalization, eval=FALSE} intensityMin <- min(intensity) intensityMax <- max(intensity) intensityRange <- intensityMax - intensityMin intensityNormalized <- (intensity-intensityMin)/intensityRange # normalized intensity values ``` ## Fitting the models to the data To fit a model to the data using the function `multipleFitFunction()`, we provide it as input the normalized data and the model type to be fitted, which can be `"sigmoidal"` or `"doublesigmoidal"`. Here we are fitting both models to the same input data: ```{r} # Do the sigmoidal fit sigmoidalModel <- multipleFitFunction(dataInput=normalizedInput, model="sigmoidal") # Do the double-sigmoidal fit doubleSigmoidalModel <- multipleFitFunction(dataInput=normalizedInput, model="doublesigmoidal") ``` The two generated model objects contain a large number of computed parameters, described in detail in the following. ```{r parameter vectors} t(sigmoidalModel) t(doubleSigmoidalModel) ``` The calculated quantities can be grouped into several different groups of parameters: 1\. Information about the fitting process * `model`: String indicating the type of the model, `"sigmoidal"` for the sigmoidal model and `"doublesigmoidal"` for the double-sigmoidal model. * `isThisaFit`: A boolean that equals to `TRUE` if at least one fit was successful and `FALSE` otherwise. * `betterFit`: The number of times that the minimum AIC score improved with a subsequent fitting attempt. In other words, this counts the number of times the multiple fit attempts increased fit quality. * `correctFit`: The total number of successfull fits. * `totalFit`: The total number of fit attempts. 2\. Estimates of the fitted parameters These estimates have been converted from the normalized data to the original raw data, and are the main quantities of interest to the user. They depend on the type of the model, sigmoidal vs.\ double-sigmoidal. Estimates for the sigmoidal model are: * `maximum_Estimate`: Maximum intensity estimate for the raw data. * `slopeParam_Estimate`: _Slope parameter_ estimate for the raw data. Note that the slope parameter is related to but not equal to the slope. See the vignette on additional parameters for details. * `midPoint_Estimate`: Mid-point estimate (time the intensity reaches 1/2 of maximum) for the raw data. Estimates for the double-sigmoidal model are: * `maximum_Estimate`: Maximum intensity estimate for the raw data. * `slope1Param_Estimate`: _Slope 1 parameter_ estimate for the raw data. Note that the slope 1 parameter is related to but not equal to the slope. See the vignette on additional parameters for details. * `midPoint1Param_Estimate`: Mid-point 1 estimate (time the intensity reaches 1/2 of maximum) for the raw data. _Needs numerical correction._ See the vignette on additional parameters for details. * `slope2Param_Estimate`: _Slope 2 parameter_ estimate for the raw data. Note that the slope 2 parameter is related to but not equal to the slope. See the vignette on additional parameters for details. * `midPointDistanceParam_Estimate`: Distance between mid-point 1 and mid-point 2, where mid-point 2 is the time at which intensity decreases to the mean between the final asymptote intensity and the maximum value. _Needs numerical correction._ See the vignette on additional parameters for details. * `finalAsymptoteIntensityRatio_Estimate`: This is the _ratio_ between asymptote intensity and maximum intensity of the fitted curve. 3\. Quantities measuring the overall quality of fit * `residual_Sum_of_Squares`: Residual sum of squares, smaller values indicate a better fit. * `log_likelihood`: Negative log likelihood, larger values indicate a better fit. * `AIC_value`: Akaike Information Criterion, smaller values indicate a better fit. * `BIC_value`: Bayesian Information Criterion, smaller values indicate a better fit. 4\. Start point of the gradient descent algorithm Each time a fit is attempted, the likelihood maximization algorithm starts from a random initiation point and finds the final parameter estimates by gradient descent. The start vector for the best fit is returned in the form of variables whose name starts with `startVector.`, followed by the name of the estimated parameter. For example: * `startVector.maximum`: Value of the maximum parameter at the initiation point. 5\. Parameters related to the normalization step * `dataScalingParameters.timeRange`: Maximum of raw time data. * `dataScalingParameters.intensityMin`: Minimum of raw intensity data. * `dataScalingParameters.intensityMax`: Maximum of raw intensity data. * `dataScalingParameters.intensityRange`: Maximum - Minimum of intensity data. 6\. Error estimates for fitted parameters For each estimated parameter listed under point 2, the algorithm provides additional statistical parameters, such as the estimate in the normalized scale, the standard error (also in normalized scale), the t value, and the P value. For example, for the maximum estimate, these are: * `maximum_N_Estimate`: Estimate in normalized scale. * `maximum_Std_Error`: Standard error, in normalized scale. * `maximum_t_value`: t value * `maximum_Pr_t`: P value