In this article I compare Scilab with R for generating a histogram from data and fitting a curve to data using the least squares procedure.
Using Scilab
Consider an experiment where we’ve measured the time to failure for 50 identical electrical components.
Notice that only one variable has been measured — the components’ lifetimes. There is no notion of response and predictor variables; rather, each observation consists of just a single measurement. The objective of an analysis for data like these is not to predict the lifetime of a new component given a value of some other variable, but rather to describe the full distribution of possible lifetimes. This is distribution fitting with univariate data.
One simple way to visualize these data is to make a histogram.
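The original Scilab listing isn't preserved here, but a minimal sketch of how such a histogram could be drawn with Scilab's histplot, substituting hypothetical exponentially distributed lifetimes for the measured values, might look like this:

```scilab
// Hypothetical stand-in for the 50 measured lifetimes (hours);
// the article uses the actual experimental values
life = grand(50, 1, "exp", 100);

// Histogram with 10 classes; histplot scales the bars so the
// total area is 1 (a density scale)
histplot(10, life)
xtitle("Time to failure of 50 components", "hours", "density")
```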
Consider an experiment where we measure the concentration of a compound in blood samples taken from several subjects at various times after they take an experimental medication.

Notice that we have one response variable, blood concentration, and one predictor variable, time after ingestion. The predictor data are assumed to be measured with little or no error, while the response data are assumed to be affected by experimental error. The main objective in analyzing data like these is often to define a model that predicts the response variable. That is, we are trying to describe the trend line, or the mean response of y (blood concentration), as a function of x (time). This is curve fitting with bivariate data. In Scilab we define a model function and estimate its coefficients with a least squares procedure, as sketched below.
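Again, the original listing isn't preserved. The comments below describe the fitted model as a*cos(b)x + b*cos(a)x; read literally, that form collapses to a single slope (a and b would not be separately identifiable), so the sketch below assumes the intended model was y = a*cos(bx) + b*cos(ax). Both the data and the starting values are hypothetical:

```scilab
// Hypothetical (time, concentration) data standing in for the
// blood measurements; true parameters a = 0.2, b = 2 plus noise
t = linspace(0, 10, 25)';
c = 0.2*cos(2*t) + 2*cos(0.2*t) + grand(25, 1, "nor", 0, 0.1);

// Residual vector (data minus model) for the assumed model
// y = a*cos(b*x) + b*cos(a*x)
function e = resid(x, t, c)
    e = c - (x(1)*cos(x(2)*t) + x(2)*cos(x(1)*t))
endfunction

x0 = [0.15; 1.9];                          // starting guess
[fmin, x] = leastsq(list(resid, t, c), x0);
disp(x)                                    // fitted x(1) and x(2)

// Plot the data and the curve defined by the fitted coefficients
plot(t, c, "o")
tt = linspace(0, 10, 200)';
plot(tt, x(1)*cos(x(2)*tt) + x(2)*cos(x(1)*tt), "r")
```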
Using R
First we duplicate the histogram of the life data in R and see that R renders essentially the same result.
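A minimal sketch with base R's hist, again substituting hypothetical lifetimes for the measured values:

```r
# Hypothetical stand-in for the 50 measured lifetimes (hours);
# the article uses the actual experimental values
set.seed(1)
life <- rexp(50, rate = 1/100)

# Histogram with roughly 10 bins ('breaks' is a suggestion in base R)
hist(life, breaks = 10,
     main = "Time to failure of 50 components",
     xlab = "hours")
```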
Next we take the blood concentration data and fit a curve of the same family to it using R.
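A minimal sketch with nls, R's nonlinear least squares routine, using the same assumed model and hypothetical data as the Scilab sketch; coef(fit) returns the estimates that the article reports as p1 and p2:

```r
# Same hypothetical data as the Scilab sketch: true a = 0.2, b = 2
set.seed(1)
t <- seq(0, 10, length.out = 25)
conc <- 0.2*cos(2*t) + 2*cos(0.2*t) + rnorm(25, sd = 0.1)

# Nonlinear least squares fit of the assumed model
# y = a*cos(b*x) + b*cos(a*x)
fit <- nls(conc ~ a*cos(b*t) + b*cos(a*t),
           start = list(a = 0.15, b = 1.9))
coef(fit)      # fitted a and b (p1 and p2 in the article)

# Overlay the fitted curve on the data
plot(t, conc)
tt <- seq(0, 10, length.out = 200)
lines(tt, predict(fit, newdata = list(t = tt)), col = "red")
```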
Comparison
Looking at the fitted curves, both programs appear to render similar fits. The coefficients x(1) and x(2) from Scilab are 0.1733551 and 1.969835, respectively; comparing these with p1 and p2 from R, we have 0.17336 and 1.96986. Thus the only difference is due to round-off. If the number of digits displayed is indicative of the decimal places actually carried in the calculations, I would take the Scilab result over R's. However, the R least squares routine was much easier to implement.
Authored by:
Jeffrey Strickland, Ph.D.
Jeffrey Strickland, Ph.D., is the author of “Predictive Analytics Using R” and a Senior Analytics Scientist with Clarity Solution Group. He has performed predictive modeling, simulation, and analysis for the Department of Defense, NASA, the Missile Defense Agency, and the financial and insurance industries for over 20 years. Jeff is a Certified Modeling and Simulation Professional (CMSP) and an Associate Systems Engineering Professional. He has published nearly 200 blog posts on LinkedIn, is a frequently invited guest speaker, and is the author of 20 books, including:
- Discrete Event Simulation Using ExtendSim
- Crime Analysis and Mapping
- Missile Flight Simulation
- Mathematical Modeling of Warfare and Combat Phenomenon
- Predictive Modeling and Analytics
- Using Math to Defeat the Enemy
- Verification and Validation for Modeling and Simulation
- Simulation Conceptual Modeling
- System Engineering Process and Practices
- Weird Scientist: the Creators of Quantum Physics
- Albert Einstein: No one expected me to lay golden eggs
- The Men of Manhattan: the Creators of the Nuclear Era
- Fundamentals of Combat Modeling
This question may be out of the context of this article, but I was curious how you decided on the non-linear equation a*cos(b)x + b*cos(a)x for defining the given distribution. Thank you.
It was a guess based on what I believed the function looked like. I also tried several polynomials of different orders.