Title: | Statistical Methods for Life Data Analysis |
---|---|
Description: | Provides statistical methods and visualizations that are often used in reliability engineering. Comprises a compact and easily accessible set of methods and visualization tools that make the examination and adjustment as well as the analysis and interpretation of field data (and bench tests) as simple as possible. Non-parametric estimators like Median Ranks, Kaplan-Meier (Abernethy, 2006, <ISBN:978-0-9653062-3-2>), Johnson (Johnson, 1964, <ISBN:978-0444403223>), and Nelson-Aalen for failure probability estimation within samples that contain failures as well as censored data are included. The package supports methods like Maximum Likelihood and Rank Regression (Genschel and Meeker, 2010, <DOI:10.1080/08982112.2010.503447>) for the estimation of multiple parametric lifetime distributions, as well as the computation of confidence intervals of quantiles and probabilities using the delta method related to Fisher's confidence intervals (Meeker and Escobar, 1998, <ISBN:9780471673279>) and the beta-binomial confidence bounds. If desired, mixture model analysis can be done with segmented regression and the EM algorithm. Besides the well-known Weibull analysis, the package also contains Monte Carlo methods for the correction and completion of imprecisely recorded or unknown lifetime characteristics (Verband der Automobilindustrie e.V. (VDA), 2016, <ISSN:0943-9412>). Plots are created statically ('ggplot2') or interactively ('plotly') and can be customized with functions of the respective visualization package. The graphical technique of probability plotting as well as the addition of regression lines and confidence bounds to existing plots are supported. |
Authors: | Tim-Gunnar Hensel [aut, cre], David Barkemeyer [aut] |
Maintainer: | Tim-Gunnar Hensel <[email protected]> |
License: | GPL-2 |
Version: | 2.1.0 |
Built: | 2024-11-07 05:13:40 UTC |
Source: | https://github.com/tim-tu/weibulltools |
Provides statistical methods and visualizations that are often used in reliability engineering. Comprises a compact and easily accessible set of methods and visualization tools that make the examination and adjustment as well as the analysis and interpretation of field data (and bench tests) as simple as possible.
Besides the well-known Weibull analysis, the package supports multiple lifetime distributions and also contains Monte Carlo methods for the correction and completion of imprecisely recorded or unknown lifetime characteristics.
Plots are created statically (ggplot2) or interactively (plotly) and can be customized with functions of the respective visualization package.
A dataset containing the number of cycles of fatigue life for Alloy T7987 specimens.
alloy
A tibble with 72 rows and 2 variables:
cycles: Number of cycles (in thousands).
status: If the specimen failed before 300 thousand cycles 1, else 0.
Meeker, William Q; Escobar, Luis A., Statistical Methods for Reliability Data, New York: Wiley series in probability and statistics (1998, p.131)
This function computes the non-parametric beta binomial confidence bounds (BB) for quantiles and failure probabilities.
confint_betabinom(x, ...)

## S3 method for class 'wt_model'
confint_betabinom(
  x,
  b_lives = c(0.01, 0.1, 0.5),
  bounds = c("two_sided", "lower", "upper"),
  conf_level = 0.95,
  direction = c("y", "x"),
  ...
)
x | A list with class wt_model returned by rank_regression. |
... | Further arguments passed to or from other methods. Currently not used. |
b_lives | A numeric vector indicating the probabilities of the B_p-lives (quantiles) to be considered. |
bounds | A character string specifying the bound(s) to be computed: "two_sided", "lower" or "upper". |
conf_level | Confidence level of the interval. |
direction | A character string specifying the direction of the confidence interval: "y" for failure probabilities or "x" for quantiles. |
The procedure is similar to the Median Ranks method, but instead of evaluating the probability of the j-th rank at the 50% level, the probabilities are evaluated at the given confidence level.
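The idea can be sketched with a few lines of R. The helper below (betabinom_bounds() is an assumed name and not part of the package, and the sketch ignores the rank interpolation step) evaluates the beta distribution of the j-th order statistic in a sample of size n at the 50% level (median ranks) and at the levels implied by a two-sided confidence level:

# Illustrative sketch only: non-parametric beta-binomial bounds for the j-th
# rank in a sample of size n. Median ranks use the 50% level; the bounds
# evaluate the same beta distribution at the confidence limits instead.
betabinom_bounds <- function(rank, n, conf_level = 0.95) {
  alpha <- 1 - conf_level
  data.frame(
    lower_bound = qbeta(alpha / 2, rank, n - rank + 1),
    prob        = qbeta(0.5, rank, n - rank + 1),
    upper_bound = qbeta(1 - alpha / 2, rank, n - rank + 1)
  )
}

betabinom_bounds(rank = 1:5, n = 10, conf_level = 0.95)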
A tibble with class wt_confint containing the following columns:

x: An ordered sequence of the lifetime characteristic regarding the failed units, starting at min(x) and ending at max(x). With b_lives = c(0.01, 0.1, 0.5) the 1%, 10% and 50% quantiles are additionally included in x, but only if the specified probabilities are in the range of the estimated probabilities.
rank: Interpolated ranks as a function of probabilities, computed with the converted approximation formula of Benard.
prob: An ordered sequence of probabilities with specified b_lives included.
lower_bound: Provided if bounds is one of "two_sided" or "lower". Lower confidence limits with respect to direction, i.e. limits for quantiles or probabilities.
upper_bound: Provided if bounds is one of "two_sided" or "upper". Upper confidence limits with respect to direction, i.e. limits for quantiles or probabilities.
cdf_estimation_method: Method for the estimation of failure probabilities which was specified in estimate_cdf.

Further information is stored in the attributes of this tibble:

distribution: Distribution which was specified in rank_regression.
bounds: Specified bound(s).
direction: Specified direction.
model_estimation: Input list with class wt_model.
# Reliability data preparation: ## Data for two-parametric model: data_2p <- reliability_data( shock, x = distance, status = status ) ## Data for three-parametric model: data_3p <- reliability_data( alloy, x = cycles, status = status ) # Probability estimation: prob_tbl_2p <- estimate_cdf( data_2p, methods = "johnson" ) prob_tbl_3p <- estimate_cdf( data_3p, methods = "johnson" ) prob_tbl_mult <- estimate_cdf( data_3p, methods = c("johnson", "mr") ) # Model estimation with rank_regression(): rr_2p <- rank_regression( prob_tbl_2p, distribution = "weibull" ) rr_3p <- rank_regression( prob_tbl_3p, distribution = "lognormal3", conf_level = 0.90 ) rr_lists <- rank_regression( prob_tbl_mult, distribution = "loglogistic3", conf_level = 0.90 ) # Example 1 - Two-sided 95% confidence interval for probabilities ('y'): conf_betabin_1 <- confint_betabinom( x = rr_2p, bounds = "two_sided", conf_level = 0.95, direction = "y" ) # Example 2 - One-sided lower/upper 90% confidence interval for quantiles ('x'): conf_betabin_2_1 <- confint_betabinom( x = rr_2p, bounds = "lower", conf_level = 0.90, direction = "x" ) conf_betabin_2_2 <- confint_betabinom( x = rr_2p, bounds = "upper", conf_level = 0.90, direction = "x" ) # Example 3 - Two-sided 90% confidence intervals for both directions using # a three-parametric model: conf_betabin_3_1 <- confint_betabinom( x = rr_3p, bounds = "two_sided", conf_level = 0.90, direction = "y" ) conf_betabin_3_2 <- confint_betabinom( x = rr_3p, bounds = "two_sided", conf_level = 0.90, direction = "x" ) # Example 4 - Confidence intervals if multiple methods in estimate_cdf, i.e. # "johnson" and "mr", were specified: conf_betabin_4 <- confint_betabinom( x = rr_lists, bounds = "two_sided", conf_level = 0.99, direction = "y" )
This function computes the non-parametric beta binomial confidence bounds (BB) for quantiles and failure probabilities.
## Default S3 method:
confint_betabinom(
  x,
  status,
  dist_params,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal",
    "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential",
    "exponential2"),
  b_lives = c(0.01, 0.1, 0.5),
  bounds = c("two_sided", "lower", "upper"),
  conf_level = 0.95,
  direction = c("y", "x"),
  ...
)
x | A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status | A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
dist_params | The parameters (coefficients) returned by rank_regression. |
distribution | Supposed distribution of the random variable. Has to be in line with the specification made in rank_regression. |
b_lives | A numeric vector indicating the probabilities of the B_p-lives (quantiles) to be considered. |
bounds | A character string specifying the bound(s) to be computed: "two_sided", "lower" or "upper". |
conf_level | Confidence level of the interval. |
direction | A character string specifying the direction of the confidence interval: "y" for failure probabilities or "x" for quantiles. |
... | Further arguments passed to or from other methods. Currently not used. |
The procedure is similar to the Median Ranks method, but instead of evaluating the probability of the j-th rank at the 50% level, the probabilities are evaluated at the given confidence level.
A tibble with class wt_confint containing the following columns:

x: An ordered sequence of the lifetime characteristic regarding the failed units, starting at min(x) and ending at max(x). With b_lives = c(0.01, 0.1, 0.5) the 1%, 10% and 50% quantiles are additionally included in x, but only if the specified probabilities are in the range of the estimated probabilities.
rank: Interpolated ranks as a function of probabilities, computed with the converted approximation formula of Benard.
prob: An ordered sequence of probabilities with specified b_lives included.
lower_bound: Provided if bounds is one of "two_sided" or "lower". Lower confidence limits with respect to direction, i.e. limits for quantiles or probabilities.
upper_bound: Provided if bounds is one of "two_sided" or "upper". Upper confidence limits with respect to direction, i.e. limits for quantiles or probabilities.
cdf_estimation_method: A character that is always NA_character_. Only needed for internal use.

Further information is stored in the attributes of this tibble:

distribution: Distribution which was specified in rank_regression.
bounds: Specified bound(s).
direction: Specified direction.
# Vectors: obs <- seq(10000, 100000, 10000) status_1 <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0) cycles <- alloy$cycles status_2 <- alloy$status # Probability estimation: prob_tbl <- estimate_cdf( x = obs, status = status_1, method = "johnson" ) prob_tbl_2 <- estimate_cdf( x = cycles, status = status_2, method = "johnson" ) # Model estimation with rank_regression(): rr <- rank_regression( x = prob_tbl$x, y = prob_tbl$prob, status = prob_tbl$status, distribution = "weibull", conf_level = 0.9 ) rr_2 <- rank_regression( x = prob_tbl_2$x, y = prob_tbl_2$prob, status = prob_tbl_2$status, distribution = "lognormal3" ) # Example 1 - Two-sided 95% confidence interval for probabilities ('y'): conf_betabin_1 <- confint_betabinom( x = prob_tbl$x, status = prob_tbl$status, dist_params = rr$coefficients, distribution = "weibull", bounds = "two_sided", conf_level = 0.95, direction = "y" ) # Example 2 - One-sided lower/upper 90% confidence interval for quantiles ('x'): conf_betabin_2_1 <- confint_betabinom( x = prob_tbl$x, status = prob_tbl$status, dist_params = rr$coefficients, distribution = "weibull", bounds = "lower", conf_level = 0.9, direction = "x" ) conf_betabin_2_2 <- confint_betabinom( x = prob_tbl$x, status = prob_tbl$status, dist_params = rr$coefficients, distribution = "weibull", bounds = "upper", conf_level = 0.9, direction = "x" ) # Example 3 - Two-sided 90% confidence intervals for both directions using # a three-parametric model: conf_betabin_3_1 <- confint_betabinom( x = prob_tbl_2$x, status = prob_tbl_2$status, dist_params = rr_2$coefficients, distribution = "lognormal3", bounds = "two_sided", conf_level = 0.9, direction = "y" ) conf_betabin_3_2 <- confint_betabinom( x = prob_tbl_2$x, status = prob_tbl_2$status, dist_params = rr_2$coefficients, distribution = "lognormal3", bounds = "two_sided", conf_level = 0.9, direction = "x" )
This function computes normal-approximation confidence intervals for quantiles and failure probabilities.
confint_fisher(x, ...)

## S3 method for class 'wt_model'
confint_fisher(
  x,
  b_lives = c(0.01, 0.1, 0.5),
  bounds = c("two_sided", "lower", "upper"),
  conf_level = 0.95,
  direction = c("y", "x"),
  ...
)
x | A list with classes wt_model and wt_ml_estimation returned by ml_estimation. |
... | Further arguments passed to or from other methods. Currently not used. |
b_lives | A numeric vector indicating the probabilities of the B_p-lives (quantiles) to be considered. |
bounds | A character string specifying the bound(s) to be computed: "two_sided", "lower" or "upper". |
conf_level | Confidence level of the interval. |
direction | A character string specifying the direction of the confidence interval: "y" for failure probabilities or "x" for quantiles. |
These confidence bounds are based on the standard errors obtained by the delta method.
The bounds on the probability are determined by the z-procedure. See 'References' for more information on this approach.
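As a rough illustration of the z-procedure (a hand-written sketch, not the package implementation), consider a two-parameter Weibull in location-scale parameterization with estimated location mu and scale sigma of the log-lifetime and a standard error se_z of the standardized quantile, e.g. obtained with delta_method():

# Sketch of the z-procedure for a two-parameter Weibull (location-scale form):
# the standardized quantile z = (log(t) - mu) / sigma is shifted by the normal
# quantile times its standard error and mapped back through the SEV cdf.
psev <- function(z) 1 - exp(-exp(z))  # cdf of the standard smallest extreme value distribution

z_procedure <- function(t, mu, sigma, se_z, conf_level = 0.95) {
  z_hat <- (log(t) - mu) / sigma
  z_crit <- qnorm(1 - (1 - conf_level) / 2)
  data.frame(
    x           = t,
    prob        = psev(z_hat),
    lower_bound = psev(z_hat - z_crit * se_z),
    upper_bound = psev(z_hat + z_crit * se_z)
  )
}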
A tibble with class wt_confint containing the following columns:

x: An ordered sequence of the lifetime characteristic regarding the failed units, starting at min(x) and ending at max(x). With b_lives = c(0.01, 0.1, 0.5) the 1%, 10% and 50% quantiles are additionally included in x, but only if the specified probabilities are in the range of the estimated probabilities.
prob: An ordered sequence of probabilities with specified b_lives included.
std_err: Estimated standard errors with respect to direction.
lower_bound: Provided if bounds is one of "two_sided" or "lower". Lower confidence limits with respect to direction, i.e. limits for quantiles or probabilities.
upper_bound: Provided if bounds is one of "two_sided" or "upper". Upper confidence limits with respect to direction, i.e. limits for quantiles or probabilities.
cdf_estimation_method: A character that is always NA_character_. Only needed for internal use.

Further information is stored in the attributes of this tibble:

distribution: Distribution which was specified in ml_estimation.
bounds: Specified bound(s).
direction: Specified direction.
model_estimation: Input list with classes wt_model and wt_ml_estimation.
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Reliability data preparation: ## Data for two-parametric model: data_2p <- reliability_data( shock, x = distance, status = status ) ## Data for three-parametric model: data_3p <- reliability_data( alloy, x = cycles, status = status ) # Model estimation with ml_estimation(): ml_2p <- ml_estimation( data_2p, distribution = "weibull" ) ml_3p <- ml_estimation( data_3p, distribution = "lognormal3", conf_level = 0.90 ) # Example 1 - Two-sided 95% confidence interval for probabilities ('y'): conf_fisher_1 <- confint_fisher( x = ml_2p, bounds = "two_sided", conf_level = 0.95, direction = "y" ) # Example 2 - One-sided lower/upper 90% confidence interval for quantiles ('x'): conf_fisher_2_1 <- confint_fisher( x = ml_2p, bounds = "lower", conf_level = 0.90, direction = "x" ) conf_fisher_2_2 <- confint_fisher( x = ml_2p, bounds = "upper", conf_level = 0.90, direction = "x" ) # Example 3 - Two-sided 90% confidence intervals for both directions using # a three-parametric model: conf_fisher_3_1 <- confint_fisher( x = ml_3p, bounds = "two_sided", conf_level = 0.90, direction = "y" ) conf_fisher_3_2 <- confint_fisher( x = ml_3p, bounds = "two_sided", conf_level = 0.90, direction = "x" )
This function computes normal-approximation confidence intervals for quantiles and failure probabilities.
## Default S3 method:
confint_fisher(
  x,
  status,
  dist_params,
  dist_varcov,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal",
    "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential",
    "exponential2"),
  b_lives = c(0.01, 0.1, 0.5),
  bounds = c("two_sided", "lower", "upper"),
  conf_level = 0.95,
  direction = c("y", "x"),
  ...
)
x | A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status | A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
dist_params | The parameters (coefficients) returned by ml_estimation. |
dist_varcov | The variance-covariance matrix (varcov) returned by ml_estimation. |
distribution | Supposed distribution of the random variable. Has to be in line with the specification made in ml_estimation. |
b_lives | A numeric vector indicating the probabilities of the B_p-lives (quantiles) to be considered. |
bounds | A character string specifying the bound(s) to be computed: "two_sided", "lower" or "upper". |
conf_level | Confidence level of the interval. |
direction | A character string specifying the direction of the confidence interval: "y" for failure probabilities or "x" for quantiles. |
... | Further arguments passed to or from other methods. Currently not used. |
These confidence bounds are based on the standard errors obtained by the delta method.
The bounds on the probability are determined by the z-procedure. See 'References' for more information on this approach.
A tibble with class wt_confint containing the following columns:

x: An ordered sequence of the lifetime characteristic regarding the failed units, starting at min(x) and ending at max(x). With b_lives = c(0.01, 0.1, 0.5) the 1%, 10% and 50% quantiles are additionally included in x, but only if the specified probabilities are in the range of the estimated probabilities.
prob: An ordered sequence of probabilities with specified b_lives included.
std_err: Estimated standard errors with respect to direction.
lower_bound: Provided if bounds is one of "two_sided" or "lower". Lower confidence limits with respect to direction, i.e. limits for quantiles or probabilities.
upper_bound: Provided if bounds is one of "two_sided" or "upper". Upper confidence limits with respect to direction, i.e. limits for quantiles or probabilities.
cdf_estimation_method: A character that is always NA_character_. Only needed for internal use.

Further information is stored in the attributes of this tibble:

distribution: Distribution which was specified in ml_estimation.
bounds: Specified bound(s).
direction: Specified direction.
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Vectors: obs <- seq(10000, 100000, 10000) status_1 <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0) cycles <- alloy$cycles status_2 <- alloy$status # Model estimation with ml_estimation(): ml <- ml_estimation( x = obs, status = status_1, distribution = "weibull", conf_level = 0.90 ) ml_2 <- ml_estimation( x = cycles, status = status_2, distribution = "lognormal3" ) # Example 1 - Two-sided 95% confidence interval for probabilities ('y'): conf_fisher_1 <- confint_fisher( x = obs, status = status_1, dist_params = ml$coefficients, dist_varcov = ml$varcov, distribution = "weibull", bounds = "two_sided", conf_level = 0.95, direction = "y" ) # Example 2 - One-sided lower/upper 90% confidence interval for quantiles ('x'): conf_fisher_2_1 <- confint_fisher( x = obs, status = status_1, dist_params = ml$coefficients, dist_varcov = ml$varcov, distribution = "weibull", bounds = "lower", conf_level = 0.90, direction = "x" ) conf_fisher_2_2 <- confint_fisher( x = obs, status = status_1, dist_params = ml$coefficients, dist_varcov = ml$varcov, distribution = "weibull", bounds = "upper", conf_level = 0.90, direction = "x" ) # Example 3 - Two-sided 90% confidence intervals for both directions using # a three-parametric model: conf_fisher_3_1 <- confint_fisher( x = cycles, status = status_2, dist_params = ml_2$coefficients, dist_varcov = ml_2$varcov, distribution = "lognormal3", bounds = "two_sided", conf_level = 0.90, direction = "y" ) conf_fisher_3_2 <- confint_fisher( x = cycles, status = status_2, dist_params = ml_2$coefficients, dist_varcov = ml_2$varcov, distribution = "lognormal3", bounds = "two_sided", conf_level = 0.90, direction = "x" )
This function applies the delta method to a parametric lifetime distribution.
delta_method(
  x,
  dist_params,
  dist_varcov,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal",
    "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential",
    "exponential2"),
  direction = c("y", "x"),
  p = deprecated()
)
x | A numeric vector of probabilities or quantiles. To determine the standard errors of quantiles, the corresponding probabilities have to be specified; to compute the standard errors of standardized quantiles (z-values), the corresponding quantiles are required. |
dist_params | The parameters (coefficients) returned by ml_estimation. |
dist_varcov | The variance-covariance matrix (varcov) returned by ml_estimation. |
distribution | Supposed distribution of the random variable. Has to be in line with the specification made in ml_estimation. |
direction | A character string specifying for which quantity the standard errors are calculated: "y" for standardized quantiles (z-values) or "x" for quantiles. |
p | Deprecated. Use x instead. |
The delta method estimates the standard errors for quantities that can be written as non-linear functions of ML estimators. Hence, the parameters as well as the variance-covariance matrix of these quantities have to be estimated with maximum likelihood.
The estimated standard errors are used to calculate Fisher's (normal approximation) confidence intervals. For confidence bounds on the probability, standard errors of the standardized quantiles (direction = "y") have to be computed (z-procedure) and for bounds on quantiles, standard errors of quantiles (direction = "x") are required. For more information see confint_fisher.
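For a single Weibull quantile the computation can be sketched as follows (an illustrative hand-rolled version under the location-scale parameterization of the log-lifetime; mu, sigma and varcov are assumed to come from a maximum likelihood fit and are not the exact objects returned by ml_estimation()):

# Delta method sketch for a Weibull quantile t_p = exp(mu + sigma * qsev(p)):
# the variance is approximated by grad' %*% varcov %*% grad, where grad is the
# gradient of t_p with respect to (mu, sigma).
qsev <- function(p) log(-log(1 - p))  # quantile of the standard smallest extreme value distribution

delta_method_quantile <- function(p, mu, sigma, varcov) {
  z_p <- qsev(p)
  t_p <- exp(mu + sigma * z_p)   # estimated quantile
  grad <- c(t_p, t_p * z_p)      # d t_p / d mu and d t_p / d sigma
  sqrt(as.numeric(t(grad) %*% varcov %*% grad))
}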
A numeric vector of estimated standard errors for quantiles or standardized quantiles (z-values).
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Reliability data preparation:
data <- reliability_data(
  shock,
  x = distance,
  status = status
)

# Parameter estimation using maximum likelihood:
mle <- ml_estimation(
  data,
  distribution = "weibull",
  conf_level = 0.95
)

# Example 1 - Standard errors of standardized quantiles:
delta_y <- delta_method(
  x = shock$distance,
  dist_params = mle$coefficients,
  dist_varcov = mle$varcov,
  distribution = "weibull",
  direction = "y"
)

# Example 2 - Standard errors of quantiles:
delta_x <- delta_method(
  x = seq(0.01, 0.99, 0.01),
  dist_params = mle$coefficients,
  dist_varcov = mle$varcov,
  distribution = "weibull",
  direction = "x"
)
This function models a delay random variable (in days), e.g. in logistics, registration or report, using a supposed continuous distribution. First, the row-wise differences in days of the related date columns are calculated and then the parameter(s) of the assumed distribution are estimated with maximum likelihood. See 'Details' for more information.
dist_delay(...)

## S3 method for class 'wt_mcs_delay_data'
dist_delay(..., x, distribution = c("lognormal", "exponential"))
... | Further arguments passed to or from other methods. Currently not used. |
x | A wt_mcs_delay_data object returned by mcs_delay_data. |
distribution | Supposed distribution of the respective delay. |
The distribution parameter(s) is (are) determined on the basis of complete cases, i.e. there is no NA (row-wise) in one of the related date columns. Time differences less than or equal to zero are not considered either.
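Conceptually, the estimation step can be sketched as follows (estimate_delay_lognormal() is a hypothetical helper, not part of the package; it assumes a lognormal delay and two date vectors):

# Sketch: keep complete, positive delays (in days) and fit a lognormal
# distribution by maximum likelihood.
estimate_delay_lognormal <- function(date_1, date_2) {
  delay <- as.numeric(difftime(as.Date(date_2), as.Date(date_1), units = "days"))
  delay <- delay[!is.na(delay) & delay > 0]
  meanlog <- mean(log(delay))
  sdlog <- sqrt(mean((log(delay) - meanlog)^2))  # ML estimate, no n - 1 correction
  c(meanlog = meanlog, sdlog = sdlog)
}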
A list with class wt_delay_estimation which contains:

coefficients: A named vector of estimated parameter(s).
delay: A numeric vector of element-wise computed differences in days.
distribution: Specified distribution.

If more than one delay was considered in mcs_delay_data, the resulting output is a list with class wt_delay_estimation_list. In this case each list element has class wt_delay_estimation and contains the items listed above.
# MCS data preparation: ## Data for delay in registration: mcs_tbl_1 <- mcs_delay_data( field_data, date_1 = production_date, date_2 = registration_date, time = dis, status = status, id = vin ) ## Data for delay in report: mcs_tbl_2 <- mcs_delay_data( field_data, date_1 = repair_date, date_2 = report_date, time = dis, status = status, id = vin ) ## Data for both delays: mcs_tbl_both <- mcs_delay_data( field_data, date_1 = c(production_date, repair_date), date_2 = c(registration_date, report_date), time = dis, status = status, id = vin ) # Example 1 - Delay in registration: params_delay_regist <- dist_delay( x = mcs_tbl_1, distribution = "lognormal" ) # Example 2 - Delay in report: params_delay_report <- dist_delay( x = mcs_tbl_2, distribution = "exponential" ) # Example 3 - Delays in registration and report with same distribution: params_delays <- dist_delay( x = mcs_tbl_both, distribution = "lognormal" ) # Example 4 - Delays in registration and report with different distributions: params_delays_2 <- dist_delay( x = mcs_tbl_both, distribution = c("lognormal", "exponential") )
dist_delay_register() is no longer under active development; switching to dist_delay is recommended.
dist_delay_register(date_prod, date_register, distribution = "lognormal")
date_prod | A vector of class character or Date, in the format "yyyy-mm-dd", containing the production dates. |
date_register | A vector of class character or Date, in the format "yyyy-mm-dd", containing the registration dates; NA for units that were not registered. |
distribution | Supposed distribution of the random variable. Only "lognormal" is implemented. |
This function introduces a delay random variable by calculating the time difference between the registration and production date for the sample units and afterwards estimates the parameter(s) of a supposed distribution, using maximum likelihood.
A named vector of estimated parameters for the specified distribution.
date_of_production <- c("2014-07-28", "2014-02-17", "2014-07-14", "2014-06-26", "2014-03-10", "2014-05-14", "2014-05-06", "2014-03-07", "2014-03-09", "2014-04-13", "2014-05-20", "2014-07-07", "2014-01-27", "2014-01-30", "2014-03-17", "2014-02-09", "2014-04-14", "2014-04-20", "2014-03-13", "2014-02-23", "2014-04-03", "2014-01-08", "2014-01-08") date_of_registration <- c(NA, "2014-03-29", "2014-12-06", "2014-09-09", NA, NA, "2014-06-16", NA, "2014-05-23", "2014-05-09", "2014-05-31", NA, "2014-04-13", NA, NA, "2014-03-12", NA, "2014-06-02", NA, "2014-03-21", "2014-06-19", NA, NA) params_delay_regist <- dist_delay_register( date_prod = date_of_production, date_register = date_of_registration, distribution = "lognormal" )
dist_delay_report() is no longer under active development; switching to dist_delay is recommended.
dist_delay_report(date_repair, date_report, distribution = "lognormal")
date_repair | A vector of class character or Date, in the format "yyyy-mm-dd", containing the repair dates; NA for units that did not fail. |
date_report | A vector of class character or Date, in the format "yyyy-mm-dd", containing the report dates; NA for units that did not fail. |
distribution | Supposed distribution of the random variable. Only "lognormal" is implemented. |
This function introduces a delay random variable by calculating the time difference between the report and repair date for the sample units and afterwards estimates the parameter(s) of a supposed distribution, using maximum likelihood.
A named vector of estimated parameters for the specified distribution.
date_of_repair <- c(NA, "2014-09-15", "2015-07-04", "2015-04-10", NA, NA, "2015-04-24", NA, "2015-04-25", "2015-04-24", "2015-06-12", NA, "2015-05-04", NA, NA, "2015-05-22", NA, "2015-09-17", NA, "2015-08-15", "2015-11-26", NA, NA) date_of_report <- c(NA, "2014-10-09", "2015-08-28", "2015-04-15", NA, NA, "2015-05-16", NA, "2015-05-28", "2015-05-15", "2015-07-11", NA, "2015-08-14", NA, NA, "2015-06-05", NA, "2015-10-17", NA, "2015-08-21", "2015-12-02", NA, NA) params_delay_report <- dist_delay_report( date_repair = date_of_repair, date_report = date_of_report, distribution = "lognormal" )
This function models a delay random variable (in days), e.g. in logistics, registration or report, using a supposed continuous distribution. First, the element-wise differences in days of both vectors date_1 and date_2 are calculated and then the parameter(s) of the assumed distribution are estimated with maximum likelihood. See 'Details' for more information.
## Default S3 method:
dist_delay(..., date_1, date_2, distribution = c("lognormal", "exponential"))
... | Further arguments passed to or from other methods. Currently not used. |
date_1 | A vector of class character or Date, in the format "yyyy-mm-dd", with the earlier dates (e.g. production dates). If more than one delay is to be considered, use a list where the first element is the earlier date of the first delay, the second element is the earlier date of the second delay, and so forth (see 'Examples'). |
date_2 | A vector of class character or Date, in the format "yyyy-mm-dd", with the later dates (e.g. registration dates). If more than one delay is to be considered, use a list (see date_1 and 'Examples'). |
distribution | Supposed distribution of the respective delay. |
The distribution parameter(s) is (are) determined on the basis of complete cases, i.e. there is no NA in one of the related vector elements c(date_1[i], date_2[i]). Time differences less than or equal to zero are not considered either.
A list with class wt_delay_estimation which contains:

coefficients: A named vector of estimated parameter(s).
delay: A numeric vector of element-wise computed differences in days.
distribution: Specified distribution.

If more than one delay was considered, the resulting output is a list with class wt_delay_estimation_list. In this case each list element has class wt_delay_estimation and contains the items listed above.
# Example 1 - Delay in registration: params_delay_regist <- dist_delay( date_1 = field_data$production_date, date_2 = field_data$registration_date, distribution = "lognormal" ) # Example 2 - Delay in report: params_delay_report <- dist_delay( date_1 = field_data$repair_date, date_2 = field_data$report_date, distribution = "exponential" ) # Example 3 - Delays in registration and report with same distribution: params_delays <- dist_delay( date_1 = list(field_data$production_date, field_data$repair_date), date_2 = list(field_data$registration_date, field_data$report_date), distribution = "lognormal" ) # Example 4 - Delays in registration and report with different distributions: params_delays_2 <- dist_delay( date_1 = list(field_data$production_date, field_data$repair_date), date_2 = list(field_data$registration_date, field_data$report_date), distribution = c("lognormal", "exponential") )
This function models a mileage random variable on an annual basis with respect to a supposed continuous distribution. First, the distances are calculated for one year (365 days) using a linear relationship between the distance and operating time. Second, the parameter(s) of the assumed distribution are estimated with maximum likelihood. See 'Details' for more information.
dist_mileage(x, ...)

## S3 method for class 'wt_mcs_mileage_data'
dist_mileage(x, distribution = c("lognormal", "exponential"), ...)
x | A wt_mcs_mileage_data object returned by mcs_mileage_data. |
... | Further arguments passed to or from other methods. Currently not used. |
distribution | Supposed distribution of the annual mileage. |
The distribution parameter(s) is (are) determined on the basis of complete cases, i.e. there is no NA (row-wise) in one of the related columns mileage and time. Distances and operating times less than or equal to zero are not considered either.
Assumption of linear relationship: Imagine a component in a vehicle has endured a distance of 25000 kilometers (km) in 500 days (d). The annual distance of this unit is 25000 km / 500 d * 365 d = 18250 km.
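A minimal sketch of this computation (annual_mileage() is a hypothetical helper, not part of the package):

# Scale each covered distance to a full year (365 days); incomplete or
# non-positive records are dropped, mirroring the complete-case rule above.
annual_mileage <- function(mileage, time) {
  ok <- !is.na(mileage) & !is.na(time) & mileage > 0 & time > 0
  mileage[ok] / time[ok] * 365
}

annual_mileage(mileage = 25000, time = 500)  # 18250 km per year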
A list with class wt_mileage_estimation which contains:

coefficients: A named vector of estimated parameter(s).
miles_annual: A numeric vector of element-wise computed annual distances using the linear relationship described in 'Details'.
distribution: Specified distribution.
# MCS data preparation:
mcs_tbl <- mcs_mileage_data(
  field_data,
  mileage = mileage,
  time = dis,
  status = status,
  id = vin
)

# Example 1 - Assuming lognormal annual mileage distribution:
params_mileage_annual <- dist_mileage(
  x = mcs_tbl,
  distribution = "lognormal"
)

# Example 2 - Assuming exponential annual mileage distribution:
params_mileage_annual_2 <- dist_mileage(
  x = mcs_tbl,
  distribution = "exponential"
)
This function models a mileage random variable on an annual basis with respect to a supposed continuous distribution. First, the distances are calculated for one year (365 days) using a linear relationship between the distance and operating time. Second, the parameter(s) of the assumed distribution are estimated with maximum likelihood. See 'Details' for more information.
## Default S3 method:
dist_mileage(x, time, distribution = c("lognormal", "exponential"), ...)
x | A numeric vector of distances covered. Use NA for missing values. |
time | A numeric vector of operating times. Use NA for missing values. |
distribution | Supposed distribution of the annual mileage. |
... | Further arguments passed to or from other methods. Currently not used. |
The distribution parameter(s) is (are) determined on the basis of complete cases, i.e. there is no NA in one of the related vector elements c(mileage[i], time[i]). Distances and operating times less than or equal to zero are not considered either.
Assumption of linear relationship: Imagine a component in a vehicle has endured a distance of 25000 kilometers (km) in 500 days (d). The annual distance of this unit is 25000 km / 500 d * 365 d = 18250 km.
A list with class wt_mileage_estimation which contains:

coefficients: A named vector of estimated parameter(s).
miles_annual: A numeric vector of element-wise computed annual distances using the linear relationship described in 'Details'.
distribution: Specified distribution.
# Example 1 - Assuming lognormal annual mileage distribution:
params_mileage_annual <- dist_mileage(
  x = field_data$mileage,
  time = field_data$dis,
  distribution = "lognormal"
)

# Example 2 - Assuming exponential annual mileage distribution:
params_mileage_annual_2 <- dist_mileage(
  x = field_data$mileage,
  time = field_data$dis,
  distribution = "exponential"
)
This function applies one or multiple non-parametric methods to estimate failure probabilities from complete data as well as data containing (multiple) right-censored observations.
estimate_cdf(x, ...)

## S3 method for class 'wt_reliability_data'
estimate_cdf(
  x,
  methods = c("mr", "johnson", "kaplan", "nelson"),
  options = list(),
  ...
)
x | A tibble with class wt_reliability_data returned by reliability_data. |
... | Further arguments passed to or from other methods. Currently not used. |
methods | One or multiple methods of "mr", "johnson", "kaplan" or "nelson" used for the estimation of failure probabilities. See 'Details'. |
options | A list of named options. See 'Options'. |
One or multiple techniques can be used for the methods argument:

"mr": The method of Median Ranks is used to estimate the failure probabilities of failed units without considering censored items. Tied observations can be handled in three ways (see 'Options'):
  "max": The highest observed rank is assigned to tied observations.
  "min": The lowest observed rank is assigned to tied observations.
  "average": The mean rank is assigned to tied observations.
Two formulas can be used to determine cumulative failure probabilities F(t) (see 'Options' and the sketch after this list):
  "benard": Benard's approximation for Median Ranks.
  "invbeta": Exact Median Ranks using the inverse beta distribution.
"johnson": The Johnson method is used to estimate the failure probabilities of failed units, taking censored units into account. Compared to complete data, the probabilities are corrected by computing adjusted ranks. Two formulas can be used to determine cumulative failure probabilities F(t) (see 'Options'):
  "benard": Benard's approximation for Median Ranks.
  "invbeta": Exact Median Ranks using the inverse beta distribution.
"kaplan": The method of Kaplan and Meier is used to estimate the survival function S(t) with respect to (multiple) right censored data. The complement of S(t), i.e. F(t), is returned. In contrast to the original Kaplan-Meier estimator, one modification is made (see 'References').
"nelson": The Nelson-Aalen estimator models the cumulative hazard rate function in case of (multiple) right censored data. Equating the formal definition of the hazard rate with that according to Nelson-Aalen results in a formula for the calculation of failure probabilities.
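For a complete sample the difference between the two median rank formulas can be checked directly (illustrative snippet, not taken from the package):

# Median ranks for a complete sample of size n: Benard's approximation
# (j - 0.3) / (n + 0.4) versus the exact value from the inverse beta distribution.
n <- 10
j <- seq_len(n)
benard <- (j - 0.3) / (n + 0.4)
invbeta <- qbeta(0.5, j, n - j + 1)
round(cbind(rank = j, benard = benard, invbeta = invbeta), 4)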
A tibble with class wt_cdf_estimation containing the following columns:

id: Identification for every unit.
x: Lifetime characteristic.
status: Binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1).
rank: The (computed) ranks. Determined for methods "mr" and "johnson", filled with NA for other methods or if status = 0.
prob: Estimated failure probabilities, NA if status = 0.
cdf_estimation_method: Specified method for the estimation of failure probabilities.
Argument options is a named list of options:

Method | Name | Value |
mr | mr_method | "benard" (default) or "invbeta" |
mr | mr_ties.method | "max" (default), "min" or "average" |
johnson | johnson_method | "benard" (default) or "invbeta" |
NIST/SEMATECH e-Handbook of Statistical Methods, 8.2.1.5. Empirical model fitting - distribution free (Kaplan-Meier) approach, NIST SEMATECH, December 3, 2020
# Reliability data: data <- reliability_data( alloy, x = cycles, status = status ) # Example 1 - Johnson method: prob_tbl <- estimate_cdf( x = data, methods = "johnson" ) # Example 2 - Multiple methods: prob_tbl_2 <- estimate_cdf( x = data, methods = c("johnson", "kaplan", "nelson") ) # Example 3 - Method 'mr' with options: prob_tbl_3 <- estimate_cdf( x = data, methods = "mr", options = list( mr_method = "invbeta", mr_ties.method = "average" ) ) # Example 4 - Multiple methods and options: prob_tbl_4 <- estimate_cdf( x = data, methods = c("mr", "johnson"), options = list( mr_ties.method = "max", johnson_method = "invbeta" ) )
This function applies a non-parametric method to estimate failure probabilities from complete data as well as data containing (multiple) right-censored observations.
## Default S3 method:
estimate_cdf(
  x,
  status,
  id = NULL,
  method = c("mr", "johnson", "kaplan", "nelson"),
  options = list(),
  ...
)
x | A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status | A vector of binary data (0 or 1) indicating whether unit i is a right censored observation (= 0) or a failure (= 1). |
id | A vector for the identification of every unit. Default is NULL. |
method | Method used for the estimation of failure probabilities. See 'Details'. |
options | A list of named options. See 'Options'. |
... | Further arguments passed to or from other methods. Currently not used. |
The following techniques can be used for the method argument:

"mr": The method of Median Ranks is used to estimate the failure probabilities of failed units without considering censored items. Tied observations can be handled in three ways (see 'Options'):
  "max": The highest observed rank is assigned to tied observations.
  "min": The lowest observed rank is assigned to tied observations.
  "average": The mean rank is assigned to tied observations.
Two formulas can be used to determine cumulative failure probabilities F(t) (see 'Options'):
  "benard": Benard's approximation for Median Ranks.
  "invbeta": Exact Median Ranks using the inverse beta distribution.
"johnson": The Johnson method is used to estimate the failure probabilities of failed units, taking censored units into account. Compared to complete data, the probabilities are corrected by computing adjusted ranks. Two formulas can be used to determine cumulative failure probabilities F(t) (see 'Options'):
  "benard": Benard's approximation for Median Ranks.
  "invbeta": Exact Median Ranks using the inverse beta distribution.
"kaplan": The method of Kaplan and Meier is used to estimate the survival function S(t) with respect to (multiple) right censored data. The complement of S(t), i.e. F(t), is returned. In contrast to the original Kaplan-Meier estimator, one modification is made (see 'References').
"nelson": The Nelson-Aalen estimator models the cumulative hazard rate function in case of (multiple) right censored data. Equating the formal definition of the hazard rate with that according to Nelson-Aalen results in a formula for the calculation of failure probabilities (a simplified sketch of this idea follows after this list).
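The following simplified sketch illustrates the Nelson-Aalen idea for data without ties (it is not the package implementation): the cumulative hazard is accumulated as d_i / n_i over the ordered failure times and converted to failure probabilities via F(t) = 1 - exp(-H(t)).

# Simplified Nelson-Aalen sketch (no ties): each failure contributes 1 / n_i to
# the cumulative hazard, where n_i is the number of units still at risk.
# Note: the package reports NA for censored units; here the running value is
# kept for every observation for clarity.
nelson_aalen_cdf <- function(x, status) {
  ord <- order(x)
  x <- x[ord]
  status <- status[ord]
  n_at_risk <- length(x) - seq_along(x) + 1
  hazard <- ifelse(status == 1, 1 / n_at_risk, 0)
  data.frame(x = x, status = status, prob = 1 - exp(-cumsum(hazard)))
}

nelson_aalen_cdf(x = seq(10000, 100000, 10000), status = c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0))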
A tibble with class wt_cdf_estimation containing the following columns:

id: Identification for every unit.
x: Lifetime characteristic.
status: Binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1).
rank: The (computed) ranks. Determined for methods "mr" and "johnson", filled with NA for other methods or if status = 0.
prob: Estimated failure probabilities, NA if status = 0.
cdf_estimation_method: Specified method for the estimation of failure probabilities.
Argument options is a named list of options:

Method | Name | Value |
mr | mr_method | "benard" (default) or "invbeta" |
mr | mr_ties.method | "max" (default), "min" or "average" |
johnson | johnson_method | "benard" (default) or "invbeta" |
NIST/SEMATECH e-Handbook of Statistical Methods, 8.2.1.5. Empirical model fitting - distribution free (Kaplan-Meier) approach, NIST SEMATECH, December 3, 2020
# Vectors:
cycles <- alloy$cycles
status <- alloy$status

# Example 1 - Johnson method:
prob_tbl <- estimate_cdf(
  x = cycles,
  status = status,
  method = "johnson"
)

# Example 2 - Method 'mr' with options:
prob_tbl_2 <- estimate_cdf(
  x = cycles,
  status = status,
  method = "mr",
  options = list(
    mr_method = "invbeta",
    mr_ties.method = "average"
  )
)
An illustrative field dataset that contains a variety of variables commonly collected in the automotive sector.
The dataset has complete information about failed and incomplete information about intact vehicles. See 'Format' and 'Details' for further insights.
field_data
A tibble with 10,684 rows and 20 variables:

Vehicle identification number.
Days in service.
Distances covered, which are unknown for censored units.
1 for failed and 0 for censored units.
Date of production.
Date of registration. Known for all failed units and for a few intact units.
The date on which the failure was repaired. It is assumed that the repair date is equal to the date of failure occurrence.
The date on which lifetime information about the failure was available.
Delivering country.
The region within the country of delivery. Known for registered vehicles, NA for units with a missing registration_date.
Climatic zone based on the "Köppen-Geiger" climate classification. Known for registered vehicles, NA for units with a missing registration_date.
Climatic subzone based on the "Köppen-Geiger" climate classification. Known for registered vehicles, NA for units with a missing registration_date.
Brand of the vehicle.
Model of the vehicle.
Type of the engine.
Date on which the engine was installed.
Type of the gear.
Date on which the gear was installed.
Transmission of the vehicle.
Vehicle fuel.
All vehicles were produced in 2014 and an analysis of the field data was made at the end of 2015. At the date of analysis, there were 684 failed and 10,000 intact vehicles.
Censored vehicles:
For censored units the service time (dis
) was computed as the difference
of the date of analysis "2015-12-31"
and the registration_date
.
For many units the latter date is unknown. For these, the difference of the
analysis date and production_date
was used to get a rough estimation of
the real service time. This uncertainty has to be considered in the subsequent
analysis (see delay in registration in the section 'Details' of
mcs_delay
).
Furthermore, due to the delay in report, the computed service time could also
be inaccurate. This uncertainty should be considered as well (see
delay in report in the section 'Details' of mcs_delay
).
The lifetime characteristic mileage
is unknown for all censored units.
If an analysis is to be made for this lifetime characteristic, covered distances
for these units have to be estimated (see mcs_mileage
).
Failed vehicles:
For failed units the service time (dis
) is computed as the difference
of repair_date
and registration_date
, which are known for all of them.
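The computation of such service times can be sketched with base R date arithmetic (a minimal, hypothetical example; the shipped dataset was prepared by the package authors):

# Hypothetical dates for illustration only:
analysis_date     <- as.Date("2015-12-31")
registration_date <- as.Date("2015-03-15")
production_date   <- as.Date("2014-06-01")
repair_date       <- as.Date("2015-10-20")

# Censored unit with known registration date:
as.numeric(analysis_date - registration_date)

# Censored unit with unknown registration date (rough estimate):
as.numeric(analysis_date - production_date)

# Failed unit:
as.numeric(repair_date - registration_date)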
johnson_method()
is no longer under active development, switching to
estimate_cdf is recommended.
johnson_method(x, status, id = NULL, method = c("benard", "invbeta"))
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
id |
A vector for the identification of every unit. Default is |
method |
Method for the estimation of the cdf. Can be |
This non-parametric approach is used to estimate the failure probabilities in terms of uncensored or (multiple) right censored data. Compared to complete data, the correction is done by calculating adjusted ranks which take non-defective units into account.
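For intuition, the adjustment can be sketched as follows (a minimal example with hypothetical data and Benard's approximation; the package's internal implementation additionally handles ties and the "invbeta" variant of the method argument):

lifetime <- c(100, 150, 200, 250, 300)  # sorted lifetime characteristic (hypothetical)
state    <- c(1, 0, 1, 0, 0)            # 1 = failure, 0 = right censored

n <- length(lifetime)
adj_rank <- rep(NA_real_, n)
prev <- 0

for (i in seq_len(n)) {
  if (state[i] == 1) {
    # Rank increment that accounts for the preceding suspensions:
    increment <- (n + 1 - prev) / (1 + (n - i + 1))
    prev <- prev + increment
    adj_rank[i] <- prev
  }
}

# Failure probabilities via Benard's approximation of the median ranks:
prob <- (adj_rank - 0.3) / (n + 0.4)
data.frame(lifetime, state, adj_rank, prob)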
A tibble containing the following columns:
id
: Identification for every unit.
x
: Lifetime characteristic.
status
: Binary data (0 or 1) indicating whether a unit is a right
censored observation (= 0) or a failure (= 1).
rank
: Adjusted ranks, NA
if status = 0
.
prob
: Estimated failure probabilities, NA
if status = 0
.
cdf_estimation_method
: Specified method for the estimation of failure
probabilities (always 'johnson').
# Vectors: obs <- seq(10000, 100000, 10000) state <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0) uic <- c("3435", "1203", "958X", "XX71", "abcd", "tz46", "fl29", "AX23", "Uy12", "kl1a") # Example 1 - Johnson method for intact and failed units: tbl_john <- johnson_method( x = obs, status = state, id = uic ) # Example 2 - Johnson's method works also if only defective units are considered: tbl_john_2 <- johnson_method( x = obs, status = rep(1, length(obs)) )
kaplan_method()
is no longer under active development, switching to
estimate_cdf is recommended.
kaplan_method(x, status, id = NULL)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
id |
A vector for the identification of every unit. Default is |
Whereas the non-parametric Kaplan-Meier estimator is used to estimate the survival function S(t) in terms of (multiple) right censored data, the complement is an estimate of the cumulative distribution function F(t). One modification is made in contrast to the original Kaplan-Meier estimator (see 'References').
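For orientation, the complement relationship can be sketched with the unmodified Kaplan-Meier estimator (assuming the survival package is available; the modification mentioned above is not reproduced here):

library(survival)

# Unmodified Kaplan-Meier estimate of the survival function S(t);
# the failure probability is its complement 1 - S(t).
km_fit <- survfit(Surv(alloy$cycles, alloy$status) ~ 1)

data.frame(
  x    = km_fit$time,
  prob = 1 - km_fit$surv
)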
A tibble containing the following columns:
id
: Identification for every unit.
x
: Lifetime characteristic.
status
: Binary data (0 or 1) indicating whether a unit is a right
censored observation (= 0) or a failure (= 1).
rank
: Filled with NA
.
prob
: Estimated failure probabilities, NA
if status = 0
.
cdf_estimation_method
: Specified method for the estimation of failure
probabilities (always 'kaplan').
NIST/SEMATECH e-Handbook of Statistical Methods, 8.2.1.5. Empirical model fitting - distribution free (Kaplan-Meier) approach, NIST SEMATECH, December 3, 2020
# Vectors: obs <- seq(10000, 100000, 10000) state <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0) state_2 <- c(0, 1, 1, 0, 0, 0, 1, 0, 0, 1) uic <- c("3435", "1203", "958X", "XX71", "abcd", "tz46", "fl29", "AX23","Uy12", "kl1a") # Example 1 - Observation with highest characteristic is an intact unit: tbl_kap <- kaplan_method( x = obs, status = state, id = uic ) # Example 2 - Observation with highest characteristic is a defective unit: tbl_kap_2 <- kaplan_method( x = obs, status = state_2 )
This function computes the log-likelihood value with respect to a given set of parameters. In terms of Maximum Likelihood Estimation this function can be optimized (optim) to estimate the parameters and variance-covariance matrix of the parameters.
loglik_function(x, ...)

## S3 method for class 'wt_reliability_data'
loglik_function(
  x,
  wts = rep(1, nrow(x)),
  dist_params,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal",
    "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential",
    "exponential2"),
  ...
)
x |
A |
... |
Further arguments passed to or from other methods. Currently not used. |
wts |
Optional vector of case weights. The length of |
dist_params |
A vector of parameters. An overview of the distribution-specific parameters can be found in section 'Distributions'. |
distribution |
Supposed distribution of the random variable. |
Returns the log-likelihood value for the parameters in dist_params
given
the data.
The following table summarizes the available distributions and their parameters:
the location parameter μ, the scale parameter σ (or θ) and the threshold parameter γ.
The order within dist_params is given in the table header.

distribution   | dist_params[1] | dist_params[2] | dist_params[3]
"sev"          | μ | σ | -
"weibull"      | μ | σ | -
"weibull3"     | μ | σ | γ
"normal"       | μ | σ | -
"lognormal"    | μ | σ | -
"lognormal3"   | μ | σ | γ
"logistic"     | μ | σ | -
"loglogistic"  | μ | σ | -
"loglogistic3" | μ | σ | γ
"exponential"  | θ | - | -
"exponential2" | θ | γ | -
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Reliability data preparation: data <- reliability_data( alloy, x = cycles, status = status ) # Example 1 - Evaluating Log-Likelihood function of two-parametric weibull: loglik_weib <- loglik_function( x = data, dist_params = c(5.29, 0.33), distribution = "weibull" ) # Example 2 - Evaluating Log-Likelihood function of three-parametric weibull: loglik_weib3 <- loglik_function( x = data, dist_params = c(4.54, 0.76, 92.99), distribution = "weibull3" )
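The returned log-likelihood can also be maximized directly, for instance with optim(). The following sketch is not taken from the package documentation; the starting values and the lower bound on the scale parameter are assumptions, and ml_estimation() is the convenient route in practice:

# Reliability data preparation:
data <- reliability_data(alloy, x = cycles, status = status)

# Negative log-likelihood of the two-parametric weibull:
neg_loglik <- function(p) {
  -loglik_function(x = data, dist_params = p, distribution = "weibull")
}

fit <- optim(
  par = c(5, 0.5),          # assumed starting values for c(mu, sigma)
  fn = neg_loglik,
  method = "L-BFGS-B",
  lower = c(-Inf, 1e-6),    # keep the scale parameter positive
  hessian = TRUE
)

fit$par             # ML estimates of mu and sigma
solve(fit$hessian)  # approximate variance-covariance matrix of the estimates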
This function computes the log-likelihood value with respect to a given set of parameters. In terms of Maximum Likelihood Estimation this function can be optimized (optim) to estimate the parameters and variance-covariance matrix of the parameters.
## Default S3 method:
loglik_function(
  x,
  status,
  wts = rep(1, length(x)),
  dist_params,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal",
    "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential",
    "exponential2"),
  ...
)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
wts |
Optional vector of case weights. The length of |
dist_params |
A vector of parameters. An overview of the distribution-specific parameters can be found in section 'Distributions'. |
distribution |
Supposed distribution of the random variable. |
... |
Further arguments passed to or from other methods. Currently not used. |
Returns the log-likelihood value for the parameters in dist_params
given
the data.
The following table summarizes the available distributions and their parameters:
the location parameter μ, the scale parameter σ (or θ) and the threshold parameter γ.
The order within dist_params is given in the table header.

distribution   | dist_params[1] | dist_params[2] | dist_params[3]
"sev"          | μ | σ | -
"weibull"      | μ | σ | -
"weibull3"     | μ | σ | γ
"normal"       | μ | σ | -
"lognormal"    | μ | σ | -
"lognormal3"   | μ | σ | γ
"logistic"     | μ | σ | -
"loglogistic"  | μ | σ | -
"loglogistic3" | μ | σ | γ
"exponential"  | θ | - | -
"exponential2" | θ | γ | -
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Vectors: cycles <- alloy$cycles status <- alloy$status # Example 1 - Evaluating Log-Likelihood function of two-parametric weibull: loglik_weib <- loglik_function( x = cycles, status = status, dist_params = c(5.29, 0.33), distribution = "weibull" ) # Example 2 - Evaluating Log-Likelihood function of three-parametric weibull: loglik_weib3 <- loglik_function( x = cycles, status = status, dist_params = c(4.54, 0.76, 92.99), distribution = "weibull3" )
This function evaluates the log-likelihood with respect to a given threshold parameter of a parametric lifetime distribution. In terms of Maximum Likelihood Estimation this function can be optimized (optim) to estimate the threshold parameter.
loglik_profiling(x, ...)

## S3 method for class 'wt_reliability_data'
loglik_profiling(
  x,
  wts = rep(1, nrow(x)),
  thres,
  distribution = c("weibull3", "lognormal3", "loglogistic3", "exponential2"),
  ...
)
x |
A |
... |
Further arguments passed to or from other methods. Currently not used. |
wts |
Optional vector of case weights. The length of |
thres |
A numeric value for the threshold parameter. |
distribution |
Supposed parametric distribution of the random variable. |
Returns the log-likelihood value for the threshold parameter thres
given
the data.
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Reliability data preparation: data <- reliability_data( alloy, x = cycles, status = status ) # Determining the optimal loglikelihood value: ## Range of threshold parameter must be smaller than the first failure: threshold <- seq( 0, min(data$x[data$status == 1]) - 0.1, length.out = 50 ) ## loglikelihood value with respect to threshold values: profile_logL <- loglik_profiling( x = data, thres = threshold, distribution = "weibull3" ) ## Threshold value (among the candidates) that maximizes the ## loglikelihood: threshold[which.max(profile_logL)] ## plot: plot( threshold, profile_logL, type = "l" ) abline( v = threshold[which.max(profile_logL)], h = max(profile_logL), col = "red" )
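As an alternative to the grid search above, the threshold can also be profiled with a one-dimensional optimizer (a sketch under the assumption that optimize() over the same interval is acceptable):

# Reliability data preparation:
data <- reliability_data(alloy, x = cycles, status = status)

opt <- optimize(
  f = function(thres) {
    loglik_profiling(x = data, thres = thres, distribution = "weibull3")
  },
  interval = c(0, min(data$x[data$status == 1]) - 0.1),
  maximum = TRUE
)

opt$maximum  # profiled estimate of the threshold parameter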
This function evaluates the log-likelihood with respect to a given threshold parameter of a parametric lifetime distribution. In terms of Maximum Likelihood Estimation this function can be optimized (optim) to estimate the threshold parameter.
## Default S3 method:
loglik_profiling(
  x,
  status,
  wts = rep(1, length(x)),
  thres,
  distribution = c("weibull3", "lognormal3", "loglogistic3", "exponential2"),
  ...
)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
wts |
Optional vector of case weights. The length of |
thres |
A numeric value for the threshold parameter. |
distribution |
Supposed parametric distribution of the random variable. |
... |
Further arguments passed to or from other methods. Currently not used. |
Returns the log-likelihood value for the threshold parameter thres
given
the data.
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Vectors: cycles <- alloy$cycles status <- alloy$status # Determining the optimal loglikelihood value: ## Range of threshold parameter must be smaller than the first failure: threshold <- seq( 0, min(cycles[status == 1]) - 0.1, length.out = 50 ) ## loglikelihood value with respect to threshold values: profile_logL <- loglik_profiling( x = cycles, status = status, thres = threshold, distribution = "weibull3" ) ## Threshold value (among the candidates) that maximizes the ## loglikelihood: threshold[which.max(profile_logL)] ## plot: plot( threshold, profile_logL, type = "l" ) abline( v = threshold[which.max(profile_logL)], h = max(profile_logL), col = "red" )
In general, the amount of available information about units in the field is very different. During the warranty period, there are only a few cases with complete data (mainly failed units) but lots of cases with incomplete data (usually censored units). As a result, the operating time of units with incomplete information is often inaccurate and must be adjusted by delays.
This function reduces the operating times of incomplete observations by simulated delays (in days). A unit is considered incomplete if the later of the related dates is unknown. See 'Details' for some practical examples.
Random delay numbers are drawn from the distribution determined by complete cases (described in 'Details' of dist_delay).
mcs_delay(...)

## S3 method for class 'wt_mcs_delay_data'
mcs_delay(..., x, distribution = c("lognormal", "exponential"))
... |
Further arguments passed to or from other methods. Currently not used. |
x |
A |
distribution |
Supposed distribution of the respective delay. |
In field data analysis time-dependent characteristics (e.g. time in service) are often imprecisely recorded. These inaccuracies are caused by unconsidered delays.
For a better understanding of the MCS application in the context of field data, two cases are described below.
Delay in registration: It is common that a supplier, which provides parts to the manufacturing industry, does not know when the units in which its parts are installed were put in service (due to an unknown registration or sales date (date_2)). Without taking the described delay into account, the time in service of the failed units would be the difference between the repair date and the production date (date_1) and, for intact units, the difference between the present date and the production date. But the real operating times are (much) shorter, since the stress on the components did not start until the whole systems were put in service. Hence, the operating times of units with incomplete data (missing date_2) must be reduced by the delays.

Delay in report: Authorized repairers often do not immediately notify the manufacturer or OEM of repairs that were made during the warranty period, but instead pass the information about these repairs on in collected form, e.g. weekly, monthly or quarterly. The resulting time difference between the reporting (date_2) of the repair in the guarantee database and the actual repair date (date_1), which is often assumed to be the failure date, is called the reporting delay. For a given analysis date there could be units that have already failed, but whose failure has not yet been reported; they are therefore treated as censored units. In order to take this into account and according to the principle of equal opportunities, the lifetime of units with a missing report date (date_2[i] = NA) is reduced by simulated reporting delays.
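The adjustment itself can be sketched as follows (the delay distribution parameters, the number of units and the truncation at zero are assumptions made for illustration; in the package the parameters are estimated from complete cases via dist_delay):

set.seed(42)

# Operating times (in days) of three units with missing date_2 (hypothetical):
op_time <- c(300, 250, 400)

# Simulated delays in days, drawn from an assumed lognormal delay distribution:
sim_delay <- rlnorm(length(op_time), meanlog = 3.5, sdlog = 0.8)

# Adjusted (reduced) operating times, kept non-negative for illustration:
pmax(op_time - sim_delay, 0)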
A list with class wt_mcs_delay
containing the following elements:
data
: A tibble
returned by mcs_delay_data where two modifications
have been made:
If the column status
exists, the tibble
has additional classes
wt_mcs_data
and wt_reliability_data
. Otherwise, the tibble
only has
the additional class wt_mcs_data
(which is not supported by estimate_cdf).
The column time
is renamed to x
(to be in accordance with
reliability_data) and contains the adjusted operating times for incomplete
observations and input operating times for the complete observations.
sim_data
: A tibble
with column sim_delay
that holds the simulated
delay-specific numbers for incomplete cases and 0
for complete cases.
If more than one delay was considered, multiple columns with names sim_delay_1
,
sim_delay_2
, ..., sim_delay_i
and corresponding delay-specific random
numbers are presented.
model_estimation
: A list returned by dist_delay.
Verband der Automobilindustrie e.V. (VDA); Qualitätsmanagement in der Automobilindustrie. Zuverlässigkeitssicherung bei Automobilherstellern und Lieferanten. Zuverlässigkeits-Methoden und -Hilfsmittel.; 4th Edition, 2016, ISSN:0943-9412
dist_delay for the determination of a parametric delay distribution and estimate_cdf for the estimation of failure probabilities.
# MCS data preparation: ## Data for delay in registration: mcs_tbl_1 <- mcs_delay_data( field_data, date_1 = production_date, date_2 = registration_date, time = dis, status = status, id = vin ) ## Data for delay in report: mcs_tbl_2 <- mcs_delay_data( field_data, date_1 = repair_date, date_2 = report_date, time = dis, status = status, id = vin ) ## Data for both delays: mcs_tbl_both <- mcs_delay_data( field_data, date_1 = c(production_date, repair_date), date_2 = c(registration_date, report_date), time = dis, status = status, id = vin ) # Example 1 - MCS for delay in registration: mcs_regist <- mcs_delay( x = mcs_tbl_1, distribution = "lognormal" ) # Example 2 - MCS for delay in report: mcs_report <- mcs_delay( x = mcs_tbl_2, distribution = "exponential" ) # Example 3 - Reproducibility of random numbers: set.seed(1234) mcs_report_reproduce <- mcs_delay( x = mcs_tbl_2, distribution = "exponential" ) # Example 4 - MCS for delays in registration and report with same distribution: mcs_delays <- mcs_delay( x = mcs_tbl_both, distribution = "lognormal" ) # Example 5 - MCS for delays in registration and report with different distributions: ## Assuming lognormal registration and exponential reporting delays. mcs_delays_2 <- mcs_delay( x = mcs_tbl_both, distribution = c("lognormal", "exponential") )
Create consistent mcs_delay_data
based on an existing data.frame
(preferred)
or on multiple equal length vectors.
mcs_delay_data( data = NULL, date_1, date_2, time, status = NULL, id = NULL, .keep_all = FALSE )
data |
Either |
date_1 |
A date of class If more than one delay is to be considered, use a list for the vector-based approach and a vector of column names or positions for the data-based approach. The first element is the earlier date of the first delay, the second element is the earlier date of the second delay, and so forth (see 'Examples'). |
date_2 |
A date of class |
time |
Operating times. Use |
status |
Optional argument. If used, it must contain binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). If |
id |
Identification of every unit. |
.keep_all |
If |
A tibble
with class wt_mcs_delay_data
that is formed for the downstream
Monte Carlo method mcs_delay.
It contains the following columns (if .keep_all = FALSE
):
Column(s) preserving the input of date_1
. For the vector-based approach
with unnamed input, column name(s) is (are) date_1
(date_1.1
, date_1.2
, ...
, date_1.i
).
Column(s) preserving the input of date_2
. For the vector-based approach
with unnamed input, column name(s) is (are) date_2
(date_2.1
, date_2.2
, ...
, date_2.i
).
time
: Input operating times.
status
(optional) :
If is.null(status)
column status
does not exist.
If status
is provided the column contains the entered binary
data (0 or 1).
id
: Identification for every unit.
If .keep_all = TRUE
, the remaining columns of data
are also preserved.
The attributes mcs_start_dates
and mcs_end_dates
hold the name(s) of the
column(s) that preserve the input of date_1
and date_2
.
dist_delay for the determination of a parametric delay distribution and mcs_delay for the Monte Carlo method with respect to delays.
# Example 1 - Based on an existing data.frame/tibble and column names: mcs_tbl <- mcs_delay_data( data = field_data, date_1 = production_date, date_2 = registration_date, time = dis, status = status ) # Example 2 - Based on an existing data.frame/tibble and column positions: mcs_tbl_2 <- mcs_delay_data( data = field_data, date_1 = 7, date_2 = 8, time = 2, id = 1 ) # Example 3 - Keep all variables of the tibble/data.frame entered to argument data: mcs_tbl_3 <- mcs_delay_data( data = field_data, date_1 = production_date, date_2 = registration_date, time = dis, status = status, id = vin, .keep_all = TRUE ) # Example 4 - For multiple delays (data-based): mcs_tbl_4 <- mcs_delay_data( data = field_data, date_1 = c(production_date, repair_date), date_2 = c(registration_date, report_date), time = dis, status = status ) # Example 5 - Based on vectors: mcs_tbl_5 <- mcs_delay_data( date_1 = field_data$production_date, date_2 = field_data$registration_date, time = field_data$dis, status = field_data$status, id = field_data$vin ) # Example 6 - For multiple delays (vector-based): mcs_tbl_6 <- mcs_delay_data( date_1 = list(field_data$production_date, field_data$repair_date), date_2 = list(field_data$registration_date, field_data$report_date), time = field_data$dis, status = field_data$status, id = field_data$vin ) # Example 7 - For multiple delays (vector-based with named dates): mcs_tbl_7 <- mcs_delay_data( date_1 = list(d11 = field_data$production_date, d12 = field_data$repair_date), date_2 = list(d21 = field_data$registration_date, d22 = field_data$report_date), time = field_data$dis, status = field_data$status, id = field_data$vin )
mcs_delay_register()
is no longer under active development, switching
to mcs_delay is recommended.
mcs_delay_register( date_prod, date_register, time, status, distribution = "lognormal", details = FALSE )
date_prod |
A vector of class |
date_register |
A vector of class |
time |
A numeric vector of operating times. |
status |
A vector of binary data (0 or 1) indicating whether unit i is a right censored observation (= 0) or a failure (= 1). |
distribution |
Supposed distribution of the random variable. Only
|
details |
A logical. If |
In general, the amount of information about units in the field that have not failed yet is scarce. For example, it is common that a supplier, which provides parts to the automotive industry, does not know when a vehicle was put in service and therefore does not know the exact operating time of the supplied parts. This function uses a Monte Carlo approach for simulating the operating times of (multiple) right censored observations, taking registration delays into account. The simulation is based on the distribution of operating times that were calculated from complete data (see dist_delay_register).
A numeric vector of corrected operating times for the censored units
and the input operating times for the failed units if details = FALSE
.
If details = TRUE
the output is a list which consists of the following elements:
time
: Numeric vector of corrected operating times for the censored
observations and input operating times for failed units.
x_sim
: Simulated random numbers of specified distribution with estimated
parameters. The length of x_sim
is equal to the number of censored observations.
coefficients
: Estimated coefficients of supposed distribution.
date_of_production <- c("2014-07-28", "2014-02-17", "2014-07-14", "2014-06-26", "2014-03-10", "2014-05-14", "2014-05-06", "2014-03-07", "2014-03-09", "2014-04-13", "2014-05-20", "2014-07-07", "2014-01-27", "2014-01-30", "2014-03-17", "2014-02-09", "2014-04-14", "2014-04-20", "2014-03-13", "2014-02-23", "2014-04-03", "2014-01-08", "2014-01-08") date_of_registration <- c(NA, "2014-03-29", "2014-12-06", "2014-09-09", NA, NA, "2014-06-16", NA, "2014-05-23", "2014-05-09", "2014-05-31", NA, "2014-04-13", NA, NA, "2014-03-12", NA, "2014-06-02", NA, "2014-03-21", "2014-06-19", NA, NA) op_time <- rep(1000, length(date_of_production)) status <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0) # Example 1 - Simplified vector output: x_corrected <- mcs_delay_register( date_prod = date_of_production, date_register = date_of_registration, time = op_time, status = status, distribution = "lognormal", details = FALSE ) # Example 2 - Detailed list output: list_detail <- mcs_delay_register( date_prod = date_of_production, date_register = date_of_registration, time = op_time, status = status, distribution = "lognormal", details = TRUE )
mcs_delay_report()
is no longer under active development, switching to
mcs_delay is recommended.
mcs_delay_report( date_repair, date_report, time, status, distribution = "lognormal", details = FALSE )
date_repair |
a vector of class |
date_report |
a vector of class |
time |
A numeric vector of operating times. |
status |
A vector of binary data (0 or 1) indicating whether unit i is a right censored observation (= 0) or a failure (= 1). |
distribution |
Supposed distribution of the random variable. Only
|
details |
A logical. If |
The delay in report describes the time between the occurrence of a damage and its registration in the warranty database. For a given analysis date there could be units that have already failed but are not yet registered in the database and are therefore treated as censored units. To overcome this problem, this function uses a Monte Carlo approach for simulating the operating times of (multiple) right censored observations, taking reporting delays into account. The simulation is based on the distribution of operating times that were calculated from complete data, i.e. failed items (see dist_delay_report).
A numeric vector of corrected operating times for the censored units
and the input operating times for the failed units if details = FALSE
.
If details = TRUE
the output is a list which consists of the following
elements:
time
: Numeric vector of corrected operating times for the censored
observations and input operating times for failed units.
x_sim
: Simulated random numbers of specified distribution with
estimated parameters. The length of x_sim
is equal to the number of
censored observations.
coefficients
: Estimated coefficients of supposed distribution.
date_of_repair <- c(NA, "2014-09-15", "2015-07-04", "2015-04-10", NA, NA, "2015-04-24", NA, "2015-04-25", "2015-04-24", "2015-06-12", NA, "2015-05-04", NA, NA, "2015-05-22", NA, "2015-09-17", NA, "2015-08-15", "2015-11-26", NA, NA) date_of_report <- c(NA, "2014-10-09", "2015-08-28", "2015-04-15", NA, NA, "2015-05-16", NA, "2015-05-28", "2015-05-15", "2015-07-11", NA, "2015-08-14", NA, NA, "2015-06-05", NA, "2015-10-17", NA, "2015-08-21", "2015-12-02", NA, NA) op_time <- rep(1000, length(date_of_repair)) status <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0) # Example 1 - Simplified vector output: x_corrected <- mcs_delay_report( date_repair = date_of_repair, date_report = date_of_report, time = op_time, status = status, distribution = "lognormal", details = FALSE ) # Example 2 - Detailed list output: list_detail <- mcs_delay_report( date_repair = date_of_repair, date_report = date_of_report, time = op_time, status = status, distribution = "lognormal", details = TRUE )
In general, the amount of available information about units in the field is very different. During the warranty period, there are only a few cases with complete data (mainly failed units) but lots of cases with incomplete data (usually censored units). As a result, the operating time of units with incomplete information is often inaccurate and must be adjusted by delays.
This function reduces the operating times of incomplete observations by simulated delays (in days). A unit is considered incomplete if the later of the related dates is unknown. See 'Details' for some practical examples.
Random delay numbers are drawn from the distribution determined by complete cases (described in 'Details' of dist_delay).
## Default S3 method:
mcs_delay(
  ...,
  date_1,
  date_2,
  time,
  status = NULL,
  id = paste0("ID", seq_len(length(time))),
  distribution = c("lognormal", "exponential")
)
... |
Further arguments passed to or from other methods. Currently not used. |
date_1 |
A vector of class If more than one delay is to be considered, use a list where the first element is the earlier date of the first delay, the second element is the earlier date of the second delay, and so forth (see 'Examples'). |
date_2 |
A vector of class |
time |
Operating times. Use |
status |
Optional argument. If used, it must contain binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). If |
id |
Identification of every unit. |
distribution |
Supposed distribution of the respective delay. |
In field data analysis time-dependent characteristics (e.g. time in service) are often imprecisely recorded. These inaccuracies are caused by unconsidered delays.
For a better understanding of the MCS application in the context of field data, two cases are described below.
Delay in registration: It is common that a supplier, which provides parts to the manufacturing industry, does not know when the units in which its parts are installed were put in service (due to an unknown registration or sales date (date_2)). Without taking the described delay into account, the time in service of the failed units would be the difference between the repair date and the production date (date_1) and, for intact units, the difference between the present date and the production date. But the real operating times are (much) shorter, since the stress on the components did not start until the whole systems were put in service. Hence, the operating times of units with incomplete data (missing date_2) must be reduced by the delays.

Delay in report: Authorized repairers often do not immediately notify the manufacturer or OEM of repairs that were made during the warranty period, but instead pass the information about these repairs on in collected form, e.g. weekly, monthly or quarterly. The resulting time difference between the reporting (date_2) of the repair in the guarantee database and the actual repair date (date_1), which is often assumed to be the failure date, is called the reporting delay. For a given analysis date there could be units that have already failed, but whose failure has not yet been reported; they are therefore treated as censored units. In order to take this into account and according to the principle of equal opportunities, the lifetime of units with a missing report date (date_2[i] = NA) is reduced by simulated reporting delays.
A list with class wt_mcs_delay
containing the following elements:
data
: A tibble
returned by mcs_delay_data where two modifications
have been made:
If the column status
exists, the tibble
has additional classes
wt_mcs_data
and wt_reliability_data
. Otherwise, the tibble
only has
the additional class wt_mcs_data
(which is not supported by estimate_cdf).
The column time
is renamed to x
(to be in accordance with
reliability_data) and contains the adjusted operating times for incomplete
observations and input operating times for the complete observations.
sim_data
: A tibble
with column sim_delay
that holds the simulated
delay-specific numbers for incomplete cases and 0
for complete cases.
If more than one delay was considered, multiple columns with names sim_delay_1
,
sim_delay_2
, ..., sim_delay_i
and corresponding delay-specific random
numbers are presented.
model_estimation
: A list returned by dist_delay.
Verband der Automobilindustrie e.V. (VDA); Qualitätsmanagement in der Automobilindustrie. Zuverlässigkeitssicherung bei Automobilherstellern und Lieferanten. Zuverlässigkeits-Methoden und -Hilfsmittel.; 4th Edition, 2016, ISSN:0943-9412
dist_delay for the determination of a parametric delay distribution and estimate_cdf for the estimation of failure probabilities.
# Example 1 - MCS for delay in registration: mcs_regist <- mcs_delay( date_1 = field_data$production_date, date_2 = field_data$registration_date, time = field_data$dis, status = field_data$status, distribution = "lognormal" ) # Example 2 - MCS for delay in report: mcs_report <- mcs_delay( date_1 = field_data$repair_date, date_2 = field_data$report_date, time = field_data$dis, status = field_data$status, distribution = "exponential" ) # Example 3 - Reproducibility of random numbers: set.seed(1234) mcs_report_reproduce <- mcs_delay( date_1 = field_data$repair_date, date_2 = field_data$report_date, time = field_data$dis, status = field_data$status, distribution = "exponential" ) # Example 4 - MCS for delays in registration and report with same distribution: mcs_delays <- mcs_delay( date_1 = list(field_data$production_date, field_data$repair_date), date_2 = list(field_data$registration_date, field_data$report_date), time = field_data$dis, status = field_data$status, distribution = "lognormal" ) # Example 5 - MCS for delays in registration and report with different distributions: ## Assuming lognormal registration and exponential reporting delays. mcs_delays_2 <- mcs_delay( date_1 = list(field_data$production_date, field_data$repair_date), date_2 = list(field_data$registration_date, field_data$report_date), time = field_data$dis, status = field_data$status, distribution = c("lognormal", "exponential") )
mcs_delays()
is no longer under active development, switching to mcs_delay
is recommended.
mcs_delays( date_prod, date_register, date_repair, date_report, time, status, distribution = "lognormal", details = FALSE )
date_prod |
A vector of class |
date_register |
A vector of class |
date_repair |
a vector of class |
date_report |
a vector of class |
time |
A numeric vector of operating times. |
status |
A vector of binary data (0 or 1) indicating whether unit i is a right censored observation (= 0) or a failure (= 1). |
distribution |
Supposed distribution of the random variable. Only
|
details |
A logical. If |
This function is a wrapper that combines both the mcs_delay_register and mcs_delay_report functions for the adjustment of operating times of censored units.
A numerical vector of corrected operating times for the censored units
and the input operating times for the failed units if
details = FALSE
. If details = TRUE
the output is a list which
consists of the following elements:
time
: A numeric vector of corrected operating times for the censored
observations and input operating times for failed units.
x_sim_regist
: Simulated random numbers of specified distribution with
estimated parameters for delay in registration. The length of x_sim_regist
is equal to the number of censored observations.
x_sim_report
: Simulated random numbers of specified distribution with
estimated parameters for delay in report. The length of x_sim_report
is
equal to the number of censored observations.
coefficients_regist
: Estimated coefficients of supposed distribution for
delay in registration.
coefficients_report
: Estimated coefficients of supposed distribution for
delay in report.
date_of_production <- c("2014-07-28", "2014-02-17", "2014-07-14", "2014-06-26", "2014-03-10", "2014-05-14", "2014-05-06", "2014-03-07", "2014-03-09", "2014-04-13", "2014-05-20", "2014-07-07", "2014-01-27", "2014-01-30", "2014-03-17", "2014-02-09", "2014-04-14", "2014-04-20", "2014-03-13", "2014-02-23", "2014-04-03", "2014-01-08", "2014-01-08") date_of_registration <- c("2014-08-17", "2014-03-29", "2014-12-06", "2014-09-09", "2014-05-14", "2014-07-01", "2014-06-16", "2014-04-03", "2014-05-23", "2014-05-09", "2014-05-31", "2014-08-12", "2014-04-13", "2014-02-15", "2014-07-07", "2014-03-12", "2014-05-27", "2014-06-02", "2014-05-20", "2014-03-21", "2014-06-19", "2014-02-12", "2014-03-27") date_of_repair <- c(NA, "2014-09-15", "2015-07-04", "2015-04-10", NA, NA, "2015-04-24", NA, "2015-04-25", "2015-04-24", "2015-06-12", NA, "2015-05-04", NA, NA, "2015-05-22", NA, "2015-09-17", NA, "2015-08-15", "2015-11-26", NA, NA) date_of_report <- c(NA, "2014-10-09", "2015-08-28", "2015-04-15", NA, NA, "2015-05-16", NA, "2015-05-28", "2015-05-15", "2015-07-11", NA, "2015-08-14", NA, NA, "2015-06-05", NA, "2015-10-17", NA, "2015-08-21", "2015-12-02", NA, NA) op_time <- rep(1000, length(date_of_repair)) status <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0) # Example 1 - Simplified vector output: x_corrected <- mcs_delays( date_prod = date_of_production, date_register = date_of_registration, date_repair = date_of_repair, date_report = date_of_report, time = op_time, status = status, distribution = "lognormal", details = FALSE ) # Example 2 - Detailed list output: list_detail <- mcs_delays( date_prod = date_of_production, date_register = date_of_registration, date_repair = date_of_repair, date_report = date_of_report, time = op_time, status = status, distribution = "lognormal", details = TRUE )
This function simulates distances for units whose covered distances are unknown.
First, random numbers of the annual mileage distribution, estimated by dist_mileage, are drawn. Second, the drawn annual distances are converted with respect to the actual operating times (in days) using a linear relationship. See 'Details'.
mcs_mileage(x, ...)

## S3 method for class 'wt_mcs_mileage_data'
mcs_mileage(x, distribution = c("lognormal", "exponential"), ...)
x |
A |
... |
Further arguments passed to or from other methods. Currently not used. |
distribution |
Supposed distribution of the annual mileage. |
Assumption of linear relationship: Imagine the distance of the vehicle is unknown. A distance of 3500.25 kilometers (km) was drawn from the annual distribution and the known operating time is 200 days (d). So the resulting distance of this vehicle is 3500.25 km / 365 d · 200 d ≈ 1917.95 km.
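The conversion can be verified with a few lines of base R (numbers taken from the example above):

annual_mileage <- 3500.25  # km, drawn from the annual mileage distribution
op_time        <- 200      # known operating time in days

annual_mileage / 365 * op_time
#> [1] 1917.945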
A list with class wt_mcs_mileage
containing the following elements:
data
: A tibble
returned by mcs_mileage_data where two modifications
have been made:
If the column status
exists, the tibble
has additional classes
wt_mcs_data
and wt_reliability_data
. Otherwise, the tibble
only has
the additional class wt_mcs_data
(which is not supported by estimate_cdf).
The column mileage
is renamed to x
(to be in accordance with
reliability_data) and contains simulated distances for incomplete
observations and input distances for the complete observations.
sim_data
: A tibble
with column sim_mileage
that holds the simulated
distances for incomplete cases and 0
for complete cases.
model_estimation
: A list returned by dist_mileage.
dist_mileage for the determination of a parametric annual mileage distribution and estimate_cdf for the estimation of failure probabilities.
# MCS data preparation: mcs_tbl <- mcs_mileage_data( field_data, mileage = mileage, time = dis, status = status, id = vin ) # Example 1 - Reproducibility of drawn random numbers: set.seed(1234) mcs_distances <- mcs_mileage( x = mcs_tbl, distribution = "lognormal" ) # Example 2 - MCS for distances with exponential annual mileage distribution: mcs_distances_2 <- mcs_mileage( x = mcs_tbl, distribution = "exponential" ) # Example 3 - MCS for distances with downstream probability estimation: ## Apply 'estimate_cdf()' to *$data: prob_estimation <- estimate_cdf( x = mcs_distances$data, methods = "kaplan" ) ## Apply 'plot_prob()': plot_prob_estimation <- plot_prob(prob_estimation)
Create consistent mcs_mileage_data
based on an existing data.frame
(preferred)
or on multiple equal length vectors
mcs_mileage_data( data = NULL, mileage, time, status = NULL, id = NULL, .keep_all = FALSE )
data |
Either |
mileage |
Covered distances. Use |
time |
Operating times. Use |
status |
Optional argument. If used, it must contain binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). If |
id |
Identification of every unit. |
.keep_all |
If |
A tibble
with class wt_mcs_mileage_data
that is formed for the downstream
Monte Carlo method mcs_mileage.
It contains the following columns (if .keep_all = FALSE
):
mileage
: Input mileages.
time
: Input operating times.
status
(optional) :
If is.null(status)
column status
does not exist.
If status
is provided the column contains the entered binary
data (0 or 1).
id
: Identification for every unit.
If .keep_all = TRUE
, the remaining columns of data
are also preserved.
The attribute mcs_characteristic
is set to "mileage"
.
dist_mileage for the determination of a parametric annual mileage distribution and mcs_mileage for the Monte Carlo method with respect to unknown distances.
# Example 1 - Based on an existing data.frame/tibble and column names: mcs_tbl <- mcs_mileage_data( data = field_data, mileage = mileage, time = dis, status = status ) # Example 2 - Based on an existing data.frame/tibble and column positions: mcs_tbl_2 <- mcs_mileage_data( data = field_data, mileage = 3, time = 2, id = 1 ) # Example 3 - Keep all variables of the tibble/data.frame entered to argument data: mcs_tbl_3 <- mcs_mileage_data( data = field_data, mileage = mileage, time = dis, status = status, id = vin, .keep_all = TRUE ) # Example 4 - Based on vectors: mcs_tbl_4 <- mcs_mileage_data( mileage = field_data$mileage, time = field_data$dis, status = field_data$status, id = field_data$vin )
This function simulates distances for units whose covered distances are unknown.
First, random numbers of the annual mileage distribution, estimated by dist_mileage, are drawn. Second, the drawn annual distances are converted with respect to the actual operating times (in days) using a linear relationship. See 'Details'.
## Default S3 method:
mcs_mileage(
  x,
  time,
  status = NULL,
  id = paste0("ID", seq_len(length(time))),
  distribution = c("lognormal", "exponential"),
  ...
)
x |
A numeric vector of distances covered. Use |
time |
A numeric vector of operating times. Use |
status |
Optional argument. If used, it must contain binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). If |
id |
Identification of every unit. |
distribution |
Supposed distribution of the annual mileage. |
... |
Further arguments passed to or from other methods. Currently not used. |
Assumption of linear relationship: Imagine the distance of the vehicle is unknown. A distance of 3500.25 kilometers (km) was drawn from the annual distribution and the known operating time is 200 days (d). So the resulting distance of this vehicle is 3500.25 km / 365 d · 200 d ≈ 1917.95 km.
A list with class wt_mcs_mileage containing the following elements:

data : A tibble returned by mcs_mileage_data where two modifications have been made:

If the column status exists, the tibble has additional classes wt_mcs_data and wt_reliability_data. Otherwise, the tibble only has the additional class wt_mcs_data (which is not supported by estimate_cdf).

The column mileage is renamed to x (to be in accordance with reliability_data) and contains simulated distances for incomplete observations and input distances for the complete observations.

sim_data : A tibble with column sim_mileage that holds the simulated distances for incomplete cases and 0 for complete cases.

model_estimation : A list returned by dist_mileage.
dist_mileage for the determination of a parametric annual mileage distribution and estimate_cdf for the estimation of failure probabilities.
# Example 1 - Reproducibility of drawn random numbers:
set.seed(1234)
mcs_distances <- mcs_mileage(x = field_data$mileage, time = field_data$dis, status = field_data$status, id = field_data$vin, distribution = "lognormal")

# Example 2 - MCS for distances with exponential annual mileage distribution:
mcs_distances_2 <- mcs_mileage(x = field_data$mileage, time = field_data$dis, status = field_data$status, id = field_data$vin, distribution = "exponential")

# Example 3 - MCS for distances with downstream probability estimation:
## Apply 'estimate_cdf()' to *$data:
prob_estimation <- estimate_cdf(x = mcs_distances$data, methods = "kaplan")

## Apply 'plot_prob()':
plot_prob_estimation <- plot_prob(prob_estimation)
This method applies the expectation-maximization (EM) algorithm to estimate the parameters of a univariate Weibull mixture model. See 'Details'.
mixmod_em(x, ...)

## S3 method for class 'wt_reliability_data'
mixmod_em(
  x,
  distribution = "weibull",
  conf_level = 0.95,
  k = 2,
  method = "EM",
  n_iter = 100L,
  conv_limit = 1e-06,
  diff_loglik = 0.01,
  ...
)
x |
A tibble with class wt_reliability_data returned by reliability_data. |
... |
Further arguments passed to or from other methods. Currently not used. |
distribution |
|
conf_level |
Confidence level for the intervals of the Weibull parameters
of every component |
k |
Number of mixture components. |
method |
|
n_iter |
Integer defining the maximum number of iterations. |
conv_limit |
Numeric value defining the convergence limit. |
diff_loglik |
Numeric value defining the maximum permissible difference between two consecutive log-likelihood values. |
The EM algorithm is an iterative algorithm for which starting values must be defined. Starting values can be provided for the unknown parameter vector as well as for the posterior probabilities. This implementation employs initial values for the posterior probabilities. These are assigned randomly by using the Dirichlet distribution, the conjugate prior of a multinomial distribution (see Mr. Gelissen's blog post listed under references).
M-Step : On the basis of the initial posterior probabilities, the parameter vector is estimated with Newton-Raphson.
E-Step : The current parameter estimate is used to update the posterior probabilities.
This procedure is repeated until the complete log-likelihood has converged.
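A minimal sketch of the random initialization of the posterior probabilities via the Dirichlet distribution, drawn here as normalized gamma variates (n, k and the variable names are assumptions for illustration, not the internals of mixmod_em):

set.seed(42)
n <- 50  # number of observations
k <- 2   # number of mixture components

# One Dirichlet(1, ..., 1) draw per observation, i.e. a random vector of
# posterior probabilities that sums to 1:
g <- matrix(rgamma(n * k, shape = 1), nrow = n, ncol = k)
a_posteriori_init <- g / rowSums(g)
range(rowSums(a_posteriori_init))  # every row sums to 1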
A list with classes wt_model
and wt_mixmod_em
. The length of the
list depends on the number of specified subgroups k
. The first k
lists
contain information provided by ml_estimation. The values of logL
, aic
and bic
are the results of a weighted log-likelihood, where the weights are
the posterior probabilities determined by the algorithm. The last list summarizes
further results of the EM algorithm and is therefore called em_results
. It
contains the following elements:
a_priori
: A vector with estimated prior probabilities.
a_posteriori
: A matrix with estimated posterior probabilities.
groups
: Numeric vector specifying the group membership of every observation.
logL
: The value of the complete log-likelihood.
aic
: Akaike Information Criterion.
bic
: Bayesian Information Criterion.
Doganaksoy, N.; Hahn, G.; Meeker, W. Q., Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002
# Reliability data preparation:
## Data for mixture model:
data_mix <- reliability_data(voltage, x = hours, status = status)

# Example 1 - EM algorithm with k = 2:
mix_mod_em <- mixmod_em(x = data_mix, conf_level = 0.95, k = 2, n_iter = 150)

# Example 2 - Maximum likelihood is applied when k = 1:
mix_mod_em_2 <- mixmod_em(x = data_mix, conf_level = 0.95, k = 1, n_iter = 150)
This method applies the expectation-maximization (EM) algorithm to estimate the parameters of a univariate Weibull mixture model. See 'Details'.
## Default S3 method:
mixmod_em(
  x,
  status,
  distribution = "weibull",
  conf_level = 0.95,
  k = 2,
  method = "EM",
  n_iter = 100L,
  conv_limit = 1e-06,
  diff_loglik = 0.01,
  ...
)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
distribution |
|
conf_level |
Confidence level for the intervals of the Weibull parameters
of every component |
k |
Number of mixture components. |
method |
|
n_iter |
Integer defining the maximum number of iterations. |
conv_limit |
Numeric value defining the convergence limit. |
diff_loglik |
Numeric value defining the maximum permissible difference between two consecutive log-likelihood values. |
... |
Further arguments passed to or from other methods. Currently not used. |
The EM algorithm is an iterative algorithm for which starting values must be defined. Starting values can be provided for the unknown parameter vector as well as for the posterior probabilities. This implementation employs initial values for the posterior probabilities. These are assigned randomly by using the Dirichlet distribution, the conjugate prior of a multinomial distribution (see Mr. Gelissen's blog post listed under references).
M-Step : On the basis of the initial posterior probabilities, the parameter vector is estimated with Newton-Raphson.
E-Step : The current parameter estimate is used to update the posterior probabilities.
This procedure is repeated until the complete log-likelihood has converged.
A list with classes wt_model
and wt_mixmod_em
. The length of the
list depends on the number of specified subgroups k
. The first k
lists
contain information provided by ml_estimation. The values of logL
, aic
and bic
are the results of a weighted log-likelihood, where the weights are
the posterior probabilities determined by the algorithm. The last list summarizes
further results of the EM algorithm and is therefore called em_results
. It
contains the following elements:
a_priori
: A vector with estimated prior probabilities.
a_posteriori
: A matrix with estimated posterior probabilities.
groups
: Numeric vector specifying the group membership of every observation.
logL
: The value of the complete log-likelihood.
aic
: Akaike Information Criterion.
bic
: Bayesian Information Criterion.
Doganaksoy, N.; Hahn, G.; Meeker, W. Q., Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002
# Vectors:
hours <- voltage$hours
status <- voltage$status

# Example 1 - EM algorithm with k = 2:
mix_mod_em <- mixmod_em(x = hours, status = status, distribution = "weibull", conf_level = 0.95, k = 2, n_iter = 150)

# Example 2 - Maximum likelihood is applied when k = 1:
mix_mod_em_2 <- mixmod_em(x = hours, status = status, distribution = "weibull", conf_level = 0.95, k = 1, method = "EM", n_iter = 150)
This function uses piecewise linear regression to divide the data into subgroups. See 'Details'.
mixmod_regression(x, ...)

## S3 method for class 'wt_cdf_estimation'
mixmod_regression(
  x,
  distribution = c("weibull", "lognormal", "loglogistic"),
  conf_level = 0.95,
  k = 2,
  control = segmented::seg.control(),
  ...
)
x |
A tibble with class wt_cdf_estimation returned by estimate_cdf. |
... |
Further arguments passed to or from other methods. Currently not used. |
distribution |
Supposed distribution of the random variable. |
conf_level |
Confidence level of the interval. |
k |
Number of mixture components. If the data should be split in an
automated fashion, |
control |
Output of the call to seg.control, which is passed to segmented.lm. See 'Examples' for usage. |
The segmentation process is based on the lifetime realizations of failed units and their corresponding estimated failure probabilities, in which intact (censored) items are taken into account. It is performed with the support of segmented.lm.
Segmentation can be done with a specified number of subgroups or in an automated
fashion (see argument k
). The algorithm tends to overestimate the number of
breakpoints when the separation is done automatically (see 'Warning' in
segmented.lm).
In the context of reliability analysis it is important that the main types of failures can be identified and analyzed separately. These are
early failures,
random failures and
wear-out failures.
In order to reduce the risk of overestimation as well as being able to consider
the main types of failures, a maximum of three subgroups (k = 3
) is recommended.
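The underlying idea can be sketched with segmented::segmented() applied to a simple linear model of standard quantiles versus log-lifetime. The simulated data, plotting positions and variable names below are assumptions for illustration, not the internal code of mixmod_regression:

library(segmented)

set.seed(1)
# Simulated failures from two different Weibull failure modes:
x <- sort(c(rweibull(30, shape = 0.8, scale = 1000),
            rweibull(30, shape = 4, scale = 10000)))

# Simple median-rank plotting positions and Weibull probability paper coordinates:
prob <- (seq_along(x) - 0.3) / (length(x) + 0.4)
d <- data.frame(log_x = log(x), q = log(-log(1 - prob)))

# Piecewise linear fit with one breakpoint, i.e. two segments/subgroups:
lm_fit <- lm(q ~ log_x, data = d)
seg_fit <- segmented(lm_fit, seg.Z = ~ log_x, npsi = 1)
seg_fit$psi  # estimated breakpoint on the log(x) scale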
A list with classes wt_model
and wt_rank_regression
if no breakpoint
was detected. See rank_regression.
A list with classes wt_model
and wt_mixmod_regression
if at least one
breakpoint was determined. The length of the list depends on the number of
identified subgroups. Each list element contains the information provided by
rank_regression. In addition, the returned tibble data
of each list element
only retains information on the failed units and has two more columns:
q
: Quantiles of the standard distribution calculated from column prob
.
group
: Membership to the respective segment.
If more than one method was specified in estimate_cdf, the resulting output
is a list with classes wt_model
and wt_mixmod_regression_list
where each
list element has classes wt_model
and wt_mixmod_regression
.
Doganaksoy, N.; Hahn, G.; Meeker, W. Q., Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002
# Reliability data preparation:
## Data for mixture model:
data_mix <- reliability_data(voltage, x = hours, status = status)

## Data for simple unimodal distribution:
data <- reliability_data(shock, x = distance, status = status)

# Probability estimation with one method:
prob_mix <- estimate_cdf(data_mix, methods = "johnson")
prob <- estimate_cdf(data, methods = "johnson")

# Probability estimation for multiple methods:
prob_mix_mult <- estimate_cdf(data_mix, methods = c("johnson", "kaplan", "nelson"))

# Example 1 - Mixture identification using k = 2 two-parametric Weibull models:
mix_mod_weibull <- mixmod_regression(x = prob_mix, distribution = "weibull", conf_level = 0.99, k = 2)

# Example 2 - Mixture identification using k = 3 two-parametric lognormal models:
mix_mod_lognorm <- mixmod_regression(x = prob_mix, distribution = "lognormal", k = 3)

# Example 3 - Mixture identification for multiple methods specified in estimate_cdf:
mix_mod_mult <- mixmod_regression(x = prob_mix_mult, distribution = "loglogistic")

# Example 4 - Mixture identification using control argument:
mix_mod_control <- mixmod_regression(x = prob_mix, distribution = "weibull", control = segmented::seg.control(display = TRUE))

# Example 5 - Mixture identification performs rank_regression for k = 1:
mod <- mixmod_regression(x = prob, distribution = "weibull", k = 1)
This function uses piecewise linear regression to divide the data into subgroups. See 'Details'.
## Default S3 method:
mixmod_regression(
  x,
  y,
  status,
  distribution = c("weibull", "lognormal", "loglogistic"),
  conf_level = 0.95,
  k = 2,
  control = segmented::seg.control(),
  ...
)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
y |
A numeric vector which consists of estimated failure probabilities
regarding the lifetime data in x. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
distribution |
Supposed distribution of the random variable. |
conf_level |
Confidence level of the interval. |
k |
Number of mixture components. If the data should be split in an
automated fashion, |
control |
Output of the call to seg.control, which is passed to segmented.lm. See 'Examples' for usage. |
... |
Further arguments passed to or from other methods. Currently not used. |
The segmentation process is based on the lifetime realizations of failed units and their corresponding estimated failure probabilities, in which intact (censored) items are taken into account. It is performed with the support of segmented.lm.
Segmentation can be done with a specified number of subgroups or in an automated
fashion (see argument k
). The algorithm tends to overestimate the number of
breakpoints when the separation is done automatically (see 'Warning' in
segmented.lm).
In the context of reliability analysis it is important that the main types of failures can be identified and analyzed separately. These are
early failures,
random failures and
wear-out failures.
In order to reduce the risk of overestimation as well as being able to consider
the main types of failures, a maximum of three subgroups (k = 3
) is recommended.
A list with classes wt_model
and wt_rank_regression
if no breakpoint
was detected. See rank_regression. The returned tibble data
is of class
wt_cdf_estimation
and contains the dummy columns cdf_estimation_method
and
id
. The former is filled with NA_character_
due to internal usage, and the
latter is filled with "XXXXXX"
to point out that unit identification is not
possible when using the vector-based approach.
A list with classes wt_model
and wt_mixmod_regression
if at least one
breakpoint was determined. The length of the list depends on the number of
identified subgroups. Each list element contains the information provided by
rank_regression. The returned tibble data
of each list element only retains
information on the failed units and has modified and additional columns:
id
: Modified id, overwritten with "XXXXXX"
to point out that unit
identification is not possible when using the vector-based approach.
cdf_estimation_method
: A character that is always NA_character_
. Only
needed for internal use.
q
: Quantiles of the standard distribution calculated from column prob
.
group
: Membership to the respective segment.
Doganaksoy, N.; Hahn, G.; Meeker, W. Q., Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002
# Vectors:
## Data for mixture model:
hours <- voltage$hours
status <- voltage$status

## Data for simple unimodal distribution:
distance <- shock$distance
status_2 <- shock$status

# Probability estimation with one method:
prob_mix <- estimate_cdf(x = hours, status = status, method = "johnson")
prob <- estimate_cdf(x = distance, status = status_2, method = "johnson")

# Example 1 - Mixture identification using k = 2 two-parametric Weibull models:
mix_mod_weibull <- mixmod_regression(x = prob_mix$x, y = prob_mix$prob, status = prob_mix$status, distribution = "weibull", conf_level = 0.99, k = 2)

# Example 2 - Mixture identification using k = 3 two-parametric lognormal models:
mix_mod_lognorm <- mixmod_regression(x = prob_mix$x, y = prob_mix$prob, status = prob_mix$status, distribution = "lognormal", k = 3)

# Example 3 - Mixture identification using control argument:
mix_mod_control <- mixmod_regression(x = prob_mix$x, y = prob_mix$prob, status = prob_mix$status, distribution = "weibull", k = 2, control = segmented::seg.control(display = TRUE))

# Example 4 - Mixture identification performs rank_regression for k = 1:
mod <- mixmod_regression(x = prob$x, y = prob$prob, status = prob$status, distribution = "weibull", k = 1)
This function estimates the parameters of a parametric lifetime distribution for complete and (multiple) right-censored data. The parameters are determined in the frequently used (log-)location-scale parameterization.
For the Weibull, estimates are additionally transformed such that they are in line with the parameterization provided by the stats package (see Weibull).
ml_estimation(x, ...)

## S3 method for class 'wt_reliability_data'
ml_estimation(
  x,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal", "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential", "exponential2"),
  wts = rep(1, nrow(x)),
  conf_level = 0.95,
  start_dist_params = NULL,
  control = list(),
  ...
)
x |
A tibble with class wt_reliability_data returned by reliability_data. |
... |
Further arguments passed to or from other methods. Currently not used. |
distribution |
Supposed distribution of the random variable. |
wts |
Optional vector of case weights. The length of wts must equal the number of observations. |
conf_level |
Confidence level of the interval. |
start_dist_params |
Optional vector with initial values of the (log-)location-scale parameters. |
control |
A list of control parameters (see 'Details' and optim). |
Within ml_estimation
, optim is called with method = "BFGS"
and control$fnscale = -1
to estimate the parameters that maximize the
log-likelihood (see loglik_function). For threshold models, the profile
log-likelihood is maximized in advance (see loglik_profiling). Once the
threshold parameter is determined, the threshold model is treated like a
distribution without threshold (lifetime is reduced by threshold estimate)
and the general optimization routine is applied.
Normal approximation confidence intervals for the parameters are computed as well.
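As a rough illustration of this optimization (not the internal code of ml_estimation), a two-parameter Weibull log-likelihood for a complete sample can be maximized in the log-location-scale parameterization with optim(); the last two lines show the back-transformation to the shape/scale parameterization of the stats package:

set.seed(123)
x <- rweibull(100, shape = 2, scale = 20000)  # complete sample, no censoring in this sketch

# Log-likelihood of log(T) ~ SEV(mu, sigma), i.e. T ~ Weibull with mu = log(scale), sigma = 1/shape:
loglik_sev <- function(par, t) {
  mu <- par[1]
  sigma <- par[2]  # no positivity constraint in this simple sketch
  z <- (log(t) - mu) / sigma
  sum(z - exp(z) - log(sigma) - log(t))
}

fit <- optim(
  par = c(mean(log(x)), sd(log(x))),  # crude starting values
  fn = loglik_sev,
  t = x,
  method = "BFGS",
  control = list(fnscale = -1)        # maximize instead of minimize
)

shape_hat <- 1 / fit$par[2]   # beta = 1 / sigma
scale_hat <- exp(fit$par[1])  # eta = exp(mu)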
A list with classes wt_model
, wt_ml_estimation
and wt_model_estimation
which contains:
coefficients
: A named vector of estimated coefficients (parameters of the
assumed distribution). Note: The parameters are given in the
(log-)location-scale-parameterization.
confint
: Confidence intervals for the (log-)location-scale parameters.
shape_scale_coefficients
: Only included if distribution
is "weibull"
or "weibull3"
(parameterization used in Weibull).
shape_scale_confint
: Only included if distribution
is "weibull"
or "weibull3"
. Confidence intervals for scale and shape
(and threshold
if
distribution = "weibull3"
).
varcov
: Estimated variance-covariance matrix of (log-)location-scale parameters.
logL
: The log-likelihood value.
aic
: Akaike Information Criterion.
bic
: Bayesian Information Criterion.
data
: A tibble
with class wt_reliability_data
returned by reliability_data.
distribution
: Specified distribution.
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Reliability data preparation:
## Data for two-parametric model:
data_2p <- reliability_data(shock, x = distance, status = status)

## Data for three-parametric model:
data_3p <- reliability_data(alloy, x = cycles, status = status)

# Example 1 - Fitting a two-parametric weibull distribution:
ml_2p <- ml_estimation(data_2p, distribution = "weibull")

# Example 2 - Fitting a three-parametric lognormal distribution:
ml_3p <- ml_estimation(data_3p, distribution = "lognormal3", conf_level = 0.99)
This function estimates the parameters of a parametric lifetime distribution for complete and (multiple) right-censored data. The parameters are determined in the frequently used (log-)location-scale parameterization.
For the Weibull, estimates are additionally transformed such that they are in line with the parameterization provided by the stats package (see Weibull).
## Default S3 method:
ml_estimation(
  x,
  status,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal", "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential", "exponential2"),
  wts = rep(1, length(x)),
  conf_level = 0.95,
  start_dist_params = NULL,
  control = list(),
  ...
)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
distribution |
Supposed distribution of the random variable. |
wts |
Optional vector of case weights. The length of wts must equal the number of observations. |
conf_level |
Confidence level of the interval. |
start_dist_params |
Optional vector with initial values of the (log-)location-scale parameters. |
control |
A list of control parameters (see 'Details' and optim). |
... |
Further arguments passed to or from other methods. Currently not used. |
Within ml_estimation
, optim is called with method = "BFGS"
and control$fnscale = -1
to estimate the parameters that maximize the
log-likelihood (see loglik_function). For threshold models, the profile
log-likelihood is maximized in advance (see loglik_profiling). Once the
threshold parameter is determined, the threshold model is treated like a
distribution without threshold (lifetime is reduced by threshold estimate)
and the general optimization routine is applied.
Normal approximation confidence intervals for the parameters are computed as well.
A list with classes wt_model
, wt_ml_estimation
and wt_model_estimation
which contains:
coefficients
: A named vector of estimated coefficients (parameters of the
assumed distribution). Note: The parameters are given in the
(log-)location-scale-parameterization.
confint
: Confidence intervals for the (log-)location-scale parameters.
shape_scale_coefficients
: Only included if distribution
is "weibull"
or "weibull3"
(parameterization used in Weibull).
shape_scale_confint
: Only included if distribution
is "weibull"
or "weibull3"
. Confidence intervals for scale and shape
(and threshold
if
distribution = "weibull3"
).
varcov
: Estimated variance-covariance matrix of (log-)location-scale parameters.
logL
: The log-likelihood value.
aic
: Akaike Information Criterion.
bic
: Bayesian Information Criterion.
data
: A tibble
with columns x
and status
.
distribution
: Specified distribution.
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Vectors:
obs <- seq(10000, 100000, 10000)
status_1 <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0)
cycles <- alloy$cycles
status_2 <- alloy$status

# Example 1 - Fitting a two-parametric weibull distribution:
ml <- ml_estimation(x = obs, status = status_1, distribution = "weibull", conf_level = 0.90)

# Example 2 - Fitting a three-parametric lognormal distribution:
ml_2 <- ml_estimation(x = cycles, status = status_2, distribution = "lognormal3")
mr_method()
is no longer under active development, switching to estimate_cdf
is recommended.
mr_method(
  x,
  status = rep(1, length(x)),
  id = NULL,
  method = c("benard", "invbeta"),
  ties.method = c("max", "min", "average")
)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status |
A vector of ones indicating that every unit has failed. |
id |
A vector for the identification of every unit. Default is NULL. |
method |
Method for the estimation of the cdf. Can be "benard" (default) or "invbeta". |
ties.method |
A character string specifying how ties are treated,
default is "max". |
This non-parametric approach (Median Ranks) is used to estimate the failure probabilities of complete data, i.e. samples without censored observations. Two methods are available to estimate the cumulative distribution function F(t) (a short sketch follows the list):
"benard"
: Benard's approximation for Median Ranks.
"invbeta"
: Exact Median Ranks using the inverse beta distribution.
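For a complete sample both variants can be sketched directly, where i is the failure rank and n the sample size (illustrative code, not the internals of mr_method):

n <- 10
i <- seq_len(n)

# Benard's approximation of the median rank:
prob_benard <- (i - 0.3) / (n + 0.4)

# Exact median rank as the 50 % quantile of the beta distribution with shape1 = i, shape2 = n - i + 1:
prob_invbeta <- qbeta(0.5, shape1 = i, shape2 = n - i + 1)

round(cbind(prob_benard, prob_invbeta), 4)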
A tibble with only failed units containing the following columns:
id
: Identification for every unit.
x
: Lifetime characteristic.
status
: Status of failed units (always 1).
rank
: Assigned ranks.
prob
: Estimated failure probabilities.
cdf_estimation_method
: Specified method for the estimation of failure
probabilities (always 'mr').
# Vectors:
obs <- seq(10000, 100000, 10000)
state <- rep(1, length(obs))
uic <- c("3435", "1203", "958X", "XX71", "abcd", "tz46", "fl29", "AX23", "Uy12", "kl1a")

# Example 1 - Benard's approximation:
tbl_mr <- mr_method(x = obs, status = state, id = uic, method = "benard")

# Example 2 - Inverse beta distribution:
tbl_mr_invbeta <- mr_method(x = obs, status = state, method = "invbeta")
nelson_method()
is no longer under active development, switching to
estimate_cdf is recommended.
nelson_method(x, status, id = NULL)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
id |
A vector for the identification of every unit. Default is NULL. |
This non-parametric approach estimates the cumulative hazard rate for (multiple) right-censored data. By equating the definition of the hazard rate with the hazard rate according to Nelson-Aalen, the failure probabilities can be calculated.
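A minimal sketch of this estimator for distinct (untied) observations; variable names are illustrative, and the package's own implementation may differ in details such as tie handling:

x <- seq(10000, 100000, 10000)
status <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0)

ord <- order(x)
x_ord <- x[ord]
status_ord <- status[ord]

n_risk <- length(x_ord):1                # units still at risk at each ordered time
cum_haz <- cumsum(status_ord / n_risk)   # Nelson-Aalen cumulative hazard H(t)
prob <- 1 - exp(-cum_haz)                # failure probability F(t) = 1 - exp(-H(t))

data.frame(x = x_ord, status = status_ord, prob = ifelse(status_ord == 1, prob, NA))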
A tibble containing the following columns:
id
: Identification for every unit.
x
: Lifetime characteristic.
status
: Binary data (0 or 1) indicating whether a unit is a right
censored observation (= 0) or a failure (= 1).
rank
: Filled with NA
.
prob
: Estimated failure probabilities, NA
if status = 0
.
cdf_estimation_method
: Specified method for the estimation of failure
probabilities (always 'nelson').
# Vectors:
obs <- seq(10000, 100000, 10000)
state <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0)
uic <- c("3435", "1203", "958X", "XX71", "abcd", "tz46", "fl29", "AX23", "Uy12", "kl1a")

# Example - Nelson-Aalen estimator applied to intact and failed units:
tbl_nel <- nelson_method(x = obs, status = state, id = uic)
This function is used to add estimated confidence region(s) to an existing probability plot. Since confidence regions are related to the estimated regression line, the latter is provided as well.
plot_conf(p_obj, x, ...)

## S3 method for class 'wt_confint'
plot_conf(
  p_obj,
  x,
  title_trace_mod = "Fit",
  title_trace_conf = "Confidence Limit",
  ...
)
p_obj |
A plot object returned by plot_prob. |
x |
A tibble with class wt_confint returned by confint_betabinom or confint_fisher. |
... |
Further arguments passed to or from other methods. Currently not used. |
title_trace_mod |
A character string which is assigned to the model trace in the legend. |
title_trace_conf |
A character string which is assigned to the confidence trace in the legend. |
A plot object containing the probability plot with plotting positions, the estimated regression line and the estimated confidence region(s).
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Reliability data:
data <- reliability_data(data = alloy, x = cycles, status = status)

# Probability estimation:
prob_tbl <- estimate_cdf(data, methods = "johnson")

# Example 1 - Probability Plot, Regression Line and Confidence Bounds for Three-Parameter-Weibull:
rr <- rank_regression(prob_tbl, distribution = "weibull3")
conf_betabin <- confint_betabinom(rr)
plot_weibull <- plot_prob(prob_tbl, distribution = "weibull")
plot_conf_beta <- plot_conf(p_obj = plot_weibull, x = conf_betabin)

# Example 2 - Probability Plot, Regression Line and Confidence Bounds for Three-Parameter-Lognormal:
rr_ln <- rank_regression(prob_tbl, distribution = "lognormal3", conf_level = 0.9)
conf_betabin_ln <- confint_betabinom(rr_ln, bounds = "two_sided", conf_level = 0.9, direction = "y")
plot_lognormal <- plot_prob(prob_tbl, distribution = "lognormal")
plot_conf_beta_ln <- plot_conf(p_obj = plot_lognormal, x = conf_betabin_ln)

# Example 3 - Probability Plot, Regression Line and Confidence Bounds for MLE
ml <- ml_estimation(data, distribution = "weibull")
conf_fisher <- confint_fisher(ml)
plot_weibull <- plot_prob(prob_tbl, distribution = "weibull")
plot_conf_fisher_weibull <- plot_conf(p_obj = plot_weibull, x = conf_fisher)
This function is used to add estimated confidence region(s) to an existing probability plot which also includes the estimated regression line.
## Default S3 method:
plot_conf(
  p_obj,
  x,
  y,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal", "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential", "exponential2"),
  direction = c("y", "x"),
  title_trace = "Confidence Limit",
  ...
)
p_obj |
A plot object returned by plot_mod. |
x |
A list containing the x-coordinates of the confidence region(s). The list can be of length 1 or 2. For more information see Details. |
y |
A list containing the y-coordinates of the Confidence Region(s). The list can be of length 1 or 2. For more information see Details. |
distribution |
Supposed distribution of the random variable. |
direction |
A character string specifying the direction of the plotted
interval(s). |
title_trace |
A character string which is assigned to the legend trace. |
... |
Further arguments passed to or from other methods. Currently not used. |
It is important that the length of the vectors provided as lists in x
and
y
match with the length of the vectors x
and y
in the function plot_mod.
For this reason the following procedure is recommended:
Calculate confidence intervals with the function confint_betabinom or
confint_fisher and store it in a data.frame
. For instance call it df.
Inside plot_mod use the output df$x
for x
and df$prob
for y
of
the function(s) named before.
In Examples the described approach is shown with code.
A plot object containing the probability plot with plotting positions, the estimated regression line and the estimated confidence region(s).
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Vectors:
cycles <- alloy$cycles
status <- alloy$status
prob_tbl <- estimate_cdf(x = cycles, status = status, method = "johnson")

# Example 1 - Probability Plot, Regression Line and Confidence Bounds for Three-Parameter-Weibull:
rr <- rank_regression(x = prob_tbl$x, y = prob_tbl$prob, status = prob_tbl$status, distribution = "weibull3")
conf_betabin <- confint_betabinom(x = prob_tbl$x, status = prob_tbl$status, dist_params = rr$coefficients, distribution = "weibull3")
plot_weibull <- plot_prob(x = prob_tbl$x, y = prob_tbl$prob, status = prob_tbl$status, id = prob_tbl$id, distribution = "weibull")
plot_reg_weibull <- plot_mod(p_obj = plot_weibull, x = conf_betabin$x, y = conf_betabin$prob, dist_params = rr$coefficients, distribution = "weibull3")
plot_conf_beta <- plot_conf(p_obj = plot_reg_weibull, x = list(conf_betabin$x), y = list(conf_betabin$lower_bound, conf_betabin$upper_bound), direction = "y", distribution = "weibull3")

# Example 2 - Probability Plot, Regression Line and Confidence Bounds for Three-Parameter-Lognormal:
rr_ln <- rank_regression(x = prob_tbl$x, y = prob_tbl$prob, status = prob_tbl$status, distribution = "lognormal3")
conf_betabin_ln <- confint_betabinom(x = prob_tbl$x, status = prob_tbl$status, dist_params = rr_ln$coefficients, distribution = "lognormal3")
plot_lognormal <- plot_prob(x = prob_tbl$x, y = prob_tbl$prob, status = prob_tbl$status, id = prob_tbl$id, distribution = "lognormal")
plot_reg_lognormal <- plot_mod(p_obj = plot_lognormal, x = conf_betabin_ln$x, y = conf_betabin_ln$prob, dist_params = rr_ln$coefficients, distribution = "lognormal3")
plot_conf_beta_ln <- plot_conf(p_obj = plot_reg_lognormal, x = list(conf_betabin_ln$x), y = list(conf_betabin_ln$lower_bound, conf_betabin_ln$upper_bound), direction = "y", distribution = "lognormal3")
This function adds one or multiple estimated regression lines to an existing probability plot (plot_prob). Depending on the output of the functions rank_regression, ml_estimation, mixmod_regression or mixmod_em one or multiple lines are plotted.
plot_mod(p_obj, x, ...)

## S3 method for class 'wt_model'
plot_mod(p_obj, x, title_trace = "Fit", ...)
p_obj |
A plot object returned by plot_prob. |
x |
A list with class wt_model returned by rank_regression, ml_estimation, mixmod_regression or mixmod_em. |
... |
Further arguments passed to or from other methods. Currently not used. |
title_trace |
A character string which is assigned to the legend trace. |
The name of the legend entry is a combination of the title_trace
and the
number of determined subgroups from mixmod_regression or mixmod_em. If
title_trace = "Line"
and the data could be split in two groups, the legend
entries would be "Line: 1" and "Line: 2".
A plot object containing the probability plot with plotting positions and the estimated regression line(s).
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Reliability data:
data <- reliability_data(data = alloy, x = cycles, status = status)

# Probability estimation:
prob_tbl <- estimate_cdf(data, methods = c("johnson", "kaplan"))

## Rank Regression
# Example 1 - Probability Plot and Regression Line Three-Parameter-Weibull:
plot_weibull <- plot_prob(prob_tbl, distribution = "weibull")
rr_weibull <- rank_regression(prob_tbl, distribution = "weibull3")
plot_reg_weibull <- plot_mod(p_obj = plot_weibull, x = rr_weibull)

# Example 2 - Probability Plot and Regression Line Three-Parameter-Lognormal:
plot_lognormal <- plot_prob(prob_tbl, distribution = "lognormal")
rr_lognormal <- rank_regression(prob_tbl, distribution = "lognormal3")
plot_reg_lognormal <- plot_mod(p_obj = plot_lognormal, x = rr_lognormal)

## ML Estimation
# Example 3 - Probability Plot and Regression Line Two-Parameter-Weibull:
plot_weibull <- plot_prob(prob_tbl, distribution = "weibull")
ml_weibull_2 <- ml_estimation(data, distribution = "weibull")
plot_reg_weibull_2 <- plot_mod(p_obj = plot_weibull, ml_weibull_2)

## Mixture Identification
# Reliability data:
data_mix <- reliability_data(voltage, x = hours, status = status)

# Probability estimation:
prob_mix <- estimate_cdf(data_mix, methods = c("johnson", "kaplan", "nelson"))

# Example 4 - Probability Plot and Regression Line Mixmod Regression:
mix_mod_rr <- mixmod_regression(prob_mix, distribution = "weibull")
plot_weibull <- plot_prob(mix_mod_rr)
plot_reg_mix_mod_rr <- plot_mod(p_obj = plot_weibull, x = mix_mod_rr)

# Example 5 - Probability Plot and Regression Line Mixmod EM:
mix_mod_em <- mixmod_em(data_mix)
plot_weibull <- plot_prob(mix_mod_em)
plot_reg_mix_mod_em <- plot_mod(p_obj = plot_weibull, x = mix_mod_em)
plot_mod_mix()
is no longer under active development, switching to
plot_mod is recommended.
plot_mod_mix(
  p_obj,
  x,
  status,
  mix_output,
  distribution = c("weibull", "lognormal", "loglogistic"),
  title_trace = "Fit",
  ...
)
p_obj |
A plot object returned by plot_prob_mix. |
x |
A numeric vector containing the x-coordinates of the respective regression line. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
mix_output |
A list returned by mixmod_regression or mixmod_em, which consists of elements necessary to visualize the regression lines. |
distribution |
Supposed distribution of the random variable. |
title_trace |
A character string which is assigned to the legend trace. |
... |
Further arguments passed to or from other methods. Currently not used. |
This function adds one or multiple estimated regression lines to an existing probability plot (plot_prob). Depending on the output of the functions mixmod_regression or mixmod_em, one or multiple lines are plotted.
The name of the legend entry is a combination of the title_trace
and the
number of determined subgroups. If title_trace = "Line"
and the data has
been split in two groups, the legend entries would be "Line: 1"
and
"Line: 2"
.
A plot object containing the probability plot with plotting positions and estimated regression line(s).
Doganaksoy, N.; Hahn, G.; Meeker, W. Q., Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002
# Vectors:
hours <- voltage$hours
status <- voltage$status

# Example 1 - Using result of mixmod_em in mix_output:
mix_mod_em <- mixmod_em(x = hours, status = status, distribution = "weibull", conf_level = 0.95, k = 2, method = "EM", n_iter = 150)
plot_weibull_em <- plot_prob_mix(x = hours, status = status, id = id, distribution = "weibull", mix_output = mix_mod_em)
plot_weibull_emlines <- plot_mod_mix(p_obj = plot_weibull_em, x = hours, status = status, mix_output = mix_mod_em, distribution = "weibull")

# Example 2 - Using result of mixmod_regression in mix_output:
john <- johnson_method(x = hours, status = status)
mix_mod_reg <- mixmod_regression(x = john$x, y = john$prob, status = john$status, distribution = "weibull")
plot_weibull_reg <- plot_prob_mix(x = john$x, status = john$status, id = john$id, distribution = "weibull", mix_output = mix_mod_reg)
plot_weibull_reglines <- plot_mod_mix(p_obj = plot_weibull_reg, x = john$x, status = john$status, mix_output = mix_mod_reg, distribution = "weibull")
This function adds an estimated regression line to an existing probability plot (plot_prob).
## Default S3 method:
plot_mod(
  p_obj,
  x,
  dist_params,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal", "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential", "exponential2"),
  title_trace = "Fit",
  ...
)
p_obj |
A plot object returned by plot_prob. |
x |
A numeric vector containing the x-coordinates of the respective regression line. |
dist_params |
A (named) numeric vector of estimated location and scale
parameters for a specified distribution. The order of elements is important.
First entry needs to be the location parameter |
distribution |
Supposed distribution of the random variable. |
title_trace |
A character string which is assigned to the legend trace. |
... |
Further arguments passed to or from other methods. Currently not used. |
A plot object containing the probability plot with plotting positions and the estimated regression line.
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Vectors:
cycles <- alloy$cycles
status <- alloy$status

# Probability estimation
prob_tbl <- estimate_cdf(x = cycles, status = status, method = "johnson")

# Example 1: Probability Plot and Regression Line Three-Parameter-Weibull:
plot_weibull <- plot_prob(x = prob_tbl$x, y = prob_tbl$prob, status = prob_tbl$status, id = prob_tbl$id, distribution = "weibull")
rr <- rank_regression(x = prob_tbl$x, y = prob_tbl$prob, status = prob_tbl$status, distribution = "weibull3")
plot_reg_weibull <- plot_mod(p_obj = plot_weibull, x = prob_tbl$x, dist_params = rr$coefficients, distribution = "weibull3")

# Example 2: Probability Plot and Regression Line Three-Parameter-Lognormal:
plot_lognormal <- plot_prob(x = prob_tbl$x, y = prob_tbl$prob, status = prob_tbl$status, id = prob_tbl$id, distribution = "lognormal")
rr_ln <- rank_regression(x = prob_tbl$x, y = prob_tbl$prob, status = prob_tbl$status, distribution = "lognormal3")
plot_reg_lognormal <- plot_mod(p_obj = plot_lognormal, x = prob_tbl$x, dist_params = rr_ln$coefficients, distribution = "lognormal3")

## Mixture Identification
# Vectors:
hours <- voltage$hours
status <- voltage$status

# Probability estimation:
prob_mix <- estimate_cdf(x = hours, status = status, method = "johnson")
This function adds one (or multiple) linearized CDF(s) to an existing plot grid.
plot_pop(
  p_obj = NULL,
  x,
  dist_params_tbl,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal", "logistic", "exponential"),
  tol = 1e-06,
  title_trace = "Population",
  plot_method = c("plotly", "ggplot2")
)
p_obj |
A plot object to which the population line(s) is (are) added or
NULL. |
x |
A numeric vector of length two or greater used for the x coordinates
of the population line. If |
dist_params_tbl |
A |
distribution |
Supposed distribution of the random variable. The distinction
between a threshold distribution and the respective standard variant is made with
|
tol |
The failure probability is restricted to the interval
|
title_trace |
A character string which is assigned to the legend trace. |
plot_method |
Package, which is used for generating the plot output. Only
used when |
dist_params_tbl
is a data.frame
with parameter columns. An overview of the
distribution-specific parameters and their order can be found in section
'Distributions'.
If only one population line should be displayed, a numeric vector is also supported. The order of the vector elements also corresponds to the table in section 'Distributions'.
A plot object containing the linearized CDF(s).
The following table summarizes the available distributions and their parameters: location parameter μ, scale parameter σ or θ and threshold parameter γ. The column order within dist_params_tbl is given in the table header.

distribution | dist_params_tbl[1] | dist_params_tbl[2] | dist_params_tbl[3] |
"sev" | μ | σ | - |
"weibull" | μ | σ | (γ) |
"normal" | μ | σ | - |
"lognormal" | μ | σ | (γ) |
"logistic" | μ | σ | - |
"loglogistic" | μ | σ | (γ) |
"exponential" | θ | (γ) | - |
x <- rweibull(n = 100, shape = 1, scale = 20000)

# Example 1 - Two-parametric straight line:
pop_weibull <- plot_pop(p_obj = NULL, x = range(x), dist_params_tbl = c(log(20000), 1), distribution = "weibull")

# Example 2 - Three-parametric curved line:
x2 <- rweibull(n = 100, shape = 1, scale = 20000) + 5000
pop_weibull_2 <- plot_pop(p_obj = NULL, x = x2, dist_params_tbl = c(log(20000 - 5000), 1, 5000), distribution = "weibull")

# Example 3 - Multiple lines:
pop_weibull_3 <- plot_pop(
  p_obj = NULL,
  x = x,
  dist_params_tbl = data.frame(p_1 = c(log(20000), log(20000), log(20000)), p_2 = c(1, 1.5, 2)),
  distribution = "weibull",
  plot_method = "ggplot2"
)

# Example 4 - Compare two- and three-parametric distributions:
pop_weibull_4 <- plot_pop(
  p_obj = NULL,
  x = x,
  dist_params_tbl = data.frame(param_1 = c(log(20000), log(20000)), param_2 = c(1, 1), param_3 = c(NA, 2)),
  distribution = "weibull"
)
This function is used to apply the graphical technique of probability plotting. It is either applied to the output of estimate_cdf (plot_prob.wt_cdf_estimation) or to the output of a mixture model from mixmod_regression / mixmod_em (plot_prob.wt_model). Note that in the latter case no distribution has to be specified because it is inferred from the model.
plot_prob(x, ...)

## S3 method for class 'wt_cdf_estimation'
plot_prob(
  x,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal",
    "logistic", "exponential"),
  title_main = "Probability Plot",
  title_x = "Characteristic",
  title_y = "Unreliability",
  title_trace = "Sample",
  plot_method = c("plotly", "ggplot2"),
  ...
)

## S3 method for class 'wt_model'
plot_prob(
  x,
  title_main = "Probability Plot",
  title_x = "Characteristic",
  title_y = "Unreliability",
  title_trace = "Sample",
  plot_method = c("plotly", "ggplot2"),
  ...
)
x |
A tibble with class wt_cdf_estimation returned by estimate_cdf or a model of class wt_model returned by mixmod_regression / mixmod_em. |
... |
Further arguments passed to or from other methods. Currently not used. |
distribution |
Supposed distribution of the random variable. |
title_main |
A character string which is assigned to the main title. |
title_x |
A character string which is assigned to the title of the x axis. |
title_y |
A character string which is assigned to the title of the y axis. |
title_trace |
A character string which is assigned to the legend trace. |
plot_method |
Package used for generating the plot output. |
If x was split by mixmod_em, estimate_cdf with method "johnson" is applied to the subgroup-specific data. The calculated plotting positions are shaped according to the split determined by mixmod_em.
In mixmod_regression a maximum of three subgroups can be determined and thus plotted. The intention of this function is to give the user a hint for the existence of a mixture model. An in-depth analysis should be done afterwards.
For plot_method == "plotly" the marker labels for x and y are determined by the first word provided in the arguments title_x and title_y respectively, i.e. if title_x = "Mileage in km" the x label of the marker is "Mileage".
The name of the legend entry is a combination of title_trace and the number of determined subgroups (if any). If title_trace = "Group" and the data has been split into two groups, the legend entries are "Group: 1" and "Group: 2".
A plot object containing the probability plot.
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Reliability data:
data <- reliability_data(
  alloy,
  x = cycles,
  status = status
)

# Probability estimation:
prob_tbl <- estimate_cdf(
  data,
  methods = c("johnson", "kaplan")
)

# Example 1 - Probability Plot Weibull:
plot_weibull <- plot_prob(prob_tbl)

# Example 2 - Probability Plot Lognormal:
plot_lognormal <- plot_prob(
  x = prob_tbl,
  distribution = "lognormal"
)

## Mixture identification

# Reliability data:
data_mix <- reliability_data(
  voltage,
  x = hours,
  status = status
)

prob_mix <- estimate_cdf(
  data_mix,
  methods = c("johnson", "kaplan")
)

# Example 3 - Mixture identification using mixmod_regression:
mix_mod_rr <- mixmod_regression(prob_mix)

plot_mix_mod_rr <- plot_prob(x = mix_mod_rr)

# Example 4 - Mixture identification using mixmod_em:
mix_mod_em <- mixmod_em(data_mix)

plot_mix_mod_em <- plot_prob(x = mix_mod_em)
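Building on the marker-label and legend behaviour described in the details above, the following sketch shows how the title arguments could be customized. It reuses prob_tbl from the examples above; the argument names are taken from the usage section, while the chosen strings are purely illustrative.

# Sketch: customizing axis titles and the legend trace (strings are
# illustrative; reuses prob_tbl from the examples above).
plot_custom <- plot_prob(
  x = prob_tbl,
  distribution = "weibull",
  title_x = "Cycles in thousands", # plotly marker label becomes "Cycles"
  title_y = "Unreliability in %",
  title_trace = "Alloy T7987",
  plot_method = "plotly"
)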
plot_prob_mix() is no longer under active development; switching to plot_prob is recommended.
plot_prob_mix(
  x,
  status,
  id = rep("XXXXXX", length(x)),
  distribution = c("weibull", "lognormal", "loglogistic"),
  mix_output,
  title_main = "Probability Plot",
  title_x = "Characteristic",
  title_y = "Unreliability",
  title_trace = "Sample",
  plot_method = c("plotly", "ggplot2"),
  ...
)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
id |
Identification for every unit. |
distribution |
Supposed distribution of the random variable. |
mix_output |
A list returned by mixmod_regression or mixmod_em, which consists of values necessary to visualize the subgroups. |
title_main |
A character string which is assigned to the main title. |
title_x |
A character string which is assigned to the title of the x axis. |
title_y |
A character string which is assigned to the title of the y axis. |
title_trace |
A character string which is assigned to the legend trace. |
plot_method |
Package used for generating the plot output. |
... |
Further arguments passed to or from other methods. Currently not used. |
This function is used to apply the graphical technique of probability plotting to univariate mixture models that have been separated with functions mixmod_regression or mixmod_em.
If the data has been split by mixmod_em, the function johnson_method is applied to the subgroup-specific data. The calculated plotting positions are shaped according to the split obtained from the respective splitting function.
In mixmod_regression a maximum of three subgroups can be determined and thus plotted. The intention of this function is to give the user a hint for the existence of a mixture model. An in-depth analysis should be done afterwards.
The marker labels for x and y are determined by the first word provided in the arguments title_x and title_y respectively, i.e. if title_x = "Mileage in km" the x label of the marker is "Mileage".
The name of the legend entry is a combination of title_trace and the number of determined subgroups (if any). If title_trace = "Group" and the data has been split into two groups, the legend entries are "Group: 1" and "Group: 2".
Doganaksoy, N.; Hahn, G.; Meeker, W. Q., Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002
# Vectors:
hours <- voltage$hours
status <- voltage$status

# Example 1 - Using result of mixmod_em:
mix_mod_em <- mixmod_em(
  x = hours,
  status = status
)

plot_weibull_em <- plot_prob_mix(
  x = hours,
  status = status,
  distribution = "weibull",
  mix_output = mix_mod_em
)

# Example 2 - Using result of mixmod_regression:
john <- estimate_cdf(
  x = hours,
  status = status,
  method = "johnson"
)

mix_mod_reg <- mixmod_regression(
  x = john$x,
  y = john$prob,
  status = john$status,
  distribution = "weibull"
)

plot_weibull_reg <- plot_prob_mix(
  x = hours,
  status = status,
  distribution = "weibull",
  mix_output = mix_mod_reg
)
This function is used to apply the graphical technique of probability plotting.
## Default S3 method:
plot_prob(
  x,
  y,
  status,
  id = rep("XXXXXX", length(x)),
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal",
    "logistic", "exponential"),
  title_main = "Probability Plot",
  title_x = "Characteristic",
  title_y = "Unreliability",
  title_trace = "Sample",
  plot_method = c("plotly", "ggplot2"),
  ...
)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
y |
A numeric vector which consists of estimated failure probabilities regarding the lifetime data in x. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
id |
Identification for every unit. |
distribution |
Supposed distribution of the random variable. |
title_main |
A character string which is assigned to the main title. |
title_x |
A character string which is assigned to the title of the x axis. |
title_y |
A character string which is assigned to the title of the y axis. |
title_trace |
A character string which is assigned to the legend trace. |
plot_method |
Package used for generating the plot output. |
... |
Further arguments passed to or from other methods. Currently not used. |
For plot_method == "plotly" the marker labels for x and y are determined by the first word provided in the arguments title_x and title_y respectively, i.e. if title_x = "Mileage in km" the x label of the marker is "Mileage".
A plot object containing the probability plot.
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Vectors:
cycles <- alloy$cycles
status <- alloy$status

# Probability estimation:
prob_tbl <- estimate_cdf(
  x = cycles,
  status = status,
  method = "johnson"
)

# Example 1: Probability Plot Weibull:
plot_weibull <- plot_prob(
  x = prob_tbl$x,
  y = prob_tbl$prob,
  status = prob_tbl$status,
  id = prob_tbl$id
)

# Example 2: Probability Plot Lognormal:
plot_lognormal <- plot_prob(
  x = prob_tbl$x,
  y = prob_tbl$prob,
  status = prob_tbl$status,
  id = prob_tbl$id,
  distribution = "lognormal"
)
This function predicts the (failure) probabilities of a parametric lifetime distribution using the (log-)location-scale parameterization.
predict_prob(
  q,
  dist_params,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal",
    "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential",
    "exponential2")
)
q |
A numeric vector of quantiles. |
dist_params |
A vector of parameters. An overview of the distribution-specific parameters can be found in section 'Distributions'. |
distribution |
Supposed distribution of the random variable. |
For a given set of parameters and specified quantiles the probabilities of the chosen model are determined.
A vector with predicted (failure) probabilities.
The following table summarizes the available distributions and their parameters: location parameter μ, scale parameter σ or θ, and threshold parameter γ. The order within dist_params is given in the table header.
distribution | dist_params[1] | dist_params[2] | dist_params[3]
"sev" | μ | σ | -
"weibull" | μ | σ | -
"weibull3" | μ | σ | γ
"normal" | μ | σ | -
"lognormal" | μ | σ | -
"lognormal3" | μ | σ | γ
"logistic" | μ | σ | -
"loglogistic" | μ | σ | -
"loglogistic3" | μ | σ | γ
"exponential" | θ | - | -
"exponential2" | θ | γ | -
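As a plausibility check, the (log-)location-scale parameters of the "weibull" case can be related to the shape/scale parameterization of stats::pweibull. The sketch below assumes the standard relationship shape = 1 / σ and scale = exp(μ); this mapping is an assumption for illustration, not quoted from the package documentation.

# Sketch: relate the (log-)location-scale parameters (mu, sigma) of
# distribution = "weibull" to the stats parameterization. The relationship
# shape = 1 / sigma and scale = exp(mu) is assumed here for illustration.
mu <- 5
sigma <- 0.5
q <- c(15, 48, 124)

p_wt <- predict_prob(
  q = q,
  dist_params = c(mu, sigma),
  distribution = "weibull"
)

p_stats <- stats::pweibull(q, shape = 1 / sigma, scale = exp(mu))

all.equal(p_wt, p_stats) # expected to be TRUE under the assumed relationship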
# Example 1 - Predicted probabilities for a two-parameter weibull distribution:
probs_weib2 <- predict_prob(
  q = c(15, 48, 124),
  dist_params = c(5, 0.5),
  distribution = "weibull"
)

# Example 2 - Predicted probabilities for a three-parameter weibull distribution:
probs_weib3 <- predict_prob(
  q = c(25, 58, 134),
  dist_params = c(5, 0.5, 10),
  distribution = "weibull3"
)
This function predicts the quantiles of a parametric lifetime distribution using the (log-)location-scale parameterization.
predict_quantile(
  p,
  dist_params,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal",
    "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential",
    "exponential2")
)
p |
A numeric vector of probabilities. |
dist_params |
A vector of parameters. An overview of the distribution-specific parameters can be found in section 'Distributions'. |
distribution |
Supposed distribution of the random variable. |
For a given set of parameters and specified probabilities the quantiles of the chosen model are determined.
A vector with predicted quantiles.
The following table summarizes the available distributions and their parameters: location parameter μ, scale parameter σ or θ, and threshold parameter γ. The order within dist_params is given in the table header.
distribution | dist_params[1] | dist_params[2] | dist_params[3]
"sev" | μ | σ | -
"weibull" | μ | σ | -
"weibull3" | μ | σ | γ
"normal" | μ | σ | -
"lognormal" | μ | σ | -
"lognormal3" | μ | σ | γ
"logistic" | μ | σ | -
"loglogistic" | μ | σ | -
"loglogistic3" | μ | σ | γ
"exponential" | θ | - | -
"exponential2" | θ | γ | -
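Since predict_quantile and predict_prob use the same parameterization, a simple round trip can serve as a plausibility check; the following sketch is illustrative and not part of the original examples.

# Sketch: predict_quantile should invert predict_prob for a given parameter
# set; the round trip below is a plausibility check only.
dist_params <- c(5, 0.5)
p <- c(0.01, 0.1, 0.5)

q <- predict_quantile(
  p = p,
  dist_params = dist_params,
  distribution = "weibull"
)

p_back <- predict_prob(
  q = q,
  dist_params = dist_params,
  distribution = "weibull"
)

all.equal(p_back, p) # expected to be TRUE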
# Example 1 - Predicted quantiles for a two-parameter weibull distribution:
quants_weib2 <- predict_quantile(
  p = c(0.01, 0.1, 0.5),
  dist_params = c(5, 0.5),
  distribution = "weibull"
)

# Example 2 - Predicted quantiles for a three-parameter weibull distribution:
quants_weib3 <- predict_quantile(
  p = c(0.01, 0.1, 0.5),
  dist_params = c(5, 0.5, 10),
  distribution = "weibull3"
)
This function evaluates the coefficient of determination with respect to a given threshold parameter of a parametric lifetime distribution. In the context of rank regression, it can be optimized (see optim) to estimate the threshold parameter.
r_squared_profiling(x, ...)

## S3 method for class 'wt_cdf_estimation'
r_squared_profiling(
  x,
  thres,
  distribution = c("weibull3", "lognormal3", "loglogistic3", "exponential2"),
  direction = c("x_on_y", "y_on_x"),
  ...
)
x |
A tibble with class wt_cdf_estimation returned by estimate_cdf. |
... |
Further arguments passed to or from other methods. Currently not used. |
thres |
A numeric value for the threshold parameter. |
distribution |
Supposed parametric distribution of the random variable. |
direction |
Direction of the dependence in the regression model. |
Returns the coefficient of determination with respect to the threshold parameter thres.
# Data:
data <- reliability_data(
  alloy,
  x = cycles,
  status = status
)

# Probability estimation:
prob_tbl <- estimate_cdf(
  data,
  methods = "johnson"
)

# Determining the optimal coefficient of determination:
## Range of threshold parameter must be smaller than the first failure:
threshold <- seq(
  0,
  min(
    dplyr::pull(
      dplyr::filter(
        prob_tbl,
        status == 1,
        x == min(x)
      ),
      x
    ) - 0.1
  ),
  length.out = 100
)

## Coefficient of determination with respect to threshold values:
profile_r2 <- r_squared_profiling(
  x = dplyr::filter(
    prob_tbl,
    status == 1
  ),
  thres = threshold,
  distribution = "weibull3"
)

## Threshold value (among the candidates) that maximizes the
## coefficient of determination:
threshold[which.max(profile_r2)]

## plot:
plot(
  threshold,
  profile_r2,
  type = "l"
)
abline(
  v = threshold[which.max(profile_r2)],
  h = max(profile_r2),
  col = "red"
)
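The grid search above can also be replaced by a one-dimensional optimizer, which is the optimization idea mentioned in the description. The following sketch uses stats::optimize on the same objects created in the example above and is meant as an illustration only.

# Sketch: continuous maximization of the profile R^2 with stats::optimize
# instead of a grid search (illustrative; reuses prob_tbl from the example
# above).
failures <- dplyr::filter(prob_tbl, status == 1)

opt <- stats::optimize(
  f = function(thres) {
    r_squared_profiling(
      x = failures,
      thres = thres,
      distribution = "weibull3"
    )
  },
  interval = c(0, min(failures$x) - 0.1),
  maximum = TRUE
)

opt$maximum   # estimated threshold parameter
opt$objective # corresponding coefficient of determination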
This function evaluates the coefficient of determination with respect to a given threshold parameter of a parametric lifetime distribution. In the context of rank regression, it can be optimized (see optim) to estimate the threshold parameter.
## Default S3 method:
r_squared_profiling(
  x,
  y,
  thres,
  distribution = c("weibull3", "lognormal3", "loglogistic3", "exponential2"),
  direction = c("x_on_y", "y_on_x"),
  ...
)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
y |
A numeric vector which consists of estimated failure probabilities regarding the lifetime data in x. |
thres |
A numeric value for the threshold parameter. |
distribution |
Supposed parametric distribution of the random variable. |
direction |
Direction of the dependence in the regression model. |
... |
Further arguments passed to or from other methods. Currently not used. |
Returns the coefficient of determination with respect to the threshold parameter thres.
# Vectors:
cycles <- alloy$cycles
status <- alloy$status

# Probability estimation:
prob_tbl <- estimate_cdf(
  x = cycles,
  status = status,
  method = "johnson"
)

# Determining the optimal coefficient of determination:
## Range of threshold parameter must be smaller than the first failure:
threshold <- seq(
  0,
  min(cycles[status == 1]) - 0.1,
  length.out = 100
)

## Coefficient of determination with respect to threshold values:
profile_r2 <- r_squared_profiling(
  x = prob_tbl$x[prob_tbl$status == 1],
  y = prob_tbl$prob[prob_tbl$status == 1],
  thres = threshold,
  distribution = "weibull3"
)

## Threshold value (among the candidates) that maximizes the
## coefficient of determination:
threshold[which.max(profile_r2)]

## plot:
plot(
  threshold,
  profile_r2,
  type = "l"
)
abline(
  v = threshold[which.max(profile_r2)],
  h = max(profile_r2),
  col = "red"
)
This function fits a regression model to a linearized parametric lifetime distribution for complete and (multiple) right-censored data. The parameters are determined in the frequently used (log-)location-scale parameterization.
For the Weibull, estimates are additionally transformed such that they are in line with the parameterization provided by the stats package (see Weibull).
rank_regression(x, ...)

## S3 method for class 'wt_cdf_estimation'
rank_regression(
  x,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal",
    "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential",
    "exponential2"),
  conf_level = 0.95,
  direction = c("x_on_y", "y_on_x"),
  control = list(),
  options = list(),
  ...
)
x |
A tibble with class wt_cdf_estimation returned by estimate_cdf. |
... |
Further arguments passed to or from other methods. Currently not used. |
distribution |
Supposed distribution of the random variable. |
conf_level |
Confidence level of the interval. |
direction |
Direction of the dependence in the regression model. |
control |
A list of control parameters (see optim). |
options |
A list of named options. See 'Options'. |
The confidence intervals of the parameters are computed on the basis of a heteroscedasticity-consistent (HC) covariance matrix. Note, however, that there is no statistical foundation for determining the standard errors of the parameters by least squares in the context of rank regression. For a statistically accepted method, use maximum likelihood.
If options = list(conf_method = "Mock"), the argument distribution must be one of "weibull" and "weibull3". The approximated confidence intervals for the Weibull parameters can then only be estimated on the following confidence levels (see 'References' (Mock, 1995)):
conf_level = 0.90
conf_level = 0.95
conf_level = 0.99
A list with classes wt_model, wt_rank_regression and wt_model_estimation which contains:
coefficients: A named vector of estimated coefficients (parameters of the assumed distribution). Note: The parameters are given in the (log-)location-scale parameterization.
confint: Confidence intervals for the (log-)location-scale parameters. For threshold distributions no confidence interval for the threshold parameter can be computed. If direction = "y_on_x", back-transformed confidence intervals are provided.
shape_scale_coefficients: Only included if distribution is "weibull" or "weibull3" (parameterization used in Weibull).
shape_scale_confint: Only included if distribution is "weibull" or "weibull3". Approximated confidence intervals for scale and shape (and threshold if distribution = "weibull3").
varcov: Only provided if options = list(conf_method = "HC") (default). Estimated heteroscedasticity-consistent (HC) variance-covariance matrix for the (log-)location-scale parameters.
r_squared: Coefficient of determination.
data: A tibble with class wt_cdf_estimation returned by estimate_cdf.
distribution: Specified distribution.
direction: Specified direction.
If more than one method was specified in estimate_cdf, the resulting output is a list with class wt_model_estimation_list. In this case, each list element has classes wt_rank_regression and wt_model_estimation, and the items listed above are included.
Argument options is a named list of options:
Name | Value
conf_method | "HC" (default) or "Mock"
Mock, R., Methoden zur Datenhandhabung in Zuverlässigkeitsanalysen, vdf Hochschulverlag AG an der ETH Zürich, 1995
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Reliability data preparation:
## Data for two-parametric model:
data_2p <- reliability_data(
  shock,
  x = distance,
  status = status
)

## Data for three-parametric model:
data_3p <- reliability_data(
  alloy,
  x = cycles,
  status = status
)

# Probability estimation:
prob_tbl_2p <- estimate_cdf(
  data_2p,
  methods = "johnson"
)

prob_tbl_3p <- estimate_cdf(
  data_3p,
  methods = "johnson"
)

prob_tbl_mult <- estimate_cdf(
  data_3p,
  methods = c("johnson", "kaplan")
)

# Example 1 - Fitting a two-parametric weibull distribution:
rr_2p <- rank_regression(
  x = prob_tbl_2p,
  distribution = "weibull"
)

# Example 2 - Fitting a three-parametric lognormal distribution:
rr_3p <- rank_regression(
  x = prob_tbl_3p,
  distribution = "lognormal3",
  conf_level = 0.99
)

# Example 3 - Fitting a three-parametric lognormal distribution using
# direction and control arguments:
rr_3p_control <- rank_regression(
  x = prob_tbl_3p,
  distribution = "lognormal3",
  conf_level = 0.99,
  direction = "y_on_x",
  control = list(trace = TRUE, REPORT = 1)
)

# Example 4 - Fitting a three-parametric loglogistic distribution if multiple
# methods in estimate_cdf were specified:
rr_lists <- rank_regression(
  x = prob_tbl_mult,
  distribution = "loglogistic3",
  conf_level = 0.90
)
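The 'Options' section above also allows Mock's approximated confidence bounds. The following sketch combines that option with prob_tbl_2p from the examples above; the restriction to a Weibull distribution and the listed confidence levels is taken from the details, while the concrete call is illustrative only.

# Sketch: Mock's approximated confidence bounds (illustrative call; reuses
# prob_tbl_2p from the examples above). Per the details, this requires a
# Weibull distribution and one of the listed confidence levels.
rr_2p_mock <- rank_regression(
  x = prob_tbl_2p,
  distribution = "weibull",
  conf_level = 0.90,
  options = list(conf_method = "Mock")
)

rr_2p_mock$shape_scale_confint # Weibull shape/scale intervals (Mock, 1995)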
This function fits a regression model to a linearized parametric lifetime distribution for complete and (multiple) right-censored data. The parameters are determined in the frequently used (log-)location-scale parameterization.
For the Weibull, estimates are additionally transformed such that they are in line with the parameterization provided by the stats package (see Weibull).
## Default S3 method:
rank_regression(
  x,
  y,
  status,
  distribution = c("weibull", "lognormal", "loglogistic", "sev", "normal",
    "logistic", "weibull3", "lognormal3", "loglogistic3", "exponential",
    "exponential2"),
  conf_level = 0.95,
  direction = c("x_on_y", "y_on_x"),
  control = list(),
  options = list(),
  ...
)
x |
A numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
y |
A numeric vector which consists of estimated failure probabilities regarding the lifetime data in x. |
status |
A vector of binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
distribution |
Supposed distribution of the random variable. |
conf_level |
Confidence level of the interval. |
direction |
Direction of the dependence in the regression model. |
control |
A list of control parameters (see optim). |
options |
A list of named options. See 'Options'. |
... |
Further arguments passed to or from other methods. Currently not used. |
The confidence intervals of the parameters are computed on the basis of a heteroscedasticity-consistent (HC) covariance matrix. Note, however, that there is no statistical foundation for determining the standard errors of the parameters by least squares in the context of rank regression. For a statistically accepted method, use maximum likelihood.
If options = list(conf_method = "Mock"), the argument distribution must be one of "weibull" and "weibull3". The approximated confidence intervals for the Weibull parameters can then only be estimated on the following confidence levels (see 'References' (Mock, 1995)):
conf_level = 0.90
conf_level = 0.95
conf_level = 0.99
A list with classes wt_model, wt_rank_regression and wt_model_estimation which contains:
coefficients: A named vector of estimated coefficients (parameters of the assumed distribution). Note: The parameters are given in the (log-)location-scale parameterization.
confint: Confidence intervals for the (log-)location-scale parameters. For threshold distributions no confidence interval for the threshold parameter can be computed. If direction = "y_on_x", back-transformed confidence intervals are provided.
shape_scale_coefficients: Only included if distribution is "weibull" or "weibull3" (parameterization used in Weibull).
shape_scale_confint: Only included if distribution is "weibull" or "weibull3". Approximated confidence intervals for scale and shape (and threshold if distribution = "weibull3").
varcov: Only provided if options = list(conf_method = "HC") (default). Estimated heteroscedasticity-consistent (HC) variance-covariance matrix for the (log-)location-scale parameters.
r_squared: Coefficient of determination.
data: A tibble with columns x, status and prob.
distribution: Specified distribution.
direction: Specified direction.
Argument options is a named list of options:
Name | Value
conf_method | "HC" (default) or "Mock"
Mock, R., Methoden zur Datenhandhabung in Zuverlässigkeitsanalysen, vdf Hochschulverlag AG an der ETH Zürich, 1995
Meeker, William Q; Escobar, Luis A., Statistical methods for reliability data, New York: Wiley series in probability and statistics, 1998
# Vectors:
obs <- seq(10000, 100000, 10000)
status_1 <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0)

cycles <- alloy$cycles
status_2 <- alloy$status

# Example 1 - Fitting a two-parametric weibull distribution:
tbl_john <- estimate_cdf(
  x = obs,
  status = status_1,
  method = "johnson"
)

rr <- rank_regression(
  x = tbl_john$x,
  y = tbl_john$prob,
  status = tbl_john$status,
  distribution = "weibull",
  conf_level = 0.90
)

# Example 2 - Fitting a three-parametric lognormal distribution:
tbl_kaplan <- estimate_cdf(
  x = cycles,
  status = status_2,
  method = "kaplan"
)

rr_2 <- rank_regression(
  x = tbl_kaplan$x,
  y = tbl_kaplan$prob,
  status = tbl_kaplan$status,
  distribution = "lognormal3"
)
Create consistent reliability data based on an existing data.frame (preferred) or on multiple equal length vectors.
reliability_data(data = NULL, x, status, id = NULL, .keep_all = FALSE)
data |
Either a data.frame (or tibble) or NULL. If NULL, x, status and id must be vectors containing the data. Otherwise x, status and id can be specified as column names or column positions of data. |
x |
Lifetime data, that means any characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. |
status |
Binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1). |
id |
Identification of every unit. |
.keep_all |
If TRUE, all remaining columns of data are preserved (see 'Value'). |
A tibble with class wt_reliability_data containing the following columns (if .keep_all = FALSE):
x: Lifetime characteristic.
status: Binary data (0 or 1) indicating whether a unit is a right censored observation (= 0) or a failure (= 1).
id: Identification for every unit.
If .keep_all = TRUE, the remaining columns of data are also preserved.
If !is.null(data), the attribute characteristic holds the name of the characteristic described by x. Otherwise it is set to "x".
# Example 1 - Based on an existing data.frame/tibble and column names:
data <- reliability_data(
  data = shock,
  x = distance,
  status = status
)

# Example 2 - Based on an existing data.frame/tibble and column positions:
data_2 <- reliability_data(
  data = shock,
  x = 1,
  status = 3
)

# Example 3 - Keep all variables of the tibble/data.frame entered to argument data:
data_3 <- reliability_data(
  data = shock,
  x = distance,
  status = status,
  .keep_all = TRUE
)

# Example 4 - Based on vectors:
cycles <- alloy$cycles
state <- alloy$status
id <- "XXXXXX"

data_4 <- reliability_data(
  x = cycles,
  status = state,
  id = id
)
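As a small follow-up to the 'Value' section above, the characteristic attribute can be inspected on the objects created in the examples; the expected results in the comments are assumptions based on the description, not quoted output.

# Sketch: inspecting the "characteristic" attribute mentioned in the value
# section (expected results are assumptions based on the description).
attr(data, "characteristic")   # should hold the column name, e.g. "distance"
attr(data_4, "characteristic") # should be "x", since vectors were supplied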
Distance to failure for 38 vehicle shock absorbers.
shock
A tibble with 38 rows and 3 variables:
distance: Observed distance.
failure_mode: One of two failure modes (mode_1 and mode_2) or censored if no failure occurred.
status: If failure_mode is either mode_1 or mode_2 this is 1, else 0.
Meeker, William Q; Escobar, Luis A., Statistical Methods for Reliability Data, New York: Wiley series in probability and statistics (1998, p.630)
A sample of 58 segments of bars were subjected to a high voltage stress test. Two failure modes occurred, Mode D (degradation failure) and Mode E (early failure).
voltage
A tibble with 58 rows and 3 variables:
hours: Observed hours.
failure_mode: One of two failure modes (D and E) or censored if no failure occurred.
status: If failure_mode is either D or E this is 1, else 0.
Doganaksoy, N.; Hahn, G.; Meeker, W. Q., Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002