Article Type : Review Article
Authors : Sheng Pin Kuan
Keywords : Control chart; Sampling Acceptance; Orthogonal experiments; TQM
In reviewing the historical contributions of the quality gurus, we
should look at the implications of their contributions rather than merely listen to
their stories. These historical contributions are what Liu Yuan-Zhang, the father of
Chinese quality, described as follows: “The Three Old Pieces of TQM are the Control Chart, the
Sampling Acceptance table, and the Orthogonal Experiments. They all have their
own mathematical theory and successful applications; in the 1950s and 1960s,
most quality practitioners could understand their usage. When I came back
to China from the U.S. and Japan in 1955, I brought just these three pieces with me.” I always
keep Liu Yuan-Zhang’s saying in mind, because these “Three Old Pieces” are
the basic capabilities that allowed me to accomplish something in the early stage of my
quality career, so their theoretical basis and practical application should be
understood deeply. On the other hand, in solving quality problems, statistical
methods and statistical thinking are among the important tools, but they also need to be
integrated with the methodologies of the essence of substance, the process of business, and
psychology. After years of professional integration across different fields,
quality management has gradually formed a multi-value organizational teamwork mode
built on continuous improvement, leadership, management by objectives, full
participation, a common language, problem solving, and response to change. It is
the integration of statistical science, management science, engineering
science, system science, information science, psychology, etc., in order to
improve the quality of human life. Therefore, to understand the philosophies,
systems, methodologies, techniques and tools of quality management, we must
learn extensively from different fields of knowledge; we must also thoroughly
understand the technical methods suited to various purposes and needs.
Quality management
originated from applying the principles of statistics to establish control
charts and acceptance sampling tables, so that on-site personnel who do
not know the theory well could judge the stability of a process and the
acceptance of an inspection lot. Quality control tools of this kind are based on
statistical theory; if they are not associated with practical work, they
lose their value in application. Speaking of the Control Chart, we cannot but
mention Shewhart’s masterpiece, “Economic Control of Quality of
Manufactured Product,” which the ASQ website describes as follows [1]. When
W.A. Shewhart (the father of modern quality control) described his book as “an
indication of the direction in which future developments may be expected to take
place,” could he have foreseen its enormous impact? This monumental work laid
the foundation for the modern quality control discipline, and it remains as current today as ever. It began as an
attempt to develop a scientific basis for attaining economic control of quality
through the establishment of control limits to indicate when quality is varying
more than is economically desirable. In his search for better knowledge of
economy in manufacture, Shewhart touches upon all aspects of statistical
quality control. The book includes a presentation of the fundamental concepts and
advantages of statistical control; ways of expressing quality of product (a
section containing what has been described as the best explanation on the meaning of statistical distribution ever written); the basis for
specification of quality control; sampling fluctuations in quality; allowable
variability in quality (which contains the first fully developed use of control
charts); and quality control in practice. This is required reading for every
serious student of quality control. Sampling acceptance uses a sampling
process to judge whether inspection lots from a supplier or production line should be accepted or
not. How to make this judgement reasonably is the basic problem of sampling
acceptance. In other words, what is the sampling plan: what is the sample size n,
and what is the acceptance number Ac? Sampling plans are designed according to the different
risks to be protected against, the different dispositions of the inspection lot, and the different
occasions of use, and they are divided into standard, rectifying, adjustment,
and continuous-production sampling. These methodologies are developed on the basis of
mathematical theory. Any method of judging the truth through random
sampling will inevitably carry the risk of misjudgement. The reason why
statistical methods are widely used in all fields of science and have become
universally recognized is that the probability of misjudgement can be calculated,
allowing the user to assess the risk of misjudgement. Industry
often uses the producer's risk α and the consumer's risk β to ensure the quality of
the judgement of truth. This way of expressing risk by probability is hard
for most people to understand, so it is important to cultivate our own
statistical thinking, which is knowledge that modern citizens should
have.
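As a minimal sketch of how the two risks follow from a sampling plan (n, Ac), the acceptance probability can be computed from the binomial distribution. The plan n = 50, Ac = 1 and the two quality levels (1 % and 10 % defective) are illustrative assumptions, not values from the article, and the binomial model assumes lots that are large relative to the sample.

```python
from math import comb


def acceptance_probability(n: int, ac: int, p: float) -> float:
    """P(accept lot) = P(defectives in the sample <= Ac) under a binomial model."""
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(ac + 1))


# Illustrative plan and quality levels (assumed values, not from the article).
n, ac = 50, 1
good_quality = 0.01   # fraction defective of a lot that should be accepted
bad_quality = 0.10    # fraction defective of a lot that should be rejected

alpha = 1 - acceptance_probability(n, ac, good_quality)  # producer's risk
beta = acceptance_probability(n, ac, bad_quality)        # consumer's risk
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Evaluating `acceptance_probability` over a range of p values traces the operating characteristic curve of the plan, which is one way to compare candidate values of n and Ac.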
In a lottery, six numbers are drawn at random from 01, 02,
03 … 47, 48, and 49, and the prize depends on how many of the six numbers the
player guesses correctly. The Excel hypergeometric
distribution function HYPGEOMDIST() gives the following
probabilities:
The probability of guessing all 6 numbers correctly = 0.00000007
The probability of guessing 5 numbers correctly = 0.0000184
The probability of guessing 4 numbers correctly = 0.000969
The probability of guessing 3 numbers correctly = 0.01765
The probability of guessing 2 numbers correctly = 0.13238
The probability of guessing 1 number correctly = 0.41302
The probability of guessing no numbers correctly = 0.43596
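The same hypergeometric probabilities can be reproduced outside Excel. The sketch below uses only Python's standard library; treating three or more correct numbers as a prize is an assumption, chosen because it reproduces the winning probability of 0.0186 quoted in the text.

```python
from math import comb


def match_probability(k: int, drawn: int = 6, pool: int = 49) -> float:
    """Hypergeometric probability of matching exactly k of the drawn numbers."""
    return comb(drawn, k) * comb(pool - drawn, drawn - k) / comb(pool, drawn)


probabilities = {k: match_probability(k) for k in range(6, -1, -1)}
for k, p in probabilities.items():
    print(f"{k} correct: {p:.8f}")

# Assumption: three or more correct numbers win a prize.
p_win = sum(p for k, p in probabilities.items() if k >= 3)
print(f"prize: {p_win:.4f}, no prize: {1 - p_win:.4f}")
```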
From the above probability calculation, the probability of winning a prize is 0.0186 and the probability of winning nothing is 0.9814; that is to say, about 9,814 of every 10,000 tickets win no prize. The chance of a jackpot is 0.00000007, so a jackpot becomes likely only when at least 10 million tickets are sold per draw. This is why the jackpot often accumulates over many draws. The observed frequency of an event agrees with its theoretically calculated probability only over a large number of observations; this is the so-called law of large numbers. For example, a new product may show no defective items in the pilot stage but frequent defective items during mass production, because only one or two hundred pieces are made during the pilot stage, whereas thousands or tens of thousands of pieces are made daily during mass production. This is also the reason why Cp / Cpk can be used to estimate the percent defective during mass production (Table 1).
Table 1: Cp / Cpk and percent of defective.
Sigma Level | Cp | k=Ca | CPU | CPL | Cpk | Percent of defective (ppm)
1.0 | 0.333 | 1.500 | -0.167 | 0.833 | -0.167 | 697,672.1
2.0 | 0.667 | 0.750 | 0.167 | 1.167 | 0.167 | 308,770.2
3.0 | 1.000 | 0.500 | 0.500 | 1.500 | 0.500 | 66,810.6
4.0 | 1.333 | 0.375 | 0.833 | 1.833 | 0.833 | 6,209.7
5.0 | 1.667 | 0.300 | 1.167 | 2.167 | 1.167 | 232.6
6.0 | 2.000 | 0.250 | 1.500 | 2.500 | 1.500 | 3.4
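The rows of Table 1 can be reproduced with a short calculation. The sketch below assumes a normally distributed characteristic, a tolerance equal to the sigma level times σ, and a process mean shifted 1.5σ from the target; the 1.5σ shift is not stated in the text but is inferred from the tabulated values.

```python
from math import erfc, sqrt


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))


def capability_row(sigma_level: float, shift: float = 1.5) -> dict:
    """Indices for a tolerance of sigma_level * sigma and a mean shifted by `shift` sigma."""
    cp = sigma_level / 3
    ca = shift / sigma_level            # k = Ca: mean offset relative to the tolerance
    cpu = (sigma_level - shift) / 3
    cpl = (sigma_level + shift) / 3
    cpk = min(cpu, cpl)
    # Nonconforming fraction: both tails beyond the specification limits.
    ppm = 1e6 * (upper_tail(sigma_level - shift) + upper_tail(sigma_level + shift))
    return {"Cp": cp, "Ca": ca, "CPU": cpu, "CPL": cpl, "Cpk": cpk, "ppm": ppm}


for level in (1, 2, 3, 4, 5, 6):
    row = capability_row(level)
    print(level, {name: round(value, 3) for name, value in row.items()})
```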
Table 2: Accuracy and Precision.
Quality Indicators | Specification | Distribution Parameter | Statistics | Process Capability | Control Chart | MSA | Statistical Inference
Accuracy | Target: m | μ | X̄ | Ca | X̄ | Bias | Unbiasedness
Precision | Tolerance: Δ | σ | s / R | Cp | s / R | %GRR | Efficiency
Every production, service, or management process contains a certain amount of variation due to the presence of many kinds of causes. Engineers and technicians usually check whether the products conform to the specifications to ensure that the individual products meet the requirements of the customers. By sampling repeatedly from the same process at different time intervals (for example, every hour) and measuring specific characteristics to observe the quality of products or services, we obtain a large amount of data; after organizing the data we obtain a distribution. For example, when we sample from a standard normal distribution N(0, 1²) with sample sizes n=10, 100, 1,000, 10,000, and 50,000, we see a distribution gradually take shape. At the same time, if the specification is 0 ± 3, we also see nonconforming products begin to appear as the sample size grows (Figure 1,2).
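The sampling experiment described above can be sketched as a small simulation. The seed and the helper name `count_nonconforming` are illustrative choices; the exact counts depend on the random numbers drawn, but the nonconforming fraction should approach 0.27 % as n grows.

```python
import random


def count_nonconforming(n: int, lsl: float = -3.0, usl: float = 3.0, seed: int = 1) -> int:
    """Draw n values from N(0, 1) and count those outside the specification."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if not lsl < rng.gauss(0.0, 1.0) < usl)


for n in (10, 100, 1_000, 10_000, 50_000):
    bad = count_nonconforming(n)
    print(f"n = {n:>6}: {bad} nonconforming ({bad / n:.4%})")
```

With small samples the count is usually zero, which mirrors the pilot-stage observation that defects only become visible once production volume is large.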
Figure 1: Distribution of characteristics.
Figure 2: Shewhart’s ways of expressing quality.
Shewhart expressed
the quality of a product statistically by its distribution, as shown in Figure
2. The distribution of a characteristic is depicted with the horizontal axis as
the size of the characteristic and the vertical axis as the likelihood of occurrence
of that size. Control charts grasp the variation of a process through the
location, dispersion and skewness of the distribution. In terms of product /
process quality characteristics, engineers and technicians usually compare the
variation of the process / product with the specification tolerance to ensure
that individual products meet the customer's requirements. Let the specification
be m±Δ, where m is the target value and Δ is the tolerance; USL=m+Δ is the upper specification limit and
LSL=m-Δ is the lower specification limit. If LSL<Y<USL, the product is
conforming; if Y<LSL or Y>USL, the product is nonconforming, as shown in
Figure 3. For example, if the specification of a steel pipe diameter is 30±0.1 mm, then
m=30 mm, Δ=0.1 mm, USL=30.1 mm, and LSL=29.9 mm. The right side of Figure 3
shows a check with a GO-NOGO gauge: if Y<30.1 the GO end passes, and if Y>29.9 the NOGO end does not pass,
indicating that the product is conforming (Figure 3).
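The conformity rule LSL < Y < USL is simple enough to state as code. This is a minimal sketch; the function name and the four test diameters are hypothetical, while the 30 ± 0.1 mm specification is the steel pipe example from the text.

```python
def is_conforming(y: float, target: float, tolerance: float) -> bool:
    """Return True when LSL < y < USL for the specification target ± tolerance."""
    lsl, usl = target - tolerance, target + tolerance
    return lsl < y < usl


# Steel pipe example from the text: 30 ± 0.1 mm.
for diameter in (29.85, 29.95, 30.05, 30.15):
    status = "conforming" if is_conforming(diameter, 30.0, 0.1) else "nonconforming"
    print(f"{diameter:.2f} mm -> {status}")
```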
When the process is in a state of control, the distribution of the process is fixed over time. Repeated sampling at different times can then be regarded as repeated sampling from the same distribution. What, then, is the sampling distribution of the sample statistics? As shown in Figure 4, a total of k subgroups are sampled, and the size of each subgroup is n (Figure 4).
Figure 3: Product characteristics and specification.
Figure 4: Sampling distribution of statistics.
The control charts are
plotted with statistics such as X̄, s, R, and other statistics, as shown in Figure
5. If the process has no special causes and only common causes,
then the distribution of the process is fixed, and repeated sampling at
different times can be regarded as repeated sampling from the same distribution.
Therefore, we can study the sampling distribution of the statistics under the
normal distribution to understand the variation pattern of the statistics when
sampling repeatedly from a process with only common causes (Figure 5).
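As an illustration of how such statistics turn into chart limits, the following sketch computes the center lines and 3-sigma limits of X̄ and R charts. The article does not give these formulas; the sketch uses the conventional control chart constants for subgroups of size 5 (A2 = 0.577, D3 = 0, D4 = 2.114), and the measurements are hypothetical.

```python
from statistics import mean

# Conventional control chart constants for subgroups of size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114


def xbar_r_limits(subgroups: list[list[float]]) -> dict:
    """(LCL, center line, UCL) for the X-bar chart and the R chart, subgroup size 5."""
    if any(len(group) != 5 for group in subgroups):
        raise ValueError("the constants above are only valid for subgroups of size 5")
    xbars = [mean(group) for group in subgroups]
    ranges = [max(group) - min(group) for group in subgroups]
    grand_mean, r_bar = mean(xbars), mean(ranges)
    return {
        "xbar": (grand_mean - A2 * r_bar, grand_mean, grand_mean + A2 * r_bar),
        "range": (D3 * r_bar, r_bar, D4 * r_bar),
    }


# Hypothetical measurements; only two subgroups to keep the example short.
limits = xbar_r_limits([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]])
print(limits)
```

In practice the limits would be estimated from a larger number of subgroups collected while the process is believed to be stable.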
In Part IV of his masterpiece,
“Sampling Fluctuation in Quality,” W.A. Shewhart used four chapters to explain the sampling
distributions of various statistics [2,3]. At the same time, he carried out many complicated
experiments to verify the correctness of the theory. We have
also run such experiments by computer simulation throughout our quality
control teaching career. These experiments are quite helpful for
establishing statistical thinking. Figure 6 shows the sampling distributions of the
average (X̄), standard deviation (s), and range (R) under the
normal distribution (Figure 6).
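A simulation of this kind can be sketched in a few lines. The subgroup count, subgroup size and seed below are arbitrary choices; the theoretical values in the comments (σ/√n, c4·σ and d2·σ with c4 ≈ 0.940 and d2 ≈ 2.326 for n = 5) are the standard results for normal data rather than figures taken from the article.

```python
import random
from statistics import mean, stdev


def simulate_subgroups(k: int = 10_000, n: int = 5, seed: int = 7) -> dict:
    """Sample k subgroups of size n from N(0, 1) and summarize X-bar, s and R."""
    rng = random.Random(seed)
    xbars, sds, ranges = [], [], []
    for _ in range(k):
        group = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbars.append(mean(group))
        sds.append(stdev(group))
        ranges.append(max(group) - min(group))
    return {
        "sd_of_xbar": stdev(xbars),   # theory: sigma / sqrt(n)
        "mean_of_s": mean(sds),       # theory: c4 * sigma
        "mean_of_R": mean(ranges),    # theory: d2 * sigma
    }


summary = simulate_subgroups()
print(summary)   # for n = 5, theory gives roughly 0.447, 0.940 and 2.326
```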
However, the control chart does not control a product / process characteristic directly. Its main purpose is to analyze or control process variation across different times, different lots, different machines, and so on. Process variation is not judged from the result of a single test or inspection, but from repeated observations of the input / output of the same process.
Figure 5: Statistics and control charts.
Figure 6: Sampling distribution of X̄, s, and R under normal distribution.
There are two ways to
judge variation. One is to determine the deviation |μ-m| between the mean value μ of
the repeated observation data and the target value m; the smaller the
deviation, the more accurate the process. The other is to compare the standard deviation σ
of the observation data with the tolerance Δ (σ=Δ/z, or Δ=zσ); the smaller σ
is, the more precise the process. Product / process quality indices are nothing more
than measures of accuracy and precision. The following definitions show
that the various quality indicators are all concerned with accuracy and precision
(Table 2).
Accuracy: when the same product is manufactured repeatedly, the
difference between the mean value μ and the target value m;
Precision: when the same product is manufactured repeatedly, the
consistency among the products, in other words, the standard deviation σ.
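To make the two measures concrete, the sketch below estimates an accuracy index Ca and a precision index Cp from sample data and combines them into Cpk. The formulas Ca = |X̄ − m|/Δ, Cp = Δ/(3s) and Cpk = Cp(1 − Ca) are assumptions that are consistent with the values in Table 1 but are not spelled out in the text, and the diameters are hypothetical.

```python
from statistics import mean, stdev


def capability_indices(data: list[float], target: float, tolerance: float) -> dict:
    """Estimate Ca (accuracy), Cp (precision) and Cpk from sample data."""
    x_bar, s = mean(data), stdev(data)
    ca = abs(x_bar - target) / tolerance      # deviation from target relative to tolerance
    cp = tolerance / (3 * s)                  # tolerance relative to 3 standard deviations
    return {"Ca": ca, "Cp": cp, "Cpk": cp * (1 - ca)}


# Hypothetical pipe diameters (mm) against a 30 ± 0.1 mm specification.
diameters = [29.98, 30.02, 30.00, 30.04, 29.96, 30.01, 30.03, 29.97, 30.00, 30.02]
indices = capability_indices(diameters, target=30.0, tolerance=0.1)
print({name: round(value, 3) for name, value in indices.items()})
```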
When reading a
statistical analysis report, most people have only a superficial understanding of
the p-value and of significance. However, most statistical analysis reports use
these concepts to interpret the meaning of the data. If you do not understand
them, you may misread the analysis report and make a
wrong decision. First, we define the so-called statistical hypothesis. In order to
judge whether to accept or reject a hypothesis, the statistician divides
statistical hypotheses into the Null Hypothesis, labelled H0, and the
Alternative Hypothesis, labelled H1. In practice, we often set the
null hypothesis H0 to the factual state that the verifier doubts,
which parallels the judicial principle of the presumption of innocence
[2]. In terms of problem orientation, there are many problems that need to be
verified: countless goods transactions take place in industry every
day; scientists and engineers in laboratories
often have ideas whose benefits they dare not confirm; the
production process must be monitored every day to see whether quality is stable;
and producers must guarantee quality before users use the products. If these
problems could be judged by intuition, we probably would not use statistical data to
judge them; the null hypothesis H0 is the truth that we cannot judge by
intuition. Therefore, the null hypothesis for a goods transaction is that
the transaction lot is a good lot; the null hypothesis for a researcher's new idea
is that its benefit is the same as that of the
control group; the null hypothesis for monitoring process quality is
that the process quality is stable; and the null hypothesis for guaranteeing
product quality is that the quality meets the standard. However, if these hypotheses
describing the truth are not quantified in statistical terms, we still cannot
compare them with the data, so the null hypothesis should be expressed in a
statistical language that is consistent with probability distribution
calculations, so that the test of the statistical hypothesis can be carried out. For
example, the facts described above are expressed in statistical language in Table 3.
The so-called alternative hypothesis is a statistical hypothesis that is
contrary to the null hypothesis (Table 3).
In practice, a test statistic is used to judge a statistical hypothesis; whether H0 is rejected is determined by the degree of difference between H0 and the observed test statistic. The larger the difference, the stronger the case for rejecting H0. To quantify this difference, we define the so-called significance level, and the probability, calculated under the H0 hypothesis, of obtaining a test statistic at least as extreme as the one observed is the so-called p-value. In general, the smaller this probability, the less we believe that H0 is true. This is the same way that we human beings think about the truth. When we observe the occurrence of events, a subjective spectrum forms in our mind, and this spectrum is the null hypothesis H0. After data collection and analysis, if the results are very different from our subjective spectrum, we often say that they are too unreasonable. This method of judging the truth based on factual data is the test of a statistical hypothesis. Part of the research work of statisticians is to find the probability distribution of the test statistic and its inferences in various fields, and to derive the sampling distribution to calculate its accuracy, precision and p-value. In general, when the p-value is small, the difference is very significant; statisticians recommend rejecting H0 when the p-value is below 0.05 or 0.01, and most industry standards recommend the same thresholds. The implication is that if the same random sampling were repeated 100 times under the null hypothesis H0, a test statistic this extreme would occur only 5 times or 1 time. Since we have only one such random sample, and its result is rare under the null hypothesis H0, H0 should be rejected. The following are some rules for testing special causes on a control chart, assuming that the distribution of the process characteristic is normal and the process is under control.
Table 3: Null hypothesis.
H0: Factual state | H0: Statistical language
H0: Transaction lot is a good lot | H0: The percent of defective of the transaction lot p’ ≤ AQL
H0: The benefits of the experimental group and control group are the same | H0: The means of the experimental group and control group are equal, μ1=μ2
H0: Process is stable | H0: Process mean μ and standard deviation σ meet requirements
H0: Quality of product meets requirement | H0: Product MTBF ≥ 10,000 hours
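As a worked illustration of a p-value, the sketch below tests H0: μ = μ0 for a process mean with a two-sided z-test. It assumes a normally distributed characteristic with known σ, and the sample figures are hypothetical; other null hypotheses in Table 3 would call for different test statistics.

```python
from math import erfc, sqrt


def two_sided_p_value(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """p-value for H0: mu = mu0 when sigma is known (two-sided z-test)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return erfc(abs(z) / sqrt(2))     # equals 2 * P(Z > |z|)


# Hypothetical data: 25 measurements averaging 30.04 mm, H0: mu = 30, sigma = 0.05.
p = two_sided_p_value(sample_mean=30.04, mu0=30.0, sigma=0.05, n=25)
print(f"p-value = {p:.5f}")
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {'reject' if p < alpha else 'do not reject'} H0")
```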
The following are the 8
rules for Testing Special Causes:
· 1 point outside the A zone, p-value = 0.00135 × 2 = 0.0027;
· 9 points in a row in or outside the C zone (same side), p-value = 2 × (0.5)^9 = 0.0039;
· 6 points in a row, all increasing or all decreasing, p-value = 2 × (1/6!) = 0.0028;
· 14 points in a row, alternating up and down, p-value = 2 × (0.5)^13 = 0.00024;
· 2 out of 3 points in or outside the A zone (same side), p-value = 2 × (C(3,2)(0.0228)^2 × (0.9772) + C(3,3)(0.0228)^3) = 0.0031;
· 4 out of 5 points in or outside the B zone (same side), p-value = 2 × (C(5,4)(0.1587)^4 × (0.8413) + C(5,5)(0.1587)^5) = 0.0055;
· 15 points in a row within the C zone (either side), p-value = (0.6826)^15 = 0.0033;
· 8 points in a row outside the C zone (either side), p-value = (0.3174)^8 = 0.0001.
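The eight p-values can be checked numerically. This sketch simply re-evaluates the expressions from the list, using the normal tail areas quoted there; the constant and function names are illustrative.

```python
from math import comb, factorial

# Normal tail areas as quoted in the rules above.
P_BEYOND_3_SIGMA = 0.00135   # one side, more than 3 sigma from the center line
P_BEYOND_2_SIGMA = 0.0228    # one side, more than 2 sigma from the center line
P_BEYOND_1_SIGMA = 0.1587    # one side, more than 1 sigma from the center line
P_WITHIN_1_SIGMA = 0.6826    # both sides, within 1 sigma of the center line


def at_least(k: int, n: int, p: float) -> float:
    """Binomial probability of k or more successes in n trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


rule_p_values = {
    1: 2 * P_BEYOND_3_SIGMA,
    2: 2 * 0.5**9,
    3: 2 / factorial(6),
    4: 2 * 0.5**13,
    5: 2 * at_least(2, 3, P_BEYOND_2_SIGMA),
    6: 2 * at_least(4, 5, P_BEYOND_1_SIGMA),
    7: P_WITHIN_1_SIGMA**15,
    8: (1 - P_WITHIN_1_SIGMA) ** 8,
}
for rule, p in rule_p_values.items():
    print(f"rule {rule}: p-value = {p:.5f}")
```

All eight values fall below 0.01, which is why each pattern is treated as evidence against the hypothesis that the process is under control.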
Figure 7: Zone chart.
In formal scientific
reports or risk management, there are many uncertain events for which the
evidence must be assessed as to whether it is possible, credible, and consistent, such
as COVID-19 traceability surveys, public opinion surveys, and greenhouse gas
emission impact assessments. These assessments also require scientific
observation, questionnaires, expert interviews, scenario analysis, and similar
methods, combining subjective and objective, or quantitative and qualitative, analysis. The
following summary terms are used to describe the available evidence: limited,
medium, or robust; and for the degree of agreement: low, medium, or high. A
level of confidence is expressed using five qualifiers: very low, low, medium,
high, and very high. For a given evidence and agreement statement, different
confidence levels can be assigned, but increasing levels of evidence and
degrees of agreement are correlated with increasing confidence. The terms used
to describe these uncertain events are defined below [4].
Uncertainty
A cognitive state of
incomplete knowledge that can result from a lack of information or from
disagreement about what is known or even knowable. It may have many types of
sources, from imprecision in the data to ambiguously defined concepts or
terminology, or uncertain projections of human behavior. Uncertainty can
therefore be represented by quantitative measures (a probability density
function) or by qualitative statements (reflecting the judgment of a team of
experts).
Agreement
The degree of agreement
is the level of concurrence in the literature on a particular finding as
assessed by the authors.
Evidence
Information indicating
the degree to which a belief or proposition is true or valid. The degree of
evidence reflects the amount, quality, and consistency of scientific /
technical information on which the authors are basing their findings.
Confidence
The validity of a
finding based on the type, amount, quality, and consistency of evidence
(mechanistic understanding, theory, data, models, expert judgment) and on the
degree of agreement.
Likelihood
The chance of a specific outcome occurring,
where this might be estimated probabilistically. This is expressed using a
standard terminology: virtually certain 99 – 100 % probability, very likely 90
– 100 %, likely 66 – 100 %, about as likely as not 33 – 66 %, unlikely 0 – 33
%, very unlikely 0 – 10%, exceptionally unlikely 0 – 1 %.
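Because the likelihood ranges above overlap, a given probability can carry more than one term. The sketch below encodes the scale as listed and returns every matching term; the function name and the inclusive treatment of the range boundaries are assumptions.

```python
# Likelihood scale as listed above: (term, lower bound, upper bound) in percent.
LIKELIHOOD_SCALE = [
    ("virtually certain", 99, 100),
    ("very likely", 90, 100),
    ("likely", 66, 100),
    ("about as likely as not", 33, 66),
    ("unlikely", 0, 33),
    ("very unlikely", 0, 10),
    ("exceptionally unlikely", 0, 1),
]


def likelihood_terms(probability_percent: float) -> list[str]:
    """Return every term whose range contains the given probability (in percent)."""
    return [term for term, low, high in LIKELIHOOD_SCALE if low <= probability_percent <= high]


for value in (99.5, 95, 70, 50, 20, 5, 0.5):
    print(f"{value:>5}% -> {', '.join(likelihood_terms(value))}")
```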
One element of
Internet thinking is big data thinking, which has been described as follows: “In
the era of big data, corporate strategy will shift from business mobilization
to data mobilization. The data information of massive user access behavior is
chaotic, but behind it is the inevitable logic of consumer behaviour. Big data
analysis can learn about the inventory and pre-sales of products in various
regions, time periods, and consumer groups, and then conduct market judgments,
and adjust products and operations based on this.” “Users generally generate
data, behavior, and relationship data on the network. The precipitation of
these data helps enterprises to make predictions and decisions. In the era of
big data, data has become an important asset of enterprises, even a core asset.”
The value of big data lies not only in its size but in the ability to mine it and make predictions. The
core of big data thinking is to understand the value of data and to create business
value through data processing, so that data assets become a core competitive advantage;
even small enterprises must have big data [5]. Traditional quality data is
nothing more than variable data, attribute data, defects, internal failure costs,
external failure costs, etc., and these data are also obtained through data
gathering, data processing, statistical analysis, root cause finding, and so
on. In the past, quality practitioners relied on these so-called
professional tasks to hold a position in the company. When these quality data are
collected, organized, analysed and monitored automatically by computer, it will
be difficult for quality practitioners to keep up with the times. With the development of Internet
and IoT technology, precision machine tools are also embedded with various IoT devices
that collect machine operation status, component diagnosis,
machine life estimation, consumables monitoring, power consumption monitoring,
utilization monitoring, various attribute data analyses, and so on.
Production site data is transmitted to the cloud through the Internet, and this
kind of data mining and forecasting is a future direction that the quality profession
should think about seriously.
Statistical thinking is
very important for modern management and technical personnel. When a manager or
engineer is required to report on an issue, additional
objective statistical data that strengthens the evidence in the report makes it
easier to convince readers. As D.J. Finney said, “The purpose of
statistical science is to provide an objective basis for the analysis of
problems in which the data depart from the laws of exact causality [6]”. It is
worth stating the relationship between quality
management and statistical science: rather than saying that quality management relies
on the science of statistics, we can say that the quality field has led to some lines of research in
statistics, for example “Engineering and Industrial Statistics”. As for how to
cultivate one's own statistical thinking ability, besides becoming familiar
with statistical theory, one should also learn some statistical analysis
skills.