Towards Proper-Inconsistency in Weldability Prediction Using k-Nearest Neighbor Regression
Towards Proper-Inconsistency in Weldability Prediction Using
k-Nearest Neighbor Regression and Generalized Regression Neural
Network with Mean Acceptable Error
Junheung Park1, Kyoung-Yun Kim1*, and Raj Sohmshetty2
* Corresponding author: Tel.: (313) 577-4396; E-mail: kykim@eng.wayne.edu
1Department of Industrial and Systems Engineering
Wayne State University
Detroit, MI 48202, USA
2Ford Motor Company
Dearborn, MI 48124, USA
ABSTRACT
Resistance spot welding offers various advantages in production, and yet a significant inconsistency problem exists
in its welding quality. Such inconsistent welding data can be eliminated using anomaly detection or instance
selection methods. However, in the weldability prediction problem, this inconsistency, which we refer to as
proper-inconsistency, should not be eliminated, since it can be used to extract additional information. In this research,
we examine the effects of this inconsistency on prediction performance using two machine learning methods,
k-Nearest Neighbors (kNN) regression and Generalized Regression Neural Network, in order to identify an
approach towards tackling the proper-inconsistency problem in weldability prediction. We also propose a new
prediction performance measure, Mean Acceptable Error (MACE), for prediction models in the presence of
proper-inconsistency. The proposed method is tested with actual weldability test data.
1. INTRODUCTION
In this study, we investigate the inconsistency problem that frequently occurs during the resistance spot welding
(RSW) process. RSW is one of the most widely used metal joining processes in many industries, including the
automobile industry [8]. It offers various advantages, such as high speed, high volume operations, and high rate
production [1], and yet a significant inconsistency problem still exists in the quality of welding. Many researchers report
that some of the important factors that determine the welding quality include weld current, time, force, electrode
displacement, temperature variation, and dynamic resistance [1, 8].
The reliability of RSW is one of the main factors that affect production costs. Several studies have been conducted
to predict the weldability under different welding parameters in order to support various tasks, such as quality
monitoring and material selection [9, 11]. Due to the complex nature of the welding process, the data collected to
construct a prediction model often contains a significant amount of inconsistency, especially where data instances that
correspond to the same welding parameters have different welding quality measures (such as nugget width).
In general, this type of inconsistent data is treated as noise in machine learning literature. If data contains noise, the
model built from this data is likely to be unreliable. Therefore, the inconsistent data are often eliminated using
anomaly detection or instance selection methods. However, in the weldability prediction problem, the inconsistency,
which we call proper-inconsistency, should not be eliminated, since it can be used to extract additional
important information. For instance, the electrode wears as it undergoes the welding process, which may lead to
deteriorated welding quality. However, the information about the electrode wear is usually not included in the physical
test data. If the data are recorded in the order of the actual physical experiments, one can perform a quality
monitoring task by identifying where and when the inconsistency takes place.
In the presence of proper-inconsistency, it is inappropriate to apply the approach generally employed with
machine learning algorithms. Most of these algorithms aim to minimize the overall error over all data
instances (i.e., the errors between prediction and target values). Due to the numerical characteristics of
proper-inconsistency, such a model is likely to produce vague prediction results. For instance, suppose there are
two inconsistent data instances. They both have the same welding parameters, which are used as inputs to construct the
machine learning model. As an illustration, if their nugget width values are 5mm and 0mm, respectively, the
prediction model will likely end up predicting a nugget width of about 2.5mm. This result has the minimum error over both
data instances. However, in the case of proper-inconsistency, it might be more beneficial to predict these data close to
Flexible Automation and Intelligent Manufacturing, FAIM2014
one of their target values, either 5mm or 0mm, instead of 2.5mm. In order to perform such predictions, we consider
k-Nearest Neighbor (kNN) regression with a small number of k neighbors. We anticipate that a small number of k
neighbors is enough to learn the data characteristics and to predict accurately, while limiting the influence that the
inconsistent and correct data have on each other in the learning process. We examine this capacity of kNN regression
with a small number of k neighbors and compare it to another machine learning method, Generalized Regression
Neural Network (GRNN). GRNN uses all data instances in order to predict new data, and therefore the inconsistent
and correct data affect each other in the learning process.
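The 5mm and 0mm illustration can be checked numerically. The following is a minimal sketch in plain Python; the function name `mse_for_constant` and the grid of candidate predictions are illustrative assumptions and are not part of the original study. It shows that a single prediction shared by the two conflicting instances minimizes the mean square error at their midpoint, even though that value matches neither target.

```python
def mse_for_constant(prediction, targets):
    """Mean square error when the same value is predicted for every target."""
    return sum((t - prediction) ** 2 for t in targets) / len(targets)

# Two instances with identical welding parameters but conflicting nugget widths (mm).
targets = [5.0, 0.0]

# Hypothetical candidate predictions from 0.0 mm to 5.0 mm in 0.5 mm steps.
candidates = [0.5 * step for step in range(11)]
best = min(candidates, key=lambda c: mse_for_constant(c, targets))

print(best)                             # error-minimizing compromise
print(mse_for_constant(best, targets))  # MSE of the compromise
print(mse_for_constant(5.0, targets))   # MSE when committing to one target
```

Committing to 5mm is penalized more heavily by MSE than the compromise, which is why an error-minimizing learner drifts towards the midpoint.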
Regarding the prediction performance measure, one can use mean square error (MSE), root mean square error
(RMSE), mean absolute error (MAE), or mean percentage error (MPE). These measures indicate the errors between
the predicted and target values. We find these traditional measures unsuitable for the proper-inconsistency problem for
several reasons. For instance, in the aforementioned example, the error between the target, 0mm, and the prediction,
2.5mm, is 2.5mm. Suppose we obtain an error of 5mm from another machine learning method. Considering the
context of this problem, both the 2.5mm and 5mm errors may be unacceptable since the error is too large.
However, to some extent, the 5mm error may be useful in identifying the location of inconsistency explained above,
even though the error is numerically higher. To address this problem, we propose a new prediction performance
measure called mean acceptable error (MACE) to measure the performance of prediction models constructed with the
presence of proper-inconsistency.
For the rest of the paper, we briefly review related research in Chapter 2. In Chapter 3, we discuss the MACE
measure, kNN regression, and GRNN algorithms. Then, we apply these methods and provide results on the
weldability prediction problem in Chapter 4. We conclude the paper with a summary and future work in Chapter 5.
2. LITERATURE REVIEW
Ouafi et al. [1] develop an on-line quality assessment system for the resistance spot welding process. The system is
based on Neural Network and is able to predict the welding quality, such as nugget width and penetration, when
different welding parameters are used. Kim et al. [11] apply Neural Network to arc welding parameter selection
problems. The system developed is able to determine welding parameters and to avoid inappropriate welding design.
Nagesh and Datta [13] use a Neural Network model to predict the weldability in the metal-arc welding process.
Welding parameters considered include bead geometry and penetration, and they show that the Neural Network model
predicts the weldability more precisely. Pal et al. [9] develop a neural network-based system to monitor the weld joint
strength in pulsed metal inert gas welding. The system showed good performance in predicting the weld joint strength
with relatively few errors. Ghosal and Chaki [4] present a hybrid approach that combines Neural Network and
Bayesian regularization. The method is applied to predict the penetration depth in hybrid laser beam
welding. Raghavendra et al. [7] combine a Neural Network model and an ant colony optimization algorithm in order to
predict the weld joint strength for pulsed metal inert gas welding.
Inconsistent data typically have the same values for the input features but different output values. They are
usually considered noise in the literature. Gamberger et al. [14] define inconsistency as a type of measurement error.
Many studies have been conducted to reduce noise using anomaly detection methods. A few recent surveys on
anomaly detection are available in [6, 10]. Instance selection methods aim to select relevant instances, thereby
eliminating irrelevant, redundant, and noisy instances from the data. Recent surveys on different methods and
analyses on real-world datasets are also available in [3, 5, 12].
k-Nearest Neighbor (kNN) is a machine learning scheme that can be applied to classification or regression
problems. It is known for its simplicity while providing high efficiency and effectiveness. Many researchers employ
kNN algorithm in different applications. Detailed information on kNN regression is available in [19]. Another related
research area focuses on developing efficient data structures in order to calculate the distances between data instances,
and a recent survey on these research works is available in [16]. Generalized Regression Neural Network (GRNN) is
an instance-based learning algorithm, which was first introduced in [15]. It has been used in many applications where
function approximation is required. In [2], GRNN is applied to the feature selection problem, where the
performance of different subsets of features is measured using GRNN. Şenkal [17] develops a solar radiation prediction
system using GRNN. Input features to the GRNN model include latitude, longitude, and a few coefficients obtained from
satellites. Li et al. [18] present a GRNN based approach along with a new optimization algorithm for the GRNN
parameter. The developed approach is applied to predict the annual power load while providing a more tractable
computational cost and higher performance compared to other methods.
3. PROPOSED METHODS
3.1. MEAN ACCEPTABLE ERROR (MACE)
The prediction performance is typically measured using MSE, RMSE, MAE, and MPE for machine learning
algorithms. For example, MSE is represented as follows:
\[ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \qquad (1) \]
where N is the number of data instances, and $y_i$ and $\hat{y}_i$ are the target and predicted values for data
instance i, respectively.
We find these measures unsuitable for the weldability prediction problem for the reasons explained in Chapter 1.
Therefore, we define MACE to measure the performance of prediction models as follows:
\[ \mathrm{MACE} = \frac{1}{N} \sum_{i=1}^{N} \left[\, |y_i - \hat{y}_i| \le T \,\right] \qquad (2) \]
where [ ] is the Iverson bracket and T is a threshold. The Iverson bracket returns 1 if the statement inside the
bracket is true and 0 otherwise. T can be specified such that the prediction values are reasonably acceptable compared to
their target values, depending on the context of the problem. In other words, MACE measures the performance of
prediction models as the fraction of acceptable predictions. In this study, we specify the threshold T as follows:
\[ T = 0.1 \times \frac{1}{N} \sum_{i=1}^{N} y_i \]
Essentially, T is used to define predictions whose absolute error is within 10 percent of the mean target value as
acceptable and therefore correctly predicted.
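The MACE definition and its threshold can be sketched in a few lines of plain Python. The function name `mace`, its optional `threshold` argument, and the four-instance data below are illustrative assumptions and do not come from the weldability data set. The sketch compares a "compromise" predictor that outputs 2.5mm for two conflicting instances with a "committed" predictor that outputs 5mm for both.

```python
def mace(targets, predictions, threshold=None):
    """Fraction of predictions whose absolute error is within the threshold T."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    n = len(targets)
    if threshold is None:
        # Threshold used in this study: 10 percent of the mean target value.
        threshold = 0.1 * sum(targets) / n
    # The Iverson bracket [|y - y_hat| <= T] is a boolean, summed as 0 or 1.
    acceptable = sum(abs(y - y_hat) <= threshold
                     for y, y_hat in zip(targets, predictions))
    return acceptable / n

# Illustrative nugget widths (mm); the first two share the same welding parameters.
targets = [5.0, 0.0, 5.0, 4.0]
compromise = [2.5, 2.5, 5.0, 4.0]  # midpoint for the conflicting pair
committed = [5.0, 5.0, 5.0, 4.0]   # one of the two targets for the pair

print(mace(targets, compromise))
print(mace(targets, committed))
```

Under these assumed values the committed predictor is acceptable on three of four instances, whereas the compromise is acceptable on only two, although the compromise has the lower MSE.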
3.2. K NEAREST NEIGHBOR REGRESSION AND GENERALIZED REGRESSION NEURAL NETWORK
In this section, we introduce k Nearest Neighbor (kNN) regression and Generalized Regression Neural Network
(GRNN). kNN regression predicts a new data instance based on the similarities between the new data instance and
those in the training data set. The similarities can be calculated using Euclidean or any other distance metrics such as
Cosine distance. Once the distances are measured, k nearest data instances and their target values determine the new
data instance’s prediction value. The squared Euclidean distance can be calculated as follows:
\[ D^2(X, X_j) = D_j^2 = (X - X_j)^T \cdot (X - X_j) \qquad (3) \]
where $X = (x_1, x_2, \ldots, x_p)$ and $X_j = (x_{j1}, x_{j2}, \ldots, x_{jp})$ are the new data instance and the
jth training data instance, respectively.
kNN regression can be performed using the following equation:
\[ \hat{Y}(X) = \frac{\sum_{X' \in N(X)} Y(X') \, e^{-D_j^2/2\sigma^2}}{\sum_{X' \in N(X)} e^{-D_j^2/2\sigma^2}} \qquad (4) \]
where $\hat{Y}(X) = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_l)$ is the predicted value of the new data instance X, n is
the number of training data instances, X' ranges over N(X), the k training data instances that are closest to the new
data instance X to be predicted, and e is the exponential function. Specifically, a weight function
$e^{-D_j^2/2\sigma^2}$ with each training data instance is used to predict the new input data. Each of these distances
contributes to the predicted value. Note that the weight term $e^{-D_j^2/2\sigma^2}$ increases as the distance $D_j^2$
decreases and decreases as the distance increases. Therefore, the response values
$Y(X)_j = (y_{j1}, y_{j2}, \ldots, y_{jl})$ of training data instances that are closer to the new data instance X
contribute more to the final prediction value $\hat{Y}(X)$. The smaller σ becomes, the more the closest training data
instances dominate the prediction, whereas a larger σ spreads the weights more evenly across the instances.
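Equations (3) and (4) can be sketched as follows for a single output value. This is a minimal illustration in plain Python, not the implementation used in the study: the names `squared_distance` and `knn_regress` and the one-feature data are assumptions. Subtracting the smallest distance before exponentiation is also an added numerical safeguard; it rescales all weights by a common factor that cancels in the ratio.

```python
import math

def squared_distance(x, x_j):
    """Squared Euclidean distance D_j^2 between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, x_j))

def knn_regress(x, train_x, train_y, k=1, sigma=1.0):
    """Gaussian-weighted average of the targets of the k nearest instances."""
    if k < 1 or not train_x:
        raise ValueError("need k >= 1 and at least one training instance")
    # Pair every training instance with its squared distance to x, nearest first.
    scored = sorted(
        ((squared_distance(x, x_j), y_j) for x_j, y_j in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    neighbors = scored[:k]
    # Shift by the smallest distance so the nearest weight is exactly 1
    # (assumed safeguard against a zero denominator for very small sigma).
    d_min = neighbors[0][0]
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

# Illustrative data: one zero-width weld sits next to two normal welds.
train_x = [[1.0], [1.1], [1.2], [3.0]]
train_y = [0.0, 5.0, 5.0, 4.0]

print(knn_regress([1.0], train_x, train_y, k=1))  # follows the closest instance
print(knn_regress([1.0], train_x, train_y, k=3))  # pulled towards the neighbors
```

With k = 1 the prediction reproduces the closest target, while k = 3 already blends the zero-width weld with its normal neighbors.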
GRNN is an instance-based learning algorithm. Its computational cost in training and predicting is generally
tractable compared to that of algorithms that are not instance-based (e.g., Neural Network with the
back-propagation algorithm). Another advantage is that there is only one parameter, the smoothing parameter σ,
which makes it easy to select the optimal parameter. Other advantages, such as avoiding local minima and over-fitting
and robustness to outliers, are reported in [2]. Using GRNN, predictions can be performed by first calculating the
distance between the new data instance and the training data instances using the same equation defined above for kNN
regression.
Similar to kNN regression, GRNN predicts a new data instance as follows:
\[ \hat{Y}(X) = \frac{\sum_{j=1}^{n} Y(X)_j \, e^{-D_j^2/2\sigma^2}}{\sum_{j=1}^{n} e^{-D_j^2/2\sigma^2}} \qquad (5) \]
Note that GRNN uses all the training data instances to predict
Yˆ(X) instead of using k nearest training data
instances.
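Equation (5) can be sketched for a single output value as below. The function name `grnn_predict` and the three-instance data are illustrative assumptions, and the shift by the smallest distance is an added numerical safeguard that cancels in the ratio. The example places two conflicting targets at the query point to show how summing over all training instances averages them.

```python
import math

def grnn_predict(x, train_x, train_y, sigma=1.0):
    """Gaussian-weighted average of the targets of all training instances."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, x_j)) for x_j in train_x]
    # Shift by the smallest distance (assumed safeguard; it cancels in the ratio).
    d_min = min(d2)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in d2]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

# Illustrative data: two instances with the same parameters but widths 5mm and 0mm.
train_x = [[1.0], [1.0], [3.0]]
train_y = [5.0, 0.0, 4.0]

print(grnn_predict([1.0], train_x, train_y, sigma=0.1))    # close to 2.5
print(grnn_predict([1.0], train_x, train_y, sigma=100.0))  # close to the global mean
```

Even with a small σ, the two equidistant conflicting instances receive equal weight, so the prediction falls near their midpoint.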
Most machine learning algorithms attempt to minimize the overall errors between the predicted and target values.
That is the correct approach, in general, since inconsistency is assumed to be noise and is thus eliminated before applying
any machine learning methods. However, in the case of proper-inconsistency, the properly-inconsistent data should
not be affected by the correct data instances, so that they can be further examined. Conversely, the correct data
should not be affected by the properly-inconsistent data. With that notion, we attempt to segregate the learning
mechanism for the properly-inconsistent data from that for the correct data, and vice versa, using kNN regression with
a small number of k. In other words, the properly-inconsistent data should learn mostly from themselves, and the
correct data should likewise learn mostly from themselves, so that the two do not affect each other. In order to validate
the performance of kNN regression with a small k, we compare its performance measured by MACE to that of GRNN. This is
to see whether using a small number of k nearest neighbors performs well in predicting both the correct and
properly-inconsistent data. We present and discuss the results in the next section.
4. RESULTS
In our weldability prediction data set, there are 1,280 data instances. Each data instance has its corresponding 16
input feature values that specify different welding parameters and one output feature value, which is nugget width.
These 16 input values consist of material characteristics, welding force, welding current, and other relevant features.
We use kNN regression with a number of small k values to construct prediction models and we compare the results
with those achieved by GRNN.
We perform 10-fold cross validation to construct the models. The entire data set is randomly shuffled and split into
ten folds. The data instances in the first fold are kept for testing the model constructed from the rest of data instances
that belong to the nine other folds. The predictions are calculated for the data instances in the first fold. Then, the
second fold is held out for testing and the rest is used to train another model. This process is repeated until we obtain all
the predictions for the entire data set.
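The hold-out procedure described above can be sketched as a loop that returns one out-of-fold prediction per data instance. This is a minimal illustration in plain Python: the names `cross_val_predict` and `nearest_target`, the fixed seed, and the synthetic data are assumptions, and `nearest_target` is only a stand-in predictor used to exercise the loop.

```python
import random

def cross_val_predict(xs, ys, predict, n_folds=10, seed=0):
    """Return one out-of-fold prediction for every data instance."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds folds of near-equal size.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    predictions = [None] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        # Predict only the held-out instances from the other folds.
        for i in held_out:
            predictions[i] = predict(xs[i], train_x, train_y)
    return predictions

def nearest_target(x, train_x, train_y):
    """Stand-in predictor: target of the single nearest training instance."""
    best = min(range(len(train_x)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, train_x[j])))
    return train_y[best]

# Synthetic data for illustration only.
xs = [[float(i)] for i in range(20)]
ys = [2.0 * i for i in range(20)]
preds = cross_val_predict(xs, ys, nearest_target)
print(len(preds))
```

Changing `seed` yields a different random split, which is what makes repeated replications of the procedure meaningful.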
Figure 1 shows how the same welding parameters can result in inconsistent welding quality. We group data
instances that have the same welding parameters, which results in 262 different groups, and plot the groups on the
x-axis. Their nugget width values are plotted on the y-axis. Groups 1 through 76 have no inconsistency, as their nugget
width values are all zeros. Groups 77 through 103 have a significant inconsistency, and groups 104 through 262 have
both a slight inconsistency and variations. We name these three types of groups Group A, Group B, and Group C,
respectively. Each group in Group A and Group B consists of about 2 to 5 data instances. In Group C, in contrast, the
number of data instances is about 10 to 20 in most of the groups, and the majority of the data, about 80% of the
entire dataset, falls into Group C.
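Grouping instances by identical welding parameters can be sketched as below. The names `group_by_parameters` and `spread`, the two-feature vectors, and the nugget widths are illustrative assumptions. The paper does not state how "significant" and "slight" inconsistency were separated, so the sketch only reports the spread of nugget widths within each group.

```python
from collections import defaultdict

def group_by_parameters(xs, ys):
    """Map each distinct parameter vector to the nugget widths observed for it."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        # Exact equality of parameter values is assumed to define a group.
        groups[tuple(x)].append(y)
    return dict(groups)

def spread(values):
    """Range of the observed values, used here as a simple inconsistency indicator."""
    return max(values) - min(values)

# Hypothetical instances: (parameter a, parameter b) and nugget width in mm.
xs = [[8.0, 2.5], [8.0, 2.5], [9.0, 2.5], [9.0, 2.5], [9.0, 2.5]]
ys = [0.0, 0.0, 5.0, 0.0, 4.8]

groups = group_by_parameters(xs, ys)
for params, widths in groups.items():
    print(params, len(widths), spread(widths))
```

A group whose spread is zero has no inconsistency, while a large spread flags parameters whose welds range from no nugget to a full nugget.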
Figure 1. Inconsistent welding quality.
In Table 1, we summarize the experiments conducted using kNN regression and GRNN. Since the 10-fold cross validation
chooses data instances randomly, we replicate the experiment 50 times and, for each replication, measure the
prediction performance using the MACE measure presented in Chapter 3. The threshold in our experiment was T = 0.456,
which is 10 percent of the mean nugget width calculated from the entire data set.
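The per-replication MACE values can be reduced to the four statistics reported for each model. The sketch below uses Python's standard `statistics` module; the function name `summarize` and the three scores are illustrative assumptions, and the use of the sample standard deviation is also an assumption, since the paper does not state which estimator was used.

```python
import statistics

def summarize(scores):
    """Minimum, maximum, mean, and sample standard deviation of MACE scores."""
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        # Sample (n - 1) standard deviation; assumed, not stated in the paper.
        "std": statistics.stdev(scores),
    }

# Hypothetical MACE values from three replications (the study uses 50).
summary = summarize([0.5, 0.75, 1.0])
print(summary)
```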
Table 1. Results of kNN regression and GRNN (minimum, maximum, mean, and standard deviation of MACE).
The results show that using a small number of k neighbors provides a better MACE performance than that of
GRNN in Group A and Group B. For instance, Table 1 shows that, compared to GRNN with σ = 0.01, kNN
regression with k = 1 has higher performance for every group. Similarly, kNN with k = 3 provides higher MACE
performance for Group A and Group B, resulting in a better performance over the entire data set. When fewer training
data instances k are used, we also observe that kNN is generally robust to the parameter σ. There does not appear to be
a strong correlation between k and the standard deviation, even though the standard deviation increases in many cases
as k increases. In addition, for Group B, where the most significant inconsistency exists, we observe that a smaller
number of k neighbors achieves a better MACE performance: with kNN and k = 1, about 35.97% of the predictions are
acceptable in terms of the mean MACE calculated from the 50 replications. With GRNN, in contrast,
the inconsistent data could not be predicted within the acceptable range. This is because kNN, with a small number of k
neighbors, predicts the properly-inconsistent data mostly based on the similar ones from the training dataset. As we
increase k, the prediction performance becomes closer to that of GRNN, which uses every data instance
in the training dataset to predict a new data instance.
5. CONCLUSION
In this paper, we investigate the inconsistent quality problem in resistance spot welding (RSW). We define this
inconsistency as proper-inconsistency since it captures the nature of the RSW process. We claim that it should
not be treated as noise. Therefore, our aim was to identify a way for machine learning algorithms to perform
better in the presence of proper-inconsistency in the dataset. We proposed a prediction performance measure, called
mean acceptable error (MACE), which was used to measure the prediction performance of kNN regression and GRNN.
We showed that using a smaller number of k neighbors provides a better MACE performance in predicting
properly-inconsistent data.
We are interested in further examining the possibility of adjusting the number of neighbors k for different types of
data characteristics. We will also consider applications other than RSW where there is a significant inconsistency in data
that should not be treated as noise. Other future work will focus on examining other machine learning algorithms
and comparing their behaviors in learning from similar inconsistent data sets, using different types of performance
measures for comparison with MACE. In addition, we will study how our approach can be used to support decision making
(e.g., quality monitoring and material selection) for intelligent manufacturing.
ACKNOWLEDGEMENTS
This research is partially supported by the NSF I/UCRC for e-Design and Ford Motor Company.
REFERENCES
[1] A. E. Ouafi, R. Bélanger, and J. Méthot: “Artificial neural network-based resistance spot welding quality assessment
system”, Revue de Métallurgie, Vol. 108, No. 6, pp.343–355, 2011.
[2] I. A. Gheyas and L. S. Smith: “Feature subset selection in large dimensionality domains”, Pattern Recognition, Vol. 43, No.
1, pp.5–13, 2010.
[3] H. Liu, Instance selection and construction for data mining, Springer-Verlag, 2010.
[4] S. Ghosal and S. Chaki: “Estimation and optimization of depth of penetration in hybrid CO2 LASER-MIG welding using
ANN-optimization hybrid model”, The International Journal of Advanced Manufacturing Technology, Vol. 47, No. 9,
pp.1149–1157, 2010.
[5] J. A. Olvera-López, J. A. Carrasco-Ochoa, J. F. Martínez-Trinidad, and J. Kittler: “A review of instance selection methods”,
Artificial Intelligence Review, Vol. 34, No. 2, pp.133–143, 2010.
[6] V. Chandola, A. Banerjee, and V. Kumar: “Anomaly detection: A survey”, ACM Computing Surveys (CSUR), Vol. 41, No. 3,
pp.1–58, 2009.
[7] N. Raghavendra, R. Koranne, S. Pal, S. K. Pal, and A. K. Samantaray: “Joint strength prediction in a pulsed MIG welding
process using hybrid neuro ant colony-optimized model”, The International Journal of Advanced Manufacturing
Technology, Vol. 41, No. 7, pp.694–705, 2009.
[8] G. Xu, J. Wen, C. Wang, and X. Zhang: “Quality monitoring for resistance spot welding using dynamic signals”, IEEE
International Conference on Mechatronics and Automation, pp.2495–2499, 2009.
[9] S. Pal, S. K. Pal, and A. K. Samantaray: “Artificial neural network modeling of weld joint strength prediction of a pulsed
metal inert gas welding process using arc signals”, Journal of materials processing technology, Vol. 202, No. 1, pp.464–474,
2008.
[10] V. J. Hodge and J. Austin: “A survey of outlier detection methodologies”, Artificial Intelligence Review, Vol. 22, No. 2,
pp.85–126, 2004.
[11] I.-S. Kim, Y. Jeong, C. Lee, and P. Yarlagadda: “Prediction of welding parameters for pipeline welding using an intelligent
system”, The International Journal of Advanced Manufacturing Technology, Vol. 22, No. 9, pp.713–719, 2003.
[12] H. Liu and H. Motoda: “On issues of instance selection”, Data mining and knowledge discovery, Vol. 6, No. 2, pp.115–130,
2002.
[13] D. Nagesh and G. Datta: “Prediction of weld bead geometry and penetration in shielded metal-arc welding using artificial
neural networks”, Journal of materials processing technology, Vol. 123, No. 2, pp.303–312, 2002.
[14] D. Gamberger, N. Lavrač, and S. Džeroski, Noise elimination in inductive concept learning: A case study in medical
diagnosis, Algorithmic Learning Theory, Springer Berlin Heidelberg, pp.199–212, 1996.
[15] D. F. Specht: “A general regression neural network”, IEEE transactions on neural networks, Vol. 2, No. 6, pp.568–576,
1991.
[16] N. Bhatia and Vandana: “Survey of nearest neighbor techniques”, International Journal of Computer Science and
Information Security, Vol. 8, No. 2, pp.302–305, 2010.
[17] O. Şenkal: “Modeling of solar radiation using remote sensing and artificial neural network in Turkey”, Energy, Vol. 35, No.
12, pp.4795–4801, 2010.
[18] H. Z. Li, S. Guo, C. J. Li, and J. Q. Sun: “A hybrid annual power load forecasting model based on generalized regression
neural network with fruit fly optimization algorithm”, Knowledge-Based Systems, Vol. 37, pp.378–387, 2012.
[19] C. G. Atkeson, A. W. Moore, and S. Schaal: “Locally weighted learning for control”, Artificial Intelligence Review, Vol. 11,
pp.75–113, 1997.