Numerical Methods for Information Tracking of Noisy and Non-smooth Data in Large-scale Statistics

B. S. Avinash
T. Srisupattarawanit
H. Ostermeyer

Abstract

In every field there is some degree of random disorder that must be considered and clearly understood. In a physical system, this random disorder is known as noise. In statistics, noise can be defined as additional meaningless information, present throughout a dataset, that cannot be clearly interpreted. In large-scale statistics, noisy data adversely affects the results and, if not properly understood or handled, can skew any data analysis process. The adverse effect arises mainly from the uncorrelated (zero autocorrelation) property of noise, which makes it completely unpredictable at any given point in time; thorough investigation and removal of noise therefore plays a vital role in the data analysis process. In engineering, experimental data measured with scientific instruments contains some values that are independent of the experimental setup. Among the most widely used techniques for working with such data are optimization methods, viz. gradient descent, conjugate gradient, Newton’s method, etc. Most of these methods require the derivative of a function specified by the dataset, which is determined using a finite-difference approximation. If noisy data is differentiated using a finite-difference method, the noise present in the data is amplified. To overcome this amplification of noise in the derivative, various regularization methods are employed. The parameter that plays a vital role in these methods is termed the regularization parameter. One of the most important techniques in the field of regularization is total variation regularization. This review aims to gather the dispersed literature on the current state of various types of noise and their regularization methods.
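
The amplification of noise by finite differences can be illustrated with a minimal sketch. It is not taken from the article: the test function sin(x), the grid size `n`, the noise level `sigma`, and the helper names `forward_difference` and `rms` are all assumptions chosen for illustration. The sketch differentiates clean and noisy samples with a forward difference and compares the resulting errors.

```python
import math
import random


def forward_difference(values, step):
    """First-order forward difference: (f[i+1] - f[i]) / step."""
    return [(values[i + 1] - values[i]) / step for i in range(len(values) - 1)]


def rms(errors):
    """Root-mean-square of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


random.seed(0)          # fixed seed so the run is repeatable
n = 1000                # number of grid intervals (assumed)
sigma = 0.01            # standard deviation of the added noise (assumed)
step = 2 * math.pi / n  # grid spacing on [0, 2*pi]

grid = [i * step for i in range(n + 1)]
clean = [math.sin(x) for x in grid]
noisy = [c + random.gauss(0.0, sigma) for c in clean]
exact_derivative = [math.cos(x) for x in grid[:-1]]

# Error already present in the data before differentiation.
data_err = rms([a - b for a, b in zip(noisy, clean)])

# Error of the finite-difference derivative for clean and noisy data.
err_clean = rms([d - e for d, e in
                 zip(forward_difference(clean, step), exact_derivative)])
err_noisy = rms([d - e for d, e in
                 zip(forward_difference(noisy, step), exact_derivative)])

print(f"RMS noise in data:            {data_err:.4f}")
print(f"RMS derivative error (clean): {err_clean:.4f}")
print(f"RMS derivative error (noisy): {err_noisy:.4f}")
```

Because each difference divides the noise by the grid spacing, the derivative error for the noisy samples scales roughly like `sigma / step`, so refining the grid makes the noisy derivative worse rather than better. This is the behaviour that regularization methods are meant to counteract.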

Keywords:
Large-scale statistics, noisy data, regularization, data-driven methods, amplification

Article Details

How to Cite
Avinash, B. S., Srisupattarawanit, T., & Ostermeyer, H. (2019). Numerical Methods for Information Tracking of Noisy and Non-smooth Data in Large-scale Statistics. Journal of Engineering Research and Reports, 6(4), 1-16. https://doi.org/10.9734/jerr/2019/v6i416957
Section
Review Article
