

WHITE PAPER
© 2017 Persistent Systems Ltd. All rights reserved.
www.persistent.com
References
[1] Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy and Bob Becker, The Data Warehouse Lifecycle
Toolkit, 2nd edition, Wiley, 2008.
http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-lifecycle-toolkit/
[2] Redman, T.C. Data Quality for the Information Age. Artech House Computer Science Library, Norwood,
MA, USA, 1997.
[3] http://booksite.elsevier.com/9780123970336/downloads/Sebastian-Coleman_Appendix%20B.pdf
[4] Haug, A., Zachariassen, F. & van Liempd, D. The costs of poor data quality. Journal of Industrial Engineering and
Management, 4(2): 168-193, 2011.
[5] Fernando Velez, Sunil Agrawal, Data Lakes: Discovering, governing and transforming raw data into strategic data
assets, Persistent Systems White Paper, August 2016,
https://www.persistent.com/wp-content/uploads/2016/05/Data-Lakes-Whitepaper.pdf
[6] Gartner, p.6, Private Cloud Matures, Hybrid Cloud is Next, Gartner G00255302, Sept 6, 2013,
https://www.gartner.com/doc/2585915/private-cloud-matures-hybrid-cloud
[7] https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/
[8] https://www.gartner.com/doc/3000017/market-guide-selfservice-data-preparation
[9] Kimball Design Tips, http://www.kimballgroup.com/category/articles-design-tips/
[10] Xin Luna Dong, Divesh Srivastava, Big Data Integration, Morgan & Claypool Publishers, Feb 2015,
https://books.google.co.in/books/about/Big_Data_Integration.html?id=p7d1BwAAQBAJ&redir_esc=y. An ICDE
tutorial was initially published in 2013 and is part of the VLDB Endowment:
http://www.vldb.org/pvldb/vol6/p1188-srivastava.pdf
[11] Flach, P. and Savnik, I., Database dependency discovery: a machine learning approach, AI Communications,
12(3):139–160, 1999.
[12] Z. Abedjan et al., DFD: Efficient Functional Dependency Discovery, CIKM 2014.
[13] A. Rahman, A Novel Machine Learning Approach Toward Quality Assessment of Sensor Data, IEEE Sensors
Journal, Vol. 14(4), April 2014.
[14] J. Freire et al., Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets,
SIGMOD 2016.
[15] T. Dasu, J. M. Loh, and D. Srivastava. Empirical glitch explanations. In KDD, pages 572–581, 2014.
[16] D.A. Cohn, L. Atlas, and R.E. Ladner, Improving Generalization with Active Learning, Machine Learning, vol. 15,
no. 2, pp. 201-221, 1994.
[17] Z. Abedjan et al., Detecting Data Errors: Where are we and what needs to be done?, VLDB 2016.
[18] Raman, A. Retail-data quality: evidence, causes, costs, and fixes (2000). Technology in Society, 22, 97–109.
dx.doi.org/10.1016/S0160-791X(99)00037-8