My Intro to Data Management: Data Quality

The 4 V’s of data (volume, velocity, variety, and value) are at the center of the big data quality challenge.

The Genesis of Big Data

Technological advances have led to the generation of new types of data from various sources at an exponential rate. The upside of this new reality has not been completely determined. We’ve seen early on how “old world” industries like utilities and fashion have begun to use big data, but we also know this is only the beginning. What is clear are the data quality challenges that the big data environment presents. The 4 V’s of data (volume, velocity, variety, and value) are at the center of the big data quality challenge.

As a content creator and educator, I found variety to be the most interesting feature of data: unstructured data (documents, video, audio) and semi-structured data (software packages and spreadsheets) combined account for over 80% of the total amount of data. Much of this data is generated and full of information that is typically as inaccessible as the data itself.

The Difference Between Information and Data

Here is a useful way to distinguish information from data: “Information being meaningful patterns of data that communicates and promotes understanding of the complex.” That brings us back to the question of how we assess data quality. Some would say it’s about data entry, and they would be partly right. Another lens on data quality assurance is the importance of making data accessible, readable, and discoverable regardless of its location.

I think that data openness is something many don’t think of as part of “quality,” but questions about data readability are great benchmarks for ensuring that our data is “first class” with regard to openness and usability. Storytelling is communal, and data-informed stories can mirror that ethos when the data is highly accessible. You can learn more about data accessibility in my piece about metadata.
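To make the two lenses above a little more concrete, here is a minimal Python sketch, not a definitive method. It assumes a small made-up CSV and a hypothetical list of metadata fields (title, description, license) that a dataset would need in order to be discoverable; neither comes from the sources below. The first function looks at data entry (empty cells and duplicate rows), and the second looks at whether the descriptive metadata is filled in.

```python
import csv
import io

# Hypothetical sample data: one row has an empty cell, one row is a duplicate.
SAMPLE_CSV = """site,date,temperature_c
A,2015-01-01,4.2
B,2015-01-01,
A,2015-01-01,4.2
"""

# Hypothetical metadata fields assumed to matter for discoverability.
REQUIRED_METADATA = ("title", "description", "license")


def entry_quality(rows):
    """Count rows with empty cells and rows that exactly repeat an earlier row."""
    seen = set()
    incomplete = 0
    duplicates = 0
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            duplicates += 1
        seen.add(key)
        if any(value is None or value.strip() == "" for value in row.values()):
            incomplete += 1
    return {"rows": len(rows), "incomplete": incomplete, "duplicates": duplicates}


def missing_metadata(metadata):
    """List the required metadata fields that are absent or left blank."""
    return [field for field in REQUIRED_METADATA if not metadata.get(field)]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = entry_quality(rows)
gaps = missing_metadata({"title": "Site temperatures", "license": ""})

print(report)
print(gaps)
```

A clean entry report with a long list of metadata gaps would describe data that was typed in carefully but is still hard for anyone else to find or read, which is the point of treating openness as part of quality.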

Bibliography:

Cai, L., & Zhu, Y. (2015). The Challenges of Data Quality and Data Quality Assessment in the Big Data Era. Data Science Journal, 14(0), 2. https://doi.org/10.5334/dsj-2015-002

Common Errors in Ecological Data Sharing. (n.d.). Retrieved from http://escholarship.umassmed.edu/cgi/viewcontent.cgi?article=1024&context=jeslib

Data without Peer: Examples of Data Peer Review in the Earth Sciences. (n.d.).

Eurostat-HANDBOOK ON DATA QUALITY ASSESSMENT METHODS AND TOOLS I.pdf. (n.d.). Retrieved from http://unstats.un.org/unsd/dnss/docs-nqaf/Eurostat-

Framework of Quality Indicators. (n.d.). Retrieved from http://www.laceproject.eu/deliverables/d3-1-quality-indicators.pdf

Understanding and Managing the Risks of Analytics in Higher Education: A Guide. (n.d.). Retrieved from https://net.educause.edu/ir/library/pdf/epub1201.pdf
