AI & the Datafication of Our Everyday Lives, Get Smarter About Data, 3 Free Courses, & More
Ed 12 | The grftf Newsletter
Hi friends -
There is a hidden industry that has taken over the world. Data.
Everyone talks about artificial intelligence, but AI cannot function without data. In fact, the much-vaunted large language models that have overrun our lives quickly degrade without a constant feed of high-quality, human-sourced data.
No data = no AI
As AI eats the world, the underlying data industry is really calling the shots.
In this edition, we explore this secret world of data.
Enjoy!
Misha
What you’re getting this week:
The grftf Podcast Ep 12 | AI & the Datafication of Our Everyday Lives — Wendy Wong
A Day in Data
How Data Scales: Bytes to Yottabytes
5 Data Terms that Will Make You Smarter
Types of Big Data
The 6 Vs of Big Data
3 Free Courses
Latest from The grftf Podcast
Ep 12 | AI & the Datafication of Our Everyday Lives — Wendy Wong
I have an eye-opening conversation with Dr. Wendy H. Wong about the $400 billion data industry that’s taking over our lives.
More about Wendy H. Wong:
Wendy is Professor of Political Science and Principal's Research Chair at the University of British Columbia, Okanagan, where she studies global governance and the governance of emerging technologies.
She is particularly attentive to how non-state actors (e.g. nongovernmental organizations, civil society actors, social movements, corporations) govern at the global and domestic levels. Her areas of interest are AI, Big Data, human rights, and humanitarian assistance.
She has written two award-winning books and dozens of peer-reviewed articles, and has contributed to outlets such as The Globe and Mail, The Toronto Star, and The Conversation.
She received her PhD from UC San Diego and did her undergraduate studies at UC Berkeley.
Her latest book is We, the Data: Human Rights in the Digital Age
Listen on Apple Podcasts, Spotify, YouTube, Amazon Music, iHeart, & wherever you get your podcasts.
Share this newsletter & help the world get ready for the future.
A Day in Data
An explosion of data created via the internet is powering the age of AI.
Here’s what’s happening every day:
How Data Scales: Bytes to Yottabytes
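As a rough sketch of the scale: in the decimal (SI) convention, each step from bytes up to yottabytes multiplies by 1,000. The helper name `humanize_bytes` below is hypothetical, chosen only for illustration.

```python
# Decimal (SI) storage units; each is 1,000 times the one before it.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def humanize_bytes(n: int) -> str:
    """Express a byte count in the largest SI unit that keeps the value at 1 or more."""
    index = 0
    # Climb one unit at a time, stopping at yottabytes (the last entry).
    while n >= 1000 ** (index + 1) and index < len(UNITS) - 1:
        index += 1
    return f"{n / 1000 ** index:g} {UNITS[index]}"

# Walk the scale from 1 byte (10^0) to 1 yottabyte (10^24).
for power in range(0, 25, 3):
    print(f"10^{power} bytes = {humanize_bytes(10 ** power)}")
```

A yottabyte is 10^24 bytes, a trillion times larger than a terabyte.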
5 Data Terms that Will Make You Smarter
Data literacy - the ability to read, work with, analyze, and argue with data. It is a collection of skills needed to help people navigate the digital age.
Datafication - a technological trend turning many aspects of our life into data which is subsequently transferred into information realized as a new form of value. Datafication is not the same as digitization, which takes analog content (books, films, photographs) and converts it into digital information, a sequence of ones and zeros that computers can read. Datafication is a far broader activity: taking all aspects of life and turning them into data [...] Once we datafy things, we can transform their purpose and turn the information into new forms of value. ~ Wikipedia
Big Data - the origins of the term “big data” can be traced back to the 1990s. John Mashey, then chief scientist at Silicon Graphics, was the first to use the phrase to describe large data sets. The concept further gained prominence in the early 2000s when digital innovations became widespread, creating more data.
Data Lake - a central location that holds a large amount of data in its native, raw format. Compared to a hierarchical data warehouse, which stores data in files or folders, a data lake uses a flat architecture and object storage to store the data. Object storage stores data with metadata tags and a unique identifier, which makes it easier to locate and retrieve data across regions and improves performance. By leveraging inexpensive object storage and open formats, data lakes enable many applications to take advantage of the data.
Data Mining - the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal of extracting information (with intelligent methods) from a data set and transforming the information into a comprehensible structure for further use. ~ Wikipedia
Data Analytics - the collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision making. Data analytics encompasses data analysis (the process of deriving information from data), data science (using data to theorize and forecast) and data engineering (building data systems). Data analysts, data scientists, and data engineers are all data analytics professionals.
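To make the Data Lake entry more concrete, here is a minimal sketch of the flat object-storage idea: every item gets a unique identifier and a set of metadata tags, with no folder hierarchy. The `ObjectStore` class and its methods are hypothetical, an in-memory toy rather than any real storage service's API, and the sample records are invented.

```python
import uuid

class ObjectStore:
    """Toy in-memory stand-in for object storage (hypothetical, not a real API)."""

    def __init__(self):
        # Flat namespace: object ID -> (raw bytes, metadata tags). No folders.
        self._objects = {}

    def put(self, data: bytes, **tags: str) -> str:
        """Store raw data as-is and return its unique identifier."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (data, tags)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Retrieve the raw data by its identifier."""
        return self._objects[object_id][0]

    def find(self, **tags: str) -> list[str]:
        """Return the IDs of all objects whose metadata matches every given tag."""
        return [
            oid
            for oid, (_, meta) in self._objects.items()
            if all(meta.get(key) == value for key, value in tags.items())
        ]

store = ObjectStore()
# Invented sample objects in their native formats.
log_id = store.put(b"2024-01-01 GET /home 200", source="web", format="log")
img_id = store.put(b"\x89PNG...", source="mobile", format="png")

print(store.find(source="web"))
```

Because lookups go through identifiers and tags rather than a folder path, very different kinds of raw data can sit side by side in the same store.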
Types of Big Data
Big data has three main classifications: business or structured data, unstructured data, and semi-structured data.
1. Structured data
It is data organized into a predefined format, such as the rows and columns of a database table. It is the type of data most familiar to us, as it’s what we see in our everyday lives: birthdays, addresses, and phone numbers, for example.
The business value of structured data lies in how well an organization can utilize its existing systems and processes for analysis purposes. This information typically comes from internal sources, such as ERP systems and CRM applications.
2. Unstructured data
It is information that does not have a predefined data model or format associated with it. This includes text documents (e.g., reports), email messages, social media feeds (Facebook status updates), video/audio recordings, etc. Unstructured data, in contrast to structured data, does not have predefined field names or variable constraints.
The business value of unstructured information lies in how well an organization can utilize its existing systems and processes for analysis purposes. This information typically comes from external sources such as social media platforms and other web-based data feeds.
Additionally, unstructured information is sometimes described as “dark data” because it often goes unanalyzed without the proper software tools or expertise to make sense of this type of content.
3. Semi-structured data
It is the information that falls somewhere between structured and unstructured data. It has no rigid schema, but it carries tags or keys that describe its contents, as in JSON or XML files. This type of information typically comes from external sources, such as social media platforms or other web-based data feeds. Semi-structured content is often used to store metadata about a business process (e.g., an API), but it can also include files containing machine instructions for computer programs.
The business value of semi-structured data lies in how well an organization can utilize its existing systems and processes for analysis purposes.
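The three types can be compared side by side in a short sketch. This is an illustration only: the records below are invented sample values, and a word count stands in for the much heavier processing that unstructured data usually needs.

```python
import csv
import io
import json

# Structured: fixed columns, and every row has the same fields.
structured = io.StringIO("name,birthday,phone\nAda,1815-12-10,555-0100\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but the fields can vary per record.
semi_structured = json.loads(
    '{"user": "ada", "post": {"text": "Hello", "tags": ["ai", "data"]}}'
)

# Unstructured: free text with no field names; any structure must be inferred.
unstructured = "Hi team, Ada's birthday is coming up in December. Call her to plan something."
word_count = len(unstructured.split())

print(rows[0]["birthday"])              # direct lookup by column name
print(semi_structured["post"]["tags"])  # lookup by nested key
print(word_count)                       # only crude measures come for free
```

The structured and semi-structured records can be queried by field name right away, while the free text gives up nothing more than a word count until extra tools are applied.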
The 6 Vs of Big Data
Volume: Big data sets contain large quantities of data ingested from numerous sources, such as IoT devices, web browsing activity, social media, and other apps and equipment.
Velocity: Big data sources generate data in real time or near real time, requiring organizations to implement a data architecture that handles a constant stream of data.
Variety: Big data is usually a mix of structured, semi-structured, and unstructured data.
Veracity: The data may be of varying quality and require processing and integration with other data to provide value.
Value: Organizations can use insights from big data to improve products, build campaigns, and leverage them in many other ways that bring value to the business.
Variability: The stream of data is unpredictable. Specific events, such as holidays, may result in an increased data flow.
3 Free Data Science Courses