A well-known quote in the business world is “cash is king.” Nothing will destroy a business faster than a lack of liquidity to meet a financial emergency. What your worth may not matter as much as what you can spend that makes a difference.
However, there is now a challenge to this mantra. In the world of data science, there is the belief that data is king. This can potentially make sense as using data to foresee financial disaster can help people to have cash ready.
In this post, we are going to examine the different types of data in the world of data science. Generally, there are two types of data which are unstructured and structured data.
Unstructured data is data that is produced by people. Normally, this data is text heavy. Examples of unstructured data include twits on Twitter, customer feedback on Amazon, blogs, emails, etc. This type of data is very challenging to work with because it is not necessarily in a format for analysis.
Despite the challenges, there are techniques available for using this information to make decisions. Often, the analysis of unstructured data is used to target products and make recommendations for purchases by companies.
Structured data is in many ways the complete opposite of unstructured data. Structured data has a clear format and a specific place for various pieces of data. An excel document is one example of structured data. A receipt is another example. A receipt has a specific place for different pieces of information such as price, total, date, etc. Often, structured data is made by organizations and machines.
Naturally, analyzing structured data is often much easier than unstructured data. With a consistent format, there is less processing required before analysis.
Working With Data
When approaching a project, data often comes from several sources. Normally, the data has to be moved around and consolidated into one space for analysis. When working with unstructured and or structured data that is coming from several different sources, there is a three-step process used to facilitate this. The process is called ETL which stands for extract, transform, and load.
Extracting data means taking it from one place and planning to move it somewhere else. Transform means changing the data in some way or another. For example, this often means organizing it for the purposes of answer research questions. How this is done is context specific.
Load simply means placing all the transformed data into one place for analysis. This is a critical last step as it is helpful to have what you are analyzing in one convenient place. The details of this will be addressed in a future post.
In what may be an interesting contradiction, as we collect more and more data, data is actually becoming more valuable. Normally, an increase in a resource lessens its value but not with data. Organizations are collecting data at a recording break in order to anticipate the behavior of people. This predictive power derived from data can lead to significant profits, which leads to the conclusion that perhaps data is now the king.