Big Data is going to change our lives. But how exactly? That remains difficult to answer. However, to get a bit closer to an answer, we are going to share our experiences in this blog. Sometimes very practical, sometimes a bit more theoretical; sometimes from a technical point of view, sometimes from a business perspective. But of course we start ... at the beginning.
Many readers have signed up for our blog, which is fantastic. But all those people have different levels of knowledge and different interests when it comes to Big Data. So if you are already a power user of Excel or database programs and you already know a lot about the subject, we ask you for a little bit of patience. We will dive deeper into analytics later. And for those who do not (want to) understand databases and machine learning algorithms: we will also discuss business cases. And the best news is, perhaps, that we also take requests! So if you have a specific question that you would like to be answered here, do not hesitate to email us!
The term Big Data became popular in the 1990s. At that time the term was mainly used to designate datasets that were so large and complex that the databases and analysis tools at that time could not handle them properly. But what was big in the 90s may no longer be considered big or complex today. Despite the fact that storing and processing large data sets has become simpler and cheaper, we have continued to use the term Big Data. Nowadays, when we use the term Big Data, we actually mean any type of (predictive) analysis that can create information from one or more raw datasets. So, it is no longer about the size of the dataset, but more about the methods we use to analyze the data.
Roughly speaking, a Big Data solution consists of six components:
1. Data sources:
Here the data is generated. Usually it concerns transactional data (such as sales), click behavior on websites, e-mails, data collected by sensors, GPS trackers etc.
2. Integration:
The phase in which the source data is moved and sometimes transformed, so that it can be stored, updated or edited more easily.
3. Data stores:
The databases where the analyses are performed. We use databases that are specially designed for analysis. This makes the analysis more efficient, and it means we do not have to burden the source systems with analytical queries.
4. Analytical methods and techniques:
Structured analysis. This can be done in Excel, but also with specialized analysis tools and platforms that support advanced analytical methods. Here, artificial intelligence and machine learning models are developed.
5. Visualization and reporting:
Data visualization, reporting or interactive sharing of information from databases or outcomes of analyses.
6. Integration into applications:
The results and models are integrated into applications that can be used in daily business.
In the last integration step, the results of analyses and models are used in practice. A good and simple example of a machine learning model is the advice that web shops give you as a customer, based on past purchases and search behavior, both yours and those of thousands of other visitors. If you are looking for a book in a web shop like Amazon, you get the advice to buy two more books, for example, under a heading like 'Frequently bought together' or 'Others viewed too'. You probably recognize this phenomenon. The algorithm behind it is an association algorithm, and because of this application it is usually called shopping-basket analysis (also known as market basket analysis). So now you know what it is called too. We will explain another time how that algorithm works. I drew in red how the previous solution would look:
At Tecknoworks we also work on this type of model. It does not necessarily have to be about giving good advice in a bookstore. For example, with a comparable model we try to say something about medication use in patients with diabetes. Based on medication history, we can predict the next medication that a patient will probably need. Other analysis projects we are involved in include detecting fraud in submitted insurance claims, identifying customers who may be about to leave, and predicting the consumption of goods or services over a certain period of time.
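To make the shopping-basket idea a little more concrete, here is a minimal Python sketch. It is not how Amazon or our own models work; it only illustrates the two basic measures behind association rules: support (how often two items appear together across all baskets) and confidence (how often the second item appears in baskets that contain the first). The basket data, the item names and the function name `frequently_bought_together` are all made up for this illustration. Real projects usually rely on dedicated algorithms such as Apriori or FP-Growth, which handle large item sets far more efficiently.

```python
from collections import Counter

# Hypothetical toy data: each set is one customer's shopping basket.
baskets = [
    {"book_a", "book_b", "book_c"},
    {"book_a", "book_b"},
    {"book_a", "book_c"},
    {"book_b", "book_d"},
    {"book_a", "book_b", "book_d"},
]


def frequently_bought_together(baskets, item, min_support=0.3):
    """Rank other items by how often they share a basket with `item`.

    Returns a list of (other_item, support, confidence) tuples, where
    support    = share of all baskets containing both items, and
    confidence = share of baskets with `item` that also contain other_item.
    """
    total = len(baskets)
    item_count = sum(1 for basket in baskets if item in basket)
    if item_count == 0:
        return []  # the item was never bought, so there is nothing to advise

    # Count how often every other item appears alongside `item`.
    pair_counts = Counter()
    for basket in baskets:
        if item in basket:
            for other in basket - {item}:
                pair_counts[other] += 1

    rules = []
    for other, count in pair_counts.items():
        support = count / total
        confidence = count / item_count
        if support >= min_support:  # drop combinations that are too rare
            rules.append((other, support, confidence))

    # Highest confidence first; ties are broken alphabetically.
    rules.sort(key=lambda rule: (-rule[2], rule[0]))
    return rules


for other, support, confidence in frequently_bought_together(baskets, "book_a"):
    print(f"{other}: support={support:.2f}, confidence={confidence:.2f}")
```

With this toy data, `book_b` comes out on top for someone looking at `book_a`, because three of the four baskets containing `book_a` also contain `book_b`. The same counting logic would apply to a medication history: replace baskets with patients and books with medications.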
If you are not yet structurally involved with Big Data projects, you can start today. Take the first step: keep your data. Then think about how you can use that data. Try to discover trends or patterns, and experiment. Start small and build up slowly. Excel is really great to start with. You can always apply complex algorithms later. If you are already working on data projects, I am curious what you have achieved and what obstacles you have run into. Call us or email us, and we can think it through with you! And we may write about it next time.