Data discovery tools are a relatively recent phenomenon, as shown by the fact that there is no separate Gartner Magic Quadrant for them. They allow users without programming skills to wrangle and transform raw data.
There are two main use cases for this type of tool:
(1) Self-service Business Intelligence. Self-service BI promises to let business users without development skills create reports and get insights from their data. We have been promised the breakthrough of self-service Business Intelligence for the last ten years. Unfortunately, it never happened. The main reason is that existing BI tools are not up to the challenge of transforming raw data: they simply lack this functionality. As a result, workarounds have to be implemented. When pointed at raw data sources, these tools tend to fail miserably, and users become more and more frustrated.
(2) Self-service Analytics is the other use case for data discovery tools. Data discovery tools help data scientists with various stages of the modelling lifecycle. First, they help them profile and explore the data. In this step, data scientists gain a core understanding of their data, such as outliers, patterns, correlations and data distributions. Second, they help to get the data into shape for predictive models. Many predictive models require the data in a specific format: for example, the feature vector may have to contain only numeric values, and feature values may not be allowed to be empty. Data discovery tools help the data scientist transform the data at hand into the required form, and they also help with feature selection. In the past, this typically required programming skills in R or Python. Data discovery tools promise to make this process easier. Third, data discovery tools offer functionality to train, evaluate, and compare models. Finally, a good data discovery tool will help to deploy and retrain a model. In summary, data discovery tools take the programming out of data science.
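To make the profiling and data-shaping steps concrete, here is a minimal sketch of the kind of hand-written Python preprocessing that data discovery tools aim to replace. The records, field names and the helper functions `profile` and `to_feature_vectors` are hypothetical and chosen only for illustration. Mean imputation and one-hot encoding are two common ways to meet the "numeric, no empty values" requirement, not the only ones, and feature selection, training and deployment are left out.

```python
from statistics import mean

# Hypothetical raw records, e.g. customer rows pulled from a data lake.
# Field names and values are illustrative assumptions only.
RAW = [
    {"age": 34, "country": "IE", "spend": 120.0},
    {"age": None, "country": "UK", "spend": 80.0},
    {"age": 51, "country": "IE", "spend": None},
]


def profile(rows, field):
    """Summarise one column: missing count, plus min/max/mean if numeric."""
    values = [r[field] for r in rows if r[field] is not None]
    summary = {"missing": len(rows) - len(values)}
    if values and all(isinstance(v, (int, float)) for v in values):
        summary.update(min=min(values), max=max(values), mean=mean(values))
    return summary


def to_feature_vectors(rows, numeric, categorical):
    """Turn raw rows into all-numeric, gap-free feature vectors.

    Missing numeric values are replaced by the column mean (mean imputation);
    each categorical column is one-hot encoded over its observed levels.
    Assumes every numeric column has at least one non-missing value.
    """
    means = {f: mean(r[f] for r in rows if r[f] is not None) for f in numeric}
    levels = {f: sorted({r[f] for r in rows if r[f] is not None}) for f in categorical}
    vectors = []
    for r in rows:
        # Numeric features first, with gaps filled by the column mean.
        vec = [float(r[f]) if r[f] is not None else float(means[f]) for f in numeric]
        # Then one 0/1 indicator per categorical level.
        for f in categorical:
            vec.extend(1.0 if r[f] == level else 0.0 for level in levels[f])
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    print(profile(RAW, "age"))  # {'missing': 1, 'min': 34, 'max': 51, 'mean': 42.5}
    for v in to_feature_vectors(RAW, ["age", "spend"], ["country"]):
        print(v)
    # [34.0, 120.0, 1.0, 0.0]
    # [42.5, 80.0, 0.0, 1.0]
    # [51.0, 100.0, 1.0, 0.0]
```

Even this toy version forces decisions (how to fill gaps, how to encode categories, what to do with an all-empty column) that a data discovery tool would typically surface through its interface instead of code.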
Data discovery tools work best when they are paired with a data lake, a central data hub for all of the raw enterprise data.
We are a Big Data company based in Ireland. We are experts in data lake implementations, clickstream analytics, real-time analytics, and data warehousing on Hadoop and Spark. We also run the Hadoop User Group Ireland (HUG Ireland). We can help with your Big Data implementation. Get in touch today; we would love to hear from you!