Before launching a new lactose-free product, a company needs to understand market demand and consumer needs. Data analysis shows which improvements the product needs and keeps it from reaching shelves before the market is properly understood.
Case Study Overview
A new company plans to launch a lactose-free product in a city. To assess market feasibility, they collect and analyze data from grocery stores and healthcare centers.
Key Areas of Study:
Target Audience Identification:
Frequency of lactose-free product purchases
Age demographics of consumers
Number and age group of lactose-intolerant patients
Feasibility Study:
Comparison with current leading products
Survey of families interested in lactose-free alternatives
Complaints against existing products
Cost Analysis:
Pricing of current leading lactose-free products
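As a rough illustration of the cost analysis, the sketch below summarizes the shelf prices recorded for each competing product and derives a price band from the per-product medians. The function names and the input shape (a product name mapped to the prices observed across stores) are hypothetical, not part of the study as described.

```python
from statistics import median


def price_summary(prices_by_product: dict) -> dict:
    """Summarize observed shelf prices for each competing product."""
    summary = {}
    for product, prices in prices_by_product.items():
        if not prices:
            continue  # skip products with no recorded prices
        summary[product] = {
            "min": min(prices),
            "median": median(prices),
            "max": max(prices),
            "n_stores": len(prices),  # assumes one price observation per store
        }
    return summary


def market_price_band(summary: dict) -> tuple:
    """Return the lowest and highest per-product median price."""
    medians = [stats["median"] for stats in summary.values()]
    return min(medians), max(medians)
```

Given the prices gathered from grocery stores, price_summary reports each product's minimum, median, and maximum price, and market_price_band gives the range spanned by the leading products' medians, which is one possible starting point for positioning a new product.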
Using DataOps in the Survey
What is DataOps?
DataOps is a methodology inspired by DevOps and Agile. It aims to streamline the entire data lifecycle, from collection and cleaning through validation, transformation, and orchestration, so that teams collaborate effectively and decisions rest on high-quality data.
End-to-End DataOps Workflow for Survey
Raw Data Collection:
Survey responses collected in CSV format (e.g., participant demographics, lactose intolerance status, dairy/lactose-free usage frequency)
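A minimal sketch of loading the raw export with Python's standard csv module. The header layout (Participant ID, Age, Gender, and so on) is a hypothetical example, since the actual survey columns are not specified; the loader fails early if any expected column is absent.

```python
import csv
from typing import Iterable

# Hypothetical raw header layout; the real survey export may differ.
EXPECTED_COLUMNS = [
    "Participant ID", "Age", "Gender", "Lactose Intolerant",
    "Dairy Frequency", "Lactose-Free Frequency", "Timestamp",
]


def load_responses(lines: Iterable[str]) -> list:
    """Parse CSV lines into one dict per response, checking the header first."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []  # reading fieldnames consumes the header row
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"raw CSV is missing columns: {missing}")
    return list(reader)


def load_responses_from_file(path: str) -> list:
    """Open a CSV file the way the csv module recommends (newline='')."""
    with open(path, newline="", encoding="utf-8") as f:
        return load_responses(f)
```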
Data Processing (Python):
Cleans the data (standardizes fields, formats timestamps, renames columns)
Runs as an automated Airflow task
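The cleaning step might look like the following sketch. The column mapping, the accepted timestamp formats, and the gender aliases are all hypothetical examples; in the workflow described here, a function like clean_row would be called from the Airflow task rather than run by hand. Values that cannot be parsed become None instead of being dropped, which leaves the decision to the quality check.

```python
from datetime import datetime
from typing import Optional

# Hypothetical mapping from raw survey headers to snake_case column names.
COLUMN_RENAMES = {
    "Participant ID": "participant_id",
    "Age": "age",
    "Gender": "gender",
    "Lactose Intolerant": "lactose_intolerant",
    "Dairy Frequency": "dairy_frequency",
    "Lactose-Free Frequency": "lactose_free_frequency",
    "Timestamp": "submitted_at",
}

# Assumed timestamp formats in the raw export (year-first and day-first).
TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M")

# Hypothetical spellings mapped to one canonical label.
GENDER_ALIASES = {"f": "female", "female": "female", "m": "male", "male": "male"}


def parse_timestamp(raw: str) -> Optional[str]:
    """Return an ISO 8601 string, or None if no known format matches."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).isoformat()
        except ValueError:
            continue
    return None


def clean_row(raw: dict) -> dict:
    """Rename columns, trim and lowercase text fields, and normalize types."""
    # csv.DictReader stores surplus cells under the key None; ignore them here.
    row = {
        COLUMN_RENAMES.get(key, key): (value or "").strip()
        for key, value in raw.items()
        if key is not None
    }
    gender = row.get("gender", "").lower()
    row["gender"] = GENDER_ALIASES.get(gender, gender or None)
    for field in ("lactose_intolerant", "dairy_frequency", "lactose_free_frequency"):
        row[field] = row.get(field, "").lower() or None
    # Keep unparseable ages as None so the quality check can flag them.
    age = row.get("age", "")
    row["age"] = int(age) if age.isdigit() else None
    row["submitted_at"] = parse_timestamp(row.get("submitted_at", ""))
    return row
```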
Data Quality Check (Great Expectations):
Checks that fields such as gender, usage frequency, and diagnosis contain only expected values
Flags missing or invalid entries before further processing
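Great Expectations expresses checks like these as declarative expectations (for example expect_column_values_to_not_be_null and expect_column_values_to_be_in_set). To keep the example runnable without the library, the sketch below implements the same idea in plain Python; it does not use the Great Expectations API. The allowed value sets, required fields, and age range are assumptions, not values taken from the survey.

```python
# Hypothetical codebook: allowed values per categorical field after cleaning.
ALLOWED_VALUES = {
    "gender": {"female", "male", "other"},
    "lactose_intolerant": {"yes", "no", "unsure"},
    "lactose_free_frequency": {"never", "monthly", "weekly", "daily"},
}
REQUIRED_FIELDS = ("participant_id", "age", "gender", "lactose_intolerant", "submitted_at")
AGE_RANGE = (1, 120)  # assumed plausible ages, inclusive


def validate_row(row: dict) -> list:
    """Return a list of problems found in one cleaned row (empty if it passes)."""
    problems = []
    # Not-null check, similar in spirit to expect_column_values_to_not_be_null.
    for field in REQUIRED_FIELDS:
        if row.get(field) in (None, ""):
            problems.append(f"{field}: missing")
    # Value-set check, similar in spirit to expect_column_values_to_be_in_set.
    for field, allowed in ALLOWED_VALUES.items():
        value = row.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    age = row.get("age")
    if age is not None and not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        problems.append(f"age: out of range ({age})")
    return problems


def split_valid(rows: list) -> tuple:
    """Separate rows that pass every check from rows flagged for review."""
    valid, flagged = [], []
    for row in rows:
        issues = validate_row(row)
        if issues:
            flagged.append((row, issues))
        else:
            valid.append(row)
    return valid, flagged
```

Flagged rows keep their list of problems, so they can be reviewed or reported before anything downstream consumes the data.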
Data Modeling (dbt):
Aggregates cleaned data (e.g., by age group, gender, diagnosis) into summary tables for analysis
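A dbt model is essentially a SQL SELECT statement that dbt materializes as a table or view. The sketch below runs a comparable aggregation through Python's built-in sqlite3 module, so it works without a warehouse or a dbt project. The table name cleaned_responses, the age bands, and the definition of a frequent buyer (weekly or daily purchases) are hypothetical choices.

```python
import sqlite3

# SQL of the kind a dbt summary model might contain (age bands are assumptions).
SUMMARY_SQL = """
SELECT
    CASE
        WHEN age < 18 THEN 'under_18'
        WHEN age < 35 THEN '18_34'
        WHEN age < 55 THEN '35_54'
        ELSE '55_plus'
    END AS age_group,
    gender,
    lactose_intolerant,
    COUNT(*) AS respondents,
    SUM(CASE WHEN lactose_free_frequency IN ('weekly', 'daily') THEN 1 ELSE 0 END)
        AS frequent_buyers
FROM cleaned_responses
GROUP BY age_group, gender, lactose_intolerant
ORDER BY age_group, gender, lactose_intolerant
"""


def build_summary(rows: list) -> list:
    """Load validated rows (age not NULL) into SQLite and return summary tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE cleaned_responses ("
            "age INTEGER, gender TEXT, lactose_intolerant TEXT, "
            "lactose_free_frequency TEXT)"
        )
        # Named placeholders are filled from each row dict by key.
        conn.executemany(
            "INSERT INTO cleaned_responses VALUES "
            "(:age, :gender, :lactose_intolerant, :lactose_free_frequency)",
            rows,
        )
        return conn.execute(SUMMARY_SQL).fetchall()
    finally:
        conn.close()
```

Each returned tuple is (age_group, gender, lactose_intolerant, respondents, frequent_buyers), which maps directly onto the target-audience questions listed earlier.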
Workflow Orchestration (Apache Airflow DAG):
Automates the pipeline:
Step 1: Clean raw data
Step 2: Validate cleaned data
Uses dependency management to ensure every task runs in the correct order
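Airflow derives task order from the dependency graph declared in a DAG. The sketch below imitates that behaviour with the standard library's graphlib.TopologicalSorter; it is not Airflow code. It also wires in a third task, model, for the dbt step listed earlier in the workflow, which the two-step list here does not show, so treat that wiring as an assumption.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

Task = Callable[[dict], object]


def run_pipeline(tasks: Dict[str, Task], upstream: Dict[str, Set[str]]) -> tuple:
    """Run every task after its upstream tasks; return the run order and results."""
    # Tasks with no upstream entry still need to appear as graph nodes.
    graph = {name: upstream.get(name, set()) for name in tasks}
    order, results = [], {}
    # static_order() yields each node only after all of its predecessors
    # and raises graphlib.CycleError if the dependencies form a cycle.
    for name in TopologicalSorter(graph).static_order():
        # Each task receives the results of the tasks that already ran.
        results[name] = tasks[name](results)
        order.append(name)
    return order, results
```

In an actual Airflow DAG the same dependencies would typically be declared between operators with the bitshift syntax (clean >> validate >> model). A cycle in the upstream mapping makes static_order raise graphlib.CycleError, much as Airflow refuses to load a DAG that contains a cycle.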