Open Source Data Quality TestGen by DataKitchen

Data Profiling Is Just The Start

TestGen profiles your tables, builds a data catalog, detects 27 common data hygiene issues, and generates thousands of data quality tests—automatically.
Open Source And Complete. Low Cost Enterprise Version

The Open Source Way To Data Profiling and Data Quality


51 Data Profiling Types

Uncover column-level insights and understand problematic rows.

27 Data Hygiene Detectors

After profiling completes, TestGen automatically identifies 27 common data errors for your review.

Blazing-Fast In-Database Execution

TestGen pushes queries directly into your database for speed & security.

Data Catalog

A full 360° view of metadata, hygiene issues, PII risks, data test results, and Critical Data Elements.

Data Quality Scoring & Dashboards

Automated customizable scorecards with drill-down actions to improve data quality.

One Button Data Quality Checks

Instantly generate 1000s of automated data quality tests. Start fast, scale effortlessly.

Anomaly Detection

Stay ahead of data issues with automated alerts on freshness, volume, schema, and data drift.

Shareable Issue Reports

Short on time? Drive influence and action on data quality by sharing an issue report with a single click.

All the Checkmarks. None of the Typical Cost Burden.


DataKitchen's TestGen delivers enterprise-grade capabilities without enterprise-level costs, democratizing access to critical data quality tools.

What Are TestGen's 51 Data Profiling Column Characteristics?

Data profiling is the periodic X-ray of tables in a database to gather extensive information about the contents of each column. Results are stored in a standard table in DataOps TestGen. This table is available for direct review and is used to derive downstream rules.

Examples include:
• Averages
• Column & Table Types & Names
• Date Characteristics
• Min/Max Value
• Numeric Counts
• Percentiles
• Positions
• Unique Values
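
To make column-level profiling concrete, here is a minimal Python sketch that computes a few of the characteristics listed above (counts, min/max, average, percentiles) for one column held in memory. It is not TestGen's implementation: TestGen pushes its profiling queries into your database, and the function name `profile_column` and its output keys are hypothetical.

```python
import statistics
from typing import Any, Optional


def profile_column(values: list[Optional[Any]]) -> dict[str, Any]:
    """Compute a small subset of column profile characteristics.

    Hypothetical illustration only; None stands in for a database NULL.
    """
    non_null = [v for v in values if v is not None]
    profile: dict[str, Any] = {
        "record_ct": len(values),                  # total rows
        "null_ct": len(values) - len(non_null),    # NULL rows
        "distinct_ct": len(set(non_null)),         # unique non-NULL values
    }
    if non_null:
        # Assumes all non-NULL values are mutually comparable (one data type)
        profile["min_value"] = min(non_null)
        profile["max_value"] = max(non_null)

    # Numeric-only characteristics; bools are excluded even though they are ints
    numbers = [v for v in non_null
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if len(numbers) >= 2:
        profile["avg_value"] = statistics.fmean(numbers)
        # quantiles(n=4) returns the three quartile cut points (25th, 50th, 75th)
        p25, p50, p75 = statistics.quantiles(numbers, n=4)
        profile.update(pct_25=p25, pct_50=p50, pct_75=p75)
    return profile


print(profile_column([3, 1, None, 4, 1, 5]))
```

For `[3, 1, None, 4, 1, 5]` the sketch reports one NULL, four distinct values, an average of 2.8, and a median (`pct_50`) of 3.0. Storing one such record per column is what makes the results reviewable and reusable for deriving downstream rules.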

What Are Examples Of The Data Hygiene Issues TestGen Finds After Data Profiling?

Once data profiling is complete, Data Hygiene Detection Tests automatically check how closely data structures and assumptions match the actual contents of each column. The results help the Data Engineer refine data structure definitions and decide where to add data ‘patching’ steps, producing a more usable, analyzable dataset.

Examples include:

  • Invalid Zip Code Format
  • Leading Spaces
  • Mostly Dates In String
  • Mostly Not Null, Empty, or Filled Values
  • Multiple Data Types Per Column Name
  • No Column Values Present
  • Non-standard Blank Values
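
Several of the detectors above can be approximated with simple rules. The Python sketch below is a hypothetical illustration, not TestGen's detection logic: the placeholder-blank list, the 80% "mostly" threshold, the accepted date formats, the US ZIP pattern, and the rule that the ZIP check applies only to columns whose name contains "zip" are all assumptions chosen for the example.

```python
import re
from datetime import datetime
from typing import Optional

# All thresholds, patterns, and placeholder values below are hypothetical.
PLACEHOLDER_BLANKS = {"n/a", "na", "null", "none", "-", "?"}
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")   # US ZIP or ZIP+4
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")
MOSTLY = 0.8  # share of values that must match for a "mostly" finding


def is_nonstandard_blank(value: str) -> bool:
    """Placeholder text or whitespace-only strings standing in for NULL."""
    stripped = value.strip()
    return stripped.lower() in PLACEHOLDER_BLANKS or (value != "" and stripped == "")


def looks_like_date(text: str) -> bool:
    """True if the text parses with any of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def hygiene_issues(column_name: str, values: list[Optional[str]]) -> list[str]:
    """Return the names of hygiene issues detected in one string column."""
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return ["No Column Values Present"]

    issues = []
    if any(is_nonstandard_blank(v) for v in present):
        issues.append("Non-standard Blank Values")

    # Ignore placeholder blanks when judging the "real" values
    real = [v for v in present if not is_nonstandard_blank(v)]
    if any(v != v.lstrip() for v in real):
        issues.append("Leading Spaces")
    if real and sum(looks_like_date(v.strip()) for v in real) / len(real) >= MOSTLY:
        issues.append("Mostly Dates In String")
    if "zip" in column_name.lower() and any(
            not ZIP_PATTERN.fullmatch(v.strip()) for v in real):
        issues.append("Invalid Zip Code Format")
    return issues


print(hygiene_issues("ship_date", ["2024-01-05", " 2024-02-10", "N/A", None, "2024-03-01"]))
```

Here the sample `ship_date` column triggers Non-standard Blank Values ("N/A"), Leading Spaces (" 2024-02-10"), and Mostly Dates In String, which is the kind of finding that would prompt a Data Engineer to change the column to a date type and add a patching step.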

What Are The 31 Data Tests TestGen Creates Automatically?

Automatically Generated Data Tests cast a wide net for data problems that targeted tests devised in advance can’t predict. They work like a home burglar alarm: you place sensors at every possible entrance because you can’t know which single window a burglar will try. When refining these tests, your goal is to stay highly sensitive to real problems while minimizing false positives that aren’t worth following up on.

Examples include:

  • Alpha Truncation
  • Average Shift
  • Constant Value Present
  • Daily Record Count
  • Value present in List-of-Values
  • Distinct Value Change
  • Future Date
  • Incremental Average Shift
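
One way to picture automatic test generation: each test's threshold is taken from the stored profile, then re-checked against every new batch of data. The Python sketch below assumes a hypothetical `Baseline` record and builds three tests named after items in the list above; the pass/fail rules (longest value not shorter than at profiling time, mean within 0.5 standard deviations, no dates after today) are simplified assumptions, not TestGen's definitions.

```python
import statistics
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class Baseline:
    """Hypothetical subset of a stored profile for one column."""
    max_length: int  # longest string seen at profiling time
    mean: float      # average of numeric values at profiling time
    stdev: float     # standard deviation of numeric values at profiling time


def generate_tests(base: Baseline, shift_threshold: float = 0.5,
                   today: Optional[date] = None) -> dict[str, Callable[[list], bool]]:
    """Build pass/fail checks whose thresholds come from the baseline profile."""
    as_of = today or date.today()

    def alpha_truncation(texts: list[str]) -> bool:
        # Fails if the longest value is now shorter than at profiling time
        return max(len(t) for t in texts) >= base.max_length

    def average_shift(numbers: list[float]) -> bool:
        # Fails if the mean moved more than shift_threshold standard deviations
        spread = base.stdev if base.stdev > 0 else 1e-9
        return abs(statistics.fmean(numbers) - base.mean) / spread <= shift_threshold

    def future_date(dates: list[date]) -> bool:
        # Fails if any date lies after the as-of date
        return all(d <= as_of for d in dates)

    return {
        "Alpha Truncation": alpha_truncation,
        "Average Shift": average_shift,
        "Future Date": future_date,
    }


tests = generate_tests(Baseline(max_length=12, mean=100.0, stdev=10.0),
                       today=date(2024, 6, 1))
print(tests["Alpha Truncation"](["short", "exactly12chr"]))          # True
print(tests["Average Shift"]([120.0, 130.0]))                        # False
print(tests["Future Date"]([date(2024, 5, 31), date(2024, 7, 1)]))   # False
```

Because every threshold comes from the profile, one profiling run can yield many tests without hand-written SQL; tuning a parameter such as `shift_threshold` is how you would trade sensitivity against false positives, as described above.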

What Are TestGen's 10 Types Of Custom Tests?

Business Rule Configurable Data Tests let you configure data quality validation tests that can’t be inferred automatically from prior data. Because the Business Rule and Data Test logic is already programmed, tested, and verified to work, setting up these tests is faster and easier than writing custom SQL. You can configure them in the UI and share them with business users to collaborate on rules and documentation. User-created configurable Data Tests let you build reusable data quality validation tests tailored to your data sets and customers.

Examples include:

  • Data Match
  • Prior Match
  • Aggregate Match No Drops
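
Business rule tests are configured in the TestGen UI rather than written as code, but the comparison behind a rule like Aggregate Match No Drops can be sketched in a few lines. The Python below assumes the rule means "no group's total in the target data may fall below its total in the reference data" and that a group missing from the target counts as a drop; both that interpretation and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def group_totals(rows: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Sum the measure for each group key (like SUM(...) GROUP BY key)."""
    totals: dict[str, float] = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def aggregate_match_no_drops(reference: Iterable[tuple[str, float]],
                             target: Iterable[tuple[str, float]]) -> list[str]:
    """Return group keys whose target total fell below the reference total.

    A group missing from the target counts as a drop to zero (an assumption).
    An empty result means the test passes.
    """
    ref = group_totals(reference)
    tgt = group_totals(target)
    return sorted(k for k, total in ref.items() if tgt.get(k, 0.0) < total)


reference = [("east", 100.0), ("west", 50.0), ("north", 20.0)]
target = [("east", 60.0), ("east", 45.0), ("west", 40.0)]
print(aggregate_match_no_drops(reference, target))  # ['north', 'west']
```

Returning the failing group keys, rather than a bare pass/fail flag, mirrors the goal of understanding problematic rows instead of only counting them.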

How Do I Go From Data Profiling And Testing to End-to-End Data Observability?

Data quality testing is the start, not the end. DataKitchen Open Source DataOps Observability monitors every data journey, from source to customer value, so that problems are detected, localized, and understood immediately.

Who Are You Guys?

DataKitchen: Predictable & Sustainable
• Transparent Pricing: The Enterprise version is just $100 per user per connection, so costs are predictable and scale reasonably. By contrast, typical vendors have opaque pricing and raise prices.
• No Venture-Backed Uncertainty: We won't suddenly pivot pricing models or sunset features. We are a profitable, stable, reliable partner committed to customer success.


AI Drives The Brutal Math Of Manual Data Quality

AI Data Quality Crisis and Open Source

The terrifying truth is that AI amplifies data quality and data observability failures exponentially. A single schema drift that once meant a broken report now means thousands of incorrect predictions per second.

That missing data validation you postponed? It just trained your model to be confidently wrong at scale.