Identifying Columns with Missing Values in a Table
Dec 4, 2020 by Robert Gravelle
Sometimes a database administrator (DBA) needs to furnish a report on the number of missing values in a table or tables. Whether the goal is to show counts or row content with missing values, there are a couple of ways to go about it, depending on how flexible you want to be about it. The first would be to construct a query against the table(s) in question, using information that you have about field names, data types, and constraints. The second, more elaborate, approach would be to write a stored procedure that fetches column info from the INFORMATION_SCHEMA.COLUMNS table. In today’s blog, we’ll take a look at the non-generic approach, while next week’s blog will address the stored procedure solution.
Since not every field in a table can accept null values, it is helpful to inspect the table design and see which fields may contain nulls. In Navicat database development and admin clients, the Table Design identifies all mandatory columns using a checkbox under the Not null header. Hence, all columns whose checkbox is not checked may contain nulls. Those would be the fields that our query will focus:
One way to find fields with nulls is to build the query using the Query Builder tool. It lets us select many conditions from a menu, including “is null”, “is not null”, “is empty”, “is not empty”, etc…
Once built, we can insert the SQL directly into the Editor:
Here are all rows of a customers table which contain at least one column with a null value:
In cases where we only want statistics on filled versus empty fields, we can use the count() function to tally fields which are either filled or null. In the following query, percentages are expressed as a the number of rows which contain a null value for that particular field:
Likewise, we can count and shows null columns for a specific row, identified here by the customerNumber:
In the above query, a CASE statement is employed to only include null values in the counts. This time, the percentage is showing how many of the total fourteen table columns contain nulls, rounded to 2 decimal places.
In today’s blog, we learned how to query for missing values in a table or tables. Next week’s blog will introduce a more generic approach using a stored procedure. In the meantime, here’s a query for SQL Server to whet your appetite. It fetches column metadata from the INFORMATION_SCHEMA.COLUMNS table in order to generate queries for every table:
The above query will return a list of select queries. We can then copy & paste these to the Navicat Query Editor with a ‘union’ between the selects to find missing values in every table in a database!