Dataset statistics
| Number of variables | 5 |
|---|---|
| Number of observations | 200 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 7.9 KiB |
| Average record size in memory | 40.6 B |
Variable types
| Numeric | 4 |
|---|---|
| Categorical | 1 |
Alerts
CustomerID is highly correlated with Annual Income (k$) | High correlation |
Annual Income (k$) is highly correlated with CustomerID | High correlation |
CustomerID is highly correlated with Age and 2 other fields | High correlation |
Age is highly correlated with CustomerID and 1 other fields | High correlation |
Annual Income (k$) is highly correlated with CustomerID and 1 other fields | High correlation |
Spending Score (1-100) is highly correlated with CustomerID and 2 other fields | High correlation |
CustomerID is uniformly distributed | Uniform |
CustomerID has unique values | Unique |
Reproduction
| Analysis started | 2023-03-27 15:14:17.825636 |
|---|---|
| Analysis finished | 2023-03-27 15:14:22.032250 |
| Duration | 4.21 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
CustomerID
Real number (ℝ≥0)
HIGH CORRELATION, UNIFORM, UNIQUE
| Distinct | 200 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 100.5 |
| Minimum | 1 |
| Maximum | 200 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 10.95 |
| Q1 | 50.75 |
| median | 100.5 |
| Q3 | 150.25 |
| 95-th percentile | 190.05 |
| Maximum | 200 |
| Range | 199 |
| Interquartile range (IQR) | 99.5 |
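The quantiles above are consistent with linear interpolation between order statistics. The following is a minimal sketch, assuming CustomerID is exactly the integers 1 to 200 (as the minimum, maximum, and distinct count suggest). The helper `percentile` is hypothetical and is not part of pandas-profiling.

```python
def percentile(sorted_values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    pos = q * (len(sorted_values) - 1)        # fractional index into the sorted data
    lo = int(pos)                             # index of the lower neighbour
    hi = min(lo + 1, len(sorted_values) - 1)  # index of the upper neighbour
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


# Assumption: CustomerID takes the values 1, 2, ..., 200.
customer_ids = list(range(1, 201))

q05 = percentile(customer_ids, 0.05)
q1 = percentile(customer_ids, 0.25)
median = percentile(customer_ids, 0.50)
q3 = percentile(customer_ids, 0.75)
q95 = percentile(customer_ids, 0.95)
iqr = q3 - q1

print(round(q05, 2), q1, median, q3, round(q95, 2), iqr)
```

Under that assumption the values agree with the table: the fractional position 0.25 × 199 = 49.75 falls three quarters of the way between the 50th and 51st values, which gives Q1 = 50.75.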
Descriptive statistics
| Standard deviation | 57.87918451 |
|---|---|
| Coefficient of variation (CV) | 0.5759122837 |
| Kurtosis | -1.2 |
| Mean | 100.5 |
| Median Absolute Deviation (MAD) | 50 |
| Skewness | 0 |
| Sum | 20100 |
| Variance | 3350 |
| Monotonicity | Strictly increasing |
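The descriptive statistics above can be cross-checked with Python's standard library. This is a sketch under the assumption that CustomerID is the sequence 1 to 200 in row order; it is not how pandas-profiling computes the figures internally. Note that the reported variance of 3350 matches the sample variance (n - 1 denominator), not the population variance.

```python
import statistics

# Assumption: CustomerID takes the values 1, 2, ..., 200 in row order.
customer_ids = list(range(1, 201))

total = sum(customer_ids)
mean = statistics.mean(customer_ids)
variance = statistics.variance(customer_ids)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(customer_ids)      # square root of the sample variance
cv = std_dev / mean                           # coefficient of variation

# Median absolute deviation: median of absolute distances from the median.
median = statistics.median(customer_ids)
mad = statistics.median([abs(x - median) for x in customer_ids])

# Strictly increasing: every value is larger than the one before it.
strictly_increasing = all(a < b for a, b in zip(customer_ids, customer_ids[1:]))

print(total, mean, variance, std_dev, cv, mad, strictly_increasing)
```

A column with these properties (unique, uniform, strictly increasing) behaves like a row identifier, so it is usually excluded before modelling.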
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 1 | 0.5% |
| 138 | 1 | 0.5% |
| 128 | 1 | 0.5% |
| 129 | 1 | 0.5% |
| 130 | 1 | 0.5% |
| 131 | 1 | 0.5% |
| 132 | 1 | 0.5% |
| 133 | 1 | 0.5% |
| 134 | 1 | 0.5% |
| 135 | 1 | 0.5% |
| Other values (190) | 190 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 |
| Value | Count | Frequency (%) |
| 200 | 1 | |
| 199 | 1 | |
| 198 | 1 | |
| 197 | 1 | |
| 196 | 1 | |
| 195 | 1 | |
| 194 | 1 | |
| 193 | 1 | |
| 192 | 1 | |
| 191 | 1 |
Gender
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.7 KiB |
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 5.12 |
| Min length | 4 |
Characters and Unicode
| Total characters | 1024 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
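As an illustration of how such character properties can be derived, the sketch below uses Python's `unicodedata` module to count characters and their Unicode general categories. It assumes the column holds 112 "Female" and 88 "Male" values, as listed in the Common Values table; the variable names are illustrative only.

```python
import unicodedata
from collections import Counter

# Assumption: 112 "Female" and 88 "Male" values, per the Common Values table.
genders = ["Female"] * 112 + ["Male"] * 88

text = "".join(genders)

# Count each character and each Unicode general category
# ("Ll" = Lowercase Letter, "Lu" = Uppercase Letter).
char_counts = Counter(text)
category_counts = Counter(unicodedata.category(ch) for ch in text)

total_chars = len(text)
distinct_chars = len(char_counts)
mean_length = total_chars / len(genders)

print(total_chars, distinct_chars, mean_length)
print(char_counts.most_common())
print(category_counts)
```

Under that assumption, the totals line up with the report: 1024 characters, 6 distinct characters, and a mean length of 5.12.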
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Male |
|---|---|
| 2nd row | Male |
| 3rd row | Female |
| 4th row | Female |
| 5th row | Female |
Common Values
| Value | Count | Frequency (%) |
| Female | 112 | |
| Male | 88 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| female | 112 | |
| male | 88 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 312 | |
| a | 200 | |
| l | 200 | |
| F | 112 | 10.9% |
| m | 112 | 10.9% |
| M | 88 | 8.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 824 | |
| Uppercase Letter | 200 | 19.5% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 312 | |
| a | 200 | |
| l | 200 | |
| m | 112 | 13.6% |
Uppercase Letter
| Value | Count | Frequency (%) |
| F | 112 | |
| M | 88 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1024 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 312 | |
| a | 200 | |
| l | 200 | |
| F | 112 | 10.9% |
| m | 112 | 10.9% |
| M | 88 | 8.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1024 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 312 | |
| a | 200 | |
| l | 200 | |
| F | 112 | 10.9% |
| m | 112 | 10.9% |
| M | 88 | 8.6% |
Age
Real number (ℝ≥0)
| Distinct | 51 |
|---|---|
| Distinct (%) | 25.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.85 |
| Minimum | 18 |
| Maximum | 70 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5-th percentile | 19 |
| Q1 | 28.75 |
| median | 36 |
| Q3 | 49 |
| 95-th percentile | 66.05 |
| Maximum | 70 |
| Range | 52 |
| Interquartile range (IQR) | 20.25 |
Descriptive statistics
| Standard deviation | 13.96900733 |
|---|---|
| Coefficient of variation (CV) | 0.3595626083 |
| Kurtosis | -0.6715728616 |
| Mean | 38.85 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 0.485568851 |
| Sum | 7770 |
| Variance | 195.1331658 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 32 | 11 | 5.5% |
| 35 | 9 | 4.5% |
| 19 | 8 | 4.0% |
| 31 | 8 | 4.0% |
| 30 | 7 | 3.5% |
| 49 | 7 | 3.5% |
| 40 | 6 | 3.0% |
| 38 | 6 | 3.0% |
| 47 | 6 | 3.0% |
| 27 | 6 | 3.0% |
| Other values (41) | 126 |
| Value | Count | Frequency (%) |
| 18 | 4 | |
| 19 | 8 | |
| 20 | 5 | |
| 21 | 5 | |
| 22 | 3 | 1.5% |
| 23 | 6 | |
| 24 | 4 | |
| 25 | 3 | 1.5% |
| 26 | 2 | 1.0% |
| 27 | 6 |
| Value | Count | Frequency (%) |
| 70 | 2 | |
| 69 | 1 | 0.5% |
| 68 | 3 | |
| 67 | 4 | |
| 66 | 2 | |
| 65 | 2 | |
| 64 | 1 | 0.5% |
| 63 | 2 | |
| 60 | 3 | |
| 59 | 4 |
Annual Income (k$)
Real number (ℝ≥0)
| Distinct | 64 |
|---|---|
| Distinct (%) | 32.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 60.56 |
| Minimum | 15 |
| Maximum | 137 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 15 |
|---|---|
| 5-th percentile | 19 |
| Q1 | 41.5 |
| median | 61.5 |
| Q3 | 78 |
| 95-th percentile | 103 |
| Maximum | 137 |
| Range | 122 |
| Interquartile range (IQR) | 36.5 |
Descriptive statistics
| Standard deviation | 26.26472117 |
|---|---|
| Coefficient of variation (CV) | 0.4336975093 |
| Kurtosis | -0.09848708653 |
| Mean | 60.56 |
| Median Absolute Deviation (MAD) | 16.5 |
| Skewness | 0.3218425499 |
| Sum | 12112 |
| Variance | 689.8355779 |
| Monotonicity | Increasing |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 54 | 12 | 6.0% |
| 78 | 12 | 6.0% |
| 48 | 6 | 3.0% |
| 71 | 6 | 3.0% |
| 63 | 6 | 3.0% |
| 62 | 6 | 3.0% |
| 87 | 6 | 3.0% |
| 60 | 6 | 3.0% |
| 88 | 4 | 2.0% |
| 77 | 4 | 2.0% |
| Other values (54) | 132 |
| Value | Count | Frequency (%) |
| 15 | 2 | |
| 16 | 2 | |
| 17 | 2 | |
| 18 | 2 | |
| 19 | 4 | |
| 20 | 4 | |
| 21 | 2 | |
| 23 | 2 | |
| 24 | 2 | |
| 25 | 2 |
| Value | Count | Frequency (%) |
| 137 | 2 | |
| 126 | 2 | |
| 120 | 2 | |
| 113 | 2 | |
| 103 | 4 | |
| 101 | 2 | |
| 99 | 2 | |
| 98 | 2 | |
| 97 | 2 | |
| 93 | 2 |
Spending Score (1-100)
Real number (ℝ≥0)
| Distinct | 84 |
|---|---|
| Distinct (%) | 42.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50.2 |
| Minimum | 1 |
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 34.75 |
| median | 50 |
| Q3 | 73 |
| 95-th percentile | 92 |
| Maximum | 99 |
| Range | 98 |
| Interquartile range (IQR) | 38.25 |
Descriptive statistics
| Standard deviation | 25.82352167 |
|---|---|
| Coefficient of variation (CV) | 0.5144127822 |
| Kurtosis | -0.8266291062 |
| Mean | 50.2 |
| Median Absolute Deviation (MAD) | 20 |
| Skewness | -0.04722020137 |
| Sum | 10040 |
| Variance | 666.8542714 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 42 | 8 | 4.0% |
| 55 | 7 | 3.5% |
| 46 | 6 | 3.0% |
| 73 | 6 | 3.0% |
| 35 | 5 | 2.5% |
| 52 | 5 | 2.5% |
| 59 | 5 | 2.5% |
| 48 | 5 | 2.5% |
| 75 | 5 | 2.5% |
| 50 | 5 | 2.5% |
| Other values (74) | 143 |
| Value | Count | Frequency (%) |
| 1 | 2 | |
| 3 | 1 | 0.5% |
| 4 | 2 | |
| 5 | 4 | |
| 6 | 2 | |
| 7 | 1 | 0.5% |
| 8 | 1 | 0.5% |
| 9 | 1 | 0.5% |
| 10 | 2 | |
| 11 | 1 | 0.5% |
| Value | Count | Frequency (%) |
| 99 | 1 | 0.5% |
| 98 | 1 | 0.5% |
| 97 | 2 | |
| 95 | 2 | |
| 94 | 1 | 0.5% |
| 93 | 2 | |
| 92 | 3 | |
| 91 | 2 | |
| 90 | 2 | |
| 89 | 1 | 0.5% |
A simple visualization of nullity (missing values) by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | CustomerID | Gender | Age | Annual Income (k$) | Spending Score (1-100) |
|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15 | 39 |
| 1 | 2 | Male | 21 | 15 | 81 |
| 2 | 3 | Female | 20 | 16 | 6 |
| 3 | 4 | Female | 23 | 16 | 77 |
| 4 | 5 | Female | 31 | 17 | 40 |
| 5 | 6 | Female | 22 | 17 | 76 |
| 6 | 7 | Female | 35 | 18 | 6 |
| 7 | 8 | Female | 23 | 18 | 94 |
| 8 | 9 | Male | 64 | 19 | 3 |
| 9 | 10 | Female | 30 | 19 | 72 |
Last rows
| | CustomerID | Gender | Age | Annual Income (k$) | Spending Score (1-100) |
|---|---|---|---|---|---|
| 190 | 191 | Female | 34 | 103 | 23 |
| 191 | 192 | Female | 32 | 103 | 69 |
| 192 | 193 | Male | 33 | 113 | 8 |
| 193 | 194 | Female | 38 | 113 | 91 |
| 194 | 195 | Female | 47 | 120 | 16 |
| 195 | 196 | Female | 35 | 120 | 79 |
| 196 | 197 | Female | 45 | 126 | 28 |
| 197 | 198 | Male | 32 | 126 | 74 |
| 198 | 199 | Male | 32 | 137 | 18 |
| 199 | 200 | Male | 30 | 137 | 83 |
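The sample rows help explain the high-correlation alert between CustomerID and Annual Income (k$): the rows appear to be sorted by income, so the identifier rises together with it. The sketch below computes a Pearson correlation on only the 20 rows displayed above, so the result will not equal the coefficient the report computed on all 200 rows. The `pearson` helper is a hypothetical illustration, not a pandas-profiling function.

```python
import math

# (CustomerID, Annual Income in k$) pairs copied from the first and last rows shown above.
rows = [
    (1, 15), (2, 15), (3, 16), (4, 16), (5, 17),
    (6, 17), (7, 18), (8, 18), (9, 19), (10, 19),
    (191, 103), (192, 103), (193, 113), (194, 113), (195, 120),
    (196, 120), (197, 126), (198, 126), (199, 137), (200, 137),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


ids = [r[0] for r in rows]
incomes = [r[1] for r in rows]

r = pearson(ids, incomes)

# Income never decreases as CustomerID grows in these rows.
income_non_decreasing = all(a <= b for a, b in zip(incomes, incomes[1:]))

print(round(r, 3), income_non_decreasing)
```

Because this correlation reflects row ordering and not a customer attribute, CustomerID is a candidate for removal before any clustering or modelling step.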