Overview

Dataset statistics

Number of variables: 5
Number of observations: 200
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.9 KiB
Average record size in memory: 40.6 B

Variable types

Numeric: 4
Categorical: 1

Alerts

CustomerID is highly correlated with Annual Income (k$) [High correlation]
Annual Income (k$) is highly correlated with CustomerID [High correlation]
CustomerID is highly correlated with Age and 2 other fields [High correlation]
Age is highly correlated with CustomerID and 1 other fields [High correlation]
Annual Income (k$) is highly correlated with CustomerID and 1 other fields [High correlation]
Spending Score (1-100) is highly correlated with CustomerID and 2 other fields [High correlation]
CustomerID is uniformly distributed [Uniform]
CustomerID has unique values [Unique]
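
The high-correlation alerts that involve CustomerID deserve a second look. CustomerID is a strictly increasing row identifier, and Annual Income (k$) is reported further down as monotonically increasing, so the two rise together simply because of how the rows are ordered. The sketch below illustrates this with a hand-written Pearson coefficient computed only over the 20 rows shown in the Sample section, not over the full 200 observations. The helper name `pearson` is hypothetical and is not part of pandas-profiling.

```python
import math

# (CustomerID, Annual Income (k$)) pairs copied from the report's
# "First rows" and "Last rows" samples; this is NOT the full dataset.
sample = [
    (1, 15), (2, 15), (3, 16), (4, 16), (5, 17),
    (6, 17), (7, 18), (8, 18), (9, 19), (10, 19),
    (191, 103), (192, 103), (193, 113), (194, 113), (195, 120),
    (196, 120), (197, 126), (198, 126), (199, 137), (200, 137),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


ids = [cid for cid, _ in sample]
incomes = [inc for _, inc in sample]

# Income never decreases as CustomerID grows in these rows.
income_non_decreasing = all(a <= b for a, b in zip(incomes, incomes[1:]))
r = pearson(ids, incomes)
print(f"non-decreasing income: {income_non_decreasing}, Pearson r = {r:.3f}")
```

A coefficient close to 1 on these rows reflects the sort order of the file rather than any property of the customers, which is one reason an identifier column is often excluded before correlations are interpreted.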

Reproduction

Analysis started: 2023-03-27 15:14:17.825636
Analysis finished: 2023-03-27 15:14:22.032250
Duration: 4.21 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
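
A report of this kind is typically generated with a few lines of Python. The sketch below assumes that pandas and pandas-profiling (v3.2.0, as listed above) are installed. The function name `build_profile_report`, the default title, and the file names in the usage note are hypothetical, since the report does not record the input path.

```python
def build_profile_report(csv_path, html_path, title="Overview"):
    """Profile the CSV at csv_path and write an HTML report to html_path."""
    # Imported lazily so that defining this function does not require the
    # libraries; they are only needed when the function is actually called.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title=title)
    profile.to_file(html_path)
    return profile
```

A call such as `build_profile_report("customers.csv", "report.html")` would then write the report, where both file names are placeholders.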

Variables

CustomerID
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 200
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.5
Minimum: 1
Maximum: 200
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 10.95
Q1: 50.75
Median: 100.5
Q3: 150.25
95-th percentile: 190.05
Maximum: 200
Range: 199
Interquartile range (IQR): 99.5
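
The fractional quantiles above follow from linear interpolation between neighbouring order statistics. As a rough check, the sketch below assumes CustomerID takes each integer from 1 to 200 exactly once (consistent with the distinct count, minimum, maximum and sum in this report) and uses the standard library's `statistics.quantiles` with `method="inclusive"`. That this matches the interpolation used by the profiling tool is an assumption supported only by the numbers agreeing.

```python
from statistics import quantiles

ids = list(range(1, 201))  # assumed: CustomerID is exactly 1..200

# method="inclusive" interpolates linearly between the two nearest order
# statistics, treating the data as containing its own minimum and maximum.
q1, q2, q3 = quantiles(ids, n=4, method="inclusive")
twentieths = quantiles(ids, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

value_range = max(ids) - min(ids)
iqr = q3 - q1

print("5%:", p5, "Q1:", q1, "median:", q2, "Q3:", q3, "95%:", p95)
print("range:", value_range, "IQR:", iqr)
```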

Descriptive statistics

Standard deviation: 57.87918451
Coefficient of variation (CV): 0.5759122837
Kurtosis: -1.2
Mean: 100.5
Median Absolute Deviation (MAD): 50
Skewness: 0
Sum: 20100
Variance: 3350
Monotonicity: Strictly increasing
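
These figures are what one would expect from a plain running index. The sketch below again assumes CustomerID is the sequence 1..200 and recomputes several of the statistics with the standard library. It uses the sample variance (divisor n - 1), which is the convention that reproduces the reported 3350. Kurtosis and skewness are left out because their estimators are not spelled out in the report.

```python
from statistics import mean, median, stdev, variance

ids = list(range(1, 201))  # assumed: CustomerID is exactly 1..200

total = sum(ids)
mu = mean(ids)
var = variance(ids)   # sample variance, divisor n - 1
sd = stdev(ids)       # square root of the sample variance
cv = sd / mu          # coefficient of variation

# Median absolute deviation: median distance from the median.
center = median(ids)
mad = median(abs(x - center) for x in ids)

# Strictly increasing means every value exceeds its predecessor.
strictly_increasing = all(a < b for a, b in zip(ids, ids[1:]))

print(total, mu, var, round(sd, 8), round(cv, 10), mad, strictly_increasing)
```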
Histogram with fixed size bins (bins=50)
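
Fixed-size binning splits the interval from the minimum to the maximum into equally wide bins and counts the values that fall into each. The sketch below is a minimal stand-alone version of that idea, not the plotting code used by the report. The function name `fixed_width_histogram` is hypothetical, and the data is again assumed to be the sequence 1..200.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equally wide intervals spanning [min, max]."""
    values = list(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum sits on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


counts = fixed_width_histogram(range(1, 201), bins=50)
print(counts)
```

Under that assumption every bin receives the same number of values, which is the flat shape that a "uniformly distributed" alert describes.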
Value               Count  Frequency (%)
1                   1      0.5%
138                 1      0.5%
128                 1      0.5%
129                 1      0.5%
130                 1      0.5%
131                 1      0.5%
132                 1      0.5%
133                 1      0.5%
134                 1      0.5%
135                 1      0.5%
Other values (190)  190    95.0%

Value               Count  Frequency (%)
1                   1      0.5%
2                   1      0.5%
3                   1      0.5%
4                   1      0.5%
5                   1      0.5%
6                   1      0.5%
7                   1      0.5%
8                   1      0.5%
9                   1      0.5%
10                  1      0.5%

Value               Count  Frequency (%)
200                 1      0.5%
199                 1      0.5%
198                 1      0.5%
197                 1      0.5%
196                 1      0.5%
195                 1      0.5%
194                 1      0.5%
193                 1      0.5%
192                 1      0.5%
191                 1      0.5%

Gender
Categorical

Distinct: 2
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Female: 112
Male: 88

Length

Max length: 6
Median length: 6
Mean length: 5.12
Min length: 4
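
The length statistics follow directly from the two category labels and their counts. The short sketch below rebuilds them from the value counts reported for Gender (112 "Female", 88 "Male"). The variable names are illustrative only.

```python
from statistics import mean, median

value_counts = {"Female": 112, "Male": 88}  # counts taken from this report

# One length per observation: 112 sixes and 88 fours.
lengths = [len(label) for label, n in value_counts.items() for _ in range(n)]

max_len, min_len = max(lengths), min(lengths)
median_len = median(lengths)
mean_len = mean(lengths)

print(max_len, median_len, mean_len, min_len)
```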

Characters and Unicode

Total characters: 1024
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
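
As an illustration of such character properties, the sketch below uses Python's standard `unicodedata` module to tally characters and their general categories for the Gender column, rebuilt from the reported value counts. `Lu` and `Ll` are the Unicode general category codes for uppercase and lowercase letters. Script and block lookups are not covered because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

value_counts = {"Female": 112, "Male": 88}  # counts taken from this report

# Concatenate every observation's label into one long string.
text = "".join(label * n for label, n in value_counts.items())

char_counts = Counter(text)
category_counts = Counter(unicodedata.category(ch) for ch in text)

print("total characters:", len(text))
print("distinct characters:", len(char_counts))
print("characters:", char_counts.most_common())
print("general categories:", dict(category_counts))
```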

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Female
4th row: Female
5th row: Female

Common Values

Value               Count  Frequency (%)
Female              112    56.0%
Male                88     44.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count  Frequency (%)
female              112    56.0%
male                88     44.0%

Most occurring characters

Value               Count  Frequency (%)
e                   312    30.5%
a                   200    19.5%
l                   200    19.5%
F                   112    10.9%
m                   112    10.9%
M                   88     8.6%

Most occurring categories

Value               Count  Frequency (%)
Lowercase Letter    824    80.5%
Uppercase Letter    200    19.5%

Most frequent character per category

Lowercase Letter
Value               Count  Frequency (%)
e                   312    37.9%
a                   200    24.3%
l                   200    24.3%
m                   112    13.6%

Uppercase Letter
Value               Count  Frequency (%)
F                   112    56.0%
M                   88     44.0%

Most occurring scripts

Value               Count  Frequency (%)
Latin               1024   100.0%

Most frequent character per script

Latin
Value               Count  Frequency (%)
e                   312    30.5%
a                   200    19.5%
l                   200    19.5%
F                   112    10.9%
m                   112    10.9%
M                   88     8.6%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               1024   100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
e                   312    30.5%
a                   200    19.5%
l                   200    19.5%
F                   112    10.9%
m                   112    10.9%
M                   88     8.6%

Age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 51
Distinct (%): 25.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.85
Minimum: 18
Maximum: 70
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 18
5-th percentile: 19
Q1: 28.75
Median: 36
Q3: 49
95-th percentile: 66.05
Maximum: 70
Range: 52
Interquartile range (IQR): 20.25

Descriptive statistics

Standard deviation: 13.96900733
Coefficient of variation (CV): 0.3595626083
Kurtosis: -0.6715728616
Mean: 38.85
Median Absolute Deviation (MAD): 11
Skewness: 0.485568851
Sum: 7770
Variance: 195.1331658
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
32                  11     5.5%
35                  9      4.5%
19                  8      4.0%
31                  8      4.0%
30                  7      3.5%
49                  7      3.5%
40                  6      3.0%
38                  6      3.0%
47                  6      3.0%
27                  6      3.0%
Other values (41)   126    63.0%

Value               Count  Frequency (%)
18                  4      2.0%
19                  8      4.0%
20                  5      2.5%
21                  5      2.5%
22                  3      1.5%
23                  6      3.0%
24                  4      2.0%
25                  3      1.5%
26                  2      1.0%
27                  6      3.0%

Value               Count  Frequency (%)
70                  2      1.0%
69                  1      0.5%
68                  3      1.5%
67                  4      2.0%
66                  2      1.0%
65                  2      1.0%
64                  1      0.5%
63                  2      1.0%
60                  3      1.5%
59                  4      2.0%

Annual Income (k$)
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 64
Distinct (%): 32.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 60.56
Minimum: 15
Maximum: 137
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 15
5-th percentile: 19
Q1: 41.5
Median: 61.5
Q3: 78
95-th percentile: 103
Maximum: 137
Range: 122
Interquartile range (IQR): 36.5

Descriptive statistics

Standard deviation: 26.26472117
Coefficient of variation (CV): 0.4336975093
Kurtosis: -0.09848708653
Mean: 60.56
Median Absolute Deviation (MAD): 16.5
Skewness: 0.3218425499
Sum: 12112
Variance: 689.8355779
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
54                  12     6.0%
78                  12     6.0%
48                  6      3.0%
71                  6      3.0%
63                  6      3.0%
62                  6      3.0%
87                  6      3.0%
60                  6      3.0%
88                  4      2.0%
77                  4      2.0%
Other values (54)   132    66.0%

Value               Count  Frequency (%)
15                  2      1.0%
16                  2      1.0%
17                  2      1.0%
18                  2      1.0%
19                  4      2.0%
20                  4      2.0%
21                  2      1.0%
23                  2      1.0%
24                  2      1.0%
25                  2      1.0%

Value               Count  Frequency (%)
137                 2      1.0%
126                 2      1.0%
120                 2      1.0%
113                 2      1.0%
103                 4      2.0%
101                 2      1.0%
99                  2      1.0%
98                  2      1.0%
97                  2      1.0%
93                  2      1.0%

Spending Score (1-100)
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 84
Distinct (%): 42.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.2
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 34.75
Median: 50
Q3: 73
95-th percentile: 92
Maximum: 99
Range: 98
Interquartile range (IQR): 38.25

Descriptive statistics

Standard deviation: 25.82352167
Coefficient of variation (CV): 0.5144127822
Kurtosis: -0.8266291062
Mean: 50.2
Median Absolute Deviation (MAD): 20
Skewness: -0.04722020137
Sum: 10040
Variance: 666.8542714
Monotonicity: Not monotonic
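
Several of the statistics reported for the four numeric variables are tied together by simple identities: mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean, range = maximum - minimum, and IQR = Q3 - Q1. The sketch below checks those identities against the numbers printed in this report, using n = 200 because no values are missing. The dictionary layout and the `check` helper are hypothetical conveniences, not output of the profiling tool.

```python
import math

# Values copied from the per-variable statistics in this report.
reported = {
    "CustomerID": dict(n=200, total=20100, mean=100.5, std=57.87918451,
                       var=3350, cv=0.5759122837, lo=1, hi=200, rng=199,
                       q1=50.75, q3=150.25, iqr=99.5),
    "Age": dict(n=200, total=7770, mean=38.85, std=13.96900733,
                var=195.1331658, cv=0.3595626083, lo=18, hi=70, rng=52,
                q1=28.75, q3=49, iqr=20.25),
    "Annual Income (k$)": dict(n=200, total=12112, mean=60.56,
                               std=26.26472117, var=689.8355779,
                               cv=0.4336975093, lo=15, hi=137, rng=122,
                               q1=41.5, q3=78, iqr=36.5),
    "Spending Score (1-100)": dict(n=200, total=10040, mean=50.2,
                                   std=25.82352167, var=666.8542714,
                                   cv=0.5144127822, lo=1, hi=99, rng=98,
                                   q1=34.75, q3=73, iqr=38.25),
}


def check(s, rel_tol=1e-6):
    """Return a dict mapping each identity to True if the numbers agree."""
    close = lambda a, b: math.isclose(a, b, rel_tol=rel_tol)
    return {
        "mean = sum / n": close(s["total"] / s["n"], s["mean"]),
        "var = std^2": close(s["std"] ** 2, s["var"]),
        "cv = std / mean": close(s["std"] / s["mean"], s["cv"]),
        "range = max - min": close(s["hi"] - s["lo"], s["rng"]),
        "iqr = q3 - q1": close(s["q3"] - s["q1"], s["iqr"]),
    }


results = {name: check(stats) for name, stats in reported.items()}
for name, checks in results.items():
    failed = [label for label, ok in checks.items() if not ok]
    print(name, "consistent" if not failed else f"inconsistent: {failed}")
```

The tolerance of 1e-6 allows for the rounding of the printed values and can be tightened or loosened as needed.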
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
42                  8      4.0%
55                  7      3.5%
46                  6      3.0%
73                  6      3.0%
35                  5      2.5%
52                  5      2.5%
59                  5      2.5%
48                  5      2.5%
75                  5      2.5%
50                  5      2.5%
Other values (74)   143    71.5%

Value               Count  Frequency (%)
1                   2      1.0%
3                   1      0.5%
4                   2      1.0%
5                   4      2.0%
6                   2      1.0%
7                   1      0.5%
8                   1      0.5%
9                   1      0.5%
10                  2      1.0%
11                  1      0.5%

Value               Count  Frequency (%)
99                  1      0.5%
98                  1      0.5%
97                  2      1.0%
95                  2      1.0%
94                  1      0.5%
93                  2      1.0%
92                  3      1.5%
91                  2      1.0%
90                  2      1.0%
89                  1      0.5%

Interactions

[16 interaction plots (Matplotlib SVG images) omitted]

Correlations

[4 correlation plots (Matplotlib SVG images) omitted]

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

     CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0    1           Male    19   15                  39
1    2           Male    21   15                  81
2    3           Female  20   16                  6
3    4           Female  23   16                  77
4    5           Female  31   17                  40
5    6           Female  22   17                  76
6    7           Female  35   18                  6
7    8           Female  23   18                  94
8    9           Male    64   19                  3
9    10          Female  30   19                  72

Last rows

     CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
190  191         Female  34   103                 23
191  192         Female  32   103                 69
192  193         Male    33   113                 8
193  194         Female  38   113                 91
194  195         Female  47   120                 16
195  196         Female  35   120                 79
196  197         Female  45   126                 28
197  198         Male    32   126                 74
198  199         Male    32   137                 18
199  200         Male    30   137                 83