Date: 2024-08-27 07:10

COSC2960 Foundations of Artificial Intelligence for STEM

Assessment 2B

Data Analysis and Modelling Assignment

1.0 Statement of Problem/Introduction

In the era of data abundance, the significance lies not merely in its volume, but in our ability to derive meaningful insights. This task invites the application of class-acquired techniques to dissect and interpret data.

Our focus rests on scrutinizing the pivotal role of sleep in overall health and well-being, encompassing cognitive, emotional, and physical realms. The analysis of the Sleep Health and Lifestyle Dataset, collected via wearable devices, is paramount. Tasked as data scientists, our mission is to enhance sleep tracking accuracy and elucidate the impact of lifestyle factors on sleep quality.

Through meticulous analysis, we seek to uncover trends and correlations, illuminating pathways for enhanced sleep hygiene and the mitigation of sleep-related disorders. This endeavour extends to the identification of predictors for sleep duration and quality, clustering individuals based on sleep behaviour and lifestyle, and discerning patterns indicative of sleep disorders.

Our pursuit is to contribute to the scholarly discourse on sleep health, offering actionable recommendations for fostering improved sleep habits and overall well-being.

2.0 Data Summary

The dataset contains a wide variety of categories and variables that are relevant to our study. To be exact, there are 14 categories which outline all key aspects of sleep health and our knowledge of it. We segment them into Numerical and Nominal variables respectively:

2.1 Numeric Categories

-     Age: The Age of a Person in Years.

-     Sleep Duration (hours): The Number of Hours a Person has Slept in a Day.

-     Physical Activity Level (minutes/day): The Number of Minutes the Person Engages in Physical Activity per Day.

-     Systolic Blood Pressure: The systolic blood pressure measurement of the person (the pressure while the heart beats).

-     Diastolic Blood Pressure: The diastolic blood pressure measurement of the person (the pressure between heartbeats).

-     Heart Rate (bpm): The resting heart rate of the person in beats per minute.

-     Daily Steps: The number of steps the person takes per day.

-     Stress Level: A subjective rating of the stress level experienced by the person.

-     Quality of Sleep: A subjective Rating of Quality of Sleep.

2.2 Nominal Categories

-     Gender: The gender of the person.

-     Occupation: The occupation or profession of the person.

-     BMI Category: The BMI category of the person.

-     Sleep Disorder: The presence or absence of a sleep disorder in the person.

3.0 Data Cleaning and Processing

3.1 Values Missing:

Category                   Values Missing
Sleep Duration             2 (1%)
Quality of Sleep           4 (1%)
Physical Activity Level    1 (<1%)
Stress Level               1 (<1%)
BMI Category               7 (2%)
Systolic BP                1 (<1%)
Diastolic BP               3 (1%)
Heart Rate                 4 (1%)
Daily Steps                7 (2%)
TOTAL                      30

Table 1. Missing Values in Sleep Data Set

Using the WEKA NumericCleaner function to display all missing values, we can clearly identify every missing value in our data set.

Fig 1. WEKA Attribute Missing Values Display (Sleep Duration)

Fig 2. WEKA Attribute Missing Value Example (ID 15, Sleep Duration)

As displayed, a total of 30 values were missing across the 374 people surveyed for the study. Missing values in a large dataset can occur for various reasons, such as data entry errors, equipment malfunction, incomplete surveys, participant non-response, or simply because certain variables were not applicable or not recorded. These missing values may be sporadic and random, or they could follow a pattern influenced by specific factors within the data collection process.
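The per-category counts in Table 1 were read from WEKA. As a rough cross-check outside WEKA, the same tally can be sketched in plain Python. The rows below are invented for illustration and are not taken from the Sleep Data set; `count_missing` is a hypothetical helper name, and we assume a missing value is represented as `None`.

```python
from collections import Counter

# Hypothetical rows for illustration only; None marks a missing value.
rows = [
    {"Sleep Duration": 6.1, "Daily Steps": 4200, "BMI Category": "Overweight"},
    {"Sleep Duration": None, "Daily Steps": 10000, "BMI Category": "Normal"},
    {"Sleep Duration": 7.8, "Daily Steps": None, "BMI Category": None},
    {"Sleep Duration": 7.2, "Daily Steps": 8000, "BMI Category": "Normal"},
]


def count_missing(rows):
    """Return {column: number of missing (None) values} for every column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            # Adding a bool adds 1 for a missing value and 0 otherwise,
            # so columns with no missing values still appear with a 0.
            counts[column] += value is None
    return dict(counts)


missing = count_missing(rows)
for column, n in missing.items():
    print(f"{column}: {n} ({n / len(rows):.0%})")
print("TOTAL", sum(missing.values()))
```

Reporting the share alongside the count makes it easier to tell a category with a single missing value (well under 1% of 374 instances) from one with several.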

3.2 Dealing with Values Missing:

Having identified all missing values, we still have to decide how to deal with them. Three options are viable for preserving the homogeneity of the data:

1. Removing all missing values altogether.

2. Adjusting the missing values with the mean/mode of their respective category.

3. Replacing all missing values with a constant arbitrary variable.

For the following reasons, replacing the missing values with the mean/mode of their respective categories, using the WEKA function ReplaceMissingValues, was the clearest option:

Preservation of Data Integrity: By filling in missing values with the mean of their respective categories, we retain more data points, thus preserving the integrity and completeness of the dataset. This ensures that we can still analyse trends and patterns across all variables without significant loss of information.

Maintaining Sample Size: Removing instances with missing values altogether would reduce the sample size, which might affect the statistical power of the analysis. By imputing missing values with the mean, we avoid this loss of sample size, allowing for more robust statistical analysis and potentially more accurate results.

Fig 3. WEKA Attribute Information Post Replacement (Sleep Duration)

Mitigating Bias: Removing missing values can introduce bias, especially when the missing values aren’t random. Replacing missing values with the average helps to reduce this bias by keeping the overall data distribution consistent across each category.

Simplicity and Ease of Interpretation: Filling in missing values with the average is a straightforward method. It is simple to implement and interpret, and it avoids the overhead of more elaborate imputation techniques.

Category                   Mean/Mode Value
Sleep Duration             7.135
Quality of Sleep           7.308
Physical Activity Level    59.088
Stress Level               5.389
BMI Category               Normal
Systolic BP                128.523
Diastolic BP               84.668
Heart Rate                 68.457
Daily Steps                6843.869

Table 2. Mean/Mode Values of Categories with Missing Values

Fig 4. WEKA Attribute Replaced Value Example (ID 15, Sleep Duration)
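The replacement itself was done with WEKA's ReplaceMissingValues. The idea behind it, mean for numeric categories and mode for nominal ones, can be sketched as below. This is only an illustration under our own assumptions: the columns are invented, `impute` is a hypothetical helper, and missing entries are assumed to be `None`.

```python
from statistics import mean, mode


def impute(values):
    """Fill None entries with the mean (numeric) or the mode (nominal)."""
    present = [v for v in values if v is not None]
    numeric = all(isinstance(v, (int, float)) for v in present)
    # The fill value is computed from the observed entries only.
    fill = mean(present) if numeric else mode(present)
    return [fill if v is None else v for v in values]


# Hypothetical columns for illustration only.
sleep_duration = [6.0, None, 8.0, 7.0]
bmi_category = ["Normal", "Overweight", None, "Normal"]

filled_sleep = impute(sleep_duration)
filled_bmi = impute(bmi_category)
print(filled_sleep)
print(filled_bmi)
```

Because every gap in a column receives the same value, this approach keeps the column mean unchanged but slightly narrows its spread, which is worth keeping in mind when only a handful of values are missing, as is the case here.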

3.3 Outliers

Using the WEKA InterquartileRange function, we can identify the outliers in the data set. A total of 14 of the 374 instances in the Sleep Data set are flagged as outliers. Any instance labelled ‘yes’ is treated as an outlier instance, whether one or several of its values fall outside the accepted range.

Fig 5. WEKA Attribute Outliers Information 

Fig 6. WEKA Outliers Example on Data Set Table
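The interquartile-range rule behind this labelling can be sketched in a few lines of Python. The factors 3.0 and 6.0 reflect our understanding of WEKA's default outlier and extreme-value factors and should be treated as assumptions, as should the quartile method; WEKA may compute quartiles slightly differently. The heart-rate values are invented for illustration and `iqr_flags` is a hypothetical helper.

```python
from statistics import quantiles


def iqr_flags(values, outlier_factor=3.0, extreme_factor=6.0):
    """Label each value 'no', 'outlier' or 'extreme' using the IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    flags = []
    for v in values:
        # Distance outside the [q1, q3] box; zero or negative means inside.
        distance = max(q1 - v, v - q3)
        if distance > extreme_factor * iqr:
            flags.append("extreme")
        elif distance > outlier_factor * iqr:
            flags.append("outlier")
        else:
            flags.append("no")
    return flags


# Hypothetical heart-rate readings for illustration only.
heart_rate = [65, 66, 68, 68, 70, 70, 72, 72, 90, 120]
flags = iqr_flags(heart_rate)
for value, flag in zip(heart_rate, flags):
    print(value, flag)
```

An instance would then be treated as an outlier instance if any one of its attributes is flagged.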

3.4 Dealing with Outliers and Scaling Method

To deal with the outliers in the dataset, we decided to remove every instance flagged as an outlier. The scaling method used is standardisation, which expresses each value as a number of standard deviations from its attribute mean; standardisation does not remove anything by itself, but the standard deviation gives a common ‘range’ that makes it easier to see which values lie far from the rest. Using the WEKA RemoveWithValues function, all flagged outliers are removed from the dataset. We repeat the process for ‘Extreme Values’, which are also outliers, only more extreme.

Fig 7. WEKA Attribute Outliers Information (Post Removal)

Fig 8. WEKA Outliers Removal Example on Data Set Table
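For reference, standardisation can be sketched as follows. This is a minimal illustration rather than the WEKA procedure: the values are invented, `standardise` is a hypothetical helper, and we assume the population standard deviation, which may differ from what a given tool uses.

```python
from statistics import mean, pstdev


def standardise(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise a constant column")
    return [(v - mu) / sigma for v in values]


# Hypothetical values for illustration only (mean 5, standard deviation 2).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardise(values)
print(z_scores)
```

Standardising changes the scale of an attribute, not the number of instances; removal of flagged instances remains a separate step.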

Removing outliers from a dataset can be crucial for several reasons:

Maintaining Data Integrity: By definition, outliers are data points that differ substantially from most of the dataset. Although they may represent valid observations, they may also be due to errors, noise, or anomalies. By removing outliers, we ensure that the dataset accurately reflects the underlying phenomena being studied and thereby preserve its integrity.

Enhancing Statistical Analysis: Outliers can distort statistical measures such as the mean, the standard deviation and, to a lesser extent, the median, leading to misleading interpretations. Removing outliers stabilises these statistics, making statistical analysis more reliable. This ensures that descriptive statistics give a clearer view of the central tendency and variability of the data.

Improving Model Performance: Outliers can negatively affect the performance of predictive models by adding noise and bias. Models trained on datasets with outliers can produce less accurate predictions and generalise more poorly to new data. By removing outliers, we improve the quality of the data used to train the model, leading to more accurate and reliable predictions.

Facilitating Visualization: Outliers can distort data visualisations, making it difficult to identify meaningful patterns and relationships. Removing outliers can improve the clarity and interpretability of visualisations, enabling a better understanding of the underlying structure of the data.

Ensuring Assumptions Hold: Many statistical techniques and machine learning algorithms assume that the data is normally distributed or has other specific properties. Outliers can violate these assumptions and lead to potentially wrong conclusions. Removing outliers makes it more likely that the assumptions underlying the analysis hold, thereby improving the validity of the results.
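The effect of a single outlier on summary statistics is easy to demonstrate. The readings below are invented for illustration and are not from the Sleep Data set.

```python
from statistics import mean, median, stdev

# Hypothetical readings for illustration only.
clean = [60, 62, 64, 66, 68]
with_outlier = clean + [150]

summary = {
    "clean": (mean(clean), median(clean), stdev(clean)),
    "with outlier": (mean(with_outlier), median(with_outlier), stdev(with_outlier)),
}
for name, (m, med, sd) in summary.items():
    print(f"{name}: mean={m:.2f} median={med:.2f} stdev={sd:.2f}")
```

In this toy case the mean and standard deviation shift markedly once the extreme reading is added, while the median barely moves.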

4.0 Data Cleaning and Processing

4.1 Unique Values within each Category of the Dataset

Category                   Unique Values
Age                        1 (<1%)
Occupation                 1 (<1%)
Physical Activity Level    2 (1%)
Stress Level               1 (<1%)
Systolic BP                3 (1%)
Diastolic BP               1 (<1%)
Heart Rate                 2 (1%)
Daily Steps                2 (1%)
TOTAL                      13

Table 3. Unique Values in Sleep Data Set

4.2 Most Common Occupations

In our deep dive into the sleep data, we are not just crunching numbers; we are examining the different occupations in the dataset and how they shape sleep habits.

Each occupation tells its own story, giving us a view of how people in that line of work sleep, whether they work nights or days. We begin with how often each occupation appears in the dataset.

Fig 9. Count of Occupation Graph

Table 4. Count of Occupation

4.3 Sleep Duration Dependency on Occupation Type

In our investigation, we examine the impact of various professions on sleep duration. We are interested in comparing how specific occupations (scientist, salesperson, teacher, software engineer, manager, doctor, nurse, accountant, lawyer, and engineer) affect both the length and quality of sleep.

Through rigorous analysis, we aim to uncover the intricate connections between these professional roles and the crucial aspect of achieving adequate rest. Join us as we explore the relationship between occupation and sleep duration, shedding light on the diverse factors influencing our nightly rejuvenation.

Fig 10. Occupation vs Sleep Duration 

Table 5. Mean/Modal/Median Sleep Duration vs Occupation

Fig 11. Occupation Mean/Modal/Median Sleep Duration
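The per-occupation mean, median and mode reported in Table 5 amount to a group-by followed by three summary statistics. A minimal sketch is given below; the records are invented for illustration and `summarise_by_group` is a hypothetical helper, not part of the WEKA workflow.

```python
from collections import defaultdict
from statistics import mean, median, mode

# Hypothetical (occupation, sleep duration in hours) records.
records = [
    ("Nurse", 6.0), ("Nurse", 6.0), ("Nurse", 7.5),
    ("Engineer", 7.5), ("Engineer", 8.0), ("Engineer", 8.0), ("Engineer", 8.5),
]


def summarise_by_group(records):
    """Return {group: {'mean', 'median', 'mode'}} for (group, value) pairs."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {
        key: {"mean": mean(vals), "median": median(vals), "mode": mode(vals)}
        for key, vals in groups.items()
    }


summary = summarise_by_group(records)
for occupation, stats in summary.items():
    print(occupation, stats)
```

The same helper applies unchanged to other grouped comparisons in this section, for example age or blood pressure by BMI category.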

4.4 Most Common BMI Categories

We turn our attention to the prevalence of Body Mass Index (BMI) categories. Specifically, we aim to dissect the distribution of individuals across the most common BMI classifications: Normal, Normal Weight, Obese, and Overweight.

By scrutinizing this data, we seek to unveil patterns and trends regarding the prevalence of each BMI category within our sample population.

Through meticulous analysis, we endeavour to gain insights into the frequency and proportions of individuals falling into each category, providing valuable information on the distribution of BMI across our dataset. 

Fig 12. BMI Categories Count Graph

Table 6. BMI Category Count
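Counting the BMI categories is a simple frequency tally. The sketch below uses invented labels for illustration. It also shows, purely as an assumption on our part, how 'Normal' and 'Normal Weight' could be merged if they were judged to describe the same category; the counts in Table 6 keep them separate.

```python
from collections import Counter

# Hypothetical BMI labels for illustration only.
bmi = [
    "Normal", "Overweight", "Normal Weight", "Obese",
    "Normal", "Overweight", "Normal", "Overweight",
]

counts = Counter(bmi)
proportions = {category: n / len(bmi) for category, n in counts.items()}

# Optional, assumption-based merge of the two 'normal' labels.
merged = Counter("Normal" if c == "Normal Weight" else c for c in bmi)

for category, n in counts.most_common():
    print(f"{category}: {n} ({proportions[category]:.0%})")
print("merged:", dict(merged))
```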

4.5 Age Groups and Their Respective BMI Index Categories

Through rigorous analysis, we aim to explore the demographic composition of individuals across various  BMI  ranges  by examining  both  mean and  median ages. Our goal  is to  identify any discernible patterns or trends in age distribution within each BMI category. By uncovering these insights, we hope to shed light on the relationship between BMI and age within our dataset. Join us as we delve into this analytical journey to unravel the complexities of BMI categories and age distribution, gaining valuable insights into their interplay. 

Fig 13. BMI Categories vs Age Graph

Table 7. BMI Categories and Mean/Median Age

4.6 Blood Pressure dependency on BMI Index Category

Our aim is to uncover any observable patterns or trends in blood pressure levels across different BMI categories, offering valuable insights into how BMI may influence cardiovascular health within our dataset. Join us as we delve into this analytical endeavour, examining the relationship between BMI categories and blood pressure to gain a deeper understanding of cardiovascular health metrics.

Fig 14. BMI Categories vs Blood Pressures Graph 

Table 8. Systolic/Diastolic Blood Pressure Mean Values

4.7 Different Sleep Quality against Increasing Stress Levels

We focus on understanding the potential impact of increased stress levels on the quality of sleep within a diverse and extensive dataset. By scrutinizing a range of variables related to stress and sleep quality, we aim to discern any correlations or trends that may exist.

Through rigorous analysis, we shed light on the intricate relationship between stress levels and sleep quality, offering valuable insights into the factors that may influence individuals' ability to attain restful sleep amidst varying stress levels.

Fig 15. Stress Level vs Sleep Quality Graph

Table 9. Stress Level vs Sleep Quality Mean/Modal Values
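One way to quantify the relationship between stress level and sleep quality is the Pearson correlation coefficient. The sketch below computes it by hand so that it does not depend on any particular library version. The ratings are invented for illustration, so the resulting value says nothing about the actual Sleep Data set; `pearson` is a hypothetical helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical ratings for illustration only.
stress = [3, 4, 5, 6, 7, 8]
quality = [9, 8, 8, 7, 6, 5]

r = pearson(stress, quality)
print(f"r = {r:.3f}")
```

A value near -1 would indicate that higher stress ratings go together with lower sleep quality ratings; a correlation on its own does not show that one causes the other.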

4.8 Male vs Female Sleeping Disorder Count

We're examining the occurrence of sleeping disorders across genders in our dataset, aiming to identify differences between males and females. By carefully documenting the instances of sleeping disorders in each gender, we aim to quantify and contrast their frequency. Through comprehensive analysis, our goal is to uncover any noticeable patterns or trends that might indicate which gender has a higher prevalence of sleeping disorders.

Fig 16. Female Sleep Disorder Count Graph 

Table 10. Female Sleep Disorder Count

Fig 17. Male Sleep Disorder Count Graph

Table 11. Male Sleep Disorder Count
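The counts in Tables 10 and 11 form a cross-tabulation of gender against sleep disorder. A minimal sketch follows; the records and the disorder labels ('None', 'Insomnia', 'Sleep Apnea') are assumptions made for illustration, and `crosstab` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical (gender, sleep disorder) records for illustration only.
records = [
    ("Female", "None"), ("Female", "Sleep Apnea"), ("Female", "Sleep Apnea"),
    ("Male", "None"), ("Male", "None"), ("Male", "Insomnia"),
]


def crosstab(records):
    """Return {row_label: {column_label: count}} for (row, column) pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for row_label, column_label in records:
        table[row_label][column_label] += 1
    return {row: dict(columns) for row, columns in table.items()}


table = crosstab(records)
for gender, disorder_counts in table.items():
    print(gender, disorder_counts)
```

Since the two groups may differ in size, comparing proportions within each gender is usually more informative than comparing raw counts.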

4.9 How Daily Steps affect Quality of Sleep

In this study, we're exploring how boosting daily step count might affect sleep quality. We're analysing data that tracks individuals' daily steps alongside metrics of sleep quality to investigate the connection between physical activity and sleep patterns. Our aim is to gain a deeper understanding of how changes in daily step count could impact factors like sleep duration, efficiency, and disturbances. We're aiming to uncover valuable insights into the potential advantages of upping physical activity levels for better sleep quality. Ultimately, our goal is to enhance our understanding of lifestyle factors that influence overall sleep health.

Fig 18. Daily Steps vs Sleep Quality Graph
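A simple way to summarise how sleep quality varies with daily steps is to group step counts into fixed-width bins and average the quality rating in each bin. The sketch below assumes a bin width of 2000 steps, which is our own choice; the pairs are invented for illustration and `mean_quality_by_step_bin` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def mean_quality_by_step_bin(pairs, bin_width=2000):
    """Average the quality rating within each daily-steps bin.

    Keys of the result are the lower edges of the bins.
    """
    bins = defaultdict(list)
    for steps, quality in pairs:
        lower_edge = (steps // bin_width) * bin_width
        bins[lower_edge].append(quality)
    return {edge: mean(vals) for edge, vals in sorted(bins.items())}


# Hypothetical (daily steps, quality of sleep) pairs for illustration only.
pairs = [(3000, 5), (3500, 6), (5000, 6), (5600, 7), (8000, 8), (9000, 8)]

by_bin = mean_quality_by_step_bin(pairs)
for edge, avg in by_bin.items():
    print(f"{edge}-{edge + 1999} steps: mean quality {avg}")
```

Bins that contain very few people should be read with caution, since a single rating can dominate their average.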

 

 


