GPH GU 2450 - Intermediate Epidemiology
Analysis Project – SPRING 2024
OVERVIEW
Over the course of this semester students will plan, analyze, interpret and prepare reports of findings from an epidemiologic research investigation. Students will receive feedback on these assignments so that they are prepared to conduct the analyses and write a paper describing their findings as detailed in the following instructions.
DATA
For this assignment, the instructor will provide a subset of data from the 2017-March 2020 pre-pandemic National Health and Nutrition Examination Survey (NHANES), administered by the Centers for Disease Control and Prevention (CDC). In addition, you will be provided with a codebook that describes all relevant variables.
TOPIC
This semester, the topic will be depression. Students should review the following Morbidity and Mortality Weekly Report, published by the CDC, to gain additional background and understanding of recent epidemiological patterns in depression among adults in the U.S.:
Lee B, Wang Y, Carlson SA, et al. National, State-Level, and County-Level Prevalence Estimates of Adults Aged ≥18 Years Self-Reporting a Lifetime Diagnosis of Depression — United States, 2020. Morbidity and Mortality Weekly Report 2023 June 16;72(24):644-650.
Important notes:
1. The following guidelines must be followed for submitting all components of the individual project:
a. 1.5 spacing for all text
b. 1-inch margins on all pages
c. Calibri 12-point font
d. Use the table templates provided and follow all instructions for formatting these tables.
PART A: RESEARCH QUESTION & HYPOTHESES
When initiating an epidemiologic investigation, an investigator must first develop a testable research question that clearly identifies the relationship between the health outcome of interest and potential determinants of this health outcome. A clearly delineated and well-conceptualized research question ensures that the study is focused and manageable and that it maintains a high level of scientific integrity throughout study implementation, data collection, and data analysis.
For this part of the project, use the paper identified below to obtain background information on the epidemiology of depression among adults in the U.S.:
Lee B, Wang Y, Carlson SA, et al. National, State-Level, and County-Level Prevalence Estimates of Adults Aged ≥18 Years Self-Reporting a Lifetime Diagnosis of Depression — United States, 2020. Morbidity and Mortality Weekly Report 2023 June 16;72(24):644-650.
Use this paper, in combination with the codebook that accompanies the NHANES 2017-2020 dataset, to craft a clearly conceptualized and testable research question with appropriate hypothesis(es).
1. The research question should clearly identify the following components:
a) population of interest (e.g., women between the ages of XX-YY, men between the ages of XX-YY, total population),
b) ONE main exposure of interest,
c) ONE main outcome of interest (this semester, depression),
d) at least ONE specific potential confounder (you may control for up to 3 confounders), and
e) ONE potential effect measure modifier.
2. A hypothesis that clearly identifies the expected direction of the relationship.
3. Provide a DAG (directed acyclic graph) that displays all of the following relationships to be examined:
a) the main effect,
b) the relationship between the confounder(s) and the main outcome of interest, and
c) the relationship between the confounder(s) and the main exposure of interest.
PART B: LITERATURE REVIEW OF EPIDEMIOLOGICAL FINDINGS
Once you have developed and refined your research question, you will need to conduct a brief review of the literature to develop the “Background” or “Introduction” section of your report. The goal here is to begin thinking about how you can clearly and succinctly explain why this issue is a public health problem and provide essential background data on the exposure-outcome association identified in your research question, to put your study into context.
For this assignment, you will need to:
A. Conduct a brief review of the scientific and epidemiologic literature on the risk factor of interest (exposure) and the covariates proposed in the research question, and clearly explain how they are related to the outcome of interest.
B. The literature cited in your Background/Introduction should only include studies that were conducted after the year 2000. You may use PubMed as your database.
C. The review should include no more than 5 references that provide support for the above information. All information from the literature should be written in your own words and cited.
This assignment must be done on your own and without the use of any artificial intelligence (AI) tools. If we detect the use of any AI tools for this assignment, you will receive an F (0 points) for this assignment. You will be able to rewrite the assignment, and your final grade for the assignment will be the average of the grade on the first submission (F, 0 points) and the grade on the second submission.
Based on your review of the literature, include the following information in your Background/Introduction:
1. Data on the overall (US-based) prevalence of the main outcome and how the outcome is distributed within the population by person, place, and time (1 paragraph max). This should include:
o a succinct summary of studies that provide information on race, ethnicity, SES, or other disparities in the outcome of interest, based on identified covariates and as related to your research question.
2. What are the negative consequences of experiencing this outcome? This discussion helps make the case for why depression is an issue of public health importance. (1 paragraph, max)
3. A succinct (1-2 paragraphs max) summary of what we know about the relationship of interest between the selected main exposure and main outcome specifically.
4. The evidence presented above should lead you to conclude your Background/Introduction with your clearly stated and testable research question and appropriate hypothesis(es).
The above information should be summarized in 4-5 paragraphs. No more than 1-2 pages!
PART C: METHODS
The methods section of an epidemiological research paper is considered the ‘road map’ of the study, as it should provide sufficient detail for another investigator to be able to replicate the study.
With this in mind, the goal of this part of the assignment is to prepare the methodology section of your research paper. Even though the data you are using come from NHANES, you should still summarize the NHANES methodology. Do not copy the NHANES methods directly! You will need to provide the following information in your methods section, written in your own words:
REQUIRED SECTIONS:
1. Study Design (1 Paragraph)
a. Summary of NHANES study design & data collection:
o Purpose of the overall NHANES study
o Design of the overall NHANES study
o Mode of data collection: NHANES includes 2 components: interviewer-administered surveys and a clinician-administered medical examination.
b. Brief description of NHANES sampling design strategy:
o NHANES employs a multistage, stratified, clustered probability-based sampling design
o Note: we do not expect a full description of the details of the sampling strategy. At a minimum, we expect students to identify (1) that it is a probability-based sampling approach and (2) what the overall design of the approach was (multistage, stratified, clustered).
c. Cite information about NHANES methodology (e.g., documentation, website)
2. Study Population (1 Paragraph)
a. Brief description of the NHANES sample that mentions:
o NHANES only includes the noninstitutionalized, civilian U.S. population
o Sample size of NHANES: provide either the sample size for those who completed the questionnaire (N=15,560) or the sample size for those who also completed the medical exam (N=14,300)
b. Description of the target population for the student’s analysis as derived from the NHANES sample. For example, is the student focusing on males 45 years or older? All adults ≥18 years old? Women between 24-45 years old? The entire NHANES sample? Etc.
c. Sample size of the FINAL analytic sample for the student’s study (n=xxxx) (i.e., if only looking at men 45-60 years old, how many men <45 and >60 were excluded, how many women were excluded, etc., to arrive at the final analytic sample size). This will require looking at the actual dataset first!
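The exclusion counts described above can be tracked step by step. The following is a minimal Python sketch, not a required method: the records, the field names ("sex", "age"), and the 45-60 age window are made up for illustration and are not actual NHANES variable names or values. The same logic can be reproduced in whatever statistical software you use for the project.

```python
# Hypothetical records; "sex" and "age" are illustrative field names only,
# not actual NHANES variable names.
participants = [
    {"id": 1, "sex": "male", "age": 52},
    {"id": 2, "sex": "female", "age": 47},
    {"id": 3, "sex": "male", "age": 38},
    {"id": 4, "sex": "male", "age": 66},
    {"id": 5, "sex": "male", "age": 45},
    {"id": 6, "sex": "male", "age": None},  # missing age
]


def derive_analytic_sample(records, sex="male", min_age=45, max_age=60):
    """Apply exclusions in sequence and count how many records each step removes."""
    flow = {"starting_n": len(records)}

    # Step 1: restrict to the sex of interest.
    kept = [r for r in records if r["sex"] == sex]
    flow["excluded_sex"] = len(records) - len(kept)

    # Step 2: drop records with missing age.
    with_age = [r for r in kept if r["age"] is not None]
    flow["excluded_missing_age"] = len(kept) - len(with_age)

    # Step 3: restrict to the age range of interest (inclusive bounds).
    in_range = [r for r in with_age if min_age <= r["age"] <= max_age]
    flow["excluded_age_out_of_range"] = len(with_age) - len(in_range)

    flow["final_n"] = len(in_range)
    return in_range, flow


sample, flow = derive_analytic_sample(participants)
for step, n in flow.items():
    print(f"{step}: {n}")
```

Recording the count removed at each step makes it straightforward to report how the final analytic sample size was reached.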
3. Main Outcome of Interest (1 Paragraph)
i. Description of how the outcome (depression), as indicated in the student’s research question, was measured in NHANES and will be operationalized for the present analysis.
4. Main Exposure of Interest (1 Paragraph)
i. Description of how the main exposure, as indicated in the student’s research question, was measured in NHANES (survey question(s), physical exam, etc.) and will be operationalized for the present analysis.
5. Covariates of Interest (1-2 Paragraphs)
i. Description of how all other covariates, as indicated in the student’s research question, were measured in NHANES (survey question, physical exam, etc.). This would include any potential confounder(s) and the effect modifier selected previously.
Important Note: FOR ALL VARIABLES NOTED ABOVE include the following:
o Explanation of categorization or recategorization of measures to operationalize covariates (only if applicable) and citations used to inform this categorization (only if applicable)
o If variables were combined (e.g., due to skip patterns), this must be described.
o Are multiple variables being combined in order to create a new variable? If yes, this needs to be explained.
o In describing the measures, do not refer to them using variable names from the dataset.
o Information on sociodemographic characteristics (gender, age, race/ethnicity, education, marital status, and family income to poverty ratio), so that the sample can be better described in Table 1.
6. Statistical Analysis Plan (1-2 Paragraphs)
a. Descriptive (univariable) analysis: frequencies and percentages for categorical variables; means (SD) or medians (IQR or range) for continuous/ordinal variables. The plan should include these measures, and they should be appropriate for the variable types.
b. Bivariable analysis: describe the correct statistical test for evaluating associations between exposure, outcome, and other covariates. The test should be appropriate for the variable types.
o Describe how you would produce stratified measures of association (e.g., stratum-specific ORs) to assess variables identified as confounders and effect modifiers.
o Include the alpha level for statistical significance testing (generally α = 0.05) and the software used (e.g., Stata Version 17)
c. Multivariable analysis: describe the model-building procedure for each regression analysis (linear and logistic).
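As an illustration of the stratified measures of association mentioned in the bivariable step, the sketch below computes a crude odds ratio, stratum-specific odds ratios, and a Mantel-Haenszel pooled odds ratio from 2x2 tables. It is written in Python with made-up counts; the function names are hypothetical, and the Mantel-Haenszel estimator is used here only as one common way to summarize across strata, not as the method required for this project.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    return (a * d) / (b * c)


def collapse(tables):
    """Sum the stratum-specific tables cell by cell to get the crude 2x2 table."""
    return tuple(sum(cells) for cells in zip(*tables))


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) across strata."""
    tables = list(tables)
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Made-up counts for two levels of a hypothetical third variable.
strata = {
    "stratum 1": (20, 80, 10, 90),
    "stratum 2": (40, 60, 25, 75),
}

crude = odds_ratio(*collapse(strata.values()))
pooled = mantel_haenszel_or(strata.values())

print(f"Crude OR: {crude:.2f}")
for name, table in strata.items():
    print(f"{name} OR: {odds_ratio(*table):.2f}")
print(f"Mantel-Haenszel OR: {pooled:.2f}")
```

A typical reading, under these assumptions: a meaningful difference between the crude and pooled estimates suggests confounding by the stratifying variable, while stratum-specific estimates that differ meaningfully from each other suggest effect measure modification.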
PART D: UNIVARIABLE ANALYSIS & SUMMARY
Background:
Conducting univariable analysis is essential for creating a strong foundation for your analysis and also for contextualizing your findings in your report or paper.
This is of critical importance for the following reasons:
1. In order to begin to understand if your study sample is representative of the target population, you need key summary statistics about your study sample so you can compare them to those of your target population.
2. In general, you want to provide a picture of what your study sample looks like, beyond just the factors you are examining based on your research question.
3. As we’ve discussed in class, providing information on your study sample also provides context for explaining the results from your study. For example, what happens if your study results do not support your hypothesis?
We conduct univariable analyses on:
1. Key sociodemographic characteristics to be able to describe the analytic sample.
2. Main outcome (include the variable in continuous and categorical formats)
3. Main exposure (include, as appropriate and necessary, the variable in continuous and categorical formats)
4. All other covariates identified in the research question and analysis plan (include, as appropriate and necessary, the variable in continuous and categorical formats)
We present results from univariable analyses in:
1. Tables:
- For continuous variables, present means (SD), medians (IQR), min/max ranges, % missing
- For categorical variables, present n(%) for each category within the variable, % missing
2. Figures (for continuous variables only):
- prepare histograms showing the distribution of the data
- identify any outliers and explain how you will handle these data points
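The summary measures listed above can be sketched in a few lines of Python using only the standard library. The data are made up, the function names are hypothetical, and two choices are assumptions rather than course requirements: outliers are flagged with the common 1.5 x IQR rule, and category percentages use non-missing observations as the denominator. Histograms themselves would be drawn in your statistical software.

```python
import statistics
from collections import Counter


def summarize_continuous(values):
    """Mean (SD), median (IQR), min/max, % missing, and 1.5 x IQR outliers."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)  # quartile cut points
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(observed),
        "pct_missing": 100 * (len(values) - len(observed)) / len(values),
        "mean": statistics.mean(observed),
        "sd": statistics.stdev(observed),  # sample standard deviation
        "median": statistics.median(observed),
        "q1": q1,
        "q3": q3,
        "min": min(observed),
        "max": max(observed),
        # Assumption: values beyond 1.5 x IQR from the quartiles are flagged.
        "outliers": [v for v in observed if v < low_fence or v > high_fence],
    }


def summarize_categorical(values):
    """n (%) per category among non-missing values, plus % missing overall."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    table = {cat: (n, 100 * n / len(observed)) for cat, n in counts.items()}
    pct_missing = 100 * (len(values) - len(observed)) / len(values)
    return table, pct_missing


# Made-up example data; None marks a missing value.
ages = [22, 25, 27, 30, 31, 33, 35, 38, 41, 95, None, None]
depressed = ["Yes", "No", "No", None, "No", "Yes"]

age_summary = summarize_continuous(ages)
dep_table, dep_missing = summarize_categorical(depressed)

print(age_summary)
print(dep_table, f"missing: {dep_missing:.1f}%")
```

Flagged values are only candidates for review; how they are handled (kept, recoded, or excluded) still needs to be justified in the write-up.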
We summarize the results in the tables and figures in 1-2 paragraphs.
The summary generally is the first 1-2 paragraphs of your Results section. The summary should include:
a. A written summary of the distribution of the sociodemographic characteristics of the study sample, and information on the main outcome, main exposure, and covariates. Specific estimates (e.g., percentages) should be reported.
b. As appropriate, an explanation of any skip patterns that need to be accounted for in understanding the distribution of any of the above variables.
Presenting results from univariable analyses in Tables and Figures
Table Formatting Requirements (FOR ALL TABLES):
1. A complete title at the TOP of each table that includes: table number (e.g., Table 1), type of analysis, general description of factors examined, study population, study or dataset name, geographic location, years of data collection, and analytic sample size.
2. Each variable gets 1 row; categories are placed under the variable name! Do not place variable names and categories in different rows!
3. Are there abbreviations used in the table? If yes, then they should be explained at the bottom of the table.
4. Use font sizes that are legible. (11- or 12-point font)
5. Tables should take up only 1 page. For those that run over, ask yourself why and how you can better format the table.
6. The templates below are examples of well-formatted tables. Take a look at how things are spaced, lined up, etc.
7. Do not submit tables in Excel, Stata or R. You will not be able to submit them to a journal in Excel, Stata or R, so practice and develop good formatting habits and skills now.
8. Bear in mind: When you prepare tables for a report, paper or presentation, you are preparing something that will be disseminated to professional and lay audiences!
8. Bear in mind: When you prepare tables for a report, paper or presentation, you are preparing something that will be disseminated to professional and lay audiences!
Column 1:
a. Title – some tables include a general column title as well as sub-headings to delineate different types of variables summarized
b. Names of variables – present succinct and accurate variable names! Do not use variable labels as variable names in the table.
c. Unit of measurement – how are the data going to be presented for that variable?
d. Categories for variables – should include all categories for that variable (note that category names are indented to make reading easier!)
Column 2:
a. Title of column 2: how are the data presented? n (%) makes sense if the data are categorical. For continuous data, present mean (SD) and median (IQR).
b. Decimal places: be consistent in your use (i.e., if you are reporting to the tenths place, do that consistently throughout the table; do not switch from tenths to hundredths). Also, presenting more than the hundredths place is unnecessary. BUT make sure that you are rounding correctly!
c. Use leading zeros (e.g., 0.01 not .01).
d. Do your column totals for each variable add up to the total analytic sample size? If not, please provide a note explaining the degree of missingness.
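The Column 2 rules above (consistent decimal places, leading zeros, and totals that reconcile with the analytic sample size) can be checked programmatically. The Python sketch below is illustrative only: the helper names and counts are hypothetical, and it assumes that "rounding correctly" means the conventional round-half-up rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def fmt_num(x, decimals=1):
    """Format a number with a fixed number of decimals and a leading zero.

    Assumption: values exactly halfway between two candidates round up
    (round-half-up), which differs from Python's built-in round().
    """
    quantum = Decimal(10) ** -decimals  # e.g. 0.1 when decimals=1
    return str(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP))


def fmt_n_pct(n, total, decimals=1):
    """Format a category cell as 'n (%)'."""
    return f"{n} ({fmt_num(100 * n / total, decimals)})"


def missing_note(counts, analytic_n, decimals=1):
    """Return a footnote if category counts do not add up to the analytic n."""
    missing = analytic_n - sum(counts.values())
    if missing == 0:
        return ""
    return f"Missing: {fmt_n_pct(missing, analytic_n, decimals)}"


# Made-up counts for a hypothetical analytic sample of 120 people.
analytic_n = 120
gender_counts = {"Male": 45, "Female": 70}

for category, n in gender_counts.items():
    print(f"  {category}: {fmt_n_pct(n, analytic_n)}")
print(missing_note(gender_counts, analytic_n))
```

Using one formatting helper for every cell is a simple way to keep decimal places consistent across the whole table.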
Figure Formatting Requirements:
1. All figures should be clear and present data consistently.
2. Text for all figures should be clear and legible.
3. Each figure should be on one page.
4. Figure titles should provide the same information as table titles and be placed at the BOTTOM of each figure.
Table 1. Univariable analysis of … among XX type of people, Study Name, Location, Years

COLUMN 1 | COLUMN 2: n (%), mean (SD), median (IQR)
Sociodemographic characteristics |
Gender |
  Male |
  Female |
Age |
  Mean (SD, range) |
  Median age (IQR) |
Race/Ethnicity |
  Group 1 name |
  Group 2 name |
  Group 3 name |
Education level |
  Group 1 name |
  Group 2 name |
  Group 3 name |
Marital Status |
  Group name |
  Group name |
  Group name |
Ratio of family income to poverty |
  <1 |
  1 |
  ≥ 1 |
Main Outcome |
Depression |
  No |
  Yes |
Variable X (unit of measurement), Exposure and for all Covariates |
  Category 1 |
  Category 2 |
  Category 3 |
  Category 4 |

SD = standard deviation
IQR = interquartile range