EPI 534 Fall 2019: Midterm Exam (SAS)
EPI 534 Fall 2022: SAS Final Exam
Please read the following instructions carefully.
1. This is an EXAM. You may NOT discuss the exam or work together on the exam with any other person. If you have questions about the exam, please contact the instructor. This exam is being offered as a take-home exam as a courtesy to you – please do not put the rest of the class at a disadvantage by sharing or discussing the exam.
2. You have 2 weeks to complete this exam. You must submit your final exam on Canvas by Wednesday, 10/19/2022, 5:00pm.
a. As noted in the syllabus, exams turned in late without the permission of the instructor will receive a 20% point reduction for each 24-hour period past the deadline. Late exams will also need to be submitted directly to the instructor by email as Canvas will be closed for submissions after 5:00pm on 10/19/2022.
b. Please take advantage of the next 2 weeks; do not wait until the last minute to complete the exam! This exam will take more time than a single homework assignment, and this course (and others) will continue to move forward at the same time. Make sure you carve out enough time to complete the exam and to ask questions; it might not be possible to meet or receive a response if you request help too close to the deadline.
3. You may use your class notes, books, or other materials to assist you with the exam.
a. You are welcome to attend any regularly scheduled office hours with the Teaching Assistants during this time, but note that the TAs are not grading the exams and will only assist with general questions about concepts covered on the exam.
b. Please direct any specific questions about the exam to the instructor by sending an email (firstname.lastname@example.org) or a private Slack message.
4. To complete the exam, please show all of your programming. Use comments to indicate each question number so that your program is easy to read. You should also use comments to number and answer any follow-up questions about the data in your program. You do not need to include screenshots of your output from frequency or print procedures, only the code and the comments with your response to any questions.
5. As with the homework, you will copy and paste your final SAS code into a Word document for the submission on Canvas.
a. PLEASE MAKE SURE YOUR WORD DOCUMENT MAINTAINS THE COLOR AND FORMATTING OF THE SAS EDITOR. If you are using Apporto, the easiest way to do this is to create the Word document in Apporto, then paste from SAS. Do not take screenshots of your code. If you do not understand what this means or have not been able to do this successfully for the homework, PLEASE ask the TAs or instructor for assistance before you submit your exam.
6. Read each question carefully and make sure you are following all instructions.
7. Check your work! Even if a question does not ask you to check something specific, you may still want to take some steps to confirm that your results are as expected.
The SAS datasets patient_demo.sas7bdat and patient_clinic.sas7bdat contain demographic and clinical information about participants enrolled in a recent cross-sectional research study. The goal of this exam is to guide you through the process of using SAS to inspect and prepare the data for analysis.
Use SAS to perform the following database management tasks:
1. You have been given two files, patient_demo.sas7bdat and patient_clinic.sas7bdat. Take a look at these files in the ways that you have learned and use the appropriate method to combine them into one temporary SAS dataset called ALLDATA.
2. To become familiar with the data, run frequency distributions on ALL of the character variables in the dataset and run a univariate procedure on ALL of the numeric variables. You may also find it useful to look at the frequencies for some of the numeric variables. During the remainder of the exam you will be getting this dataset ready for analysis. Make sure that the final dataset contains ALL of the data changes that you are asked to make. Answer the following questions using comments in your program:
a. What percent of subjects are in treatment group B (TXGROUP)?
b. How many participants have moderate disease (DISLEVEL)?
c. What is the latest/most recent visit date a participant had in the study (VISDATE)?
d. What is the birth date of the oldest subject (DOB)?
3. Starting with your dataset ALLDATA, perform the following database management tasks in a new temporary dataset called TEMP1:
a. Calculate each person’s age as of the date of their visit (VISDATE) using the birth date (DOB). The age should be an INTEGER value and have the variable name AGE.
b. Create a new variable called FTIME that is the number of years between each person’s visit date (VISDATE) and October 2, 2022.
c. You have determined that values of WEIGHTKG greater than the 99th percentile (131 kg) and those less than the 1st percentile (14.5 kg) are data entry errors. Set the weights for subjects with values outside of these limits equal to missing.
d. The scale was not calibrated properly before the first study visit on January 1, 2019, but fortunately only one participant had a visit on that day, and the scale was repaired before any other study participants were seen. Correct the weight value for that participant by subtracting 3 kg from their current weight value (WEIGHTKG).
e. Use the HEIGHTCM and WEIGHTKG variables to calculate BMI for each person (100 centimeters = 1 meter). Run a procedure that will let you make sure that your program correctly calculated BMI for each person.
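For reference in question 3e, BMI is conventionally defined as weight in kilograms divided by the square of height in meters. The sketch below writes that definition with the exam's variable names; the worked numbers (70 kg, 175 cm) are made up for illustration and are not taken from the study data.

```latex
\[
\mathrm{BMI} = \frac{\mathrm{WEIGHTKG}}{\left(\mathrm{HEIGHTCM}/100\right)^{2}},
\qquad
\text{e.g. } \frac{70}{(175/100)^{2}} = \frac{70}{3.0625} \approx 22.9\ \mathrm{kg/m^{2}}
\]
```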
4. The variables STATUS, HASDIS, FAMHX, TXGROUP, and DISLEVEL are character variables. In a temporary dataset called TEMP2, create new variables called STATUS2, HASDIS2, FAMHX2, TXGROUP2, and DISLEVEL2 that are numeric equivalents. In the new variables, code established patients as 0, new patients as 1, and transfer patients as 2 (STATUS2); code Y as 1 and N as 0 (HASDIS2 and FAMHX2); code treatment A as 0 and treatment B as 1 (TXGROUP2); and code mild as 0, moderate as 1, and severe as 2 (DISLEVEL2). Perform a procedure that allows you to compare the values between the old character and new numeric variables and check for any issues.
5. In a temporary dataset called TEMP3, rename the HASDIS2 variable to CASE. In the same dataset, label the STATUS2 variable “Status”, the FAMHX2 variable “Family history”, and the DISLEVEL2 variable “Severity of condition”. Run a cross tabulation of each of these variables with CASE so that you can see the labels.
6. Use the univariate procedure to determine the tertile (3-level) cutpoints for the variable EVENTS. Using these values, in a dataset called TEMP4 create a categorical variable called EVENTLEV that has the values 0, 1 and 2 for the three levels. Run a procedure that gives the distribution of this variable for the cases only (CASE=1).
7. The variables prod_1 through prod_30 store information about products used by the patient during the last 6 months to alleviate symptoms. Because a patient may have used more than one product during the period, a separate variable was used for each product and was coded as 1 if the patient used that product or 0 if they did not. For example, a patient who used product 1 would have the value 1 in the variable prod_1, a patient who did not use product 3 would have a value of 0 in the variable prod_3, etc. Suppose Company Meds-R-US made products 1, 2, 3, 10, 12, 14, 19, 25, 28, and 30. You want to know how many patients used products made by this company. In a dataset called TEMP5, use an array to create a variable called USED which has a value of 1 if the patient used ANY of these 10 Company Meds-R-US products and 0 otherwise.
a. What percentage of patients used any of these 10 products made by the Company Meds-R-US?
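To make the coding of the product variables in question 7 concrete, here is a small sketch with made-up values. The dataset name PRODDEMO, the ID variable, and the restriction to three products are hypothetical and only illustrate the one-flag-per-product layout; the exam data contain prod_1 through prod_30.

```sas
/* Hypothetical toy data: one row per patient, one 0/1 flag per product. */
/* PRODDEMO, ID, and all values below are invented for illustration.     */
data proddemo;
    input id prod_1 prod_2 prod_3;
    datalines;
1 1 0 0
2 0 0 0
3 1 1 0
;
run;

/* Print the rows to see the layout: patient 1 used product 1 only,   */
/* patient 2 used none of the three, patient 3 used products 1 and 2. */
proc print data=proddemo noobs;
run;
```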
8. From the dataset created in question #7 (TEMP5), create a subset as a temporary dataset called MILDDIS that has all of the variables except for DOB and VISDATE, and that contains only patients who are less than 40 years old and who have mild disease.
9. You are wondering if you should create a categorical variable for weight in your dataset. To see what it might look like, you decide to create a temporary format called WTFMT. Create this format for 20-kilogram weight groups (i.e., 10-29.9, 30-49.9, 50-69.9, 70-89.9, 90-109.9, 110+). Using the TEMP5 dataset, apply the format in a cross tabulation of the patient’s weight and the case variable.
10. Create a permanent format library with the following formats: CASEFMT, where 1 is Case and 0 is Control; and SEVFMT, where 0 is Mild, 1 is Moderate, and 2 is Severe. Create a permanent SAS dataset called SFINAL from TEMP5 in which you permanently assign the MMDDYY10 format to the date variables (VISDATE and DOB), the CASEFMT format to the variable CASE, and the SEVFMT format to the variable DISLEVEL2. Use your permanent dataset SFINAL to perform a cross tabulation of CASE and DISLEVEL2 so that you can see the results of the formatting.
11. Run a contents procedure on the ALLDATA and SFINAL datasets. In addition, run a COMPARE procedure on the two datasets.
a. How many new variables did you create? For how many subjects did you change the value of a variable as part of your data cleaning?
Congratulations, your dataset is now ready for analysis.
BONUS POINT: What code can be placed above the libname statement so that SAS will open a dataset that has permanent formats assigned when the format library is not available?