Python Dataframes

There are 2 problems, each with a few parts.

Additional Instructions:

For each of the following questions, write Python code in PyCharm. For each question, create a new Python file and name it HW4_Q1_lastname_firstname.py. Create a header in each file using comments to display your name and HW information. After that, write your Python code.

Problem 1

mtcars.csv: This dataset, extracted from the 1974 Motor Trend US magazine, comprises fuel consumption and 10 aspects (attributes) of automobile design and performance for 32 automobiles. It is a famous dataset used in machine learning and data analysis. We will use it to perform data abstraction, slicing, dicing and basic analysis using Pandas Series and DataFrame, as taught in class.

As you can observe on the cover page of the magazine, one of the main purposes of this data was to assist decision making on which car to purchase during the 1973 Oil Crisis. The crisis began in October 1973, when the members of the Organization of Arab Petroleum Exporting Countries proclaimed an oil embargo. The affected countries were initially Canada, Japan, the Netherlands, the United Kingdom and the United States, and the embargo was later extended to Portugal, Rhodesia and South Africa. By the end of the embargo in March 1974, the oil price had risen nearly 400%, from US$3 per barrel to nearly $12 globally; US prices were significantly higher. The embargo caused an oil crisis, or "shock", with many short and long-term effects on global politics and the global economy. It was famously called the “First Oil Shock”.

1. Despite the “First Oil Shock” crisis, Jack needs to buy a new car for his daily commute to work, and he decides to perform data analysis using Pandas to determine the best car to buy. He decides to perform the following tasks, but requires help in coding the requirements and hence would like to approach ITP449 students for help. (5 points)

As part of the first task, help Jack perform the following:
a) Read the csv file using Pandas. Store the output in a dataframe named frame.
b) Print the dataframe.
c) You notice that the index is 0..31. There is a column Car Name.
d) Set the index of the dataframe to the Car Name. In other words, make the column Car Name the index of frame.
e) Print frame.

2. Having obtained satisfactory results in Question 1, Jack would now like to obtain the details of economical cars which are powerful, and hence would like to perform the following tasks: (10 points)
a) Create a DataFrame using the attributes ‘Car Name’, ‘cyl’, ‘gear’, ‘hp’, ‘mpg’. Make Car Name the index. Rename the columns to: Cylinders, Gear, Horsepower, Miles Per Gallon. Print the DataFrame.
b) Now, Jack would like to determine cars with ‘Horsepower’ more than 110, and add a separate column called ‘Powerful’ to the DataFrame. Print the DataFrame.
c) Oops, Jack accidentally deleted the column ‘Horsepower’ from the DataFrame, even though it is the master column from which the column ‘Powerful’ was determined. To his surprise, the DataFrame still has ‘Powerful’! Print the DataFrame with the column ‘Horsepower’ deleted.
d) Using the original DataFrame (with the ‘Horsepower’ column), Jack would like to list cars with ‘Miles Per Gallon’ greater than 25.0 and sort the cars in descending order of ‘Horsepower’.
e) Finally, Jack decides to purchase a car that is powerful and has the highest Miles Per Gallon. Help Jack filter to that car.

Problem 2

Exploring COVID-19 data from the Johns Hopkins Center for Systems Science and Engineering data repo on GitHub: https://github.com/CSSEGISandData/COVID-19. There are two datasets that may help you with this analysis.
· Go to this link - https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data. Download the latest Daily Reports dataset. E.g. 10-04-2020.csv
· Go to this link for the time series data - https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

Field description
· FIPS: US only. Federal Information Processing Standards code that uniquely identifies counties within the USA.
· Admin2: County name. US only.
· Province_State: Province, state or dependency name.
· Country_Region: Country, region or sovereignty name. The names of locations included on the Website correspond with the official designations used by the U.S. Department of State.
· Last Update: MM/DD/YYYY HH:mm:ss (24 hour format, in UTC).
· Lat and Long_: Dot locations on the dashboard. All points (except for Australia) shown on the map are based on geographic centroids, and are not representative of a specific address, building or any location at a spatial scale finer than a province/state. Australian dots are located at the centroid of the largest city in each state.
· Confirmed: Confirmed cases include presumptive positive cases and probable cases, in accordance with CDC guidelines as of April 14.
· Deaths: Death totals in the US include confirmed and probable, in accordance with CDC guidelines as of April 14.
· Recovered: Recovered cases outside China are estimates based on local media reports, and state and local reporting when available, and therefore may be substantially lower than the true number. US state-level recovered cases are from COVID Tracking Project.
· Active: Active cases = total confirmed - total recovered - total deaths.
· Combined_Key: Admin2 + Province_State + Country_Region.
· Incidence_Rate: = confirmed cases per 100,000 persons.
· Case-Fatality Ratio (%): = number of recorded deaths / number of confirmed cases.
· US Testing Rate: = total test results per 100,000 persons. The "total test results" is equal to "Total test results (Positive + Negative)" from COVID Tracking Project.
· US Hospitalization Rate (%): = Total number hospitalized / Number confirmed cases. The "Total number hospitalized" is the "Hospitalized – Cumulative" count from COVID Tracking Project. The "hospitalization rate" and "hospitalized - Cumulative" data is only presented for those states which provide cumulative hospital data.

Read the dataset(s) into pandas dataframe(s).
1. What state in the US currently has the highest number of active cases?
2. What state in the US has the highest fatality rate (deaths as a ratio of infections)?
3. What is the difference in the testing rate between the state that tests the most and the state that tests the least?
4. Plot the number of daily new cases in the US for the top 5 states with the highest confirmed cases (as of today), from March 1 to today. Use Subplot 1.
5. Plot the number of daily deaths in the US for the top 5 states with the highest confirmed cases (as of today), from March 1 to today. Use Subplot 2.
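
For Problem 1, the steps could be sketched roughly as follows. This is only one possible approach, not a reference solution. The CSV text is a hypothetical stand-in for mtcars.csv with a few hand-typed rows for illustration, and the variable names (cars, without_hp, economical, best) are assumptions. The column names 'Car Name', 'cyl', 'gear', 'hp' and 'mpg' are taken from the assignment text.

```python
import io
import pandas as pd

# Hypothetical stand-in for mtcars.csv: a few hand-typed rows, for illustration only.
CSV_TEXT = """Car Name,mpg,cyl,hp,gear
Mazda RX4,21.0,6,110,4
Fiat 128,32.4,4,66,4
Lotus Europa,30.4,4,113,5
Porsche 914-2,26.0,4,91,5
Maserati Bora,15.0,8,335,5
"""

# Q1: read the CSV, then make 'Car Name' the index.
# With the real file this would be pd.read_csv("mtcars.csv").
frame = pd.read_csv(io.StringIO(CSV_TEXT))
frame = frame.set_index("Car Name")

# Q2a: keep four attributes and give them readable names.
cars = frame[["cyl", "gear", "hp", "mpg"]].rename(
    columns={
        "cyl": "Cylinders",
        "gear": "Gear",
        "hp": "Horsepower",
        "mpg": "Miles Per Gallon",
    }
)

# Q2b: boolean column marking cars with more than 110 horsepower.
cars["Powerful"] = cars["Horsepower"] > 110

# Q2c: drop 'Horsepower' on a copy; 'Powerful' holds stored values, so it stays.
without_hp = cars.drop(columns="Horsepower")

# Q2d: cars above 25 mpg, sorted by horsepower in descending order.
economical = cars[cars["Miles Per Gallon"] > 25.0].sort_values(
    "Horsepower", ascending=False
)

# Q2e: among the powerful cars, the row with the highest mpg.
powerful = cars[cars["Powerful"]]
best = powerful.loc[[powerful["Miles Per Gallon"].idxmax()]]

print(frame)
print(without_hp)
print(economical)
print(best)
```

Because drop returns a new DataFrame, cars still has its 'Horsepower' column for parts d) and e).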
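
For Problem 2, questions 1 to 3, a minimal sketch of the aggregation might look like this. The rows below are made up and only mimic the shape of a daily report; the place names are hypothetical. The exact column holding the testing rate depends on which report file is downloaded, so the name Testing_Rate is an assumption to check against the real file.

```python
import pandas as pd

# Made-up county-level rows shaped like a daily report (not real data).
counties = pd.DataFrame(
    {
        "Admin2": ["A1", "A2", "B1", "C1"],
        "Province_State": ["Alpha", "Alpha", "Beta", "Gamma"],
        "Country_Region": ["US", "US", "US", "Elsewhere"],
        "Confirmed": [600, 400, 500, 900],
        "Deaths": [12, 8, 25, 90],
        "Active": [350, 250, 300, 700],
    }
)

# Keep US rows only, then total the counts per state.
us = counties[counties["Country_Region"] == "US"]
states = us.groupby("Province_State")[["Confirmed", "Deaths", "Active"]].sum()

# Q1: state with the most active cases.
most_active_state = states["Active"].idxmax()

# Q2: fatality rate as deaths divided by confirmed cases, then the largest.
fatality = states["Deaths"] / states["Confirmed"]
highest_fatality_state = fatality.idxmax()

# Q3: gap between the highest and lowest testing rate.
# 'Testing_Rate' is an assumed column name; the values are made up.
testing = pd.Series({"Alpha": 15000.0, "Beta": 9000.0}, name="Testing_Rate")
testing_gap = testing.max() - testing.min()

print(most_active_state, highest_fatality_state, testing_gap)
```

Summing before dividing matters for question 2: the state-level ratio comes from state totals, not from averaging county ratios.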
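
For questions 4 and 5, the time series files store cumulative counts with one column per date, so daily figures come from differencing. The sketch below assumes that wide layout and uses made-up numbers, hypothetical state names and a top 2 instead of a top 5 to stay short. Plotting is left out so the example runs with pandas alone.

```python
import pandas as pd

# Made-up cumulative counts in a wide layout: one row per county, one column per date.
wide = pd.DataFrame(
    {
        "Province_State": ["Alpha", "Alpha", "Beta", "Gamma"],
        "3/1/20": [0, 1, 2, 0],
        "3/2/20": [3, 1, 2, 1],
        "3/3/20": [5, 4, 6, 1],
    }
)

# Total the counties of each state for every date column.
by_state = wide.groupby("Province_State").sum()

# Rank states by the most recent cumulative count (last column) and keep the top 2.
top = by_state.iloc[:, -1].nlargest(2).index

# Transpose so that dates become the index and states become columns.
cumulative = by_state.loc[top].T
cumulative.index = pd.to_datetime(cumulative.index, format="%m/%d/%y")

# Daily new counts are the day-to-day differences of the cumulative series.
# The first date has no previous day, so its row is dropped.
daily_new = cumulative.diff().dropna()

print(daily_new)
```

The same steps applied to the deaths time series would give the daily deaths for question 5. A frame like daily_new could then be drawn on a matplotlib subplot, for example by passing an axes object to daily_new.plot(ax=...).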
