
Create a Program to Implement NYC Datasets in Python Assignment Solution

July 04, 2024
Dr. Andrew Taylor
🇨🇦 Canada
Python
Dr. Andrew Taylor, a renowned figure in the realm of Computer Science, earned his PhD from McGill University in Montreal, Canada. With 7 years of experience, he has tackled over 500 Python assignments, leveraging his extensive knowledge and skills to deliver outstanding results.
Key Topics
  • Instructions
    • Objective
  • Requirements and Specifications

Instructions

Objective

Write a Python program that cleans, merges, and visualizes two NYC Open Data datasets.

Requirements and Specifications

Step 1 (20 pts.): Select two datasets from NYC Open Data (https://opendata.cityofnewyork.us/). Write a paragraph about how they might be related and why looking at the data might be helpful. Be sure to check the data dictionary, which explains each column/attribute. NYC School Datasets are an example of a category that is easy to merge because the datasets share a common key. Alternative datasets may be used, but must have my approval.

Step 2 (50 pts.): Write a Python program that cleans and merges the data. Get rid of the columns you aren't using. I expect to see head() used on your dataframes where relevant, so that I can see what is going on in them. Use comments to explain how you dealt with missing data and other issues in the dataset.
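A minimal sketch of what Step 2 asks for is shown here. The two small frames are made-up stand-ins for the downloaded CSVs, the column names are borrowed from the datasets used in the solution, and the MM/DD/YYYY date format, the helper names `clean_cases` and `clean_visits`, and the choice to drop (rather than fill) missing values are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the two downloaded CSVs (all values are made up).
cases = pd.DataFrame({
    "DATE_OF_INTEREST": ["03/01/2020", "03/02/2020", "03/03/2020", "not a date"],
    "CASE_COUNT": [1, 0, 3, 5],
    "DEATH_COUNT": [0, 0, None, 1],
})
visits = pd.DataFrame({
    "date": ["03/01/2020", "03/01/2020", "03/02/2020", "03/03/2020"],
    "ili_pne_visits": [10, 5, 12, None],
})


def clean_cases(df):
    """Keep only the columns we use and parse the date key."""
    out = df[["DATE_OF_INTEREST", "CASE_COUNT"]].copy()
    # Unparseable dates become NaT instead of raising an error.
    out["DATE_OF_INTEREST"] = pd.to_datetime(
        out["DATE_OF_INTEREST"], format="%m/%d/%Y", errors="coerce"
    )
    # Missing data: a row without a valid date cannot be merged, so drop it.
    out = out.dropna(subset=["DATE_OF_INTEREST"])
    return out.rename(columns={"DATE_OF_INTEREST": "date"})


def clean_visits(df):
    """Parse dates, drop rows with missing counts, and total the visits per day."""
    out = df[["date", "ili_pne_visits"]].copy()
    out["date"] = pd.to_datetime(out["date"], format="%m/%d/%Y", errors="coerce")
    # Missing data: rows with no date or no count are dropped (an assumption;
    # filling with 0 would understate visits on those days).
    out = out.dropna(subset=["date", "ili_pne_visits"])
    # Several rows may share a date, so collapse them to one row per day.
    return out.groupby("date", as_index=False)["ili_pne_visits"].sum()


# An inner merge keeps only the dates present in both datasets.
merged = clean_cases(cases).merge(clean_visits(visits), on="date", how="inner")
print(merged.head())
```

An inner merge is used so that every remaining row has values from both sources; `how="left"` or `how="outer"` would keep unmatched dates and introduce new missing values that would then need their own handling.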

Step 3 (30 pts.): Use seaborn (regression plot) or other plots/graphs to graphically illustrate the relationship between some variables in each dataset. Write another paragraph on what insights you've gained.
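As a rough illustration of the seaborn regression plot mentioned in Step 3, the sketch below draws one from a tiny merged frame. The numbers are invented purely for illustration, and the column names are assumed to match a frame that has already been merged on the date key.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical merged data: one row per day, one column from each dataset.
merged = pd.DataFrame({
    "CASE_COUNT": [10, 20, 30, 40, 50],
    "ili_pne_visits": [120, 190, 310, 420, 480],
})

fig, ax = plt.subplots(figsize=(6, 4))
# regplot draws the scatter points plus a fitted linear regression line.
sns.regplot(data=merged, x="CASE_COUNT", y="ili_pne_visits", ax=ax)
ax.set_title("ER visits vs. COVID-19 cases (illustrative data)")

# Pearson correlation gives a single number to quote alongside the plot.
r = merged["CASE_COUNT"].corr(merged["ili_pne_visits"])
print(f"Pearson r = {r:.3f}")
```

In a notebook the figure renders with the cell output; in a script, `plt.show()` or `fig.savefig(...)` would be needed to see it.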

Upload all Jupyter notebooks, along with your code saved as a .pdf. Save your notebooks with output! Otherwise, I might need to get your data from you to test your work.

.py files are not accepted as a substitute for Jupyter notebooks. Any submission that is only a .py file will receive a grade of 0.

Source Code

# Datasets

### COVID-19 Daily Counts of Cases, Hospitalizations, and Deaths
# https://data.cityofnewyork.us/Health/COVID-19-Daily-Counts-of-Cases-Hospitalizations-an/rc75-m7u3
# Download link: https://data.cityofnewyork.us/api/views/rc75-m7u3/rows.csv?accessType=DOWNLOAD

### Emergency Department Visits and Admissions for Influenza-like Illness and/or Pneumonia
# https://data.cityofnewyork.us/Health/Emergency-Department-Visits-and-Admissions-for-Inf/2nwg-uqyg
# Download link: https://data.cityofnewyork.us/api/views/2nwg-uqyg/rows.csv?accessType=DOWNLOAD

# The first dataset reports the daily number of COVID-19 cases, hospitalizations,
# and deaths in New York City. The second contains the number of emergency
# department visits for pneumonia, influenza, or similar symptoms. The goal is to
# check for a direct relationship between the two, starting from the date the
# first COVID-19 cases were reported.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Step 1: Download Datasets
data1 = pd.read_csv('https://data.cityofnewyork.us/api/views/rc75-m7u3/rows.csv?accessType=DOWNLOAD')
data1.head()

data2 = pd.read_csv('https://data.cityofnewyork.us/api/views/2nwg-uqyg/rows.csv?accessType=DOWNLOAD')
data2.head()

## Step 2: Clean Datasets

### For the first dataset, select only the columns of interest
columns_of_interest = ['DATE_OF_INTEREST', 'CASE_COUNT', 'HOSPITALIZED_COUNT', 'DEATH_COUNT']
# .copy() gives an independent frame, so the assignment below does not raise SettingWithCopyWarning
df1 = data1[columns_of_interest].copy()

### Convert 'DATE_OF_INTEREST' to datetime
df1['DATE_OF_INTEREST'] = pd.to_datetime(df1['DATE_OF_INTEREST'])
df1.head()

### Do the same for the second dataset
columns_of_interest = ['extract_date', 'date', 'total_ed_visits', 'ili_pne_visits']
df2 = data2[columns_of_interest].copy()
df2['extract_date'] = pd.to_datetime(df2['extract_date'])
df2['date'] = pd.to_datetime(df2['date'])
df2.head()

### Find the date of the first reported COVID-19 case
# Taken from the converted column so that it is a Timestamp rather than a raw string
start_date = df1['DATE_OF_INTEREST'].min()
print(start_date)

### Keep only the rows of the second dataset from start_date onward
df2 = df2[df2['date'] >= start_date]
df2.head()

## Step 3: Plots

### Plot the number of COVID-19 cases and the number of ER visits per day
# NOTE: Both series are normalized to the range 0 to 1 because we only want to
# compare the shape of the curves, not their absolute values.
df1_grouped = df1.groupby('DATE_OF_INTEREST').sum().sort_index()
df1_grouped = (df1_grouped - df1_grouped.min()) / (df1_grouped.max() - df1_grouped.min())
df1_grouped.head()

# Sum only the count columns; 'extract_date' is a datetime column and cannot be summed
value_columns = ['total_ed_visits', 'ili_pne_visits']
df2_grouped = df2.groupby('date')[value_columns].sum().sort_index()
df2_grouped = (df2_grouped - df2_grouped.min()) / (df2_grouped.max() - df2_grouped.min())
df2_grouped.head()

### Plot
ax = df1_grouped.plot(y='CASE_COUNT', label='COVID-19 Cases')
df2_grouped.plot(y='ili_pne_visits', label='Hospital Visits', ax=ax)
plt.legend()
plt.show()

# The two curves follow a similar shape. At the beginning they are very close,
# likely because when the pandemic began people were frightened and went to the
# ER at the first symptom, however mild. As time passed and social distancing
# measures were introduced, both the number of COVID-19 cases and the number of
# ER visits fell. By October 2020 the number of cases rose again (second wave)
# and ER visits rose with it, though by less, likely because people were no
# longer as frightened.

## Correlation maps for each dataset
# numeric_only=True skips date and text columns, which cannot be correlated
corr1 = data1.corr(numeric_only=True)
corr2 = data2.corr(numeric_only=True)

### Dataset 1
plt.figure()
sns.heatmap(corr1)
plt.show()

### Dataset 2
plt.figure()
sns.heatmap(corr2)
plt.show()
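The visual comparison of the two curves can be backed by a number. The sketch below is not part of the original solution: it applies the same min-max normalization to two small, made-up daily series, aligns them on date, and computes their Pearson correlation. The helper name `min_max_scale` and all values are hypothetical.

```python
import pandas as pd


def min_max_scale(frame):
    """Rescale each column to the range 0 to 1 (constant columns become all zeros)."""
    span = frame.max() - frame.min()
    # Replace a zero span with 1 to avoid dividing by zero for constant columns.
    return (frame - frame.min()) / span.where(span != 0, 1)


# Hypothetical daily totals indexed by date (values are made up).
dates = pd.date_range("2020-03-01", periods=4, freq="D")
cases = pd.DataFrame({"CASE_COUNT": [0, 50, 100, 25]}, index=dates)
visits = pd.DataFrame({"ili_pne_visits": [10, 30, 50, 30]}, index=dates)

# Joining on the shared date index keeps only the days present in both frames.
combined = cases.join(visits, how="inner")
scaled = min_max_scale(combined)

r_raw = combined["CASE_COUNT"].corr(combined["ili_pne_visits"])
r_scaled = scaled["CASE_COUNT"].corr(scaled["ili_pne_visits"])
print(f"raw r = {r_raw:.4f}, scaled r = {r_scaled:.4f}")
```

Min-max scaling is a linear transformation, so the Pearson correlation is the same before and after it. The normalization helps when overlaying the curves on one axis, but it does not change how strongly the two series are related.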
