



Create a Python program that reads in data from your dataset from homework 3 (you may use a different dataset if the one from homework 3 didn't have at least 6 columns or didn't have at least one column with categorical variables), creates a Pandas DataFrame, and then cleans and prepares the data in the DataFrame. Your program should include the following:

  • Comments to explain what is happening at each step as well as one in the beginning of your code that has your name and the date the code was created and/or last modified.
  • The creation of a DataFrame that stores some (or all) of the data in your dataset.
  • Use of the .columns property to report the names of the columns in the DataFrame.  Next reduce the number of columns in your DataFrame to five specific columns, with at least one of them containing categorical data.
  • Use of the .count() method to report the number of rows in the DataFrame. Next remove some rows from the DataFrame according to some criteria that you feel are appropriate (e.g., remove rows that have the value 0 in a given column, or values greater than a specified amount, etc.). Then, report the number of rows in the DataFrame after you removed the rows.
  • Use the .replace() method to change some values in your DataFrame as you deem appropriate.  Examples could be updating a salary amount using a specific raise percentage, or replacing missing values with the value of 0, etc.
  • Use of the .query() method to report the rows in the DataFrame that satisfy some user-specified criteria.  For this, prompt the user to enter in some information and report the results that correspond to their entry.
  • Use get_dummies() (called as pd.get_dummies(), a top-level pandas function) to create some dummy (indicator) variables for one of the columns in your DataFrame that has categorical values and save those columns with your DataFrame.
  • Use of the .head() method to report the first 7 rows in the DataFrame.
  • Use of the .tail() method to report the last 4 rows in the DataFrame.
  • Save your DataFrame to a new CSV file.
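
For the .query() step, one way to use a value supplied by the user is to reference a Python variable inside the query string with `@`. The sketch below is only an illustration, not a full solution: the small DataFrame, its column names, and the helper `rows_for_station` are made up for the example.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataset.
trips = pd.DataFrame({
    "ride_id": ["a1", "b2", "c3", "d4"],
    "start_station_id": [13, 2, 13, 7],
    "member_casual": ["member", "casual", "member", "member"],
})


def rows_for_station(df, station_id):
    """Return the rows whose start_station_id equals station_id."""
    # "@station_id" tells .query() to use the local Python variable.
    return df.query("start_station_id == @station_id")


matches = rows_for_station(trips, 13)
print(matches)
```

In the actual program the value would come from the user, for example `station_id = int(input("Enter a start station ID: "))`, converted to `int` so that it compares correctly against a numeric column.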

 

My code so far:

import pandas as pd
import csv
import numpy as np

#create a pandas DataFrame from the data in the Divvy_Trips_2020_Q1.csv file
dataframe=pd.read_csv("C:/Users/wstev/Desktop/INFS 791/week 4/HW5/Divvy_Trips_2020_Q1.csv")
print(dataframe.columns)

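#reduce the number of columns to 5 (the station name columns are categorical)
#reassign dataframe so that the later steps work on the reduced DataFrame
dataframe=dataframe[["ride_id", "start_station_name", "start_station_id",
                     "end_station_name", "end_station_id"]].copy()
print("The columns are now: ", list(dataframe.columns))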

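#.count() reports the number of non-null values in each column
print("Row counts before removal:")
print(dataframe.count())
#remove the rows whose start_station_id is greater than 13
#(drop() with inplace=True returns None, so assign the result of a plain drop() instead)
dataframe=dataframe.drop(dataframe[dataframe.start_station_id>13].index)
print("Row counts after removal:")
print(dataframe.count())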

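#Use the .replace() method to change some values in your DataFrame as you deem appropriate
#start_station_id is numeric (it is compared with 13 above), so match the number 13, not the string '13'
dataframe['start_station_id']=dataframe['start_station_id'].replace(
        to_replace=[13],
        value='lucky number 13')
print (dataframe)

#Use of the .query() method to report the rows in the DataFrame that satisfy some user-specified criteria
#.query() returns a new DataFrame and does not change dataframe, so print its result directly
print (dataframe.query('start_station_id=="lucky number 13"').head())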
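
The steps the code above does not reach yet (dummy variables, .head(), .tail(), and saving to CSV) might look roughly like the following sketch. It runs on a small made-up DataFrame; the column names, the `member_casual` categorical column, the `rider` prefix, and the output filename `cleaned_trips.csv` are all assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; substitute the cleaned DataFrame from your program.
trips = pd.DataFrame({
    "ride_id": [f"r{i}" for i in range(10)],
    "start_station_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "member_casual": ["member", "casual"] * 5,
})

# Create indicator columns for the categorical column and keep them
# alongside the original columns.
dummies = pd.get_dummies(trips["member_casual"], prefix="rider")
trips = pd.concat([trips, dummies], axis=1)

# Report the first 7 and the last 4 rows.
first_rows = trips.head(7)
last_rows = trips.tail(4)
print(first_rows)
print(last_rows)

# Save to a new CSV file (a temp directory is used here; the filename is illustrative).
out_path = os.path.join(tempfile.gettempdir(), "cleaned_trips.csv")
trips.to_csv(out_path, index=False)
```

Passing `index=False` keeps the row index out of the saved file, which is usually what you want when the index carries no meaning of its own.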



Expert Answer


How to clean data? Data cleaning puts data into the right shape and quality for analysis. It includes many different steps, for example: Basics (selec