Create a Python program that reads in data from your dataset from homework 3 (you may use a different dataset if the one from homework 3 didn't have at least 6 columns or at least one column with categorical variables), creates a Pandas DataFrame, and then cleans and prepares the data in the DataFrame. Your program should include the following:
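A minimal sketch of checking whether a dataset meets the two requirements above (at least 6 columns and at least one categorical column) before doing any cleaning. The CSV rows are made up for illustration, and `meets_requirements` is a hypothetical helper. It treats any non-numeric column as categorical, which is only a rough proxy.

```python
import io

import pandas as pd

# Made-up rows for illustration only; the real program would call
# pd.read_csv(path) on the dataset file instead.
SAMPLE_CSV = """ride_id,rideable_type,start_station_name,start_station_id,end_station_name,end_station_id,member_casual
r1,docked_bike,Station A,5,Station B,13,member
r2,docked_bike,Station C,13,Station A,5,casual
r3,docked_bike,Station D,240,Station C,13,member
"""


def meets_requirements(df: pd.DataFrame, min_columns: int = 6) -> bool:
    """Hypothetical helper: True if df has at least `min_columns` columns
    and at least one non-numeric column (treated here as categorical)."""
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    return df.shape[1] >= min_columns and len(non_numeric) > 0


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.shape)  # (rows, columns)
print(meets_requirements(df))
```

If the check fails, that is the signal to switch to a different dataset as the assignment allows.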
My code so far:
import pandas as pd

# Create a pandas DataFrame from the data in the Divvy_Trips_2020_Q1.csv file
dataframe = pd.read_csv("C:/Users/wstev/Desktop/INFS 791/week 4/HW5/Divvy_Trips_2020_Q1.csv")
print(dataframe.columns)

# Select 5 specific columns (this builds a new DataFrame; `dataframe` itself keeps all columns)
specific_columns = dataframe[["ride_id", "start_station_name", "start_station_id", "end_station_name", "end_station_id"]]
print("The selected columns are:\n", specific_columns)

# Use the .count() method to report the number of non-null values in each column,
# and len() to report the number of rows in the DataFrame
print("The number of non-null values per column:\n", dataframe.count())
print("The number of rows is:", len(dataframe))

# Remove the rows whose start_station_id is greater than 13
# (with inplace=True, drop() returns None, so its result is not assigned)
dataframe.drop(dataframe[dataframe.start_station_id > 13].index, inplace=True)
print(dataframe)

# Use the .replace() method to change some values in your DataFrame as you deem appropriate
# (start_station_id is numeric, so match the integer 13, not the string '13')
dataframe['start_station_id'] = dataframe['start_station_id'].replace(to_replace=[13], value='lucky number 13')
print(dataframe)

# Use the .query() method to report the rows in the DataFrame that satisfy some user-specified criteria
# (after the replacement, station 13 is stored as 'lucky number 13'; print the query result itself)
print(dataframe.query('start_station_id == "lucky number 13"').head())
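A small self-contained sketch of the same count, drop, replace and query steps on made-up data, assuming `start_station_id` is read as integers (the `> 13` comparison in the code would raise an error on strings). It illustrates four points: `.count()` reports non-null values per column rather than one row count, `drop(..., inplace=True)` returns `None`, a numeric column needs the integer `13` rather than the string `'13'` in `.replace()`, and the result of `.query()` has to be printed or assigned to be seen.

```python
import pandas as pd

# Made-up sample rows standing in for the Divvy data (assumption: the real
# start_station_id column is read as integers, as it is here).
df = pd.DataFrame({
    "ride_id": ["r1", "r2", "r3", "r4"],
    "start_station_id": [5, 13, 240, 13],
    "end_station_id": [13.0, 5.0, None, 13.0],
})

# .count() returns the number of non-null values per column, not one row count.
counts = df.count()
print(counts)              # end_station_id shows 3 because of the missing value
print("Rows:", len(df))    # number of rows

# Drop rows whose start_station_id is greater than 13. Reassign the result
# here instead of using inplace=True (which returns None).
df = df.drop(df[df["start_station_id"] > 13].index)

# Replace the integer 13; a string '13' would never match a numeric column.
df["start_station_id"] = df["start_station_id"].replace(13, "lucky number 13")

# query() returns a new DataFrame; assign or print it to see the matching rows.
lucky = df.query('start_station_id == "lucky number 13"')
print(lucky)
```

Note that replacing a number with a label turns `start_station_id` into a mixed-type (object) column, so any later numeric filtering on that column should happen before the replacement.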