
(Solved): A web scraping program has been developed to extract the data and save it in CSV format in a text ...




A web scraping program has been developed to extract the data and save it in CSV format in a text file. Here are the first five lines of the file that contains the songs shown in the screenshot:

"November 3, 2019 8:53 PM", "After A Few", "Travis Denning"
"November 3, 2019 8:49 PM", "Better Life", "Keith Urban"
"November 3, 2019 8:46 PM", "Mercy", "Brett Young"
"November 3, 2019 8:42 PM", "What She Wants Tonight", "Luke Bryan"
"November 3, 2019 8:39 PM", "Prayed For You", "Matt Stell"

The files that you are given for the assignment may contain a different list of songs. If you are given multiple files, combine them into one file. There may be duplicate records.

Your program will process the data file to answer the following questions:

- Breaks for commercials and other content. After continuous play of a few songs, the station takes a break for commercials or other content (e.g., announcements, interviews, or taking calls from listeners). Your task is to identify the times at which a song is followed by a break. This can be detected by examining the time pattern (e.g., a song that appears to be longer than 5 minutes most likely includes time for a break).
- Top songs. Songs on heavy rotation will be aired multiple times on a given day. Your task is to identify these songs.
- Top artists. On a given day, most artists have one song aired (even though that song may be aired several times), while some artists may have more than one song aired. Your task is to use two metrics to identify top artists: (1) those who have the most distinct songs (count of distinct songs); (2) those who have the most airplays (all songs combined).

Pause here for a moment and think about how you would accomplish these tasks, manually or using any tool you know. Write down the steps.

The solutions MUST NOT use the pandas package; instead, use lists, tuples, and dictionaries to process the data.

1.1 Solution Approach

Here is one approach (there are other approaches).
To identify commercial breaks:

- Order the data by time.
- Calculate the time between two consecutive songs to obtain the "nominal" time of all but the last song.
- If a song's nominal time is longer than 5 minutes, a break follows the song.

We choose 5 minutes as the threshold because most songs on the radio are around 3 minutes long.

To find the top songs, we need to count how many times each song is aired. This is conceptually simple: for each distinct song, count its occurrences in the dataset. In practice, we need a way of keeping track of records that guarantees uniqueness (per song) and can scale up (from only a few songs to millions of songs). A dictionary keyed by song meets both requirements. Then sort by count to find the top songs.

To find the top artists, we need to keep track of the airplays for each artist: which songs were aired, and how many times each song was aired. Then we can count how many distinct songs each artist has, and we can also count how many total airplays each artist has.
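The approach above can be sketched in Python using only lists, tuples, sets, and dictionaries. This is a minimal sketch under stated assumptions, not the expected solution: the function names (load_plays, find_breaks, count_songs, artist_stats) are hypothetical, duplicates are assumed to be exact repeats of a record, songs are keyed by (title, artist) so that same-titled songs by different artists stay separate, and the timestamp format is inferred from the five sample lines (its month and AM/PM parsing assumes an English locale).

```python
import csv
from datetime import datetime, timedelta

TIME_FORMAT = "%B %d, %Y %I:%M %p"      # assumed from the sample lines
BREAK_THRESHOLD = timedelta(minutes=5)  # threshold stated in the approach

# The five sample records from the assignment, plus one repeated record
# to illustrate duplicate handling.
SAMPLE = '''"November 3, 2019 8:53 PM", "After A Few", "Travis Denning"
"November 3, 2019 8:49 PM", "Better Life", "Keith Urban"
"November 3, 2019 8:46 PM", "Mercy", "Brett Young"
"November 3, 2019 8:42 PM", "What She Wants Tonight", "Luke Bryan"
"November 3, 2019 8:39 PM", "Prayed For You", "Matt Stell"
"November 3, 2019 8:53 PM", "After A Few", "Travis Denning"
'''


def load_plays(lines):
    """Parse CSV lines into a time-ordered list of unique (time, title, artist) tuples."""
    plays = set()  # a set drops exact duplicate records
    # skipinitialspace lets the reader accept the space after each comma
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        stamp, title, artist = (field.strip() for field in row)
        plays.add((datetime.strptime(stamp, TIME_FORMAT), title, artist))
    return sorted(plays)  # tuples sort by time first


def find_breaks(plays):
    """Return the plays whose nominal time (gap to the next play) exceeds the threshold."""
    breaks = []
    for current, following in zip(plays, plays[1:]):
        if following[0] - current[0] > BREAK_THRESHOLD:
            breaks.append(current)
    return breaks


def count_songs(plays):
    """Return ((title, artist), count) pairs, most-played first."""
    counts = {}
    for _, title, artist in plays:
        key = (title, artist)
        counts[key] = counts.get(key, 0) + 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def artist_stats(plays):
    """Map each artist to (number of distinct songs, total airplays)."""
    per_artist = {}  # artist -> {title: airplay count}
    for _, title, artist in plays:
        songs = per_artist.setdefault(artist, {})
        songs[title] = songs.get(title, 0) + 1
    return {
        artist: (len(songs), sum(songs.values()))
        for artist, songs in per_artist.items()
    }
```

To try it, call load_plays(SAMPLE.splitlines()) and pass the result to the other three functions. To combine multiple files, the lines of each file can be concatenated into one list before calling load_plays, and the set takes care of records that appear in more than one file. Sorting the values returned by artist_stats by their first or second element gives the two top-artist rankings.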

