Assume that the complexity of a story (or an article) is determined by the sum of:
Percentage of words with 8 or more letters
Percentage of sentences with more than 40 words
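The two percentages above can be sketched in Python as follows. This is one possible reading, not a reference solution: the assignment does not say how to split words or sentences, so the sketch assumes a sentence ends at '.', '!' or '?', and that a word is a run of letters that may contain internal apostrophes or hyphens (only the letters are counted toward the 8-letter threshold). The helper name complexity_from_text is hypothetical; Clevel is the name the assignment asks for.

```python
import re

# Assumption: a word is a run of letters, optionally joined by ' or -.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def complexity_from_text(text):
    """Return (% of words with 8+ letters) + (% of sentences with > 40 words)."""
    words = WORD_RE.findall(text)
    # Assumption: sentences end at '.', '!' or '?'. Pieces without any word
    # (for example the empty string after a final period) are ignored.
    sentences = [s for s in re.split(r"[.!?]+", text) if WORD_RE.search(s)]
    if not words:
        return 0.0  # avoid dividing by zero on an empty story

    # Count only letters, so a hyphen or apostrophe does not lengthen a word.
    long_words = sum(1 for w in words if sum(c.isalpha() for c in w) >= 8)
    long_sentences = sum(1 for s in sentences if len(WORD_RE.findall(s)) > 40)

    return 100 * long_words / len(words) + 100 * long_sentences / len(sentences)


def Clevel(story):
    """Read the file named `story` and return its complexity score."""
    with open(story, encoding="utf-8") as f:
        return complexity_from_text(f.read())
```

Applied to the JackAndJill.txt text listed among the data files, these rules find 25 words, one of which ("tumbling") has 8 or more letters, and two sentences, neither longer than 40 words, so the score is 4 + 0 = 4. Note that splitting on every period also breaks abbreviations such as "p.m." into separate pieces, which a more careful sentence splitter would need to handle.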
This is the beginning of the 2nd part.
Develop a function Clevel(story) that takes the file name of a story as a parameter and returns its complexity score.
Now suppose the similarity between two articles is defined by the overlap between their words. To exclude the most common words, consider only words with fewer than 20 occurrences. If the number of such words is N1 in story 1 and N2 in story 2, and S of them are common to both stories, then the similarity score is (2S/(N1+N2))*100.
Develop a function Simscore(story1, story2) that takes the file names of the two stories as parameters and returns their similarity score.
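A minimal sketch of the similarity score, under assumptions the assignment leaves open: N1 and N2 are taken to be the number of distinct words that occur fewer than 20 times within each story, occurrences are counted per story, and words are compared case-insensitively. The helper names rare_words and similarity_from_text are hypothetical, and the word pattern is the same assumed one as for the complexity score.

```python
import re
from collections import Counter

# Assumption: a word is a run of letters, optionally joined by ' or -.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def rare_words(text, limit=20):
    """Return the set of distinct words occurring fewer than `limit` times."""
    counts = Counter(w.lower() for w in WORD_RE.findall(text))
    return {w for w, n in counts.items() if n < limit}


def similarity_from_text(text1, text2):
    """Return (2S / (N1 + N2)) * 100 for two texts."""
    words1, words2 = rare_words(text1), rare_words(text2)
    n1, n2 = len(words1), len(words2)
    if n1 + n2 == 0:
        return 0.0  # neither text has a qualifying word
    s = len(words1 & words2)  # words common to both stories
    return 2 * s / (n1 + n2) * 100


def Simscore(story1, story2):
    """Read both files and return their similarity score."""
    with open(story1, encoding="utf-8") as f1, open(story2, encoding="utf-8") as f2:
        return similarity_from_text(f1.read(), f2.read())
```

With this reading, two identical stories score 100 and two stories with no qualifying words in common score 0. A call such as Simscore("Politics-Bouie.txt", "Politics-Littlefield.txt") would compare two of the data files, assuming they are in the working directory.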
Data file: JackAndJill.txt
Jack and Jill went up the hill, To fetch a pail of water. Jack fell down and broke his crown, And Jill came tumbling after
Data file: Investing-Coy.txt
The Best 50-Year-Old Investing Advice Money Can Buy Nov. 21, 2022, 4:15 p.m. ET Credit...Illustration by The New York Times
Data file: Investing-WallaceWells.txt
OPINION What Stage of Capitalism Is Sam Bankman-Fried? Nov. 21, 2022, 3:00 p.m. ET By David Wallace-Wells Opinion Writer You're
Data file: Politics-Bouie.txt
OPINION JAMELLE BOUIE There Is a Way to Break Out of Our Constitutional Stagnation Nov. 18, 2022 By Jamelle Bouie Opinion Columnist Sign up for the Opinion Today newsletter Get expert analysis of the news and a guide to the big ideas shaping
Data file: Politics-Littlefield.txt
GUEST ESSAY Democrats Need to Realize How Much Dobbs Mattered Nov. 19, 2022 By Amy Littlefield Ms. Littlefield is the abortion access correspondent for The Nation. Sign up for the Opinion Today newsletter