TLP - Programming Maintenance
Week 14 - Calculating the Flesch-Kincaid Reading Level
Background
One of the most common formulas for calculating the reading level of a text is the Flesch-Kincaid reading level. On the surface it's a very simple formula:
reading level = 0.39 * (total words / total sentences) + 11.8 * (total syllables / total words) - 15.59
If you don't get bogged down in the details, this actually makes a lot of sense. The first set of parentheses calculates the length of the average sentence. Longer sentences are harder to read and process. The second set of parentheses calculates the number of syllables in the average word. Words with more syllables are harder to read.
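As a minimal sketch, the two averages described above can be combined in a few lines of Python. The constants 0.39, 11.8, and 15.59 are the standard published Flesch-Kincaid grade-level coefficients. The function name `flesch_kincaid_grade`, its parameters, and the sample counts are hypothetical and are not taken from the course solution.

```python
def flesch_kincaid_grade(sentences, words, syllables):
    """Return the Flesch-Kincaid grade level from three raw counts."""
    # Average sentence length: longer sentences raise the score.
    words_per_sentence = words / sentences
    # Average syllables per word: longer words raise the score.
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical counts, chosen only to illustrate the arithmetic.
level = flesch_kincaid_grade(sentences=10, words=100, syllables=150)
print(round(level, 1))  # 6.0
```

With 10 words per sentence and 1.5 syllables per word, the score is 3.9 + 17.7 - 15.59 = 6.01, which rounds to 6.0.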
Sample Texts:
Remember that in order to download these files you may need to right-click on the links below and choose "Save as".
- dummy.txt (silly file I created to have a low score)
- hemingway.txt
- warAndPeace.txt (long, but not really that hard by the formula's standards)
- prideAndPrejudice.txt
Task
Create a function called: readingLevel()
This function should:
- Take in one parameter:
- the name of the text file
- Actions:
- Read the entire book into one large string (you can do this one line at a time, but it is likely easier to do it all at once).
- Use the encoding parameter we have talked about to make sure that a stray special character doesn't cause problems.
- In my previous work I used "utf-8", but we have discovered that "latin1" seems to be more reliable.
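One possible way to read a whole file at once with the encoding parameter is sketched below. The helper name `read_book` and the throwaway sample file are assumptions made for illustration; only the "latin1" encoding comes from the notes above.

```python
import os
import tempfile


def read_book(filename):
    """Read the whole file into a single string."""
    # latin1 maps every possible byte to a character, so decoding
    # never raises UnicodeDecodeError on a stray special character.
    with open(filename, encoding="latin1") as book:
        return book.read()


# Demonstration on a throwaway file containing one non-UTF-8 byte (0xE9).
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open(path, "wb") as sample:
        sample.write(b"Caf\xe9 au lait. Very nice!")
    text = read_book(path)

print(len(text))
```

The same bytes would fail to decode as "utf-8", which is one plausible reason "latin1" appears more reliable on these books.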
- Calculate and print the three values for your book:
- Sentences
- For the sake of simplicity, we will treat every period, question mark, or exclamation point as marking a new sentence.
- Note that this is actually an oversimplification. Books with lots of "Mr." and "Mrs." will have their scores reduced because the number of sentences will go up quite a bit.
- You MAY elect to try to fix this by detecting those kinds of uses of a period and not counting them, but this isn't required. The example numbers given below do not do this.
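The simplified sentence rule above could be sketched as follows. The helper name `count_sentences` and the sample string are hypothetical. This version deliberately does not special-case abbreviations, so it shows the overcounting described in the note.

```python
def count_sentences(text):
    """Count every '.', '?', and '!' as the end of a sentence."""
    return sum(text.count(mark) for mark in ".?!")


sample = "Mr. Darcy bowed. Was he proud? Very!"
print(count_sentences(sample))  # 4, because the period in "Mr." is counted too
```

A reader would see three sentences in the sample, but the rule reports four.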
- Words
- This one isn't too bad.
- The biggest challenge is cleaning up punctuation that MIGHT cause problems.
- But be careful that your code doesn't count the zero-length string as a word for this or the next feature.
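One way to clean up punctuation while skipping zero-length strings is sketched below. The helper name `clean_words` is hypothetical, and the sketch assumes that stripping ASCII punctuation from the ends of each whitespace-separated token is good enough. Curly quotes and other non-ASCII marks would need extra handling.

```python
import string


def clean_words(text):
    """Split on whitespace, strip punctuation, and drop empty strings."""
    words = []
    for token in text.split():
        # Remove punctuation from both ends, e.g. '"time."' becomes 'time'.
        word = token.strip(string.punctuation)
        # A token such as "--" is empty after stripping, so skip it.
        if word:
            words.append(word)
    return words


sample = 'It was -- at last -- "time." Wasn\'t it?'
print(clean_words(sample))
```

The two "--" tokens become empty strings after stripping and are dropped, so the sample yields seven words rather than nine. The apostrophe inside "Wasn't" is kept because only the ends of each token are stripped.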
- Syllables
- You likely wrote some code for this in week 3.
- One warning: you have to remove punctuation that might have ended up as part of your word. Otherwise the syllable count will change because your code won't recognize an "e" at the end of the word.
- For example, "time." should count as one syllable, but if you don't remove the period your code will likely count it as two.
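The "time." problem can be reproduced with a rough syllable heuristic. The sketch below is not the week 3 code. `count_syllables` is a hypothetical stand-in that counts groups of vowels and treats a trailing "e" as silent, and it will miscount plenty of real words.

```python
import string

VOWELS = "aeiouy"


def count_syllables(word):
    """Rough heuristic: count vowel groups, ignoring a silent final 'e'."""
    word = word.lower()
    count = 0
    previous_was_vowel = False
    for letter in word:
        is_vowel = letter in VOWELS
        # A new syllable starts when a vowel follows a non-vowel.
        if is_vowel and not previous_was_vowel:
            count += 1
        previous_was_vowel = is_vowel
    # Treat a trailing 'e' as silent, but never drop below one syllable.
    if word.endswith("e") and count > 1:
        count -= 1
    return count


print(count_syllables("time."))  # 2: the period hides the final 'e'
print(count_syllables("time.".strip(string.punctuation)))  # 1
```

Stripping the punctuation before counting is what lets the silent "e" rule apply.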
- Calculate and print the reading level of the book.
- For the sake of presentation, let's round to one decimal place.
- Returns:
- Nothing. This should all print.
- Special Note:
- Because this function prints rather than returns, and because very small differences in how we handle certain characters can change the output by fractions, I will be collecting this on Autolab and then hand scoring it based on my observation of whether your values are "close enough."
- This might bother you, because shouldn't there be a "right" answer? My answer is that language is JUST fluid enough that tiny interpretation issues can skew this.
- I think that there are right answers, but understand where small differences are valid.
- You should look closely at what your program prints and see whether the values are very close to mine.
- If in doubt, don't hesitate to ask.
- Hints
- Even though it appears that I have done this in one function - readingLevel() - I have actually done this with a whole bunch of helper functions. I won't tell you which ones you should write, but I strongly encourage you to write some functions to help with organization.
- I have tested my code pretty thoroughly. If you get significantly higher values than the printouts given below, this suggests that you aren't handling all of the punctuation properly.
- Having said that, I may have missed some cases, so if you get slightly lower values it may be because of a special character I missed.
- Example runs: