TLP - Programming Maintenance
Week 3 - Extracting customers
Background
In FCCS we talked about several data mining techniques. For example, we might take our Miss Tea database from last semester and try to figure out which customers are better or worse customers. One approach is cluster analysis: studying whether customers from a given state, age group, or even email service are better or worse customers. One step in that process is to split a large data set into previously defined clusters and study the results (this isn't traditional cluster analysis, but it is a variant).
For this activity we will use the MissTea files from last semester.
Task
- Write a function called extractDomains()
- Input
- Takes in three strings
- The first is the name of the input file, which contains a large group of customers
- The second is the name of the output file to be produced by this function
- The third is the domain name we are interested in identifying
- Performance
- Go through the input file and look at the email addresses of each customer.
- If the customer's email address is from the domain provided (the third parameter) then their entire row is added to the output file.
- Output
- Nothing is returned
- But an output file containing only those customers from the domain in question is generated.
- A call to this function would look like this
- If this were the input file
- Then this would be the output file
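The steps in the Task list can be sketched roughly as follows. This is only a minimal sketch, not a model solution, and it rests on assumptions the handout does not state: that the language is Python, that each customer is one comma-separated line, and that the email address is the only field in a row containing "@". Rows with no "@" field (a header row, for instance) are simply skipped here.

```python
def extractDomains(inputFile, outputFile, domain):
    """Copy every customer row whose email domain equals `domain`."""
    wanted = domain.strip().lower()
    with open(inputFile, "r") as source, open(outputFile, "w") as target:
        for line in source:
            # Assumption: fields are comma-separated and the email address
            # is the only field in the row that contains "@".
            for field in line.rstrip("\n").split(","):
                field = field.strip()
                if "@" in field:
                    # Everything after the last "@" is the domain part.
                    found = field.rsplit("@", 1)[1].lower()
                    if found == wanted:
                        # Keep the entire row, unchanged.
                        target.write(line)
                    break  # one email per row, so stop scanning this row
    # Nothing is returned; the result is the output file.
```

A call might look like `extractDomains("customers.txt", "uni_customers.txt", "uni.edu")`, where both file names are placeholders rather than the actual MissTea file names.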
Hints
- I am only interested in the domain part of the email address.
- NOTICE:
- The email address for customer #2 contains "uni.edu", but NOT as part of the domain of the email address. Therefore, that customer is not included in the output file.
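To see why the hint matters, the short sketch below compares a plain substring test with a test on the domain part only. It assumes Python; the two addresses and the helper name `domain_of` are made up for illustration and are not taken from the MissTea files. It also assumes every address contains an "@".

```python
def domain_of(email):
    # Hypothetical helper: return the text after the last "@", lowercased.
    # Assumes the address contains an "@".
    return email.rsplit("@", 1)[1].lower()

# Made-up addresses, for illustration only.
addresses = ["pat@uni.edu", "uni.edu.fan@gmail.com"]

for address in addresses:
    substring_match = "uni.edu" in address            # too permissive
    domain_match = domain_of(address) == "uni.edu"    # checks the domain only
    print(address, substring_match, domain_match)

# Prints:
# pat@uni.edu True True
# uni.edu.fan@gmail.com True False
```

The substring test accepts both addresses, while the domain test rejects the second one, which is the behavior the hint asks for.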