School of Physics, Engineering and Computer Science
Assignment Briefing Sheet (2020/21 Academic Year)

Section A: Assignment title, important dates and weighting
Assignment title: Flexible REF/DEF
Group or individual: Individual
Module title: Data Mining
Module code: 7COM1018
Module leader: Paul Moggridge
Moderator’s initials: WJ
Submission deadline: 18th June 2021 17:00
Target date for return of marked assignment: 5th July 2021

You are expected to spend about 40 hours to complete this assignment to a satisfactory standard.
This assignment is worth 40% of the overall assessment for this module.

Section B: Student(s) to complete
Student ID number
Year Code
NOT NEEDED FOR ONLINE SUBMISSION

Notes for students
• For undergraduate modules, a score above 40% represents a pass performance at honours level.
• For postgraduate modules, a score of 50% or above represents a pass mark.
• Late submission of any item of coursework for each day or part thereof (or for hard copy submission only, working day or part thereof) for up to five days after the published deadline, coursework relating to modules at Levels 0, 4, 5, 6 submitted late (including deferred coursework, but with the exception of referred coursework), will have the numeric grade reduced by 10 grade points until or unless the numeric grade reaches or is 40.
Where the numeric grade awarded for the assessment is less than 40, no lateness penalty will be applied.
• Late submission of referred coursework will automatically be awarded a grade of zero (0).
• Coursework (including deferred coursework) submitted later than five days (five working days in the case of hard copy submission) after the published deadline will be awarded a grade of zero (0).
• Regulations governing assessment offences including Plagiarism and Collusion are available from https://www.herts.ac.uk/about-us/governance/university-policies-and-regulations-uprs/uprs (please refer to UPR AS14)
• Guidance on avoiding plagiarism can be found here: https://herts.instructure.com/courses/61421/pages/referencing-avoidingplagiarism?module_item_id=779436
• Modules may have several components of assessment and may require a pass in all elements. For further details, please consult the relevant Module Handbook (available on Studynet/Canvas, under Module Information) or ask the Module Leader.

This Assignment assesses the following module Learning Outcomes (from Definitive Module Document):
Successful students will typically:
2. be able to appreciate the strengths and limitations of various data mining models.
3. be able to critically evaluate, articulate and utilise a range of techniques for designing data mining systems.
4. be able to understand and reflect on the underlying ethical and legal issues and constraints on the holding and the use of data.
5. be able to critically evaluate different algorithms and models of data mining.

Assignment Brief:
In the workplace, you have been assigned to a new project, “recognizing supermarket purchase patterns”. At your next meeting with management, you have been asked to explain how the FP Tree (Association Mining) works.
Your response must include:
1. A technical explanation, articulating how the algorithm works and showing how to work through an example of the algorithm by hand, using your own small example (14 marks)
2. Comments on the strengths and limitations of the algorithm (8 marks)
3. A critical evaluation of the algorithm for your given use case, comparing it with other similar algorithms and use cases in research. The papers should be referenced; how you do this is your choice (10 marks)
4. A description of, and reflection on, the ethical considerations for using this algorithm. For example, could the algorithm produce biased results, and how would this happen? (8 marks)

In summary, the assignment is not to complete a data science project. Your task is to create a piece of work explaining an algorithm (for example a video) while considering the example of using it for recognizing supermarket purchase patterns.
The flexibility is in the type of response (report/video); the intention is to allow you to perform at your best.
In short, your task is to explain how the FP Tree data mining algorithm works and comment on its fitness for “recognizing supermarket purchasing patterns”.

Submission Requirements:
You may choose from the options below for how you respond to this assignment:
• Video featuring a whiteboard / drawing app / pen and paper / PowerPoint (max. 16 minutes)
• Voiced-over PowerPoint (max. 16 minutes)
• Large Poster with an Audio Recording (max. 16 minutes)
• Technical Document (max. 1700 words)
All length limits are flexible (+/- 10%, and do not include figures, captions, and references). There are no marks for production quality, although we kindly ask that you make sure the video and audio quality is fit for purpose (standard built-in webcams and microphones should be suitable). For advice please speak to the module leader. The videos or documents are intended for a professional environment.
Accepted formats for videos: mp4, webm, flv, mkv, avi, mov and wmv.
Accepted formats for voiced-over PowerPoints: pptx.
Accepted formats for voice-over, if separate from the PowerPoint: mp3, wav, ogg, aac, wma and m4a.
Accepted formats for posters and documents: pdf, docx, odt, png and svg.
The referencing format is flexible. When using a video, references can appear on screen or be spoken; either will be accepted (please identify the title, author, and the year).

Marks awarded for:
This assignment is worth 40% of the overall assessment for this module.
Marks will be awarded out of 40 in the proportion: see the marking scheme below.
A reminder that all work should be your own.
Videos/reports exceeding the maximum length may not be marked beyond the length limit.

Type of Feedback to be given for this assignment:
Along with the marks, each student will receive individual written feedback on the online platform.

Mark Scheme:

1.1 Explanation Quality / Algorithm Understanding
Assessment element: A technical explanation, articulating how the algorithm works, showing how to work out different parts of the algorithm example by hand (14 marks)

0 marks:
• No discernible attempt at this element.

1-3 marks:
• Little/some understanding shown of the chosen algorithm.

4-6 marks:
• Good high-level understanding shown of the chosen algorithm.
• Some steps of the algorithm are explained, with some calculations shown.
• Limited use of visual aids (plots, tables, graphics) for explanation.

7-10 marks:
• Very good understanding shown of the chosen algorithm.
• All steps of the algorithm are explained. Most calculations shown.
• Appropriate visual aids (plots, tables, graphics) have been used throughout the explanation.
• The original source of the algorithm has been referenced.

11-14 marks:
• Excellent understanding shown of the chosen algorithm.
• All steps are fully explained, demonstrating all calculations that need to occur at each step.
• Creative visual aids have been used to articulate concisely how each step works. These can be hand drawn or digital.
• The original source of the algorithm has been referenced and recent research using the algorithm has been cited.
• Broad knowledge is demonstrated, for example by explaining how a step is like steps taken in other algorithms.
• Edge cases and/or challenging input are shown, demonstrating where the algorithm would fail or be less accurate.

1.2 Knowledge of Strength and Limitations
Assessment element: Comments on the strengths and limitations of the algorithm (8 marks)

0 marks:
• No discernible attempt at this element.

1-2 marks:
• One or two commonly known strengths and limitations of the chosen algorithm have been identified.

3-4 marks:
• Three strengths and limitations of the chosen algorithm have been described.
• Time and space requirements of the algorithm are briefly mentioned.
• An artificial illustrative dataset is used to highlight strengths and limitations.

5-8 marks:
• Four strengths and limitations of the chosen algorithm have been analyzed.
• Time and space requirements of the algorithm are analyzed. Big O notation is mentioned.
• An artificial illustrative dataset is used to highlight strengths and limitations. Real-world datasets are referenced regarding strengths and limitations too.
• Updates and modifications to algorithms are discussed and recent research papers are cited.

1.3 Evaluation / Comparing performance of algorithms / datasets
Assessment element: Critically evaluate the algorithm for your use case and compare with other similar algorithms (10 marks)

0 marks:
• No discernible attempt at this element.

1-5 marks:
• One or two similar algorithms have been identified.
• The strengths and limitations of the similar algorithms have been identified in comparison to the chosen algorithm.

6-10 marks:
• Three similar algorithms have been identified.
• The strengths and limitations of the similar algorithms have been identified in comparison to the chosen algorithm and compared in relation to the challenges in the proposed project.
• Academic sources (journal and conference papers) have been referenced to critically evaluate the suitability of the algorithms for the proposed project, i.e. a paper using the same/similar algorithm on a similar use case.

1.4 Describing ethical Issues
Assessment element: Describe and reflect on the ethical considerations for using this algorithm. Could the algorithm produce biased results, and how would this happen? (8 marks)

0 marks:
• No discernible attempt at this element.

1-3 marks:
• An ethical issue is raised.
• The ethical issues could apply to the algorithm selected, and how the algorithm would behave differently has been briefly reflected on.

4-8 marks:
• More than one ethical issue is provided.
• How the issue would manifest itself in the model produced by the algorithm is explained, and technical terminology is used.
• Methods (likely preprocessing methods) to avoid the ethical issue are identified.
