Database and Analytics Principles | Reliable Papers

COURSEWORK ASSESSMENT SPECIFICATION Error! Filename not specified. Module Title:Database and Analytics PrinciplesModule Number:LD7084Module Tutor Name(s):Parvez JugonAcademic Year:2020-2021% Weighting (to overall module):100%Coursework Title:AssignmentAverage Study Time Required by Student:60 study hours Dates and Mechanisms for Assessment Submission and Feedback Date of Handout to Students: During week 2 Mechanism for Handout to Students: Via Blackboard Date and Time of Submission by Student: 10th of May 2021 at 4 pm. If for personal reasons you cannot submit by that date, and you feel you have grounds for requesting an extension, you should contact nu.london@northumbria.ac.uk Mechanism for Submission of Work by Student: Electronic submission on Turnitin via Blackboard Date by which Work, Feedback and Marks will be returned to Students: Within 20 working days after submission of this assignment Mechanism for return of assignment work, feedback and marks to students: Formal feedback will take place following completion of all reviews and internal moderation of results. Assessment Brief Learning Outcomes Assessed in this assessment This assignment will assess the following learning outcomes: A critical knowledge and understanding of data and data analytics theories and methodologiesAbility to apply data analytics skills and knowledge to a complex business challenger projectAbility to evaluate your data analytics skills.Critical reflection of your implementation, recommendations for improvement and the subsequent implications for your learning development in data and data analytics domain. General Information This assignment constitutes 100% towards the final mark for this module. Any queries relating to this assignment should be discussed with module tutor: Parvez Jugon (parvez.jugon@northumbria.ac.uk) Type of the submission required This is an INDIVIDUAL piece of work contributing towards the module assessment. Deliverables should be assembled into a single report document, which includes (critical appraisal, research, and snapshot evidences of tasks carried out and justification of technologies used). Submission will be in the form of a MS word report (4000 words). Scenario WeDeliver is an online delivery company, which is looking to develop an effective Relational Database Management System (RDBMS) and data warehouse to cater the needs of their growing business. As a data analyst, you have been asked to develop a required system for WeDeliver. Which should satisfy the following information requirements: • The database should contain information about Customers, Items, Restaurants, Orders, Drivers and their Vehicle. For payroll, the NI number of the drivers is recorded. For each customer, the database should store the CustomerID, Last Name, First Name, Email, and the Phone Number.For each driver, their name, salary, email address and their manager are recorded as well details of their Driving License such as Driving License Number, Issue Date, country of issue and Expiry Date.Each driver is assigned a motorbike when they start with the company and they normally keep it during the duration of their contract.Details of the motorbike areEach Manager manages at least one driver and each driver is associated with one restaurant only but one restaurant employs many drivers.For each of the Restaurant, the RestaurantID, Restaurant Name, Address are recorded.For each Item, the ItemID, the Item Name and Item Price should be recorded. The items are divided into four categories such as Starter, Main Course, Deserts and Drinks. Prices for each item may vary in different branches.For example, theFor each order, it is required to store the OrderID, Orderdate, items and the quantity that have been ordered. A customer must order at least one item per order.A customer can have one or more orders from the same or different restaurants.A driver can deliver more than one order but one driver delivers a particular ordernly.registered such as RegistrationNo, Colour, date of purchase, engine size etc.Pizza Hut Edmonton Branch sells Garlic Bread at £4.50 but the Liverpool Street Branch charges £5.00 for the same item. Relevant assumptions can be made, if required. Section 1 – Database (30%) (Learning Outcome 1) Using Crow’s foot notation, design a logical data model (ERD) for WeDeliver database in a modelling tool of your choice. All entity types, attributes and the relationships between them must be labelled clearly. Your design should show all cardinality and participation constraints accurately. Discuss and justify your design choices in detail. [10 marks](a) Produce a script using appropriate SQL commands (DDL) to create a set of database tables using MySQL relational database. Your tables should also show all constraints applied at either the column or the table level. You are expected to implement the database so that it must meets the requirements described in the scenario. [5 marks]By using appropriate SQL statements (DML), populate all the tables in the database you created with some data (At least 10 records in each table). The data should be meaningful but does not need to be extensive.marks]As a Data Analyst, produce 5 DML statements that will provide valuable insights for WeDeliver. Your DML statements should contain at least one of the following: A statement involving a self-join.A statement involving an equi-join.A statement involving at least one group functionA statement involving at least one subquery.A statement involving null values. [10 marks] Section 2 – Data warehouse (15%) (Learning Outcome 2 & 3) 1. Data Warehouse Modelling The management of WeDeliver wants to gain competitive advantages over competitors such as Just Eat or DeliveryRoo and proposed to invest in latest tools and technologies One of the that will enable them to make strategic as well as management decision. suggestions is to build a Data Warehouse. Using Kimball’s four step dimensional design process, draw/design either a star schema or a snowflake schema, which will help the management understand Sales pattern across the different restaurant and Cities in the UK . Using the concept of a Data Cube, show all possible combination of Sales insight that can be derived from the Warehouse. Justify your Modelling choice. Section 3 – Data Mining (15%) (Learning Outcome 3 & 4) Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide . Heart failure is a common event caused by CVDs and addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity and harmful use of alcohol using population-wide strategies can prevent most cardiovascular diseases. You have been hired as a Data Scientist by ABC Hospital and provided with a dataset (Appendix 1) that contains 12 features that can be used to predict mortality by heart failure. According to IBM, Data Scientists spend 70% of their time in the Data Preparation stage. Using the CRISP Modelling life cycle and the dataset provided, state what types of data preparation would be required in the Data Preparation stage and state the importance of good data preparation.Based on the dataset provided, explain what types of Machine Learning algorithm you would require to predict mortality by heart failure. Please state at least two machine learning algorithm and justify your choice. Section 4 – Business Intelligence (Tableau) (20%) (Learning Outcome 1, 2 & 3) You are hired as a Data Analyst by one of the leading supermarket in the US and provided with the Dataset in Appendix 2. You must use Tableau to analyse the data set provided and find answers to the following: Using Tableau, explain what data integrity checks you will perform on each of the column in the Dataset. [2 marks]Given the item ordered comes with 60 days warranty from the date it was ordered. Use Tableau to calculate the Warranty End date. [2 marks]Using an appropriate chart, display the total sales made in each City. Order the result by the City with the highest sales. [2 marks]Using Tableau, display all cities that generated more than $3,000,000 for the year[2 marks]Using Tableau, display the top 5 selling products for the year 2019. Your chart should also show the quantity of products sold. [2 marks]Using Tableau, display the total sales made in each month for the year 2019. [3 marks]Given that the profit on each item is set at 30%. Use an appropriate chart to display the total sales and total profit made by each product for the year 2019. [3 marks]Using tableau, find the maximum sales generated in Boston, Los Angeles, Seattle and New York. [3 marks]Build a dashboard to display the results obtained in Number 3, 4, 5 and 6. [1 mark] For question 3 – 8, each answer should be on a separate worksheet with a chart, graph or table that shows your finding. Section 5 – NOSQL Database (20%) (Learning Outcome 4) You have been hired to develop a database for a start-up company. Your manager must decide between using a Relation Database or a NOSQL database. You are required to write a report to explain the factors that will influence your choice of database and why you will be recommending one type of database over the other. Assumptions can be made about the type of Start-up Company and the data they will be dealing with. ———————- Grading Guidance Distinction: Excellent in-depth application and critical research on the processes and user requirements. Provide in-depth knowledge of how to design and evaluate a complete Database, data warehouse for a given scenario. Excellent in-depth understanding and demonstration of data analytics using tableau. Commendation: Shows good research on the processes and user requirements. Provides good knowledge of how to design and Database, data warehouse for a given scenario. Shows good understanding and evaluation of data warehouse, data mining and Hadoop. Pass: Provides basic understanding of the deliverables. Provides end-to-end design and all requirements are met. Report has some errors and lacks adequate explanation. The robustness and correctness of Database, data warehouse and data analytics are not explained thoroughly. Evidence of design is shown but inadequate explanation using tableau. Fail: Provides incomplete attempt or lacks substantial parts of the deliverables. Fails to demonstrate understanding of the concepts required to implement deliverables. Work lacks serious clarity and detail. There are several errors in the report. Academic Integrity Statement: You must adhere to the university regulations on academic conduct. Formal inquiry proceedings will be instigated if there is any suspicion of plagiarism or any other form of misconduct in your work. Refer to the University’s Assessment Regulations for Northumbria Awards if you are unclear as to the meaning of these terms. The latest copy is available on the University website. https://northumbria-cdn.azureedge.net/-/media/corporate-website/new-sitecore gallery/services/academic-registry/documents/qte/assessment/guidance-for-students/pl,-d-,005 v004-academic-misconduct policy.pdf?la=en&modified=20200117152110&hash=5631FCFB09A073D1BEE7BD157FAB19313 A10268A#:~:text=1%20Academic%20Integrity, 1.1%20Every%20student&text=1.2%20In%20all%20assessed%20work,advantage%20in%20any %20other%20way. (last accessed on 25th August 2020) Formative Feedback There will be an opportunity for formative feedback during the semester. You are advised to start working on this assignment as early as possible so that you can seek clarification from the module tutor regarding any questions you might have during the semester. Note that tutors will not predict your grade, and you should not take the lack of comment on any aspect of your work as indicating that it is correct. You should make every effort to take advantage of formative feedback as tutors will not comment on draft work at other times. Remember that you will get more useful feedback from us by asking specific questions than just presenting us with your documentation and asking, ‘Is this right?’ Penalties for Exceeding Word Limits: The following penalties will be applied after any reductions in mark due to late submission have been made, Penalties will be applied as defined in the University Policy on Word Limits Policy. https://northumbria-cdn.azureedge.net/-/media/corporate-website/new sitecore-gallery/services/academic registry/documents/qte/assessment/guidance-for-students/pl013-v002-word limits-policy.pdf?modified=20200803200335. (last accessed on 25th August 2020) The actual word count is to be declared on the front of the assessment submission. Error! Filename not specified. Late Submission Policy: For coursework submitted up to 1 working day (24 hours) after the published hand-in deadline without approval, 10% of the total marks available for the assessment (i.e.100%) shall be deducted from the assessment mark. Penalties will be applied as defined in the University Policy on the Late submission work. https://northumbria-cdn.azureedge.net/-/media/corporate-website/new-sitecore gallery/services/academic-registry/documents/qte/assessment/guidance-for-students/pl,-d-,008 v004-late-submission-of-work-and-extension-requests-policy.pdf?modified=20200803152930 (last accessed on 25th August 2020) For clarity: a late piece of work that would have scored 65%, 55% or 45% had it been handed in on time will be awarded 55%, 45% or 35% respectively as 10% of the total available marks will have been deducted. Failure to submit: The University requires all students to submit assessed coursework by the deadline stated in the assessment brief. Where coursework is submitted without approval after the published hand-in deadline, penalties will be applied as defined in the University Policy on the Late Submission of Work. https://northumbria-cdn.azureedge.net/-/media/corporate-website/new-sitecore gallery/services/academic-registry/documents/qte/assessment/guidance-for-students/pl,-d-,008 v004-late-submission-of-work-and-extension-requests-policy.pdf?modified=20200803152930 (last accessed on 25th August 2020)