Midterm: Part 1: Theory
Harvesting, Storing, and Retrieving Data
Midterm Assessment
TO BE SUBMITTED BY FEB 13 11:59PM US TIMINGS
Overview
• The midterm assessment includes all the topics that have been discussed in the first half
of the course.
• The materials in any format that have been posted for the class activities should be
considered and used for the assessment.
• IMPORTANT NOTES:
o Present your work for each section using both text and images.
o The sources can be from class lectures, assignments, etc., or from other sources.
o One picture is worth 1000 words. It would be much more convincing to add one
or more images along with the text. However, an image without text explaining
what it is and what it is for is considered incomplete.
o In the Theory section, answer each question in your own words. You are not
required to write a research paper of dozens of pages, but you are expected
to provide an in-depth discussion. For example, suppose the topic is:
Discuss the differences between A and B. Start by assuming that your
readers do not know what A or B is, then provide all the relevant details
needed to make the discussion complete.
o Turnitin will be used for Part 1: Theory. If your similarity score is above 15%,
you should make corrections and resubmit. Note that you must wait 24 hours to
get a new report for resubmitted work.
o If you include the questions, put them in quotation marks (see video);
unquoted questions will count toward the Turnitin similarity score.
o Images should include the screenshots for the practical sections.
Theory: PART I: Fundamental Concepts of Data Engineering, Data Structure, Big Data, and
Storage
• Question 1 (15 Points). Discuss the different types of data (structured,
semi-structured, and unstructured), focusing on:
o The differences among the three
o The type of storage (block, object, OLTP, or OLAP) for each and the differences
between the types of storage
o Which GCP service is used for each, and the pros and cons of each (these are
the services you worked with in your homework assignments)


• Question 2 (15 Points). Discuss the key differences between IaaS, PaaS, and SaaS.
• Question 3 (15 Points). Explain the differences between Standard, Nearline, Coldline,
and Archive storage in Cloud Storage.
o The differences among the four
o How often each type of storage is accessed
o Cost: ranked from most expensive to least expensive
• Question 4 (15 Points). Discuss the concepts of relational databases and non-relational
databases by comparing and contrasting the two.
• Question 5 (15 Points). Name and define the 5 V’s of Big Data. Many V’s have
been proposed, but please discuss the five we covered in class.
• Question 6 (15 Points). Discuss OLAP and OLTP.
Submission: Please submit all work for Part I and Part II in their appropriate places. Part I
should be submitted here, in one document (Word or PDF). If you have any problems
submitting, please email the document to the professor or TA.
Introduction to Big Data
Harvesting, Storing, and Retrieving Data
Why are we here?
• What are the five Vs of Big Data?
• How do they help define Big Data?
• What are structured and unstructured data?
• How are they different?
• Why do we care?
What is Big Data
The 5 V’s of Big Data
• Volume
• Velocity
• Variety
• Veracity
• Value
Volume
• terabyte (TB): 10^12 bytes
• petabyte (PB): 10^15 bytes
• exabyte (EB): 10^18 bytes
• zettabyte (ZB): 10^21 bytes
• yottabyte (YB): 10^24 bytes
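To make the scale of these units concrete, here is a minimal Python sketch that treats each unit as a power of ten in bytes. The helper names `convert` and `largest_unit` are hypothetical and exist only for illustration.

```python
# Decimal (SI) storage units from the slide, as powers of ten in bytes.
UNITS = {
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
    "YB": 10**24,
}


def convert(value, from_unit, to_unit):
    """Convert a size between two of the units above (hypothetical helper)."""
    return value * UNITS[from_unit] / UNITS[to_unit]


def largest_unit(num_bytes):
    """Return the largest listed unit that num_bytes fills at least once, or None."""
    best = None
    for name, size in UNITS.items():  # dicts keep insertion order: smallest first
        if num_bytes >= size:
            best = name
    return best


print(convert(1, "PB", "TB"))      # how many terabytes fit in one petabyte
print(largest_unit(5 * 10**16))    # which unit best describes 5 * 10^16 bytes
```

Each step up the list multiplies the size by 1,000, so one petabyte is a thousand terabytes and one yottabyte is a trillion terabytes.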
Velocity
Variety
Veracity
Value
Types of Big Data:
Structured, Semi-structured, Unstructured Data
Structured Data
Structured Data: Relational Database
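As a rough illustration of structured data in a relational database, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `people`, its columns, and the sample rows are assumptions made for this example, not part of the course material.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Structured data: every row must fit the schema declared up front.
conn.execute("CREATE TABLE people (name TEXT NOT NULL, age INTEGER, car TEXT)")
conn.executemany(
    "INSERT INTO people (name, age, car) VALUES (?, ?, ?)",
    [("John", 30, "GM"), ("Tom", 41, None)],  # illustrative rows only
)

# Because the columns are known in advance, queries can filter and sort on them.
rows = conn.execute(
    "SELECT name, age FROM people WHERE age > ? ORDER BY age", (25,)
).fetchall()
print(rows)

conn.close()
```

The fixed columns and types are what make the data "structured": a row that does not match the table definition is rejected rather than stored.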
Semi-Structured Data
Semi-Structured Data: XML Document
HTML
Tom
John
Reminder
lunch today?
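The four values above look like the contents of a small XML note. The sketch below rebuilds one with Python's standard `xml.etree.ElementTree` module. The tag names (`note`, `to`, `from`, `heading`, `body`) and the assignment of values to tags are assumptions, since only the text values appear here.

```python
import xml.etree.ElementTree as ET

# Tag names are assumptions; only the four text values come from the slide.
note = ET.Element("note")
ET.SubElement(note, "to").text = "Tom"
ET.SubElement(note, "from").text = "John"
ET.SubElement(note, "heading").text = "Reminder"
ET.SubElement(note, "body").text = "lunch today?"

# Serialize the element tree to an XML string.
xml_text = ET.tostring(note, encoding="unicode")
print(xml_text)

# Reading it back: the tags label each value, but no fixed schema is enforced.
parsed = ET.fromstring(xml_text)
fields = {child.tag: child.text for child in parsed}
print(fields)
```

The tags travel with the data, which is why XML counts as semi-structured: the document describes itself, yet another note could add or omit elements freely.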
JSON
{"name": "John", "age": 30, "car": "GM"}
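Written with plain ASCII quotes, the JSON object above can be parsed with Python's standard `json` module. This is a minimal sketch; the second record (`other`) is hypothetical and is included only to show that records need not share the same keys.

```python
import json

# The JSON object from the slide, written with plain ASCII quotes.
text = '{"name": "John", "age": 30, "car": "GM"}'

record = json.loads(text)  # parse the string into a Python dict
print(record["name"], record["age"])

# Semi-structured: another record may carry different keys with no schema change.
other = json.loads('{"name": "Tom", "pets": ["cat"]}')  # hypothetical record
shared_keys = sorted(set(record) & set(other))
print(shared_keys)
```

Unlike a relational table, nothing forces the two records to have the same fields, so code that reads them has to check which keys are present.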
Unstructured Data
Assignment
