Description
File to submit:
A report in pdf format, which must include following contents:
(1) Design an entity relationship (ER) diagram for the TPC-H datasets, as described in Part I.
(2) Design an entity relationship (ER) diagram for the IMDB datasets, as described in Part II.
Important Reminders:
- The solution of this assignment should be submitted to GradeScope;
- Plagiarism: Homework must be completed individually and independently;
- Typeset all your answers whenever possible. Illegible handwriting may get zero points, at the discretion of the graders.
- Late submission is not allowed unless you can provide proof for a university-approved excuse.
Part 1. Design the entity relationship (ER) diagram for the TPC-H datasets as described in Assignment 1 (20%).
You only need to involve following tables:
PART
SUPPLIER
PARTSUPP
NATION
REGION
Part 2. Design the entity relationship (ER) diagram for the IMDB datasets (80%):
NOTE: This is different with the example we introduced in the class, although both are related with movies.
International Movie Database (IMDB: http://www.imdb.com (Links to an external site.)) wants to store information about movies
Usually this process involve two steps. This Part focus on the first conceptual ER design step.
Conceptual Database Design: High level description of data to be stored (ER model)
Logical Database Design: Translation of ER diagram to a relational database schema (description of tables)
Requirements of Part 2 Task (You can also find the complete information here: https://www.imdb.com/interfaces/ I (Links to an external site.) removed some information to simplify the task):
1. A Title has following information
- titleId (string) – a tconst, an alphanumeric unique identifier of the title
- title (string) the localized title
- region (string) – the region for this version of the title
- language (string) – the language of the title
- types (array) – Enumerated set of attributes for this alternative title. One or more of the following: “alternative”, “dvd”, “festival”, “tv”, “video”, “working”, “original”, “imdbDisplay”. New values may be added in the future without warning
- isAdult (boolean) – 0: non-adult title; 1: adult title
- startYear (YYYY) represents the release year of a title. In the case of TV Series, it is the series start year
- endYear (YYYY) TV Series end year. N for all other title types
- runtimeMinutes primary runtime of the title, in minutes
- genres (string array) includes up to three genres associated with the title
2. CREW (An actor or writer) has following information:
- tconst (string) – alphanumeric unique identifier of the title
- directors (array of nconsts) – director(s) of the given title
- writers (array of nconsts) writer(s) of the given title
3. An EPISODE has following information for the TV episode:
- tconst (string) – alphanumeric identifier of episode
- parentTconst (string) – alphanumeric identifier of the parent TV Series
- seasonNumber (integer) season number the episode belongs to
- episodeNumber (integer) episode number of the tconst in the TV series
4. A PRINCIPAL has following information for the principal cast/crew:
- tconst (string) – alphanumeric unique identifier of the title
- nconst (string) – alphanumeric unique identifier of the name/person
- category (string) – the category of job that person was in
- job (string) – the specific job title if applicable, else ‘N’
- characters (string) – the name of the character played if applicable, else ‘N’
5. A RATING has following information for the IMDb rating and votes information :
- tconst (string) – alphanumeric unique identifier of the title
- averageRating weighted average of all the individual user ratings
- numVotes – number of votes the title has received
6. A Name has following information for a person:
- nconst (string) – alphanumeric unique identifier of the name/person
- primaryName (string) name by which the person is most often credited
- birthYear in YYYY format
- deathYear in YYYY format if applicable, else ‘N’
- primaryProfession (array of strings) the top-3 professions of the person
- knownForTitles (array of tconsts) titles the person is known for
Remark 1: You are free to use any software, such as PowerPoint, to draw figures as long as you follow the format we discussed in class.
Remark 2: You can find some helpful explanation from https://www.imdb.com/interfaces/ (Links to an external site.), but you need to follow the instruction of this assignment.