Thursday, November 21, 2013

BT0066 – Database Management, BSC IT 1st SEM, Assignment


1. What are relationships? Explain the relationship types.
Ans: When creating a database, common sense dictates that we use separate tables for different types of entities. Some examples are customers, orders, items, and messages. But we also need to have relationships between these tables: for instance, customers make orders, and orders contain items. These relationships need to be represented in the database, and when fetching data with SQL we need to use certain types of JOIN queries to get what we need.
There are several types of database relationships. Today we are going to cover the following:
  • One to One Relationships
  • One to Many and Many to One Relationships
  • Many to Many Relationships
  • Self Referencing Relationships
When selecting data from multiple tables with relationships, we will be using the JOIN query. There are several types of JOINs, and we are going to learn about the following:
  • Cross Joins
  • Natural Joins
  • Inner Joins
  • Left (Outer) Joins
  • Right (Outer) Joins
We will also learn about the ON clause and the USING clause.
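As a quick preview, here is roughly how those joins differ in SQL. This is a sketch using the hypothetical Customers and Orders tables built up later in this answer; the column names are assumptions:

-- Inner join: only customers who have at least one matching order
SELECT c.customer_name, o.order_id
FROM Customers c
INNER JOIN Orders o ON o.customer_id = c.customer_id;

-- Left (outer) join: all customers, with NULLs where no order exists
SELECT c.customer_name, o.order_id
FROM Customers c
LEFT JOIN Orders o ON o.customer_id = c.customer_id;

-- USING is shorthand for ON when the join column has the same name in both tables
SELECT c.customer_name, o.order_id
FROM Customers c
JOIN Orders o USING (customer_id);

-- Cross join: every combination of rows, with no join condition at all
SELECT c.customer_name, o.order_id
FROM Customers c
CROSS JOIN Orders o;

A natural join behaves like USING applied to every column name the two tables share, and a right join is simply a left join with the table roles swapped.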
One to One Relationships
Let’s say you have a table for customers:
We can put the customer address information in a separate table:
Now we have a relationship between the Customers table and the Addresses table. If each address can belong to only one customer, this relationship is “One to One”. Keep in mind that this kind of relationship is not very common. Our initial table that included the address along with the customer could have worked fine in most cases.
Notice that now there is a field named “address_id” in the Customers table that refers to the matching record in the Addresses table. This is called a “Foreign Key”, and it is used for all kinds of database relationships. We will cover this subject later in the article.
We can visualize the relationship between the customer and address records like this:
Note that the existence of a relationship can be optional, like having a customer record that has no related address record.
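Since the original post's table images are not reproduced here, a minimal SQL sketch of the two tables (column names are assumptions) may help. The UNIQUE constraint on address_id is what makes the relationship one-to-one rather than many-to-one:

CREATE TABLE Addresses (
    address_id INT PRIMARY KEY,
    street     VARCHAR(100),
    city       VARCHAR(50)
);

CREATE TABLE Customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100),
    address_id    INT UNIQUE,  -- UNIQUE: each address belongs to at most one customer
                               -- NULL allowed: a customer may have no address (optional relationship)
    FOREIGN KEY (address_id) REFERENCES Addresses (address_id)
);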

One to Many and Many to One Relationships
This is the most commonly used type of relationship. Consider an e-commerce website, with the following:
  • Customers can make many orders.
  • Orders can contain many items.
  • Items can have descriptions in many languages.
In these cases we would need to create “One to Many” relationships. Here is an example:
Each customer may have zero, one or multiple orders. But an order can belong to only one customer.
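Sketched in SQL (again with assumed column names), the only structural difference from the one-to-one case is that customer_id in Orders is not UNIQUE, so the same customer can appear on many orders:

CREATE TABLE Orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,          -- not UNIQUE: one customer, many orders
    order_date  DATE,
    FOREIGN KEY (customer_id) REFERENCES Customers (customer_id)
);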
Many to Many Relationships
In some cases, you may need multiple instances on both sides of the relationship. For example, each order can contain multiple items. And each item can also be in multiple orders.
For these relationships, we need to create an extra table:
The Items_Orders table has only one purpose, and that is to create a “Many to Many” relationship between the items and the orders.
Here is how we can visualize this kind of relationship:
If you want to include the items_orders records in the graph, it may look like this:
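The diagram itself is not reproduced here, but a sketch of the junction table in SQL (reusing the Orders table from above and an assumed Items table) shows the idea; the composite primary key prevents listing the same item twice on the same order:

CREATE TABLE Items (
    item_id   INT PRIMARY KEY,
    item_name VARCHAR(100)
);

CREATE TABLE Items_Orders (
    item_id  INT,
    order_id INT,
    PRIMARY KEY (item_id, order_id),  -- composite key: one row per item per order
    FOREIGN KEY (item_id)  REFERENCES Items (item_id),
    FOREIGN KEY (order_id) REFERENCES Orders (order_id)
);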
Self Referencing Relationships
This is used when a table needs to have a relationship with itself. For example, let’s say you have a referral program. Customers can refer other customers to your shopping website. The table may look like this:
Customers 102 and 103 were referred by customer 101.
This is actually similar to a “one to many” relationship, since one customer can refer multiple customers. It can also be visualized as a tree structure:
One customer might refer zero, one or multiple customers. Each customer can be referred by only one customer, or none at all.
If you would like to create a self-referencing “many to many” relationship, you would need an extra junction table, just like the one we talked about in the last section.
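Here is a sketch of the self-referencing one-to-many case, using a simplified Customers table (the referred_by column and sample names are made up):

CREATE TABLE Customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100),
    referred_by   INT,        -- NULL for customers who were not referred
    FOREIGN KEY (referred_by) REFERENCES Customers (customer_id)
);

-- Customers 102 and 103 were referred by customer 101:
INSERT INTO Customers VALUES (101, 'Alice', NULL);
INSERT INTO Customers VALUES (102, 'Bob',   101);
INSERT INTO Customers VALUES (103, 'Carol', 101);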
Foreign Keys
So far we have only learned about some of the concepts. Now it is time to bring them to life using SQL. For this part, we need to understand what Foreign Keys are.
In the relationship examples above, we always had these “****_id” fields that referenced a column in another table. In this example, the customer_id column in the Orders table is a Foreign Key column:
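The image from the original post is not shown here; declared in SQL, such a foreign key might look like this (the constraint name and the ON DELETE choice are assumptions):

ALTER TABLE Orders
    ADD CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id)
    REFERENCES Customers (customer_id)
    ON DELETE RESTRICT;  -- refuse to delete a customer who still has orders

The constraint guarantees that every customer_id value in Orders actually exists in Customers.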

2. Explain Second Normal and Third Normal Form
Ans: Second normal form (2NF) is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971. A table that is in first normal form (1NF) must meet additional criteria if it is to qualify for second normal form. Specifically: a table is in 2NF if and only if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table. A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table.
Put simply, a table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is dependent on the whole of a candidate key. Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF. This is a complex way of saying that if a column isn't intrinsically related to the entire primary key, then the table should be split so that each column depends on the whole key of its own table.

Example:

OrderId (PK) | ItemId (PK) | OrderDate
1            | 100         | 2009-01-01
1            | 101         | 2009-01-01

The primary key is (OrderId, ItemId).
Consider OrderDate. It is conceptually part of an order. An order always occurs at some time. But is an OrderDate related to an Item? Not really. You may be saying, “but items are part of an order!”, and you would be right. But that’s not what I’m getting at. OrderDate is independent of the item itself.
Look at it another way: in the table above, the OrderDate will always be the same for a given OrderId regardless of the value of the ItemId column. This means data duplication, which is a sign that the table is not properly normalized.
Here’s how we correct the problem:
Orders

OrderId (PK) | OrderDate
1            | 2009-01-01

Order_Items

OrderId (PK) | ItemId (PK)
1            | 100
1            | 101
Here is an excellent line from the article, “All we are trying to establish here is whether a particular order on a particular date relies on a particular item.”
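Nothing is lost by the split: the original single table can always be reassembled with a join, for example:

SELECT o.OrderId, oi.ItemId, o.OrderDate
FROM Orders o
JOIN Order_Items oi USING (OrderId);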

Third Normal Form (3NF): No dependencies on non-key attributes
The third normal form (3NF) is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:
  • The relation R (table) is in second normal form (2NF)
  • Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R.
A non-prime attribute of R is an attribute that does not belong to any candidate key of R. A transitive dependency is a functional dependency in which X → Z holds indirectly, by virtue of X → Y and Y → Z (where it is not the case that Y → X).
A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:
  • X contains A (that is, X → A is a trivial functional dependency), or
  • X is a superkey, or
  • Every element of A − X, the set difference between A and X, is a prime attribute (i.e., each column in A − X is contained in some candidate key)
Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent Boyce–Codd normal form (BCNF). BCNF simply eliminates the third alternative (“every element of A − X is a prime attribute”).
2NF covers the case of multi-column primary keys. 3NF is meant to cover single column keys. Simply stated, pull out columns that don’t directly relate to the subject of the row (the primary key), and put them in their own table.
Example:
Orders

OrderId (PK) | OrderDate  | CustomerName | CustomerCity
1            | 2009-01-01 | John Smith   | Chicago
Customer information could be the subject of its own table. Pull out customer name and other customer fields into another table, and then put a Customer foreign key into Orders.
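Here is a sketch of the corrected schema, with column names taken from the example above (CustomerId is an assumed surrogate key):

CREATE TABLE Customers (
    CustomerId   INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    CustomerCity VARCHAR(50)
);

CREATE TABLE Orders (
    OrderId    INT PRIMARY KEY,
    OrderDate  DATE,
    CustomerId INT,  -- the foreign key replaces the copied name/city columns
    FOREIGN KEY (CustomerId) REFERENCES Customers (CustomerId)
);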

3. Explain query engine and storage manager in MySQL architecture.
Ans: MySQL is based on a tiered architecture, consisting of both subsystems and support components that interact with each other to read, parse, and execute queries, and to cache and return query results. The MySQL architecture consists of five primary subsystems that work together to respond to a request made to the MySQL database server.

1)     Query Engine
SQL Interface
The SQL interface provides the mechanisms to receive commands and transmit results to the user. The MySQL SQL interface was built to the ANSI SQL standard and accepts the same basic SQL statements as most ANSI-compliant database servers. Although many of the SQL commands supported in MySQL have options that are not ANSI standard, the MySQL developers have stayed very close to the ANSI SQL standard.
Connections to the database server are received from the network communication pathways, and a thread is created for each. The threaded process is the heart of the executable pathway in the MySQL server. MySQL is built as a true multithreaded application whereby each thread executes independently of the other threads (except for certain helper threads). The incoming SQL command is stored in a class structure and the results are transmitted to the client by writing them out to the network communication protocols. Once a thread has been created, the MySQL server attempts to parse the SQL command and store the parts in the internal data structure.
Parser
When a client issues a query, a new thread is created and the SQL statement is forwarded to the parser for syntactic validation (or rejection due to errors). The MySQL parser is implemented using a large Lex-YACC script that is compiled with Bison. The parser constructs a query structure used to represent the query statement (SQL) in memory as a tree structure (also called an abstract syntax tree) that can be used to execute the query.
Query Optimizer
The MySQL query optimizer subsystem is considered by some to be misnamed. The optimizer uses a SELECT-PROJECT-JOIN strategy that attempts to restructure the query by first doing any restrictions (SELECT) to narrow the number of tuples to work with, then performing the projections to reduce the number of attributes (fields) in the resulting tuples, and finally evaluating any join conditions. While not considered a member of the extremely complicated query optimizer category, the SELECT-PROJECT-JOIN strategy falls into the category of heuristic optimizers. In this case, the heuristics (rules) are simply:
  • Horizontally eliminate extra data by evaluating the expressions in the WHERE (HAVING) clause.
  • Vertically eliminate extra data by limiting the data to the attributes specified in the attribute list. The exception is the storage of the attributes used in the join clause that may not be kept in the final query.
  • Evaluate join expressions.
This results in a strategy that ensures a known-good access method to retrieve data in an efficient manner. Despite critical reviews, the SELECT-PROJECT-JOIN strategy has proven effective at executing the typical queries found in transaction processing.
The first step in the optimizer is to check for the existence of tables and access control by the user. If there are errors, the appropriate error message is returned and control returns to the thread manager, or listener. Once the correct tables have been identified, they are opened and the appropriate locks are applied for concurrency control.
Once all of the maintenance and setup tasks are complete, the optimizer uses the internal query structure and evaluates the WHERE conditions (a restrict operation) of the query. Results are returned as temporary tables to prepare for the next step. If UNION operators are present, the optimizer executes the SELECT portions of all statements in a loop before continuing.
The next step in the optimizer is to execute the projections. These are executed in a similar manner as the restrict portions, again storing the intermediate results as temporary tables and saving only those attributes specified in the column specification in the SELECT statement.
Lastly, the structure is analyzed for any JOIN conditions that are built using the join class, and then the join::optimize() method is called. At this stage the query is optimized by evaluating the expressions and eliminating any conditions that result in dead branches or always-true or always-false conditions (as well as many other similar optimizations). The optimizer is attempting to eliminate any known-bad conditions in the query before executing the join. This is done because joins are the most expensive and time-consuming of all of the relational operators. It is also important to note that the join optimization step is performed for all queries that have a WHERE or HAVING clause, regardless of whether there are any join conditions. This enables developers to concentrate all of the expression evaluation code in one place. Once the join optimization is complete, the optimizer uses a series of conditional statements to route the query to the appropriate library method for execution.
Query Execution
Execution of the query is handled by a set of library methods designed to implement a particular query. For example, the mysql_insert() method is designed to insert data. Likewise, there is a mysql_select() method designed to find and return data matching the WHERE clause. This library of execution methods is located in a variety of source code files under files of similar names (e.g., sql_insert.cc or sql_select.cc). All of these methods take as a parameter a thread object that permits the method to access the internal query structure and eases execution. Results from each of the execution methods are returned using the network communication pathways library. The query execution library methods are clearly implemented using the interpretative model of query execution.
Query Cache
While not its own subsystem, the query cache should be considered a vital part of the query optimization and execution subsystem. The query cache is a marvelous invention that caches not only the query structure but also the query results themselves. This enables the system to check for frequently used queries and shortcut the entire query optimization and execution stages altogether. This is another of the technologies that is unique to MySQL. Other database systems cache queries, but no others cache the actual results. As you can appreciate, the query cache must also allow for situations where the results are “dirty” in the sense that something has changed since the last time the query was run (e.g., an INSERT, UPDATE, or DELETE was run against the base table), so cached queries may need to be occasionally purged.
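On MySQL versions that ship the query cache (later releases deprecated it in 5.7 and removed it in 8.0), it can be inspected and bypassed like this:

-- Inspect cache configuration and current state
SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache%';

-- Force a single query to bypass the cache
SELECT SQL_NO_CACHE * FROM Orders WHERE order_date > '2013-01-01';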

2)    The Storage Manager
The storage manager interfaces with the operating system to write data to the disk efficiently. Because the storage functions reside in a separate subsystem, the MySQL engine operates at a level of abstraction away from the operating system. The storage manager writes to disk all of the data in the user tables, indexes, and logs as well as the internal system data.
In MySQL you can choose from the following storage engines:
  • MyISAM
  • InnoDB
  • MERGE
  • MEMORY
In older MySQL versions the BDB and ISAM table types were also available, but they are no longer supported in the newer versions. In addition, MySQL 6 was expected to introduce a new table type called Falcon.
MyISAM 
MyISAM is the default storage engine in MySQL. It is the improved replacement of the old ISAM table type. With the MyISAM storage engine, every table is stored in 3 different files:
1. a .frm file, which stores the table structure
2. a .MYD file, which stores the table data
3. a .MYI file, which is the index file
The MyISAM storage engine is optimized for speed and supports extensive indexing. You can index BLOB and TEXT columns, and FULLTEXT indexes are also supported.
However, MyISAM tables do not support foreign key constraints or row-level locking.
InnoDB
InnoDB is a transaction-safe storage engine in MySQL. Table data are managed by the InnoDB tablespace. This table type supports foreign key constraints and row-level locking as well. However, FULLTEXT indexes are not supported.
MERGE 
The MERGE storage engine is a bit special. In this case the data are not stored in the MERGE table itself, but in the underlying MyISAM tables from which the virtual MERGE table is made up.
MEMORY 
With the MEMORY storage engine, the data are stored in memory and are available only as long as the MySQL server is running. MEMORY tables are very fast, which makes them ideal for temporary tables.
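The storage engine is chosen per table with the ENGINE clause; a small sketch (the table definitions are made up):

CREATE TABLE app_logs (msg VARCHAR(255))   ENGINE = MyISAM;
CREATE TABLE payments (id INT PRIMARY KEY) ENGINE = InnoDB;
CREATE TABLE scratch  (val INT)            ENGINE = MEMORY;

-- See which engine an existing table uses
SHOW TABLE STATUS LIKE 'payments';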

4. Explain any five Relational database components.
Ans: Although there are different ways to organize data in a database, relational databases are one of the most effective. Relational database systems are an application of mathematical set theory to the problem of effectively organizing data. In a relational database, data is collected into tables (called relations in relational theory).
A table represents some class of objects that are important to an organization. For example, a company may have a database with a table for employees, another table for customers, and another for stores. Each table is built of columns and rows (called attributes and tuples in relational theory). Each column represents some attribute of the object represented by the table. For example, an Employee table would typically have columns for attributes such as first name, last name, employee ID, department, pay grade, and job title. Each row represents an instance of the object represented by the table. For example, one row in the Employee table represents the employee who has employee ID 12345.
When organizing data into tables, you can usually find many different ways to define tables. Relational database theory defines a process called normalization, which ensures that the set of tables you define will organize your data effectively.
There are various parts to a database that we should know about. These include tables, forms, queries, reports, macros and modules.
Tables
The first component we should know about in a database is a table. Here is an example of a table of data about zoo animals.
A database is made up of one or more tables. Each table (also known as a 'file') holds the records for just one entity, and an entity is an actual, real-life thing about which we want to keep information. We saw here that tables are made up of records (the table rows), fields or attributes (the columns), the actual data items, and a key field, also known as a primary key.
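The zoo table itself appeared as an image in the original post; a hypothetical reconstruction in SQL shows the pieces just listed:

CREATE TABLE Animals (
    AnimalId    INT PRIMARY KEY,  -- the key field (primary key)
    Species     VARCHAR(50),      -- each column is a field (attribute)
    AnimalName  VARCHAR(50),
    DateOfBirth DATE,
    Keeper      VARCHAR(50)
);

-- each INSERT adds one record (one table row)
INSERT INTO Animals VALUES (1, 'Elephant', 'Jumbo', '2006-05-01', 'Mr Cooper');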
Forms
The next database component we should be able to identify is a form.
A form is used to enter new records into a database, one record at a time. They are also used to navigate through existing records and to edit them. Forms have some common characteristics. These include
  • a title (because you may have more than one form)
  • field names (so you know what piece of data needs to be entered)
  • place holders, so you know where to type in data
  • navigation buttons, so you can move through the different forms easily
  • form operations such as 'close', so that you can easily and quickly carry out common operations
  • data entry validation rules, so that you can help prevent erroneous data from being entered into the system
  • different methods of data entry such as tick boxes, so that you can speed up data entry and reduce spelling errors.
Queries
The third component of a database is the query. This is used to search through the data we have in our tables to answer questions or queries such as:
  • Could I have a list of all the elephants?
  • Could I have a list of all the animals born before 3rd July 2006?
  • Could I have a list of all the animals looked after by the keeper called Mr Cooper?
We design the query first and then we run it. Designing a query can be done in a number of ways, including using a language called Structured Query Language (SQL), which is common to most database applications software. The original post built this query not in SQL but with an alternative known as Query By Example (QBE), which is available in Access.
When we run this query, we get the results presented as a table:
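The QBE grid and its results are not reproduced here; for comparison, the same three queries written in SQL against the hypothetical Animals table sketched earlier would be:

SELECT * FROM Animals WHERE Species = 'Elephant';
SELECT * FROM Animals WHERE DateOfBirth < '2006-07-03';
SELECT * FROM Animals WHERE Keeper = 'Mr Cooper';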
Reports
Queries present their results as a plain table. If we want to lay them out so they look professional, with added information such as a logo, a title for the report, headings for the columns of data, a date and so on, and also if we want to organise how the information is presented, perhaps by grouping the results in some way, then we need to use reports. Reports are the next main component we should know about in a database.
Macros
The next component of a database we should know about is the macro. Macros provide extra features and functionality to a database, over and above what can be achieved by using just the tools that come with your database application. A macro is a sequence of instructions which we give a single name to.

5. What are the benefits of making the system catalogs relations?
Ans: We can store a relation using one of several alternative file structures, and we can create one or more indexes, each stored as a file, on every relation. Conversely, in a relational DBMS, every file contains either the tuples in a relation or the entries in an index. The collection of files corresponding to users' relations and indexes represents the data in the database.
A fundamental property of a database system is that it maintains a description of all the data that it contains. A relational DBMS maintains information about every relation and index that it contains. The DBMS also maintains information about views, for which no tuples are stored explicitly; rather, a definition of the view is stored and used to compute the tuples that belong in the view when the view is queried. This information is stored in a collection of relations, maintained by the system, called the catalog relations. The catalog relations are also called the system catalog, the catalog, or the data dictionary. The system catalog is sometimes referred to as metadata; that is, not data, but descriptive information about the data. The information in the system catalog is used extensively for query optimization.
Let us consider what is stored in the system catalog. At a minimum we have system-wide information, such as the size of the buffer pool and the page size, together with the following information about individual relations, indexes, and views. For each relation:
  • its relation name, the file name (or some identifier), and the file structure (e.g., heap file) of the file in which it is stored
  • the attribute name and type of each of its attributes
  • the index name of each index on the relation
  • the integrity constraints (e.g., primary key and foreign key constraints) on the relation
The key benefit of making the catalogs relations is that the catalog can then be stored, queried, and managed with exactly the same mechanisms the DBMS already provides for ordinary data, rather than requiring special-purpose code.
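For instance, in MySQL the catalog is exposed through the standard information_schema views and can be read with ordinary SQL:

-- List every column of every table in the current database
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
ORDER BY TABLE_NAME, ORDINAL_POSITION;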

6. Explain any five Alternative Objective Oriented Database Strategies.
Ans: An object-oriented database management system (OODBMS, but sometimes just called “object database”) is a DBMS that stores data in a logical model that is closely aligned with an application program’s object model. Of course, an OODBMS will have a physical data model optimized for the kinds of logical data model it expects.
If you’re guessing from that definition that there can be difficulties drawing boundaries between the application, the application programming language, the data manipulation language, and/or the DBMS — you’re right. Those difficulties have been a big factor in relegating OODBMS to being a relatively niche technology to date.
  • InterSystems Caché, the most successful OODBMS product by far, with good OLTP (OnLine Transaction Processing) capabilities and a strong presence in the health care market. Although it was designed around the specialized MUMPS/M language, Caché happily talks Java and SQL.
  • ObjectStore, a well-pedigreed startup a couple decades ago, which wound up focusing on complex objects in markets such as computer-aided design. ObjectStore was eventually sold to Progress Software, which is positioning ObjectStore more as a distributed caching system than anything else (Amazon was an impressive reference for that use case). That said, Progress' ObjectStore business is small, as is its ObjectStore level of effort. Both Caché and ObjectStore were at some point unsuccessfully targeted at the XML database market.
  • Part of Workday’s technology stack. Very-well-pedigreed SaaS (Software as a Service) application vendor Workday decided to go with what amounts to an in-memory OODBMS. This makes all kinds of sense, and is a lot of what rekindled my interest in object-oriented database management.
  • Objectivity, also from the 20-years-ago generation, and a poster child for the “DBMS toolkit as much as a DBMS” issue.
  • McObject Perst, an embeddable memory-centric OODBMS.
  • Versant. Actually, by now the Versant company has several different OODBMS; I’m not sure whether what it’s selling has much to do with the original Versant product. Anyhow, both the original and current Versant product seem to be positioned in OLTP. Versant has recently suffered from declining revenues, in license fees and maintenance alike.
  • Forthcoming technology from Starcounter, in the area of high-performance memory-centric OLTP. According to my correspondents, Starcounter still needs to explain how its technology is different from what Versant and ObjectStore introduced 20 or so years ago. Interestingly, while ObjectStore shines as a distributed system, Starcounter’s developers have consigned scale-out to the “too hard to bother with” category.
  • Gemstone, which seemed to be on an ObjectStore-like caching track until it was acquired by VMware.
Arguably, OODBMS have all the benefits of document-model DBMS, but with different language bindings. And if you’re going to write in an object-oriented language anyway, those language bindings can seem pretty appealing. In particular, they might be preferable to fighting your way through object/relational mapping.
Other than the double-edged language sword, the main criticism of object-oriented DBMS is that they include a whole lot of pointers. InterSystems and others have shown that, even in a disk-centric world, an OODBMS can have excellent performance in OLTP and tolerable performance in simple reporting. As RAM gets cheaper, memory-centric operation becomes ever more viable, making the pointers even less problematic.
Objects and Identity
The following figure shows an object with state and behavior. The state is represented by the values of the object's attributes, and the behavior is defined by the methods acting on that state. A unique object identifier (OID) identifies the object.
Complex Objects
Complex objects are built by applying constructors to simpler objects, including sets, lists, and tuples.

Encapsulation
Encapsulation is derived from the notion of Abstract Data Type (ADT). It is motivated by the need to make a clear distinction between the specification and the implementation of an operation. It reinforces modularity and provides a form of logical data independence.
Class
A class object is an object which acts as a template. It specifies:
  • a structure, that is, the set of attributes of the instances
  • a set of operations
  • a set of methods which implement the operations
Instantiation means generating objects from the class, e.g. the 'new' operation in C++.
There are two approaches to the persistence of objects:
  • as an implicit characteristic of all objects, or
  • as an orthogonal characteristic: insert the object into a persistent collection of objects.

Inheritance
Inheritance is a mechanism of reusability and the most powerful concept of OO programming.
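For readers without an OODBMS at hand, PostgreSQL's table inheritance offers a loose relational parallel (a sketch with made-up table names, not a full OODBMS feature):

CREATE TABLE vehicles (
    id    INT PRIMARY KEY,
    make  TEXT,
    model TEXT
);

CREATE TABLE cars (
    num_doors INT
) INHERITS (vehicles);  -- cars reuses all columns of vehicles

-- Querying the parent also returns rows stored in the child
SELECT make, model FROM vehicles;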
Association
Association is a link between entities in an application. In an OODB, associations are represented by means of references between objects, covering the representation of binary associations, ternary associations, and reverse references.
