1. What are relationships?
Explain the relationship types.
Ans: When creating a database, common
sense dictates that we use separate tables for different types of entities.
Some examples are customers, orders, items, and messages. But we also need to
have relationships between these tables. For instance, customers make orders,
and orders contain items. These relationships need to be represented in the
database. Also, when fetching data with SQL, we need to use certain types of
JOIN queries to get what we need.
There are several types of database
relationships. Today we are going to cover the following:
- One to One Relationships
- One to Many and Many to One Relationships
- Many to Many Relationships
- Self Referencing Relationships
When selecting data from multiple
tables with relationships, we will be using the JOIN query. There are several
types of JOINs, and we are going to learn about the following:
- Cross Joins
- Natural Joins
- Inner Joins
- Left (Outer) Joins
- Right (Outer) Joins
We will also learn about the ON clause
and the USING clause.
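As a preview, here is a minimal sketch of the ON and USING clauses (the Customers and Orders tables and their columns are assumptions for illustration):

-- The ON clause states the join condition explicitly:
SELECT c.name, o.order_date
FROM Customers AS c
INNER JOIN Orders AS o ON o.customer_id = c.customer_id;

-- USING is shorthand for when both tables share the column name:
SELECT c.name, o.order_date
FROM Customers AS c
INNER JOIN Orders AS o USING (customer_id);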
One to One Relationships
Let’s say you have a table for customers, and that we put the customer address information in a separate table (both are sketched at the end of this section).
Now we have a relationship between the
Customers table and the Addresses table. If each address can belong to only one
customer, this relationship is “One to One”. Keep in mind that this kind of
relationship is not very common. Our initial table that included the address
along with the customer could have worked fine in most cases.
Notice that there is now a field named “address_id” in the Customers table that refers to the matching record in the Addresses table. This is called a “Foreign Key”, and it is used for all kinds of database relationships. We will cover this subject later in the article.
We can visualize the relationship between the customer and address records as a simple one-to-one link between the two tables.
Note that the
existence of a relationship can be optional, like having a customer record that
has no related address record.
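A minimal sketch of the two tables in SQL (the column names are assumptions based on the description above):

CREATE TABLE Addresses (
    address_id INT PRIMARY KEY,
    street     VARCHAR(100),
    city       VARCHAR(50)
);

CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100),
    -- NULL models the optional relationship noted above; UNIQUE keeps it
    -- one-to-one, since no two customers may point at the same address.
    address_id  INT NULL UNIQUE,
    FOREIGN KEY (address_id) REFERENCES Addresses (address_id)
);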
One to Many and Many to One
Relationships
This is the most commonly used type of
relationship. Consider an e-commerce website, with the following:
- Customers can make many orders.
- Orders can contain many items.
- Items can have descriptions in many languages.
In these cases we would need to create
“One to Many” relationships. Here is an example:
Each customer
may have zero, one or multiple orders. But an order can belong to only one
customer.
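A minimal sketch of this in SQL, reusing the Customers table from the previous section (names are assumptions): each order carries the key of exactly one customer, while nothing limits how many orders reference the same customer.

CREATE TABLE Orders (
    order_id    INT PRIMARY KEY,
    order_date  DATE,
    customer_id INT NOT NULL,  -- each order belongs to exactly one customer
    FOREIGN KEY (customer_id) REFERENCES Customers (customer_id)
);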
Many to Many Relationships
In some cases, you may need multiple
instances on both sides of the relationship. For example, each order can
contain multiple items. And each item can also be in multiple orders.
For these relationships, we need to
create an extra table:
The Items_Orders table has only one
purpose, and that is to create a “Many to Many” relationship between the items
and the orders.
Here is how we can visualize this kind of relationship: each order connects to its items through rows in Items_Orders. If you include the items_orders records in the graph, each order row links to one or more items_orders rows, and each items_orders row links to exactly one item.
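A minimal sketch of the junction table (column names are assumptions); the composite primary key prevents listing the same item twice on one order:

CREATE TABLE Items (
    item_id INT PRIMARY KEY,
    title   VARCHAR(100)
);

CREATE TABLE Items_Orders (
    order_id INT NOT NULL,
    item_id  INT NOT NULL,
    PRIMARY KEY (order_id, item_id),  -- one row per item-per-order pair
    FOREIGN KEY (order_id) REFERENCES Orders (order_id),
    FOREIGN KEY (item_id)  REFERENCES Items (item_id)
);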
Self Referencing Relationships
This is used when a table needs to
have a relationship with itself. For example, let’s say you have a referral
program. Customers can refer other customers to your shopping website. The
table may look like this:
Customers 102 and 103 were referred by customer 101.
This is actually similar to a “one to many” relationship, since one customer can refer multiple customers. It can also be visualized as a tree structure.
One customer might refer zero, one or
multiple customers. Each customer can be referred by only one customer, or none
at all.
If you would like to create a self referencing “many to many” relationship, you would need an extra table, just like the one we talked about in the last section.
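A minimal sketch of the self referencing one-to-many case (referred_by is an assumed column name); the foreign key points back at the same table:

CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100),
    referred_by INT NULL,  -- NULL: this customer was not referred by anyone
    FOREIGN KEY (referred_by) REFERENCES Customers (customer_id)
);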
Foreign Keys
So far we have only learned about some
of the concepts. Now it is time to bring them to life using SQL. For this part,
we need to understand what Foreign Keys are.
In the relationship examples above, we
always had these “****_id” fields that referenced a column in another table. In
this example, the customer_id column in the Orders table is a Foreign Key
column:
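The original article showed this as a diagram; written directly in SQL, the same constraint might be declared like this (a sketch with assumed names):

ALTER TABLE Orders
    ADD CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id) REFERENCES Customers (customer_id);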
2. Explain Second Normal and
Third Normal Form
Ans: Second normal form (2NF)
is a normal form used in database normalization. 2NF was originally defined by
E.F. Codd in 1971. A table that is in first normal form (1NF) must meet
additional criteria if it is to qualify for second normal form. Specifically: a
table is in 2NF if and only if it is in 1NF and no non-prime attribute is
dependent on any proper subset of any candidate key of the table. A non-prime
attribute of a table is an attribute that is not a part of any candidate key of
the table.
Put simply, a table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is dependent on the whole of a candidate key. Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF. This is a complex way of saying that if a column isn’t intrinsically related to the entire primary key, then you should break the table out into different tables.
Example:

OrderId (PK) | ItemId (PK) | OrderDate  | …
1            | 100         | 2009-01-01 |
1            | 101         | 2009-01-01 |
The primary key is (OrderId, ItemId).
Consider OrderDate. It is conceptually
part of an order. An order always occurs at some time. But is an OrderDate
related to an Item? Not really. You may be saying, “but items are part of an
order!”, and you would be right. But that’s not what I’m getting at. OrderDate
is independent of the item itself.
Look at it another way: in the table above, the OrderDate will always be the same for a given OrderId, regardless of the value of the ItemId column. This means data duplication, which is a symptom of a table that is not fully normalized.
Here’s how we correct the problem:

Orders

OrderId (PK) | OrderDate  | …
1            | 2009-01-01 |

Order_Items

OrderId (PK) | ItemId (PK) | …
1            | 100         |
1            | 101         |
Here is an excellent line from the
article, “All we are trying to establish here is whether a particular order on
a particular date relies on a particular item.”
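Expressed as table definitions, the correction might look like this (a sketch; the column types are assumptions): OrderDate now depends on the whole key of Orders, and Order_Items keeps the composite key.

CREATE TABLE Orders (
    OrderId   INT PRIMARY KEY,
    OrderDate DATE  -- depends on OrderId alone, so it lives here
);

CREATE TABLE Order_Items (
    OrderId INT NOT NULL,
    ItemId  INT NOT NULL,
    PRIMARY KEY (OrderId, ItemId),
    FOREIGN KEY (OrderId) REFERENCES Orders (OrderId)
);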
Third Normal Form (3NF): No
dependencies on non-key attributes
The third
normal form (3NF) is a normal form used in database normalization. 3NF was
originally defined by E.F. Codd in 1971. Codd's definition states
that a table is in 3NF if and only if both of the following conditions hold:
- The relation R (table) is in second normal form (2NF)
- Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every super key of R.
A non-prime
attribute of R is an attribute that does not belong to any candidate key of R.
A transitive dependency is a functional dependency in which X → Z
(X determines Z) indirectly, by virtue of X → Y and
Y → Z (where it is not the case that Y → X).
A 3NF
definition that is equivalent to Codd's, but expressed differently was given by
Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and
only if, for each of its functional dependencies X → A, at least
one of the following conditions holds:
- X contains A (that is, X → A is a trivial functional dependency), or
- X is a superkey, or
- Every element of A-X, the set difference between A and X, is a prime attribute (i.e., each column in A-X is contained in some candidate key)
Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent Boyce–Codd normal form (BCNF). BCNF simply eliminates the third alternative ("every element of A−X is a prime attribute").
2NF covers the case of multi-column
primary keys. 3NF is meant to cover single column keys. Simply stated, pull out
columns that don’t directly relate to the subject of the row (the primary key),
and put them in their own table.
Example:

Orders

OrderId (PK) | OrderDate  | CustomerName | CustomerCity
1            | 2009-01-01 | John Smith   | Chicago
Customer information could be the
subject of its own table. Pull out customer name and other customer fields into
another table, and then put a Customer foreign key into Orders.
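A sketch of that correction in SQL (names and types are assumptions): the customer attributes move into their own table, and Orders keeps only the foreign key.

CREATE TABLE Customers (
    CustomerId   INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    CustomerCity VARCHAR(50)
);

CREATE TABLE Orders (
    OrderId    INT PRIMARY KEY,
    OrderDate  DATE,
    CustomerId INT NOT NULL,
    FOREIGN KEY (CustomerId) REFERENCES Customers (CustomerId)
);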
3. Explain query engine and
storage manager in MySQL architecture.
Ans: MySQL is based on a tiered architecture, consisting of both subsystems and support components that interact with each other to read, parse, and execute queries, and to cache and return query results. The MySQL architecture consists of five primary subsystems that work together to respond to a request made to the MySQL database server.
1) Query Engine
SQL Interface
The SQL interface provides the mechanisms to receive commands and transmit results to the user. The MySQL SQL interface was built to the ANSI SQL standard and accepts the same basic SQL statements as most ANSI-compliant database servers. Although many of the SQL commands supported in MySQL have options that are not ANSI standard, the MySQL developers have stayed very close to the ANSI SQL standard. Connections to the database server are received from the network communication pathways, and a thread is created for each. The threaded process is the heart of the executable pathway in the MySQL server. MySQL is built as a true multithreaded application whereby each thread executes independently of the other threads (except for certain helper threads). The incoming SQL command is stored in a class structure, and the results are transmitted to the client by writing the results out to the network communication protocols. Once a thread has been created, the MySQL server attempts to parse the SQL command and store the parts in the internal data structure.
Parser
When a client issues a query, a new thread is created and the SQL statement is forwarded to the parser for syntactic validation (or rejection due to errors). The MySQL parser is implemented using a large Lex/YACC grammar that is compiled with Bison. The parser constructs a query structure used to represent the query statement (SQL) in memory as a tree structure (also called an abstract syntax tree) that can be used to execute the query.
Query Optimizer
The MySQL query optimizer subsystem is considered by some to be misnamed. The optimizer uses a SELECT-PROJECT-JOIN strategy that attempts to restructure the query by first doing any restrictions (SELECT) to narrow the number of tuples to work with, then performing the projections to reduce the number of attributes (fields) in the resulting tuples, and finally evaluating any join conditions. While not considered a member of the extremely complicated query optimizer category, the SELECT-PROJECT-JOIN strategy falls into the category of heuristic optimizers. In this case, the heuristics (rules) are simply:
- Horizontally eliminate extra data by evaluating the expressions in the WHERE (HAVING) clause.
- Vertically eliminate extra data by limiting the data to the attributes specified in the attribute list. The exception is the storage of the attributes used in the join clause, which may not be kept in the final query.
- Evaluate join expressions.

This results in a strategy that ensures a known-good access method to retrieve data in an efficient manner. Despite critical reviews, the SELECT-PROJECT-JOIN strategy has proven effective at executing the typical queries found in transaction processing.

The first step in the optimizer is to check for the existence of tables and access control by the user. If there are errors, the appropriate error message is returned and control returns to the thread manager, or listener. Once the correct tables have been identified, they are opened and the appropriate locks are applied for concurrency control.

Once all of the maintenance and setup tasks are complete, the optimizer uses the internal query structure and evaluates the WHERE conditions (a restrict operation) of the query. Results are returned as temporary tables to prepare for the next step. If UNION operators are present, the optimizer executes the SELECT portions of all statements in a loop before continuing.

The next step in the optimizer is to execute the projections. These are executed in a similar manner as the restrict portions, again storing the intermediate results as temporary tables and saving only those attributes specified in the column specification in the SELECT statement.

Lastly, the structure is analyzed for any JOIN conditions that are built using the join class, and then the join::optimize() method is called. At this stage the query is optimized by evaluating the expressions and eliminating any conditions that result in dead branches or always-true or always-false conditions (as well as many other similar optimizations). The optimizer is attempting to eliminate any known-bad conditions in the query before executing the join. This is done because joins are the most expensive and time consuming of all of the relational operators. It is also important to note that the join optimization step is performed for all queries that have a WHERE or HAVING clause, regardless of whether there are any join conditions. This enables developers to concentrate all of the expression evaluation code in one place. Once the join optimization is complete, the optimizer uses a series of conditional statements to route the query to the appropriate library method for execution.
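While that routing happens inside the server, the plan the optimizer settles on can be inspected from the client with EXPLAIN. A minimal sketch (table and column names are assumptions):

EXPLAIN
SELECT o.OrderId, o.OrderDate
FROM Orders AS o
WHERE o.CustomerId = 42;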
Query Execution
Execution of the query is handled by a set of library methods designed to implement a particular query. For example, the mysql_insert() method is designed to insert data. Likewise, there is a mysql_select() method designed to find and return data matching the WHERE clause. This library of execution methods is located in a variety of source code files under a file of a similar name (e.g., sql_insert.cc or sql_select.cc). All of these methods have as a parameter a thread object that permits the method to access the internal query structure and eases execution. Results from each of the execution methods are returned using the network communication pathways library. The query execution library methods are clearly implemented using the interpretative model of query execution.
Query Cache
While not its own subsystem, the query cache should be considered a vital part of the query optimization and execution subsystem. The query cache is a marvelous invention that caches not only the query structure but also the query results themselves. This enables the system to check for frequently used queries and shortcut the entire query optimization and execution stages altogether. This is another of the technologies that is unique to MySQL: other database systems cache queries, but no others cache the actual results. As you can appreciate, the query cache must also allow for situations where the results are “dirty”, in the sense that something has changed since the last time the query was run (e.g., an INSERT, UPDATE, or DELETE was run against the base table), and the cached queries may need to be occasionally purged.
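On server versions that still ship the query cache (it was removed in MySQL 8.0), its configuration and hit counters can be inspected like this:

SHOW VARIABLES LIKE 'query_cache%';  -- size and on/off configuration
SHOW STATUS LIKE 'Qcache%';          -- hits, inserts, and prunes so far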
2) The Storage Manager
The storage manager interfaces with the operating system to write data to the disk efficiently. Because the storage functions reside in a separate subsystem, the MySQL engine operates at a level of abstraction away from the operating system. The storage manager writes to disk all of the data in the user tables, indexes, and logs, as well as the internal system data.
In MySQL you can choose from the following storage engines:
- MyISAM
- InnoDB
- MERGE
- MEMORY
In older MySQL versions the BDB and ISAM table types were also available, but they are no longer supported in newer versions. Besides this, MySQL 6 was planned to introduce a new table type called Falcon.
MyISAM
MyISAM is the default storage engine in MySQL. It is the improved replacement of the old ISAM table type. With the MyISAM storage engine, every table is stored in three different files:
1. a .frm file, which stores the table structure
2. a .MYD file, which stores the table data
3. a .MYI file, which is the index file
The MyISAM storage engine is optimized for speed and supports extensive indexing. You can index BLOB and TEXT columns, and FULLTEXT indexes are also supported. However, MyISAM tables do not support foreign key constraints or row-level locking.
InnoDB
InnoDB is a transaction-safe storage engine in MySQL. Table data are managed by the InnoDB tablespace. This table type supports foreign key constraints and row-level locking as well. However, FULLTEXT indexes are not supported.
MERGE
The MERGE storage engine is a bit special. In this case the data are not stored in the MERGE table itself, but in the underlying MyISAM tables from which the virtual MERGE table was made up.
MEMORY
With the MEMORY storage engine, the data are stored in memory and are available only as long as the MySQL server is running. MEMORY tables are very fast, so they are ideal for temporary tables.
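The engine is chosen per table at creation time. A minimal sketch (table names are assumptions):

-- Transaction-safe table with foreign key support:
CREATE TABLE accounts (
    id      INT PRIMARY KEY,
    balance DECIMAL(10,2)
) ENGINE = InnoDB;

-- Fast in-memory table for scratch data; contents vanish on server restart:
CREATE TABLE session_scratch (
    session_id INT PRIMARY KEY,
    payload    VARCHAR(255)
) ENGINE = MEMORY;

-- List the engines your server actually supports:
SHOW ENGINES;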
4. Explain any five Relational
database components.
Ans: Although there are different ways
to organize data in a database, relational databases are one of the most
effective. Relational database systems are an application of mathematical set
theory to the problem of effectively organizing data. In a relational database,
data is collected into tables (called relations in relational theory).
A table represents some class of
objects that are important to an organization. For example, a company may have
a database with a table for employees, another table for customers, and another
for stores. Each table is built of columns and rows (called attributes and
tuples in relational theory). Each column represents some attribute of the
object represented by the table. For example, an Employee table would
typically have columns for attributes such as first name, last name, employee
ID, department, pay grade, and job title. Each row represents an instance of
the object represented by the table. For example, one row in the Employee
table represents the employee who has employee ID 12345.
When organizing data into tables, you
can usually find many different ways to define tables. Relational database
theory defines a process called normalization, which ensures that the set of
tables you define will organize your data effectively.
There are various parts to a database that we
should know about. These include tables, forms, queries, reports, macros and
modules.
Tables
The first component we should know about in a database is a table. Here is an example of a table of data about zoo animals.
A database is made up of one or more tables.
Each table (also known as a 'file') holds the records for just one entity, and an entity is an actual, real-life thing about which we want to keep information. We saw here that tables are made up of records in table rows, fields (attributes) in columns, actual data items, and a key field, also known as a primary key.
Forms
The next database component we should be able to identify is a form.
A form is used to enter new records into a
database, one record at a time. They are also used to navigate through existing
records and to edit them. Forms have some common characteristics. These include
- a title (because you may have more than one form)
- field names (so you know what piece of data needs to be entered)
- place holders, so you know where to type in data
- navigation buttons, so you can move through the different forms easily
- form operations such as 'close', so that you can easily and quickly carry out common operations
- data entry validation rules, so that you can help prevent erroneous data from being entered into the system
- different methods of data entry such as tick boxes, so that you can speed up data entry and reduce spelling errors.
Queries
The third component of a database is the query. This is used to search through the data we have in our tables to answer questions or queries such as:
- Could I have a list of all the elephants?
- Could I have a list of all the animals born before 3rd July 2006?
- Could I have a list of all the animals looked after by the keeper called Mr Cooper?
We design the query first and then we run it. Designing a query can be done in a number of ways, including using a language called Structured Query Language (SQL), which is common to most database applications software. An alternative is Query By Example (QBE), which is available for use in Access.
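The original article showed a QBE grid at this point; an equivalent query written in SQL might look like this (the Animals table and its columns are assumptions):

SELECT name, date_of_birth, keeper
FROM Animals
WHERE species = 'Elephant';  -- "a list of all the elephants"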
When we run this query, we get the results
presented as a table:
Reports
Queries only give us results presented as a table of results. If we want to lay them out so they look professional, with added information such as a logo, a title for the report, headings for the columns of data, a date and so on, and if we want to organise how the information is presented, perhaps by grouping the results in some way, then we need to use reports. Reports are the next main component we should know about in a database.
Macros
The next component of a database we should know about is the macro. Macros provide extra features and functionality to a database, over and above what can be achieved by using just the tools that come with your database application. A macro is a sequence of instructions which we give a single name to.
5. What are the benefits of
making the system catalogs relations?
Ans: We can store a relation using one of several alternative file structures, and we can create one or more indexes, each stored as a file, on every relation. Conversely, in a relational DBMS, every file contains either
the tuples in a relation or the entries in an index. The collection of files
corresponding to users' relations and indexes represents the data in the
database. A fundamental property of a database system is that it maintains a
description of all the data that it contains. A relational DBMS maintains
information about every relation and index that it contains. The DBMS also
maintains information about views, for which no tuples are stored explicitly;
rather, a definition of the view is stored and used to compute the tuples that
belong in the view when the view is queried. This information is stored in a
collection of relations, maintained by the system, called the catalog
relations. The catalog relations are also called the system catalog, the catalog,
or the data dictionary. The system catalog is sometimes referred to as
metadata; that is, not data, but descriptive information about the data. The
information in the system catalog is used extensively for query optimization.
Let us consider what is stored in the system catalog. At a minimum we have system-wide information, such as the size of the buffer pool and the page size, and the following information about individual relations, indexes, and views. For each relation:
- Its relation name, the file name (or some identifier), and the file structure (e.g., heap file) of the file in which it is stored.
- The attribute name and type of each of its attributes.
- The index name of each index on the relation.
- The integrity constraints (e.g., primary key and foreign key constraints) on the relation.
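Because the catalog is itself stored as relations, it can be queried with ordinary SQL. A minimal sketch using the standard INFORMATION_SCHEMA views (supported by MySQL, among others); the table name is an assumption:

-- List every column and its type for one table, straight from the catalog.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Customers';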
6. Explain any five alternative Object-Oriented database strategies.
Ans: An object-oriented database
management system (OODBMS, but sometimes just called “object database”) is
a DBMS that stores data in a logical model that is closely aligned with an
application program’s object model. Of course, an OODBMS will have a
physical data model optimized for the kinds of logical data model it expects.
If you’re guessing from that definition
that there can be difficulties drawing boundaries between the application, the
application programming language, the data manipulation language, and/or the
DBMS — you’re right. Those difficulties have been a big factor in relegating
OODBMS to being a relatively niche technology to date.
- Intersystems Cache’, the most successful OODBMS product by far, with good OLTP (OnLine Transaction Processing) capabilities and a strong presence in the health care market. Although it was designed around the specialized MUMPS/M language, Cache’ happily talks Java and SQL.
- ObjectStore, a well-pedigreed startup a couple decades ago, which wound up focusing on complex objects in markets such as computer-aided design. ObjectStore was eventually sold to Progress Software, which is positioning ObjectStore more as a distributed caching system than anything else (Amazon was an impressive reference for that use case). That said, Progress’ ObjectStore business is small, as is its ObjectStore level of effort. Both Cache’ and ObjectStore were at some point unsuccessfully targeted at the XML database market.
- Part of Workday’s technology stack. Very-well-pedigreed SaaS (Software as a Service) application vendor Workday decided to go with what amounts to an in-memory OODBMS. This makes all kinds of sense, and is a lot of what rekindled my interest in object-oriented database management.
- Objectivity, also from the 20-years-ago generation, and a poster child for the “DBMS toolkit as much as a DBMS” issue.
- McObject Perst, an embeddable memory-centric OODBMS.
- Versant. Actually, by now the Versant company has several different OODBMS; I’m not sure whether what it’s selling has much to do with the original Versant product. Anyhow, both the original and current Versant product seem to be positioned in OLTP. Versant has recently suffered from declining revenues, in license fees and maintenance alike.
- Forthcoming technology from Starcounter, in the area of high-performance memory-centric OLTP. According to my correspondents, Starcounter still needs to explain how its technology is different from what Versant and ObjectStore introduced 20 or so years ago. Interestingly, while ObjectStore shines as a distributed system, Starcounter’s developers have consigned scale-out to the “too hard to bother with” category.
- Gemstone, which seemed to be on an ObjectStore-like caching track until it was acquired by VMware.
Arguably, OODBMS have all the benefits
of document-model DBMS, but with different language bindings. And if you’re
going to write in an object-oriented language anyway, those language bindings
can seem pretty appealing. In particular, they might be preferable to fighting
your way through object/relational mapping.
Other than the double-edged language sword, the main criticism of object-oriented DBMS is that they include a whole lot of pointers. InterSystems and others have shown that, even in a disk-centric world, OODBMS can have excellent performance in OLTP and tolerable performance in simple reporting. As RAM gets cheaper, memory-centric operation becomes ever more viable, making the pointers even less problematic.
Objects and
Identity
The figure in the original article (not reproduced here) shows an object with state and behavior. The state is represented by the values of the object's attributes, and the behavior is defined by the methods acting on the state of the object. There is a unique object identifier (OID) to identify the object.
Complex Objects
Complex objects are built by applying constructors to simpler objects, including sets, lists, and tuples.
Encapsulation
Encapsulation is derived from the notion of Abstract Data Type (ADT). It is motivated by the need to make a clear distinction between the specification and the implementation of an operation. It reinforces modularity and provides a form of logical data independence.
Class
A class object is an object which acts as a template. It specifies:
- A structure, that is, the set of attributes of the instances
- A set of operations
- A set of methods which implement the operations
Instantiation means generating objects, e.g. the 'new' operation in C++.
Persistence of objects: two approaches
- An implicit characteristic of all objects
- An orthogonal characteristic: insert the object into a persistent collection of objects
Inheritance
A mechanism of reusability, and one of the most powerful concepts of OO programming.
Association
Association is a link between entities in an application. In an OODB, associations are represented by means of references between objects; this covers the representation of binary associations, ternary associations, and reverse references.