Lecture 6 – Data management and Databases

Introduction

Data in its raw form is of little use to an organization. Data processing is the method of collecting raw data and translating it into usable information. It is usually performed step by step by a data management team with the aid of information systems. It is crucial for generating the information an organization needs to improve its business strategies and performance.

Data management

Data management refers to the processes and techniques used to control and organize data throughout its lifecycle. It encompasses various activities aimed at ensuring data quality, accessibility, security, and compliance. Effective data management is a crucial component of the information systems that run business applications and provide analytical information to help drive operational decision-making and strategic planning by corporate executives, business managers and other end users.

Here are some key aspects of data management:

  • Data Governance: Data governance involves establishing policies, procedures, and guidelines for managing data within an organization. It defines roles, responsibilities, and processes for data management, ensuring data are handled consistently and in accordance with regulatory requirements.
  • Data Quality: Data quality management focuses on maintaining accurate, consistent, and reliable data. It involves processes to identify and rectify data errors, inconsistencies, duplicates, and incomplete or outdated information. Data cleansing, validation, and standardization techniques are used to improve data quality.
  • Data Integration: Data integration involves combining data from multiple sources into a unified view. It enables organizations to have a holistic understanding of their data by consolidating information from different systems or departments.
  • Data Storage and Retrieval: Data management includes determining the most appropriate storage infrastructure for different types of data. This may involve using databases, data warehouses, data lakes, or cloud-based storage solutions. Efficient retrieval mechanisms, such as indexing, search algorithms, filters, and queries, are implemented to access data quickly and accurately.
  • Data Security and Privacy: Data security is crucial to protect sensitive and confidential information from unauthorized access, breaches, or data leaks. Data management includes implementing access controls, encryption, authentication mechanisms, and security protocols to safeguard data. Compliance with data privacy regulations, such as the General Data Protection Regulation (GDPR), is also essential.
  • Data Lifecycle Management: Data lifecycle management involves managing data from its creation or acquisition to its archival or disposal. It includes defining data retention policies, data archiving, and data purging strategies. This ensures that data is retained for the required period in compliance with legal and regulatory requirements, while reducing storage costs.
  • Data Analytics and Business Intelligence: Data management supports data analysis and business intelligence initiatives by providing clean, organized, and accessible data. It involves creating data models, data warehouses, or data marts for efficient data analysis, reporting, and decision-making.
  • Data Documentation: Effective data management involves documenting data structures, definitions, and relationships. Metadata management involves capturing and managing metadata, which provides information about data attributes, sources, transformations, and usage. Proper documentation and metadata management facilitate data understanding and enhance data lineage and traceability.
  • Data Compliance and Regulatory Requirements: Data management ensures compliance with industry-specific regulations, legal requirements, and data privacy laws. It includes establishing data governance frameworks, data protection policies, and procedures to meet compliance standards.

These aspects of data management are crucial for organizations to derive value from their data assets, ensure data reliability and security, and make informed decisions based on accurate and high-quality data.
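The data quality activities listed above (standardization, validation, and removal of duplicates) can be illustrated with a minimal Python sketch. The customer records, field names, and the simple email rule are all hypothetical and chosen only for illustration; real data cleansing rules depend on the organization's own quality standards.

```python
import re

# Hypothetical raw customer records; the field names are illustrative only.
raw_records = [
    {"id": 1, "name": "  alice ", "email": "ALICE@EXAMPLE.COM"},
    {"id": 2, "name": "Bob", "email": "bob@example"},           # malformed email
    {"id": 3, "name": "Alice", "email": "alice@example.com"},   # duplicate of id 1
    {"id": 4, "name": "Carol", "email": None},                  # incomplete record
]

# Deliberately simple rule (an assumption): text@text.text with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def standardize(record):
    """Trim whitespace and normalise case so that equal values compare equal."""
    email = record["email"]
    return {
        "id": record["id"],
        "name": record["name"].strip().title(),
        "email": email.strip().lower() if email else None,
    }


def is_valid(record):
    """A record is valid if it has a name and a plausibly formed email."""
    has_name = bool(record["name"])
    has_email = bool(record["email"] and EMAIL_PATTERN.match(record["email"]))
    return has_name and has_email


def clean(records):
    """Standardize, validate, then drop records whose email was already seen."""
    seen_emails = set()
    cleaned, rejected = [], []
    for record in map(standardize, records):
        if not is_valid(record) or record["email"] in seen_emails:
            rejected.append(record)
        else:
            seen_emails.add(record["email"])
            cleaned.append(record)
    return cleaned, rejected


cleaned, rejected = clean(raw_records)
print(cleaned)                        # the single record that passed all checks
print([r["id"] for r in rejected])    # ids of the records that were set aside
```

In this sketch only record 1 survives: record 2 fails validation, record 3 duplicates record 1 once both are standardized, and record 4 is incomplete.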

Data management best practices

A well-designed data management program is a critical component of an effective data management strategy, especially in organizations with distributed data environments that include a diverse set of systems. A strong focus on data quality is required. Business executives and users have to be involved to make sure that their data needs are met and that data quality standards are adhered to. Also, the multitude of databases and other data platforms available has to be deployed in accordance with the adopted policy standards. IT and data managers must make sure that the information systems they implement are fit for their intended purpose and deliver the data processing capabilities and analytical information required for the organization’s business operations.

Data Hierarchy

Listed below are the components of the data hierarchy in ascending order of complexity:

Simple        Bit (1s and 0s)
              Byte (8 bits)
              Field or item
              Record
              File
Most complex  Database

This is called a data hierarchy because databases are composed of files; files are composed of records, and so on.

Bit – The term bit is short for binary digit. A bit can assume either of two states, representing the numeric value 0 or 1. A binary number is written as a sequence of bits, e.g. 100101 (base 2).

Byte – In a computer system, the basic unit of information is called a byte. A byte of information is generally stored using 8 bits in a specified combination, e.g. 11101100 (base 2).

Field or Item – A field or item of data is one or more bytes that contain data about attributes of an entity.

Record – A record is a collection of fields relating to a specific entity.

File – A file is a collection of related records. The concept of a computer file is very similar to the manual file in a filing cabinet.

Database – A database consists of all the files of an organization, structured and integrated to facilitate updating of files and retrieval of information from them.
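The levels of the data hierarchy can be mirrored with ordinary Python values, as in the sketch below. The student record reuses the sample values from these notes, while the "courses" file and all variable names are assumptions added purely for illustration.

```python
# Bit: a single binary digit, either 0 or 1.
bit = 1

# A sequence of bits read as a base-2 number.
six_bits = int("100101", 2)      # 32 + 4 + 1 = 37
one_byte = int("11101100", 2)    # 128 + 64 + 32 + 8 + 4 = 236

# Byte: 8 bits. One byte is enough to encode one ASCII character.
letter = "A".encode("ascii")             # a bytes object of length 1
letter_bits = format(letter[0], "08b")   # the same byte written as 8 bits

# Field: one or more bytes holding one attribute of an entity.
name_field = "ABC".encode("ascii")       # three bytes

# Record: a collection of fields relating to a specific entity.
record = {"roll": 1, "name": "ABC", "age": 19}

# File: a collection of related records.
student_file = [
    record,
    {"roll": 2, "name": "DEF", "age": 22},
]

# Database: the organization's files, structured and integrated.
# The "courses" file is hypothetical.
database = {
    "students": student_file,
    "courses": [{"code": "C101", "title": "Databases"}],
}

print(six_bits, one_byte, letter_bits)
print(len(name_field), "bytes in the name field")
print(len(database["students"]), "records in the students file")
```

Reading the sketch from bottom to top retraces the hierarchy: the database is composed of files, each file of records, each record of fields, each field of bytes, and each byte of bits.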

Databases 

A database is a collection of related data stored in a well-defined structure and organized for easy access, management, and updating.

Data can be defined as a collection of facts and records on which we can apply reasoning, discussion, or calculation. A data item is a numeric or alphanumeric group of symbols, such as 2231971. Data is usually plentiful and readily available, and it can be processed to produce useful information. However, it can also be redundant or irrelevant. Data can exist in the form of graphics, reports, tables, text, etc. A database is a systematically organized, structured repository of indexed information that allows easy retrieval, updating, analysis, and output of such data.

Record: A collection of related data items. Taken individually, data items such as 1, ABC and 19 have no meaning, but organized in the following way they collectively represent meaningful information.

Roll  Name  Age
1     ABC   19

Table or Relation: Collection of related records.

Roll  Name  Age
1     ABC   19
2     DEF   22
3     XYZ   28

NB: The columns of this relation (table) are called fields or attributes, and the set of values an attribute may take is called its domain. The rows are called tuples or records. Several tables or relations constitute a database.
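The relation shown above can be created in a real DBMS. The following is a minimal sketch using SQLite through Python's standard sqlite3 module; the table name student and the column types are assumptions, since the notes give only the column headings and sample rows.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# The table (relation) with its three fields (attributes).
conn.execute(
    "CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

# Each tuple below becomes one record (row) of the table.
with conn:
    conn.executemany(
        "INSERT INTO student (roll, name, age) VALUES (?, ?, ?)",
        [(1, "ABC", 19), (2, "DEF", 22), (3, "XYZ", 28)],
    )

cursor = conn.execute("SELECT roll, name, age FROM student ORDER BY roll")
fields = [column[0] for column in cursor.description]  # column names
rows = cursor.fetchall()                               # list of records

print(fields)
for row in rows:
    print(row)
```

Here fields holds the column names and rows holds one tuple per record, matching the terminology in the note above.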

Database management

Database management refers to the process of organizing, storing, and retrieving data efficiently and securely in a database system. It involves various tasks and techniques aimed at ensuring data integrity, accessibility, and reliability. Here are some key aspects of database management:

  1. Database Design: It involves designing the structure and layout of the database, including tables, relationships, and data types. A well-designed database ensures efficient data storage and retrieval.
  2. Data Modeling: Data modeling is the process of creating a conceptual representation of the database. It involves defining entities, attributes, and relationships between them. Commonly used data modeling techniques include Entity-Relationship (ER) diagrams and Unified Modeling Language (UML).
  3. Data Storage and Organization: Relational database management systems (RDBMS) store data in structured formats using tables, rows, and columns, organized according to a predefined schema, while NoSQL databases use other models such as documents or key-value pairs. Efficient indexing and partitioning strategies can be employed for faster data access and management.
  4. Data Retrieval: Retrieving data from a database involves querying the database using query languages such as SQL (Structured Query Language). SQL allows users to retrieve specific data based on conditions and criteria defined in the query.
  5. Data Security: Database management includes implementing security measures to protect data from unauthorized access, modification, or loss. This includes user authentication, access control, encryption, and backup and recovery strategies.
  6. Data Backup and Recovery: Regular backups of the database are crucial to prevent data loss in case of system failures, disasters, or accidental deletion. Database administrators should establish backup schedules and implement recovery procedures to restore data to a consistent state.
  7. Performance Tuning: Optimizing the performance of a database involves analyzing and improving factors like query execution time, indexing strategies, database configuration, and hardware resources. Performance tuning ensures efficient data retrieval and processing.
  8. Database Scalability: As data grows, a database should be scalable to handle increased storage requirements and user demand. Scaling techniques include vertical scaling (adding more resources to a single server) or horizontal scaling (distributing data across multiple servers).
  9. Data Integration: In many cases, data needs to be integrated from multiple sources into a centralized database. This may involve data extraction, transformation, and loading processes to ensure consistency and accuracy across different data sources.

Thus, effective database management is essential for businesses and organizations to ensure data availability, reliability, and security, enabling informed decision-making and efficient operations.
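Several of the tasks above (design, retrieval with SQL, indexing, and backup) can be sketched in a few lines with SQLite and Python's sqlite3 module. The student table, the index name idx_student_age, and the use of a second in-memory database as the backup target are assumptions made for illustration; a production system would back up to separate, durable storage.

```python
import sqlite3

# Database design: one table with typed columns and a primary key.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)
with source:  # commits the inserts on success
    source.executemany(
        "INSERT INTO student (roll, name, age) VALUES (?, ?, ?)",
        [(1, "ABC", 19), (2, "DEF", 22), (3, "XYZ", 28)],
    )

# Data retrieval: an SQL query with a condition and a bound parameter.
older = source.execute(
    "SELECT name FROM student WHERE age > ? ORDER BY name", (20,)
).fetchall()

# Performance tuning: an index on the column used in the condition,
# which the DBMS may use to avoid scanning every row.
source.execute("CREATE INDEX idx_student_age ON student (age)")

# Backup and recovery: copy the whole database to a second connection.
backup = sqlite3.connect(":memory:")
source.backup(backup)
restored_count = backup.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(older)
print(restored_count, "records available in the backup copy")
```

The query returns only the students older than 20, and the backup copy contains the same three records as the source database.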

Database Management System

Databases are managed by software systems known as Database Management Systems (DBMS). A DBMS is a software system that enables the creation, organization, management, and retrieval of data in a structured and efficient manner. It provides a set of tools and capabilities for interacting with databases and allows users to store, manipulate, and retrieve data easily.

A DBMS provides the rules and procedures that help us create, organize, and manipulate the database. It also lets us add, modify, and delete data items or records in the database. It is important because without such programs it is not practical to maintain a database. Examples of DBMSs include Microsoft Access, FoxPro, Oracle, MySQL, and PostgreSQL, among many others. Just as a word processor (e.g., Microsoft Word) is used to create and edit documents, a DBMS is used to create and manage databases. Inside a database, data is recorded in tables, each a collection of rows and columns, and is indexed so that finding relevant information becomes easier. As new information is added, data gets updated, expanded, and deleted. The DBMS carries out these operations: creating and updating the database, querying the data it contains, and running applications against it.
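The add, modify, and delete operations described above can be tried out with SQLite, a small DBMS that ships with Python. This is only a sketch: the student table and its rows are the sample data from these notes, and the new age value is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

# Add: insert two new records.
with conn:
    conn.execute("INSERT INTO student (roll, name, age) VALUES (?, ?, ?)", (1, "ABC", 19))
    conn.execute("INSERT INTO student (roll, name, age) VALUES (?, ?, ?)", (2, "DEF", 22))

# Modify: change one field of an existing record.
with conn:
    conn.execute("UPDATE student SET age = ? WHERE roll = ?", (20, 1))

# Delete: remove a record.
with conn:
    conn.execute("DELETE FROM student WHERE roll = ?", (2,))

# Query: read back what the database now contains.
remaining = conn.execute("SELECT roll, name, age FROM student").fetchall()
print(remaining)
```

After these steps the table holds a single record, the student with roll 1 and the updated age.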

Advantages of DBMS

  • Reduction of Data Redundancy: This is perhaps the most significant advantage of using a DBMS. Redundancy is the problem of storing the same data item in more than one place. Redundancy creates several problems, such as requiring extra storage space, entering the same data more than once during insertion, and deleting data from more than one place during deletion. Anomalies may occur in the database if insertions, deletions, etc. are not done properly.
  • Sharing of Data: In paper-based record keeping, data cannot easily be shared among many users. In a computerized DBMS, however, many users can share the same database if they are connected via a network.
  • Data Integrity: We can maintain data integrity by specifying integrity constraints, which are rules and restrictions about what kind of data may be entered or manipulated within the database. This increases the reliability of the database, because data that violates these rules is rejected before it can be stored.
  • Data Security: We can restrict certain people from accessing the database, or allow them to see only certain portions of the database while blocking sensitive information. This is not easily possible in paper-based record keeping.
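The data integrity advantage above can be demonstrated with a short sketch in SQLite. The three constraints used here (a primary key on roll, NOT NULL on name, and an age range of 0 to 150) are assumptions chosen for illustration, not rules taken from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        roll INTEGER PRIMARY KEY,                    -- no two students share a roll
        name TEXT NOT NULL,                          -- every student must have a name
        age  INTEGER CHECK (age BETWEEN 0 AND 150)   -- assumed plausible range
    )
    """
)


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO student (roll, name, age) VALUES (?, ?, ?)", row
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert((1, "ABC", 19)),   # valid record
    try_insert((1, "DEF", 22)),   # duplicate roll violates PRIMARY KEY
    try_insert((2, None, 22)),    # missing name violates NOT NULL
    try_insert((3, "XYZ", -5)),   # negative age violates CHECK
]
stored = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(results)
print(stored, "record stored")
```

Only the first row is accepted. The other three are rejected by the DBMS itself, so the rules hold no matter which application or user attempts the insertion.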

Disadvantages of DBMS

However, there are a few disadvantages of using a DBMS:

  • As a DBMS needs computers, we have to invest a good amount in acquiring the hardware and software, in installation facilities, and in the training of users.
  • We have to keep regular backups because a failure can occur at any time. Taking a backup can be a lengthy process, and the computer system may be unable to perform other jobs during this time.
  • While the data security system is a boon of using a DBMS, it must be very robust. If someone can bypass the security system, the database becomes open to any kind of mishandling.