Wednesday, 30 August 2023

Data Analysis, Data Warehouse, and Data Modeling

Data Analysis

The data analyst serves as a gatekeeper for an organization's data so that stakeholders can understand the data and use it to make strategic business decisions. It is a technical role that typically requires an undergraduate or master's degree in analytics, computer modeling, science, or math.

Data analysis is a process of inspecting, cleansing, transforming, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.
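As a rough illustration of the inspect, cleanse, transform, and model steps described above, here is a minimal Python sketch. The records, field names, and helper functions are all made up for this example, and the "model" is only a per-region average, not a real statistical model.

```python
from statistics import mean

# Hypothetical raw records; field names and values are invented for illustration.
raw_sales = [
    {"region": "North", "amount": "120.50"},
    {"region": "north ", "amount": "80"},
    {"region": "South", "amount": None},  # missing value
    {"region": "South", "amount": "200.00"},
    {"region": "South", "amount": "100.00"},
]


def inspect(records):
    """Inspect: count how many records are missing an amount."""
    return sum(1 for r in records if r["amount"] is None)


def cleanse(records):
    """Cleanse: drop records without an amount and normalize region names."""
    return [
        {"region": r["region"].strip().title(), "amount": r["amount"]}
        for r in records
        if r["amount"] is not None
    ]


def transform(records):
    """Transform: convert amounts from text to numbers."""
    return [{"region": r["region"], "amount": float(r["amount"])} for r in records]


def summarize(records):
    """Model (very loosely): compute the average amount per region."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["amount"])
    return {region: mean(amounts) for region, amounts in by_region.items()}


missing_count = inspect(raw_sales)
average_by_region = summarize(transform(cleanse(raw_sales)))
print(missing_count, average_by_region)
```

In practice each step would be far more involved, but the order of the steps is usually the same: look at the data first, fix it, reshape it, and only then draw conclusions.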

 

Top 11 technical and soft skills required to become a data analyst:

·        Data Visualization

·        Data Cleaning

·        MATLAB

·        R

·        Python

·        SQL and NoSQL

·        Machine Learning

·        Linear Algebra and Calculus

·        Microsoft Excel

·        Critical Thinking

·        Communication

 

Data Mining:

Data mining is a particular data analysis technique that focuses on statistical modelling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a species of unstructured data. All of the above are varieties of data analysis.
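The difference between descriptive and predictive analysis can be sketched in a few lines of Python. The figures below are invented, and the least-squares line is just one simple choice of predictive model, assumed here for illustration.

```python
from statistics import mean, median

# Hypothetical monthly figures: advertising spend and resulting sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

# Descriptive: summarize what has already been observed.
description = {"mean": mean(sales), "median": median(sales), "max": max(sales)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


# Predictive: use the fitted line to estimate sales for an unseen spend level.
slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 6.0 + intercept
print(description, predicted_sales)
```

The descriptive part only restates the data, while the predictive part makes a claim about a value that was never observed.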

Data Engineer vs Data Scientist

There is significant overlap between data engineers and data scientists when it comes to skills and responsibilities. The main difference is one of focus. Data engineers are focused on building the infrastructure and architecture for data generation. In contrast, data scientists are focused on applying advanced mathematics and statistical analysis to that generated data.

Data scientists constantly interact with the data infrastructure that is built and maintained by the data engineers, but they are not responsible for building and maintaining that infrastructure. Instead, they are internal clients, tasked with conducting high-level market and business operation research to identify trends and relations. This work requires them to use a variety of sophisticated machines and methods to interact with and act upon data.

Data engineers work to support data scientists and analysts, providing infrastructure and tools that can be used to deliver end-to-end solutions to business problems.  Data engineers build scalable, high performance infrastructure for delivering clear business insights from raw data sources; implement complex analytical projects with a focus on collecting, managing, analyzing, and visualizing data; and develop batch & real-time analytical solutions.

Data scientists depend on data engineers. Whereas data scientists tend to toil away in advanced analysis tools such as R, SPSS, Hadoop, and advanced statistical modelling, data engineers are focused on the products which support those tools. For example, a data engineer’s arsenal may include SQL, MySQL, NoSQL, Cassandra, and other data organization services.

Data Modeling:

Data modeling (data modelling) is the analysis of data objects and their relationships to other data objects. Data modeling is often the first step in database design and object-oriented programming as the designers first create a conceptual model of how data items relate to each other.
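A conceptual model of this kind can be sketched with Python dataclasses. The entities below (Customer, Order, OrderLine, Product) and their fields are hypothetical; they only show how data objects and their relationships might be written down before any database is designed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    sku: str
    name: str
    unit_price: float


@dataclass
class OrderLine:
    # Each order line refers to exactly one product.
    product: Product
    quantity: int


@dataclass
class Order:
    order_id: int
    # One order has many order lines (one-to-many).
    lines: List[OrderLine] = field(default_factory=list)

    def total(self) -> float:
        return sum(line.product.unit_price * line.quantity for line in self.lines)


@dataclass
class Customer:
    customer_id: int
    name: str
    # One customer has many orders (one-to-many).
    orders: List[Order] = field(default_factory=list)


widget = Product(sku="W-1", name="Widget", unit_price=10.0)
gadget = Product(sku="G-1", name="Gadget", unit_price=25.0)
order = Order(order_id=1, lines=[OrderLine(widget, 2), OrderLine(gadget, 1)])
customer = Customer(customer_id=1, name="Example Customer", orders=[order])
order_total = order.total()
print(order_total)
```

The same relationships would later map onto tables and foreign keys in a relational design, or onto classes in an object-oriented program.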

Data Modeling tools:

1)     PowerDesigner

2)     ER/Studio

3)     Sparx Enterprise Architect

4)     Oracle SQL Developer Data Modeler

5)     CA ERwin

6)     IBM - InfoSphere Data Architect

Data Modeling concept through a diagram:

[Figure: Data Modeling]


Data Warehouse:

A data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place, where it is used for creating analytical reports for workers throughout the enterprise.
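The idea of integrating data from disparate sources into one place for reporting can be sketched with Python's built-in sqlite3 module. This is only a toy, assuming two invented source systems and a hypothetical fact and dimension table layout; a real data warehouse would run on a dedicated platform.

```python
import sqlite3

# Two hypothetical source systems that store sales in different shapes.
web_orders = [("2023-08-01", "Widget", 3), ("2023-08-02", "Gadget", 1)]
store_orders = [{"day": "2023-08-01", "item": "Widget", "units": 2}]

# An in-memory database stands in for the central repository.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name TEXT UNIQUE
    );
    CREATE TABLE fact_sales (
        sale_date TEXT,
        product_id INTEGER REFERENCES dim_product(product_id),
        source TEXT,
        units INTEGER
    );
    """
)


def load(sale_date, product_name, source, units):
    """Load one sale into the shared tables, creating the product if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product_name,)
    )
    (product_id,) = conn.execute(
        "SELECT product_id FROM dim_product WHERE name = ?", (product_name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        (sale_date, product_id, source, units),
    )


# Integrate both sources into the same structure.
for sale_date, name, units in web_orders:
    load(sale_date, name, "web", units)
for row in store_orders:
    load(row["day"], row["item"], "store", row["units"])

# An analytical report across all sources: total units sold per product.
report = conn.execute(
    """
    SELECT p.name, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
print(report)
```

Once both sources share one structure, a single query can answer a question that neither source system could answer on its own.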

On-premises data warehouses

Using an on-prem solution naturally involves purchasing, installing, and maintaining your own hardware for storing the contents of your data warehouse, in addition to managing the data it stores.

List of common on-prem data warehouse solutions:

  • IBM
  • Oracle
  • Teradata

Cloud-native data warehouses

Using a cloud-native solution involves purchasing a data warehouse hosted in the cloud and funnelling data to it, usually through an API or some other means. Because of the advantages cloud-native solutions provide, nearly all providers of traditionally on-prem solutions now have a cloud offering. Cloud-based data warehouses are generally cost-effective, quick and easy to set up, can scale with little extra effort, have security built in, and support multi-tenancy.

List of common cloud data warehouse solutions:

  • Amazon Redshift
  • Google BigQuery
  • Microsoft Azure Synapse Analytics
  • Snowflake

 

