Data Analysis
The data analyst serves as a gatekeeper for an organization's data so that
stakeholders can understand the data and use it to make strategic business
decisions. It is a technical role that typically requires an undergraduate or
master's degree in analytics, computer modeling, science, or math.
Data analysis is a process of inspecting, cleansing, transforming, and
modelling data with the goal of discovering useful information, informing
conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse
techniques under a variety of names, and is used in different business,
science, and social science domains. In today's business world, data analysis
plays a role in making decisions more scientific and helping businesses operate
more effectively.
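The four steps named above (inspecting, cleansing, transforming, and modelling) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the records, field names, and the choice of a least-squares line as the "model" are assumptions made for the example, not part of any particular workflow.

```python
from statistics import mean

# Hypothetical raw records: monthly ad spend and sales, with typical defects.
raw_rows = [
    {"month": "Jan", "ad_spend": "100", "sales": "1100"},
    {"month": "Feb", "ad_spend": "200", "sales": "2050"},
    {"month": "Mar", "ad_spend": "", "sales": "1500"},     # missing value
    {"month": "Apr", "ad_spend": "300", "sales": "3000"},
    {"month": "Apr", "ad_spend": "300", "sales": "3000"},  # duplicate row
]

# 1. Inspect: count rows that have at least one empty field.
missing = sum(1 for row in raw_rows if "" in row.values())

# 2. Cleanse: drop incomplete rows and exact duplicates.
seen = set()
clean_rows = []
for row in raw_rows:
    key = tuple(row.items())
    if "" in row.values() or key in seen:
        continue
    seen.add(key)
    clean_rows.append(row)

# 3. Transform: convert the text fields to numbers.
xs = [float(row["ad_spend"]) for row in clean_rows]
ys = [float(row["sales"]) for row in clean_rows]

# 4. Model: fit a least-squares line, sales = slope * ad_spend + intercept.
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

print(f"rows with missing values: {missing}")
print(f"rows after cleansing: {len(clean_rows)}")
print(f"fitted model: sales = {slope:.2f} * ad_spend + {intercept:.2f}")
```

In practice each step is usually done with a dedicated library, but the order of the steps stays the same.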
Top 11 technical and soft skills required to become a data analyst:
· Data Visualization
· Data Cleaning
· MATLAB
· R
· Python
· SQL and NoSQL
· Machine Learning
· Linear Algebra and Calculus
· Microsoft Excel
· Critical Thinking
· Communication
Data Mining:
Data mining is a particular data
analysis technique that focuses on statistical modelling and knowledge
discovery for predictive rather than purely descriptive purposes, while
business intelligence covers data analysis that relies heavily on aggregation,
focusing mainly on business information. In statistical applications, data
analysis can be divided into descriptive statistics, exploratory data analysis
(EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new
features in the data while CDA focuses on confirming or falsifying existing
hypotheses. Predictive analytics focuses on the application of statistical
models for predictive forecasting or classification, while text analytics
applies statistical, linguistic, and structural techniques to extract and
classify information from textual sources, a species of unstructured data. All
of the above are varieties of data analysis.
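The distinction between descriptive statistics, EDA, and CDA can be made concrete with a small sketch. The two samples below are invented for illustration, and the permutation test is just one possible confirmatory method chosen because it needs only the standard library; the helper name `permutation_p_value` is hypothetical.

```python
import random
from statistics import mean, median, stdev

# Hypothetical page-load times (seconds) for two versions of a site.
version_a = [2.1, 2.4, 2.2, 2.8, 2.5, 2.3, 2.6, 2.4]
version_b = [1.9, 2.0, 2.2, 1.8, 2.1, 2.0, 1.7, 2.1]

# Descriptive statistics: summarize each sample.
for name, sample in (("A", version_a), ("B", version_b)):
    print(
        f"{name}: mean={mean(sample):.2f} "
        f"median={median(sample):.2f} stdev={stdev(sample):.2f}"
    )

# EDA: look for a feature worth investigating, here the gap between the means.
observed_gap = mean(version_a) - mean(version_b)
print(f"observed gap in means: {observed_gap:.4f}")


def permutation_p_value(a, b, trials=5_000, seed=0):
    """CDA: test the hypothesis that both samples come from the same distribution.

    Returns the fraction of random relabelings whose gap in means is at
    least as large as the observed one (a two-sided permutation test).
    """
    rng = random.Random(seed)
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if gap >= observed:
            hits += 1
    return hits / trials


p_value = permutation_p_value(version_a, version_b)
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value counts as evidence against the "no difference" hypothesis; a large one means the data do not contradict it.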
Data Engineer vs Data Scientist
There is a significant overlap between data engineers and data scientists when
it comes to skills and responsibilities. The main difference is one of focus.
Data engineers are focused on building infrastructure and architecture for data
generation. In contrast, data scientists are focused on advanced mathematics
and statistical analysis of that generated data.
Data scientists interact constantly with the data infrastructure that is built and maintained by the data engineers, but they are not responsible for building and maintaining that infrastructure. Instead, they are internal clients, tasked with conducting high-level market and business operation research to identify trends and relations, work that requires them to use a variety of sophisticated machines and methods to interact with and act upon data.
Data engineers work to support data scientists and analysts, providing infrastructure and tools that can be used to deliver end-to-end solutions to business problems. Data engineers build scalable, high performance infrastructure for delivering clear business insights from raw data sources; implement complex analytical projects with a focus on collecting, managing, analyzing, and visualizing data; and develop batch & real-time analytical solutions.
Data scientists depend on data
engineers. Whereas data scientists tend to toil away in advanced analysis tools
such as R, SPSS, Hadoop, and advanced statistical modelling, data engineers are
focused on the products which support those tools. For example, a data engineer’s
arsenal may include SQL, MySQL, NoSQL, Cassandra, and other data organization
services.
Data Modeling:
Data modeling (data modelling) is the analysis of data objects and their relationships
to other data objects. Data modeling is often the first step in database design
and object-oriented programming as the designers first create a conceptual
model of how data items relate to each other.
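As a rough illustration of a conceptual model, the sketch below expresses a few related data objects as Python dataclasses. The entities (customer, order, order line, product) and their fields are hypothetical and chosen only to show one-to-many and many-to-one relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    sku: str
    name: str
    unit_price: float


@dataclass
class OrderLine:
    product: Product  # many order lines refer to one product
    quantity: int


@dataclass
class Order:
    order_id: int
    lines: list[OrderLine] = field(default_factory=list)  # one order, many lines

    def total(self) -> float:
        """Sum of unit price times quantity over all lines."""
        return sum(line.product.unit_price * line.quantity for line in self.lines)


@dataclass
class Customer:
    customer_id: int
    name: str
    orders: list[Order] = field(default_factory=list)  # one customer, many orders


# Usage: build a tiny object graph and walk the relationships.
widget = Product(sku="W-1", name="Widget", unit_price=2.50)
order = Order(order_id=1, lines=[OrderLine(product=widget, quantity=4)])
alice = Customer(customer_id=1, name="Alice", orders=[order])
print(alice.orders[0].total())
```

The same relationships would map onto tables and foreign keys in a database design; the classes here only capture the conceptual level.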
Data Modeling tools:
1) PowerDesigner
2) ER/Studio
3) Sparx Enterprise Architect
4) Oracle SQL Developer Data Modeler
5) CA ERwin
6) IBM InfoSphere Data Architect
Data Modeling Concept through Diagram:
[Figure: Data Modeling]
Data Warehouse:
A Data Warehouse (DW or DWH), also known as an enterprise data warehouse (EDW),
is a system used for reporting and data analysis, and is considered a core
component of business intelligence. DWs are central repositories of integrated
data from one or more disparate sources. They store current and historical data
in one single place, where it is used for creating analytical reports for
workers throughout the enterprise.
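The idea of a central repository fed by disparate sources can be sketched with Python's built-in `sqlite3` module standing in for a real warehouse. The two source systems, their record layouts, and the `sales` table schema are assumptions made for this example.

```python
import sqlite3

# Hypothetical source systems, each with its own record layout.
crm_records = [("alice", "2023-01-15", 120.0), ("bob", "2023-02-03", 80.0)]
webshop_records = [
    {"user": "alice", "day": "2024-03-09", "total": 45.5},
    {"user": "carol", "day": "2024-03-10", "total": 60.0},
]

# The stand-in "warehouse": one central table with a shared schema.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (source TEXT, customer TEXT, sale_date TEXT, amount REAL)"
)

# Load each source, mapping its fields onto the shared schema.
warehouse.executemany("INSERT INTO sales VALUES ('crm', ?, ?, ?)", crm_records)
warehouse.executemany(
    "INSERT INTO sales VALUES ('webshop', :user, :day, :total)", webshop_records
)

# An analytical report over historical and current data from both sources.
report = warehouse.execute(
    "SELECT customer, COUNT(*), SUM(amount) "
    "FROM sales GROUP BY customer ORDER BY customer"
).fetchall()

for customer, n_sales, revenue in report:
    print(customer, n_sales, revenue)
```

A production warehouse adds scheduling, history tracking, and far larger volumes, but the pattern of loading many sources into one queryable schema is the same.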
On-premise data warehouses
Using an on-prem solution naturally involves purchasing, installing, and maintaining
your own hardware for storing the contents of your data warehouse, in addition
to managing the data it stores.
List of common on-prem data warehouse solutions:
- IBM
- Oracle
- Teradata
Cloud-native data warehouses involve purchasing a solution hosted in the cloud and funnelling data to it, usually through an API or some other means. Because of the advantages cloud-native solutions provide, nearly all providers of traditionally on-prem solutions have a cloud offering. Cloud-based data warehouses are typically cost-effective and quick to set up, can scale with little extra effort, come with built-in security, and support multi-tenancy.
- Amazon Redshift
- Google BigQuery
- Microsoft Azure
- Snowflake