Wednesday, 30 August 2023

The Difference Between a Program Manager and a Project Manager

A program manager manages multiple projects, and sometimes multiple programs, while a project manager manages the teams responsible for delivering the project and achieving its deliverables.

Generally speaking, a program manager has broader responsibilities than a project manager. The tools they use reflect this: they focus on the macro for the program manager and on the micro for the project manager.

Project Manager Responsibilities:

·        Managing the project, including its scope, schedule, and resources

·        Assembling the project team and managing its performance

·        Delivering successful project outcomes (on time and within budget)

Program Manager Responsibilities:

·        Overseeing multiple projects

·        Managing multiple project teams (and sometimes project managers)

·        Delivering successful program outcomes

The main difference between a business analyst (BA) and a systems analyst is that the BA is business-specific, focusing on the broader context of business change and systems development for the organization. The systems analyst, on the other hand, focuses on system-specific requirements.

Business Systems Analyst:

A business systems analyst determines operational objectives by studying business functions, gathering information, and evaluating output requirements and formats, and prepares technical reports by collecting, analyzing, and summarizing information and trends.

Business Analyst:

A business analyst is involved in the design or modification of business systems or IT systems. The analyst interacts with business stakeholders and subject matter experts to understand their problems and needs, and gathers, documents, and analyzes business needs and requirements.

Database (DBMS)

A database is an organized collection of structured information, or data, typically stored electronically in a computer system. A database is usually controlled by a database management system (DBMS), which lets the data be easily accessed, managed, modified, updated, controlled, and organized.
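As a minimal illustration of how a DBMS mediates access to a database, here is a short sketch using Python's built-in sqlite3 module; the table and data are hypothetical:

    import sqlite3

    # Open (or create) a database file; SQLite is a lightweight embedded DBMS.
    conn = sqlite3.connect("example.db")
    cur = conn.cursor()

    # Define the structure once...
    cur.execute("CREATE TABLE IF NOT EXISTS customers ("
                "id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

    # ...then access, modify, and organize the data through the DBMS.
    cur.execute("INSERT INTO customers (name, city) VALUES (?, ?)",
                ("Alice", "Chennai"))
    conn.commit()

    for row in cur.execute("SELECT id, name, city FROM customers"):
        print(row)

    conn.close()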

Types of Databases:

1.     Structured database (RDBMS)

2.     Unstructured database (non-RDBMS)

Structured Database:

The term structured data refers to data that resides in fixed fields within a file or record. Structured data is typically stored in a relational database (RDBMS). Typical examples include Excel and Google Sheets files, SQL tables, customer data, phone records, and transaction history.

Some RDBMS Tools:

·        Microsoft SQL Server

·        Oracle Database

·        MySQL

·        IBM Db2

·        Amazon Relational Database Service (RDS)

·        PostgreSQL

·        SAP HANA

·        Amazon Aurora

·        IBM Informix

·        MariaDB

·        SQLite

·        Teradata Vantage

·        Azure SQL Database

·        Oracle Database Express Edition (XE)

·        InterSystems IRIS

·        SAP HEC (HANA Enterprise Cloud)

·        SAP SQL Anywhere

·        Firebird

·        Percona Server

Unstructured Database:

Unstructured data is data that has not been organized into a formatted repository and does not conform to a predefined data model, which makes its elements harder to address for processing and analysis. Typical examples include text data, social media comments, phone call transcriptions, various log files, images, audio, and video.
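To make the contrast concrete, here is a small sketch: the same kind of fact stored as a structured record with fixed fields versus an unstructured free-text note that must be parsed before analysis. The field names and text are illustrative.

    import json

    # Structured: fixed, typed fields that map directly onto an RDBMS row.
    structured_record = {"customer_id": 42, "amount": 19.99, "date": "2023-08-30"}

    # Unstructured: free text with no fixed fields; tools must extract
    # meaning (names, amounts, sentiment) before it can be analyzed.
    unstructured_note = "Spoke to the customer today; they were happy to pay about twenty dollars."

    # Document stores (e.g. MongoDB) sit in between, holding semi-structured JSON.
    print(json.dumps(structured_record))
    print(len(unstructured_note.split()), "words of free text")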

Some Non-RDBMS Tools:

·        Couchbase

·        IBM Cloud Databases

·        MongoDB

·        IBM Cloudant

·        Amazon DynamoDB

·        Cassandra

·        HBase

·        Neo4j

·        InterSystems Caché

·        Amazon ElastiCache

·        Oracle Berkeley DB

·        Google Cloud Datastore

·        Matisse

·        Amazon Neptune

·        TIBCO Graph Database

·        BigTable (Google)


Data Analysis, Data Warehousing, and Data Modeling

Data Analysis

The data analyst serves as a gatekeeper for an organization's data, so stakeholders can understand that data and use it to make strategic business decisions. It is a technical role that typically requires an undergraduate or master's degree in analytics, computer modeling, science, or math.

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.
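As a minimal sketch of that inspect/cleanse/transform cycle using pandas (the data and column names are made up):

    import pandas as pd

    # Hypothetical sales data with a missing value to clean up.
    df = pd.DataFrame({
        "region": ["North", "South", "North", "West"],
        "units":  [12, 7, None, 20],
        "price":  [9.99, 9.99, 9.99, 12.50],
    })

    # Inspect: summary statistics reveal shape, ranges, and gaps.
    print(df.describe())

    # Cleanse: drop incomplete rows (one of several possible strategies).
    clean = df.dropna()

    # Transform and model: derive revenue and aggregate it by region
    # to support a decision such as where to focus sales effort.
    clean = clean.assign(revenue=clean["units"] * clean["price"])
    print(clean.groupby("region")["revenue"].sum())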

 

Top 11 technical and soft skills required to become a data analyst:

·        Data Visualization

·        Data Cleaning

·        MATLAB

·        R

·        Python

·        SQL and NoSQL

·        Machine Learning

·        Linear Algebra and Calculus

·        Microsoft Excel

·        Critical Thinking

·        Communication

 

Data Mining:

Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information.

In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data, while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a species of unstructured data. All of the above are varieties of data analysis.
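A small sketch of the EDA/CDA distinction, with made-up measurements: EDA looks at the data without a fixed hypothesis, while CDA tests a specific one.

    import pandas as pd
    from scipy import stats

    # Hypothetical measurements from two groups (values are illustrative).
    group_a = pd.Series([4.1, 3.9, 4.3, 4.0, 4.2])
    group_b = pd.Series([4.6, 4.4, 4.8, 4.5, 4.7])

    # EDA: explore first; summary statistics may suggest a hypothesis.
    print(group_a.describe())
    print(group_b.describe())

    # CDA: confirm or falsify a specific hypothesis (here, equal means).
    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")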

Data Engineer vs Data Scientist

There is significant overlap between data engineers and data scientists when it comes to skills and responsibilities. The main difference is one of focus. Data engineers are focused on building infrastructure and architecture for data generation. In contrast, data scientists are focused on advanced mathematics and statistical analysis of that generated data.

Data Scientists are engaged in a constant interaction with the data infrastructure that is built and maintained by the data engineers, but they are not responsible for building and maintaining that infrastructure. Instead, they are internal clients, tasked with conducting high-level market and business operation research to identify trends and relations—things that require them to use a variety of sophisticated machines and methods to interact with and act upon data.

Data engineers work to support data scientists and analysts, providing infrastructure and tools that can be used to deliver end-to-end solutions to business problems.  Data engineers build scalable, high performance infrastructure for delivering clear business insights from raw data sources; implement complex analytical projects with a focus on collecting, managing, analyzing, and visualizing data; and develop batch & real-time analytical solutions.

Data scientists depend on data engineers. Whereas data scientists tend to toil away in advanced analysis tools such as R, SPSS, Hadoop, and advanced statistical modelling, data engineers are focused on the products which support those tools. For example, a data engineer’s arsenal may include SQL, MySQL, NoSQL, Cassandra, and other data organization services.

Data Modeling:

Data modeling (data modelling) is the analysis of data objects and their relationships to other data objects. Data modeling is often the first step in database design and object-oriented programming as the designers first create a conceptual model of how data items relate to each other.
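For instance, a conceptual model with one relationship ("a customer places many orders") translates into tables and a foreign key. A minimal sketch with SQLite; the entities are hypothetical:

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # One customer places many orders: the foreign key in "orders"
    # records the relationship between the two data objects.
    conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        order_date  TEXT,
        amount      REAL
    );
    """)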

Data Modeling tools:

1)     PowerDesigner

2)     ER/Studio

3)     Sparx Enterprise Architect

4)     Oracle SQL Developer Data Modeler

5)     CA ERwin

6)     IBM InfoSphere Data Architect

Data Modeling Concept through a Diagram:

[Figure: data modeling diagram]


Data Warehouse:

A data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place, where it is used to create analytical reports for workers throughout the enterprise.
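A common warehouse design is the star schema: a central fact table of measurements surrounded by descriptive dimension tables. A minimal sketch, again with SQLite and hypothetical tables:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,
        full_date TEXT, year INTEGER, month INTEGER
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        name TEXT, category TEXT
    );
    -- Fact table: one row per sale, keyed to the dimensions above.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        units INTEGER, revenue REAL
    );
    """)

    # Reports then aggregate facts by dimension attributes, e.g.:
    # SELECT p.category, d.year, SUM(f.revenue) FROM fact_sales f
    # JOIN dim_product p USING (product_key) JOIN dim_date d USING (date_key)
    # GROUP BY p.category, d.year;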

On-premise data warehouses

Using an on-prem solution naturally involves purchasing, installing, and maintaining your own hardware for storing the contents of your data warehouse, in addition to managing the data it stores.

List of common on-prem data warehouse solutions:

  • IBM
  • Oracle
  • Teradata

Cloud-native data warehouses

Cloud-native data warehouses involve purchasing a solution hosted in the cloud and funneling data to it, usually through an API or some other means. Because of the advantages cloud-native solutions provide, nearly all providers of traditionally on-prem solutions now have a cloud offering. Cloud-based data warehouses are cost-effective, quick and easy to set up, scale without extra effort, have security built in, and support multi-tenancy.

List of common cloud-native data warehouse solutions:
  • Amazon Redshift
  • Google BigQuery
  • Microsoft Azure
  • Snowflake

 


ETL, ETL Tools, Business Intelligence, and Business Intelligence Tools

ETL:

Extract, Transform, Load (ETL) is the general procedure of copying data from one or more sources into a destination system that represents the data differently from the source(s). The term comes from the three basic steps needed: extracting (selecting and exporting) data from the source, transforming the way the data is represented into the form expected by the destination, and loading (writing or importing) the transformed data into the destination system.
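A minimal sketch of the three steps in Python, extracting from a CSV file and loading into SQLite; the file name and column names are hypothetical:

    import csv
    import sqlite3

    # Extract: select rows from the source ("sales.csv" is hypothetical).
    with open("sales.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    # Transform: reshape each row into the form the destination expects.
    records = [
        (r["order_id"], r["customer"].strip().title(), float(r["amount"]))
        for r in rows
        if r["amount"]  # skip rows with a missing amount
    ]

    # Load: import the transformed records into the destination system.
    conn = sqlite3.connect("warehouse.db")
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id TEXT, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()
    conn.close()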

ETL Tools:

List of common batch run/incumbent ETL tools:

  • IBM InfoSphere DataStage
  • Informatica PowerCenter
  • Microsoft SSIS (SQL Server Integration Services)
  • Oracle Data Integrator
  • Oracle Warehouse Builder (OWB)
  • SAP Data Services
  • SAS Data Management
  • Elixir Repertoire for Data ETL
  • Data Migrator (IBI)
  • Talend Studio for Data Integration
  • Sagent Data Flow
  • Actian DataConnect
  • Open Text Integration Center
  • Cognos Data Manager
  • CloverETL
  • Centerprise Data Integrator
  • IBM Infosphere Warehouse Edition
  • Pentaho Data Integration
  • Adeptia Integration Server
  • Syncsort DMX
  • QlikView Expressor
  • Relational Junction ETL Manager (Sesame Software)

Open source ETL tools

These solutions are the evolutionary middle step between incumbent batch-based tools and fully managed cloud-based solutions. They solve some of the problems that batch run tools do not, for example, handling real-time streaming data.

List of common open source ETL tools:

  • Apache Kafka
  • Apache NiFi
  • CloverETL
  • Jaspersoft
  • Pentaho Kettle
  • Talend Open Studio

Cloud-native ETL tools

Today's ETL tools are cloud-based and run in real time. Cloud-based means your ETL solution is managed and you need not worry about hardware costs, scaling, replication, or security, because these are usually built-in.

List of common cloud-native ETL tools:

  • Alooma
  • Fivetran
  • Matillion
  • Snaplogic
  • Stitch Data

Real-time ETL tools

The demand for real-time support has moved the model from batch processing to one based on message queues and streams. Kafka has become the leading distributed message queue, and companies like Alooma have built SaaS or on-prem ETL solutions atop it.

Batch processing of ETL work makes little sense if your data, or the insights from it, are needed instantly. Many applications work this way today: a tweet or social media update goes live immediately, not tomorrow.
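As a sketch of the streaming model, here is a consumer that processes events as they arrive, using the kafka-python client; the topic name, broker address, and message format are assumptions:

    import json
    from kafka import KafkaConsumer  # kafka-python client

    # Subscribe to a stream instead of waiting for a nightly batch job.
    consumer = KafkaConsumer(
        "events",                            # hypothetical topic name
        bootstrap_servers="localhost:9092",  # hypothetical broker address
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )

    # Each message is transformed and loaded the moment it arrives.
    for message in consumer:
        event = message.value
        print(event)  # in practice: clean, enrich, and write to the warehouse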

Here's a list of common real-time ETL tools:

  • Alooma
  • Confluent
  • StreamSets
  • Striim

 

BI tools

BI and analytics tools cover everything you do with the data to get insights once you've captured it, including visualization, data science analysis, analytics, and KPIs. A small visualization sketch follows the list below.

List of common BI and analytics tools:

  • SAP Business Intelligence
  • MicroStrategy
  • Dundas BI
  • Yellowfin BI
  • TIBCO Spotfire
  • Hevo Data
  • Microsoft Power BI
  • Looker
  • Clear Analytics
  • Tableau
  • Oracle BI
  • Domo
  • QlikView        
  • Pentaho
  • TIBCO Jaspersoft
  • BIRT
  • IBM Cognos Analytics
  • Style Intelligence
  • Netlink
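Underneath all of these tools is the same idea: turn aggregated numbers into a visual that supports a decision. A minimal sketch with matplotlib and made-up KPI figures:

    import matplotlib.pyplot as plt

    # Hypothetical monthly revenue KPI (figures are illustrative).
    months = ["Jan", "Feb", "Mar", "Apr"]
    revenue = [120, 135, 128, 150]

    plt.bar(months, revenue)
    plt.title("Monthly Revenue (illustrative)")
    plt.ylabel("Revenue (thousands)")
    plt.show()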

Big Data and Hadoop

Big data:

Big data refers to data sets that are too large or complex for traditional data-processing application software to deal with adequately. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data sourcing. Big data was originally associated with three key concepts: volume, variety, and velocity. A fourth concept later attributed to big data is veracity.

Tools of Big Data:

  • Apache Hadoop
  • Apache Spark
  • Apache Storm
  • Apache Cassandra
  • MongoDB
  • R programming
  • Neo4j
  • Apache SAMOA

Hadoop:

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clusters built from commodity hardware—still the common use—it has also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

The base Apache Hadoop framework is composed of the following modules:

Hadoop Common – contains libraries and utilities needed by other Hadoop modules;

Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;

Hadoop YARN – introduced in 2012, a platform responsible for managing computing resources in clusters and using them to schedule users' applications;

Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale data processing.
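The MapReduce model can be sketched with Hadoop Streaming, which runs ordinary scripts as the map and reduce steps; how you invoke it depends on your installation's streaming jar. A word-count example in Python:

    # mapper.py: emit a (word, 1) pair for every word read from stdin.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py: input arrives sorted by key, so counts for each word
    # are adjacent and can be summed in a single pass.
    import sys

    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")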

Scala:

Scala is a strongly, statically typed general-purpose programming language that supports both object-oriented programming and functional programming. Designed to be concise, many of Scala's design decisions are aimed at addressing criticisms of Java.

Apache Spark:

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
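A minimal PySpark sketch of a distributed word count; Spark parallelizes the work across the cluster implicitly, and "input.txt" is a hypothetical path:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()

    # Read a text file; each element of the RDD below is one line.
    lines = spark.read.text("input.txt")
    words = lines.rdd.flatMap(lambda row: row.value.split())

    # Spark distributes the map and the shuffle for reduceByKey automatically.
    counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

    for word, n in counts.take(10):
        print(word, n)

    spark.stop()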
