Summary notes
Objectives: Overview summary
Data Analysis - Complete Notes
Definition: Data analysis is the process of collecting, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making.
1. Types of Data
- Qualitative (Categorical): Descriptive data (e.g. gender, country)
- Quantitative (Numerical): Measurable data (e.g. income, age)
- Discrete (quantitative): Countable values (e.g. number of students)
- Continuous (quantitative): Can take any value within a range (e.g. weight)
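As a rough illustration of the categories above, the following sketch uses pandas dtypes to separate categorical columns from numerical ones. The DataFrame, its column names, and its values are invented for the example.

```python
import pandas as pd

# Hypothetical survey data; all values are invented for illustration.
survey = pd.DataFrame({
    "country": ["Kenya", "Brazil", "Japan"],  # qualitative (categorical)
    "num_children": [2, 0, 1],                # quantitative, discrete
    "weight_kg": [70.5, 82.3, 65.0],          # quantitative, continuous
})

# Split columns by whether pandas stores them as numbers.
numeric_cols = survey.select_dtypes(include="number").columns.tolist()
categorical_cols = survey.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", numeric_cols)
print("Categorical:", categorical_cols)
```

Note that the dtype is only a hint: an integer column is often discrete and a float column often continuous, but deciding which one applies still depends on what the variable measures.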
2. Steps in Data Analysis
- Define the problem/question
- Collect the data (surveys, databases, APIs)
- Clean the data (handle missing values and outliers)
- Analyze (using statistical tools or programming)
- Visualize the results
- Interpret and report
3. Tools for Data Analysis
- Excel: Good for quick analysis & pivot tables
- Python: Using Pandas, NumPy, Matplotlib, Seaborn
- R: For statistical analysis & visualizations
- SQL: Querying structured databases
- Power BI / Tableau: Data dashboards
4. Data Cleaning
Data cleaning ensures that data is accurate, complete, and usable.
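# Python (pandas); df is an existing DataFrame
df = df.dropna()    # Remove rows with missing values
df = df.fillna(0)   # Alternatively, replace missing values with 0
df['age'] = df['age'].astype(int)   # Convert to integer once no values are missing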
Example: In a survey dataset, replacing all missing gender values with "Unknown".
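The survey example above could look roughly like this in pandas. The DataFrame and its `respondent_id` and `gender` columns are assumptions made for the sketch, not part of any real dataset.

```python
import pandas as pd

# Hypothetical survey responses; None marks a missing answer.
survey = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4],
    "gender": ["Female", None, "Male", None],
})

# Replace missing gender values with the label "Unknown".
survey["gender"] = survey["gender"].fillna("Unknown")

print(survey["gender"].tolist())
```

Filling with an explicit label keeps every respondent in the dataset, whereas `dropna()` would discard the rows with missing answers.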
5. Data Visualization
Data visualization makes data easier to understand through charts, graphs, and plots.
- Bar Chart: Compare categories
- Pie Chart: Show proportions
- Histogram: Frequency distribution
- Scatter Plot: Relationship between variables
- Box Plot: Summary statistics and outliers
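A minimal Matplotlib sketch of four of the chart types listed above. All numbers and labels are invented, and the output file name `chart_examples.png` is an arbitrary choice for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Invented numbers, for illustration only.
categories = ["North", "South", "East"]
sales = [120, 95, 140]
ages = [22, 25, 25, 31, 34, 35, 35, 36, 41, 58]
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 78]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].bar(categories, sales)
axes[0, 0].set_title("Bar chart: compare categories")

axes[0, 1].hist(ages, bins=5)
axes[0, 1].set_title("Histogram: frequency distribution")

axes[1, 0].scatter(hours, scores)
axes[1, 0].set_title("Scatter plot: relationship")

axes[1, 1].boxplot(ages)
axes[1, 1].set_title("Box plot: spread and outliers")

fig.tight_layout()
fig.savefig("chart_examples.png")  # write the 2x2 grid to an image file
```

To view the figure in a window instead of saving it, remove the `matplotlib.use("Agg")` line and call `plt.show()`.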
6. Descriptive Statistics
- Mean: Average
- Median: Middle value
- Mode: Most frequent value
- Standard Deviation: Spread of data around the mean
- Variance: Square of the standard deviation
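These measures can be computed with Python's built-in `statistics` module. The sketch below uses a small invented sample and the population versions of standard deviation and variance.

```python
import statistics

# Invented sample of eight values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)           # average
median = statistics.median(data)       # middle value of the sorted data
mode = statistics.mode(data)           # most frequent value
std_dev = statistics.pstdev(data)      # population standard deviation
variance = statistics.pvariance(data)  # population variance (std_dev squared)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"std_dev={std_dev}, variance={variance}")
```

When the data is a sample drawn from a larger population, `statistics.stdev` and `statistics.variance` (which divide by n - 1) are usually the more appropriate choice.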
7. Sample Python Data Analysis Code
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('sales.csv')     # Load the dataset
print(df.describe())              # Summary statistics for numeric columns
df['revenue'].plot(kind='hist')   # Histogram of the revenue column
plt.show()
8. Interpretation
After visualizing and analyzing, interpret what the patterns or anomalies mean in context.
Example: Sales drop every February; this may indicate seasonality or poor marketing.
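One way to check for a pattern like the February drop is to average revenue by calendar month across several years. The sketch below uses invented figures in which February is deliberately low; the `month` and `revenue` column names are assumptions.

```python
import pandas as pd

# Invented monthly sales for two years; February is deliberately low.
sales = pd.DataFrame({
    "month": list(range(1, 13)) * 2,
    "revenue": [100, 60, 105, 110, 108, 112, 115, 111, 109, 113, 118, 125,
                104, 62, 110, 113, 111, 116, 119, 114, 112, 117, 121, 130],
})

# Average revenue per calendar month across both years.
monthly_avg = sales.groupby("month")["revenue"].mean()
lowest_month = int(monthly_avg.idxmin())

print(monthly_avg.round(1))
print("Lowest average month:", lowest_month)
```

A dip that repeats in the same month every year points toward seasonality, while a dip in a single year is more likely a one-off cause such as a weak campaign. The numbers alone do not settle which explanation is correct.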
9. Real-World Applications
- Business intelligence & decision making
- Healthcare (patient outcome analysis)
- Finance (fraud detection, credit scoring)
- Education (student performance tracking)
- Retail (sales forecasting, market basket analysis)
10. Practice Tasks
- Analyze a student scores dataset: find the average and highest scores, and visualize performance
- Clean a CSV file with missing and duplicate values
- Use SQL to select the top 5 products by sales
- Create a bar chart of monthly expenses in Excel or Python
Sample Exam Questions
- Define data analysis and describe its key steps.
- What are the types of data? Give an example of each.
- Explain the importance of data cleaning.
- Compare mean, median, and mode with examples.
- List three tools used for data analysis and their use cases.