Chapter 13: Introduction to Data Analysis with Python

Get a beginner's introduction to data analysis using Python, focusing on libraries like Pandas and NumPy for data manipulation.

In this chapter, we'll explore the basics of data analysis using Python's powerful libraries, Pandas and NumPy. These libraries provide essential tools for data manipulation and analysis, allowing you to handle and transform data with ease.

Introduction to NumPy

NumPy is a library for numerical computing in Python, providing support for arrays, matrices, and mathematical functions. Install it by running pip install numpy in a terminal, then import it under the conventional alias np:

import numpy as np

# Create a NumPy array
arr = np.array([1, 2, 3, 4])
print(arr)

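In this example, we create a NumPy array. Because an array stores elements of a single type and applies operations to all of them at once, it is usually more efficient than a regular Python list for numerical operations.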
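To make the comparison with lists concrete, the short sketch below doubles every value, first with a plain list and then with an array. The variable names are illustrative and not part of the chapter's running example.

```python
import numpy as np

values = [1, 2, 3, 4]

# With a plain list, element-wise math needs an explicit loop or comprehension
doubled_list = [v * 2 for v in values]

# With a NumPy array, the same operation applies to every element at once
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)  # [2, 4, 6, 8]
print(doubled_arr)   # [2 4 6 8]
print(arr.dtype)     # one integer type shared by all elements (exact type is platform-dependent)
```

Note that writing values * 2 on the list itself would not double anything; it would repeat the list.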

Basic Operations with NumPy

NumPy provides numerous functions for performing operations on arrays, such as mathematical calculations, reshaping, and aggregations:

arr = np.array([1, 2, 3, 4])
print(arr + 5)        # Adds 5 to each element
print(np.mean(arr))   # Calculates the mean
print(np.sum(arr))    # Calculates the sum

Because these operations run over the whole array without an explicit Python loop, they stay efficient even on large datasets.
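Reshaping is mentioned above but not shown, so here is a minimal sketch of it, together with aggregations along one axis. The array contents and the name grid are chosen only for illustration.

```python
import numpy as np

arr = np.arange(1, 7)       # the values 1 through 6
grid = arr.reshape(2, 3)    # rearrange them into 2 rows and 3 columns

print(grid)
print(grid.sum(axis=0))     # column sums: [5 7 9]
print(grid.sum(axis=1))     # row sums: [ 6 15]
print(grid.mean())          # mean of all elements: 3.5
```

The axis argument picks the direction of the aggregation: axis=0 collapses the rows and leaves one result per column, while axis=1 leaves one result per row.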

Introduction to Pandas

Pandas is a library for data manipulation and analysis, providing data structures like DataFrames for handling tabular data. Install it by running pip install pandas in a terminal, then import it under the conventional alias pd:

import pandas as pd

# Create a DataFrame
data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [24, 27, 22]}
df = pd.DataFrame(data)
print(df)

In this example, we create a DataFrame, which organizes data in rows and columns, making it easy to work with tabular data.
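As a small sketch of what "rows and columns" means in practice, the snippet below rebuilds the same DataFrame and inspects its shape, one column, and one row. The names ages and first_row are illustrative.

```python
import pandas as pd

data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [24, 27, 22]}
df = pd.DataFrame(data)

print(df.shape)            # (3, 2): three rows, two columns
print(list(df.columns))    # ['Name', 'Age']

ages = df["Age"]           # selecting one column returns a Series
first_row = df.iloc[0]     # selecting one row by position

print(ages.tolist())       # [24, 27, 22]
print(first_row["Name"])   # Alice
```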

Reading and Writing Data with Pandas

Pandas provides functions to read from and write to various data formats, including CSV files:

# Reading data from a CSV file (assumes data.csv exists in the working directory)
df = pd.read_csv("data.csv")

# Writing data to a CSV file
df.to_csv("output.csv", index=False)

These functions make it easy to import and export data for analysis.
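If you do not have a CSV file at hand, a round trip through a temporary directory is one way to try both functions. This is a sketch under that assumption; the file name people.csv is hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [24, 27, 22]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.csv")  # hypothetical file name
    df.to_csv(path, index=False)   # index=False keeps the row labels out of the file
    loaded = pd.read_csv(path)     # read the file back while it still exists

print(loaded)
```

Without index=False, the row labels would be written as an extra unnamed column and would reappear as a column when the file is read back.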

Data Manipulation with Pandas

Pandas provides various methods for data manipulation, such as filtering, sorting, and aggregating data:

# Filtering data
filtered_df = df[df["Age"] > 25]

# Sorting data
sorted_df = df.sort_values("Age")

# Aggregating data
average_age = df["Age"].mean()

These operations allow you to transform data into a format suitable for analysis.
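The sketch below applies the same three kinds of operation to the small DataFrame from earlier, so the results can be checked by eye. The combined condition and the descending sort are additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, 27, 22],
})

# Filtering: keep the rows where the condition is True
over_25 = df[df["Age"] > 25]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
between = df[(df["Age"] > 22) & (df["Age"] < 27)]

# Sorting: ascending=False puts the largest values first
oldest_first = df.sort_values("Age", ascending=False)

print(over_25["Name"].tolist())        # ['Bob']
print(between["Name"].tolist())        # ['Alice']
print(oldest_first["Name"].tolist())   # ['Bob', 'Alice', 'Charlie']
print(df["Age"].min(), df["Age"].max())  # 22 27
```

Each of these operations returns a new object and leaves df unchanged.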

Analyzing Data with Pandas and NumPy

You can combine Pandas and NumPy to perform more advanced data analysis, such as calculating statistical measures:

# Calculate the mean age
mean_age = np.mean(df["Age"])

# Calculate the age range
age_range = np.ptp(df["Age"].to_numpy())  # Peak-to-peak: maximum minus minimum

These tools let you compute summary statistics from your data, which can then be visualized or analyzed further.
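As a further sketch in the same spirit, the snippet below converts a column to a NumPy array, computes a few more statistics, and writes a derived column back into the DataFrame. The column name AgeZ and the choice of z-scores are illustrative assumptions, not part of the chapter's example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [24, 27, 22]})

ages = df["Age"].to_numpy()    # plain NumPy array of the column's values

median_age = np.median(ages)   # middle value: 24.0
std_age = np.std(ages)         # population standard deviation (ddof=0)

# Standardize: how many standard deviations each age lies from the mean
z_scores = (ages - ages.mean()) / std_age
df["AgeZ"] = z_scores          # hypothetical column name

print(median_age)
print(df)
```

One detail to keep in mind: np.std divides by n by default, while the Pandas method df["Age"].std() divides by n - 1, so the two give different numbers unless you pass ddof explicitly.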

Summary and Next Steps

In this chapter, we introduced data analysis with Python, focusing on Pandas and NumPy for data manipulation. These libraries provide essential tools for processing and analyzing data efficiently. In the next chapter, we'll build a simple CLI application to demonstrate data analysis in action.