Hands-on 1: Environment Setup for Data Analysis

Introduction

Welcome to the first hands-on exercise! In this notebook, we'll guide you through setting up your Python development environment for data analysis. By the end of this tutorial, you'll have a fully functional setup ready for the course.

Prerequisites

During this TA session, we'll help you set up your Python development environment. Don't worry if you run into any issues; that's what we're here for! This tutorial covers all major operating systems (Windows, macOS, and Linux), so just follow the instructions for your system.

Note: If you encounter any problems during the installation process, raise your hand and a TA will come help you.

Part 1: Installing Visual Studio Code

Visual Studio Code (VS Code) is a powerful, lightweight code editor that we'll use throughout the course.

  1. Visit the VS Code download page
  2. Download the appropriate version for your operating system:
    • Windows: .exe installer
    • macOS: .zip archive (Universal, Intel, or Apple Silicon build)
    • Linux: .deb/.rpm package or snap store installation

Installation instructions by operating system:

Windows

  1. Run the downloaded .exe file
  2. Follow the installation wizard
  3. Make sure to check "Add to PATH" during installation

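macOS

  1. Open the downloaded file (VS Code for macOS normally ships as a .zip archive; double-click it to extract Visual Studio Code.app)
  2. Drag Visual Studio Code.app to the Applications folder
  3. Launch VS Code from Applications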

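Linux

The code package is not in the default distribution repositories, so install the package file you downloaded. Run these commands from the folder containing the download, replacing <file> with the package's file name.

Ubuntu/Debian:

sudo apt update
sudo apt install ./<file>.deb

Fedora/RHEL:

sudo dnf install ./<file>.rpm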

Part 2: Setting Up Package Management with Mamba

We'll use Mamba for managing our Python environment. It's a faster alternative to Conda that helps manage Python packages and dependencies.

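Installing Mamba

  1. First, download the Miniforge installer for your operating system from the Miniforge releases page on GitHub (conda-forge/miniforge). Miniforge includes both conda and mamba.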

Installation commands by OS:

Windows

  • Run the downloaded .exe file
  • Follow the installation wizard
  • Important: Select "Add to PATH" when prompted

macOS/Linux

bash ~/Downloads/Miniforge3-$(uname)-$(uname -m).sh

After installation, open a new terminal so the PATH changes take effect, then verify that mamba is installed:

mamba --version
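If you prefer to check from Python, the sketch below looks up the command-line tools on your PATH using only the standard library. The helper name find_tools and the list of tool names are illustrative assumptions, not part of the course material; a missing entry usually means the installer did not add the tool to PATH or the terminal has not been restarted.

```python
import shutil


def find_tools(names=("mamba", "conda", "code")):
    """Return a mapping of tool name -> full path, or None if not on PATH.

    find_tools is a hypothetical helper written for this tutorial.
    """
    return {name: shutil.which(name) for name in names}


# Print one line per tool so missing entries are easy to spot
for name, path in find_tools().items():
    print(f"{name:>6}: {path or 'NOT FOUND on PATH'}")
```

A path printed for mamba means the terminal can find it; the mamba --version command above remains the definitive check.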

Part 3: Creating Our Python Environment

Now let's create a dedicated environment for data analysis:

# Create a new environment named 'dataanalysis' with Python 3.10
mamba create -n dataanalysis python=3.10

# Activate the environment (the same command works on Windows, macOS, and Linux)
mamba activate dataanalysis
# If activation fails in a fresh shell, try: conda activate dataanalysis
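Once the environment is activated, a quick way to confirm that you are really running its interpreter is to start python and inspect a few values. This is a minimal sketch: describe_environment is a hypothetical helper name, and it assumes that activation sets the CONDA_DEFAULT_ENV environment variable, which may be absent if Python was started some other way.

```python
import os
import sys


def describe_environment():
    """Collect basic facts about the running interpreter (hypothetical helper)."""
    return {
        "python_version": sys.version.split()[0],
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Assumption: set by conda/mamba activation; None if no environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If everything is set up as described above, the Python version should start with 3.10 and the executable path should contain dataanalysis.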

Part 4: Installing Required Packages

With the dataanalysis environment activated, install the core packages we'll need:

mamba install -c conda-forge jupyter numpy pandas matplotlib seaborn scikit-learn

Install additional packages with pip (also inside the activated environment):

pip install plotly
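To see at a glance which of these packages actually landed in the environment, you could run a short check like the one below. It is a sketch using the standard library's importlib.metadata; installed_versions is a hypothetical helper, and ipykernel stands in for jupyter here on the assumption that it is the component notebooks in VS Code need.

```python
from importlib import metadata

# Distribution names to check; ipykernel is used as a stand-in for jupyter (assumption)
PACKAGES = [
    "ipykernel", "numpy", "pandas", "matplotlib",
    "seaborn", "scikit-learn", "plotly",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:<14} {version or 'MISSING'}")
```

Any package reported as MISSING can be installed again on its own with mamba install -c conda-forge followed by the package name.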

Part 5: Setting Up VS Code Extensions

Install these essential VS Code extensions:

  1. Python (Microsoft)
  2. Jupyter (Microsoft)
  3. GitHub Copilot (if you have access)

To install extensions:

  1. Click the Extensions icon in the left sidebar (or press Ctrl+Shift+X, Cmd+Shift+X on macOS)
  2. Search for each extension
  3. Click "Install"

Part 6: Verifying Your Setup

Let's verify everything is working correctly. Create a new notebook in VS Code:

  1. Press Ctrl+Shift+P (Cmd+Shift+P on macOS) to open the Command Palette
  2. Type "Create: New Jupyter Notebook" and press Enter
  3. Click "Select Kernel" in the top-right corner and choose the 'dataanalysis' environment

Test your setup with this code:

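import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Confirm the core libraries import and report their versions
print("NumPy version:", np.__version__)
print("Pandas version:", pd.__version__)
print("Matplotlib version:", matplotlib.__version__)
print("Seaborn version:", sns.__version__)

# Create a simple plot (set the style first so it applies to the new figure)
sns.set_style("whitegrid")
plt.figure(figsize=(8, 6))
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.title("Test Plot")
plt.show()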

Troubleshooting

Common issues and solutions:

  1. Environment not found in VS Code

    • Restart VS Code
    • Ensure Mamba is in your PATH
    • With the environment activated, run: python -m ipykernel install --user --name dataanalysis

  2. Package installation fails

    • Check your internet connection
    • Try installing packages one by one
    • Use pip install as a fallback

  3. Matplotlib plots not showing

    • Restart the kernel
    • Run %matplotlib inline at the start of your notebook
Next Steps

Congratulations! You now have a fully configured Python environment for data analysis. In the next hands-on session, we'll start exploring data manipulation with Python.

Additional Resources

  • VS Code Documentation
  • Jupyter Notebook Documentation
  • Mamba Documentation

Remember to:

  • Keep your packages updated using mamba update --all
  • Create different environments for different projects
  • Save your environment configuration using mamba env export > environment.yml