Lesson 1 of 4

Python Basics for Data Science

Learn essential Python concepts needed for data science including data types, control flow, functions, and working with libraries.

25 minutes

Python is one of the most widely used languages for data science, thanks to its readable syntax and mature ecosystem of libraries.

Why Python for Data Science?

  • Easy to Learn: Clean syntax that reads like English
  • Rich Ecosystem: NumPy, Pandas, Matplotlib, Scikit-learn
  • Community Support: Massive community and resources
  • Industry Standard: Widely used in industry and research for data analysis

Essential Data Types

Python provides built-in data types perfect for data work:

  • Lists: Ordered, mutable collections [1, 2, 3]
  • Tuples: Ordered, immutable sequences (1, 2, 3)
  • Dictionaries: Key-value pairs {"name": "Alice"}
  • Sets: Unordered collections of unique elements {1, 2, 3}
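
As a rough illustration of how these four types behave, here is a minimal sketch. The variable names and values are made up for this example and do not come from the lesson.

```python
# Illustrative values only; none of these names come from the lesson.
scores = [88, 92, 79]            # list: ordered and mutable
scores.append(95)                # lists can grow in place

point = (3.5, 7.2)               # tuple: fixed once created
x, y = point                     # tuples unpack into variables

person = {"name": "Alice"}       # dictionary: look up values by key
person["age"] = 20               # add a new key-value pair

tags = {"python", "data", "python"}  # set: duplicates are dropped

print(scores)        # [88, 92, 79, 95]
print(x, y)          # 3.5 7.2
print(person)        # {'name': 'Alice', 'age': 20}
print(len(tags))     # 2
```

Trying to change a tuple in place (for example point[0] = 1.0) raises a TypeError, which is the practical meaning of "immutable".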

Working with Numbers

Python mixes integers and floats freely in arithmetic; combining the two produces a float:

age = 25  # Integer
temperature = 98.6  # Float
result = age * 2  # Arithmetic operations
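
A few more operators are worth knowing when working with numbers. The sketch below reuses the age and temperature values from above; the choice of operations shown is ours, not the lesson's.

```python
age = 25
temperature = 98.6

print(age / 2)       # 12.5  (true division always returns a float)
print(age // 2)      # 12    (floor division)
print(age % 2)       # 1     (remainder)
print(age ** 2)      # 625   (exponentiation)
print(type(age + temperature))  # <class 'float'>
print(round(temperature))       # 99
```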

Control Flow

Make decisions and repeat operations:

  • if/elif/else: Conditional logic
  • for loops: Iterate over sequences
  • while loops: Repeat while a condition is true
  • List comprehensions: Concise list creation
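
The four constructs listed above can be sketched together in one short example. The readings list, the label function, and the temperature thresholds are hypothetical and chosen only for illustration.

```python
readings = [72, 75, 68, 71]   # hypothetical sample data

# if/elif/else: pick a label for one value
def label(temp):
    if temp >= 75:
        return "warm"
    elif temp >= 70:
        return "mild"
    else:
        return "cool"

# for loop: visit each item in order
labels = []
for temp in readings:
    labels.append(label(temp))

# while loop: repeat until the condition becomes false
countdown = 3
steps = 0
while countdown > 0:
    countdown -= 1
    steps += 1

# list comprehension: build a filtered list in one expression
above_70 = [temp for temp in readings if temp > 70]

print(labels)     # ['mild', 'warm', 'cool', 'mild']
print(steps)      # 3
print(above_70)   # [72, 75, 71]
```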

Functions

Functions organize reusable code:

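def calculate_average(numbers):
    # Assumes numbers is non-empty; an empty list raises ZeroDivisionError
    return sum(numbers) / len(numbers)

  • Use the def keyword to define a function
  • Parameters let you pass data into the function
  • The return keyword sends a value back to the caller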
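
Calling the function might look like the sketch below. The guard for an empty list is an addition for illustration (one of several reasonable ways to handle that case), and the sample grades are made up.

```python
def calculate_average(numbers):
    """Return the arithmetic mean, or None for an empty sequence."""
    if not numbers:          # guard added for illustration
        return None
    return sum(numbers) / len(numbers)

print(calculate_average([85, 92, 88, 95]))  # 90.0
print(calculate_average([]))                # None
```

Returning None keeps the example short; raising a ValueError with a clear message is an equally valid choice when an empty input indicates a bug.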

Working with Libraries

Import powerful libraries to extend Python:

import numpy as np               # numerical arrays and math
import pandas as pd              # tabular data (DataFrames)
import matplotlib.pyplot as plt  # plotting
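
The same import patterns work for any module. The sketch below uses the standard-library statistics and math modules only because they need no installation; the alias st and the sample values are our own choices for illustration.

```python
import statistics as st           # alias, same pattern as "import numpy as np"
from math import sqrt             # import a single name from a module

values = [2, 4, 4, 4, 5, 5, 7, 9]
print(st.mean(values))            # 5
print(st.pstdev(values))          # 2.0  (population standard deviation)
print(sqrt(16))                   # 4.0
```

Third-party libraries such as NumPy and Pandas must be installed (for example with pip) before they can be imported this way.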

Best Practices

  • Use descriptive variable names
  • Write clear, readable code
  • Comment complex logic
  • Follow PEP 8 style guidelines
  • Use virtual environments for projects

Code Example

# Python basics for data analysis

# Lists and basic operations
temperatures = [72, 75, 68, 71, 73, 69, 70]
print(f"Average temperature: {sum(temperatures) / len(temperatures):.1f}°F")
print(f"Highest: {max(temperatures)}°F")
print(f"Lowest: {min(temperatures)}°F")

# List comprehension
celsius = [(temp - 32) * 5/9 for temp in temperatures]
print(f"Celsius: {[round(t, 1) for t in celsius]}")

# Dictionaries for structured data
student = {
    "name": "Alice",
    "age": 20,
    "grades": [85, 92, 88, 95],
    "major": "Data Science"
}

print(f"{student['name']}'s average: {sum(student['grades']) / len(student['grades'])}")

# Functions for reusable logic
def calculate_statistics(data):
    return {
        "mean": sum(data) / len(data),
        "max": max(data),
        "min": min(data),
        "count": len(data)
    }

stats = calculate_statistics(temperatures)
print(f"Statistics: {stats}")

# Working with files
# Assumes a file named data.csv with a header row exists in the working directory
import csv

with open('data.csv', newline='') as file:
    # DictReader uses the header row as keys and handles quoted fields
    for row in csv.DictReader(file):
        print(row)
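
The file-reading snippet above expects data.csv to exist already. To try the idea without preparing a file, the following sketch writes a small sample CSV to a temporary directory and reads it back with the standard csv module. The column names and rows are hypothetical, since the lesson does not say what data.csv contains.

```python
import csv
import os
import tempfile

# Hypothetical sample data; the lesson does not specify the contents of data.csv.
rows = [["city", "temp_f"], ["Austin", "95"], ["Boston", "72"]]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.csv")

    # Write the sample file
    with open(path, "w", newline="") as file:
        csv.writer(file).writerows(rows)

    # Read it back; DictReader uses the first row as keys
    with open(path, newline="") as file:
        records = list(csv.DictReader(file))

print(records)
# [{'city': 'Austin', 'temp_f': '95'}, {'city': 'Boston', 'temp_f': '72'}]
```

Note that every value read from a CSV file is a string; convert with int() or float() before doing arithmetic.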