Lesson 1 of 4
Python Basics for Data Science
Learn essential Python concepts needed for data science including data types, control flow, functions, and working with libraries.
25 minutes
Python is one of the most widely used languages for data science, thanks to its readable syntax and mature ecosystem of libraries.
Why Python for Data Science?
- Easy to Learn: Clean syntax that reads like English
- Rich Ecosystem: NumPy, Pandas, Matplotlib, Scikit-learn
- Community Support: Massive community and resources
- Industry Standard: Used by top companies for data analysis
Essential Data Types
Python provides built-in data types that are well suited to data work:
- Lists: Ordered, mutable collections [1, 2, 3]
- Tuples: Immutable sequences (1, 2, 3)
- Dictionaries: Key-value pairs {"name": "Alice"}
- Sets: Unordered collections of unique elements {1, 2, 3}
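The four types above can be compared side by side in a short sketch. The variable names and sample values here (`readings`, `point`, `person`, `unique_ids`, the age of 30) are hypothetical and chosen only for illustration.

```python
# Hypothetical sample values, used only to illustrate each type.
readings = [1, 2, 3]            # list: ordered and mutable
readings.append(4)              # lists can grow in place

point = (1, 2, 3)               # tuple: cannot be changed after creation
try:
    point[0] = 10               # assigning to a tuple element raises TypeError
except TypeError:
    tuple_is_immutable = True

person = {"name": "Alice"}      # dictionary: values are looked up by key
person["age"] = 30              # add a new key-value pair

unique_ids = {1, 2, 2, 3, 3}    # set: duplicate values are dropped

print(readings)
print(person["name"])
print(unique_ids)
```

As a rough rule of thumb, a list fits data that changes, a tuple fits a fixed record, a dictionary fits labeled fields, and a set fits membership checks and de-duplication.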
Working with Numbers
Python supports integers and floats with the usual arithmetic operators:
age = 25 # Integer
temperature = 98.6 # Float
result = age * 2 # Arithmetic operations
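A few more operations are worth seeing early, since they often surprise newcomers. The sketch below reuses `age` and `temperature` from above; the other names are hypothetical.

```python
import math

age = 25
temperature = 98.6

mixed = age + temperature      # int + float produces a float
true_div = age / 2             # / always returns a float
floor_div = age // 2           # // rounds down to a whole number
remainder = age % 2            # % gives the remainder of the division

# Floats are binary approximations, so 0.1 + 0.2 is not exactly 0.3.
# Compare floats with a tolerance instead of ==.
close_enough = math.isclose(0.1 + 0.2, 0.3)

print(type(mixed).__name__, true_div, floor_div, remainder, close_enough)
```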
Control Flow
Make decisions and repeat operations:
- if/elif/else: Conditional logic
- for loops: Iterate over sequences
- while loops: Repeat while condition is true
- List comprehensions: Concise list creation
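The four constructs listed above might look like this in practice. This is a minimal sketch: the `describe` function, its thresholds, and the sample readings are assumptions made up for the example.

```python
temperatures = [72, 75, 68, 71]   # hypothetical readings in °F


def describe(temp):
    """Classify a temperature using if/elif/else (thresholds are arbitrary)."""
    if temp >= 75:
        return "warm"
    elif temp >= 70:
        return "mild"
    else:
        return "cool"


# for loop: visit each item in a sequence
labels = []
for temp in temperatures:
    labels.append(describe(temp))

# while loop: count leading readings until the first one below 70
index = 0
while index < len(temperatures) and temperatures[index] >= 70:
    index += 1

# list comprehension: build a filtered list in one expression
above_70 = [temp for temp in temperatures if temp > 70]

print(labels)
print(index)
print(above_70)
```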
Functions
Functions organize reusable code:
def calculate_average(numbers):
    return sum(numbers) / len(numbers)
- Use the def keyword to define functions
- Parameters allow passing data
- Return values with the return keyword
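Calling the function is what puts these three pieces to work. The sketch below adds a docstring and an empty-input check, which are assumptions beyond the original definition (without the check, an empty list raises ZeroDivisionError). The `scores` list is hypothetical.

```python
def calculate_average(numbers):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not numbers:
        # Assumed guard: fail with a clear message instead of ZeroDivisionError.
        raise ValueError("numbers must not be empty")
    return sum(numbers) / len(numbers)


scores = [85, 92, 88, 95]              # hypothetical input data
average = calculate_average(scores)    # pass data in through the parameter
print(f"Average: {average}")           # use the returned value
```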
Working with Libraries
Import powerful libraries to extend Python:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
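The `import ... as ...` form gives a library a short alias, so its functions are called as `alias.function(...)`. The sketch below shows the same pattern with the standard-library `statistics` module as a stand-in, so it runs without installing anything; the alias `stats` and the sample data are assumptions for illustration.

```python
import statistics as stats   # alias, same pattern as "import numpy as np"
from math import sqrt        # import a single name directly

data = [72, 75, 68, 71, 73, 69, 70]   # hypothetical readings

mean_value = stats.mean(data)          # call functions through the alias
median_value = stats.median(data)
root = sqrt(16)                        # directly imported name needs no prefix

print(mean_value, median_value, root)
```

NumPy, Pandas, and Matplotlib are third-party packages and must be installed before the imports above will succeed.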
Best Practices
- Use descriptive variable names
- Write clear, readable code
- Comment complex logic
- Follow PEP 8 style guidelines
- Use virtual environments for projects
Code Example
# Python basics for data analysis
# Lists and basic operations
temperatures = [72, 75, 68, 71, 73, 69, 70]
print(f"Average temperature: {sum(temperatures) / len(temperatures):.1f}°F")
print(f"Highest: {max(temperatures)}°F")
print(f"Lowest: {min(temperatures)}°F")
# List comprehension
celsius = [(temp - 32) * 5/9 for temp in temperatures]
print(f"Celsius: {[round(t, 1) for t in celsius]}")
# Dictionaries for structured data
student = {
    "name": "Alice",
    "age": 20,
    "grades": [85, 92, 88, 95],
    "major": "Data Science"
}
print(f"{student['name']}'s average: {sum(student['grades']) / len(student['grades'])}")
# Functions for reusable logic
def calculate_statistics(data):
    return {
        "mean": sum(data) / len(data),
        "max": max(data),
        "min": min(data),
        "count": len(data)
    }
stats = calculate_statistics(temperatures)
print(f"Statistics: {stats}")
# Working with files (assumes a data.csv file exists in the current directory)
with open('data.csv', 'r') as file:
    header = file.readline().strip().split(',')
    for line in file:
        values = line.strip().split(',')
        print(dict(zip(header, values)))
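Splitting lines on commas by hand works for simple files but breaks when a field itself contains a comma inside quotes. As an alternative sketch, the standard-library `csv` module handles those cases. The `read_rows` helper and the in-memory sample data are hypothetical, used here so the example runs without a file on disk.

```python
import csv
import io

# Hypothetical sample data; io.StringIO stands in for an open file.
sample_csv = "name,age\nAlice,20\nBob,22\n"


def read_rows(file_obj):
    """Return each CSV row as a dict keyed by the header line."""
    return list(csv.DictReader(file_obj))


rows = read_rows(io.StringIO(sample_csv))
for row in rows:
    print(row)
```

With a real file, the same helper could be called as `read_rows(file)` inside a `with open('data.csv', newline='') as file:` block. Note that every value comes back as a string, so numeric columns need converting with `int()` or `float()`.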