Just Basic Pandas (Bonus Included!)

import pandas as pd
import numpy as np
df = pd.read_csv('/content/sample_data/california_housing_test.csv')
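#alternatively, build a DataFrame by hand (this reassigns df; the outputs shown below come from the CSV)
#dictionary keys must be unique, otherwise a later key silently overwrites the earlier column
df = pd.DataFrame({'a':np.random.rand(10),
                   'b':np.random.randint(10, size=10),
                   'c':[True,True,True,False,False,np.nan,np.nan,
                        False,True,True],
                   'city':['London','Paris','New York','Istanbul',
                           'Liverpool','Berlin',np.nan,'Madrid',
                           'Rome',np.nan],
                   'd':[3,4,5,1,5,2,2,np.nan,np.nan,0],
                   'e':[1,4,5,3,3,3,3,8,8,4]})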
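df #displays the DataFrame
df.head() #first 5 rows
df.tail() #last 5 rows
df.shape #(number of rows, number of columns)
df.info() #column names, non-null counts and dtypes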
op:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   longitude           3000 non-null   float64
 1   latitude            3000 non-null   float64
 2   housing_median_age  3000 non-null   float64
 3   total_rooms         3000 non-null   float64
 4   total_bedrooms      3000 non-null   float64
 5   population          3000 non-null   float64
 6   households          3000 non-null   float64
 7   median_income       3000 non-null   float64
 8   median_house_value  3000 non-null   float64
dtypes: float64(9)
memory usage: 211.1 KB
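The non-null counts that info() prints can also be pulled out as plain values. Here is a minimal sketch on a small made-up frame (the name toy and its contents are assumptions for illustration, not the housing data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, used only to illustrate the inspection calls
toy = pd.DataFrame({
    "rooms": [3.0, np.nan, 5.0, 2.0],
    "city": ["London", "Paris", None, "Rome"],
})

n_rows, n_cols = toy.shape             # shape is an attribute, not a method
missing_per_column = toy.isna().sum()  # number of missing values in each column
non_null = toy.count()                 # the "Non-Null Count" that info() reports

print(n_rows, n_cols)
print(missing_per_column)
```

Checking isna().sum() before cleaning is a quick way to see which columns actually need the missing-value handling.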
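df.dropna() #drops every row that contains a null value
df.fillna(0) #fills NaN values with zeros
df.fillna(df.mean(numeric_only=True)) #fills NaN values with each column's mean
df.replace(np.nan, 0) #replaces NaN with zero
df.replace(np.nan, df['total_bedrooms'].mean()) #replaces NaN with the mean of one column (total_bedrooms here, as an example)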
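To see what each of these calls does, it helps to run them on a frame small enough to check by hand. The sketch below uses a made-up two-column frame (toy and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one NaN in each column
toy = pd.DataFrame({
    "d": [3, 4, np.nan, 1],
    "e": [1, np.nan, 5, 3],
})

dropped = toy.dropna()       # keeps only the rows with no NaN at all
zero_filled = toy.fillna(0)  # every NaN becomes 0
mean_filled = toy.fillna(toy.mean(numeric_only=True))  # NaN becomes that column's mean

# None of these calls change toy itself; each one returns a new DataFrame
print(dropped)
print(mean_filled)
```

Because the calls return new DataFrames, the result has to be assigned back (for example `toy = toy.fillna(0)`) if the cleaned version should replace the original.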
df.iloc[1]
longitude               -118.300
latitude                  34.260
housing_median_age        43.000
total_rooms             1510.000
total_bedrooms           310.000
population               809.000
households               277.000
median_income              3.599
median_house_value    176500.000
Name: 1, dtype: float64
df.loc[:2,'total_rooms'] #rows labelled 0 through 2 (the end label is included)
0 3885.0
1 1510.0
2 3589.0
Name: total_rooms, dtype: float64
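Notice that df.loc[:2, ...] returned three rows, not two. That is because loc slices by label and includes the end label, while iloc slices by position and excludes the end. A minimal sketch with a made-up frame and string labels (toy, x and the labels are assumptions for illustration) makes the difference easier to see:

```python
import pandas as pd

# Hypothetical toy frame with string labels so labels and positions differ
toy = pd.DataFrame({"x": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

by_position = toy.iloc[1]        # second row as a Series, whatever its label is
by_label = toy.loc[:"c", "x"]    # label slice: "c" is included, so 3 values
by_pos_slice = toy.iloc[:2, 0]   # position slice: index 2 is excluded, so 2 values

print(by_label.tolist())
print(by_pos_slice.tolist())
```

With the default RangeIndex of the housing data, labels and positions happen to be the same numbers, which is why the two are easy to mix up.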
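#the leading ! runs a shell command from a notebook cell
#newer releases of this package are published as ydata-profiling (imported as ydata_profiling)
!pip install pandas-profiling
from pandas_profiling import ProfileReport
prof = ProfileReport(df) #builds a profiling report for the whole DataFrame
prof.to_file(output_file='output.html') #saves the report as an HTML file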

Akshar Rastogi
