Data Preprocessing: Understanding the most time-consuming process.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2964 entries, 0 to 2963
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Date       2964 non-null   object
 1   Open       2932 non-null   float64
 2   High       2932 non-null   float64
 3   Low        2932 non-null   float64
 4   Adj Close  2932 non-null   float64
dtypes: float64(4), object(1)
memory usage: 115.9+ KB
df.describe()
w_d['date'] = pd.to_datetime(w_d['date'] , format='%Y-%m-%d')
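The conversion above can be tried on a small made-up frame. This is a minimal sketch: the name `sample` and its values are assumptions for illustration, not the article's data. It shows a column of date strings becoming a datetime column.

```python
import pandas as pd

# Hypothetical stand-in for the article's data; values are invented.
sample = pd.DataFrame({"date": ["2020-03-01", "2020-03-02", "2020-03-03"]})
dtype_before = sample["date"].dtype  # strings, not dates

# With an explicit format, a string that does not match raises an error
# instead of being parsed by guesswork.
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")
dtype_after = sample["date"].dtype   # a datetime64 dtype

# Datetime columns expose the .dt accessor for calendar fields.
months = sample["date"].dt.month.tolist()
```

Once the column is a datetime, it can be sorted, resampled, or filtered by date range, which plain strings do not support reliably.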
# Columns to remove from w_d
columns_to_drop = [
    'continent', 'female_smokers', 'male_smokers', 'handwashing_facilities',
    'new_cases_smoothed', 'new_deaths_smoothed',
    'new_cases_smoothed_per_million', 'new_deaths_smoothed_per_million',
    'new_tests_smoothed', 'new_tests_smoothed_per_thousand',
    'stringency_index', 'population', 'population_density',
    'median_age', 'aged_65_older', 'aged_70_older',
    'extreme_poverty', 'cardiovasc_death_rate', 'diabetes_prevalence',
    'hospital_beds_per_thousand', 'life_expectancy', 'human_development_index',
]
w_d = w_d.drop(columns_to_drop, axis=1)
world_data = world_data.dropna(axis='columns')
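Note that `dropna(axis='columns')` removes every column containing at least one missing value, which can discard a lot of data. The sketch below contrasts it with the row-wise default on an invented frame. The name `toy` and its columns are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; column names and values are invented for illustration.
toy = pd.DataFrame({
    "location": ["A", "B", "C"],
    "new_cases": [10.0, np.nan, 7.0],   # one missing value
    "total_cases": [10.0, 12.0, 19.0],  # complete
})

# Count missing values per column before dropping anything.
missing_per_column = toy.isna().sum()

# axis='columns' removes every column that has at least one NaN.
complete_columns = toy.dropna(axis="columns")

# The default (axis=0) removes rows instead and keeps all columns.
complete_rows = toy.dropna()
```

Here `complete_columns` loses `new_cases` entirely because of a single gap, while `complete_rows` keeps all three columns and loses one row. Which trade-off is better depends on the dataset.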
# Fill missing values with the mean (mean() must be called, not just referenced)
y_pred2 = y_pred2.fillna(y_pred2.mean())
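To make the mean imputation step concrete, here is a minimal sketch on an invented series. The name `values` is a hypothetical stand-in for `y_pred2`, and the numbers are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical series standing in for y_pred2; values are invented.
values = pd.Series([2.0, np.nan, 4.0, np.nan, 6.0])

# Series.mean() skips NaN by default, so the mean is (2 + 4 + 6) / 3.
mean_value = values.mean()

# Note the parentheses above: passing values.mean (the method object)
# would not fill the gaps with the numeric mean.
filled = values.fillna(mean_value)
```

Mean imputation keeps the column average unchanged but shrinks its variance, so it is worth checking how many values were filled before relying on the result.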
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
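The split can be checked on toy arrays. This sketch assumes scikit-learn is installed. The names `features` and `target` and their values are invented for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each; values are invented.
features = np.arange(20).reshape(10, 2)
target = np.arange(10)

# test_size=0.3 holds out 30% of the rows; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.3, random_state=42
)

# Rows stay aligned: each target value still belongs to its feature row.
aligned = all(features[label][0] == row[0] for row, label in zip(X_train, y_train))
```

With 10 samples this yields 7 training rows and 3 test rows, and rerunning with the same `random_state` reproduces the same partition.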
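# distplot is deprecated in recent seaborn releases; histplot with a KDE overlay is the closest replacement
sns.histplot(x=dfd['Close'], kde=True, stat='density')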

Akshar Rastogi
