Data Preprocessing: Understanding the most time-consuming process.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2964 entries, 0 to 2963
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Date       2964 non-null   object
 1   Open       2932 non-null   float64
 2   High       2932 non-null   float64
 3   Low        2932 non-null   float64
 4   Adj Close  2932 non-null   float64
dtypes: float64(4), object(1)
memory usage: 115.9+ KB
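The info() summary above already points at two preprocessing jobs. Date has 2964 non-null entries while each price column has only 2932, so 32 rows are missing price data. Date is also stored as a generic object (string) column rather than as datetimes. The sketch below uses a small made-up table, not the real dataset, to show how the same gaps can be counted directly with isna().sum(). It also shows how to find rows where every price column is empty. The column names mirror the summary; the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy price table; values are invented for illustration only
df = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"],
    "Open": [100.0, np.nan, 101.5, 102.0],
    "High": [101.0, np.nan, 102.5, 103.0],
    "Low": [99.0, np.nan, 100.5, 101.0],
    "Adj Close": [100.5, np.nan, 102.0, 102.5],
})

# Prints the same style of summary as above, including Non-Null Count
df.info()

# Number of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Rows in which every price column is missing at once
price_cols = ["Open", "High", "Low", "Adj Close"]
all_missing = df[price_cols].isna().all(axis=1)
print(all_missing.sum())
```

On the real data, the same two calls would show whether the 32 gaps fall on the same rows in every price column.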
import pandas as pd
# parse the 'date' strings (format YYYY-MM-DD) into datetime values
w_d['date'] = pd.to_datetime(w_d['date'], format='%Y-%m-%d')
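Converting the date column to a real datetime type makes date-based operations such as filtering, resampling, and extracting the month possible. The sketch below runs the same pd.to_datetime call on a small hypothetical frame. It adds the optional errors="coerce" argument, which is not in the original line: a string that does not match the format becomes NaT (missing) instead of raising an error. It then uses the .dt accessor to extract the month.

```python
import pandas as pd

# Hypothetical frame whose dates are stored as strings, including one bad value
w_d = pd.DataFrame({"date": ["2020-03-01", "2020-03-02", "not a date"]})

# format= enforces the YYYY-MM-DD layout; errors="coerce" maps
# strings that do not match it to NaT instead of raising
w_d["date"] = pd.to_datetime(w_d["date"], format="%Y-%m-%d", errors="coerce")
print(w_d["date"].dtype)

# Datetime columns expose the .dt accessor (NaT yields NaN here)
months = w_d["date"].dt.month
print(months)
```

Leaving out errors="coerce", as the original line does, makes the conversion fail loudly on malformed dates. That can be preferable when bad dates should be investigated rather than silently turned into missing values.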
# remove the columns that are not used in the analysis
w_d = w_d.drop([
    'continent', 'female_smokers', 'male_smokers', 'handwashing_facilities',
    'new_cases_smoothed', 'new_deaths_smoothed',
    'new_cases_smoothed_per_million', 'new_deaths_smoothed_per_million',
    'new_tests_smoothed', 'new_tests_smoothed_per_thousand',
    'stringency_index', 'population', 'population_density',
    'median_age', 'aged_65_older', 'aged_70_older',
    'extreme_poverty', 'cardiovasc_death_rate', 'diabetes_prevalence',
    'hospital_beds_per_thousand', 'life_expectancy', 'human_development_index',
], axis=1)
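The drop call above lists unwanted columns explicitly, with axis=1 telling pandas they are columns rather than row labels. The sketch below applies the same idea to a hypothetical miniature table using the columns= keyword, which is equivalent to passing axis=1. It also shows errors="ignore", which skips names that are not present instead of raising a KeyError. That can help when the same script runs against files whose columns differ slightly. The table and its values are made up.

```python
import pandas as pd

# Hypothetical miniature table; only a few of the real columns are mimicked
w_d = pd.DataFrame({
    "location": ["A", "B"],
    "new_cases": [10, 20],
    "continent": ["X", "Y"],
    "population": [1000, 2000],
    "median_age": [30.1, 40.2],
})

unwanted = ["continent", "population", "median_age"]

# drop(columns=...) is equivalent to drop([...], axis=1)
w_d = w_d.drop(columns=unwanted)
print(list(w_d.columns))

# errors="ignore": absent names ("population" is already gone,
# "stringency_index" never existed) are skipped rather than raising KeyError
w_d = w_d.drop(columns=["population", "stringency_index"], errors="ignore")
```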
# drop every column that contains at least one missing value
world_data = world_data.dropna(axis='columns')
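dropna(axis='columns') is aggressive: with the default how="any", a column is removed if even a single value in it is missing. The sketch below contrasts that with the thresh= parameter, which keeps a column as long as it has at least the given number of non-missing values. The table is hypothetical, and the threshold of 3 is an arbitrary illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical table: one complete column, one with a single gap, one mostly empty
world_data = pd.DataFrame({
    "total_cases": [1.0, 2.0, 3.0, 4.0],
    "total_tests": [np.nan, 5.0, 6.0, 7.0],
    "icu_patients": [np.nan, np.nan, np.nan, 1.0],
})

# Default how="any": any column containing a NaN is dropped
strict = world_data.dropna(axis="columns")

# thresh=3: keep columns that have at least 3 non-missing values
lenient = world_data.dropna(axis="columns", thresh=3)

print(list(strict.columns))
print(list(lenient.columns))
```

Which behaviour is appropriate depends on how much of each column is actually missing. Counting gaps first with isna().sum() helps make that choice deliberately.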
# replace missing values with the mean of the non-missing values
y_pred2 = y_pred2.fillna(y_pred2.mean())
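Mean imputation fills each gap with the average of the values that are present. The parentheses matter: y_pred2.mean is the method object itself, while y_pred2.mean() is the computed mean that fillna should receive. The sketch below uses hypothetical data to show both cases. On a DataFrame, mean() returns one value per column, so each column is filled with its own mean. On a Series, mean() returns a single scalar.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with gaps; values are made up
y_pred2 = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 6.0]})

# mean() skips NaN by default: column a -> 2.0, column b -> 5.0
col_means = y_pred2.mean()
filled = y_pred2.fillna(col_means)
print(filled)

# Series case: mean() is a single number used for every gap
s = pd.Series([2.0, np.nan, 4.0])
s_filled = s.fillna(s.mean())
print(s_filled)
```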
from sklearn.model_selection import train_test_split
# hold out 30% of the rows for testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
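With test_size=0.3, train_test_split puts 30% of the samples (rounded up) in the test set and the rest in the training set. By default it shuffles first, so random_state=42 makes the split reproducible. The sketch below uses a hypothetical x of 10 samples with 2 features and a matching y, since the real x and y are not shown here. It also shows shuffle=False, which is not used in the original line and is offered only as an option worth considering. With that setting, the last 30% of rows become the test set, which may suit time-ordered data such as the daily price table above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical features (10 samples x 2 features) and target
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Shuffled 70/30 split, reproducible because of random_state
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)

# Order-preserving split: the final 30% of rows form the test set
_, _, y_train_ordered, y_test_ordered = train_test_split(
    x, y, test_size=0.3, shuffle=False
)
print(y_test_ordered)
```

Whether shuffling is appropriate depends on the model and the question being asked. For forecasting, a shuffled split lets the model train on days that come after some of its test days.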




