
Pima Indian Diabetes Python

Python - How To Delete Or Ignore Rows In A Csv File When Training A Model? - Stack Overflow

I also came across one paper which discusses missing values in this dataset: The Problem of Disguised Missing Data, pages 84 and 85, discusses the Pima Indians diabetes dataset. Highlight from the paper: "Breault was able to obtain generally better results by omitting the disguised missing values, even though this complete case analysis reduced the effective sample size from 768 patients to 392."

Dealing with the dataset: Visit Link: It says there are some missing values in the data. Missing Attribute Values: Yes.

For each attribute (all numeric-valued):

1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)

The aim is to use the first 8 variables to predict variable 9.

Class Distribution (class value 1 is interpreted as "tested positive for diabetes"):

Class Value    Number of instances
0              500
1              268

import numpy as np
import urllib.request

# url with the dataset (the address was omitted in the source; download the file)
url = "..."
raw_data = urllib.request.urlopen(url)

# load the CSV file as a numpy matrix
dataset = np.loadtxt(raw_data, delimiter=",")

# separate the data from the target attribute
X = dataset[:, 0:8]
y = dataset[:, 8]

The majority of gradient methods (on which almost all machine learning algorithms are based) are highly sensitive to data scaling. Therefore, before running an algorithm, we should perform either normalization or so-called standardization. Normalization involves rescaling each feature so that it lies in the range from 0 to 1. Standardization involves pre-processing the data so that each feature has a mean of 0 and a variance of 1. The S Continue reading >>
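As a minimal sketch of the scaling step just described (assuming the X loaded above and that scikit-learn is available; this is illustrative, not the original answer's code):

from sklearn import preprocessing

# normalization: rescale every feature to the [0, 1] range
normalized_X = preprocessing.MinMaxScaler().fit_transform(X)

# standardization: transform every feature to mean 0 and variance 1
standardized_X = preprocessing.StandardScaler().fit_transform(X)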

Github - Niharikagulati/diabetesprediction: Using Pima Indians Diabetes Data Set To Predict Whether A Patient Has Diabetes Or Not Based Upon Patients Lab Test Result Variables Like Glucose, Blood Pressure, Etc. Using Cart Decision Tree Algorithm And K-nearest Model Achieving 76% Accuracy. Python-scikit Learn, Scipy, Pandas, Matplotlib.

Dataset Link: From domain knowledge, I have analyzed and found the ranges of values and their effect on diabetes for each continuous variable in the dataset. Based upon these ranges we will categorize the continuous variables for implementing the decision tree in the next step (a small sketch of this categorization follows the list below). We can also use these ranges to come up with an appropriate null-value replacement for each independent variable. There are 8 independent variables:

Glucose: Plasma glucose concentration over 2 hours in an oral glucose tolerance test (mg/dl). A 2-hour value between 140 and 200 mg/dL (7.8 and 11.1 mmol/L) is called impaired glucose tolerance, or "pre-diabetes." It means you are at increased risk of developing diabetes over time. A glucose level of 200 mg/dL (11.1 mmol/L) or higher is used to diagnose diabetes.

Blood Pressure: Diastolic blood pressure (mm Hg). Diastolic B.P. above 90 means high B.P. (higher probability of diabetes); diastolic B.P. below 60 means low B.P. (lower probability of diabetes).

Skin Thickness: Triceps skin fold thickness (mm), a value used to estimate body fat. Normal triceps skin fold thickness in women is 23 mm. Higher thickness indicates obesity, and the chance of diabetes increases.

Insulin: 2-hour serum insulin (mu U/ml). The normal insulin level is 16-166 mIU/L; values above this range can be alarming.

BMI: Body mass index (weight in kg/(height in m)^2). A BMI of 18.5 to 25 is within the normal range; a BMI between 25 and 30 falls within the overweight range; a BMI of 30 or over falls within the obese range.

Diabetes Pedigree Function: Provides information about the diabetes history in relatives and the genetic relationship of those relatives to the patient. A higher pedigree function means the patient is more likely to have diabetes.

Outcome: Class variable (0 or 1), where 0 denotes patien Continue reading >>
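A minimal sketch of that categorization for one variable, assuming the data is in a pandas DataFrame df with a 'Glucose' column (the column name and the category labels are illustrative assumptions, not the repository's exact code):

import pandas as pd

# bin continuous glucose values using the clinical thresholds above;
# zeros (disguised missing values) fall outside the bins and become NaN
df['GlucoseCategory'] = pd.cut(df['Glucose'],
                               bins=[0, 140, 200, float('inf')],
                               labels=['normal', 'impaired', 'diabetic'])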

Machine Learning: Pima Indians Diabetes

Sat 14 April 2018 | in Development | tags: Machine Learning Python scikit-learn tutorial

The Pima are a group of Native Americans living in Arizona. A genetic predisposition allowed this group to survive normally on a diet poor in carbohydrates for years. In recent years, a sudden shift from traditional agricultural crops to processed foods, together with a decline in physical activity, made them develop the highest prevalence of type 2 diabetes, and for this reason they have been the subject of many studies. The dataset includes data from 768 women with 8 characteristics, in particular:

Plasma glucose concentration at 2 hours in an oral glucose tolerance test
Body mass index (weight in kg/(height in m)^2)

The last column of the dataset indicates whether the person has been diagnosed with diabetes (1) or not (0). The original dataset is available at the UCI Machine Learning Repository and can be downloaded from this address:

This type of dataset and problem is a classic supervised binary classification. Given a number of elements, all with certain characteristics (features), we want to build a machine learning model to identify people affected by type 2 diabetes. To solve the problem we will have to analyse the data, do any required transformation and normalisation, apply a machine learning algorithm, train a model, check the performance of the trained model, and iterate with other algorithms until we find the one most performant for our type of dataset.

# We import the libraries needed to read the dataset
import os
import pandas as pd
import numpy as np

# We placed the dataset under the datasets/ sub-folder
DATASET_PATH = 'datasets/'

# We read the data from the CSV file
data_path = os.path.join(DATASET_PATH, 'pima-indians-diabetes.csv')
dataset = pd.read_csv(data_path, header=None)
# Bec Continue reading >>
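As a minimal illustration of that train-and-evaluate loop (the split parameters and the choice of LogisticRegression are assumptions for this sketch, not the article's final model):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = dataset.iloc[:, 0:8]   # the 8 feature columns
y = dataset.iloc[:, 8]     # the diagnosis column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on held-out data; iterate with other algorithms from here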

Python Data Science

Our task is to load the images, convert them into matrices of numbers (possibly changing the shape of the matrix using some engineering tools) and classify the pastas. First of all you can download the data from here. First we need to read all the images in Python, and to do this we need to iterate over the food folder. Once the images are loaded we convert them into numerical matrices (after all, they are numeric pixel values that represent a particular color). We also reshape the data by removing some unnecessary pixel values. Great, so now we have our data; time to split it into training and testing sets. Finally we run different kinds of SVM models, however we cannot exceed 48% accuracy. But no reason to be upset: artificial neural networks to the rescue.

What are ANNs? Artificial neural networks are one of the main tools used in machine learning. As the "neural" part of their name suggests, they are brain-inspired systems which are intended to replicate the way that we humans learn. Neural networks consist of input and output layers, as well as (in most cases) a hidden layer consisting of units that transform the input into something that the output layer can use. They are excellent tools for finding patterns which are far too complex or numerous for a human programmer to extract and teach the machine to recognize. ANNs were able to give us 60% accuracy, which is a significant increase over SVMs. However, in order to boost our accuracy further, we try converting our images from color to grayscale and highlighting any particular unique shape or feature of the image. This process is known as Histogram of Oriented Gradients (HOG). However, this didn't help us get better results. One shortcut solution we could have used is to take a pre-trained neural network, such as VGG16, train it on our data, and get better results. So Continue reading >>
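For illustration only, here is a rough sketch of the load-convert-split-train pipeline the paragraph describes, assuming one sub-folder of images per pasta class under a hypothetical food/ directory and a Pillow + scikit-learn stack (none of this is the author's actual code):

import os
import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = [], []
for label in os.listdir('food'):                      # hypothetical: one sub-folder per pasta type
    for name in os.listdir(os.path.join('food', label)):
        # normalize channels and size so every image becomes the same-length vector
        img = Image.open(os.path.join('food', label, name)).convert('RGB').resize((64, 64))
        X.append(np.asarray(img).ravel())             # image -> flat numeric feature vector
        y.append(label)

X, y = np.array(X), np.array(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC().fit(X_train, y_train)
print(clf.score(X_test, y_test))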

Data Preprocessing For Machine Learning In Python

Data Preprocessing for Machine Learning in Python. Pre-processing refers to the transformations applied to our data before feeding it to the algorithm. Data preprocessing is a technique used to convert raw data into a clean data set. In other words, whenever data is gathered from different sources it is collected in a raw format which is not feasible for analysis. To achieve better results from the applied model in machine learning projects, the data has to be in a proper format. Some machine learning models need information in a specified format; for example, the Random Forest algorithm does not support null values, so to execute the Random Forest algorithm, null values have to be managed in the original raw data set. Another aspect is that the data set should be formatted in such a way that more than one machine learning or deep learning algorithm can be executed on the same data set, and the best of them chosen.

This article contains 3 different data preprocessing techniques for machine learning. The Pima Indian diabetes dataset is used in each technique. This is a binary classification problem where all of the attributes are numeric and have different scales. It is a great example of a dataset that can benefit from pre-processing. You can find this dataset on the UCI Machine Learning Repository webpage. Note that the program might not run on the Geeksforgeeks IDE, but it can run easily on your local Python interpreter, provided you have installed the required libraries.

When our data is comprised of attributes with varying scales, many machine learning algorithms can benefit from rescaling all the attributes to the same scale. This is useful for the optimization algorithms used in the core of machine learning algorithms, like gra Continue reading >>
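A minimal sketch of the rescaling technique just described, applied to the Pima dataset (the file name, column layout, and [0, 1] target range are assumptions):

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv('pima-indians-diabetes.csv', header=None)
X = df.iloc[:, 0:8].values   # the 8 numeric feature columns

# rescale every attribute to the same [0, 1] scale
rescaled_X = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)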

Machine Learning Workflow On Diabetes Data: Part01

Machine Learning Workflow on Diabetes Data: Part 01. Image credit. Machine learning in a medical setting can help enhance medical diagnosis dramatically. This article will portray how data related to diabetes can be leveraged to predict whether a person has diabetes or not. More specifically, this article will focus on how machine learning can be utilized to predict diseases such as diabetes. By the end of this article series you will be able to understand concepts like data exploration, data cleansing, feature selection, model selection and model evaluation, and apply them in a practical way. Diabetes is a disease which occurs when the blood glucose level becomes high, which ultimately leads to other health problems such as heart disease, kidney disease, etc. Diabetes is caused mainly by the consumption of highly processed food, bad eating habits, etc. According to the WHO, the number of people with diabetes has increased over the years. Requirements: Anaconda (Scikit-learn, NumPy, Pandas, Matplotlib, Seaborn) and a basic understanding of supervised machine learning methods, specifically classification. As a data scientist, the most tedious task we encounter is the acquisition and preparation of a data set. Even though there is an abundance of data in this era, it is still hard to find a suitable data set that suits the problem you are trying to tackle. If there aren't any suitable data sets to be found, you might have to create your own. In this tutorial we aren't going to create our own data set; instead we will be using an existing data set called the Pima Indians Diabetes Database, provided by the UCI Machine Learning Repository (a famous repository for machine learning data sets). We will be performing the machine learning workflow with the diabetes data set provided above. When en Continue reading >>

Pima

# The Pandas groupby method computes the distribution of one feature
# with respect to the others.
# We get 8 histograms distributed against a negative diabetes check
# and another 8 histograms distributed against a positive diabetes check
data.groupby('class').hist(figsize=(8,8), xlabelsize=7, ylabelsize=7)

sm = scatter_matrix(data, alpha=0.2, figsize=(7.5, 7.5), diagonal='kde')
[plt.setp(item.yaxis.get_majorticklabels(), 'size', 6) for item in sm.ravel()]
[plt.setp(item.xaxis.get_majorticklabels(), 'size', 6) for item in sm.ravel()]
plt.tight_layout(h_pad=0.15, w_pad=0.15)

All the above Pandas coding and statistical charts are nice and helpful, but in most practical situations they are useless for coming up with an algorithm for predicting whether a person is likely to have diabetes based on his 8 medical records. In recent years, deep neural networks have been suggested as an effective technique for solving such problems, and so far they have been successful in many areas. We will try to build a simple neural network for predicting whether a person has diabetes (0 or 1), based on the 8 features in his record. This is the full data except the class column! We want to be able to predict the class (positive or negative diabetes check) from the 8 features (as input).

# Let's first extract the first 8 features from our data (from the 9 we have)
# We want to be able to predict the class (positive or negative diabetes check)
X = data.iloc[:, 0:8]   # .ix is deprecated; .iloc selects columns by position

(Output of inspecting the first rows of the class column:)

0    1
1    0
2    1
3    0
4    1
5    0
6    1
7    0
8    1
9    1
Name: class, dtype: int64

A sequential neural network in Keras consists of a sequence of layers, starting from the input layer up to the output layer (also known as a feedforward neural network). The number and breadth of Continue reading >>
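A minimal sketch of the sequential Keras network the excerpt begins to describe, assuming the data DataFrame and its 'class' column from above (the layer sizes and training settings are illustrative choices, not the author's final architecture):

from keras.models import Sequential
from keras.layers import Dense

y = data['class']  # the target column, as used in the groupby above

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))  # one hidden layer over the 8 input features
model.add(Dense(1, activation='sigmoid'))             # output layer: probability of a positive check
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X.values, y.values, epochs=150, batch_size=10, verbose=0)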

Data Analysis And Visualization In Python (pima Indians Diabetes Data Set)

Data analysis and visualization in Python (Pima Indians diabetes data set) in data-visualization - on October 14, 2017 - No comments

Today I am going to perform data analysis for a very common data set, i.e. the Pima Indians Diabetes data set. You can download the data from here. I'll not give the meta information here in detail because it is given exclusively here, so it is recommended reading for all who want to understand the complete data analysis, to see what kind of data we are working with. In our analysis we'll be using two major Python libraries for analysis and visualization: Pandas for data processing, cleaning and analysis, and Matplotlib for visualization of our data.

We start by calculating the descriptives, which allow us to see the data summary. One of the reasons why initial descriptives are important is that we see the data summary and can do preprocessing again if we find any potential outliers, and normalization if there is a significant difference in scales between the variables. Normalization makes our analysis easier, especially when we try to visualize data. If you are using pandas, there is a very simple way of calculating the descriptive statistics: we see that 'df_temp.describe()' does all the calculations. We drop the binary variable 'diabetes?' in df_temp because its descriptive statistics should be calculated with the binomial distribution formula, and the way pandas calculates the descriptives will not give any insights. The rest of the work is just loading the data and mapping the columns and meta information. The df.describe() method will return the following output. Here I am not going to spend much time interpreting the results because it is very basic and you can find various sources for the interpretation of these metrics. The idea here is to Continue reading >>
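A minimal sketch of the descriptives step described above (the file name follows common usage and the 'diabetes?' column label follows the text; both are assumptions about the author's exact setup):

import pandas as pd

df = pd.read_csv('pima-indians-diabetes.csv')
df_temp = df.drop('diabetes?', axis=1)  # drop the binary outcome before describing
print(df_temp.describe())               # count, mean, std, min, quartiles, max per variable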

Introduction To Machine Learning With Python And Scikit-learn

Introduction to Machine Learning with Python and Scikit-Learn. My name is Alex. I deal with machine learning and web graph analysis (mostly in theory). I also work on the development of Big Data products for one of the mobile operators in Russia. It's the first time I've written a post, so please don't judge me too harshly. Nowadays, a lot of people want to develop efficient algorithms and take part in machine learning competitions. So they come to me and ask: "Where do I start?" Some time ago, I led the development of Big Data tools for the analysis of media and social networks in one of the institutions of the Government of the Russian Federation. I still have some documentation my team used, and I'd like to share it with you. It is assumed that the reader has a good knowledge of mathematics and machine learning (my team mostly consisted of graduates of MIPT, the Moscow Institute of Physics and Technology, and the School of Data Analysis). Actually, it has been an introduction to Data Science, a science that has become quite popular recently. Competitions in machine learning are increasingly being held (for example, Kaggle, TudedIT), and their budgets are often quite considerable. The most common tools for a data scientist today are R and Python. Each tool has its pros and cons, but Python has recently been winning in all respects (this is just IMHO; I use both R and Python, though). This happened after the appearance of the very well documented Scikit-Learn library, which contains a great number of machine learning algorithms. Please note that we will focus on machine learning algorithms in this article. It is usually better to perform the primary data analysis by means of the Pandas package, which is quite simple to deal with on your own. So, let's focus on implementation. For definiteness, we assume tha Continue reading >>

Understanding K-nearest Neighbours With The Pima Indians Diabetes Dataset

Understanding k-Nearest Neighbours with the PIMA Indians Diabetes dataset. k-nearest neighbours (kNN) is one of the simplest supervised learning strategies: given a new, unknown observation, it simply looks up in the reference database which ones have the closest features and assigns the predominant class. Let's try to understand kNN with examples.

# Importing required packages
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import seaborn
from pprint import pprint
%matplotlib inline

# Let's begin by exploring one of scikit-learn's easiest sample datasets, the Iris.
from sklearn.datasets import load_iris
iris = load_iris()
print(iris.keys())

['target_names', 'data', 'target', 'DESCR', 'feature_names']

# The Iris contains data about 3 types of Iris flowers, namely:
print(iris.target_names)
# Let's look at the shape of the Iris dataset
print(iris.data.shape)
print(iris.target.shape)
# So there is data for 150 Iris flowers and a target set with 0, 1, 2 depending on the type of Iris.
# Let's look at the features
print(iris.feature_names)
# Great, now the objective is to learn from this dataset so that, given a new Iris flower, we can best guess its type.
# Let's keep this simple to start with and train on the whole dataset.

['setosa' 'versicolor' 'virginica']
(150, 4)
(150,)
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

# Fitting the Iris dataset using kNN
X, y = iris.data, iris.target
# Fitting kNN with 1 neighbour. This is generally a very bad idea, since the 1st closest neighbour to each point is itself,
# so we will definitely overfit. It's equivalent to hardcoding labels for each row in the dataset.
iris_knn = KNeighborsClassifier(n_ Continue reading >>
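Since the post's stated goal is the Pima dataset, here is a minimal sketch of the same kNN recipe applied to it (the file name, column layout, and k=5 are assumptions, not the post's code):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X, y = data[:, 0:8], data[:, 8]
# hold out a test set this time, to avoid the overfitting pitfall noted above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.score(X_test, y_test))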

5. Datasets, Python and Machine Learning Programming Course #ebb625a, 2018-04-19 Documentation

The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other.

Fig. 5.5. Scatterplot of the Iris data set

Based on Fisher's linear discriminant model, this data set became a typical test case for many statistical classification techniques in machine learning, such as support vector machines.

Fig. 5.6. Unsatisfactory k-means clustering result (the data set does not cluster into the known classes) and actual species, visualized using ELKI

>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> print(iris.feature_names)
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
>>> print(iris.target_names)
['setosa' 'versicolor' 'virginica']
>>> print(iris.data[0])
[5.1 3.5 1.4 0.2]
>>> print(iris.target[0])
0

This problem comprises 768 observations of medical details for Pima Indian patients. The records describe instantaneous measurements taken from the patient, such as their age, the number of times pregnant, and blood workup. All patients are women aged 21 or older. All attributes are numeric, and their units vary from attribute to attribute.

Plasma glucose concentration at 2 hours in an oral glucose tolerance test
Body mass index (weight in kg/(height in m)^2)

The sklearn.dat Continue reading >>

End-to-end Example: Using Logistic Regression For Predicting Diabetes | Commonlounge

We have our data saved in a CSV file called diabetes.csv. We first read our dataset into a pandas dataframe called diabetesDF, and then use the head() function to show the first five records from our dataset.

Fig. First 5 records in the Pima Indians Diabetes Database

The following features have been provided to help us predict whether a person is diabetic or not:

Glucose: Plasma glucose concentration over 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
BMI: Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function (a function which scores the likelihood of diabetes based on family history)
Outcome: Class variable (0 if non-diabetic, 1 if diabetic)

Let's also make sure that our data is clean (has no null values, etc.). Note that the data does have some missing values (see Insulin = 0) in the samples in the previous figure. For the model we will be using (logistic regression), values of 0 imply that the model will simply be ignoring these values. Ideally we could replace these 0 values with the mean value for that feature, but we'll skip that for now. Let us now explore our data set to get a feel for what it looks like and get some insights about it. Let's start by finding the correlation of every pair of features (and the outcome variable), and visualizing the correlations using a heatmap, as in the sketch below.

Fig. Output of feature (and outcome) correlations
Fig. Heatmap of feature (and outcome) correlations

In the above heatmap, brighter colors indicate more correlation. As we can see from the table and the heatmap, glucose levels, age, BMI and number of pregnancies all have significant correlation with the outcome variable. Also notice the correlation between pairs of feat Continue reading >>
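A minimal sketch of the correlation heatmap step just described, assuming the diabetesDF dataframe from the text and seaborn for plotting (the plotting choices are assumptions, not the tutorial's exact code):

import seaborn as sns
import matplotlib.pyplot as plt

corr = diabetesDF.corr()        # pairwise feature (and outcome) correlations
sns.heatmap(corr, annot=True)   # brighter cells indicate stronger correlation
plt.show()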

Using A Neural Network To Predict Diabetes In Pima Indians

Using a neural network to predict diabetes in Pima Indians. Created a 95% accurate neural network to predict the onset of diabetes in Pima Indians. Pretty cool!

# theano: needed to navigate to c:/users/Alex Ko/.keras/keras.json and change tensorflow to theano
# Create first network with Keras
# note: init= and nb_epoch= below are Keras 1.x argument names
# (kernel_initializer= and epochs= in Keras 2)
import keras
from keras.models import Sequential
from keras.layers import Dense
import numpy
import pandas as pd
import sklearn
from sklearn.preprocessing import StandardScaler

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load pima indians dataset
dataset = numpy.loadtxt('pima-indians-diabetes.csv', delimiter=",")
# dataset = pd.read_csv('pima-indians-diabetes.csv')
data = pd.DataFrame(dataset)  # data is a pandas DataFrame; dataset is a plain numpy array
print(data.head())

# split into input (X, the independent variables) and output (Y, the dependent variable)
X = dataset[:, 0:8]  # columns 0-7 are the input features (the 8th column is not included)
Y = dataset[:, 8]    # column 8 is the output variable

# standardize the inputs
scaler = StandardScaler()
X = scaler.fit_transform(X)

# create model
model = Sequential()
# model.add(Dense(1000, input_dim=8, init='uniform', activation='relu'))  # 1000 neurons
# model.add(Dense(100, init='uniform', activation='tanh'))  # 100 neurons with tanh activation function
model.add(Dense(500, input_dim=8, init='uniform', activation='relu'))  # 500 neurons (input_dim is needed on the first layer)
# 95.41% accuracy with 500 neurons
# 86.99% accuracy with 100 neurons
# 85.2% accuracy with 50 neurons
# 81.38% accuracy with 10 neurons
model.add(Dense(1, init='uniform', activation='sigmoid'))  # 1 output neuron

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit the model
model.fit(X, Y, nb_epoch=150, batch_size=10, verbose=2)  # 150 epochs, batch size 10

# evaluate the model
scores = model.evaluate(X Continue reading >>

Applying Scikit-learn Random Forest Algorithm To Pima Indian Diabetes Dataset

Applying scikit-learn's Random Forest Algorithm to the Pima Indian Diabetes Dataset. In this Data Science Recipe, the reader will learn (a short sketch follows this list):

How to organise a predictive modelling machine learning project step by step.
What the different steps in predictive modelling and applied machine learning are.
How to summarise and present feature variables in predictive modelling (descriptive statistics).
How to visualise features through histograms, density plots, box plots and scatter matrices.
How to find correlations among feature variables.
How to do data analysis for feature and target variables.
How to utilise the Random Forest algorithm and the sklearn and pandas packages in Python.
How to implement tree-based bagging algorithms for binary classification in Python.
How to implement the sklearn random forest classifier in Python.
How to set up random forest hyper-parameters: manual and automatic tuning in Python.
How to set up RandomizedSearchCV and GridSearchCV for parameter tuning in Python.
How to perform K-fold cross validation in Python.
How to compare classifiers with Accuracy and Kappa in Python.

Machine learning is the science of getting computers to act without being explicitly programmed. It is a subset of AI: Artificial Intelligence. Predictive modelling is a branch of machine learning that particularly deals with tabular data to explicitly find patterns and/or insights in the available data. There are common classes of problems in machine learning, and the problems discussed below are standard for most ML-based predictive modelling. Classification (or supervised learning): data are labelled, meaning that they are assigned to classes, for example spam/non-spam or fraud/non-fraud. The decision being modelled is to assign labels to new, unlabelled pieces of data. Classification should be B Continue reading >>
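A minimal sketch of the random forest plus K-fold cross-validation recipe outlined in the list above (the file name, column layout, and hyper-parameters are assumptions, not the recipe's exact code):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv('pima-indians-diabetes.csv', header=None)
X, y = df.iloc[:, 0:8], df.iloc[:, 8]

model = RandomForestClassifier(n_estimators=100, random_state=7)
scores = cross_val_score(model, X, y, cv=10, scoring='accuracy')  # 10-fold cross validation
print(scores.mean())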

Python Basics: Logistic Regression With Python

Python Basics: Logistic regression with Python. Logistic regression is one of the basics of data analysis and statistics. The goal of the regression is to predict an outcome: will I sell my car or not? Is this bank transfer fraudulent? Is this patient ill or not? All these outcomes can be encoded as 0 and 1; a fraudulent bank transfer could be encoded as 1 while a regular one would be encoded as 0. As with linear regression, the input variables can be either categorical or continuous. In this tutorial, we will create a logistic regression model to predict whether or not someone has diabetes. The dataset that will be used is from Kaggle: the Pima Indians Diabetes Database. It has 9 variables: Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, Outcome. Here is the variable description from Kaggle:

Glucose: Plasma glucose concentration over 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
BMI: Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function

All these variables are continuous; the goal of the tutorial is to predict whether someone has diabetes (Outcome = 1) according to the other variables. It is worth noticing that all the observations are from women older than 21 years old. First, please download the data. Then, with pandas, we will read the CSV:

import pandas as pd
import numpy as np

Diabetes = pd.read_csv('diabetes.csv')
table1 = np.mean(Diabetes, axis=0)
table2 = np.std(Diabetes, axis=0)

To understand the data, let's take a look at the different variables' means and standard deviations.

Fig. Mean and standard deviation of the variables

The data are unbalanced, with 35% of observations having diabetes. The standard deviati Continue reading >>
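A minimal sketch of fitting the logistic regression described above with scikit-learn, reusing the Diabetes dataframe from the snippet (the train/test split and solver settings are assumptions, not the tutorial's exact code):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = Diabetes.drop('Outcome', axis=1)   # the 8 predictor variables
y = Diabetes['Outcome']                # 1 = diabetic, 0 = non-diabetic

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(logreg.score(X_test, y_test))    # accuracy on held-out data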
