
MACHINE LEARNING

In this blog, you will read a brief summary of my online internship on machine learning.

The following are the topics I learned during my machine learning internship:

Overview of Python

Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to be highly readable. It uses English keywords frequently, whereas other languages use punctuation, and it has fewer syntactical constructions than other languages.

  • Python is Interpreted: Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to Perl and PHP.

  • Python is Interactive: You can sit at a Python prompt and interact with the interpreter directly to write your programs.

  • Python is Object-Oriented: Python supports the object-oriented style of programming, which encapsulates code within objects.

  • Python is a Beginner's Language: Python is a great language for beginner programmers and supports the development of a wide range of applications, from simple text processing to web browsers to games.

Python Features

  • Easy-to-learn: Python has few keywords, simple structure, and a clearly defined syntax. This allows a student to pick up the language quickly.

  • Easy-to-read: Python code is clearly defined and easy to read.

  • Easy-to-maintain: Python's source code is fairly easy to maintain.

  • A broad standard library: The bulk of Python's library is very portable and cross-platform compatible on UNIX, Windows and Macintosh.

  • Interactive Mode: Python has support for an interactive mode, which allows interactive testing and debugging of snippets of code.

  • Portable: Python can run on a wide variety of hardware platforms and has the same interface on all platforms.

  • Extendable: You can add low-level modules to the Python interpreter. These modules enable programmers to add to or customize their tools to be more efficient.

  • Databases: Python provides interfaces to all major commercial databases.

  • GUI Programming: Python supports GUI applications that can be created and ported to many system calls, libraries and windowing systems, such as Windows MFC, Macintosh and the X Window System of Unix.

  • Scalable: Python provides a better structure and support for large programs than shell scripting.

Setting up Path

  • Programs and other executable files can be in many directories. Hence, operating systems provide a search path that lists the directories in which they search for executables. The important features are:

  • The path is stored in an environment variable, which is a named string maintained by the operating system. This variable contains information available to the command shell and other programs.

  • The path variable is named PATH in Unix or Path in Windows (Unix is case-sensitive; Windows is not).
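To see the search path from inside Python, here is a minimal sketch using only the standard library. `os.environ` reads the environment variable, `os.pathsep` is the platform's separator (":" on Unix, ";" on Windows) and `shutil.which()` performs the same lookup the shell does. The helper name `path_directories` is hypothetical.

```python
import os
import shutil


def path_directories():
    """Return the directories listed in PATH, in the order they are searched."""
    # os.pathsep is ":" on Unix and ";" on Windows; empty entries are skipped
    return [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]


if __name__ == "__main__":
    for directory in path_directories():
        print(directory)
    # shutil.which() returns the full path of the first match on PATH, or None
    print("python3 found at:", shutil.which("python3"))
```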

Python Identifiers

  • A Python identifier is a name used to identify a variable, function, class, module or other object. An identifier starts with a letter (A to Z or a to z) or an underscore (_), followed by zero or more letters, underscores and digits (0 to 9). Python does not allow punctuation characters such as @, $ and % within identifiers. Python is a case-sensitive programming language, so Manpower and manpower are two different identifiers. Here are the naming conventions for Python identifiers:

  • Class names start with an uppercase letter. All other identifiers start with a lowercase letter.

  • Starting an identifier with a single leading underscore indicates, by convention, that the identifier is private.

  • Starting an identifier with two leading underscores indicates a strongly private identifier (inside a class, Python mangles such names).

  • If the identifier also ends with two trailing underscores, it is a language-defined special name.
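The rules and conventions above can be checked with a short sketch. `str.isidentifier()` and `keyword.iskeyword()` are standard-library calls. The `Account` class and its attributes are hypothetical. They show that a single underscore is only a convention, while a double underscore triggers name mangling.

```python
import keyword

print("manpower".isidentifier())    # True
print("man$power".isidentifier())   # False: $ is not allowed
print("2fast".isidentifier())       # False: cannot start with a digit
print(keyword.iskeyword("class"))   # True: keywords cannot be used as identifiers


class Account:                      # class names start with an uppercase letter
    def __init__(self):
        self._balance = 0           # single underscore: "private" by convention only
        self.__pin = 1234           # double underscore: stored as _Account__pin

    def __repr__(self):             # leading and trailing double underscores: special name
        return f"Account(balance={self._balance})"


acct = Account()
print(acct._balance)                # 0: still accessible, the underscore is only a hint
print(hasattr(acct, "__pin"))       # False: the name was mangled
print(acct._Account__pin)           # 1234
print(acct)                         # Account(balance=0)
```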

Standard Data Types

The data stored in memory can be of many types. For example, a person's age is stored as a numeric value and his or her address is stored as alphanumeric characters. Python has various standard data types that are used to define the operations possible on them and the storage method for each of them. Python has five standard data types:

  • Numbers

  • String

  • List

  • Tuple

  • Dictionary

Python Numbers

  • Number data types store numeric values. Number objects are created when you assign a value to them, for example var1 = 1 and var2 = 10.

  • You can also delete the reference to a number object by using the del statement. The syntax of the del statement is: del var1[, var2[, var3[, ..., varN]]]

  • You can delete a single object or multiple objects by using the del statement, for example del var or del var_a, var_b.

  • Python supports three different numerical types:

  • int (signed integers)

  • float (floating point real values)

  • complex (complex numbers)

  • All integers in Python 3 have arbitrary precision (Python 2 called these long integers). Hence, there is no separate long number type.
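Here is a small sketch of the three numeric types and the del statement. The variable names follow the examples in the text.

```python
var1 = 1                 # int
var2 = 10.5              # float
var3 = 3 + 4j            # complex

print(type(var1), type(var2), type(var3))
print(abs(var3))         # 5.0: the magnitude of the complex number
print(2 ** 100)          # ints grow as needed; there is no separate long type

del var1, var2           # remove both references in one statement
# print(var1) would now raise NameError
```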


Python Strings

  • Strings in Python are identified as a contiguous set of characters enclosed in quotation marks. Python allows either single or double quotes. Subsets of strings can be taken using the slice operator ([ ] and [:]). Indexes start at 0 at the beginning of the string, and negative indexes count backwards from -1 at the end. The plus (+) sign is the string concatenation operator and the asterisk (*) is the repetition operator. For example:

#!/usr/bin/python3
text = 'Hello World!'   # named text so the built-in str() is not shadowed
print(text)             # Prints complete string
print(text[0])          # Prints first character of the string
print(text[2:5])        # Prints characters from the 3rd to the 5th
print(text[2:])         # Prints string starting from the 3rd character
print(text * 2)         # Prints string two times
print(text + "TEST")    # Prints concatenated string

Python Lists

  • Lists are the most versatile of Python's compound data types. A list contains items separated by commas and enclosed within square brackets ([]). To some extent, lists are similar to arrays in C. One difference is that the items belonging to a list can be of different data types. The values stored in a list can be accessed using the slice operator ([ ] and [:]). Indexes start at 0 at the beginning of the list, and -1 refers to the last item. The plus (+) sign is the list concatenation operator, and the asterisk (*) is the repetition operator. For example:

#!/usr/bin/python3
my_list = ['abcd', 786, 2.23, 'john', 70.2]   # named my_list so the built-in list() is not shadowed
tinylist = [123, 'john']
print(my_list)              # Prints complete list
print(my_list[0])           # Prints first element of the list
print(my_list[1:3])         # Prints elements from the 2nd to the 3rd
print(my_list[2:])          # Prints elements starting from the 3rd element
print(tinylist * 2)         # Prints list two times
print(my_list + tinylist)   # Prints concatenated lists

Python Tuples

  • A tuple is another sequence data type that is similar to the list. A tuple consists of a number of values separated by commas. Unlike lists, however, tuples are enclosed within parentheses. The main difference between lists and tuples is that lists are enclosed in brackets ( [ ] ) and their elements and size can be changed, while tuples are enclosed in parentheses ( ( ) ) and cannot be updated. Tuples can be thought of as read-only lists. For example:

#!/usr/bin/python3
my_tuple = ('abcd', 786, 2.23, 'john', 70.2)   # named my_tuple so the built-in tuple() is not shadowed
tinytuple = (123, 'john')
print(my_tuple)               # Prints complete tuple
print(my_tuple[0])            # Prints first element of the tuple
print(my_tuple[1:3])          # Prints elements from the 2nd to the 3rd
print(my_tuple[2:])           # Prints elements starting from the 3rd element
print(tinytuple * 2)          # Prints tuple two times
print(my_tuple + tinytuple)   # Prints concatenated tuple

Python Dictionary

  • Python's dictionaries are a kind of hash table. They work like associative arrays or hashes found in Perl and consist of key-value pairs. A dictionary key can be almost any immutable (hashable) Python type, but keys are usually numbers or strings. Values, on the other hand, can be any arbitrary Python object. Dictionaries are enclosed by curly braces ({ }) and values can be assigned and accessed using square brackets ([]). For example:

#!/usr/bin/python3
my_dict = {}                    # named my_dict so the built-in dict() is not shadowed
my_dict['one'] = "This is one"
my_dict[2] = "This is two"
tinydict = {'name': 'john', 'code': 6734, 'dept': 'sales'}
print(my_dict['one'])           # Prints value for 'one' key
print(my_dict[2])               # Prints value for 2 key
print(tinydict)                 # Prints complete dictionary
print(tinydict.keys())          # Prints all the keys
print(tinydict.values())        # Prints all the values

Data Type Conversion

  • Sometimes, you may need to perform conversions between the built-in types. To convert between types, you simply use the type-name as a function. There are several built-in functions to perform conversion from one data type to another. These functions return a new object representing the converted value.
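Here is a short sketch of common conversions that use the built-in type names as functions (int(), float(), str(), list(), tuple(), dict(), set()). The variable names are arbitrary.

```python
n = int("42")                         # str -> int
f = float("3.14")                     # str -> float
t = int(3.99)                         # float -> int truncates toward zero: 3
s = str(786) + "!"                    # int -> str, so + concatenates: "786!"
chars = list("abc")                   # str -> list: ['a', 'b', 'c']
pair = tuple([1, 2])                  # list -> tuple: (1, 2)
d = dict([("one", 1), ("two", 2)])    # list of key-value pairs -> dict
unique = set([1, 1, 2])               # list -> set, duplicates removed: {1, 2}
h = int("ff", 16)                     # base-16 string -> int: 255

print(n + 1, f, t, s, chars, pair, d, unique, h)
```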


Basic Operators

Operators are constructs that manipulate the values of operands. Consider the expression 4 + 5 = 9. Here, 4 and 5 are called operands and + is called the operator.

Types of Operators

Python supports the following types of operators:

  • Arithmetic Operators

  • Comparison (Relational) Operators

  • Assignment Operators

  • Logical Operators

  • Bitwise Operators

  • Membership Operators

  • Identity Operators
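Here is one line per operator category from the list above, as a sketch. The operand values 10 and 3 are arbitrary, and the comments explain the less obvious results.

```python
a, b = 10, 3

# Arithmetic: / is true division, // is floor division, ** is exponentiation
arithmetic = (a + b, a - b, a * b, a / b, a // b, a % b, a ** b)
# Comparison (relational)
comparison = (a > b, a == b, a != b)
# Assignment: augmented operators such as += update a variable
c = a
c += 5                                  # c is now 15
# Logical
logical = (a > 5 and b > 5, a > 5 or b > 5, not a > 5)
# Bitwise (10 is 0b1010, 3 is 0b0011)
bitwise = (a & b, a | b, a ^ b, ~a, a << 1, a >> 1)
# Membership
membership = (3 in [1, 2, 3], "x" not in "abc")
# Identity: "is" checks for the same object, "==" checks for equal values
x = [1, 2]
y = x
z = [1, 2]
identity = (x is y, x is z, x == z)

print(arithmetic, comparison, c, logical, bitwise, membership, identity)
```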

Decision Making

Decision-making is the anticipation of conditions that occur during the execution of a program, together with the actions specified for those conditions. Decision structures evaluate one or more expressions that produce True or False as the outcome. You need to determine which action to take and which statements to execute if the outcome is True, and optionally which statements to execute if it is False.

Python treats any non-zero and non-null values as True, and any zero or null (None) values as False. For decision making, Python provides the if statement, the if...else statement (optionally extended with elif) and nested if statements.
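Here is a minimal sketch of if / elif / else together with Python's truth testing. The describe() helper is hypothetical. Empty strings and empty containers also count as False, in addition to zero and None.

```python
def describe(n):
    """Classify a number using if / elif / else."""
    if n > 0:
        return "positive"
    elif n < 0:
        return "negative"
    else:
        return "zero"


print(describe(5), describe(-2), describe(0))   # positive negative zero

# Truth testing: an if statement accepts any value, not just True/False
values = [0, 7, None, "", "text", [], [0]]
for value in values:
    if value:
        print(repr(value), "-> treated as True")
    else:
        print(repr(value), "-> treated as False")
```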


Loops

In general, statements are executed sequentially: the first statement in a function is executed first, followed by the second, and so on. There may be situations when you need to execute a block of code several times. Programming languages provide various control structures that allow more complicated execution paths. A loop statement allows us to execute a statement or group of statements multiple times. Python provides the while loop and the for loop, and loops can be nested inside one another.
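Here is a sketch of Python's two loop statements, for and while, plus a nested loop. The values are arbitrary.

```python
# for loop: runs once for each item in a sequence
total = 0
for n in [1, 2, 3, 4]:
    total += n
print("sum:", total)            # sum: 10

# while loop: repeats as long as its condition is true
count = 3
while count > 0:
    print("countdown", count)
    count -= 1

# nested loops: the inner loop runs completely for each outer iteration
pairs = []
for i in range(2):
    for j in range(2):
        pairs.append((i, j))
print(pairs)                    # [(0, 0), (0, 1), (1, 0), (1, 1)]
```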


Loop Control Statements

Loop control statements change execution from its normal sequence. Python provides three of them: break, continue and pass. Unlike C++, a loop in Python does not create a new scope, so variables assigned inside a loop are still available after the loop finishes.

Break Statement

The break statement is used for premature termination of the current loop. After the loop is abandoned, execution resumes at the next statement, just like the traditional break statement in C. The most common use of break is when some external condition is triggered that requires a hasty exit from a loop. The break statement can be used in both while and for loops. In nested loops, break stops only the innermost loop, and execution continues with the first line of code after that loop's block.

Continue Statement

The continue statement in Python returns control to the beginning of the current loop. When it is encountered, the loop starts the next iteration without executing the remaining statements in the current iteration.

The continue statement can be used in both while and for loops.

Pass Statement

The pass statement is used when a statement is required syntactically but you do not want any command or code to execute. It is a null operation; nothing happens when it executes. The pass statement is also useful in places where your code will eventually go but has not been written yet (i.e. in stubs).
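Here is a sketch of the three statements. The names letters_before, letters_without and Placeholder are hypothetical. The results go into lists instead of being printed, so the effects of break and continue are easy to compare.

```python
def letters_before(word, stop):
    """Collect letters until `stop` is seen; break ends the loop early."""
    result = []
    for letter in word:
        if letter == stop:
            break
        result.append(letter)
    return result


def letters_without(word, skip):
    """Collect all letters except `skip`; continue skips one iteration."""
    result = []
    for letter in word:
        if letter == skip:
            continue
        result.append(letter)
    return result


class Placeholder:
    pass        # stub: a class body is required syntactically


print(letters_before("Python", "h"))    # ['P', 'y', 't']
print(letters_without("Python", "h"))   # ['P', 'y', 't', 'o', 'n']
```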

​

Iterator and Generator

An iterator is an object that allows a programmer to traverse all the elements of a collection, regardless of its specific implementation. In Python, an iterator object implements two methods, __iter__() and __next__(), which are called by the built-in functions iter() and next(). String, list or tuple objects can be used to create an iterator.

A generator is a function that produces, or yields, a sequence of values using the yield statement. When a generator function is called, it returns a generator object without beginning execution of the function. When next() is called on that object for the first time, the function runs until it reaches a yield statement, which returns the yielded value. The generator remembers where its execution stopped, and the next call to next() resumes from that point.
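Here is a sketch of both ideas. iter() and next() drive a list iterator by hand, and the hypothetical countdown() generator shows execution pausing at each yield and resuming on the following next() call.

```python
numbers = [10, 20, 30]
it = iter(numbers)         # calls numbers.__iter__()
print(next(it))            # 10 (calls it.__next__())
print(next(it))            # 20


def countdown(n):
    """Generator: yields n, n-1, ..., 1, pausing after each yield."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)         # nothing runs yet; a generator object is returned
print(next(gen))           # 3
print(next(gen))           # 2: resumes right after the previous yield
print(list(gen))           # [1]: the remaining values
```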

Exception Handling

Python provides two very important features to handle any unexpected error in your Python programs and to add debugging capabilities to them:

  • Exception Handling.

  • Assertions.

Here is a list of a few standard exceptions available in Python:

  • Exception - Base class for all regular (non-system-exiting) exceptions. In Python 3, BaseException is the root of the whole hierarchy.

  • StopIteration - Raised when next() is called on an iterator that has no further items.

  • SystemExit - Raised by the sys.exit() function.

  • StandardError - Python 2 only: base class for all built-in exceptions except StopIteration and SystemExit. It was removed in Python 3, where most built-in exceptions derive from Exception.

  • ArithmeticError - Base class for all errors that occur during numeric calculation.

  • OverflowError - Raised when a calculation exceeds the maximum limit for a numeric type.
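The Python 3 hierarchy described above can be checked with issubclass(). This sketch uses only built-in exception classes.

```python
print(issubclass(OverflowError, ArithmeticError))   # True
print(issubclass(ArithmeticError, Exception))       # True
print(issubclass(StopIteration, Exception))         # True
print(issubclass(SystemExit, Exception))            # False: derives from BaseException directly
print(issubclass(SystemExit, BaseException))        # True
```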

What is Exception?

An exception is an event that occurs during the execution of a program and disrupts the normal flow of the program's instructions. In general, when a Python script encounters a situation that it cannot cope with, it raises an exception. An exception is a Python object that represents an error. When a Python script raises an exception, it must handle the exception immediately; otherwise the script terminates and quits.

Handling an Exception

If you have some suspicious code that may raise an exception, you can defend your program by placing the suspicious code in a try: block. After the try: block, include an except: statement, followed by a block of code that handles the problem as elegantly as possible.

Here are a few important points about the syntax:

  • A single try statement can have multiple except statements. This is useful when the try block contains statements that may throw different types of exceptions.

  • You can also provide a generic except clause, which handles any exception.

  • After the except clause(s), you can include an else clause. The code in the else block executes if the code in the try: block does not raise an exception.

  • The else block is a good place for code that does not need the try: block's protection.
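Here is a sketch of the syntax points above: several except clauses, a generic except Exception clause and an else block. safe_divide() is a hypothetical helper.

```python
def safe_divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError:            # one specific exception type
        print("Cannot divide by zero")
        return None
    except TypeError as err:             # another specific exception type
        print("Bad operand types:", err)
        return None
    except Exception as err:             # generic handler for anything else
        print("Unexpected error:", err)
        return None
    else:
        # runs only if the try block raised nothing
        print("Division succeeded")
        return result


print(safe_divide(10, 2))     # Division succeeded, then 5.0
print(safe_divide(1, 0))      # Cannot divide by zero, then None
print(safe_divide("a", 2))    # Bad operand types: ..., then None
```

Catching Exception is usually preferred over a bare except: clause, because a bare except also catches SystemExit and KeyboardInterrupt.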

​

​

Files I/O

Reading Keyboard Input

Python 2 has two built-in functions to read data from standard input, which by default comes from the keyboard: input() and raw_input().

In Python 3, raw_input() has been removed and input() takes over its behavior. input() always returns the data read from the keyboard as a string, whether or not it is enclosed in quotes ('' or "").

Opening and Closing Files

Until now, you have been reading from standard input and writing to standard output. Now we will see how to use actual data files. By default, Python provides the basic functions and methods needed to manipulate files. You can do most of the file manipulation using a file object.

The open Function

Before you can read or write a file, you have to open it using Python's built-in open() function. This function creates a file object, which is then used to call the other methods associated with it.
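Here is a sketch of writing and then reading a file with open() inside a with statement, which closes the file automatically. The file name and contents are arbitrary. The file goes in the system temporary directory so the example does not touch your working folder.

```python
import os
import tempfile

# An arbitrary file name inside the system temp directory
path = os.path.join(tempfile.gettempdir(), "python_file_demo.txt")

# "w" opens for writing (creating or truncating the file)
with open(path, "w", encoding="utf-8") as f:
    f.write("Line one\nLine two\n")

# "r" opens for reading
with open(path, "r", encoding="utf-8") as f:
    first_line = f.readline()    # reads up to and including the first newline
    rest = f.read()              # reads everything that is left

print(first_line.strip())        # Line one
print(f.closed)                  # True: the with block closed the file
os.remove(path)                  # clean up
```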

Functions

A function is a block of organized, reusable code that is used to perform a single, related action. Functions provide better modularity for your application and a high degree of code reuse. As you already know, Python gives you many built-in functions like print(), but you can also create your own functions. These are called user-defined functions.

You can define functions to provide the required functionality. Here are simple rules to define a function in Python:

  • Function blocks begin with the keyword def followed by the function name and parentheses ( ( ) ).

  • Any input parameters or arguments should be placed within these parentheses. You can also define parameters inside these parentheses.

  • The first statement of a function can be an optional statement: the documentation string of the function, or docstring.

  • The code block within every function starts with a colon (:) and is indented.

  • The statement return [expression] exits a function, optionally passing back an expression to the caller. A return statement with no arguments is the same as return None.
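Here is a sketch that follows the rules above. greet() and log() are hypothetical example functions.

```python
def greet(name):
    """Return a greeting for name. (This string is the docstring.)"""
    return "Hello, " + name       # return passes a value back to the caller


def log(message):
    print(message)                # no return statement, so the call returns None


print(greet("Ada"))               # Hello, Ada
print(greet.__doc__)              # the docstring is stored on the function object
result = log("logging...")        # prints logging...
print(result is None)             # True
```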

​

Function Arguments

You can call a function by using the following types of formal arguments:

  • Required arguments

  • Keyword arguments

  • Default arguments

  • Variable-length arguments
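Here is a sketch that shows all four argument types in one hypothetical function, describe_pet(). name is required, animal has a default, *extras collects extra positional arguments and **details collects extra keyword arguments.

```python
def describe_pet(name, animal="dog", *extras, **details):
    """name: required; animal: default; *extras and **details: variable-length."""
    parts = [f"{name} the {animal}"]
    parts.extend(extras)
    parts.extend(f"{key}={value}" for key, value in sorted(details.items()))
    return ", ".join(parts)


print(describe_pet("Rex"))                              # required argument only
print(describe_pet(animal="cat", name="Tom"))           # keyword arguments, any order
print(describe_pet("Polly", "parrot", "talks", age=3))  # variable-length arguments
```

Calling describe_pet() with no arguments raises a TypeError, because name is required.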

Machine Learning with Python

What is Machine Learning?

Machine Learning (ML) is the field of computer science that enables computer systems to make sense of data in much the same way as human beings do.

In simple words, ML is a type of artificial intelligence that extracts patterns from raw data by using an algorithm or method. The main focus of ML is to allow computer systems to learn from experience without being explicitly programmed and without human intervention.

Need for Machine Learning

Human beings are, at this moment, the most intelligent and advanced species on earth because they can think, evaluate and solve complex problems. AI, on the other hand, is still in its initial stage and has not surpassed human intelligence in many aspects. So why do we need to make machines learn? The most suitable reason is “to make decisions, based on data, with efficiency and scale”.

Lately, organizations have been investing heavily in newer technologies like Artificial Intelligence, Machine Learning and Deep Learning to extract key information from data, perform real-world tasks and solve problems. We can call these data-driven decisions taken by machines, particularly to automate processes. Such data-driven decisions can be used, instead of programming logic, for problems that cannot be programmed inherently. We cannot do without human intelligence, but we also need to solve real-world problems efficiently and at a huge scale. That is why the need for machine learning arises.

Why & When to Make Machines Learn?

We have already discussed the need for machine learning, but another question arises: in which scenarios must we make the machine learn? There are several circumstances where we need machines to make data-driven decisions efficiently and at a huge scale, and in those circumstances making machines learn is more effective.

Methods to Load CSV Data File

While working on ML projects, the most crucial task is to load the data properly. The most common data format for ML projects is CSV, which comes in various flavors and with varying difficulty to parse. In this section, we discuss three common approaches in Python to load a CSV data file.

Load CSV with Python Standard Library

The first and most commonly used approach is Python's standard library, which provides the built-in csv module and its reader() function. In this example, we use the Iris flower dataset, which can be downloaded into a local directory. After loading the data file, we can convert it into a NumPy array and use it for ML projects.
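Loading with csv.reader() and converting to a NumPy array can be sketched as follows. The sketch does not read the downloaded Iris.csv. Instead it reads three sample Iris rows in the same column layout (Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species) from an in-memory string, so it runs without the file. With the real file, you would pass open('Iris.csv', newline='') to csv.reader().

```python
import csv
import io

import numpy as np

# Stand-in for Iris.csv: same header, first rows of each species only
sample = io.StringIO(
    "Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\n"
    "1,5.1,3.5,1.4,0.2,Iris-setosa\n"
    "51,7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "101,6.3,3.3,6.0,2.5,Iris-virginica\n"
)

reader = csv.reader(sample, delimiter=",")
header = next(reader)               # the first row holds the column names
rows = list(reader)                 # every remaining row is a list of strings

# Keep the four measurement columns (skip Id and Species) as floats
features = np.array([[float(v) for v in row[1:5]] for row in rows])
labels = [row[5] for row in rows]

print(header)
print(features.shape)               # (3, 4)
print(labels)
```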

Packages (libraries)

GENERAL PURPOSE LIBRARIES

1. Data processing: pandas

2. Computer vision: OpenCV

3. General purpose ML: PyTorch

Python-based frameworks

WEB APPLICATIONS

1. Full stack frameworks: Django

2. Micro frameworks: Flask

3. Asynchronous frameworks

DATA SCIENCE and ML

1. Math and statistics: math, NumPy, SciPy

2. Visualization: Matplotlib, seaborn

3. Machine Learning: scikit-learn (sklearn)

4. Deep learning: TensorFlow

5. Natural language processing: NLTK (Natural Language Toolkit)

6. Distributed deep learning: Keras, Elephas, Spark

7. Data scraping: Scrapy

Steps in Machine Learning

1 - Data Collection

  • The quantity & quality of your data dictate how accurate your model is

  • The outcome of this step is generally a representation of data (Guo simplifies to specifying a table) which we will use for training

  • Using pre-collected data, by way of datasets from Kaggle, UCI, etc., still fits into this step

 
2 - Data Preparation

  • Wrangle data and prepare it for training

  • Clean that which may require it (remove duplicates, correct errors, deal with missing values, normalization, data type conversions, etc.)

  • Randomize data, which erases the effects of the particular order in which we collected and/or otherwise prepared our data

  • Visualize data to help detect relevant relationships between variables or class imbalances (bias alert!), or perform other exploratory analysis

  • Split into training and evaluation sets
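Here is a minimal sketch of the preparation steps above, using pandas and scikit-learn on a small hypothetical DataFrame (the column names and values are invented for illustration). It removes a duplicate row, fills a missing value with the column mean, converts a text column to numbers, standardizes one column, shuffles the rows and makes an 80/20 split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: one duplicate row, one missing value, numbers stored as text
df = pd.DataFrame({
    "height_cm": [150, 160, 160, np.nan, 175, 180],
    "weight_kg": ["50", "60", "60", "65", "72", "80"],
    "label":     [0, 1, 1, 0, 1, 1],
})

df = df.drop_duplicates()                                           # remove duplicates
df["height_cm"] = df["height_cm"].fillna(df["height_cm"].mean())    # deal with missing values
df["weight_kg"] = df["weight_kg"].astype(float)                     # data type conversion
df["height_cm"] = (df["height_cm"] - df["height_cm"].mean()) / df["height_cm"].std()  # z-score normalization

df = df.sample(frac=1, random_state=0).reset_index(drop=True)       # randomize row order

x = df[["height_cm", "weight_kg"]].values
y = df["label"].values
x_train, x_eval, y_train, y_eval = train_test_split(x, y, test_size=0.2, random_state=0)
print(len(x_train), len(x_eval))      # 4 1
```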

 
3 - Choose a Model

  • Different algorithms suit different tasks; choose the right one

 
4 - Train the Model

  • The goal of training is to answer a question or make a prediction correctly as often as possible

  • Linear regression example: the algorithm would need to learn values for m (or W) and b in y = mx + b (x is input, y is output)

  • Each iteration of process is a training step
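Here is a small sketch that makes the linear regression example concrete. It learns m and b by gradient descent on the mean squared error, which is one common training procedure assumed here for illustration. (scikit-learn's LinearRegression solves least squares directly instead.) The data, learning rate and number of steps are hypothetical choices. The data is generated from y = 2x + 1, so the correct answer is known in advance.

```python
import numpy as np

# Hypothetical training data from y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

m, b = 0.0, 0.0               # parameters to learn
learning_rate = 0.05

for step in range(2000):      # each iteration is one training step
    y_pred = m * x + b
    error = y_pred - y
    # gradients of the mean squared error with respect to m and b
    grad_m = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print(round(m, 3), round(b, 3))   # close to 2.0 and 1.0
```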

 
5 - Evaluate the Model

  • Uses some metric or combination of metrics to "measure" objective performance of model

  • Test the model against previously unseen data

  • This unseen data is meant to be somewhat representative of model performance in the real world, but still helps tune the model (as opposed to test data, which does not)

  • Good train/eval split? 80/20, 70/30, or similar, depending on domain, data availability, dataset particulars, etc.

 
6 - Parameter Tuning

  • This step refers to hyperparameter tuning, which is an "artform" as opposed to a science

  • Tune model hyperparameters for improved performance

  • Simple model hyperparameters may include: number of training steps, learning rate, initialization values and distribution, etc.
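One common way to tune hyperparameters is an exhaustive grid search with cross-validation. Here is a sketch of that approach using scikit-learn's GridSearchCV and the built-in Iris data (load_iris). It tunes a k-nearest-neighbours classifier over an arbitrary illustrative grid. A test set is held back first, so tuning never sees it.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

x, y = load_iris(return_X_y=True)
# Withhold a test set; tuning only ever sees the training portion
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Candidate hyperparameter values (an arbitrary illustrative grid)
param_grid = {"n_neighbors": [1, 3, 5, 7, 9], "weights": ["uniform", "distance"]}

# 5-fold cross-validation on the training data for each of the 5 x 2 = 10 combinations
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(x_train, y_train)

print("best hyperparameters:", search.best_params_)
print("mean cross-validation accuracy:", round(search.best_score_, 3))
# By default the best combination is refit on all the training data
test_accuracy = search.score(x_test, y_test)
print("accuracy on the withheld test set:", round(test_accuracy, 3))
```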

 
7 - Make Predictions

  • Further data (the test set), which until this point has been withheld from the model and for which class labels are known, is used to test the model; this gives a better approximation of how the model will perform in the real world

Types of Machine Learning

SUPERVISED LEARNING: Supervised learning is the first type of machine learning, in which labelled data is used to train the algorithms. In supervised learning, algorithms are trained on labelled data, where both the input and the output are known. Types of Supervised Learning:

1. Regression

2. Classification

UNSUPERVISED LEARNING: Unsupervised learning is the second type of machine learning, in which unlabelled data is used to train the algorithm; that is, it is applied to data that has no historical labels. The algorithm must figure out what the data shows on its own. The purpose is to explore the data and find some structure within it.

Types of Unsupervised Learning:

1. Clustering

2. Dimensionality reduction
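Here is a sketch of both unsupervised tasks. It uses scikit-learn's built-in copy of the Iris data (load_iris, so no CSV is needed) and deliberately ignores the labels. KMeans does the clustering and PCA does the dimensionality reduction. The choice of 3 clusters and 2 components is an assumption made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

x, _ = load_iris(return_X_y=True)       # labels are deliberately discarded

# Clustering: group the 150 flowers into 3 clusters using only the measurements
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(x)
print(cluster_ids[:10])

# Dimensionality reduction: project the 4 features onto 2 principal components
pca = PCA(n_components=2)
x_2d = pca.fit_transform(x)
print(x_2d.shape)                              # (150, 2)
print(pca.explained_variance_ratio_.sum())     # share of variance kept by 2 components
```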

ChatterBot

ChatterBot is a machine-learning-based conversational dialog engine built in Python that makes it possible to generate responses based on collections of known conversations. The language-independent design of ChatterBot allows it to be trained to speak any language.

An untrained instance of ChatterBot starts off with no knowledge of how to communicate. Each time a user enters a statement, the library saves the text that they entered and the text that the statement was in response to. As ChatterBot receives more input, the number of responses it can give and the accuracy of each response in relation to the input statement increase. To reply, the program finds the known statement that most closely matches the input. It then returns the most likely response to that statement, based on how frequently each response is issued by the people the bot communicates with.

pip install chatterbot
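ChatterBot's own API is not shown here. Instead, this is a toy sketch of the selection logic described above, written with the standard library's difflib rather than ChatterBot itself. It finds the closest known statement, then returns the response people gave to it most often. The known dictionary and the respond() helper are hypothetical.

```python
import difflib
from collections import Counter

# Hypothetical "known conversations": statement -> responses people have given to it
known = {
    "hello": ["hi", "hi", "hello there"],
    "how are you": ["i am fine", "good, thanks", "i am fine"],
    "what is your name": ["i am a bot"],
}


def respond(statement):
    """Pick the closest known statement, then its most frequent response."""
    matches = difflib.get_close_matches(statement.lower(), list(known), n=1, cutoff=0.0)
    responses = known[matches[0]]
    return Counter(responses).most_common(1)[0][0]


print(respond("Hello!"))               # hi
print(respond("How are you doing?"))   # i am fine
```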

Iris Classification (Logistic Regression)

Classify flowers into their species based on the length and width of their sepals and petals.

#data collection
#import libraries and dataframe
import pandas as pd
data = pd.read_csv('Iris.csv')

#data interpretation (outside a notebook, print() is needed to see these results)
data.info()
print(data.describe())
print(data['Species'].unique())

#data cleaning
data.drop('Id',inplace=True,axis=1) #axis=1 drops a column (default: axis=0, rows)
#Replace species with 0,1,2
classes = {'Iris-setosa':0, 'Iris-versicolor':1, 'Iris-virginica':2}
data.replace({'Species':classes},inplace=True)

#Create arrays
#x (independent variables): sepal length, sepal width, petal length, petal width
#y (target variable): species
x=data.iloc[:,:-1].values
y=data.iloc[:,-1].values

#Split universal dataset (train:test = 80:20)
#Library: sklearn
#module : model_selection
#class  : train_test_split
from sklearn.model_selection import train_test_split as tts
x_train,x_test,y_train,y_test=tts(x,y,test_size=0.2,random_state=17)

#Algorithm selection
#Logistic regression
#Library: sklearn
#module : linear_model
#class  : LogisticRegression
from sklearn.linear_model import LogisticRegression as logreg
model_logreg = logreg(max_iter=200) #object for model; a higher max_iter helps avoid a ConvergenceWarning

#Train the model. Use fit(training arrays x and y)
model_logreg.fit(x_train,y_train)

#Test the model
#Predict the output for x_test data. Use predict(x_test)
y_pred=model_logreg.predict(x_test)

#Check accuracy by giving x_test and y_test data
#Use score(x_test, y_test)
accuracy = model_logreg.score(x_test,y_test)
print('Logistic Regression Accuracy:', accuracy)

#Confusion matrix: tells right and wrong predictions
#Library : sklearn
#module  : metrics
#function: confusion_matrix
#Use confusion_matrix(actual,predicted)
from sklearn.metrics import confusion_matrix as conmat
cm = conmat(y_test,y_pred)
print(cm) #rows: actual class, columns: predicted class

Data visualization (EDA)

Methods discussed in the live session:

1. Countplot: counts the number of occurrences of each value in a variable

2. Box and whisker plot: descriptive analysis of the dataframe

3. Pairplot: pairwise analysis of variables (best for multivariate data)

4. Scatterplot: analysis of one variable against another (bivariate)

5. Scatter matrix: analyses the distribution of a multivariate dataframe

6. Heatmap: best for analysing the correlation between variables

#data collection
#import libraries and dataframe
import pandas as pd
data = pd.read_csv('Iris.csv')

#data cleaning
data.drop('Id',inplace=True,axis=1) #default: axis=0 (row)


#PREVIOUSLY USED METHOD: Replace with dictionary and replace()
#Replace species with 0,1,2
classes = {'Iris-setosa':0, 'Iris-versicolor':1, 'Iris-virginica':2}
data.replace({'Species':classes},inplace=True)

x=data.iloc[:,:-1].values
y=data.iloc[:,-1].values


'''
#ALTERNATE METHOD
#Encoding: Label Encoding

#Library : sklearn
#Module  : preprocessing
#Class   : LabelEncoder
from sklearn.preprocessing import LabelEncoder
laben = LabelEncoder()      #Create object for class
y=laben.fit_transform(y)    #fit y, transform y to encoded values
'''

#VISUALISATION
#plt.show() after each plot gives every chart its own figure when run as a script
import matplotlib.pyplot as plt
import seaborn as sb

#Countplot (count some values - Species)
#sb.countplot(data['Species'])
sb.countplot(x='Species',data=data)
plt.show()

#Box-whisker plot
#data.plot(kind='box')
data.plot(kind='box',subplots=True,layout=(3,5),figsize=(15,15))
plt.show()

#Pairplot
#sb.pairplot(data)
sb.pairplot(data,hue='Species')
plt.show()

#Scatter plot
plt.scatter(data['PetalLengthCm'],data['PetalWidthCm'])
plt.xlabel('PetalLengthCm')
plt.ylabel('PetalWidthCm')
plt.show()

#Scatter-matrix
#pandas.plotting
from pandas.plotting import scatter_matrix as sm
sm(data,c='r',alpha=0.7,figsize=(10,10))
plt.show()

#Heatmap (for correlation)
#sb.heatmap(data.corr())
sb.heatmap(data.corr(),annot=True)
plt.show()

Need for Visualisation:

Data visualization is the discipline of trying to understand data by placing it in a visual context, so that patterns, trends and correlations that might not otherwise be detected can be exposed.
Python offers multiple great graphing libraries packed with lots of different features. Whether you want to create interactive, live or highly customized plots, Python has an excellent library for you.
Visualisation is particularly useful for:

1. Huge datasets

2. Feature selection

3. Data science

4. Algorithm selection

Heatmap Python

What is a heatmap? A heatmap is a two-dimensional graphical representation of data in which the individual values contained in a matrix are represented as colors. The seaborn Python package allows the creation of annotated heatmaps, which can be tweaked with Matplotlib tools as required.

When a heatmap shows correlation coefficients, the values can be read roughly as follows:

+1: strong positive relationship

+0.5: moderate positive relationship

+0.3: weak positive relationship

0: no relationship

-0.3: weak negative relationship

-0.5: moderate negative relationship

-1: strong negative relationship
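To see where these values come from, here is a small sketch that computes Pearson correlation coefficients with pandas DataFrame.corr(), the same call passed to sb.heatmap in these examples. The columns are hypothetical and were constructed to give exactly +1, -1 and 0 against x.

```python
import pandas as pd

# Hypothetical columns built to show each kind of relationship with x
df = pd.DataFrame({
    "x":         [1, 2, 3, 4, 5],
    "double_x":  [2, 4, 6, 8, 10],   # rises exactly with x: +1
    "minus_x":   [5, 4, 3, 2, 1],    # falls exactly as x rises: -1
    "unrelated": [3, 1, 5, 1, 3],    # no linear relationship with x: 0
})

corr = df.corr()          # Pearson correlation by default
print(corr.round(2))
```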

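import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

data = pd.read_csv('Breast_Cancer.csv')
iris = pd.read_csv('Iris.csv')
usa  = pd.read_csv('USA_Housing.csv')
usa.drop('Address',inplace=True,axis=1)
iris.drop('Id',inplace=True,axis=1)
data.drop(['id','Unnamed: 32'],inplace=True,axis=1)

data.info()

#EXPLORATORY DATA ANALYSIS
#HEATMAP - Correlation of variables
#numeric_only=True skips text columns such as Species;
#pandas 2.0+ raises an error if text columns are included in corr()
#plt.show() after each call gives every heatmap its own figure in a script
sb.heatmap(usa.corr(),annot=True)
plt.show()
sb.heatmap(iris.corr(numeric_only=True),annot=True)
plt.show()
sb.heatmap(data.corr(numeric_only=True),annot=True)
plt.show()
#vmin/vmax fix the limits of the colour scale
sb.heatmap(usa.corr(),annot=True, vmin=0.5,vmax=0.7)
plt.show()
#cmap chooses the colour map
sb.heatmap(usa.corr(),annot=True,cmap='YlGnBu')
plt.show()
#linewidths/linecolor draw grid lines between cells
sb.heatmap(usa.corr(),annot=True,linewidths=3,linecolor='red')
plt.show()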

KNN: Looping over K and Saving with Pickle

import pandas as pd

iris = pd.read_csv('Iris.csv')
iris.drop('Id',inplace=True,axis=1)

#Create arrays x,y
x=iris.iloc[:,:-1].values
y=iris.iloc[:,-1].values

#Label Encoding: species names -> 0,1,2
from sklearn.preprocessing import LabelEncoder
laben = LabelEncoder()
y = laben.fit_transform(y)

#Split the universal dataset x,y (train:test = 75:25)
from sklearn.model_selection import train_test_split as tts
x_train,x_test,y_train,y_test=tts(x,y,test_size=0.25,random_state=42)

#Library: sklearn
#Module : neighbors
#Class  : KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier as knn
#knnmodel=knn(n_neighbors=5)

#train x_train,y_train
#knnmodel.fit(x_train,y_train)

#test x_test,y_test
#accuracy = knnmodel.score(x_test,y_test)
#predicted = knnmodel.predict(x_test)

#Different k value, different accuracy
#k must not exceed the number of training samples (112 here)
k_range = range(1,96)
k_score = []
k_best  = 1
acc_best= 0

for k in k_range:
    knnmodel=knn(n_neighbors=k)
    knnmodel.fit(x_train,y_train)
    k_acc = knnmodel.score(x_test,y_test)
    k_score.append(k_acc)

    if acc_best < k_acc:   #keeps the smallest k that reaches the best accuracy
        acc_best = k_acc
        k_best   = k

print('Best k:', k_best, 'accuracy:', acc_best)

from matplotlib import pyplot as plt

plt.figure(figsize=(20,8))
plt.plot(k_range,k_score)
plt.xlabel('K values')
plt.ylabel('Accuracy')
plt.title('K vs Accuracy')
plt.show()

##Saving model
#after the loop, knnmodel holds the last k tried, so refit with the best k first
import pickle
best_model = knn(n_neighbors=k_best).fit(x_train,y_train)
with open('myknnmodel.sav','wb') as file:   #.pkl is also a common extension
    pickle.dump(best_model,file)            #dump(model, file)

##Loading model
with open('myknnmodel.sav','rb') as file:
    loaded_model = pickle.load(file)
print(loaded_model.predict(x_test))

Data Imputation and One-hot Encoding

SimpleImputer is a scikit-learn class that helps handle missing data in a predictive model's dataset. It replaces missing (NaN) values with a statistic computed from each column or with a specified constant.
You use it by creating a SimpleImputer() object, which takes the following arguments:

missing_values: The placeholder for the missing values that have to be imputed. By default, this is NaN.

strategy: How the missing values are replaced. The strategy argument can take the values 'mean' (default), 'median', 'most_frequent' and 'constant'.

fill_value: The constant value used to replace missing values when strategy='constant'.
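Here is a sketch of the three most common strategies on a small hypothetical array with one missing value per column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

data = np.array([
    [1.0,    10.0],
    [np.nan, 20.0],
    [3.0,    np.nan],
    [3.0,    40.0],
])

# 'mean': column 0 NaN -> (1 + 3 + 3) / 3, column 1 NaN -> (10 + 20 + 40) / 3
mean_filled = SimpleImputer(missing_values=np.nan, strategy="mean").fit_transform(data)

# 'most_frequent': column 0 NaN -> 3.0; in column 1 every value appears once,
# and scikit-learn then picks the smallest value (10.0)
freq_filled = SimpleImputer(strategy="most_frequent").fit_transform(data)

# 'constant': every NaN -> fill_value
const_filled = SimpleImputer(strategy="constant", fill_value=0).fit_transform(data)

print(mean_filled)
print(freq_filled)
print(const_filled)
```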

​

One-hot encoding creates a new binary (0/1) column for each possible value of a categorical variable. It is the most widespread approach, and it works very well unless your categorical variable takes on a large number of values (i.e. you generally won't use it for variables taking more than 15 different values; it can be a poor choice in some cases with fewer values, though that varies).
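Here is a sketch of one-hot encoding a hypothetical color column in two ways: pandas get_dummies() for quick exploration, and scikit-learn's OneHotEncoder, which remembers the categories so it can encode new data consistently. handle_unknown="ignore" makes an unseen category encode as all zeros.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# pandas: one new 0/1 column per category
dummies = pd.get_dummies(df["color"], dtype=int)
print(dummies)

# scikit-learn: categories are learned in fit and reused later
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(df[["color"]]).toarray()   # sparse result -> dense array
print(encoder.categories_)     # categories are sorted: blue, green, red
print(encoded)

# a category never seen during fit becomes all zeros
unseen = encoder.transform(pd.DataFrame({"color": ["purple"]})).toarray()
print(unseen)
```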


Special thanks to Sobin Sunny for guiding me all the way through my internship journey, and thanks to Dlithe for giving me such a wonderful opportunity to learn real-time technology.

CONTACT ME

Joyston Menezes

computer science engineer

​

Phone:

+91 9480966920

​

Email:

joystonmj7@gmail.com

​


© 2020 All rights reserved. Developed by Joyston Menezes.
