Python has become a popular language for machine learning due to its simplicity, readability, and versatility. There are many libraries available for implementing machine learning algorithms in Python. In this blog post, we will discuss the top 10 Python libraries for machine learning.
1. TensorFlow
TensorFlow is an open source machine learning framework developed by the Google Brain team. It is widely used for building and training deep learning models. TensorFlow offers a rich set of tools and libraries for machine learning, including Keras, TensorFlow Datasets, and TensorFlow Lite.
Here’s an example of how to use TensorFlow to build a simple neural network for image classification:
import tensorflow as tf
from tensorflow import keras

# Load the dataset (e.g. MNIST)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data
x_train = x_train / 255.0
x_test = x_test / 255.0

# Build the model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10)
])

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=10)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('\nTest accuracy:', test_acc)
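Since TensorFlow Datasets was mentioned above, here is a small supplementary sketch of loading the same MNIST data as a tf.data pipeline instead. It assumes the separate tensorflow-datasets package is installed:

import tensorflow as tf
import tensorflow_datasets as tfds

# Load MNIST as a tf.data.Dataset of (image, label) pairs
ds_train = tfds.load('mnist', split='train', as_supervised=True)

# Normalize, batch, and prefetch before passing the pipeline to model.fit
ds_train = (ds_train
            .map(lambda img, lbl: (tf.cast(img, tf.float32) / 255.0, lbl))
            .batch(32)
            .prefetch(tf.data.AUTOTUNE))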
2. PyTorch
PyTorch is another open source machine learning framework that provides a dynamic computational graph for building deep learning models. PyTorch is known for its ease of use and flexible architecture, making it a popular choice for researchers and developers alike.
Here’s an example of how to use PyTorch to build a simple convolutional neural network for image classification:
import torch
import torch.nn as nn
import torch.optim as optim

# Define the network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(nn.functional.relu(self.conv1(x)))
        x = self.pool(nn.functional.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Instantiate the network
net = Net()

# Load the dataset (e.g. CIFAR-10)
train_loader, test_loader = ...

# Define the loss function, optimizer, and learning rate schedule
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

# Train the network
for epoch in range(10):
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    scheduler.step()
    print('Epoch %d, loss: %.3f' % (epoch + 1, running_loss / (i + 1)))

# Evaluate the network
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        inputs, labels = data
        outputs = net(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy: %.2f%%' % (100 * correct / total))
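To make the "dynamic computational graph" point concrete, here is a minimal autograd sketch: the graph is built on the fly as ordinary Python code executes, and gradients are computed by walking it backwards.

import torch

# The graph is recorded eagerly, operation by operation
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # y = x1^2 + x2^2, recorded as it runs

# Backpropagate through the recorded graph
y.backward()
print(x.grad)  # tensor([4., 6.]) since dy/dx = 2x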
3. Scikit-learn
Scikit-learn is a popular machine learning library for Python that provides a wide range of tools for data preprocessing, feature extraction, model selection, and evaluation. Scikit-learn is especially useful for building and comparing different machine learning models, as it provides a standardized interface for training and testing models.
Here’s an example of how to use Scikit-learn to build a simple support vector machine classifier on the iris dataset:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the dataset (e.g. iris)
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build the model
clf = SVC(kernel='linear')

# Train the model
clf.fit(X_train, y_train)

# Test the model
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
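Because Scikit-learn's standardized interface makes it easy to compare models, here is a short supplementary sketch (reusing the iris data) that scores two different estimators with the same cross-validation call:

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = datasets.load_iris(return_X_y=True)

# The shared fit/predict interface lets cross_val_score treat both models identically
for model in (SVC(kernel='linear'), LogisticRegression(max_iter=1000)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, scores.mean())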
4. Keras
Keras is a high-level neural network API for Python that provides a simple and intuitive interface for building and training deep learning models. Keras originally supported multiple backends (TensorFlow, CNTK, and Theano); today it ships with TensorFlow as tf.keras, and Keras 3 adds support for JAX and PyTorch backends.
Here’s an example of how to use Keras to build a simple convolutional neural network for image classification:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.datasets import cifar10
from keras.utils import to_categorical

# Load the dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Preprocess the data
x_train = x_train / 255.0
x_test = x_test / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Build the model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=10, batch_size=32)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print('\nTest accuracy:', test_acc)
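As a small supplementary sketch, Keras callbacks such as EarlyStopping can stop training automatically when a held-out metric stops improving. The snippet below assumes the model, x_train, and y_train from the example above:

from keras.callbacks import EarlyStopping

# Stop training once validation loss has not improved for 3 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

model.fit(x_train, y_train,
          epochs=50,
          batch_size=32,
          validation_split=0.1,
          callbacks=[early_stop])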
5. Pandas
Pandas is a Python library for data manipulation and analysis. Pandas provides powerful data structures for working with tabular data, such as data frames and series, and a wide range of tools for data cleaning, transformation, and aggregation. Pandas is often used for preparing data for machine learning.
Here’s an example of how to use Pandas to load and preprocess a dataset for regression:
import pandas as pd
from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2, so this requires an older version
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the dataset
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.Series(boston.target)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Preprocess the data: one-hot encode the categorical-like columns and fill missing values
X_train = pd.get_dummies(X_train, columns=['CHAS', 'RAD'])
X_test = pd.get_dummies(X_test, columns=['CHAS', 'RAD'])
# Align the test columns with the training columns in case a category is missing from one split
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
X_train = X_train.fillna(X_train.mean())
X_test = X_test.fillna(X_train.mean())

# Build the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Test the model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print('Mean squared error:', mse)
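Since Pandas is described above as a tool for cleaning, transformation, and aggregation, here is a short, self-contained sketch of groupby aggregation on a small hypothetical DataFrame of model runs:

import pandas as pd

# A small hypothetical dataset of cross-validation results
df = pd.DataFrame({
    'model': ['svm', 'svm', 'tree', 'tree'],
    'fold': [1, 2, 1, 2],
    'accuracy': [0.91, 0.89, 0.84, 0.86],
})

# Aggregate: mean and standard deviation of accuracy per model
summary = df.groupby('model')['accuracy'].agg(['mean', 'std'])
print(summary)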
6. NumPy
NumPy is a Python library for numerical computing, providing powerful tools for working with arrays and matrices. NumPy is often used for data preprocessing and feature selection in machine learning.
Here’s an example of how to use NumPy to preprocess a dataset for classification:
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the dataset
digits = load_digits()
X = digits.data
y = digits.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Preprocess the data
X_train = X_train / 16.0 - 0.5
X_test = X_test / 16.0 - 0.5

# Build the model
model = KNeighborsClassifier()

# Train the model
model.fit(X_train, y_train)

# Test the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
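To illustrate the feature-selection use mentioned above, here is a minimal sketch that keeps only the pixel columns of the digits data whose variance exceeds an illustrative threshold, using NumPy boolean masking:

import numpy as np
from sklearn.datasets import load_digits

X = load_digits().data

# Compute per-feature variance and keep the more informative columns
# (the threshold of 1.0 is chosen purely for illustration)
variances = X.var(axis=0)
mask = variances > 1.0
X_selected = X[:, mask]

print('kept', mask.sum(), 'of', X.shape[1], 'features')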
7. Matplotlib
Matplotlib is a Python library for creating static, animated, and interactive visualizations. Matplotlib provides a wide range of tools for data visualization, including line plots, scatter plots, histograms, bar plots, 3D plots, and more.
Here’s an example of how to use Matplotlib to visualize the decision boundary of a support vector machine:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Generate some data
X, y = make_blobs(n_samples=50, centers=2, random_state=42)

# Build the model
clf = SVC(kernel='linear')

# Train the model
clf.fit(X, y)

# Plot the data points
plt.figure(figsize=(6, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)

# Plot the decision boundary
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx = np.linspace(xlim[0], xlim[1], 200)
yy = np.linspace(ylim[0], ylim[1], 200)
YY, XX = np.meshgrid(yy, xx)
Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()])
Z = Z.reshape(XX.shape)
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
plt.show()
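Line plots are also mentioned above; a common machine learning use is plotting a training curve. Here is a minimal sketch using made-up loss values for illustration:

import matplotlib.pyplot as plt

# Hypothetical per-epoch losses recorded during training
epochs = range(1, 11)
train_loss = [0.90, 0.70, 0.55, 0.45, 0.38, 0.33, 0.30, 0.28, 0.27, 0.26]
val_loss = [0.95, 0.75, 0.60, 0.52, 0.48, 0.46, 0.45, 0.45, 0.46, 0.47]

plt.plot(epochs, train_loss, label='training loss')
plt.plot(epochs, val_loss, label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()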
8. Seaborn
Seaborn is a Python library for creating statistical visualizations, built on top of Matplotlib. It provides a high-level interface for creating various types of plots, including scatter plots, line plots, bar plots, heat maps, and more.
Here’s an example of how to use Seaborn to visualize the relationship between two variables in a dataset:
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
iris = sns.load_dataset('iris')

# Plot the data
sns.relplot(x='sepal_length', y='petal_length', hue='species', data=iris)
plt.show()
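Since heat maps are mentioned above, here is a minimal supplementary sketch of a correlation heat map for the same iris data:

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset('iris')

# Correlation matrix of the numeric columns, drawn as an annotated heat map
corr = iris.drop(columns='species').corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()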
9. NLTK
NLTK (Natural Language Toolkit) is a Python library for working with human language data. It bundles interfaces to many corpora and lexical resources and provides a wide range of tools for parsing, tokenizing, stemming, and tagging text, making it an essential library for text mining and natural language processing.
Here’s an example of how to use NLTK to tokenize and stem a text document:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

nltk.download('punkt')  # tokenizer models required by word_tokenize (first run only)

# Define a sample text document
document = "The quick brown fox jumped over the lazy dog"

# Tokenize the text document
tokens = word_tokenize(document)

# Stem the tokens
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print('Tokens:', tokens)
print('Stems:', stems)
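Tagging is also mentioned above; here is a minimal part-of-speech tagging sketch (it additionally needs the averaged_perceptron_tagger resource):

import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')  # model used by pos_tag

tokens = word_tokenize("The quick brown fox jumped over the lazy dog")
print(nltk.pos_tag(tokens))  # e.g. [('The', 'DT'), ('quick', 'JJ'), ...]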
10. Gensim
Gensim is a Python library for topic modeling and document indexing, providing tools for analyzing large text corpora and extracting semantic meaning from them. Gensim is especially useful for natural language processing and machine learning tasks that involve large amounts of text data.
Here’s an example of how to use Gensim to perform topic modeling on a corpus of text documents:
import nltk
from gensim import corpora, models
from nltk.tokenize import word_tokenize

nltk.download('punkt')  # needed by word_tokenize

# Define a sample corpus of text documents
corpus = [
    'I love cats',
    'Dogs are great too',
    'Birds can fly',
    'Fish are underwater',
]

# Tokenize the documents
texts = [word_tokenize(document.lower()) for document in corpus]

# Create a dictionary of unique words in the corpus
dictionary = corpora.Dictionary(texts)

# Convert the documents to bag-of-words vectors
corpus_vectors = [dictionary.doc2bow(text) for text in texts]

# Build the topic model
lda_model = models.ldamodel.LdaModel(corpus=corpus_vectors, id2word=dictionary, num_topics=2)

# Print the topics and their top words
for topic in lda_model.show_topics(num_topics=2):
    print(topic)
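Gensim also ships word-embedding models such as Word2Vec, which fit the "extracting semantic meaning" description above. A minimal sketch using the Gensim 4.x API (a real corpus would need far more text to produce useful vectors):

from gensim.models import Word2Vec

# Tiny toy corpus of pre-tokenized sentences (far too small for meaningful embeddings)
sentences = [
    ['i', 'love', 'cats'],
    ['dogs', 'are', 'great', 'too'],
    ['birds', 'can', 'fly'],
    ['fish', 'are', 'underwater'],
]

# Train a small Word2Vec model
model = Word2Vec(sentences=sentences, vector_size=50, window=3, min_count=1, epochs=50)

# Look up the learned vector and nearest neighbours for a word
print(model.wv['cats'][:5])
print(model.wv.most_similar('cats', topn=3))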
In conclusion, these are the top 10 Python libraries that you can use for machine learning. Each library offers distinct features and strengths, making it suited to different machine learning tasks. By mastering these libraries and their tools, you can become a proficient machine learning engineer and build powerful machine learning models.
Want to learn more about Python? Check out the official Python documentation for details.