Machine learning tools

Data processing and analytics
- PostgreSQL is a powerful open-source object-relational database system that uses and extends SQL, a language designed for storing, manipulating, and retrieving data in databases.
- Hadoop is an open-source, highly scalable, disk-based solution for storing and batch-processing massive datasets.
- Spark is a distributed, in-memory data processing framework for batch and near-real-time (streaming) big data processing. It supports DataFrames and SQL. Because it keeps intermediate results in memory instead of writing them to disk, it is typically much faster than Hadoop MapReduce for iterative workloads, and its high-level APIs make it easier to use.
- Apache Kafka is a distributed streaming platform for building big data pipelines and running real-time analytics.
- Pandas is a popular Python library for exploring and wrangling data. Central to Pandas is the DataFrame, a tabular data structure for cleaning and transforming structured data.
- NumPy is a Python library that provides multidimensional arrays along with comprehensive mathematical functions, random number generators, and linear algebra routines. It runs on a wide range of computing platforms and interoperates with libraries for distributed and graphics processing unit, or GPU, computing.
Data visualization
- Matplotlib is an extensive, foundational library for generating customizable static plots and interactive visualizations.
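The Pandas and NumPy entries above can be illustrated with a minimal sketch of cleaning and transforming a small table. The table, column names, and cleaning steps are made up for illustration, and the example assumes recent versions of pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

# Small, made-up table with a missing value and a duplicated row (illustrative data only)
raw = pd.DataFrame(
    {
        "city": ["Paris", "Lyon", "Lyon", "Nice", "Paris"],
        "temp_c": [21.0, np.nan, np.nan, 25.0, 19.0],
        "rain_mm": [0.0, 3.2, 3.2, 0.0, 1.1],
    }
)

clean = (
    raw.drop_duplicates()  # drop the repeated Lyon row (NaNs compare equal here)
    # impute the missing temperature with the column mean
    .assign(temp_c=lambda d: d["temp_c"].fillna(d["temp_c"].mean()))
    # add a derived column
    .assign(temp_f=lambda d: d["temp_c"] * 9 / 5 + 32)
)

# Aggregate: mean temperature per city
summary = clean.groupby("city")["temp_c"].mean()

# NumPy: vectorized standardization (z-scores) on the underlying array
temps = clean["temp_c"].to_numpy()
z_scores = (temps - temps.mean()) / temps.std()

print(clean)
print(summary)
print(z_scores.round(2))
```

Chaining `assign` calls keeps each transformation step explicit and avoids modifying the raw table in place.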
- Seaborn is a library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
- ggplot2 is an open-source data visualization package in R. It allows you to build and add elements to your graphics in layers.
- Tableau is a business intelligence tool for building interactive data visualization dashboards.
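A minimal Matplotlib sketch of a customized plot follows. It assumes Matplotlib and NumPy are installed, selects the non-interactive Agg backend so it runs without a display, and writes the figure to an in-memory PNG instead of showing it. The data, labels, and styling are arbitrary. Seaborn's axes-level plotting functions can draw onto the same kind of `Axes` object.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files/buffers, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")  # customized line style
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("A customizable Matplotlib line plot")
ax.legend()

# Save to an in-memory PNG; use fig.savefig("plot.png") to write a file instead
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)
png_bytes = buffer.getvalue()
print(f"PNG size: {len(png_bytes)} bytes")
```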
Machine learning
- NumPy provides foundational machine learning support with efficient numerical computations on large, multidimensional data arrays.
- Pandas is used for data analysis, visualization, cleaning, and preparing data for machine learning.
- SciPy, built on NumPy, is used for scientific computing and has modules for optimization, integration, linear regression, and more.
- Scikit-learn is used for building classical machine learning models, offering a full suite of classification, regression, clustering, and dimensionality reduction algorithms. It's built on NumPy, SciPy, and Matplotlib.
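As a rough sketch of the classical Scikit-learn workflow (split the data, fit a model, evaluate it), the example below trains a logistic regression classifier on the Iris dataset that ships with the library. The choice of model, scaler, and split parameters is an arbitrary illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled dataset: 150 flowers, 4 numeric features, 3 classes
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for evaluation, preserving class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Pipeline: standardize features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping `LogisticRegression` for another estimator (for example a clustering or dimensionality reduction class) follows the same `fit`/`predict` or `fit`/`transform` pattern.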
Deep learning
- TensorFlow is an open-source library for numerical computing and large-scale machine learning.
- Keras is an easy-to-use deep learning library for implementing neural networks.
- Theano is a library for efficiently defining, optimizing, and evaluating mathematical expressions involving multidimensional arrays.
- PyTorch is an open-source library for deep learning applications such as computer vision and NLP. Its flexible design also makes it well suited to experimenting with and testing new ideas.
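The following PyTorch sketch shows the basic deep learning training loop: forward pass, loss computation, backpropagation, and parameter update. It assumes PyTorch is installed; the network size, learning rate, step count, and the toy target y = sin(x) are illustrative choices.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the random initialization reproducible

# Toy regression data: learn y = sin(x) on [-3, 3]
x = torch.linspace(-3, 3, 200).unsqueeze(1)  # shape (200, 1)
y = torch.sin(x)

# Small feed-forward network: 1 input -> 32 tanh hidden units -> 1 output
model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

with torch.no_grad():
    initial_loss = loss_fn(model(x), y).item()

for step in range(2000):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass + loss
    loss.backward()              # backpropagation
    optimizer.step()             # update the weights

with torch.no_grad():
    final_loss = loss_fn(model(x), y).item()

print(f"MSE before training: {initial_loss:.4f}, after: {final_loss:.4f}")
```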
Computer vision
All of the deep learning tools above can be adapted for computer vision applications.
- OpenCV, or Open Source Computer Vision Library, is a library for real-time computer vision applications such as object detection, image classification, and augmented reality.
- Scikit-Image, built on SciPy and compatible with Pandas, offers image processing algorithms such as filters, segmentation, feature extraction, and morphological operations.
- TorchVision, part of the PyTorch project, consists of popular datasets, image loading, pre-trained deep learning architectures, and common image transformations for computer vision.
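Several of the Scikit-Image operations mentioned above (filtering, segmentation, and feature extraction) can be chained as in this minimal sketch. It assumes scikit-image and NumPy are installed and uses a synthetic image with two bright squares instead of a real photo; the noise level, blur strength, and the use of Otsu thresholding are illustrative choices.

```python
import numpy as np
from skimage import filters, measure

# Synthetic 64x64 grayscale image: dark background, two bright squares, mild noise
rng = np.random.default_rng(0)
image = np.zeros((64, 64))
image[8:24, 8:24] = 1.0
image[36:56, 30:50] = 1.0
image += rng.normal(scale=0.05, size=image.shape)

# Filter: Gaussian smoothing to suppress noise
smoothed = filters.gaussian(image, sigma=1)

# Segmentation: global threshold chosen automatically by Otsu's method
threshold = filters.threshold_otsu(smoothed)
mask = smoothed > threshold

# Feature extraction: label connected regions and measure each one
labels = measure.label(mask)
regions = measure.regionprops(labels)
for region in regions:
    print(f"region {region.label}: area={region.area}, centroid={region.centroid}")
```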
NLP tools
- The Natural Language Toolkit, or NLTK, is a comprehensive library that offers text processing, tokenization, and stemming tools.
- TextBlob is a library for tasks like part-of-speech tagging, noun-phrase extraction, sentiment analysis, and translation.
- Stanza is an NLP library from the Stanford NLP Group with accurate pre-trained models for many NLP tasks, including part-of-speech tagging, named entity recognition, and dependency parsing.
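A small NLTK sketch of tokenization and stemming follows. It uses `wordpunct_tokenize` (a regular-expression tokenizer) and `PorterStemmer`, which work without downloading extra NLTK data packages, unlike `word_tokenize`, which relies on the Punkt models. The sample sentence is arbitrary.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "The cats were running quickly."

# Tokenization: split into runs of word characters and runs of punctuation
tokens = wordpunct_tokenize(text)

# Stemming: strip suffixes so related word forms map to a common stem
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print(tokens)
print(stems)
```

Stems are not always dictionary words (the Porter algorithm is rule-based), which is the main trade-off against lemmatization.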
Generative AI tools
- Hugging Face Transformers is a powerful library of transformer models for NLP tasks, such as text generation, language translation, and sentiment analysis.
- ChatGPT, from OpenAI, is a chat interface to powerful large language models and is used for text generation, building chatbots, and other NLP tasks.
- DALL-E is a tool from OpenAI for generating images from textual descriptions.
- PyTorch is also used to build deep generative models for text and image generation, such as Transformers and Generative Adversarial Networks, or GANs.
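To make the GAN idea concrete, here is a deliberately tiny PyTorch sketch in which a generator learns to imitate samples from a one-dimensional normal distribution while a discriminator tries to tell real samples from generated ones. The target distribution, network sizes, and training settings are arbitrary assumptions chosen for illustration; real image or text GANs use far larger models and datasets.

```python
import torch
from torch import nn

torch.manual_seed(0)


def real_batch(n):
    # "Real" data: samples from a normal distribution with mean 4 and std 1.25 (arbitrary target)
    return 4.0 + 1.25 * torch.randn(n, 1)


latent_dim = 8
generator = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # outputs a logit

loss_fn = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

batch = 64
ones = torch.ones(batch, 1)
zeros = torch.zeros(batch, 1)

for step in range(2000):
    # 1) Discriminator step: label real samples 1 and generated samples 0
    fake = generator(torch.randn(batch, latent_dim)).detach()  # no gradient into the generator
    d_loss = loss_fn(discriminator(real_batch(batch)), ones) + loss_fn(discriminator(fake), zeros)
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Generator step: try to make the discriminator label generated samples as real
    fake = generator(torch.randn(batch, latent_dim))
    g_loss = loss_fn(discriminator(fake), ones)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

with torch.no_grad():
    samples = generator(torch.randn(1000, latent_dim))
print(f"generated mean={samples.mean().item():.2f}, std={samples.std().item():.2f}")
```

If training behaves well, the printed mean and standard deviation should move toward 4 and 1.25, although GAN training is notoriously unstable and results vary with the settings.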