Essential Python Libraries for Data Science - Part 3
Introduction to Python Libraries
This third installment of the series delves into the pivotal Python libraries essential for data science. If you haven't yet explored Parts 1 and 2, I highly encourage you to do so for a comprehensive understanding.
What Are Libraries and Their Importance?
As highlighted in Part 1, Python was designed with simplicity in mind, allowing users to enhance its functionality through additional modules. Similar to how a library expands your knowledge, these external modules broaden Python's capabilities. A library encompasses a collection of pre-written functions that researchers and programmers have developed, enabling users to leverage these functions in their projects without needing to write all the code from scratch.
For instance, if you want to compute the sine of 2, you could use Bhaskara I's sine approximation formula, but this would require you to create the full function yourself:
def sin_x(x):
    pi = 3.1415926535
    # Bhaskara I's approximation, valid for x between 0 and pi (in radians)
    x = ((16*x)*(pi-x)) / ((5*(pi**2))-(4*x*(pi-x)))
    print(x)

sin_x(2)
[OUT] 0.9083851761902779
Thankfully, Python offers a library that simplifies this process:
import numpy as np
np.sin(2)
[OUT] 0.9092974268256817
The first method is more error-prone, and its result matches the true value to only two decimal places (0.9084 versus 0.9093), even though Pi is written out to ten decimal places. The error comes from the approximation formula itself, not from the value of Pi. The second method is quicker, simpler, and accurate to floating-point precision.
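The size of that gap can be measured directly. The following is a minimal sketch, assuming the standard-library math module as the reference value; the name bhaskara_sin and the sample points are illustrative choices, not part of the original example.

```python
import math


def bhaskara_sin(x):
    """Approximate sin(x) for 0 <= x <= pi (radians) with Bhaskara I's formula."""
    return (16 * x * (math.pi - x)) / (5 * math.pi**2 - 4 * x * (math.pi - x))


# Compare the approximation with math.sin at a few sample points in [0, pi].
samples = [0.5, 1.0, 2.0, 3.0]
for x in samples:
    approx = bhaskara_sin(x)
    exact = math.sin(x)
    print(f"x={x}: approx={approx:.6f} exact={exact:.6f} error={abs(approx - exact):.6f}")

# Largest absolute error on a fine grid over [0, pi].
grid = [i * math.pi / 1000 for i in range(1001)]
max_error = max(abs(bhaskara_sin(x) - math.sin(x)) for x in grid)
print(f"max error on [0, pi]: {max_error:.6f}")
```

The formula is only meant for inputs between 0 and Pi, so a library function remains the safer choice for general use.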
Examining the second option, we see:
import numpy as np
Here, numpy is the name of the NumPy package, and np is the alias used to call its functions. In np.sin(2), np refers to the library, sin is the desired function, and the parentheses contain the value to be calculated (in radians).
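The alias pattern is not specific to NumPy. As a small sketch, the same import forms are shown here with the standard-library math module (chosen only so the example runs without installing anything); the alias m is an arbitrary name picked for illustration.

```python
# Three ways to reach the same function.
import math             # full module name
import math as m        # alias, the same pattern as "import numpy as np"
from math import sin    # import a single function by name

a = math.sin(2)
b = m.sin(2)
c = sin(2)

# All three calls run the same underlying function.
print(a == b == c)
```

An alias such as np mainly saves typing while keeping it obvious which library a function comes from.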
Checking Installed Libraries
To verify which modules are currently installed in Python, type the following command in a terminal, such as the integrated terminal in VS Code:
pip list
This command displays all installed modules. To check for a specific module, filter the list (grep is available on macOS and Linux; on Windows, findstr plays the same role):
pip list | grep <module_name>
If you need to install a new module, use:
pip install <module_name>
To upgrade an existing module, type:
pip install --upgrade <module_name>
Here is a quick demonstration of these commands. Listing all installed modules and then searching specifically for the NumPy package shows that NumPy is installed at version 1.22.4. You can also check for available upgrades; after performing the upgrade, the installed version of NumPy is 1.23.4.
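The same check can be made from inside Python. This is a minimal sketch using the standard-library importlib.metadata module (Python 3.8 or later); the helper name installed_version and the placeholder package name are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "not-a-real-package-xyz" is a placeholder that should not be installed.
for name in ["numpy", "pandas", "not-a-real-package-xyz"]:
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```

This reports what the current interpreter can see, which is useful when several Python environments exist on one machine.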
Key Libraries for Data Science and Machine Learning
The following sections will summarize the crucial libraries in data science and machine learning. While there are other libraries you may need for your projects, mastering these core libraries is essential.
Pandas
Pandas is a powerful library for data manipulation and analysis. It lets you load data into Python's working environment, typically as a DataFrame (tabular data), and it also provides the one-dimensional Series. With it you can find and handle missing data, join and pivot tables, adjust specific columns and values, and filter rows, among other capabilities.
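A short sketch of those operations follows, assuming Pandas is installed. The sales and prices tables are invented sample data, and filling missing values with 0 is just one possible choice.

```python
import pandas as pd

# Hypothetical sample data: one missing value in the "units" column.
sales = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune", "Oslo"],
    "units": [10, None, 7, 3],
})
prices = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune"],
    "price": [2.5, 1.0, 4.0],
})

# Find and handle missing data.
missing_count = int(sales["units"].isna().sum())
filled = sales.fillna({"units": 0})

# Join the two tables on the shared "city" column.
merged = filled.merge(prices, on="city", how="left")

# Add a computed column, then filter rows.
merged["revenue"] = merged["units"] * merged["price"]
oslo = merged[merged["city"] == "Oslo"]

# Pivot: total revenue per city.
totals = merged.pivot_table(index="city", values="revenue", aggfunc="sum")
print(totals)
```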
NumPy
NumPy provides support for an extensive array of mathematical functions and facilitates the handling of multi-dimensional arrays. It performs the heavy lifting for linear algebra computations.
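As a brief illustration, assuming NumPy is installed, the sketch below solves a small linear system and applies a function element-wise to an array. The matrix and vector values are arbitrary examples.

```python
import numpy as np

# A small linear system A @ x = b with made-up coefficients.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Solve for x and confirm the solution reproduces b.
x = np.linalg.solve(A, b)
print(x)
print(np.allclose(A @ x, b))

# Mathematical functions apply element-wise to whole arrays.
angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)
print(sines)

# Multi-dimensional arrays carry their shape with them.
print(A.shape, A.T.shape)
```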
Scikit-Learn
Scikit-Learn includes a variety of classification, regression, and clustering algorithms essential for machine learning projects. It integrates seamlessly with NumPy and Pandas to utilize data efficiently.
Matplotlib
Matplotlib is Python's core plotting library. Its pyplot interface was modeled on MATLAB's plotting commands, and ongoing development has extended its ability to plot a wide variety of graphs across different contexts.
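Seaborn
Seaborn is a statistical visualization library built on top of Matplotlib. It works directly with Pandas data frames and offers a higher-level interface for common statistical plots.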
TensorFlow and Keras
Developed by Google, TensorFlow is one of the most widely used libraries for deep learning with neural networks. It is particularly useful for projects involving image recognition. Keras is a high-level API that runs on top of TensorFlow, providing the fundamental components for building deep neural networks and enabling the use of pre-trained models through transfer learning.
To see these libraries in action, feel free to explore my other articles featuring practical examples of implementing various machine learning algorithms.
In this video, titled "Workshop Part 1 | Introduction to Python for Aspiring Data Scientists," you'll get a foundational understanding of Python tailored for data science enthusiasts.
The video "Introduction to PYTHON (PART 3)" continues to build on your Python knowledge, focusing on essential libraries for data science.
Conclusion
Thank you for reading! If you enjoyed this article, please consider subscribing to receive updates on my future publications. If you're interested in diving deeper into the topic, check out my book "Data-Driven Decisions: A Practical Introduction to Machine Learning," which provides all the insights you need to embark on your machine learning journey. It's an affordable investment—less than the price of a coffee!
Thank you once again!