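Python is also an essential tool in data science and machine learning. This is a guide to the Python environment: it explains how to set the environment up and install the relevant libraries for data science and machine learning work.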

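For data science and machine learning, we can use the standard Python environment or we can use Anaconda. Anaconda is not limited to Python: it can also install other tools used in data science, such as the R language. Anaconda is also a distribution channel for libraries, from which packages can be downloaded and installed. Inside an Anaconda environment, libraries are installed with the conda command. If we do not need the full distribution, we can use its minimal edition, Miniconda, which ships only conda, Python, and a small number of supporting packages.

Data scientists also tend to prefer Jupyter for their work. Jupyter is a web-based development tool that runs code interactively, step by step, in documents called Notebooks. JupyterLab is the next-generation Jupyter interface. These are the basic concepts we need before getting started, summarized in the table below: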

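Name                Description
------------------  -----------
Anaconda/Miniconda  A data science environment and distribution platform; roughly the counterpart of Python's PyPI index.
conda               Anaconda's command-line tool; roughly the counterpart of the pip command.
Jupyter/JupyterLab  A web-based development tool.
Notebook            A file that mixes code, documentation notes, and execution results.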

The Jupyter interface looks roughly like this:

[Figure: screenshot of the Jupyter interface]

We can also try it out directly at jupyter.org/try-jupyter…

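Installing the environment

Miniconda can be installed with the following commands, which download the installer script and then run it with bash:

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh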

Once Miniconda is installed, a virtual environment can be created and used like this:

# Best practice, use an environment rather than install in the base env
conda create -n my-env
conda activate my-env
# If you want to install from conda-forge
conda config --env --add channels conda-forge
# The actual install command
conda install numpy

This works much like Python's built-in virtual environments:

python3 -m venv .venv
source .venv/bin/activate
pip install numpy
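
The same kind of environment can also be created from Python itself through the standard-library venv module. The following is only a minimal sketch: the directory name demo-venv and the helper create_env are hypothetical, and pip bootstrapping is skipped to keep the example fast.

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent: Path, name: str = "demo-venv") -> Path:
    """Create a virtual environment under `parent` and return its path."""
    env_dir = parent / name
    # with_pip=False skips bootstrapping pip; pass with_pip=True to get
    # a pip inside the environment, as `python3 -m venv` does by default.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_env(Path(tmp))
        # Every venv has a pyvenv.cfg file at its root.
        print((env_dir / "pyvenv.cfg").exists())
```

The environment here lives in a temporary directory and is deleted when the with block exits; for real work, point create_env at a project directory instead.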

Of course, pip can be used directly inside a conda environment:

(my-env) [game404@y ~]$ pip list
Package    Version
---------- -------
numpy      1.24.1
pip        22.3.1
setuptools 65.6.3
wheel      0.38.4
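
When conda environments and venv environments are both in play, it helps to confirm which one the current interpreter belongs to. A small sketch using only the standard library follows; the function name describe_environment is made up for illustration, and it assumes that conda activate has set the CONDA_DEFAULT_ENV variable, which is how conda normally exposes the active environment name.

```python
import os
import sys


def describe_environment() -> dict:
    """Collect a few facts about the interpreter that is currently running."""
    return {
        # Root directory of the active environment (conda env or venv).
        "prefix": sys.prefix,
        # True when running inside a venv-style virtual environment.
        "in_venv": sys.prefix != sys.base_prefix,
        # Set by `conda activate`; None when no conda env is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Version of the running interpreter, e.g. "3.11.4".
        "python": sys.version.split()[0],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```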

JupyterLab can be installed with either of the following two commands:

conda install jupyterlab
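or
pip install jupyterlab
  • conda installs from the Anaconda channels, while pip installs from PyPI. Both commands achieve the same result, so pick whichever is faster on your network.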

Starting jupyter-lab

After installing jupyter-lab, it can be started with the following command:

(my-env) [game404@y ~]$jupyter-lab
[I 2023-01-08 21:26:50.250 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2023-01-08 21:26:50.257 ServerApp] jupyterlab | extension was successfully linked.
[I 2023-01-08 21:26:50.262 ServerApp] nbclassic | extension was successfully linked.
...
[I 2023-01-08 21:26:50.842 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 2023-01-08 21:26:50.847 ServerApp] No web browser found: could not locate runnable browser.
[C 2023-01-08 21:26:50.848 ServerApp]
    To access the server, open this file in a browser:
        file:///home/yuanzuxiang/.local/share/jupyter/runtime/jpserver-3540-open.html
    Or copy and paste one of these URLs:
        http://localhost:8889/lab?token=f5028b1978baa74512cec56cff7c4f9e2dbbc4592cdf5b69
     or http://127.0.0.1:8889/lab?token=f5028b1978baa74512cec56cff7c4f9e2dbbc4592cdf5b69
  • Note the token in these URLs: it is an access token and is required the first time you open the home page.
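
The token is an ordinary query parameter of the URL printed in the log, so it can be pulled out with the standard library if a script needs it. This is only an illustrative sketch: extract_token is a hypothetical helper, and the URL is the one from the log output above.

```python
from urllib.parse import parse_qs, urlparse


def extract_token(url: str):
    """Return the value of the `token` query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("token")
    return values[0] if values else None


if __name__ == "__main__":
    url = (
        "http://localhost:8889/lab"
        "?token=f5028b1978baa74512cec56cff7c4f9e2dbbc4592cdf5b69"
    )
    print(extract_token(url))
```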

Next, open jupyter-lab in a browser, create a Notebook, and test the Python environment directly:

[Figure: screenshot of a new Notebook in jupyter-lab]

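  • The red Notebook icon is the same one shown on the launcher screen.
  • Use the [->] button in the top toolbar to run code.
  • The key feature of a notebook is that code can be run cell by cell.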

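Installing common libraries

With the Python environment and jupyter-lab in place, the next step is to install the commonly used libraries. The following 7 libraries are involved: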

  • numpy: the fundamental package for scientific computing with Python.
  • pandas: a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
  • matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.
  • seaborn: a Python data visualization library based on matplotlib.
  • scipy: fundamental algorithms for scientific computing in Python.
  • statsmodels: statistical models, hypothesis tests, and data exploration.
  • sklearn (scikit-learn): machine learning in Python.

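These libraries also depend on one another. numpy provides the fundamental array and matrix implementation; pandas is the core library for working with tabular data; seaborn is built on matplotlib, and the two handle data visualization; scipy and statsmodels provide statistical methods; and sklearn handles machine learning, including linear regression. They can be installed in the following order: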

conda install numpy
conda install pandas
conda install matplotlib
conda install seaborn
conda install scipy
conda install statsmodels
conda install scikit-learn

They can also be installed directly with pip:

pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install scipy
pip install statsmodels
pip install scikit-learn
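
After installing, it can be useful to confirm which of the seven libraries are actually present in the active environment. One possible check, sketched with the standard-library importlib.metadata module (Python 3.8 or later), is shown here; the PACKAGES list and the installed_versions helper are names introduced only for this example.

```python
from importlib import metadata

# Distribution names as passed to `pip install` / `conda install`.
PACKAGES = [
    "numpy",
    "pandas",
    "matplotlib",
    "seaborn",
    "scipy",
    "statsmodels",
    "scikit-learn",
]


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Note that the check uses the distribution name scikit-learn, not the import name sklearn.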

They are conventionally imported like this:

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import seaborn.objects as so
from scipy import stats
from sklearn import linear_model
import statsmodels.api as sm
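
To see a few of these imports working together, here is a small sketch that builds a table with numpy and pandas and fits a linear regression with sklearn. It assumes numpy, pandas, and scikit-learn are installed in the active environment; the data is synthetic and lies exactly on the line y = 2x + 1, so the fit should recover a slope close to 2 and an intercept close to 1.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic data on the line y = 2x + 1 (made up for illustration).
x = np.arange(10, dtype=float)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0})

# Fit ordinary least squares; scikit-learn expects a 2-D feature matrix,
# hence the double brackets when selecting the "x" column.
model = linear_model.LinearRegression()
model.fit(df[["x"]], df["y"])

slope = float(model.coef_[0])
intercept = float(model.intercept_)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```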

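pip can also be used to install libraries from inside jupyter-lab:

[Figure: screenshot of a notebook cell running a pip command with a leading !]

  • Note that the leading ! is required; it makes the notebook run the line as a shell command.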
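
A bare pip call from a notebook runs whichever pip comes first on the shell's PATH, which is not necessarily the environment the notebook kernel is using. One way to avoid that mismatch, sketched here as an assumption and not as the method used in the screenshot, is to invoke pip through the interpreter that is running the code. The helper names pip_install_command and pip_install are hypothetical.

```python
import subprocess
import sys


def pip_install_command(*packages: str) -> list:
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]


def pip_install(*packages: str) -> None:
    """Run pip in a subprocess; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(*packages))


if __name__ == "__main__":
    # Only print the command here; call pip_install("numpy") to install.
    print(pip_install_command("numpy"))
```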
