Python Machine Learning (1): Basic NumPy Operations

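Compared with Python's native list, a NumPy array adds a constraint on types: it can only store elements of the same type.
NumPy also provides a large library of mathematical functions for us to use.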

Importing NumPy

Install NumPy

$ pip3 install numpy

Import NumPy and give it an alias

import numpy as np

One-dimensional arrays

Create a one-dimensional array, filling in each element with a for/range comprehension

nparr = np.array([i for i in range(1000)])

Read the element at a given index, then modify the element at a given index

print(nparr[3])
# out: 3
nparr[4] = 1234
print(nparr[4])
# out: 1234

Print the array's type
dtype shows that the element type is int64 here (the default integer type can differ by platform)

print(nparr.dtype)
# out: int64

Assign a value of a different type to an element and print it

nparr[5] = 3.14
print(nparr[5])
# out: 3

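As shown, after assigning 3.14 to index 5 the printed result is 3. The array's type was fixed as int64 when it was created, so a floating-point value is converted to an integer and its fractional part is dropped.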
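
To keep the fractional part, the array needs a floating-point dtype. The sketch below shows one way to do that with astype; the names int_arr and float_arr are illustrative and do not appear in the original.

```python
import numpy as np

int_arr = np.arange(10)            # integer dtype inferred from the values
int_arr[5] = 3.14                  # the fractional part is dropped
print(int_arr[5])

float_arr = int_arr.astype(float)  # astype returns a converted copy
float_arr[5] = 3.14                # now stored as a float
print(float_arr[5])
print(float_arr.dtype)
```

Passing dtype=float when the array is first created would have the same effect without the extra copy.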

Other ways to create arrays

zeros

Create an array of length 10 in which every element is 0

nparr = np.zeros(10)
# out: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Create an array of length 10 with an integer dtype (int64 here)

nparr = np.zeros(10, dtype = int)
# out: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Create a two-dimensional array with 10 rows and 10 columns

nparrs = np.zeros((10,10),dtype = int)
# out: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

ones

Create an array of length 10 in which every element is 1

nparr = np.ones(10)
#out: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

The other usages are the same as for zeros

full

Create an array filled with a specified value

Create a 10x10 array in which every element is initialized to 666

nparr = np.full((10,10),666)
#out: 
array([[666, 666, 666, 666, 666, 666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666, 666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666, 666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666, 666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666, 666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666, 666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666, 666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666, 666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666, 666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666, 666, 666, 666, 666, 666]])

arange

Create an array whose elements start at 0 and stop before 10 (the end value is excluded), with a step of 2 between elements

nparr = np.arange(0,10,2)
# out: array([0, 2, 4, 6, 8])

[Note]

If the start value is omitted, it defaults to 0
The step can be a floating-point number
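
The two notes above can be checked with a short sketch; the variable names no_start and float_step are illustrative only.

```python
import numpy as np

no_start = np.arange(5)              # start defaults to 0: 0, 1, 2, 3, 4
float_step = np.arange(0, 1, 0.25)   # float step: 0.0, 0.25, 0.5, 0.75
print(no_start)
print(float_step)
```

With a float step, rounding can make the number of elements hard to predict, which is one reason to reach for linspace when a fixed number of points is wanted.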

linspace

Create an array of 10 evenly spaced points taken from the interval [0, 20], with both endpoints included.

nparr = np.linspace(0,20,10)
#out:
array([ 0.        ,  2.22222222,  4.44444444,  6.66666667,  8.88888889,
       11.11111111, 13.33333333, 15.55555556, 17.77777778, 20.        ])
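
Unlike arange, linspace includes the end value by default. A small sketch of the difference, using the endpoint parameter of np.linspace (the variable names are illustrative):

```python
import numpy as np

closed = np.linspace(0, 20, 5)                     # 0, 5, 10, 15, 20
half_open = np.linspace(0, 20, 5, endpoint=False)  # 0, 4, 8, 12, 16
print(closed)
print(half_open)
```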

random

Generate a random integer in [0, 10)

npval = np.random.randint(0,10)
# out: 9

Generate an array of length 10 in which each element is a random integer in [0, 10)

nparr = np.random.randint(0,10,10)
# out: array([7, 8, 3, 4, 2, 3, 7, 3, 1, 9])

Generate an array with 10 rows and 8 columns in which each element is a random integer in [0, 10)

nparr = np.random.randint(0,10,size = (10,8))
# out:
array([[0, 1, 3, 9, 8, 9, 0, 7],
       [3, 3, 6, 5, 4, 7, 7, 0],
       [5, 2, 2, 3, 0, 6, 3, 1],
       [9, 5, 0, 8, 6, 0, 2, 4],
       [0, 8, 1, 0, 2, 4, 7, 4],
       [3, 1, 6, 7, 8, 9, 5, 3],
       [0, 1, 2, 5, 4, 5, 6, 8],
       [2, 1, 1, 5, 2, 0, 7, 0],
       [7, 9, 4, 5, 1, 7, 3, 1],
       [6, 4, 2, 4, 2, 7, 5, 9]])

Set the random seed to 123

np.random.seed(123)
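
Setting the seed makes the random sequence repeatable. A minimal sketch, assuming the same legacy np.random interface used throughout this article; first and second are illustrative names.

```python
import numpy as np

np.random.seed(123)
first = np.random.randint(0, 10, 5)

np.random.seed(123)                    # reset to the same seed
second = np.random.randint(0, 10, 5)   # repeats the same draws

print(np.array_equal(first, second))
```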

random.random

Generate a random floating-point number in [0, 1)

val = np.random.random()
#out: 0.6872533718021444

Generate an array of length 10 of random floating-point numbers in [0, 1)

nparr = np.random.random(10)
#out: array([0.18574062, 0.04076963, 0.80997571, 0.78492276, 0.69390878,
       0.57857778, 0.63972799, 0.92268458, 0.58265581, 0.03772895])

Generate a 10x10 array of random floating-point numbers in [0, 1)

nparr = np.random.random(size=(10,10))
#out:
array([[0.21364282, 0.42380251, 0.99269981, 0.41934453, 0.45071781,
        0.63802807, 0.44192544, 0.19999932, 0.93410079, 0.56824992],
       [0.54934446, 0.97603901, 0.55784307, 0.18827465, 0.83616865,
        0.73483034, 0.1440271 , 0.51881543, 0.5649083 , 0.01743271],
       [0.58115857, 0.2460587 , 0.54160124, 0.1413303 , 0.72601038,
        0.90688538, 0.64774493, 0.38698338, 0.55454244, 0.27996411],
       [0.18016705, 0.89726895, 0.6051986 , 0.81681487, 0.36486389,
        0.02468537, 0.80694041, 0.42597038, 0.88808368, 0.02143695],
       [0.95200537, 0.27743176, 0.24252129, 0.48588503, 0.17302874,
        0.59502977, 0.50362905, 0.0977509 , 0.79654415, 0.75509455],
       [0.69694375, 0.06531238, 0.46859955, 0.82195428, 0.07793506,
        0.27193549, 0.10292629, 0.24884253, 0.51895447, 0.26217372],
       [0.56977346, 0.50832221, 0.79465159, 0.17183169, 0.10442652,
        0.05789666, 0.74295273, 0.14149461, 0.30551489, 0.35223257],
       [0.87217489, 0.74508258, 0.86196495, 0.00587003, 0.75086807,
        0.99587643, 0.71878467, 0.86090835, 0.84246279, 0.11617587],
       [0.20790301, 0.23997847, 0.04918334, 0.45699179, 0.57632283,
        0.88276845, 0.49710772, 0.04093441, 0.6514072 , 0.69784129],
       [0.20484758, 0.47067569, 0.38522   , 0.46417554, 0.94274731,
        0.32808448, 0.17514324, 0.55290533, 0.11091203, 0.36243297]])

Generate a random number drawn from the standard normal distribution

val = np.random.normal()
#out: 1.5218875894536879

Merging arrays

Merge one-dimensional arrays

npArr1 = np.array([1,2,3])
npArr2 = np.array([3,2,1])
npArr3 = np.concatenate([npArr1,npArr2])
#out: array([1, 2, 3, 3, 2, 1])

Merge two-dimensional arrays

concatenate

np1 = np.array([
    [1,2,3],
    [4,5,6]
])
np2 = np.array([
    [1,2,3],
    [4,5,6]
])
np.concatenate([np1,np2])
#out: array([[1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6]])

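By default, concatenate joins along axis 0, placing the rows of the second array below those of the first. To join along axis 1, placing the arrays side by side, pass the axis parameter.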

np.concatenate([np1,np2],axis = 1)
# out:array([[1, 2, 3, 1, 2, 3],[4, 5, 6, 4, 5, 6]])

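Joining a 1-D array with a 2-D array: concatenate can only merge arrays with the same number of dimensions, so before merging, call reshape to turn the 1-D array into a 2-D one.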

A = np.array([1,2,3])
B = np.array([[4,5,6],[7,8,9]])
C = np.concatenate([A.reshape(1,-1),B])
# out: array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])

vstack

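Merge arrays vertically

Above, a 1-D array and a 2-D array were merged by first adding a dimension. Alternatively, vstack can merge arrays of different dimensions directly. The v stands for vertical.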

D = np.vstack([A,B])
# out: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

hstack

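Merge arrays horizontally
hstack works like vstack, but joins the arrays side by side. Unlike vstack, it does not promote a 1-D array to 2-D, so the inputs should have the same number of dimensions.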

a = np.full((2,2),100)
b = np.full((2,2),50)
c = np.hstack([a,b])
# out:
array([[100, 100,  50,  50],
       [100, 100,  50,  50]])
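
One caveat is worth checking by hand: hstack does not accept a 1-D array next to a 2-D one. The sketch below shows the error and one way around it, reshaping the 1-D array into a column; the names col, mat, failed and joined are illustrative.

```python
import numpy as np

col = np.array([1, 2])
mat = np.array([[3, 4], [5, 6]])

try:
    np.hstack([col, mat])        # 1-D next to 2-D: dimensions do not match
    failed = False
except ValueError:
    failed = True

# Reshape col into a 2x1 column, then join it to the left of mat
joined = np.hstack([col.reshape(-1, 1), mat])
print(failed)
print(joined)
```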

Splitting arrays

split

Suppose we have an array of length 10 and want to split it at index 3 and index 7

data = np.arange(10)
x,y,z = np.split(data,[3,7])
#out: [array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])]

Split a two-dimensional array

data = np.arange(16).reshape(4,4)
a, b = np.split(data,[2])
#out:
[array([[0, 1, 2, 3],
        [4, 5, 6, 7]]),
 array([[ 8,  9, 10, 11],
        [12, 13, 14, 15]])]

By default, split cuts a 2-D array along axis 0, that is, between rows. We can also pass the axis parameter to cut along axis 1, between columns.

a,b = np.split(data,[2],axis=1)
#out: 
[array([[ 0,  1],
        [ 4,  5],
        [ 8,  9],
        [12, 13]]),
 array([[ 2,  3],
        [ 6,  7],
        [10, 11],
        [14, 15]])]

vsplit

NumPy also provides vsplit, a dedicated method for splitting an array vertically (between rows, along axis 0)

a,b = np.vsplit(data,[2])
#out:
[array([[0, 1, 2, 3],
        [4, 5, 6, 7]]),
 array([[ 8,  9, 10, 11],
        [12, 13, 14, 15]])]

hsplit

Likewise, hsplit splits an array horizontally (between columns, along axis 1)

a,b = np.hsplit(data,[2])
#out:
[array([[ 0,  1],
        [ 4,  5],
        [ 8,  9],
        [12, 13]]),
 array([[ 2,  3],
        [ 6,  7],
        [10, 11],
        [14, 15]])]
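
As a quick check of how these functions relate, the sketch below compares vsplit and hsplit with split called with an explicit axis. The variable names are illustrative.

```python
import numpy as np

data = np.arange(16).reshape(4, 4)

top, bottom = np.split(data, [2], axis=0)   # cut between rows
v_top, v_bottom = np.vsplit(data, [2])      # same result as axis=0

left, right = np.split(data, [2], axis=1)   # cut between columns
h_left, h_right = np.hsplit(data, [2])      # same result as axis=1

print(top.shape, left.shape)
```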

Arithmetic operations

Given an array of length 10 with elements [0,1,2,3,4,5,6,7,8,9], multiply each element by 2

Approach 1: iterate over the elements and multiply each one by 2

L = np.arange(10)
l2 = np.array([n*2 for n in L])
#out array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

universal

Approach 2: multiply the array directly

L = np.arange(10)
l2 = L * 2
#out array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

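Approach 2 is much faster than looping over the elements in Python, which is what Approach 1 and a native Python list both do.

The approach above is called a universal function (ufunc) operation. Besides multiplication, it supports many other mathematical operations.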
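
The speed difference can be measured with the standard timeit module. This is a rough sketch, not a rigorous benchmark; the function names loop_version and vectorized_version are illustrative, and the timings depend on the machine.

```python
import timeit
import numpy as np

L = np.arange(100_000)

def loop_version():
    # Python-level loop over every element
    return np.array([n * 2 for n in L])

def vectorized_version():
    # One vectorized multiplication over the whole array
    return L * 2

same = np.array_equal(loop_version(), vectorized_version())
loop_time = timeit.timeit(loop_version, number=5)
vec_time = timeit.timeit(vectorized_version, number=5)
print(same)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```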

Addition & subtraction

A = np.arange(1,16).reshape((3,5))
# out: 
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])
A + 1
# out:
array([[ 2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16]])
A - 1
# out:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Division

Floating-point division

B = A / 2
# out:
array([[0.5, 1. , 1.5, 2. , 2.5],
       [3. , 3.5, 4. , 4.5, 5. ],
       [5.5, 6. , 6.5, 7. , 7.5]])

Integer (floor) division

C = A // 2
# out:
array([[0, 1, 1, 2, 2],
       [3, 3, 4, 4, 5],
       [5, 6, 6, 7, 7]])

Exponentiation

A = np.arange(1,16).reshape((3,5))
D = A ** 2
# out:
array([[  1,   4,   9,  16,  25],
       [ 36,  49,  64,  81, 100],
       [121, 144, 169, 196, 225]])

Remainder

A = np.arange(1,16).reshape((3,5))
E = A % 2
#out:
array([[1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1]])

Besides the operations above, NumPy provides other mathematical functions, such as np.abs(X), np.sin(X), np.cos(X), np.tan(X), and so on

Matrix operations

A = np.full([2,2],10)
B = np.full([2,2],10)
A + B
# out:
array([[20, 20],
       [20, 20]])
A - B
# out:
array([[0, 0],
       [0, 0]])
A * B
# out:
array([[100, 100],
       [100, 100]])
A / B
# out:
array([[1., 1.],
       [1., 1.]])

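Note: the operations here add, subtract, multiply, and divide the corresponding entries of the two matrices. They are not the standard matrix operations

Standard matrix operations

Matrix multiplication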

A = np.arange(4).reshape((2,2))
B = np.full([2,2],10)
C = A.dot(B)
#out:
array([[10, 10],
       [50, 50]])
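
To make the contrast with element-wise multiplication concrete, the sketch below puts both side by side. It uses the @ operator, which for 2-D arrays gives the same result as dot; the names elementwise and matmul are illustrative.

```python
import numpy as np

A = np.arange(4).reshape((2, 2))   # [[0, 1], [2, 3]]
B = np.full([2, 2], 10)

elementwise = A * B   # multiplies matching entries
matmul = A @ B        # row-by-column matrix product, same as A.dot(B)
print(elementwise)
print(matmul)
```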

Matrix transpose

D = A.T
#out:
array([[0, 2],
       [1, 3]])

Adding a vector to a matrix

A = np.arange(2)
B = np.full([2,2],10)
A + B
#out:
array([[10, 11],
       [10, 11]])
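
The result above comes from broadcasting: the 1-D vector is added to every row of the matrix. The sketch below also shows the column-wise variant, obtained by reshaping the vector into a column; the names v, M, row_wise and col_wise are illustrative.

```python
import numpy as np

v = np.arange(2)            # shape (2,)
M = np.full([2, 2], 10)     # shape (2, 2)

row_wise = v + M                  # v is added to every row
col_wise = v.reshape(-1, 1) + M   # v as a column is added to every column
print(row_wise)
print(col_wise)
```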

Matrix inverse

A = np.arange(4).reshape((2,2))
B = np.linalg.inv(A)
# out:
array([[-1.5,  0.5],
       [ 1. ,  0. ]])
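
An inverse can be verified by multiplying it back with the original matrix, which should give the identity up to rounding. The sketch also shows that a singular matrix has no inverse; the matrix named singular is a made-up example, not from the original.

```python
import numpy as np

A = np.arange(4).reshape((2, 2))
A_inv = np.linalg.inv(A)

# A times its inverse should be the identity matrix (within rounding error)
is_identity = np.allclose(A @ A_inv, np.eye(2))

singular = np.array([[1, 2], [2, 4]])   # second row is twice the first
try:
    np.linalg.inv(singular)
    raised = False
except np.linalg.LinAlgError:
    raised = True

print(is_identity, raised)
```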

Aggregation

Generate a random 1-D array of length 100

L = np.random.random(100)
#out:
array([0.90328671, 0.28120083, 0.07967393, 0.52537964, 0.96879494,
       0.91746851, 0.49402103, 0.68123372, 0.77433393, 0.00954455,
       0.22250466, 0.81700131, 0.14756778, 0.23124508, 0.92174384,
       0.87678925, 0.40548807, 0.2412917 , 0.24308536, 0.44500967,
       0.93762453, 0.08295336, 0.00968223, 0.88366331, 0.60392585,
       0.95210809, 0.04706185, 0.39168825, 0.56027198, 0.72620413,
       0.40802909, 0.00980734, 0.26008968, 0.5470524 , 0.38372671,
       0.5497413 , 0.35348543, 0.17006553, 0.19810973, 0.64671555,
       0.46616224, 0.03920279, 0.25192456, 0.19304411, 0.5681688 ,
       0.29330234, 0.30441446, 0.85510032, 0.23496952, 0.144754  ,
       0.11603474, 0.15409606, 0.36609597, 0.48058366, 0.44932898,
       0.10986889, 0.96398158, 0.67367576, 0.03145002, 0.27000791,
       0.38325639, 0.74979293, 0.2216278 , 0.70898754, 0.24619732,
       0.3713945 , 0.14106551, 0.84603347, 0.71202302, 0.15757943,
       0.51301604, 0.49796778, 0.71035382, 0.74176836, 0.80031979,
       0.60780179, 0.25911781, 0.22624925, 0.41139973, 0.06187766,
       0.88562678, 0.65329357, 0.02886998, 0.1601868 , 0.25973595,
       0.85255463, 0.74649799, 0.3793212 , 0.07096633, 0.27025233,
       0.92927356, 0.43842597, 0.19533115, 0.46123102, 0.88844079,
       0.94521973, 0.06559245, 0.20671055, 0.17453499, 0.99184493])

Compute the sum with sum

B = np.sum(L)
#out: 44.84657818165318

Minimum and maximum

min_val = np.min(L)
max_val = np.max(L)

Operations on 2-D matrices

Construct a 4x4 matrix

L = np.arange(16).reshape(4,-1)
#array([[ 0,  1,  2,  3],
#       [ 4,  5,  6,  7],
#       [ 8,  9, 10, 11],
#       [12, 13, 14, 15]])

Sum of all elements in the matrix

np.sum(L)
#out: 120

Sum each column of the matrix

np.sum(L,axis = 0)
# out: array([24, 28, 32, 36])

Sum each row of the matrix

np.sum(L,axis = 1)
# out: array([ 6, 22, 38, 54])
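
The axis parameter works the same way for other aggregations. A short sketch with max and mean on the same 4x4 matrix; col_max and row_mean are illustrative names.

```python
import numpy as np

L = np.arange(16).reshape(4, -1)

col_max = np.max(L, axis=0)     # one value per column: 12, 13, 14, 15
row_mean = np.mean(L, axis=1)   # one value per row: 1.5, 5.5, 9.5, 13.5
print(col_max)
print(row_mean)
```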

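Product

np.prod(L)
# out: 0, because L contains an element equal to 0
np.prod(L + 1) # add 1 to every element of L, then multiply them all together (1 x 2 x ... x 16, i.e. 16!)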
#out: 20922789888000

Mean

np.mean(L)
#out: 7.5

Median

np.median(L)
#out: 7.5

Variance

np.var(L)
#out: 21.25

Standard deviation

np.std(L)
#out: 4.6097722286464435
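
The standard deviation is the square root of the variance, which is easy to confirm. The sketch also shows the ddof parameter of np.var, which switches from the default population variance (divide by N) to the sample variance (divide by N - 1); the variable names are illustrative.

```python
import numpy as np

L = np.arange(16).reshape(4, -1)

variance = np.var(L)              # population variance, divides by N
std = np.std(L)
matches = np.isclose(std, np.sqrt(variance))

sample_var = np.var(L, ddof=1)    # sample variance, divides by N - 1
print(matches)
print(variance, sample_var)
```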

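Indexing

Generate an array of 1,000,000 samples drawn from a normal distribution with mean 0 and standard deviation 1 (so the variance is also 1)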

x = np.random.normal(0,1,size = 1000000)

The minimum value in this array is

np.min(x)
#out: -4.672634384887452

The position of this minimum in the array can be obtained with argmin

np.argmin(x)
#out: 967231
x[967231]
#out: -4.672634384887452

Likewise, we can use argmax to get the position of the maximum value in the array

np.argmax(x)
#out: 536426

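Sorting and using arrays

Create an array of length 16 with values 0 to 15

Y = np.arange(0,16)
#out: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

Shuffle the array in place with random.shuffle

np.random.shuffle(Y)
# Y after shuffling, for example: array([ 6,  8, 11,  9,  5,  7,  4,  2, 15,  0, 13,  1, 14, 12, 10,  3])

Sort the array by calling np.sort(), which returns a sorted copy, or Y.sort(), which sorts in place

np.sort(Y)
# or
Y.sort()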

Sorting a 2-D matrix

# Declare a matrix
X = np.random.randint(10,size = (4,4))
#out:
array([[8, 8, 3, 1],
       [5, 1, 9, 3],
       [3, 7, 2, 5],
       [9, 1, 2, 3]])

Sort the 2-D matrix with sort

np.sort(X)
#out:
array([[1, 3, 8, 8],
       [1, 3, 5, 9],
       [2, 3, 5, 7],
       [1, 2, 3, 9]])

As shown, sort sorts each row by default. We can pass the axis parameter to sort each column instead

np.sort(X,axis = 0)
#out:
array([[3, 1, 2, 1],
       [5, 1, 2, 3],
       [8, 7, 3, 3],
       [9, 8, 9, 5]])

Getting the indices of the sorted elements

# Shuffle the array Y
np.random.shuffle(Y)
# Y after shuffling, for example:
array([ 5,  6, 11,  2,  3,  9,  1,  7,  4, 14, 13,  8, 15, 10, 12,  0])

np.argsort returns, for each position in the sorted order, the index of that element in the original array

np.argsort(Y)
# out:
array([15,  6,  3,  4,  8,  0,  1,  7, 11,  5, 13,  2, 14, 10,  9, 12])
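
One way to read the argsort result: indexing the array with it gives the sorted array. A minimal sketch using the shuffled values shown above, written out literally so the result is repeatable; the name order is illustrative.

```python
import numpy as np

Y = np.array([5, 6, 11, 2, 3, 9, 1, 7, 4, 14, 13, 8, 15, 10, 12, 0])

order = np.argsort(Y)        # indices that would sort Y
sorted_Y = Y[order]          # same values as np.sort(Y)
print(order[:3])             # positions of the three smallest values
print(sorted_Y)
```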

partition

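In quicksort, a pivot is chosen and, through comparisons and swaps, the elements smaller than the pivot end up on its left and the larger ones on its right. NumPy provides a partition method that does something similar. Its second argument is an index: the element that would sit at that index in sorted order is placed there, with smaller elements before it and larger ones after it.
For example, with index 3 (Y holds the values 0 to 15, so the pivot value is also 3):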

np.partition(Y,3)
#out: array([ 2,  1,  0,  3,  4,  5,  9,  6, 11, 14, 13,  8, 15, 10, 12,  7])

Likewise, argpartition returns the indices of the elements instead of their values

np.argpartition(Y,3)
#out: array([ 3,  6, 15,  4,  8,  0,  5,  1,  2,  9, 10, 11, 12, 13, 14,  7])
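
The guarantee partition gives can be checked directly: everything before position k is no larger than the element at k, and everything after is no smaller, but neither side is itself sorted. A sketch using the same shuffled values written out literally; k, part and the other names are illustrative.

```python
import numpy as np

Y = np.array([5, 6, 11, 2, 3, 9, 1, 7, 4, 14, 13, 8, 15, 10, 12, 0])
k = 3
part = np.partition(Y, k)

pivot = part[k]                                # value at sorted position k
left_ok = bool(np.all(part[:k] <= pivot))      # left side is not larger
right_ok = bool(np.all(part[k:] >= pivot))     # right side is not smaller
smallest_three = np.sort(part[:k])             # the k smallest values
print(pivot, left_ok, right_ok)
print(smallest_three)
```

This makes partition a convenient way to pick the k smallest values without sorting the whole array.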

Fancy indexing

Access the elements at indices 3, 5, and 8

X = np.arange(0,16)
idx = [3, 5, 8]
X[idx]
#out: array([3, 5, 8])

This works not only with a 1-D array of indices but also with a 2-D array of indices

idxs = np.array([
    [0,2],
    [1,3]
])
X[idxs]
#out:
array([[0, 2],
       [1, 3]])

Indexing a 2-D array

Y = np.arange(0,16).reshape(4,-1)
#out:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
# Define the rows and columns to fetch
row = np.array([0,1,2])
col = np.array([1,2,3])
Y1 = Y[row,col]
#out: array([ 1,  6, 11])
# Get the elements of row 0 at the columns in col
Y[0,col]
#out: array([1, 2, 3])
# Get the elements from row 2 onward at the columns in col
Y[2:,col]
#out 
array([[ 9, 10, 11],
       [13, 14, 15]])

Comparisons on np.array

x = np.arange(16)
#out: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
y = x < 3
# out
array([ True,  True,  True, False, False, False, False, False, False,
       False, False, False, False, False, False, False])

x < 3 produces a new array: each position holds True if the element is less than 3, and False otherwise
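
A boolean array like this is mostly useful as a mask. The sketch below shows a few common uses: selecting elements, counting matches, and combining conditions. The names mask, selected, count, between and any_negative are illustrative.

```python
import numpy as np

x = np.arange(16)
mask = x < 3

selected = x[mask]                # elements where mask is True: 0, 1, 2
count = np.sum(mask)              # True counts as 1, so this is 3
between = x[(x > 3) & (x < 6)]    # combine conditions with &: 4, 5
any_negative = np.any(x < 0)      # False, no element is negative
print(selected, count, between, any_negative)
```

Conditions are combined with & and | rather than the Python keywords and and or, and each condition needs its own parentheses.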
