2022-11-16

A basic introduction to Tensors in PyTorch

Table of contents

1. List vs Tensor
2. NumPy vs Tensor
3. Data type in PyTorch

Tensor is the basic data structure in PyTorch. It can be created from Python lists and is compatible with NumPy arrays. It is also well suited to parallel computing and can run on GPUs, which speeds up matrix calculations. In this post, I'll introduce some basic usage of Tensor in PyTorch.

List vs Tensor

List to Tensor:

import torch

l1 = [1, 2, 3]
t1 = torch.tensor(l1)
print(t1)

# output
tensor([1, 2, 3])

Tensor to List: call the tolist() method on a tensor instance:

t1.tolist()  # returns [1, 2, 3]
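The same round trip also works for nested lists. The following is a minimal sketch (the variable names nested and round_trip are illustrative and not from the original post) showing that tolist() returns plain Python values:

```python
import torch

nested = [[1, 2, 3], [4, 5, 6]]
t = torch.tensor(nested)       # shape (2, 3); dtype is inferred as int64 for Python ints
round_trip = t.tolist()        # back to a nested Python list

print(t.shape)                 # torch.Size([2, 3])
print(round_trip == nested)    # True
print(type(round_trip[0][0]))  # <class 'int'>
```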

NumPy vs Tensor

A tensor can be built from a NumPy array in two ways: torch.Tensor(data) or torch.from_numpy(data):

import torch
import numpy as np

l1 = [[1, 2, 3], [4, 5, 6]]
nd = np.array(l1)

t1 = torch.Tensor(nd)
print(t1)

# output
tensor([[1., 2., 3.],
        [4., 5., 6.]])

t2 = torch.from_numpy(nd)
print(t2)

# output
tensor([[1, 2, 3],
        [4, 5, 6]])

Although both approaches create a tensor, the results differ in data type. torch.Tensor() returns the data as float32, while torch.from_numpy() keeps the dtype of the input array, which is int64 here.

To confirm this, we can use storage() to inspect the underlying data and its type (the dtype attribute, e.g. t1.dtype, reports the type more directly):

print(t1.storage())
# output
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0
[torch.storage.TypedStorage(dtype=torch.float32, device=cpu) of size 6]

print(t2.storage())
# output
 1
 2
 3
 4
 5
 6
[torch.storage.TypedStorage(dtype=torch.int64, device=cpu) of size 6]

The output confirms it: torch.Tensor() converts the data to float32, while torch.from_numpy() keeps the int64 dtype of the input. Note that from_numpy() does not copy the values: the resulting tensor shares memory with the NumPy array.
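The two constructors also differ in how they treat memory. The sketch below assumes a CPU array created explicitly as int64 (the names copied and shared are illustrative): torch.Tensor() produces a separate float32 tensor, while torch.from_numpy() produces a tensor backed by the same buffer as the array, so an in-place change to the array is visible through it.

```python
import numpy as np
import torch

nd = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

copied = torch.Tensor(nd)      # separate float32 tensor holding converted values
shared = torch.from_numpy(nd)  # int64 tensor backed by the same buffer as nd

nd[0, 0] = 100                 # modify the NumPy array in place

print(copied.dtype, copied[0, 0].item())  # torch.float32 1.0
print(shared.dtype, shared[0, 0].item())  # torch.int64 100
```

If an independent tensor with the original dtype is needed, torch.tensor(nd) copies the data instead of sharing it.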

Tensor to NumPy:

print(t2.numpy())

# output
[[1 2 3]
 [4 5 6]]
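For a CPU tensor, numpy() returns an array that shares memory with the tensor rather than a copy. A small sketch, with illustrative names, of how to get an independent array when that sharing is not wanted:

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
view = t.numpy()           # shares memory with t (CPU tensors only)
copy = t.clone().numpy()   # clone first to get an independent array

t[0, 0] = -1               # in-place change on the tensor

print(view[0, 0])  # -1
print(copy[0, 0])  # 1
```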

Data type in PyTorch

First, let's see the data type of a number in Python:

a = 1
print(type(a))

# output
<class 'int'>

In Python, numbers are represented as objects. In the code above, a is an instance of the int class. Such objects are allocated individually and are not guaranteed to sit in contiguous memory. Each object also carries extra overhead, so it needs more storage and is inefficient when dealing with large amounts of data, because much of the time goes into allocating and assigning new objects.
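The overhead can be made visible by comparing the size of one Python int object with the size of one tensor element. This is a rough sketch: sys.getsizeof() reports an implementation-dependent value, so the exact number for the Python object is not shown.

```python
import sys
import torch

a = 1
t = torch.tensor([1, 2, 3])       # int64 by default for Python ints

object_bytes = sys.getsizeof(a)   # size of one Python int object (implementation-dependent)
element_bytes = t.element_size()  # bytes per tensor element: 8 for int64

print(object_bytes, element_bytes)
print(t.is_contiguous())          # True: elements are stored in one contiguous block
```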

Overall, a Python list is not well suited to efficient numerical computation, while third-party libraries such as NumPy or PyTorch are good replacements. Tensors in PyTorch support the common data types, and we can specify the type with the dtype argument. Here are the common values of dtype:

torch.float16 or torch.half
torch.float32 or torch.float (default)
torch.float64 or torch.double
torch.int8
torch.uint8
torch.int16 or torch.short
torch.int32 or torch.int
torch.int64 or torch.long
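As a short illustration of the list above, the dtype can be set when a tensor is created or changed afterwards with to(). The example assumes the global default dtype has not been changed from float32:

```python
import torch

x = torch.tensor([1, 2, 3], dtype=torch.float64)  # set the dtype at creation
y = x.to(torch.int16)                             # convert an existing tensor
z = torch.tensor([1.0, 2.0])                      # Python floats default to float32

print(x.dtype, y.dtype, z.dtype)  # torch.float64 torch.int16 torch.float32

# The short names are aliases for the same dtype objects
print(torch.half is torch.float16, torch.long is torch.int64)  # True True
```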

The higher the precision of the data, the more storage and computation time it consumes. Moreover, half precision (float16) is mainly useful on GPUs, where it can give similar results with less time and storage.
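The storage side of this trade-off is easy to check. The sketch below (the element count of 1000 and the sizes dictionary are arbitrary choices for illustration) computes the bytes used by the data of tensors with different floating-point dtypes:

```python
import torch

n = 1000
sizes = {}
for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.zeros(n, dtype=dtype)
    # bytes per element times number of elements
    sizes[dtype] = t.element_size() * t.nelement()

for dtype, total in sizes.items():
    print(dtype, total)
# torch.float16 2000
# torch.float32 4000
# torch.float64 8000
```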

Book: Deep Learning with PyTorch
https://mp.weixin.qq.com/s/sQEM2Scpn7mannDDAHq2Dw