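PyTorch tensors store only numeric data, so strings must be encoded as numbers before they can go into a tensor. To convert a list of strings to a tensor in PyTorch, follow these steps: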
- Tokenize the strings.
- Convert the tokens to numerical values.
- Create a tensor from the numerical values.
Step 1: Tokenize the strings
Tokenization splits a string into smaller units called tokens. Common approaches are:
- Character-level tokenization: Each character is treated as a token. E.g., "hello" -> ["h", "e", "l", "l", "o"]
- Word-level tokenization: Each word is treated as a token. E.g., "hello world" -> ["hello", "world"]
- Subword tokenization: Rare words are split into smaller, more frequent pieces. This is used mainly in NLP models like BERT, which uses WordPiece.
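The first two approaches in the list above need nothing beyond plain Python. The following is a minimal sketch; the variable names are illustrative and not part of the original example.

```python
# Illustrative input; any string works here.
text = "hello world"

# Character-level: every character, including the space, becomes a token.
char_tokens = list(text)

# Word-level: split on whitespace.
word_tokens = text.split()

print(char_tokens)
print(word_tokens)
```

Note that character-level tokenization keeps whitespace as a token unless you filter it out.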
import torch
string = "Today is Mahavir Jayanti"
tokens = string.split()
Step 2: Convert the tokens to numerical values
Once tokenized, you must map each unique token to a unique integer. This often involves creating a vocabulary of all unique tokens and assigning each token an integer ID.
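# Assign one ID per unique token, in order of first appearance.
# dict.fromkeys() removes duplicates so repeated words share a single ID.
word_to_ids = {word: i for i, word in enumerate(dict.fromkeys(tokens))}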
numerical_values = [word_to_ids[word] for word in tokens]
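With a list of strings, the vocabulary is usually built once over all of them, so that a word appearing in several strings always gets the same ID. The sketch below assumes word-level tokenization; the sample sentences and the names `sentences`, `vocab`, and `encoded` are hypothetical.

```python
# Hypothetical input: a list of strings with repeated words.
sentences = ["the cat sat", "the dog sat down"]

# Word-level tokenization for each sentence.
tokenized = [sentence.split() for sentence in sentences]

# Build the vocabulary: one ID per unique token, in order of first appearance.
vocab = {}
for sentence_tokens in tokenized:
    for token in sentence_tokens:
        if token not in vocab:
            vocab[token] = len(vocab)

# Encode each sentence as a list of integer IDs.
encoded = [[vocab[token] for token in sentence_tokens] for sentence_tokens in tokenized]

print(vocab)
print(encoded)
```

Here "the" and "sat" each map to one ID in both sentences, which is what a downstream embedding layer would typically expect.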
Step 3: Create a tensor from the numerical values
After converting the tokens to integers, you can create a PyTorch tensor. If working with sequences of varying lengths (like sentences), you might need to pad the sequences to make them the same length.
tensor = torch.tensor(numerical_values)
print(tensor)
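For sequences of different lengths, one option is `pad_sequence` from `torch.nn.utils.rnn`, which stacks a list of 1-D tensors into a single padded tensor. This is a minimal sketch: the encoded IDs are made up, and it assumes ID 0 is reserved for padding so that real tokens start at 1.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical encoded sentences of different lengths.
# Assumption: ID 0 is reserved for padding, so real token IDs start at 1.
encoded = [[1, 2, 3], [1, 4, 3, 5], [6]]

# pad_sequence expects a list of tensors, not a list of Python lists.
sequences = [torch.tensor(ids, dtype=torch.long) for ids in encoded]

# batch_first=True gives a tensor of shape (batch_size, max_length).
padded = pad_sequence(sequences, batch_first=True, padding_value=0)

print(padded)
print(padded.shape)
```

Shorter sequences are filled with the padding value on the right, up to the length of the longest sequence.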
Here is the complete code.
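import torch

string = "Today is Mahavir Jayanti"
# Step 1: word-level tokenization
tokens = string.split()
# Step 2: one ID per unique token, in order of first appearance
word_to_ids = {word: i for i, word in enumerate(dict.fromkeys(tokens))}
numerical_values = [word_to_ids[word] for word in tokens]
# Step 3: build the tensor
tensor = torch.tensor(numerical_values)
print(tensor)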
Output
tensor([0, 1, 2, 3])
That’s it!

Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Machine Learning frameworks like PyTorch and Tensorflow is a testament to his versatility and commitment to the craft.