Python Convert Text to SubRip Subtitle File

If you only have a piece of text, how can you create a SubRip subtitle file (.srt) using Python? This tutorial shows how to do it.

Subrip Subtitle File

A SubRip subtitle file is a plain-text file with the .srt extension. Its content may look like this:

1
00:00:00,000 --> 00:00:01,428
Homework is important

2
00:00:01,428 --> 00:00:04,400
because it develops core skills in young children
......
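
Each entry in the file is a sequence number, a time line, the subtitle text, and a blank line that separates it from the next entry. As a minimal sketch of that layout (srt_entry is a hypothetical helper name, not part of any library), one entry could be built like this:

```python
def srt_entry(index, start, end, text):
    """Build one SubRip entry; start and end are already formatted strings."""
    return f"{index}\n{start} --> {end}\n{text}\n"

entries = [
    srt_entry(1, "00:00:00,000", "00:00:01,428", "Homework is important"),
    srt_entry(2, "00:00:01,428", "00:00:04,400",
              "because it develops core skills in young children"),
]
# Joining with a newline leaves one blank line between entries
srt_text = "\n".join(entries)
print(srt_text)
```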

In order to create a .srt file, we have to answer these questions:

How to split a text to display in a video?
How to get the duration of each line?
How to convert text start time and end time to subrip subtitle file time format?

How to split a text to display in a video?

We should not display a long passage of text on screen at once. To fix this problem, we can split the long text into short lines, either manually or with Python code.

Here is an example:
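
This is only a sketch of one possible approach, using the standard re and textwrap modules. The helper name split_text and the 40-character limit are assumptions made for illustration; choose a limit that suits your video.

```python
import re
import textwrap

def split_text(text, max_chars=40):
    """Split text into short subtitle lines (hypothetical helper)."""
    # Break into sentences after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lines = []
    for sentence in sentences:
        if sentence:
            # Wrap long sentences at word boundaries
            lines.extend(textwrap.wrap(sentence, width=max_chars))
    return lines

text = ("Homework is important because it develops core skills "
        "in young children. It also builds discipline.")
data = split_text(text)
for line in data:
    print(line)
```

The resulting list can serve as the data variable used in the final example of this tutorial.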

How to get the duration of each line?

To get the duration of each text line, we can use a text-to-speech (TTS) model. For example, we can use a VITS model to convert each line to speech and take the length of the generated audio as the duration of that line.

Convert Text to Speech in Python Using VITS Model

Once we have the duration of each text line, we can compute its start time and end time in the whole audio: each line starts where the previous line ends.
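
The accumulation can be sketched as follows. The function name line_spans is hypothetical, and the sketch assumes each line's audio length is known as a number of samples at a sample rate of 22050 Hz (the rate used when the audio is saved later in this tutorial). Summing whole samples and dividing at the end avoids accumulating floating-point error.

```python
def line_spans(sample_counts, sample_rate=22050):
    """Return (start, end) in seconds for each line, given its sample count."""
    spans = []
    start = 0  # running position in samples
    for n in sample_counts:
        end = start + n
        # Convert sample positions to seconds
        spans.append((start / sample_rate, end / sample_rate))
        start = end
    return spans

# Two lines lasting 1 second and 2 seconds at 22050 Hz
spans = line_spans([22050, 44100])
print(spans)  # [(0.0, 1.0), (1.0, 3.0)]
```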

How to convert text start time and end time to subrip subtitle file time format?

The SubRip time format is hours:minutes:seconds,milliseconds for both the start time and the end time:

00:00:01,428 --> 00:00:04,400
start time       end time

Since we have the start time and end time of each text line in seconds, we can convert them to the SubRip time format. Here is the tutorial:

Python Convert Seconds to Days, Hours, Minutes and Seconds
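
The helper from that tutorial is not reproduced here. As a minimal sketch, a compatible function could look like the one below; the name sec2time matches the call used in the example code of this tutorial, but this body is an assumption rather than the original implementation.

```python
def sec2time(seconds):
    """Convert seconds (int or float) to the SubRip format HH:MM:SS,mmm."""
    # Work in whole milliseconds to avoid floating-point drift
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

print(sec2time(1.428))   # 00:00:01,428
print(sec2time(3661.5))  # 01:01:01,500
```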

Then we can create the .srt file easily. Here is some example code:

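import numpy as np
import soundfile as sf

# data (list of text lines), tokenizer and model come from the VITS
# text-to-speech setup; sec2time() converts seconds to hh:mm:ss,mmm
sample_rate = 22050  # should match the sample rate of the TTS output
audio_data = []
srt_blocks = []
start_time = 0.0  # in seconds
for i, text in enumerate(data, start=1):
    # Tokenize inputs
    inputs = tokenizer(text)
    # Generate speech
    outputs = model.run(None, {"text": inputs})
    wav = outputs[0]
    audio_data.append(wav)
    # wav.shape[0] is a sample count (assuming wav is a 1-D array),
    # so divide by the sample rate to get seconds
    end_time = start_time + wav.shape[0] / sample_rate

    # subrip subtitle file time format
    time_info = f"{sec2time(start_time)} --> {sec2time(end_time)}"
    srt_blocks.append(f"{i}\n{time_info}\n{text}\n")
    # The next line starts where this one ends
    start_time = end_time

print(len(audio_data))
audio = np.concatenate(audio_data, axis=0)

# Write to file; joining with "\n" leaves a blank line between entries
with open("sub-text.srt", "w", encoding="utf-8") as f:
    f.write("\n".join(srt_blocks))
sf.write("sub-audio.wav", audio, sample_rate)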