This is a CLI program that builds a character-level n-gram language model for each of the two users in a Discord DM conversation.
C++20 and Tyrrrz/DiscordChatExporter are required.
- Clone this repository:

      git clone git@github.com:kiletic/ngram-discord.git

- Enter the repository folder and export your Discord DMs into a data.txt file using the DiscordChatExporter CLI tool with the following command:

      dotnet DiscordChatExporter.Cli.dll export --token "INSERT_YOUR_TOKEN_HERE" -c INSERT_CHANNEL_ID -f "PlainText" -o data.txt

Note: Make sure you replace INSERT_YOUR_TOKEN_HERE with your Discord token and INSERT_CHANNEL_ID with the channel ID. See the DiscordChatExporter documentation for how to get the token and the channel ID.
- Compile main.cpp with:

      c++ -std=c++20 -O3 main.cpp -o main

- Run the program:

      ./main INSERT_USER1_NAME INSERT_USER2_NAME

Note: The names must be the actual Discord usernames as they appear in data.txt, not display names. The order in which you pass them does not matter.
The program parses the data.txt file for messages from both users, by default skipping messages that contain embeds or attachments (to keep them, set filter_attachments and filter_embeds to false when calling the filter_data function). Then it builds a character-level n-gram language model from each user's messages (see the wiki for the math). Finally, to use the model, call the generate_sentence function. Feel free to play with the value of n to get different results.
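
To make the idea concrete, here is a minimal sketch of a character-level n-gram model. It is not the code in main.cpp: the names build_model, sample_sentence, NgramCounts, kPad and kEnd are hypothetical, and the padding and end-of-message markers are assumptions made for illustration. The model counts how often each character follows every (n-1)-character context, then samples the next character with probability count(context, c) / count(context).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical names; the repository's main.cpp may be organised differently.
// Maps each (n-1)-character context to the counts of characters that followed it.
using NgramCounts = std::map<std::string, std::map<char, int>>;

constexpr char kPad = '\x02'; // assumed start-of-message padding character
constexpr char kEnd = '\n';   // assumed end-of-message marker

// Count, for every context of length n-1, which character came next.
NgramCounts build_model(const std::vector<std::string>& messages, std::size_t n) {
    assert(n >= 1);
    NgramCounts counts;
    for (const std::string& msg : messages) {
        // Pad the front so the first real character also has a full context.
        const std::string padded = std::string(n - 1, kPad) + msg + kEnd;
        for (std::size_t i = n - 1; i < padded.size(); ++i) {
            counts[padded.substr(i - (n - 1), n - 1)][padded[i]]++;
        }
    }
    return counts;
}

// Sample characters one at a time, weighted by their counts, until the
// end marker is drawn, the context is unknown, or max_len is reached.
std::string sample_sentence(const NgramCounts& counts, std::size_t n,
                            std::mt19937& rng, std::size_t max_len = 200) {
    assert(n >= 1);
    std::string context(n - 1, kPad);
    std::string out;
    while (out.size() < max_len) {
        const auto it = counts.find(context);
        if (it == counts.end()) break;
        std::vector<char> chars;
        std::vector<int> weights;
        for (const auto& [c, w] : it->second) {
            chars.push_back(c);
            weights.push_back(w);
        }
        std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
        const char next = chars[dist(rng)];
        if (next == kEnd) break;
        out += next;
        // Slide the context window forward by one character.
        if (n > 1) context = context.substr(1) + next;
    }
    return out;
}
```

With this sketch, one model would be built per user from that user's filtered messages, and sample_sentence would be called with the same n that was used for building. Small values of n tend to produce gibberish, while large values tend to reproduce fragments of the original messages, so trying a few values is worthwhile.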