There’s been a wave of interest in NLP recently as generative capabilities take the world by storm. From writing essays to churning out blog posts to synthesizing complex topics, folks are writing 10x+ faster thanks to the breakthroughs and product launches.
Which is amazing (or scary amazing, depending on who you ask).
In any case, while I enjoy writing, I rarely write long-form in my day-to-day. Texting, though? I text at least 100x more frequently, and that data has compounded over the last 10 years / 3k days.
So, what if I could wield these advances and respond to messages automatically? At worst, it’d save me hours in the years to come. Best case? “Text Connor” becomes much more clever and interesting.
Win-win.
1 weekend later, cue the V-MVP (VERY minimally viable product)...
Note: none of the input texts below were copied from the training data; the model responses are copy-pasted verbatim.
First and foremost, the model does not seem like it will sink my marriage on the basics. It also seems to have learned my way of typing smiley faces to avoid loading emojis.
My Wife: Love you! See you soon
Model: Love you too : ]
On the comedic side, it’s also learned some of our laughable banter from back in the day, though it’s a bit too defensive…
Wife: Why you such a poop
Model: I’m not a poop. You’re a poop. I’m not a poop. You’re a poop. I’m not a poop. You’re a poop. (repeated 5 more times)
Testing on a recent conversation with a friend, I found the model’s response a bit too hyped. I attribute this to the training data, which is heavily concentrated in outdated mannerisms from my college days; the model could benefit from training on more recent data.
Me: Sorry to hear…conciliatory prize we get to watch more of the playoffs together though : ]
Friend A: Lol ofc
Friend A: Who thunk we would see Lebron curry again NBD
Model: Lebron curry again?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!??!?!?!?!
Lastly, the model demonstrated an ability to go “Black Mirror” and hallucinated a conversation with a deceased friend. It was about as vanilla a conversation as could be, but it stirred enough emotion that the…surreal appeal of such a product clicked a bit.
Deceased Friend: when we hanging out?
Model as me: Sunday or Monday?
Model as Friend: sunday
Model as Friend: or monday
Model as Friend: or both
Model as Friend: haha
Model as me: Haha.
Model as me: I’m down for both.
Model as Friend: cool cool
Model as Friend: let’s do sunday
Model as me: Sounds good.
Base Model: 7B LLaMA
To Do: Bump to a larger model (e.g., 30B) to improve flow and logical reasoning
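For a sense of the mechanics, getting a reply out of the model is plain causal-LM sampling. Here’s a minimal sketch assuming a Hugging Face transformers setup; the checkpoint path, prompt format, and decoding parameters are illustrative stand-ins, not my exact configuration.

```python
# Sketch: sampling a reply from the clone (illustrative settings throughout).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "./text-clone"  # hypothetical path to the fine-tuned 7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.float16, device_map="auto"
)

# Prompt the model the same way the training data was framed: the other
# person's message, then my speaker tag, and let it complete the reply.
prompt = "Wife: Love you! See you soon\nMe:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs, max_new_tokens=40, do_sample=True, top_p=0.9, temperature=0.8
)
# Strip the prompt tokens so only the generated reply prints.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```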
Training: LoRA + ~$10 of GPU time on Lambda Labs over a day
To Do: 2-3x the training time / epochs to improve realism
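The training loop itself is a stock LoRA recipe. A rough sketch using the Hugging Face transformers + peft stack; the hyperparameters are illustrative defaults rather than my exact config, and the hardcoded example pair stands in for the real dataset (see the data sketch below).

```python
# Sketch: LoRA fine-tuning a 7B LLaMA on message pairs (illustrative config).
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "huggyllama/llama-7b"  # assumption: any 7B LLaMA checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.float16, device_map="auto"
)

# LoRA trains small low-rank adapters on the attention projections instead of
# all 7B weights, which is what makes ~$10 of GPU time plausible.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))
model.print_trainable_parameters()  # typically well under 1% of total params

# `pairs` would come from the message-history parsing step sketched below.
pairs = [("Why you such a poop", "I'm not a poop. You're a poop.")]
texts = [f"Friend: {p}\nMe: {r}{tokenizer.eos_token}" for p, r in pairs]
ds = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

Trainer(
    model=model,
    args=TrainingArguments("text-clone-lora", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-4,
                           logging_steps=50),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```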
Data: The entirety of my Facebook Messenger history, which is heavily concentrated over 10 years ago when I was a teenager / early college student. Given that, I will spare myself the embarrassment of sharing it publicly. Also, I wouldn’t want anyone else able to clone me…
To Do: I would have loved to include Apple Messages data, but after some investigation found acquiring it to be high-friction. Open to recommendations if others know of a secure, simple way to get that data.
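For anyone attempting this with their own history: Facebook’s “Download Your Information” export is plain JSON, and wrangling it into (friend message, my reply) pairs is short work. A sketch assuming the standard export layout; my actual preprocessing differed in the details, and the `ME` constant is whatever your display name is in the export.

```python
# Sketch: turning a Facebook Messenger JSON export into prompt/response pairs.
# Assumes the standard layout: messages/inbox/<thread>/message_1.json.
import json
from pathlib import Path

ME = "Connor"  # assumption: your display name exactly as the export spells it

def load_thread(path: Path) -> list[dict]:
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # The export lists messages newest-first; flip to chronological order.
    # Caveat: the export is also known to mangle non-ASCII characters,
    # which may need repairing before training.
    return sorted(data["messages"], key=lambda m: m["timestamp_ms"])

def to_pairs(messages: list[dict]) -> list[tuple[str, str]]:
    """Pair each friend message with my reply that immediately follows it."""
    pairs = []
    for prev, cur in zip(messages, messages[1:]):
        if not (prev.get("content") and cur.get("content")):
            continue  # skip photos, stickers, unsent messages, etc.
        if prev["sender_name"] != ME and cur["sender_name"] == ME:
            pairs.append((prev["content"], cur["content"]))
    return pairs

pairs = []
for thread_file in Path("messages/inbox").glob("*/message_1.json"):
    pairs.extend(to_pairs(load_thread(thread_file)))
print(f"{len(pairs)} prompt/response pairs")
```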
For the amount of time invested, this project turned out much better than anticipated, even with the constrained performance of the smallest LLaMA model and data scoped to my early online days. From the perspective of the original, the clone demonstrates an ability to replicate not only my style but also my “content meaning.”
There are a few obvious kinks to work out (e.g., providing access to a calendar, real-time data, etc.) to boost the realism before such a product could reasonably be deployed to the masses.
However, that future is not far off. Arguably, quite near…