Midjourney is best known as one of the leading AI image generators – with nearly 20 million users on its Discord channel, according to third-party trackers, and presumably more on its website – but its ambitions are beginning to expand.
Following news in late summer 2024 that it was developing its own computing and AI hardware, this week the company released a new research paper with machine learning experts at New York University (NYU) on training text-based large language models (LLMs), such as Meta's open-source Llama and Mistral's eponymous open-source models, to write more creatively.
The collaboration, documented in a new research paper published on the AI code community Hugging Face, introduces two new techniques – Diversified Direct Preference Optimization (DDPO) and Diversified Odds Ratio Preference Optimization (DORPO) – designed to expand the range of possible outputs while maintaining coherence and readability.
For a company best known for its diffusion-based AI image generation models, Midjourney's new approach to rethinking creativity in text-based LLMs shows that it is not limiting its ambitions to visuals – and that a picture may not actually be worth a thousand words.
Could a Midjourney-native LLM, or a fine-tuned version of an existing LLM, be in the cards from the small, bootstrapped startup? I reached out to Midjourney founder David Holz but have yet to hear back.
Regardless of any first-party Midjourney LLM offering, the implications of its new research go beyond academic exercise and could help fuel a new wave of LLM training among enterprise AI teams, product developers, and content creators looking to improve AI-generated text.
It also shows that despite the recent interest and investment among AI model providers in new multimodal and reasoning language models, there is still plenty of juice left to squeeze, cognitively and performance-wise, out of classic transformer-based, text-focused LLMs.
The problem: AI-generated writing collapses into homogeneous outputs
In domains such as fact-based Q&A or coding assistance, LLMs are expected to generate a single best response.
However, creative writing is inherently open-ended, meaning there are many valid responses to a single prompt.
In an example provided by the Midjourney researchers, given a prompt such as “Write a story about a dog on the moon,” an LLM could explore many different paths, such as:
- An astronaut's pet dog accidentally left behind after a lunar mission.
- A dog who finds itself in a futuristic canine space colony.
- A stranded dog making friends with an alien species.
Despite this range of possibilities, instruction-tuned LLMs often converge on similar storylines and themes. This happens because:
- Post-training techniques prioritize user preference over originality, reinforcing popular but repetitive responses.
- Instruction tuning often smooths out variation, making models favor "safe" responses over unique ones.
- Existing diversity-promoting techniques (such as temperature tuning) operate only at inference time, rather than being baked into the model's learning process.
This leads to homogenized storytelling, where AI-generated creative writing feels repetitive and lacks surprise or depth.
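To make the inference-time limitation concrete, here is a minimal sketch of temperature sampling (not code from the paper). Note that it only reshapes the probability distribution the model already learned; it cannot surface storylines the model never assigned meaningful probability to, which is why the researchers target the training stage instead.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token id from logits after temperature scaling.

    Higher temperature flattens the distribution (more varied picks);
    lower temperature sharpens it toward the argmax. Either way, this
    happens purely at inference time: the underlying distribution the
    model learned in training is unchanged.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)
```

At very low temperatures this collapses to greedy decoding; at high temperatures it spreads probability mass toward weaker tokens, trading coherence for variety.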
The solution: Modifying post-training methods to prioritize diversity
To overcome these limitations, the researchers introduced DDPO and DORPO, two extensions of existing preference optimization methods. The core innovation in these methods is the use of deviation – a measure of how much a response differs from others – to guide training.
Here's how it works:
- During training, the model is given a writing prompt and multiple possible responses.
- Each response is compared to the others for the same prompt, and a deviation score is calculated.
- Rare but high-quality responses are weighted more heavily in training, encouraging the model to learn from diverse examples.
By incorporating deviation into Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), the models learn to produce high-quality but more varied responses.
This method ensures that AI-generated stories do not converge on a single predictable structure, but instead explore a wider range of characters, settings, and themes – just as a human writer would.
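The deviation-weighting idea can be sketched numerically. The following is an illustrative toy version, not the paper's implementation: it assumes the deviation score simply scales a standard DPO pairwise loss, so preference pairs whose chosen response is rarer push the model harder.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logratio_chosen, logratio_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    logratio_* = log pi_theta(y|x) - log pi_ref(y|x) for the chosen
    and rejected responses under the policy and reference models.
    """
    return -math.log(sigmoid(beta * (logratio_chosen - logratio_rejected)))

def ddpo_loss(logratio_chosen, logratio_rejected, deviation, beta=0.1):
    """Toy deviation-weighted DPO loss.

    `deviation` is a hypothetical score in (0, 1] for how much the
    chosen response differs from other responses to the same prompt;
    rarer, high-quality responses receive a larger weight. The exact
    weighting in the paper may differ from this sketch.
    """
    return deviation * dpo_loss(logratio_chosen, logratio_rejected, beta)
```

With deviation fixed at 1.0 this reduces to plain DPO; lower scores down-weight pairs whose chosen response looks like everything else in the batch.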
What the Midjourney researchers did to achieve this
The study involved training LLMs on creative writing tasks using a dataset from the subreddit r/WritingPrompts, a Reddit community where users post prompts and respond with short stories.
Researchers used two base models for their training:
- Meta's Llama-3.1-8B (an 8-billion-parameter model from the Llama 3 series).
- Mistral-7B-v0.3 (a 7-billion-parameter model from Mistral AI).
Then, they took these models through the following processes:
- Supervised fine-tuning (SFT): The models were first fine-tuned using LoRA (Low-Rank Adaptation) to adjust parameters efficiently.
- Preference optimization:
- DPO and ORPO were used as baselines – these standard methods focus on improving response quality based on user preference signals.
- DDPO and DORPO were then applied, introducing deviation-based weighting to encourage more unique responses.
- Evaluation:
- Automatic evaluation: measured semantic and stylistic diversity using embedding-based techniques.
- Human evaluation: judges assessed whether outputs were diverse and engaging compared to GPT-4o and Claude 3.5.
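An embedding-based diversity evaluation like the automatic one above can be approximated with mean pairwise cosine distance. This is a generic sketch of the idea, not the researchers' exact metric: embed each generated story (e.g. with any sentence-embedding model) and measure how spread out the embeddings are.

```python
import numpy as np

def mean_pairwise_cosine_distance(embeddings):
    """Average pairwise cosine distance among response embeddings.

    A simple proxy for semantic diversity: 0 means all responses are
    semantically identical, higher values mean more varied outputs.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
    sims = X @ X.T                                    # pairwise cosine similarities
    iu = np.triu_indices(len(X), k=1)                 # unique pairs only
    return float(np.mean(1.0 - sims[iu]))
```

A model whose stories all share one plot skeleton scores near zero here, which is the homogenization problem the paper sets out to fix.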
Key training findings:
- DDPO significantly outperformed standard DPO in terms of output diversity while maintaining quality.
- Llama-3.1-8B with DDPO achieved the best balance of quality and diversity, producing responses that were more varied than GPT-4o's while maintaining coherence.
- When dataset size was reduced, DDPO models still maintained diversity, though they required a certain number of diverse training samples to be fully effective.
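As background on the LoRA-based SFT step in the pipeline above, here is a minimal sketch of the low-rank adaptation idea (a generic illustration, not the paper's code): the pretrained weight stays frozen, and only a small low-rank update is trained.

```python
import numpy as np

class LoRALinear:
    """Frozen linear weight W plus a trainable low-rank update B @ A.

    Only A (r x d_in) and B (d_out x r) are trained, with rank r much
    smaller than the weight dimensions, which is what makes LoRA
    fine-tuning cheap compared to updating all of W.
    """
    def __init__(self, W, r=4, alpha=8, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = W  # frozen pretrained weight, shape (d_out, d_in)
        d_out, d_in = W.shape
        self.A = rng.normal(0, 0.01, size=(r, d_in))  # trainable
        self.B = np.zeros((d_out, r))  # zero-init: no change before training
        self.scale = alpha / r

    def __call__(self, x):
        return self.W @ x + self.scale * (self.B @ (self.A @ x))
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen pretrained layer, and fine-tuning only has to learn the low-rank correction.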
Enterprise implications: what does this mean for AI teams producing creative outputs – such as marketing copywriting, corporate storytelling, and film/TV/video game scripting?
For AI teams working with LLMs, enhancing output diversity while maintaining quality is a critical challenge. These findings have significant implications for organizations that rely on AI-generated content in applications such as:
- Conversational AI and chatbots (ensuring varied and engaging responses).
- Marketing and storytelling tools (preventing repetitive AI-generated copy).
- Game development and narrative design (creating diverse dialogue and branching storylines).
For professionals responsible for fine-tuning and deploying models in a business setting, this research offers:
- A new approach to LLM post-training that improves creativity without sacrificing quality.
- A practical alternative to inference-time diversity tweaks (such as temperature adjustment) by incorporating diversity into the learning process itself.
- The potential to develop more engaging AI applications, from AI-assisted writing tools to virtual assistants that can dynamically adapt their responses.
For those managing AI model orchestration and automation, this research highlights:
- The importance of tuning models at the training stage, reducing the need for post-processing adjustments at deployment.
- A way to introduce adaptive storytelling into AI-driven applications, ensuring diversity while keeping content quality high.
- A method for making LLM outputs more human-like, which is essential for applications that require interactive storytelling, customer interaction, or content creation.
The future of AI for creative projects looks bright
The success of DDPO and DORPO shows that training LLMs with diversity-focused objectives can yield significant improvements in creative writing. Some ideas include:
- Integrating deviation-based learning into enterprise AI models to enhance response variety in customer-facing applications.
- Exploring how these methods apply to other generative tasks, such as AI-powered poetry, screenwriting, or game storytelling.
- Developing hybrid training techniques that balance diversity with instruction-following ability for AI assistants.
For those interested in applying these techniques, the researchers plan to make their code publicly available in a GitHub repository.
Whether you are fine-tuning LLMs for business applications or optimizing large-scale AI orchestration, this study provides actionable insights into how models can be made more dynamic, engaging, and responsive in creative tasks.
By adopting these techniques, AI teams can move beyond rigid, formulaic outputs – building AI systems that are not only intelligent but genuinely imaginative.