Beating the entire alpaca family: Meta AI's new self-alignment method needs very little manually labeled data
Original source: Qubit
Still worrying about labeling data by hand?

Meta's new method builds a high-quality instruction-following language model from only a small amount of seed data.

In other words, large language models normally need large amounts of human-annotated instruction data for fine-tuning, but now the model can infer instructions automatically from unlabeled text in a web corpus.

It then trains on the instruction data it generated itself, a genuinely self-produced, self-consumed pipeline.

A model trained this way outperforms open-source Alpaca and its derivative models on the Alpaca benchmark.
LeCun tweeted about the study, calling its model self-alignment sensational:
Alpaca: I used data to train a whale
This scalable new method is called instruction backtranslation, and Meta named the model it trains Humpback (the humpback whale, also known as the hunchback whale).

(The researchers said the name was chosen for its connection to camel humps, and because a whale's larger body corresponds to the model's larger scale.)
With the labeled seed examples and the corpus sources in hand, the next step is the self-augmentation stage.
The researchers fine-tuned the base LLaMA model on the seed data to obtain an instruction prediction model. This model is then used to infer a candidate instruction for each piece of unlabeled text. Each candidate instruction is combined with its text into an (instruction, output) pair, forming the candidate augmented training data, the Augmented Data A in the figure above.
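To make the self-augmentation step concrete, here is a minimal sketch in the style of the Hugging Face transformers API. The checkpoint name, prompt wording, and placeholder corpus are assumptions for illustration, not details from the paper:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "huggyllama/llama-7b" is a stand-in checkpoint; in the paper, the backward
# model is LLaMA fine-tuned on (output -> instruction) pairs from the seed data.
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
backward_model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")

def predict_instruction(text: str) -> str:
    """Infer a candidate instruction for one unlabeled web document."""
    # The prompt wording is an assumption, not the paper's exact template.
    prompt = (
        "Below is a response to some unknown instruction. "
        f"Write the instruction it best answers.\n\nResponse:\n{text}\n\nInstruction:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    out = backward_model.generate(**inputs, max_new_tokens=64, do_sample=False)
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Candidate augmented data A: pair each inferred instruction with its source text.
unlabeled_corpus = ["A web document that could serve as a high-quality answer ..."]
augmented_A = [(predict_instruction(doc), doc) for doc in unlabeled_corpus]
```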
However, A cannot be used for training directly: the quality of the unlabeled text itself is uneven, and the generated candidate instructions are noisy as well.

This is where the key self-curation step comes in: the model itself predicts the quality of each candidate pair, and only high-quality samples are selected for training.

To keep improving the quality of the instruction predictions, the researchers trained the model on the candidate data iteratively; with each iteration, the data quality gets better.
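A rough sketch of the curation-plus-iteration loop follows. The rating prompt, the score parsing, and the `train()` helper are hypothetical simplifications, not the paper's exact setup:

```python
def score_pair(model, tokenizer, instruction: str, response: str) -> int:
    """Ask the model to rate a candidate pair on a 1-5 quality scale."""
    prompt = (
        "Rate from 1 to 5 how well the response answers the instruction.\n"
        f"Instruction: {instruction}\nResponse: {response}\nScore:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=2, do_sample=False)
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    digits = [c for c in text if c.isdigit()]
    return int(digits[0]) if digits else 0

def self_curate(model, tokenizer, candidates, threshold: int = 5):
    """Keep only pairs whose model-predicted quality meets the threshold."""
    return [(ins, out) for ins, out in candidates
            if score_pair(model, tokenizer, ins, out) >= threshold]

def iterate(model, tokenizer, seed_data, candidates, rounds: int = 2):
    """Each round: curate with the current model, then fine-tune on seed + curated data."""
    for _ in range(rounds):
        curated = self_curate(model, tokenizer, candidates)
        model = train(model, seed_data + curated)  # train() is a hypothetical fine-tuning helper
    return model
```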
In addition, when combining the seed data and the augmented data to fine-tune the model, they used different system prompt tags to distinguish the two data sources:
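A small sketch of how such source tags might be applied when formatting training examples; the two prompt strings follow the spirit of the paper's description but are assumptions here:

```python
# Different system prompts mark where each training example came from.
# The exact strings below are illustrative assumptions.
SEED_PROMPT = "Answer in the style of an AI assistant."       # human-labeled seed data
AUGMENTED_PROMPT = "Answer with knowledge from web search."   # model-generated augmented data

def format_example(instruction: str, output: str, source: str) -> str:
    """Prepend the system prompt that matches the example's data source."""
    system = SEED_PROMPT if source == "seed" else AUGMENTED_PROMPT
    return f"{system}\nInstruction: {instruction}\nOutput: {output}"
```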
After two iterations, the final model is fresh out of the oven.
Merging the two kinds of training data: 1 + 1 > 2
Let's take a look at the results of the researchers' analysis:
△ Instruction diversity of the seed data and the augmented data. The inner circle shows common root verbs; the outer circle shows the common nouns paired with them.
The figure above shows instruction diversity computed on 8% of the seed data and 13% of the augmented data.

It is immediately visible that the augmented data is far more diverse in the long tail: it complements the existing human-labeled seed data, filling in instruction types that never appear in the seed set.
Second, the researchers compared three versions of the augmented dataset: all of the augmented data without self-curation, and the higher-quality subsets selected through self-curation:
△ Self-augmented data of different sizes and qualities, selected via self-curation. The y-axis shows the win rate against text-davinci-003 when fine-tuning LLaMA 7B on data of the given size and quality.
(text-davinci-003 is a GPT-3-based instruction-following model, fine-tuned with reinforcement learning on human-written instruction data, outputs, model responses, and human preferences.)
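As a side note, a head-to-head win rate like this is typically computed from pairwise preference judgments. A toy illustration, assuming the common convention of giving half credit for ties:

```python
# Toy win-rate computation from pairwise judgments: "ours" means our model's
# output was preferred, "baseline" means text-davinci-003's was, "tie" otherwise.
def win_rate(judgments: list[str]) -> float:
    wins = sum(j == "ours" for j in judgments)
    ties = sum(j == "tie" for j in judgments)
    return (wins + 0.5 * ties) / len(judgments)

print(win_rate(["ours", "tie", "baseline", "ours"]))  # 0.625
```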
Finally, the results on the Alpaca leaderboard: Humpback significantly outperforms all other methods that do not rely on distilled data, and it narrows the gap to proprietary models.
Non-distilled: models trained without relying on any external model for any form of supervision. Distilled: models that bring a stronger external model into the training process, for example by using data distilled from that model. Proprietary: models trained with proprietary data and techniques.
△ Win rates against text-davinci-003
Compared with the open-source models LIMA 65B, Guanaco 65B, and Falcon-Instruct 40B, and the proprietary models davinci-003 and Claude, Humpback's outputs also align more closely with human preferences.
Since the training text comes from a web corpus, the fine-tuned model may amplify biases present in web data. Although the fine-tuned model improves bias-detection accuracy over the base model, this does not mean the problem is fully solved.
Portal: paper link