Huawei is the first to bring large models to mobile phones! Xiaoyi + large model gives the smart assistant a serious IQ boost

Source: Xinzhiyuan

Well, well: now that HUAWEI HarmonyOS has tapped into a large model, this is what a smart assistant can do?

With just one Chinese command, Huawei Xiaoyi will write an English email:

You can use AI to restyle your own photos:

You can also rattle off a long list of requirements in plain language and let it build a complex automation scene all by itself:

This is the new Xiaoyi in HUAWEI HarmonyOS 4.

It is built on Huawei's Pangu L0 base model: a large amount of scene data was incorporated to fine-tune the model, ultimately distilling an L1-layer dialogue model.

It can handle tasks such as text generation, knowledge search, data summarization, intelligent task orchestration, and fuzzy/complex intent understanding.

It can also call various app services, delivering a system-level intelligent experience.

So, what can the new Huawei Xiaoyi do?

Smarter, more capable, more caring

Based on the ability of large models, Huawei Xiaoyi has mainly upgraded in three aspects this time:

  • Smart interaction
  • High productivity
  • Personalized service

Specific capability enhancements include more natural dialogue, knowledge Q&A, life-service search, recognizing on-screen content during conversation, and generating summaries, copy, and images.

**First, the smart-interaction upgrade makes dialogue more natural and fluid.**

Huawei Xiaoyi can understand colloquial speech, fuzzy intentions, and complex commands.

If you can't find the new wallpaper setting and don't know its name, you can just ask:

How do I set a wallpaper that changes in real time with the weather?

Or a complex command with multiple requirements:

Find a highly rated seafood restaurant near Songshan Lake, preferably one with a discounted set meal for four.

Xiaoyi can then call the relevant service to find a restaurant that fits.

Xiaoyi also has multimodal capabilities and can understand image content, so steps that users once had to read about and carry out manually can now be handed over to Xiaoyi.

For example, let it look at an invitation letter and say:

Navigate to the address on the map.

It extracts the address information from the invitation and calls the map service for navigation.

Or save the contact information from the invitation; it clearly understands the text in the image very well.
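The look-then-act flow can be sketched crudely. A real assistant would run a vision-language model over the actual image; the toy version below works on OCR text with a regex just to show the shape of the pipeline (the deep-link format and function names are invented for illustration):

```python
import re

def extract_address(ocr_text: str):
    """Toy extraction step: pull an address-like line out of OCR output.
    A real assistant would use a vision-language model, not a regex."""
    match = re.search(r"Address:\s*(.+)", ocr_text)
    return match.group(1).strip() if match else None

def navigation_intent(address: str) -> str:
    """Hypothetical deep link handed off to a map app."""
    return f"maps://navigate?to={address.replace(' ', '+')}"

ocr = "You are invited!\nAddress: 2 Science Park Road\nTime: 7 pm"
addr = extract_address(ocr)
print(navigation_intent(addr))   # maps://navigate?to=2+Science+Park+Road
```

The point is the hand-off: once the text is understood, the assistant generates a structured call for another app rather than asking the user to retype anything.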

Going a step further, Xiaoyi can now orchestrate complex tasks, so there is no need to set them up manually again and again.

For example, you can have it set up a morning-run scene:

Help me create a morning-run scene. Every Monday to Friday at 6:30 a.m., broadcast the day's weather for me; when I put on my Bluetooth headphones, play my favorite songs and set the phone to silent mode.

Xiaoyi understands this long list of requirements and calls the corresponding functions, using the phone's status (e.g., whether the Bluetooth headset is connected) to decide whether to perform certain operations.
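The scene orchestration described above can be sketched as a tiny rule engine: a time trigger plus actions gated on device state. Everything here (class names, action strings) is illustrative, not Huawei's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    """A minimal automation scene: a time trigger plus conditional actions."""
    name: str
    weekdays: set                    # 0 = Monday ... 6 = Sunday
    time: str                        # "HH:MM"
    actions: list = field(default_factory=list)   # (condition, action) pairs

    def should_fire(self, weekday: int, now: str) -> bool:
        return weekday in self.weekdays and now == self.time

    def run(self, device_state: dict) -> list:
        """Execute only the actions whose condition matches the device state."""
        return [action for condition, action in self.actions
                if condition(device_state)]

# The "morning run" scene from the example above (action names are made up)
morning_run = Scene(
    name="morning run",
    weekdays={0, 1, 2, 3, 4},        # Monday through Friday
    time="06:30",
    actions=[
        (lambda s: True,                       "broadcast_weather"),
        (lambda s: s.get("bluetooth_headset"), "play_favorite_songs"),
        (lambda s: True,                       "set_silent_mode"),
    ],
)

if morning_run.should_fire(weekday=2, now="06:30"):
    print(morning_run.run({"bluetooth_headset": True}))
```

The Bluetooth check mirrors the behavior in the article: the same scene fires every weekday, but the music action runs only when the headset is actually connected.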

**Second, powered by large-model capabilities, Xiaoyi now offers more efficient productivity tools.**

It helps you see, read, and write.

For example, show it an English article and ask: what does this article say?

Xiaoyi can give simple and concise explanations in Chinese.

If the user asked it to remember some information in the past, that memory can also be drawn on to generate new content.

In a few days I'll be meeting David to discuss the project. Drawing on the notes from our last meeting, write an English email to schedule the meeting.

And as mentioned at the start, Xiaoyi can use its AI vision capability to restyle photos in various ways.

**Finally, as a smart assistant, Xiaoyi now supports more personalized services and understands you better.**

It can serve as a notepad and memo pad; small things can be recorded just by speaking.

Huawei states that **all memory content is stored only with the user's authorization, and user privacy is fully protected.**

In addition, Xiaoyi Suggestion can now perceive more of users' high-frequency scenarios and proactively offer one-stop smart combination suggestions, saving a lot of manual searching.

For example, for overseas travel, before departure Xiaoyi can surface the latest exchange rate and help exchange foreign currency; at the destination it can fetch travel guides in real time, offer real-time translation tools, and more.

According to reports, the new Xiaoyi covers 3x more smart scenarios and 7x more POIs, spanning core dining and shopping venues, business districts, airports, high-speed rail stations, and other settings.

To sum up, the new Xiaoyi has not only gained the latest AIGC capabilities but also fixed several long-criticized shortcomings of phone voice assistants.

Such as lack of memory, stilted dialogue, and failure to understand colloquial speech.

All of this, of course, comes courtesy of the large model. But how exactly did Xiaoyi pull it off?

Xiaoyi embraces the large model

The underlying model Xiaoyi relies on is Huawei's Pangu series.

In July this year, Huawei officially released Pangu Large Model 3.0 and proposed a three-layer model architecture:

  • L0: basic large models, covering natural language, vision, multimodality, prediction, and scientific computing;
  • L1: industry-specific large models for N industries, such as government affairs, finance, manufacturing, mining, and meteorology;
  • L2: finer-grained scenario models that provide "out-of-the-box" model services.

Among them, the largest version of the L0-layer base model has 100 billion parameters and was pre-trained on more than 3 trillion tokens.

Starting from Huawei's Pangu L0 base model, Xiaoyi built a large amount of scene data for consumer-device scenarios, fine-tuned the model, and distilled the L1-layer dialogue model.

During fine-tuning, Xiaoyi added mainstream data types covering consumer users, such as conversations, travel guides, device control, and everyday topics like food, clothing, housing, and transportation.

This covers the knowledge range of ordinary users' daily conversations well, and strengthens the model's factuality, timeliness, and safety compliance during dialogue.

However, as is well known, the sheer scale of large models makes deployment and fast response challenging.

On deployment, Huawei is continuously strengthening device-cloud collaboration for large models. The on-device model performs a layer of preprocessing on user requests and context, then sends the preprocessed request to the cloud.

The advantage is twofold: it combines the fast response of the on-device model with the higher answer quality of the cloud model, while further protecting the user's private data.
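A minimal sketch of that device-cloud split, assuming a hypothetical preprocessing step that redacts private data and trims context before anything leaves the phone (the redaction rule and payload format are invented for illustration):

```python
import re

def preprocess_on_device(request: str, context: list) -> dict:
    """Hypothetical on-device step: strip private data and trim context
    before the request is sent to the cloud model."""
    # Redact 11-digit phone numbers so raw private data never leaves the phone
    redacted = re.sub(r"\b\d{11}\b", "[PHONE]", request)
    # Keep only the most recent turns to shorten the cloud-side prompt
    trimmed = context[-3:]
    return {"request": redacted, "context": trimmed}

def answer(request: str, context: list) -> str:
    payload = preprocess_on_device(request, context)
    # A real system would now call the cloud model; here we just
    # show what the cloud would actually receive.
    return f"cloud sees: {payload['request']} ({len(payload['context'])} turns)"

print(answer("Call me at 13800138000 tomorrow", ["turn"] * 10))
```

The design point is that the cloud model only ever sees the sanitized, shortened payload, which is what lets the split improve both latency and privacy at once.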

To reduce inference latency, Huawei Xiaoyi carried out systematic engineering optimization across the entire chain: the underlying chip, the inference framework, model operators, and input/output length.

By breaking down the latency of each module, the R&D team pinpointed optimization targets for each part and cut latency through operator fusion, memory optimization, and pipeline optimization.

Input length and output length also affect the inference speed of large models.

Here, Huawei performed word-by-word analysis and compression for different scenarios and output formats, ultimately halving inference latency.
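As a toy illustration of that module-by-module latency breakdown, one can time each stage of a pipeline and single out the slowest as the next optimization target; the stage names below are stand-ins, not Xiaoyi's actual modules:

```python
import time

def timed_pipeline(stages):
    """Run each (name, fn) stage in order and record its wall-clock time,
    mirroring the per-module latency breakdown described above."""
    timings = {}
    data = None
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - start
    return data, timings

# Placeholder stages standing in for real inference steps
stages = [
    ("tokenize", lambda _: "tokens"),
    ("prefill",  lambda x: x + "+kv"),
    ("decode",   lambda x: x + "+text"),
]
result, timings = timed_pipeline(stages)
slowest = max(timings, key=timings.get)
print(result, slowest)
```

In a real system each stage would be a chip, framework, or operator boundary, and the per-stage numbers are what justify targeted fixes like operator fusion.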

Looking at the overall technical architecture, the integration of Huawei Xiaoyi and the large model is not simply about beefing up chat, AIGC, and reply tasks; it is a system-level enhancement with the large model at the core.

**In other words, the large model becomes the "brain" of the system.**

The underlying logic: route user tasks to the appropriate subsystems, let each do its job, and enhance the experience in complex scenarios.

Xiaoyi's typical dialogue flow can be divided into three steps.

The first step is to receive the user's question and, based on contextual understanding and Xiaoyi's memory, decide how to handle it.

The second step is to invoke different capabilities according to the request type, including meta-service retrieval, idea generation, and knowledge retrieval.

If the user's request involves meta-services (for example, asking about nearby restaurants suitable for a get-together), a food app's service must be called: the system generates an API call, and the service provider returns a response based on its recommendation mechanism.

If the user asks a knowledge question, such as how many parameters the Pangu model has, the system queries a search engine, the relevant domain knowledge, and a vector knowledge base, then fuses the results into an answer.

If the user's request is a generative task, the large model can produce the reply on its own.

In the last step, every generated answer passes a risk-control review before being returned to the user.
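The three-step flow above (receive and classify, route to a capability, then risk-check) can be sketched as a toy dispatcher; the routing rules and handler names are purely illustrative, not Xiaoyi's actual logic:

```python
def classify(request: str) -> str:
    """Toy intent router standing in for the first dispatch step."""
    if any(w in request for w in ("restaurant", "book", "navigate")):
        return "meta_service"
    if request.endswith("?"):
        return "knowledge"
    return "generative"

# Step two: one handler per request type, as in the article's three branches
HANDLERS = {
    "meta_service": lambda r: f"[api-call] {r}",    # generate an API call
    "knowledge":    lambda r: f"[search+fuse] {r}", # search engines + knowledge bases
    "generative":   lambda r: f"[llm-reply] {r}",   # the model answers directly
}

def risk_check(answer: str) -> str:
    """Step three: every answer passes a risk-control gate before return."""
    banned = ("secret",)
    return "[blocked]" if any(b in answer for b in banned) else answer

def handle(request: str) -> str:
    return risk_check(HANDLERS[classify(request)](request))

print(handle("find a restaurant nearby"))
print(handle("How many parameters does Pangu have?"))
```

A real dispatcher would classify with the large model itself rather than keyword rules, but the routing structure is the same: every request ends at exactly one capability, and every answer passes the final gate.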

Beyond this, Xiaoyi sweats the details, with a series of low-level engineering work to ensure the quality of Q&A and task execution.

Take the data side, for example.

Since Xiaoyi first went online in 2017, it has accumulated a feel for ordinary users' conversational habits. On top of this, Huawei built a large corpus of different expression types to cover as many written and spoken styles as possible, so the large model becomes proficient in varied phrasing during pre-training.

To better evaluate and improve Xiaoyi's abilities, Huawei built a comprehensive test dataset.

This not only evaluates the capabilities of existing open large models, but also guides how Xiaoyi builds data and capabilities based on the evaluation results.

Getting Xiaoyi to master tool calling is another big challenge.

Device control requires the large model to generate complex formatted text hundreds of tokens long with zero format errors; otherwise the central control system cannot parse it and dispatch the command.

To get the large model to meet this generation standard, Huawei worked to understand the model's "temperament" on one hand, and on the other strengthened its coding ability and thereby its format compliance, ultimately achieving **nearly 100% format compliance**.
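Whatever Huawei's actual command schema looks like, the parse-or-fail constraint described above can be illustrated with a small validator that rejects any model output the central controller could not consume (the field names here are invented for the example):

```python
import json

# Hypothetical schema: every control command must carry these typed fields
REQUIRED = {"device": str, "action": str, "params": dict}

def validate_command(raw: str):
    """Return the parsed command, or None if the central controller
    would be unable to parse or dispatch it."""
    try:
        cmd = json.loads(raw)
    except json.JSONDecodeError:
        return None                      # malformed output: hard reject
    for field, ftype in REQUIRED.items():
        if not isinstance(cmd.get(field), ftype):
            return None                  # missing or wrongly typed field
    return cmd

good = '{"device": "speaker", "action": "play", "params": {"volume": 40}}'
bad  = '{"device": "speaker", "action": play}'   # unquoted value: parse error

print(validate_command(good) is not None)
print(validate_command(bad))
```

This is why format compliance has to be near 100%: a single unquoted token, as in `bad`, turns a hundreds-of-tokens command into something the controller simply drops.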

For complex scenarios, Xiaoyi's approach is to use the large model's capacity to thoroughly learn and understand tool scenarios, then reason over them.

Reportedly, the team improved the model's device control from completely unusable to over 80% availability.

In addition, native HarmonyOS makes it possible to optimize existing APIs; through this reverse adaptation, the strengths of large models can be exploited even better.

Facing all scenarios, not just mobile phones

So why was Huawei able to bring large-model capabilities to its smart assistant so quickly?

Accumulated underlying R&D is essential, but one more thing deserves attention:

Huawei chose to start from real scenarios in deciding how to combine the large model with the smart assistant, and even with the entire operating system.

In Huawei's own words:

Talk is cheap. Show me the Demo.

Many of the experiences shown above come from the everyday observations of Huawei's R&D team members.

For example, some people like to catch up on news during their commute but find long articles inconvenient to read or listen to, so the information-summary feature appeared in Huawei Xiaoyi.

Others find themselves at a loss for words when writing shopping reviews or birthday wishes, so Huawei Xiaoyi offers a copywriting feature.

And this focus on scenario experience is a natural advantage of HarmonyOS.

Since its birth, HarmonyOS has not been limited to mobile phones, but has been oriented to various terminals and all scenarios.

Now it has created a "1+8+N" full-scenario ecology.

Huawei Xiaoyi is already deployed on the "1+8" devices. Going forward, it will follow the business forms of full-scenario devices and gradually bring large-model-powered Xiaoyi into consumers' full-scenario experience.

As an AI-driven smart assistant, Xiaoyi has integrated various AI capabilities since its inception, such as AI subtitles and Xiaoyi read-aloud. The R&D team behind it has always been watching for new possibilities in AI and smart assistants.

Reportedly, last year the team noticed that pre-trained models with tens of billions of parameters, combined with prompting techniques, could already deliver very good text understanding and generation, usable for chit-chat, Q&A, and task-oriented dialogue.

With the latest wave of AI, RLHF brought significant improvements to large models, and the door to industrial deployment officially opened.

Since this year's generative-AI boom, many applications have chosen to tap large-model capabilities and build in smart assistants.

However, as one of the few operating-system vendors in the world, Huawei chose to cut in at a lower level, using the large model to reshape the OS itself.

Going lower in the stack means a more thorough and comprehensive transformation.

But for research and development, the challenge is even greater.

This requires not only a sufficiently solid model base, but also system-level fusion and optimization, as well as a real feel for scenarios and user needs.

Correspondingly, Huawei is one of the earliest domestic manufacturers with large-model capabilities; it has built full-stack AI development capabilities; and HarmonyOS covers 700 million+ devices...

So it is not hard to see why Huawei Xiaoyi gained large-model capabilities so quickly, making HarmonyOS 4 the first operating system to fully integrate large models.

As one of the world's most watched operating systems, HarmonyOS's early embrace of large models may also open a new paradigm: anyone can pick up their phone and experience large-model capabilities firsthand, no longer confined to imagination.

Currently, Huawei has announced the Xiaoyi test plan:

The brand-new Xiaoyi will open invitation-based testing at the end of August this year, and will later roll out via OTA to some models running HarmonyOS 4.0 and above. The specific upgrade plan will be announced separately.

If you're interested, come take a look~
