
Commit a770ce6

split chapter6 to several docs
1 parent dfe4bfa commit a770ce6


42 files changed: +1373 −1337 lines

README.md

Lines changed: 9 additions & 2 deletions
```diff
@@ -5,6 +5,7 @@
 [![MIT License][license-shield]][license-url]

 <div align="center">
+  <img src="pictures/README/banner.png" style="max-width: 100%; height: auto;" />
   <h3 align="center"><a href="https://seeed-projects.github.io/Tutorial-of-AI-Kit-with-Raspberry-Pi-From-Zero-to-Hero/">Tutorial of AI Kit with Raspberry Pi From Zero to Hero</a> </h3>
   <p align="center">
     Play your AI Kit from Beginner to Expert
@@ -96,10 +97,10 @@ This course is designed to teach you how to harness the power of AI on the Raspb
 ### Finish Chapter 1

 - ⏳ Introduction of Artificial Intelligence
-- Introduction of Deep Neural Network
+- ✔️ Introduction of Deep Neural Network
 - ⏳ Introduction of Convolutional Neural Network
 - ✔️ Introduction of Computer Vision
-- Introduction of Mastering Large Language Model
+- ✔️ Introduction of Large Language Model

 ### Finish Chapter 2

@@ -112,6 +113,12 @@ This course is designed to teach you how to harness the power of AI on the Raspb
 ### Finish Chapter 3

 ### Finish Chapter 4
+- ✔️ Setup Ollama on RaspberryPi
+- ✔️ Run Llama on RaspberryPi
+- ✔️ Run Gemma2 on RaspberryPi
+- ✔️ Run Phi3.5 on RaspberryPi
+- ✔️ Run Multimodal on RaspberryPi
+- ✔️ Use Ollama with Python

 ### Finish Chapter 5

```

Lines changed: 126 additions & 0 deletions
---
sidebar_position: 5
---

# Generative AI (GenAI)

Generative AI is an artificial intelligence system capable of creating new, original content across various mediums such as **text, images, audio, and video**. These systems learn patterns from existing data and use that knowledge to generate novel outputs that didn't previously exist. **Large Language Models (LLMs)**, **Small Language Models (SLMs)**, and **multimodal models** can all be considered types of GenAI when used for generative tasks.

GenAI provides the conceptual framework for AI-driven content creation, with LLMs serving as powerful general-purpose text generators. SLMs adapt this technology for edge computing, while multimodal models extend GenAI capabilities across different data types. Together, they represent a spectrum of generative AI technologies, each with its strengths and applications, collectively driving AI-powered content creation and understanding.

## Large Language Models (LLMs)

Large Language Models (LLMs) are advanced artificial intelligence systems that understand, process, and generate human-like text. These models are characterized by their massive scale in terms of the amount of data they are trained on and the number of parameters they contain. Critical aspects of LLMs include:

1. **Size**: LLMs typically contain billions of parameters. For example, GPT-3 has 175 billion parameters, while some newer models exceed a trillion parameters.

2. **Training Data**: They are trained on vast amounts of text data, often including books, websites, and other diverse sources, amounting to hundreds of gigabytes or even terabytes of text.

3. **Architecture**: Most LLMs use [transformer-based architectures](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)), which allow them to process and generate text by paying attention to different parts of the input simultaneously.

4. **Capabilities**: LLMs can perform a wide range of language tasks without specific fine-tuning, including:
   - Text generation
   - Translation
   - Summarization
   - Question answering
   - Code generation
   - Logical reasoning

5. **Few-shot Learning**: They can often understand and perform new tasks with minimal examples or instructions.

6. **Resource-Intensive**: Due to their size, LLMs typically require significant computational resources to run, often needing powerful GPUs or TPUs.

7. **Continual Development**: The field of LLMs is rapidly evolving, with new models and techniques constantly emerging.

8. **Ethical Considerations**: The use of LLMs raises important questions about bias, misinformation, and the environmental impact of training such large models.

9. **Applications**: LLMs are used in various fields, including content creation, customer service, research assistance, and software development.

10. **Limitations**: Despite their power, LLMs can produce incorrect or biased information and lack true understanding or reasoning capabilities.
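Few-shot learning (item 5 above) usually comes down to how the prompt is assembled: a few worked examples are placed before the new input, all as plain text. A minimal sketch of such a prompt builder; the review sentences and labels are invented for illustration, not from any dataset:

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot sentiment-classification prompt from (text, label) pairs."""
    lines = ["Classify the sentiment of each review as Positive or Negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model is expected to complete this line
    return "\n".join(lines)

examples = [
    ("The battery lasts all day.", "Positive"),
    ("The screen cracked after a week.", "Negative"),
]
print(build_few_shot_prompt(examples, "Setup was quick and painless."))
```

The model never sees gradient updates here; the examples inside the prompt alone steer it toward the task.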
Note that large models are also used beyond text; these are called *multimodal models*. Multimodal models integrate and process information from multiple types of input simultaneously and are designed to understand and generate content across various forms of data, such as text, images, audio, and video.

## Closed vs. Open Models
**Closed models**, also called proprietary models, are AI models whose internal workings, code, and training data are not publicly disclosed. Examples: GPT-4 (by OpenAI), Claude (by Anthropic), and Gemini (by Google).

**Open models**, also known as open-source models, are AI models whose underlying code, architecture, and often training data are publicly available and accessible. Examples: Gemma (by Google), Llama (by Meta), and Phi (by Microsoft).

Open models are particularly relevant for running models on edge devices like the Raspberry Pi, as they can be more easily adapted, optimized, and deployed in resource-constrained environments. Still, it is crucial to check their licenses: open models come with various open-source licenses that may affect their use in commercial applications, while closed models have clear, albeit restrictive, terms of service.

![Adapted from https://arxiv.org/pdf/2304.13712](../../pictures/Chapter6/llms-slm.png)
## Small Language Models (SLMs)

In the context of edge computing on devices like the Raspberry Pi, full-scale LLMs are typically too large and resource-intensive to run directly. This limitation has driven the development of smaller, more efficient models known as Small Language Models (SLMs).

SLMs are compact versions of LLMs designed to run efficiently on resource-constrained devices such as smartphones, IoT devices, and single-board computers like the Raspberry Pi. These models are significantly smaller in size and computational requirements than their larger counterparts while still retaining impressive language understanding and generation capabilities.

Key characteristics of SLMs include:

1. **Reduced parameter count**: Typically ranging from a few hundred million to a few billion parameters, compared to tens or hundreds of billions in larger models.

2. **Lower memory footprint**: Requiring, at most, a few gigabytes of memory rather than tens or hundreds of gigabytes.

3. **Faster inference time**: Can generate responses in milliseconds to seconds on edge devices.

4. **Energy efficiency**: Consuming less power, making them suitable for battery-powered devices.

5. **Privacy-preserving**: Enabling on-device processing without sending data to cloud servers.

6. **Offline functionality**: Operating without an internet connection.

SLMs achieve their compact size through techniques such as knowledge distillation, model pruning, and quantization. While they may not match the broad capabilities of larger models, SLMs excel in specific tasks and domains, making them ideal for targeted applications on edge devices.

> We will generally consider SLMs to be language models with fewer than 5 billion parameters, quantized to 4 bits.
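A back-of-the-envelope sketch of why quantization matters so much on a Raspberry Pi: the weights alone occupy roughly parameter count × bits per parameter ÷ 8 bytes. Real model files add overhead for embeddings, metadata, and mixed-precision layers, so downloaded sizes run somewhat higher than this estimate:

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

# A 2.6B-parameter model (the size of Gemma 2 2B) at different precisions:
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(2.6e9, bits):.1f} GB")
```

At 4 bits the weights fit in about 1.3 GB, consistent with the ~1.6 GB Q4_0 file mentioned later in this tutorial once overhead is included, while full 32-bit precision would need over 10 GB, more than a Raspberry Pi 5 has.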
Examples of SLMs include compressed versions of models like Meta Llama, Microsoft Phi, and Google Gemma. These models enable a wide range of natural language processing tasks directly on edge devices, from text classification and sentiment analysis to question answering and limited text generation.

For more information on SLMs, the paper [LLM Pruning and Distillation in Practice: The Minitron Approach](https://arxiv.org/pdf/2408.11796) describes an approach that applies pruning and distillation to obtain SLMs from LLMs, and [Small Language Models: Survey, Measurements, and Insights](https://arxiv.org/pdf/2409.15790) presents a comprehensive survey and analysis of SLMs, defined there as language models with 100 million to 5 billion parameters designed for resource-constrained devices.
## Conclusion

This chapter introduced some basic concepts of large language models. The potential of running LLMs on the edge extends far beyond simple data processing, as in this lab's examples. Here are some innovative suggestions for using this project:

**1. Smart Home Automation:**

- Integrate SLMs to interpret voice commands or analyze sensor data for intelligent home automation. This could include real-time monitoring and control of home devices, security systems, and energy management, all processed locally without relying on cloud services.

**2. Field Data Collection and Analysis:**

- Deploy SLMs on a Raspberry Pi in remote or mobile setups for real-time data collection and analysis. This can be used in agriculture to monitor crop health, in environmental studies for wildlife tracking, or in disaster response for situational awareness and resource management.

**3. Educational Tools:**

- Create interactive educational tools that leverage SLMs to provide instant feedback, language translation, and tutoring. This can be particularly useful in developing regions with limited access to advanced technology and internet connectivity.

**4. Healthcare Applications:**

- Use SLMs for medical diagnostics and patient monitoring. They can provide real-time analysis of symptoms and suggest potential treatments. This can be integrated into telemedicine platforms or portable health devices.

**5. Local Business Intelligence:**

- Implement SLMs in retail or small business environments to analyze customer behavior, manage inventory, and optimize operations. The ability to process data locally ensures privacy and reduces dependency on external services.

**6. Industrial IoT:**

- Integrate SLMs into industrial IoT systems for predictive maintenance, quality control, and process optimization. The Raspberry Pi can serve as a localized data processing unit, reducing latency and improving the reliability of automated systems.

**7. Autonomous Vehicles:**

- Use SLMs to process sensory data from autonomous vehicles, enabling real-time decision-making and navigation. This can be applied to drones, robots, and self-driving cars for enhanced autonomy and safety.

**8. Cultural Heritage and Tourism:**

- Implement SLMs to provide interactive and informative guides for cultural heritage sites and museums. Visitors can use these systems to get real-time information and insights, enhancing their experience without internet connectivity.

**9. Artistic and Creative Projects:**

- Use SLMs to analyze and generate creative content, such as music, art, and literature. This can foster innovative projects in the creative industries and allow for unique interactive experiences in exhibitions and performances.

**10. Customized Assistive Technologies:**

- Develop assistive technologies for individuals with disabilities, providing personalized and adaptive support through real-time text-to-speech, language translation, and other accessible tools.

articles/Chapter 1 - Introduction to AI/Introduction_of_Mastering_Large_Language_Model.md

Lines changed: 0 additions & 5 deletions
This file was deleted.

articles/Chapter 1 - Introduction to AI/Introduction_to_DNN.md renamed to articles/Chapter 1 - Introduction to AI/Introduction_to_Deep_Neural_Network.md

File renamed without changes.
Lines changed: 65 additions & 0 deletions
---
sidebar_position: 3
---

# Google Gemma 2 2B

## Introduction

Let's install [Gemma 2](https://ollama.com/library/gemma2:2b), a high-performing and efficient model available in three sizes: 2B, 9B, and 27B. We will install [**Gemma 2 2B**](https://huggingface.co/collections/google/gemma-2-2b-release-66a20f3796a2ff2a7c76f98f), a lightweight model trained with 2 trillion tokens that produces outsized results by learning from larger models through distillation. The model has 2.6 billion parameters and Q4_0 quantization, which results in a size of 1.6 GB. Its context window is 8,192 tokens.

![](../../pictures/Chapter4/gemma_2.png)

## Install Ollama

Please refer to this [article](./articles/Chapter 4 - Large Language Model/Setup_Ollama_on_RaspberryPi.md).

## Install and run Gemma 2

```bash
ollama run gemma2:2b --verbose
```
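Besides the interactive prompt, Ollama exposes a local REST API (by default on port 11434) that other programs can call. A minimal sketch of building a request body for its `/api/generate` endpoint; the model tag and prompt are just examples, and the actual HTTP call is left commented out so the snippet runs without a server:

```python
import json

def build_generate_request(model: str, prompt: str) -> dict:
    """Body for a POST to Ollama's /api/generate endpoint."""
    return {
        "model": model,    # any tag pulled locally, e.g. "gemma2:2b"
        "prompt": prompt,
        "stream": False,   # request one JSON response instead of a token stream
    }

body = build_generate_request("gemma2:2b", "What is the capital of France?")
print(json.dumps(body))

# With an Ollama server running locally, the request could be sent with:
# import urllib.request
# req = urllib.request.Request("http://localhost:11434/api/generate",
#                              data=json.dumps(body).encode(),
#                              headers={"Content-Type": "application/json"})
# print(json.load(urllib.request.urlopen(req))["response"])
```

Chapter 4's "Use Ollama with Python" section covers this programmatic route in more depth.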
Running the model with the command above, we should have the Ollama prompt available for us to input a question and start chatting with the model; for example:

`>>> What is the capital of France?`

Almost immediately, we get the correct answer:

`The capital of France is **Paris**. 🗼`

And its statistics:

![](../../pictures/Chapter4/gemma.png)

We can see that Gemma 2:2B has around the same performance as Llama 3.2:3B, despite having fewer parameters.
**Other examples:**

```bash
>>> What is the distance between Paris and Santiago, Chile?

The distance between Paris, France and Santiago, Chile is
approximately **7,000 miles (11,267 kilometers)**.

Keep in mind that this is a straight-line distance, and actual
travel distance can vary depending on the chosen routes and any
stops along the way. ✈️
```

Also a good response, but less accurate than Llama 3.2:3B's.
```bash
>>> what is the latitude and longitude of Paris?

You got it! Here are the latitudes and longitudes of Paris,
France:

* **Latitude:** 48.8566° N (north)
* **Longitude:** 2.3522° E (east)

Let me know if you'd like to explore more about Paris or its
location! 🗼🇫🇷
```

A good and accurate answer (a little more verbose than the Llama answers).
Lines changed: 76 additions & 0 deletions
---
sidebar_position: 2
---

# Meta Llama 3.2 1B/3B

## Introduction

![](../../pictures/Chapter4/small_and_multimodal.png)

Let's install and run our first small language model, [Llama 3.2](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/) 1B (and 3B). The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

The 1B and 3B models were pruned from the Llama 8B model, and then logits from the 8B and 70B models were used as token-level targets (token-level distillation). Knowledge distillation was used to recover performance (the models were trained with 9 trillion tokens). The 1B model has 1.24B parameters, quantized to 8-bit integer (Q8_0), and the 3B has 3.12B parameters with Q4_0 quantization, which results in sizes of 1.3 GB and 2 GB, respectively. The context window is 131,072 tokens.

![](../../pictures/Chapter4/llama3_2.jpg)

## Install Ollama

Please refer to this [article](./articles/Chapter 4 - Large Language Model/Setup_Ollama_on_RaspberryPi.md).

## Install and run Llama

```bash
ollama run llama3.2:1b
```
Running the model with the command above, we should have the Ollama prompt available for us to input a question and start chatting with the model; for example:

`>>> What is the capital of France?`

Almost immediately, we get the correct answer:

`The capital of France is Paris.`

Using the option `--verbose` when calling the model will generate several statistics about its performance (the model is pulled only the first time we run the command).

![](../../pictures/Chapter4/llama3_2_1b_performance.png)

Each metric gives insights into how the model processes inputs and generates outputs. Here's a breakdown of what each metric means:

- **Total Duration (2.620170326s)**: This is the complete time taken from the start of the command to the completion of the response. It encompasses loading the model, processing the input prompt, and generating the response.
- **Load Duration (39.947908ms)**: This indicates the time taken to load the model or necessary components into memory. If this value is minimal, it suggests that the model was preloaded or that only minimal setup was required.
- **Prompt Eval Count (32 tokens)**: The number of tokens in the input prompt. In NLP, tokens are typically words or subwords, so this count includes all the tokens that the model evaluated to understand and respond to the query.
- **Prompt Eval Duration (1.644773s)**: This measures the time the model takes to evaluate or process the input prompt. It accounts for the bulk of the total duration, implying that understanding the query and preparing a response is the most time-consuming part of the process.
- **Prompt Eval Rate (19.46 tokens/s)**: This rate indicates how quickly the model processes tokens from the input prompt. It reflects the model's speed in terms of natural language comprehension.
- **Eval Count (8 tokens)**: This is the number of tokens in the model's response, which in this case was "The capital of France is Paris."
- **Eval Duration (889.941ms)**: This is the time taken to generate the output based on the evaluated input. It's much shorter than the prompt evaluation, suggesting that generating the response is less computationally intensive than understanding the prompt.
- **Eval Rate (8.99 tokens/s)**: Similar to the prompt eval rate, this indicates the speed at which the model generates output tokens. It's a crucial metric for understanding the model's efficiency in output generation.

This detailed breakdown helps in understanding the computational demands and performance characteristics of running SLMs like Llama on edge devices such as the Raspberry Pi 5. It shows that while prompt evaluation is more time-consuming, the actual generation of responses is relatively quicker. This analysis is crucial for optimizing performance and diagnosing potential bottlenecks in real-time applications.
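The two rates above are simply counts divided by durations, which is handy to know when comparing runs on different models or devices. A quick check against the numbers reported for the 1B model:

```python
def tokens_per_second(token_count: int, duration_s: float) -> float:
    """Throughput in tokens per second."""
    return token_count / duration_s

# Values from the 1B run above:
prompt_eval_rate = tokens_per_second(32, 1.644773)  # prompt eval count / duration
eval_rate = tokens_per_second(8, 0.889941)          # eval count / duration
print(f"prompt eval rate: {prompt_eval_rate:.2f} tokens/s")  # ~19.46
print(f"eval rate: {eval_rate:.2f} tokens/s")                # ~8.99
```

Both computed values match Ollama's reported rates, confirming how the statistics are derived.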
Loading and running the 3B model, we can see the difference in performance for the same prompt:

![](../../pictures/Chapter4/llama3_2_3b_performance.png)

The eval rate is lower: 5.3 tokens/s versus 9 tokens/s with the smaller model.

When asked

`>>> What is the distance between Paris and Santiago, Chile?`

the 1B model answered `9,841 kilometers (6,093 miles)`, which is inaccurate, while the 3B model answered `7,300 miles (11,700 km)`, which is close to the correct value (11,642 km).
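The reference distance can be checked with a great-circle (haversine) calculation; the coordinates below are approximate city-center values used only for illustration:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Paris (48.8566° N, 2.3522° E) to Santiago, Chile (33.4489° S, 70.6693° W)
d = haversine_km(48.8566, 2.3522, -33.4489, -70.6693)
print(f"{d:.0f} km")  # roughly 11,650 km
```

This independent check agrees with the ~11,642 km reference figure (small differences come from the spherical-Earth approximation), making it easy to grade each model's answer.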
Let's ask for Paris's coordinates:

`>>> what is the latitude and longitude of Paris?`

```bash
The latitude and longitude of Paris are 48.8567° N (48°55'
42" N) and 2.3510° E (2°22' 8" E), respectively.
```

![](../../pictures/Chapter4/paris-lat-lon.png)

Both the 1B and 3B models gave correct answers.
