
Lexical analysis

Model basic information

This module is a word segmentation network (a bidirectional GRU) built on jieba using the PaddlePaddle deep learning framework. It also supports jieba's traditional segmentation modes, such as precise mode, full mode, and search engine mode, and its usage is consistent with jieba.

Reference: https://github.com/PaddlePaddle/PaddleHub/blob/release/v2.2/modules/text/lexical_analysis/jieba_paddle

Sample result

"今天天气真好"
["今天", "天气", "真好"]

Let's try it out now

Prerequisites

1. Environment dependencies

Please visit the dependencies page.

2. jieba_paddle dependencies

  • paddlepaddle >= 1.8.0

  • paddlehub >= 1.8.0

3. Download the model

hub install jieba_paddle

Serve the Model

Install Pinferencia

First, let's install Pinferencia.

pip install "pinferencia[streamlit]"

Create app.py

Let's save our predict function into a file app.py and add some lines to register it.

app.py
import paddlehub as hub

from pinferencia import Server, task

lexical_analysis = hub.Module(name="jieba_paddle")


def predict(text: str):
    # Precise mode (cut_all=False), with HMM enabled for out-of-vocabulary words
    return lexical_analysis.cut(text, cut_all=False, HMM=True)


service = Server()
service.register(
    model_name="lexical_analysis", model=predict, metadata={"task": task.TEXT_TO_TEXT}
)

Run the service, and wait for it to load the model and start the server:

$ uvicorn app:service --reload
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [xxxxx] using statreload
INFO:     Started server process [xxxxx]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
Alternatively, start both the backend and the Streamlit frontend together with pinfer:

$ pinfer app:service --reload

Pinferencia: Frontend component streamlit is starting...
Pinferencia: Backend component uvicorn is starting...

Test the Service

Open http://127.0.0.1:8501, and the template Text to Text will be selected automatically.


Request

curl --location --request POST \
    'http://127.0.0.1:8000/v1/models/lexical_analysis/predict' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "data": "今天天气真好"
    }'
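
If requests is not available, the same call can be built with the standard library's urllib (a hypothetical client sketch; the endpoint and payload are the ones shown above):

```python
import json
import urllib.request

# Build the same POST request as the curl command above
payload = json.dumps({"data": "今天天气真好"}).encode("utf-8")
req = urllib.request.Request(
    url="http://127.0.0.1:8000/v1/models/lexical_analysis/predict",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Actually sending it requires the server started above to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```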

Response

{
    "model_name": "lexical_analysis",
    "data": [
        "今天",
        "天气",
        "真好"
    ]
}
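
The segmented tokens arrive in the data field, and a client can reassemble them however it needs. A minimal sketch using the sample response above:

```python
import json

# The JSON body returned by the predict endpoint above
raw = '{"model_name": "lexical_analysis", "data": ["今天", "天气", "真好"]}'

body = json.loads(raw)
tokens = body["data"]      # the segmented words
joined = "/".join(tokens)  # e.g. rejoin them with a separator

print(tokens)
print(joined)
```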

Create a test.py file.

test.py
import requests


response = requests.post(
    url="http://localhost:8000/v1/models/lexical_analysis/predict",
    json={"data": "今天天气真好"}
)
print(response.json())

Run the script and check the result.

$ python test.py
{
    "model_name": "lexical_analysis",
    "data": [
        "今天",
        "天气",
        "真好"
    ]
}

Even cooler, go to http://127.0.0.1:8000, and you will find full documentation of your APIs.

You can also send predict requests right there!