Nanjing University Unified Identity Authentication Verification Code Recognition: A Full Process Open Source Practice from Dataset Construction to Model Deployment

This article is synchronized and updated to xLog by Mix Space
For the best browsing experience, it is recommended to visit the original link
https://www.do1e.cn/posts/deepl/nju-captcha

Introduction#

In NJUlogin, which I wrote earlier, logging in with an account and password requires solving a captcha. At the time I used ddddocr for this, and its accuracy was good.

I also deployed a server and asked a friend to help write a Tampermonkey script, so that the captcha is filled in automatically whenever I log in (the browser already fills in the username and password) and I only need to click Login.

Recently, however, I wanted to make the recognition model lighter so that it is easier to deploy on edge devices, which led to this project. (Could you give it a Star? ＞︿＜ If you only want to use it and are not interested in the underlying techniques, scroll to the end; I recommend the NJU server API version.)

Implementation Effect

Data Collection#

https://github.com/Do1e/NJUcaptcha/tree/main/build_dataset

The dataset construction is mostly automated, relying mainly on the following two tools:

ddddocr: for preliminary captcha recognition
NJUlogin: to verify the correctness of the recognition results

I made small modifications to NJUlogin so that it reports whether each recognition result was correct, saved the images into separate folders accordingly, and manually relabeled the incorrectly recognized ones (a few hundred or so).
Collecting 100,000 images took about 3 to 4 days of running in the background, because the time.sleep interval could not be too short or the IP would get blocked. ＞︿＜
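
The collection loop described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the repository's actual code: `fetch_captcha`, `recognize`, and `verify` are hypothetical callables standing in for the captcha download, the ddddocr call, and the modified NJUlogin check, and the folder names and default delay are made up for illustration.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Dict


def collect(
    fetch_captcha: Callable[[], bytes],    # hypothetical: downloads one captcha image
    recognize: Callable[[bytes], str],     # hypothetical: e.g. a wrapper around ddddocr
    verify: Callable[[bytes, str], bool],  # hypothetical: e.g. a login attempt via NJUlogin
    out_dir: Path,
    count: int,
    delay: float = 3.0,                    # assumed value; too short a delay risks an IP block
) -> Dict[str, int]:
    """Collect `count` captchas and sort them by whether the guess was accepted."""
    folders = {True: out_dir / "correct", False: out_dir / "wrong"}
    for folder in folders.values():
        folder.mkdir(parents=True, exist_ok=True)

    stats = {"correct": 0, "wrong": 0}
    for i in range(count):
        image = fetch_captcha()
        text = recognize(image).lower()          # labels are stored in lowercase
        ok = verify(image, text)
        md5 = hashlib.md5(image).hexdigest()     # assumed: md5 of the raw image bytes
        (folders[ok] / f"{text}_{md5}.jpg").write_bytes(image)
        stats["correct" if ok else "wrong"] += 1
        if i < count - 1:
            time.sleep(delay)                    # throttle requests between captchas
    return stats
```

Files that end up in the `wrong` folder are the ones that would then be renamed by hand.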

The result is this dataset, which you are welcome to download and use. It contains 100,000 captcha images named in the format {captcha text}_{image md5}.jpg, with all captcha texts in lowercase.
Dataset download link: NJU-captcha-dataset.7z
Decompression password: @Do1e
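
Given that naming format, labels can be read straight from the file names. The helpers below are a small sketch rather than code from the repository; `parse_name` and `is_intact` are hypothetical names, and the check assumes the md5 is the lowercase hex digest of the raw file bytes.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def parse_name(path: Path) -> Tuple[str, str]:
    """Split '{captcha text}_{image md5}.jpg' into (label, md5)."""
    # rsplit from the right so the md5 is always the last field
    label, md5 = path.stem.rsplit("_", 1)
    return label, md5


def is_intact(path: Path) -> bool:
    """Return True if the md5 in the file name matches the file contents.

    Assumption: the md5 is the hex digest of the image file's bytes.
    """
    _, md5 = parse_name(path)
    return hashlib.md5(path.read_bytes()).hexdigest() == md5
```

For example, `parse_name(Path("ab3d_0123456789abcdef0123456789abcdef.jpg"))` gives the label `ab3d` and the md5 string.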

The dataset loading code is as follows:

https://github.com/Do1e/NJUcaptcha/blob/main/model/dataset.py

Recognition Model#

https://github.com/Do1e/NJUcaptcha/tree/main/model

With the data ready, I could design the model and train it. This time, I completely handed over the model design to AI, and the results were quite satisfactory.

Model size 12.98MiB -> 2.25MiB
Model accuracy 99.37% -> 99.83%
Throughput 173.95 images/sec -> 1076.56 images/sec [AMD Ryzen 7 8845H]
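
Throughput here means images processed per second, that is, the number of images divided by the elapsed wall-clock time. The sketch below shows one way such a figure could be measured. It is not necessarily how the numbers above were obtained; `predict` is a hypothetical callable that wraps a single-image inference, and the warm-up count is an arbitrary choice.

```python
import time
from typing import Callable, Sequence


def throughput(
    predict: Callable[[bytes], str],  # hypothetical single-image inference function
    images: Sequence[bytes],
    warmup: int = 10,                 # assumed: a few untimed runs to warm caches
) -> float:
    """Return images per second for `predict` over `images`."""
    if not images:
        raise ValueError("need at least one image to measure throughput")
    for img in images[:warmup]:
        predict(img)                  # warm-up runs are not timed
    start = time.perf_counter()
    for img in images:
        predict(img)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed
```

By the figures listed above, the change corresponds to roughly 6.2x the throughput at roughly 1/5.8 of the model size.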

https://github.com/Do1e/NJUcaptcha/blob/main/model/model.py

Maybe it can be a bit smaller? ~~Let's save that for the next upgrade~~

Server Deployment#

https://github.com/Do1e/NJUcaptcha/tree/main/service

Previously, I had also implemented a simple recognition server with fastapi, which takes a base64-encoded image and returns the captcha text. This time, I took the opportunity to deploy it on vercel. Test command on Linux:

curl -s -L "https://authserver.nju.edu.cn/authserver/captcha.html" -o "captcha.jpg" && [ -f "captcha.jpg" ] && curl -s -X POST -H "Content-Type: application/x-www-form-urlencoded" -d "captcha=$(base64 -i captcha.jpg | tr -d '\n')" "https://njucaptcha.vercel.app" || { echo "Failed to download captcha image"; exit 1; }
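
The same request can be made from Python using only the standard library. This is a sketch under assumptions, not an official client: the function names are hypothetical, the response is returned as raw text because its exact format is not described here, and the base64 string is percent-encoded on the assumption that the server parses the body as a standard form (the curl command above sends it unescaped).

```python
import base64
import urllib.parse
import urllib.request

CAPTCHA_URL = "https://authserver.nju.edu.cn/authserver/captcha.html"
API_URL = "https://njucaptcha.vercel.app"


def build_form_body(image: bytes) -> bytes:
    """Encode the image as the form field `captcha`, as in the curl command."""
    b64 = base64.b64encode(image).decode("ascii")
    # urlencode escapes '+', '/' and '=' so the base64 text survives form parsing
    return urllib.parse.urlencode({"captcha": b64}).encode("ascii")


def recognize_remote(image: bytes, api_url: str = API_URL, timeout: float = 10.0) -> str:
    """POST the image to the recognition service and return the raw response text."""
    request = urllib.request.Request(
        api_url,
        data=build_form_body(image),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def main() -> None:
    """Download one captcha and print whatever the service returns."""
    with urllib.request.urlopen(CAPTCHA_URL, timeout=10.0) as response:
        image = response.read()
    print(recognize_remote(image))
```

Calling `main()` performs the same two steps as the shell one-liner: download a captcha, then submit it for recognition.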

Tampermonkey Script Auto-fill#

As mentioned in the introduction, I wrote a Tampermonkey script that auto-fills the captcha, so that logging in needs no manual recognition or typing. The earlier version relies on a server:

https://github.com/Do1e/NJUcaptcha/blob/main/njucaptcha.user.js

The open-source code still uses the vercel service, which is very slow and cannot be used when logging into p.nju. (￣﹃￣)

My own solution is to run the service on campus and expose it through my public server with frp; when logging into p.nju, the script accesses the on-campus service directly:

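// Public endpoint and on-campus endpoint of the recognition service (placeholders).
const url_pub = 'https://example.com/';
const url_nju = 'https://nju.example.com/';
const currentUrl = window.location.href;
// On the p.nju login page, use the on-campus service; otherwise use the public one.
const serverUrl = currentUrl.includes('//p.nju.edu.cn') ? url_nju : url_pub;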

The hardest part of this project was running ONNX inference directly on the client side; it took several hours of tinkering with AI tools to get working. It is implemented with ONNX Runtime Web.

https://github.com/Do1e/NJUcaptcha/blob/main/njucaptcha_onnx.user.js

One drawback of the ONNX version is that, without a cache, it needs an internet connection to download some inference dependencies. They are cached after the first use, but ort-wasm-simd-threaded.jsep.mjs and ort-wasm-simd-threaded.jsep.wasm can only be cached for 7 days, which is not very long. If anyone knows a way to get a nearly permanent cache like @resource, feel free to submit a PR.

In summary, both solutions have their pros and cons. I still recommend deploying it the way I do, or directly using the NJU server API version provided at the end.

The Tampermonkey script versions above can be installed directly by clicking the links below (provided the Tampermonkey extension is installed):

| | vercel api version | NJU server api version | onnx local inference version |
| :--- | :--- | :--- | :--- |
| Advantages | No proxy (VPN) needed | The best option in my view; works almost perfectly | Very fast, filled in before the page finishes loading, and usable when logging into p.nju (once cached) |
| Disadvantages | Very slow, and cannot be used when logging into p.nju | Requires deploying both an on-campus and a public server; I will lose access to it after graduation | Without a cache, a proxy (VPN) is needed to download some files and it cannot be used when logging into p.nju; the cache lasts only 7 days |

Note: The code for this project is released under the GPL-3.0 open-source license; please ignore the license notice that follows. ~~I'm too lazy to change the webpage code; I do hold the right of interpretation for my own website, right?~~