Legal Knowledge Fine-Tuning Data Generation
This template provides an end-to-end legal data preparation workflow: it helps you build high-quality Q&A pair datasets from legal documents and then fine-tune a model on the Hugging Face AutoTrain platform. It is suited to building customized large language models with specialized legal understanding, for example Q&A applications covering labor disputes, contract disputes, and other legal scenarios.
Template Details
Click View Details in the template list to access the template details page. On this page, you can see example processing results and the workflow topology.

Using the Template
- Select the Legal Knowledge Fine-Tuning Data Generation template from the template list. Click Use Template either in the list or on the details page to create a data processing task and quickly generate the corresponding workflow.
- The system includes built-in sample data for quick onboarding and testing.
- The target location, where the processing results are stored, is not created for you; you must create it yourself.
- You can adjust the configuration of the parsing, enhancement, and other workflow nodes to fit your needs.

Click Create and Start Running, then wait for the workflow to complete.
Viewing Processing Results
Navigate to the Data Center, locate the target location selected during the workflow setup, and click the filename to view the processing results.

Data Export
After processing is complete, the dataset can be exported for subsequent model training. In this example, click the download button next to the file in the Data Center. After exporting and decompressing the download, you will obtain a standard Q&A pair dataset in JSONL format, such as Labor_Dispute_Mediation_and_Arbitration_Law_of_the_People's_Republic_of_China.pdf.jsonl.
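Before training, it can be worth sanity-checking the exported file. The following is a minimal Python sketch, not part of the template itself. It assumes each line of the JSONL file is a JSON object with `question` and `answer` keys; these field names are hypothetical, so check them against a real exported record and adjust the two constants if they differ.

```python
import json
from pathlib import Path

# Assumed field names (hypothetical); verify against an actual exported record.
QUESTION_KEY = "question"
ANSWER_KEY = "answer"


def load_qa_pairs(path):
    """Read a JSONL file and return (valid_pairs, skipped_line_numbers)."""
    valid, skipped = [], []
    with Path(path).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines are ignored, not counted as errors
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                skipped.append(line_no)  # line is not valid JSON
                continue
            if not isinstance(record, dict):
                skipped.append(line_no)  # expected one JSON object per line
                continue
            question = str(record.get(QUESTION_KEY, "")).strip()
            answer = str(record.get(ANSWER_KEY, "")).strip()
            if question and answer:
                valid.append({QUESTION_KEY: question, ANSWER_KEY: answer})
            else:
                skipped.append(line_no)  # missing or empty question/answer
    return valid, skipped
```

Calling `load_qa_pairs` on the exported file returns the usable pairs together with the line numbers that were skipped, which gives a quick view of how clean the export is before any training time is spent.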
Model Fine-Tuning (Using Hugging Face AutoTrain)
We will use Hugging Face's AutoTrain platform to complete the fine-tuning process online, with zero coding and a fully visual workflow.
- Access AutoTrain: https://huggingface.co/autotrain
- Create a Project: Select Text Classification or Text Generation.
- Upload Data: Upload Labor_Dispute_Mediation_and_Arbitration_Law_of_the_People's_Republic_of_China.pdf.jsonl to the project.
- Configure Parameters: Set the training epochs, learning rate, and base model (e.g., Mistral or Gemma).
- Start Training: Click Start Training to begin model fine-tuning.
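The steps above need no code, but text generation fine-tuning commonly trains on a single text column, so the Q&A pairs may need flattening before upload. The optional Python sketch below shows one way to do that. It rests on several assumptions: the pairs are dictionaries with hypothetical `question` and `answer` keys, the training column is named `text`, and the prompt template is an arbitrary illustration. Check the column mapping shown in your AutoTrain project before relying on any of these.

```python
import csv


def format_example(question, answer):
    """Join one Q&A pair into a single training string (illustrative template)."""
    return f"### Question:\n{question}\n\n### Answer:\n{answer}"


def write_training_csv(pairs, out_path, text_column="text"):
    """Write Q&A pairs to a one-column CSV and return the number of rows written.

    `pairs` is a list of dicts with "question" and "answer" keys (assumed names).
    """
    rows = 0
    # newline="" lets the csv module handle line endings and embedded newlines.
    with open(out_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=[text_column])
        writer.writeheader()
        for pair in pairs:
            text = format_example(pair["question"], pair["answer"])
            writer.writerow({text_column: text})
            rows += 1
    return rows
```

For example, `write_training_csv(pairs, "train.csv")` produces a file with a `text` header and one formatted example per row. The template in `format_example` is a design choice rather than a requirement; whichever format you pick, use the same one when prompting the fine-tuned model.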

AutoTrain will automatically handle environment deployment and the training process. Once completed, you will obtain a fine-tuned model with professional legal Q&A capabilities, ready for deployment or direct invocation on the platform.