Sensitive Information Scan
Command Function
Scans files for sensitive information, such as mobile numbers, public network addresses, and ID numbers. You can customize sensitive words.
Syntax
devkit doctor sen-scan {-i INPUT_PATH | --input INPUT_PATH} [-o OUTPUT_PATH | --output OUTPUT_PATH] [-S | --show] [-t [PATH] | --template [PATH]] [-sn {1|2|3|a|b|c}* | --sen-num {1|2|3|a|b|c}*] [-sf PATH | --sen-file PATH]
Parameter Description
| Parameter | Option | Description |
|---|---|---|
| -h/--help | - | Obtains help information. |
| -S/--show | - | Displays the default list of sensitive words. |
| -t/--template | - | Generates a sensitive word template in the specified location. If no location is specified, the sen_word.json file is generated in the doctor directory by default. You can use -sf to specify a sensitive word template for a scan. |
| -sn/--sen-num | 1/2/3/a/b/c | Sensitive word ID. You can run -S to obtain a sensitive word ID. Use commas (,) to separate multiple sensitive word IDs. If no sensitive word ID is specified, all sensitive words are scanned for by default. |
| -sf/--sen-file | - | File path of the user-defined sensitive words. This file must have the same format as the template file generated by -t. |
| -i/--input | - | Path to the folder or file to be scanned. Only text files can be scanned. Use spaces to separate multiple text files. |
| -o/--output | - | Path for storing scan reports. If this parameter is not specified, the sen_scan_{time}_[zh\|en]_{num}.xlsx file is generated in the doctor/report/sen_scan directory by default. |
You can press Ctrl+C to stop a scan that is in progress. After the scan is stopped, the data already detected is still output. A single report supports a maximum of 10,000 data records.
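When a scan is launched from a script, the documented options can be assembled programmatically. The following Python sketch builds an argument list using only the flags described in this section. The helper name `build_sen_scan_command` is hypothetical, and passing multiple input paths as separate arguments is an assumption based on the "use spaces" rule for -i. The sketch only constructs the command and does not run it.

```python
from typing import List, Optional, Sequence

# IDs shown by `devkit doctor sen-scan -S`.
VALID_SEN_IDS = {"1", "2", "3", "a", "b", "c"}


def build_sen_scan_command(
    input_paths: Sequence[str],
    sen_ids: Sequence[str] = (),
    sen_file: Optional[str] = None,
    output_path: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `devkit doctor sen-scan` (hypothetical helper)."""
    if not input_paths:
        raise ValueError("at least one input path is required (-i)")
    unknown = [s for s in sen_ids if s not in VALID_SEN_IDS]
    if unknown:
        raise ValueError(f"unknown sensitive word IDs: {unknown}")

    # Assumption: space-separated inputs map to separate argv entries.
    cmd = ["devkit", "doctor", "sen-scan", "-i", *input_paths]
    if sen_ids:
        # Multiple IDs are joined with commas, as the -sn description requires.
        cmd += ["-sn", ",".join(sen_ids)]
    if sen_file:
        cmd += ["-sf", sen_file]
    if output_path:
        cmd += ["-o", output_path]
    return cmd
```

The returned list can be passed to `subprocess.run` as is. Building a list rather than a single string avoids shell quoting problems when a path contains spaces.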
Example
- Viewing the sensitive word list
devkit doctor sen-scan -S
Command output:
id  note
————————————————————————————————————————————————————————————
1   Public IP address
2   Mobile number
3   ID number
a   Hard-coded key/Password (high false positive rate)
b   Common password text (high false positive rate)
c   Privacy sensitive words (high false positive rate)
- Generating a sensitive word template
- Generate a sensitive word template.
The following uses the /home/temp template directory as an example. Replace it with the actual one. If the template directory is not specified, the sen_word.json file is generated in the doctor directory by default.
devkit doctor sen-scan -t /home/temp
If a file with the same name already exists in the path, a suffix is automatically appended to the file name, for example, sen_word_1.json. The following information is displayed:
[INFO]Generating the template file in /home/temp/sen_word.json succeeded.
- Edit the template file.
vi /home/temp/sen_word.json
- Press i to enter the insert mode and configure the sensitive word template.
[
  {
    "word": "",
    "word_type": "regex",
    "word_note": ""
  },
  {
    "word": "",
    "word_type": "text",
    "word_note": ""
  }
]
- word: sensitive word to scan for. Its form must match the value of word_type.
- word_type: type of the sensitive word. regex indicates a regular expression and text indicates plain text.
- word_note: description of the sensitive word. This parameter is optional.
- Press Esc to exit the insert mode. Type :wq! and press Enter to save the file and exit.
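The template can also be produced by a script instead of being edited by hand. The following Python sketch builds entries with the three documented fields (word, word_type, word_note) and renders them as a JSON array in the same layout as the generated template. The helper names and the sample words are hypothetical. Checking regex entries with Python's `re` module assumes that the tool accepts a compatible regular expression syntax, which this section does not state.

```python
import json
import re
from typing import Dict, List


def make_entry(word: str, word_type: str, word_note: str = "") -> Dict[str, str]:
    """Create one template entry, checked against the documented fields."""
    if word_type not in ("regex", "text"):
        raise ValueError(f"word_type must be 'regex' or 'text', got {word_type!r}")
    if not word:
        raise ValueError("word must not be empty")
    if word_type == "regex":
        # Assumption: the tool accepts patterns that Python can also compile.
        # re.compile raises re.error for a malformed pattern.
        re.compile(word)
    return {"word": word, "word_type": word_type, "word_note": word_note}


def render_template(entries: List[Dict[str, str]]) -> str:
    """Render entries as a JSON array, matching the generated template layout."""
    return json.dumps(entries, indent=2, ensure_ascii=False)


# Hypothetical sample words, for illustration only.
sample_entries = [
    make_entry(r"EMP-\d{6}", "regex", "hypothetical employee ID"),
    make_entry("internal use only", "text", "hypothetical document marking"),
]
```

Writing the result of `render_template(sample_entries)` to a file such as /home/temp/sen_word.json gives a file that can then be passed to a scan with -sf.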
- Sensitive information scan
The following example scans /home/software/RuoYi-master/ using the specified sensitive word IDs and template. Replace this directory with the actual one.
devkit doctor sen-scan -i /home/software/RuoYi-master/ -sn 1,2,3 -sf /home/temp/sen_word.json
The following information is displayed and a report is generated:
[INFO]Start scan /home/software/RuoYi-master.
[INFO]The scan is complete, starting to generate the report.
Excel report is created successfully. Files are located in /usr/local/devkit/doctor/report/sen_scan/20240814101140
Scan reports are generated in both Chinese and English. Each report contains three tab pages: Overview, Sensitive Words, and Details. It shows the scan path, start and end times, whether the scan was stopped, sensitive word statistics, and details.
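For post-processing, report files can be picked out of the report directory by name. The following Python sketch matches the default name pattern from the -o description, sen_scan_{time}_[zh|en]_{num}.xlsx, and groups the files by language. The function names are hypothetical. Treating {num} as a decimal number is an assumption, and {time} is matched loosely because its exact format is not documented here.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Default name pattern: sen_scan_{time}_[zh|en]_{num}.xlsx
# Assumption: {num} is decimal; {time} is matched loosely.
REPORT_NAME = re.compile(
    r"^sen_scan_(?P<time>.+)_(?P<lang>zh|en)_(?P<num>\d+)\.xlsx$"
)


def group_reports_by_language(names: List[str]) -> Dict[str, List[str]]:
    """Group report file names by language code, ignoring unrelated files."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(names):
        match = REPORT_NAME.match(name)
        if match:
            groups[match.group("lang")].append(name)
    return dict(groups)


def list_reports(report_dir: str) -> Dict[str, List[str]]:
    """List the reports found directly inside report_dir."""
    return group_reports_by_language([p.name for p in Path(report_dir).iterdir()])
```

Calling `list_reports` on the directory printed at the end of a scan returns a dictionary with the keys zh and en when both language versions are present.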