The script gen_opcode_hist.py implements the weighted opcode generation.
It disassembles Debian RISC-V packages downloaded from various Debian mirrors.
Pass the directory to inspect as the first argument.
Because the package archive used in the paper is too large to provide as an artifact, we provide a small sample set in example_packages.
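The counting step behind the histogram can be illustrated with a minimal sketch. It is not the code of gen_opcode_hist.py: it assumes the disassembly is available as text in the tab-separated layout printed by GNU objdump -d (address, raw bytes, mnemonic, operands), and the function name count_mnemonics and the sample lines are hypothetical.

```python
from collections import Counter


def count_mnemonics(disassembly_lines):
    """Count instruction mnemonics in objdump-style disassembly text.

    ASSUMPTION: each instruction line is tab-separated as
    '<address>:', '<raw bytes>', '<mnemonic>', optionally '<operands>'.
    Lines that do not match this layout (symbol headers, blank lines,
    '...' padding markers) are skipped.
    """
    counts = Counter()
    for line in disassembly_lines:
        fields = line.rstrip("\n").split("\t")
        # Instruction lines have at least three fields and start with 'addr:'.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        mnemonic = fields[2].strip()
        if mnemonic:
            counts[mnemonic] += 1
    return counts


# Hypothetical sample input, made up for illustration only.
sample = [
    "0000000000010078 <_start>:",
    "   10078:\t4501                \tc.li\ta0,0",
    "   1007a:\t85aa                \tc.mv\ta1,a0",
    "   1007c:\t862e                \tc.mv\ta2,a1",
    "   1007e:\t9002                \tc.ebreak",
    "\t...",
    "",
]

counts = count_mnemonics(sample)
# most_common(n) returns a list of (mnemonic, count) tuples, the same shape
# as the list printed in the expected output.
print(counts.most_common(20))
```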
To plot a histogram and generate the weights, run:
python gen_opcode_hist.py example_packages
The script should complete in less than 5 minutes.
Expected output (errors about unrecognized file formats can be safely ignored):
riscv64-unknown-linux-gnu-objcopy: /tmp/tmpt79ml0g0/usr/share/bug/vim/script: file format not recognized
Failed to disassemble /tmp/tmpt79ml0g0/usr/share/bug/vim/script
Processed 748126 binaries
[('c.ldsp', 65882), ('c.mv', 59821), ('auipc', 54631), ('c.li', 51819), ('c.sdsp', 47733), ('jal', 47558), ('ld', 42595), ('addi', 38324), ('c.j', 25240), ('beq', 24574), ('c.ld', 24445), ('c.beqz', 18833), ('jalr', 18247), ('lw', 16471), ('bne', 15803), ('c.add', 12257), ('c.lw', 11998), ('c.bnez', 11545), ('lbu', 11394), ('sw', 10799)]
A plot window should open, and the script should create a file opcodes.json, which can then be used for the weighted sequence generation.
The final weights used in the paper can be found in the framework code (pyutils/riscv/weighted_opcodes.json).
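As a rough illustration of how such weights can drive sequence generation, the sketch below draws opcodes with probability proportional to their counts. It makes two assumptions that are not confirmed by this text: that the JSON file is a flat object mapping opcode names to counts, and that sampling is independent per position. The function name sample_sequence is hypothetical, and the framework's actual generator may work differently. The three counts are taken from the expected output above.

```python
import json
import random


def sample_sequence(weights, length, rng=None):
    """Draw `length` opcodes, each with probability proportional to its weight.

    ASSUMPTION: `weights` is a dict mapping opcode name -> non-negative count.
    """
    rng = rng or random.Random()
    opcodes = list(weights)
    return rng.choices(opcodes, weights=[weights[op] for op in opcodes], k=length)


# ASSUMED file layout: a flat JSON object of opcode -> count.
# The values below are the top three entries of the example output.
example_json = '{"c.ldsp": 65882, "c.mv": 59821, "auipc": 54631}'
weights = json.loads(example_json)

# A seeded generator keeps the drawn sequence reproducible between runs.
seq = sample_sequence(weights, 8, random.Random(0))
print(seq)
```

To use a real weight file instead of the inline string, json.load on the opened file would replace json.loads, provided the file actually follows the assumed layout.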