Fix potential use-after-free bug #80

Merged
Rot127 merged 1 commit into capstone-engine:auto-sync-18 from jiegec:fix-use-after-free
May 16, 2025

jiegec (Collaborator) commented on May 15, 2025

The generated files sometimes contain corrupted characters. The cause is that Regex::sub produces its result as a std::string, which is passed to the StringRef constructor and then destroyed before the StringRef is used. This is a use-after-free bug, found by valgrind:

==566185== Invalid read of size 1
==566185==    at 0x6738C5: smatcher (regengine.inc:164)
==566185==    by 0x6791E0: llvm_regexec (regexec.c:159)
==566185==    by 0x5FAE8F: llvm::Regex::match(llvm::StringRef, llvm::SmallVectorImpl<llvm::StringRef>*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) const (Regex.cpp:105)
==566185==    by 0x4E6BE8: llvm::(anonymous namespace)::normalizedMnemonic(llvm::StringRef const&, bool, bool, llvm::StringRef) (PrinterCapstone.cpp:2683)
==566185==    by 0x4E6E17: llvm::(anonymous namespace)::getNormalMnemonic(llvm::StringRef, llvm::StringRef, bool, bool) (PrinterCapstone.cpp:2700)
==566185==    by 0x4ECBBE: llvm::(anonymous namespace)::printInsnNameMapEnumEntry(llvm::StringRef const&, std::unique_ptr<MatchableInfo, std::default_delete<MatchableInfo> > const&, llvm::raw_string_ostream&, llvm::raw_string_ostream&) (PrinterCapstone.cpp:3401)
==566185==    by 0x4EF42B: llvm::PrinterCapstone::asmMatcherEmitMatchTable(llvm::CodeGenTarget const&, AsmMatcherInfo&, llvm::StringToOffsetTable&, unsigned int) const (PrinterCapstone.cpp:3676)
==566185==    by 0x16F550: (anonymous namespace)::AsmMatcherEmitter::run() (AsmMatcherEmitter.cpp:2258)
==566185==    by 0x16FB75: (anonymous namespace)::EmitAsmMatcher(llvm::RecordKeeper&, llvm::raw_ostream&) (AsmMatcherEmitter.cpp:2306)
==566185==    by 0x6BEC3B: llvm::TableGenMain(char const*, std::function<bool (llvm::raw_ostream&, llvm::RecordKeeper&)>) (Main.cpp:136)
==566185==    by 0x56F387: main (TableGen.cpp:84)
==566185==  Address 0x951cf00 is 0 bytes inside a block of size 31 free'd
==566185==    at 0x484499B: operator delete(void*, unsigned long) (vg_replace_malloc.c:935)
==566185==    by 0x139604: std::__new_allocator<char>::deallocate(char*, unsigned long) (new_allocator.h:158)
==566185==    by 0x138074: std::allocator_traits<std::allocator<char> >::deallocate(std::allocator<char>&, char*, unsigned long) (alloc_traits.h:496)
==566185==    by 0x136777: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_destroy(unsigned long) (basic_string.h:292)
==566185==    by 0x134F93: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose() (basic_string.h:286)
==566185==    by 0x133549: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:795)
==566185==    by 0x4E6B85: llvm::(anonymous namespace)::normalizedMnemonic(llvm::StringRef const&, bool, bool, llvm::StringRef) (PrinterCapstone.cpp:2684)
==566185==    by 0x4E6E17: llvm::(anonymous namespace)::getNormalMnemonic(llvm::StringRef, llvm::StringRef, bool, bool) (PrinterCapstone.cpp:2700)
==566185==    by 0x4ECBBE: llvm::(anonymous namespace)::printInsnNameMapEnumEntry(llvm::StringRef const&, std::unique_ptr<MatchableInfo, std::default_delete<MatchableInfo> > const&, llvm::raw_string_ostream&, llvm::raw_string_ostream&) (PrinterCapstone.cpp:3401)
==566185==    by 0x4EF42B: llvm::PrinterCapstone::asmMatcherEmitMatchTable(llvm::CodeGenTarget const&, AsmMatcherInfo&, llvm::StringToOffsetTable&, unsigned int) const (PrinterCapstone.cpp:3676)
==566185==    by 0x16F550: (anonymous namespace)::AsmMatcherEmitter::run() (AsmMatcherEmitter.cpp:2258)
==566185==    by 0x16FB75: (anonymous namespace)::EmitAsmMatcher(llvm::RecordKeeper&, llvm::raw_ostream&) (AsmMatcherEmitter.cpp:2306)
==566185==  Block was alloc'd at
==566185==    at 0x4841F2F: operator new(unsigned long) (vg_replace_malloc.c:422)
==566185==    by 0x138958: std::__new_allocator<char>::allocate(unsigned long, void const*) (new_allocator.h:137)
==566185==    by 0x13707F: std::allocator_traits<std::allocator<char> >::allocate(std::allocator<char>&, unsigned long) (alloc_traits.h:464)
==566185==    by 0x1354B5: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long) (basic_string.tcc:155)
==566185==    by 0x136F84: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long) (basic_string.tcc:328)
==566185==    by 0x1353D5: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long) (basic_string.tcc:420)
==566185==    by 0x1338FC: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::append(char const*, unsigned long) (basic_string.h:1422)
==566185==    by 0x14EB57: llvm::operator+=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, llvm::StringRef) (StringRef.h:900)
==566185==    by 0x5FB8C1: llvm::Regex::sub(llvm::StringRef, llvm::StringRef, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) const (Regex.cpp:224)
==566185==    by 0x4E6B3D: llvm::(anonymous namespace)::normalizedMnemonic(llvm::StringRef const&, bool, bool, llvm::StringRef) (PrinterCapstone.cpp:2684)
==566185==    by 0x4E6E17: llvm::(anonymous namespace)::getNormalMnemonic(llvm::StringRef, llvm::StringRef, bool, bool) (PrinterCapstone.cpp:2700)
==566185==    by 0x4ECBBE: llvm::(anonymous namespace)::printInsnNameMapEnumEntry(llvm::StringRef const&, std::unique_ptr<MatchableInfo, std::default_delete<MatchableInfo> > const&, llvm::raw_string_ostream&, llvm::raw_string_ostream&) (PrinterCapstone.cpp:3401)

To fix the bug, the result is stored in a std::string instead of a StringRef, so the buffer stays alive for as long as it is read.

With this pull request, the invalid memory accesses are gone.
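
A minimal sketch of the lifetime problem described above, not the actual PrinterCapstone.cpp code. It assumes std::string_view as a stand-in for llvm::StringRef and std::regex_replace as a stand-in for Regex::sub; the helper names strip_suffix and normalize are hypothetical and exist only for illustration.

```cpp
#include <cctype>
#include <regex>
#include <string>
#include <string_view>

// Hypothetical stand-in for Regex::sub: the result is a freshly
// created std::string that the caller must keep alive.
std::string strip_suffix(std::string_view mnemonic) {
    static const std::regex suffix("\\..*$");  // first '.' through end of input
    return std::regex_replace(std::string(mnemonic), suffix, "");
}

// Buggy pattern, left as a comment because it is undefined behavior:
//
//   std::string_view view = strip_suffix(name);  // temporary dies at the ';'
//   use(view);                                   // view now dangles
//
// A non-owning view does not extend the lifetime of the string it was
// built from, so any later read goes through a dangling pointer.

// Fixed pattern: hold the result in an owning std::string, so the
// buffer outlives every later read.
std::string normalize(std::string_view mnemonic) {
    std::string owned = strip_suffix(mnemonic);
    for (char &c : owned) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return owned;
}
```

In this sketch, a call such as normalize("add.w") works on a string that the function itself owns, which mirrors the switch from StringRef to std::string in this pull request.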

Rot127 (Collaborator) left a comment

Thanks, good catch!

Rot127 merged commit b020af5 into capstone-engine:auto-sync-18 on May 16, 2025
3 checks passed
jiegec deleted the fix-use-after-free branch on May 16, 2025 15:19