analyze: CLI options #1057

Merged: 11 commits, Mar 16, 2024
63 changes: 59 additions & 4 deletions c2rust-analyze/README.md
@@ -1,7 +1,62 @@
# Usage

Build `c2rust-analyze`:

```sh
cargo build --release
```

Then, in the directory of a cargo project you wish to rewrite, run
`c2rust-analyze` on the project:

```sh
cargo run --bin c2rust-analyze -- tests/filecheck/insertion_sort.rs -L "$(rustc --print target-libdir)" --crate-type rlib
.../path/to/c2rust/target/release/c2rust-analyze build |& tee c2rust-analyze.log
```

This should produce a large amount of debug output, including a table at the
end listing the type and expression rewrites the analysis has inferred for the
`insertion_sort` function.
`c2rust-analyze` is currently at a prototype stage and produces verbose debug
output by default; the use of `tee` to capture the output to a log file allows
inspecting the results even when they exceed the length of the terminal
scrollback buffer.

`c2rust-analyze` does not modify the target project's source code by default;
it only prints the rewritten code to standard output. Look for `=====
BEGIN/END =====` markers in the output to see the proposed rewritten code for
each file, or rerun with the `--rewrite-in-place` option (that is,
`c2rust-analyze --rewrite-in-place build`) to apply the rewrites directly to
the source files.

`c2rust-analyze` may take a long time to run even on medium-sized codebases.
In particular, running the Polonius analysis on very large functions may take
several minutes (though Polonius results are cached after the first run). For
testing, it may be useful to comment out some modules from `lib.rs` to speed up
the analysis.


## Known limitations

The automated safety rewrites in `c2rust-analyze` only apply to a small subset
of unsafe Rust code. When `c2rust-analyze` encounters unsupported code, it
will report an error and skip rewriting the function in question.

Other notable limitations:

* `c2rust-analyze` does not remove the `unsafe` keyword from function
definitions, even when it succeeds at removing all unsafe operations from the
function. The user must remove the `unsafe` keyword manually where it is
appropriate to do so.

Note that even if a function contains only safe operations, it might still
need to be marked `unsafe` if it could break an invariant that other code
relies on for safety. For example, `Vec::set_len` only writes to the
`self.len` field (a safe operation), but it can be used to violate the
invariant `self.len <= self.cap`, which `Vec::as_slice` relies on for safety.

* In non-amalgamated builds, where cross-module function calls use `extern "C"
{ fn foo(); }` in the calling module and `#[no_mangle] fn foo() { ... }` in
the callee, `c2rust-analyze` may rewrite the signature of the `#[no_mangle]`
function definition in a way that's incompatible with the corresponding
`extern "C"` declaration in another module. This can lead to segfaults or
other undefined behavior at run time. This can be avoided by using an
amalgamated build of the C code (where all functions are placed in one
module), or by manually editing the function definition and/or declaration
after rewriting to ensure that the signatures match up.
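
To make the `Vec::set_len` point above concrete, here is a minimal sketch. It uses a hypothetical `Buf` type invented for illustration; it is not part of `c2rust-analyze` or the standard library, and it only assumes the invariant `len <= capacity` described in the text.

```rust
/// Hypothetical fixed-capacity byte buffer, used only to illustrate the point.
/// Invariant: `len <= data.len()`.
pub struct Buf {
    data: Box<[u8]>,
    len: usize,
}

impl Buf {
    pub fn with_capacity(cap: usize) -> Self {
        Buf {
            data: vec![0u8; cap].into_boxed_slice(),
            len: 0,
        }
    }

    /// The body is a plain field write (a safe operation), but the function
    /// stays `unsafe` because a bad `new_len` breaks the invariant that
    /// `as_slice` relies on.
    ///
    /// # Safety
    /// `new_len` must not exceed the buffer's capacity.
    pub unsafe fn set_len(&mut self, new_len: usize) {
        self.len = new_len;
    }

    /// Sound only as long as every caller of `set_len` upholds the invariant.
    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `len <= data.len()` by the struct invariant.
        unsafe { std::slice::from_raw_parts(self.data.as_ptr(), self.len) }
    }
}
```

Removing `unsafe` from `set_len` here would compile, yet it would let safe code trigger an out-of-bounds read through `as_slice`, which is why the keyword has to be reviewed by hand.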
116 changes: 104 additions & 12 deletions c2rust-analyze/src/analyze.rs
@@ -37,6 +37,7 @@ use rustc_hir::def_id::CrateNum;
use rustc_hir::def_id::DefId;
use rustc_hir::def_id::DefIndex;
use rustc_hir::def_id::LocalDefId;
use rustc_hir::definitions::DefPathData;
use rustc_index::vec::IndexVec;
use rustc_middle::mir::visit::Visitor;
use rustc_middle::mir::AggregateKind;
@@ -57,6 +58,7 @@ use rustc_middle::ty::TyCtxt;
use rustc_middle::ty::TyKind;
use rustc_middle::ty::WithOptConstParam;
use rustc_span::Span;
use rustc_span::Symbol;
use std::collections::HashMap;
use std::collections::HashSet;
use std::env;
@@ -433,9 +435,8 @@ fn parse_def_id(s: &str) -> Result<DefId, String> {
Ok(def_id)
}

fn read_fixed_defs_list(path: &str) -> io::Result<HashSet<DefId>> {
fn read_fixed_defs_list(fixed_defs: &mut HashSet<DefId>, path: &str) -> io::Result<()> {
let f = BufReader::new(File::open(path)?);
let mut def_ids = HashSet::new();
for (i, line) in f.lines().enumerate() {
let line = line?;
let line = line.trim();
@@ -446,9 +447,77 @@ fn read_fixed_defs_list(path: &str) -> io::Result<HashSet<DefId>> {
let def_id = parse_def_id(&line).unwrap_or_else(|e| {
panic!("failed to parse {} line {}: {}", path, i + 1, e);
});
def_ids.insert(def_id);
fixed_defs.insert(def_id);
}
Ok(def_ids)
Ok(())
}

/// Examine each `DefId` in the crate, and add to `fixed_defs` any that doesn't match at least one
/// prefix in `prefixes`. For example, if `prefixes` is `foo,bar::baz`, only `foo`, `bar::baz`,
/// and their descendants will be eligible for rewriting; all other `DefId`s will be added to
/// `fixed_defs`.
fn check_rewrite_path_prefixes(tcx: TyCtxt, fixed_defs: &mut HashSet<DefId>, prefixes: &str) {
let hir = tcx.hir();
let prefixes: HashSet<Vec<Symbol>> = prefixes
.split(',')
// Exclude empty paths. This allows for leading/trailing commas or double commas within
// the list, which may result when building the list programmatically.
.filter(|prefix| prefix.len() > 0)
.map(|prefix| prefix.split("::").map(Symbol::intern).collect::<Vec<_>>())
.collect();
let sym_impl = Symbol::intern("{impl}");
// Buffer for accumulating the path to a particular def.
let mut path_buf = Vec::with_capacity(10);
for ldid in tcx.hir_crate_items(()).definitions() {
let def_path = hir.def_path(ldid);

// Traverse `def_path`, adding each `Symbol` to `path_buf`. We check after each push
// whether `path_buf` matches something in `prefixes`, which has the effect of checking
// every prefix of the path of `ldid`.
path_buf.clear();
let mut matched = false;
for ddpd in &def_path.data {
match ddpd.data {
// We ignore these when building the `Symbol` path.
DefPathData::CrateRoot
| DefPathData::ForeignMod
| DefPathData::Use
| DefPathData::GlobalAsm
| DefPathData::ClosureExpr
| DefPathData::Ctor
| DefPathData::AnonConst
| DefPathData::ImplTrait => continue,
DefPathData::TypeNs(sym)
| DefPathData::ValueNs(sym)
| DefPathData::MacroNs(sym)
| DefPathData::LifetimeNs(sym) => {
path_buf.push(sym);
}
DefPathData::Impl => {
path_buf.push(sym_impl);
}
}
if prefixes.contains(&path_buf) {
matched = true;
break;
}
}

if !matched {
fixed_defs.insert(ldid.to_def_id());
}
}
}
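
The prefix check in `check_rewrite_path_prefixes` can be sketched outside the compiler as follows. This is a simplified illustration, assuming plain `String` segments in place of interned `Symbol`s and an already-flattened path in place of `DefPathData`; the names `parse_prefixes` and `matches_any_prefix` are hypothetical and do not appear in the PR.

```rust
use std::collections::HashSet;

/// Parse a comma-separated prefix list, skipping empty entries so that
/// leading, trailing, or doubled commas are harmless.
pub fn parse_prefixes(list: &str) -> HashSet<Vec<String>> {
    list.split(',')
        .filter(|p| !p.is_empty())
        .map(|p| p.split("::").map(str::to_owned).collect())
        .collect()
}

/// Returns true if some leading run of `path` is one of `prefixes`.
/// The buffer is checked after every push, so each prefix of `path` is tried.
pub fn matches_any_prefix(prefixes: &HashSet<Vec<String>>, path: &[&str]) -> bool {
    let mut buf: Vec<String> = Vec::with_capacity(path.len());
    for seg in path {
        buf.push((*seg).to_owned());
        if prefixes.contains(&buf) {
            return true;
        }
    }
    false
}
```

Matching happens on whole path segments, so in this sketch a prefix `foo` covers `foo::helper` but not `foobar`. Paths for which this returns `false` correspond to the defs that the real function adds to `fixed_defs`.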

fn get_fixed_defs(tcx: TyCtxt) -> io::Result<HashSet<DefId>> {
let mut fixed_defs = HashSet::new();
if let Ok(path) = env::var("C2RUST_ANALYZE_FIXED_DEFS_LIST") {
read_fixed_defs_list(&mut fixed_defs, &path)?;
}
if let Ok(prefixes) = env::var("C2RUST_ANALYZE_REWRITE_PATHS") {
check_rewrite_path_prefixes(tcx, &mut fixed_defs, &prefixes);
}
Ok(fixed_defs)
}

fn run(tcx: TyCtxt) {
@@ -458,11 +527,7 @@ fn run(tcx: TyCtxt) {
}

// Load the list of fixed defs early, so any errors are reported immediately.
let fixed_defs = if let Ok(path) = env::var("C2RUST_ANALYZE_FIXED_DEFS_LIST") {
read_fixed_defs_list(&path).unwrap()
} else {
HashSet::new()
};
let fixed_defs = get_fixed_defs(tcx).unwrap();

let mut gacx = GlobalAnalysisCtxt::new(tcx);
let mut func_info = HashMap::new();
@@ -918,7 +983,14 @@ fn run(tcx: TyCtxt) {

// Items in the "fixed defs" list have all pointers in their types set to `FIXED`. For
// testing, putting #[c2rust_analyze_test::fixed_signature] on an item has the same effect.
//
// Functions in the list are also added to `gacx.fns_fixed`.
for ldid in tcx.hir_crate_items(()).definitions() {
// TODO (HACK): `Clone::clone` impls are omitted from `fn_sigs` and cause a panic below.
if is_impl_clone(tcx, ldid.to_def_id()) {
continue;
}

let def_fixed = fixed_defs.contains(&ldid.to_def_id())
|| util::has_test_attr(tcx, ldid, TestAttr::FixedSignature);
match tcx.def_kind(ldid.to_def_id()) {
@@ -928,6 +1000,7 @@ fn run(tcx: TyCtxt) {
None => panic!("missing fn_sig for {:?}", ldid),
};
make_sig_fixed(&mut gasn, lsig);
gacx.fns_fixed.insert(ldid.to_def_id());
}

DefKind::Struct | DefKind::Enum | DefKind::Union => {
@@ -1163,6 +1236,14 @@ fn run(tcx: TyCtxt) {
// Generate rewrites for all functions.
let mut all_rewrites = Vec::new();

let mut manual_shim_casts = rewrite::ManualShimCasts::No;
if let Ok(val) = env::var("C2RUST_ANALYZE_USE_MANUAL_SHIMS") {
if val == "1" {
manual_shim_casts = rewrite::ManualShimCasts::Yes;
}
}
let manual_shim_casts = manual_shim_casts;

// It may take multiple tries to reach a state where all rewrites succeed.
loop {
func_reports.clear();
@@ -1187,7 +1268,7 @@ fn run(tcx: TyCtxt) {
}

for &ldid in &all_fn_ldids {
if gacx.fn_failed(ldid.to_def_id()) {
if gacx.fn_skip_rewrite(ldid.to_def_id()) {
continue;
}

@@ -1249,7 +1330,12 @@ fn run(tcx: TyCtxt) {
let mut any_failed = false;
for def_id in shim_fn_def_ids {
let r = panic_detail::catch_unwind(AssertUnwindSafe(|| {
all_rewrites.push(rewrite::gen_shim_definition_rewrite(&gacx, &gasn, def_id));
all_rewrites.push(rewrite::gen_shim_definition_rewrite(
&gacx,
&gasn,
def_id,
manual_shim_casts,
));
}));
match r {
Ok(()) => {}
@@ -1421,7 +1507,13 @@ fn run(tcx: TyCtxt) {
// ----------------------------------

// Apply rewrite to all functions at once.
rewrite::apply_rewrites(tcx, all_rewrites);
let mut update_files = rewrite::UpdateFiles::No;
if let Ok(val) = env::var("C2RUST_ANALYZE_REWRITE_IN_PLACE") {
if val == "1" {
update_files = rewrite::UpdateFiles::Yes;
}
}
rewrite::apply_rewrites(tcx, all_rewrites, update_files);
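
The environment-variable handling above enables a feature only when the variable is set to exactly `1`; any other value, or an unset variable, leaves the default in place. A minimal sketch of that rule, assuming a local stand-in enum and a hypothetical helper `update_files_from` (neither is the PR's actual `rewrite::UpdateFiles` definition):

```rust
/// Local stand-in for the two-state option used in the diff.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum UpdateFiles {
    No,
    Yes,
}

/// Hypothetical helper: map the result of an environment lookup to the option.
/// Only the exact string "1" turns the feature on.
pub fn update_files_from(val: Result<String, std::env::VarError>) -> UpdateFiles {
    match val.as_deref() {
        Ok("1") => UpdateFiles::Yes,
        _ => UpdateFiles::No,
    }
}
```

A caller would pass `std::env::var("C2RUST_ANALYZE_REWRITE_IN_PLACE")` directly. Taking the lookup result as a parameter keeps the rule testable without mutating the process environment.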

// ----------------------------------
// Report caught panics
26 changes: 25 additions & 1 deletion c2rust-analyze/src/context.rs
@@ -329,6 +329,9 @@ pub struct GlobalAnalysisCtxt<'tcx> {
/// `DefId`s of functions where analysis failed, and a [`PanicDetail`] explaining the reason
/// for each failure.
pub fns_failed: HashMap<DefId, PanicDetail>,
/// `DefId`s of functions that were marked "fixed" (non-rewritable) through command-line
/// arguments.
pub fns_fixed: HashSet<DefId>,

pub field_ltys: HashMap<DefId, LTy<'tcx>>,

@@ -698,6 +701,7 @@ impl<'tcx> GlobalAnalysisCtxt<'tcx> {
.map(|known_fn| (known_fn.name, known_fn))
.collect(),
fns_failed: HashMap::new(),
fns_fixed: HashSet::new(),
field_ltys: HashMap::new(),
static_tys: HashMap::new(),
addr_of_static: HashMap::new(),
@@ -759,6 +763,7 @@ impl<'tcx> GlobalAnalysisCtxt<'tcx> {
ref mut fn_sigs,
known_fns: _,
fns_failed: _,
fns_fixed: _,
ref mut field_ltys,
ref mut static_tys,
ref mut addr_of_static,
@@ -815,7 +820,7 @@ impl<'tcx> GlobalAnalysisCtxt<'tcx> {
}
}

pub fn fn_failed(&mut self, did: DefId) -> bool {
pub fn fn_failed(&self, did: DefId) -> bool {
self.fns_failed.contains_key(&did)
}

@@ -831,6 +836,25 @@ impl<'tcx> GlobalAnalysisCtxt<'tcx> {
self.fns_failed.keys().copied()
}

pub fn fn_skip_rewrite(&self, did: DefId) -> bool {
self.fn_failed(did) || self.fns_fixed.contains(&did)
}

/// Iterate over the `DefId`s of all functions that should skip rewriting.
pub fn iter_fns_skip_rewrite<'a>(&'a self) -> impl Iterator<Item = DefId> + 'a {
// This let binding avoids a lifetime error with the closure and return-position `impl
// Trait`.
let fns_fixed = &self.fns_fixed;
// If the same `DefId` is in both `fns_failed` and `fns_fixed`, be sure to return it only
// once.
fns_fixed.iter().copied().chain(
self.fns_failed
.keys()
.copied()
.filter(move |did| !fns_fixed.contains(&did)),
)
}
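
The deduplicating iteration in `iter_fns_skip_rewrite` can be tried in isolation. The sketch below assumes `u32` ids in place of `DefId` and `String` in place of `PanicDetail`; the free function `iter_skip` is a hypothetical stand-in for the method.

```rust
use std::collections::{HashMap, HashSet};

/// Yields every id in `fixed`, then the keys of `failed` that are not also in
/// `fixed`, so an id present in both collections is produced exactly once.
pub fn iter_skip<'a>(
    fixed: &'a HashSet<u32>,
    failed: &'a HashMap<u32, String>,
) -> impl Iterator<Item = u32> + 'a {
    fixed
        .iter()
        .copied()
        .chain(failed.keys().copied().filter(move |id| !fixed.contains(id)))
}
```

Chaining with a filter avoids allocating a merged set; the cost is one hash lookup per failed function, and the iteration order is unspecified.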

pub fn known_fn(&self, def_id: DefId) -> Option<&'static KnownFn> {
let symbol = self.tcx.symbol_name(Instance::mono(self.tcx, def_id));
self.known_fns.get(symbol.name).copied()
48 changes: 47 additions & 1 deletion c2rust-analyze/src/main.rs
@@ -37,7 +37,7 @@ use analyze::AnalysisCallbacks;
use anyhow::anyhow;
use anyhow::ensure;
use anyhow::Context;
use clap::Parser;
use clap::{ArgAction, Parser};
use rustc_driver::RunCompiler;
use rustc_driver::TimePassesCallbacks;
use rustc_session::config::CrateType;
@@ -73,6 +73,30 @@ struct Args {
#[clap(long)]
rustflags: Option<OsString>,

/// Comma-separated list of paths to rewrite. Any item whose path does not start with a prefix
/// from this list will be marked non-rewritable (`FIXED`).
#[clap(long, action(ArgAction::Append))]
rewrite_paths: Vec<OsString>,
/// Rewrite source files on disk. The default is to print the rewritten source code to stdout
/// as part of the tool's debug output.
#[clap(long)]
rewrite_in_place: bool,
/// Use `todo!()` placeholders in shims for casts that must be implemented manually.
///
/// When a function requires a shim, and the shim requires a cast that can't be generated
/// automatically, the default is to cancel rewriting of the function. With this option,
/// rewriting proceeds as normal, and shim generation emits `todo!()` in place of each
/// unsupported cast.
#[clap(long)]
use_manual_shims: bool,

/// Read a list of defs that should be marked non-rewritable (`FIXED`) from this file path.
/// Run `c2rust-analyze` without this option and check the debug output for a full list of defs
/// in the crate being analyzed; the file passed to this option should list a subset of those
/// defs.
#[clap(long)]
fixed_defs_list: Option<PathBuf>,

/// `cargo` args.
cargo_args: Vec<OsString>,
}
@@ -327,6 +351,10 @@ where
fn cargo_wrapper(rustc_wrapper: &Path) -> anyhow::Result<()> {
let Args {
rustflags,
rewrite_paths,
rewrite_in_place,
use_manual_shims,
fixed_defs_list,
cargo_args,
} = Args::parse();

@@ -362,6 +390,24 @@ fn cargo_wrapper(rustc_wrapper: &Path) -> anyhow::Result<()> {
.env(RUSTC_WRAPPER_VAR, rustc_wrapper)
.env(RUST_SYSROOT_VAR, &sysroot)
.env("RUSTFLAGS", &rustflags);

if let Some(ref fixed_defs_list) = fixed_defs_list {
cmd.env("C2RUST_ANALYZE_FIXED_DEFS_LIST", fixed_defs_list);
}

if rewrite_paths.len() > 0 {
let rewrite_paths = rewrite_paths.join(OsStr::new(","));
cmd.env("C2RUST_ANALYZE_REWRITE_PATHS", rewrite_paths);
}

if rewrite_in_place {
cmd.env("C2RUST_ANALYZE_REWRITE_IN_PLACE", "1");
}

if use_manual_shims {
cmd.env("C2RUST_ANALYZE_USE_MANUAL_SHIMS", "1");
}

Ok(())
})?;
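
Because `--rewrite-paths` is collected into a `Vec<OsString>`, several occurrences of the flag are merged into one comma-separated value before being forwarded through the environment. A small sketch of that step, using a hypothetical helper `join_rewrite_paths` that is not part of the PR:

```rust
use std::ffi::{OsStr, OsString};

/// Join repeated `--rewrite-paths` values into one comma-separated value.
/// Returns `None` when the option was not given, in which case the
/// environment variable would be left unset.
pub fn join_rewrite_paths(values: &[OsString]) -> Option<OsString> {
    if values.is_empty() {
        None
    } else {
        Some(values.join(OsStr::new(",")))
    }
}
```

Each individual value may itself contain commas, so passing `foo` and `bar::baz,qux` as two separate values produces the same forwarded string as passing `foo,bar::baz,qux` once.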
