RFC: `c"…"` string literals #3348

m-ou-se · 2022-11-15T13:12:10Z

m-ou-se · 2022-11-15T13:23:08Z

Three weeks ago, the lang team said they would be interested in potentially doing this in the future. So here's an RFC. :)

clarfonthey · 2022-11-15T13:43:40Z

I'm on board. I'd even consider that a future extension might be to allow os"..." string literals, but that seems probably more iffy since it'd be the first case of a language item not being available in no_std environments. (I think?)

One other potential thing to think about is whether c"..." string patterns should be allowed. That is, completely outside the realm of constant patterns, whether c"..." would be considered a valid pattern for macros, etc.

BurntSushi · 2022-11-15T13:53:34Z

text/3348-c-str-literal.md

+
+Accepted escape codes: [Quote](https://doc.rust-lang.org/reference/tokens.html#quote-escapes) & [Unicode](https://doc.rust-lang.org/reference/tokens.html#unicode-escapes) & [Byte](https://doc.rust-lang.org/reference/tokens.html#byte-escapes).
+
+Unicode characters are accepted and encoded as UTF-8. That is, `c"🦀"`, `c"\u{1F980}"` and `c"\xf0\x9f\xa6\x80"` are all accepted and equivalent.


I wish byte string literals had this support too, so big 👍 on this!

It might be worth proposing that in a separate RFC. That would also resolve one unresolved question of concat_bytes, if we accept that mixing UTF-8 and non-UTF-8 in byte strings is okay.

Wrote an RFC for that: #3349
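
The equivalence stated in the quoted RFC text can be checked directly. The following is a minimal sketch, assuming a toolchain on which `c"…"` literals are available (Rust 1.77 or later, edition 2021 or later); the function names are illustrative and not part of the RFC.

```rust
use std::ffi::CStr;

/// The three spellings from the RFC text (function name is illustrative).
pub fn crab_literals() -> [&'static CStr; 3] {
    [
        c"🦀",               // literal Unicode character, stored as UTF-8
        c"\u{1F980}",        // Unicode escape for the same code point
        c"\xf0\x9f\xa6\x80", // the same four UTF-8 bytes as byte escapes
    ]
}

/// True when all three literals hold identical bytes.
pub fn all_equivalent() -> bool {
    let [a, b, c] = crab_literals();
    a == b && b == c
}
```

Each literal holds the four UTF-8 bytes of the crab emoji followed by the NUL terminator that the compiler appends.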

m-ou-se · 2022-11-15T14:03:59Z

> I'd even consider that a future extension might be to allow os"..." string literals

I was hoping to make things like os!"..." possible without extending the language for each prefix: #3267. But that proposal turned out to be quite controversial and was rejected.

An alternative would be to allow literals like "…" to implicitly convert to more than just &str (just like how 123 can be u32 or i64, etc. etc.). Some kind of const FromLiteral trait or something, once we have const traits. Then "…" could implicitly become a &CStr, and 123 a BigNum, etc. Not sure how exactly that feature would work though, but I'll mention it in the alternatives section.

afetisov · 2022-11-15T14:21:31Z

One concern I have is that if single-letter prefixes become common, extending the language with new prefixes can become confusing. Although, if br and cr are treated as fixed literals rather than composition, this may be a non-issue.

nagisa · 2022-11-15T17:25:44Z

I have two rhetorical questions with regard to the RFC text:

- What does the dependence of this feature on standard library types mean for #[no_core] crates? Would it be possible to do something that would let #[no_core] crates using c"" literals still work out of the box?
- What does defaulting to UTF-8 encoding mean when interacting with C source that targets non-UTF-8 locales (let's say the linked-in C code is encoded in JIS, and the environment is also set up for JIS)? How does that interact with whatever reasonable assumptions a developer might make about c""?

m-ou-se · 2022-11-15T17:56:12Z

> What does the dependence of this feature on standard library types mean for #[no_core] crates? Would it be possible to do something that would let #[no_core] crates using c"" literals still work out of the box?

Do we even support no_core? I suppose it just means that they'd have to define the CStr lang item if they want to use c"" syntax. I think we could make not the type but a constructor function the lang item, such that they can decide themselves what to do with the [u8; N]. (In core, that'd basically be CStr::from_bytes_with_nul_unchecked.)
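
As a rough illustration of the constructor idea, here is a sketch of what a `c"hello"` literal could amount to if the compiler handed the NUL-terminated bytes to `CStr::from_bytes_with_nul_unchecked`. The constant and function names are hypothetical, and the actual lowering in the compiler may differ.

```rust
use std::ffi::CStr;

// Hypothetical stand-in for the NUL-terminated byte array that the
// compiler would produce for the literal `c"hello"`.
const HELLO_BYTES: &[u8; 6] = b"hello\0";

/// Unchecked construction in a const context. This is sound only because
/// the bytes end in exactly one NUL and contain no interior NUL.
pub const HELLO: &CStr = unsafe { CStr::from_bytes_with_nul_unchecked(HELLO_BYTES) };

/// The checked constructor yields the same value and would reject
/// malformed input instead of relying on the caller's promise.
pub fn hello_checked() -> &'static CStr {
    CStr::from_bytes_with_nul(HELLO_BYTES).expect("bytes are NUL-terminated")
}
```

With a literal, the compiler can validate the bytes at compile time, which is why the unchecked constructor would be acceptable there.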

> What does defaulting to UTF-8 encoding mean when interacting with C source that targets non-UTF-8 locales (let's say the linked-in C code is encoded in JIS, and the environment is also set up for JIS)? How does that interact with whatever reasonable assumptions a developer might make about c""?

The exact same as would happen when using regular string literals. For example, libc::puts("我名字叫玛拉。".as_ptr() as _) is already possible. It'll just pass the string as UTF-8 encoded bytes. 🤷‍♀️
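
To make the encoding point concrete, the sketch below shows that the bytes handed to C are the UTF-8 encoding of the Rust string, whatever locale the C side expects. Note that a plain string literal carries no trailing NUL, so this sketch adds one through `CString::new`; the helper name `to_c_string` is illustrative.

```rust
use std::ffi::CString;

/// Copies `s` into an owned, NUL-terminated C string.
/// Panics if `s` contains an interior NUL byte.
pub fn to_c_string(s: &str) -> CString {
    CString::new(s).expect("input must not contain interior NUL bytes")
}
```

The resulting `CString` holds exactly `s.as_bytes()` plus one terminating NUL. No transcoding to the C locale takes place.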

joshtriplett · 2022-11-15T19:45:17Z

text/3348-c-str-literal.md

+
+- Also add `c'…'` C character literals? (`u8`, `i8`, `c_char`, or something more flexible?)
+
+- Should we make `&CStr` a thin pointer before stabilizing this? (If so, how?)


I think this should be a blocker on stabilization, yeah.

I don't see how this feature is blocked by that at all really. It produces an &'static CStr regardless of what &CStr itself is made of.

@Kixiron To be clear, I think considering that question should be a blocker for stabilization.

Given that a major use case of this will be FFI, it seems important that we have a simple, not-error-prone way of passing a C string to C functions. If we decide that &CStr wasn't that mechanism, then we should decide what that mechanism should be, and make sure c"..." works well with that.
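
For context on the thin-pointer question: at the time of writing, `&CStr` is a fat pointer (address plus length), while C functions expect a thin `*const c_char`, obtained with `CStr::as_ptr`. A minimal sketch of the difference follows; the function names are illustrative, and the reported sizes reflect the current layout of `CStr`, which this unresolved question could change.

```rust
use std::ffi::{c_char, CStr};
use std::mem::size_of;

/// Returns (size of `&CStr`, size of `*const c_char`) in bytes.
pub fn pointer_sizes() -> (usize, usize) {
    (size_of::<&CStr>(), size_of::<*const c_char>())
}

/// The thin pointer that would be handed to a C function.
/// It is valid only for as long as `s` is borrowed.
pub fn as_c_pointer(s: &CStr) -> *const c_char {
    s.as_ptr()
}
```

Since a `c"…"` literal yields a `&'static CStr`, the pointer returned by `as_ptr` on it stays valid for the whole program.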

rfcbot · 2022-11-29T19:25:20Z

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot · 2022-12-09T19:36:42Z

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

CAD97 · 2022-12-13T09:08:07Z

Just a minor additional note: I want to second that even if c"…" and bc"…" both create &CStr, having the former carry a guarantee of well-formed UTF-8 is beneficial to readers of the code, who then know that the former is UTF-8 encoded (and that the latter is probably intended to contain non-UTF-8 data). This includes procedural macros, which can see the prefix used and rely on the UTF-8 guarantee for third-party guaranteed-UTF-8 CStr variants such as cstr8::CStr8 (disclaimer: my own crate) and for interop with C++ std::u8string/std::u8string_view.

Just having c"…" be &CStr and allow arbitrary nonnul bytes is probably the more practical choice. The proc macro which would've used the guaranteed-UTF-8 can just as easily take a normal string literal and convert it to a c"…" literal internally like it would today (but benefitting from the automatic interior-nul checking).

(Polymorphic string literals is probably the ideal long-term position, but having c", c8", c16", u", u8", u16", u32", char" (etc. or w/e) prefixes to explicitly disambiguate which string type from whatever string types this theoretical future std provides is still reasonable and a good idea. (Super explicit: not proposing any of these at this time.))

However, as a data point, the windows crate provides c!("…") as just concat!("…", "\0").as_ptr(), and despite the lack of interior-nul checking, the guaranteed-UTF-8 is useful. (They also currently provide w! for the same thing but for UTF-16, and h! for HSTRING.) Asking the team working on the windows crate how they'd ideally like to utilize c"…" is probably worth doing sometime before stabilization. (Not to prioritize windows over Linux or macOS; it's just what I'm familiar with. It's probably worth asking the Rust-for-Linux and Android people for their input as well.)
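
The trade-off described above can be sketched with a hypothetical macro in the same spirit (this is not the windows crate's actual code): appending `"\0"` with `concat!` keeps the UTF-8 guarantee of the input literal, but nothing rejects an interior NUL. The names `nul_terminated!` and `is_valid_c_str` are illustrative.

```rust
use std::ffi::CStr;

/// Hypothetical macro: appends a NUL terminator to a string literal.
/// The result is still a `&'static str`, so it is guaranteed UTF-8.
macro_rules! nul_terminated {
    ($s:literal) => {
        concat!($s, "\0")
    };
}

pub fn terminated_hello() -> &'static str {
    nul_terminated!("hello")
}

/// The macro accepts an interior NUL without complaint.
pub fn with_interior_nul() -> &'static str {
    nul_terminated!("he\0llo")
}

/// Checks what the macro does not: exactly one NUL, at the end.
pub fn is_valid_c_str(s: &str) -> bool {
    CStr::from_bytes_with_nul(s.as_bytes()).is_ok()
}
```

Here `is_valid_c_str(terminated_hello())` holds, while `is_valid_c_str(with_interior_nul())` does not. C code reading the second string would stop at the first NUL.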

tmandry · 2022-12-14T23:23:01Z

Huzzah! The @rust-lang/lang team has decided to accept this RFC.

To track further discussion, subscribe to the tracking issue here:
rust-lang/rust#105723

Implement RFC 3348, `c"foo"` literals RFC: rust-lang/rfcs#3348 Tracking issue: #105723

rust-lang/rfcs#3348 rust-lang/rust#105723 rust-lang/rust#117472

m-ou-se added the T-lang and A-syntax labels on Nov 15, 2022

Add c_str_literal rfc.

9fdd8f1

m-ou-se force-pushed the c-str-literal branch from 6323fbc to 9fdd8f1 on November 15, 2022 13:13

This was referenced Nov 15, 2022

Add cstr! macro for creating &'static CStrs. rust-lang/libs-team#103

Closed

Add cstr! macro. rust-lang/rust#101607

Closed

m-ou-se added the I-lang-nominated label on Nov 15, 2022

afetisov reviewed Nov 15, 2022

clarfonthey reviewed Nov 15, 2022
