Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using variable size IME events #1628

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Using variable size IME events #1628

wants to merge 2 commits into from

Conversation

jimon
Copy link
Contributor

@jimon jimon commented Jan 11, 2023

Description

Continuation of work from #1120

Unity historically was using fixed size IME event type with 64 characters, which is not sufficient in some cases, like complex emojis.
Some years ago a new IME event type was added that instead contains a variable size array of characters that solves the problem above.

Native code reports both events at a same time on all platforms where IME is implemented.

Changes made

  • Adding new event type.
  • Using it for IME composition events.
  • Modifying IMECompositionString to be a small string optimization container, where below 64 characters it will not allocate, but above will do a managed allocation.
  • Slightly breaking the API as IMECompositionString cannot be used in the old event anymore due to adding a new field there.

@unity-cla-assistant
Copy link

unity-cla-assistant commented Jan 11, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ Dmytro Ivanov
❌ StayTalm


Dmytro Ivanov seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Collaborator

@lyndon-unity lyndon-unity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks sensible but a couple of queries changes before I think I can approve.


public FourCC typeStatic => Type;

public static IMECompositionEvent Create(int deviceId, string compositionString, double time)
{
var inputEvent = new IMECompositionEvent();
inputEvent.baseEvent = new InputEvent(Type, InputEvent.kBaseEventSize + sizeof(int) + (sizeof(char) * kIMECharBufferSize), deviceId, time);
inputEvent.compositionString = new IMECompositionString(compositionString);
inputEvent.length = compositionString.Length > kIMECharBufferSize ? kIMECharBufferSize : compositionString.Length;
fixed(char* dst = compositionString)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we have both compositionString member and parameter to function isn't one of the src or dst pointing to the wrong one. I think we need dst = buffer

ev->baseEvent = new InputEvent(Type, sizeInBytes, deviceId, time);
ev->length = str.Length;

if (str.Length > 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need braces around the next 2 lines ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need braces the same way as:

if(...)
  if(...)
    doSomething();

doesn't need braces either.

But maybe I can add them to make code formatter happy.


public char this[int index]
{
get
{
if (m_ManagedString != null)
return m_ManagedString[index];

if (index >= Count || index < 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this go before the m_ManagedString line above for safety?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter that much, in this case [ ] operator of the string will do the check anyway.

{
Debug.Assert(characters != null);
fixed(char* dst = m_FixedBuffer)
UnsafeUtility.MemCpy(dst, characters, m_Size * sizeof(char));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering here if this is a call to native code and if spinning a loop and staying in managed is faster.


InputSystem.QueueEvent(new InputEventPtr((InputEvent*)ev));

eventBuffer.Dispose();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a bit of digging to understand InputSystem.QueueEvent it seems it makes a copy of the input event and therefore this dispose is valid. No feedback here. Just noting for my own reminder.


public FourCC typeStatic => Type;

public static IMECompositionEvent Create(int deviceId, string compositionString, double time)
{
var inputEvent = new IMECompositionEvent();
inputEvent.baseEvent = new InputEvent(Type, InputEvent.kBaseEventSize + sizeof(int) + (sizeof(char) * kIMECharBufferSize), deviceId, time);
inputEvent.compositionString = new IMECompositionString(compositionString);
inputEvent.length = compositionString.Length > kIMECharBufferSize ? kIMECharBufferSize : compositionString.Length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't immediately clear to me that this was a clamp. Thought it was a bug at first.

inputEvent.length = Math.Min(compositionString.Length, kIMECharBufferSize);

might be clearer.

{
size = 0;
return;
m_ManagedString = new string(characters, 0, length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a string like this, with a pointer to memory that we don't own, seems like a bad ideaTM. In the happy path through InputManager, that memory points to the native memory for the event buffer, which is going to be overwritten in the next frame. If the user keeps hold of the IMECompositionString outside the event handler, that's probably a crash. So I guess we need to make a copy here. Maybe use native memory, implement IDisposable on the struct, and add leak detection in the editor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that constructor would make a copy of the buffer, but I could be wrong.

public const int Type = 0x494D4553;

[FieldOffset(0)]
public InputEvent baseEvent;

[FieldOffset(InputEvent.kBaseEventSize)]
public IMECompositionString compositionString;
private int length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if instead of breaking the API here, we leave IMECompositionString with the same data layout as before, but when the string is greater than kIMECharBufferSize characters long, we (ab)use the first 8 bytes of the fixed buffer to point at a string in some centrally managed storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need some versioning then, which is not optimal. I would really prefer to use a normal managed string for such case and not reinvent C++ in C#. Question is more how not to break API here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, using a normal string is the "right" way, but we can't do that and not break the API, so I see the choice as delay this PR until version 2 or make a hack like this that would keep the original data layout. I'm not sure what you mean by versioning? I don't see where we'd need something like that.

Copy link
Contributor Author

@jimon jimon Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my undertraining a managed referenced can't be aliased with unmanaged memory, otherwise GC can't collect it.
So because IMECompositionString is not IDisposable, we can't have a finalizer on it, meaning it would need to point to some other place that stores a string.

In that case lifetime of place storing a string and potential usages of it are uncorrelated, meaning someone could store a copy of IMECompositionString for a long time, and then query it's content.

For such cases a standard solution in C++ is to store a pointer + version of the data, and if handle has different version than a central location, then data is lost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards IMECompositionString2 and calling it a day, we can clean up when mystical version 2 will come if ever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMECompositionString2 would be problematic because of Keyboard. I don't think we can do that, unless you'd also have Keyboard.onIMECompositionChangeV2.

Yep, I'm saying let it point to some other place. Actually it wouldn't even have to be a pointer, just an integer lookup into some table of strings. And there's no real issue of versioning. Access would be strictly through the IMECompositionString, and lets just make it IDisposable (we need that anyway because of the issue I mentioned above about building a string from a pointer. Don't think that makes a copy.). You might need to do a swap back or something on disposal and update the index in the affected composition string, but that should be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants