
BUG in Transformer2D #48

@lordsoffallen

Description


Hi there,

First of all, thanks for putting out the code, great work! I noticed one thing in the code.

In the transformer_2d code, the forward method is updated to add prompt/width/height information:

def forward(
    self,
    hidden_states: torch.Tensor,
    encoder_hidden_states: Optional[torch.Tensor] = None,
    timestep: Optional[torch.LongTensor] = None,
    class_labels: Optional[torch.LongTensor] = None,
    cross_attention_kwargs: Dict[str, Any] = None,
    attention_mask: Optional[torch.Tensor] = None,
    encoder_attention_mask: Optional[torch.Tensor] = None,
    return_dict: bool = True,
    prompt_book_info: list = None,
    layout_mask=None,
    height=None,
    width=None,
):

These parameters are then passed through to the following attention-block call:

hidden_states, cross_attn_prob = block(
    hidden_states,
    attention_mask=attention_mask,
    encoder_hidden_states=encoder_hidden_states,
    encoder_attention_mask=encoder_attention_mask,
    timestep=timestep,
    cross_attention_kwargs=cross_attention_kwargs,
    class_labels=class_labels,
    prompt_book_info=prompt_book_info,
    layout_mask=layout_mask,
    height=height,
    width=width,
)

However, before that call is reached, the following code gets executed:

if self.is_input_continuous:
    batch, _, height, width = hidden_states.shape
    residual = hidden_states

    hidden_states = self.norm(hidden_states)
    if not self.use_linear_projection:
        hidden_states = self.proj_in(hidden_states, lora_scale)
        inner_dim = hidden_states.shape[1]
        hidden_states = hidden_states.permute(0, 2, 3, 1).reshape(batch, height * width, inner_dim)
    else:
        inner_dim = hidden_states.shape[1]
        hidden_states = hidden_states.permute(0, 2, 3, 1).reshape(batch, height * width, inner_dim)
        hidden_states = self.proj_in(hidden_states, scale=lora_scale)
elif self.is_input_vectorized:
    hidden_states = self.latent_image_embedding(hidden_states)
elif self.is_input_patches:
    hidden_states = self.pos_embed(hidden_states)

Since we pass continuous input to this code, the height and width arguments are overwritten by the tensor's spatial dimensions before the block call above. I was wondering if this is a bug? If width and height are meant to be inferred from the tensor dimensions, perhaps we don't need to pass them here at all?
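For illustration, here is a minimal sketch of how the shadowing could be avoided, assuming the intent is to keep the caller-supplied height/width for the block call and use the latent's own spatial size only for the reshape. This is just my sketch, not the repository's actual fix, and latent_height/latent_width are names I made up:

if self.is_input_continuous:
    # Unpack into distinct local names so the forward() arguments
    # `height` and `width` are not overwritten.
    batch, _, latent_height, latent_width = hidden_states.shape
    residual = hidden_states

    hidden_states = self.norm(hidden_states)
    if not self.use_linear_projection:
        hidden_states = self.proj_in(hidden_states, lora_scale)
        inner_dim = hidden_states.shape[1]
        # Reshape with the latent's own spatial size, leaving the
        # caller-supplied height/width untouched for the block call.
        hidden_states = hidden_states.permute(0, 2, 3, 1).reshape(batch, latent_height * latent_width, inner_dim)

Alternatively, if the values are always meant to come from the tensor shape, the height and width arguments could simply be dropped from the signature and the block call.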
