There’s a way to do this in Auto1111 (sort of):
This feels pretty janky, though. I think you could do it better (and in one shot) in ComfyUI by processing the partially generated latent, feeding that result to a ControlNet preprocessor node, then adding the resulting ControlNet conditioning plus the original half-finished latent to a new KSampler node. You’d then finish generation (continuing from the original latent) at whatever step you split off.
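If it helps to picture the wiring, here’s a rough sketch of just the two sampler nodes in ComfyUI’s API/JSON format, written as Python dicts. The node IDs and upstream references are placeholders, and the field names are from memory of the KSamplerAdvanced node, so treat this as a wiring diagram rather than a drop-in workflow:

```python
split_step = 12    # step where we branch off
total_steps = 30

# First pass: sample from step 0 to split_step and hand off a half-finished latent.
# (This would be node "3" in the prompt dict; "4"/"5"/"6"/"7" are placeholder IDs
# for the checkpoint loader, empty latent, and the positive/negative prompts.)
first_pass = {
    "class_type": "KSamplerAdvanced",
    "inputs": {
        "model": ["4", 0],
        "positive": ["6", 0],
        "negative": ["7", 0],
        "latent_image": ["5", 0],
        "add_noise": "enable",
        "noise_seed": 1234,
        "steps": total_steps,
        "cfg": 7.0,
        "sampler_name": "euler",
        "scheduler": "normal",
        "start_at_step": 0,
        "end_at_step": split_step,
        "return_with_leftover_noise": "enable",   # keep the latent "unfinished"
    },
}

# In between: VAEDecode the partial latent, run it through a ControlNet
# preprocessor (e.g. Canny), then ControlNetApply on the positive conditioning.

# Second pass: continue from the first pass's latent with the new conditioning.
second_pass = {
    "class_type": "KSamplerAdvanced",
    "inputs": {
        "model": ["4", 0],
        "positive": ["12", 0],        # ControlNetApply output (placeholder ID)
        "negative": ["7", 0],
        "latent_image": ["3", 0],     # the first pass's LATENT output
        "add_noise": "disable",       # the leftover noise is already in the latent
        "noise_seed": 1234,
        "steps": total_steps,
        "cfg": 7.0,
        "sampler_name": "euler",
        "scheduler": "normal",
        "start_at_step": split_step,
        "end_at_step": total_steps,
        "return_with_leftover_noise": "disable",
    },
}
```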
Agreed on the Auto1111 UI; I like the idea of ComfyUI but making quick changes + testing rapidly feels like a pain. I always feel like I must be doing something wrong. I do appreciate how easy it is to replicate a workflow, though.
What are you running SDXL in? I tried it in ComfyUI yesterday and it seems really powerful, but it always seems to take a long time to iterate on images. I haven’t tried it in SD.Next or Auto1111 yet.
LOL. I didn’t immediately get this one. Well done.
Thanks for reporting on that! It’s honestly rare to hear from anyone using one, so real-world info is sparse haha. I was seriously considering an RX 7900-series card, but skipped it after reading a few scattered experiences like yours. Maybe someday I’ll switch to Linux haha.
This is what I’m aware of for ROCm: AMD: Partial RDNA 3 Video Card Support Coming to Future ROCm Releases. TL;DR is that there’s still no clear commitment or date, and consumer GPU support is pretty weak.
There’s DirectML, which is what SD.Next (Vlad Diffusion) and some others use on Windows. I think it works OK but can be slow, and from perusing the issue trackers it seems to have a lot of bugs and limited support (though I could be wrong there). I haven’t tried it myself, so others may know better. For perspective, I analyzed the public Vladmandic SD benchmark data and saw zero 7900 XT(X) results on Windows. It seems like almost nobody runs Windows + AMD.
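The check itself was nothing fancy; roughly something like this against a CSV export of the benchmark data (the column names here are guesses, not the actual schema):

```python
import pandas as pd

# Hypothetical export and column names ("device", "platform") -- the real schema may differ.
df = pd.read_csv("sd_benchmark_data.csv")

mask = (
    df["device"].str.contains("7900", case=False, na=False)
    & df["platform"].str.contains("windows", case=False, na=False)
)
print(len(df[mask]))  # was 0 when I looked
```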
Is anyone running SD on AMD GPUs in Windows? The AMD benches all seem to be from Linux, presumably because of ROCm, but I’d be curious how much performance is lost by using DirectML on, say, a 7900 XT in Windows.
It kind of lends it a “steam on water” vibe, which works pretty well for me.
I don’t recognize that fork of it - what are the differences there? I’ve been using vladmandic’s fork for a while and found it quite good. I still have the original kicking around as well, but don’t use it much.
This is great haha
Yeah, I glanced over it and couldn’t immediately see why the laptop one was benchmarking faster. There were only 7 samples or something for the laptop one, though, so it could just be a fluke. Maybe the laptop folks are using only the best optimization or something. I’ll keep playing with it when I get some spare time.
I’m a Windows caveman over here, but you should definitely post your Linux findings on here when you’re ready! Also, if you have suggestions for how to slice this, I’d be happy to take a stab at it. This was a quick thing I did while ogling a GPU upgrade, so it’s not my best work haha.
I’ve always loved that external GPUs exist, even though I’ve never been in a situation where they were a realistic choice for me.
Yeah, the data is definitely not perfect. If I get a chance, I’ll poke around and see if maybe it’s one person throwing off the results. Maybe next time I’ll toss “n=##” or something on top of the bars to show just how many samples exist for each card. I also eventually want to filter by optimization, etc. for the next visualization, though I’m not sure of the best way to do that other than maybe just taking the best result for each card.
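For the “n=##” idea, something like this would probably do it (the numbers below are placeholders, not real benchmark results):

```python
import matplotlib.pyplot as plt

# Placeholder (median it/s, sample count) per card -- not real data.
cards = {
    "Card A": (30.0, 120),
    "Card B": (16.0, 85),
    "Card C": (14.0, 7),
}

names = list(cards)
medians = [cards[c][0] for c in names]
counts = [cards[c][1] for c in names]

fig, ax = plt.subplots()
bars = ax.bar(names, medians)
ax.set_ylabel("Median it/s")

# Stick "n=##" just above each bar so small sample sizes are obvious.
for bar, n in zip(bars, counts):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
            f"n={n}", ha="center", va="bottom")

plt.tight_layout()
plt.show()
```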
The link in the header post includes the methodology used to gather the data, but I don’t think it fully answers your first question. I imagine there are a few important variables and a few ways to get variation in results for otherwise identical hardware (e.g. running odd versions of things or having the wrong/no optimizations selected). I tried to mitigate that by using medians and only taking cards with 5+ samples, but it’s certainly not perfect. At least things seem to trend as you’d expect.
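Concretely, the aggregation is basically just “median per card, keeping cards with 5+ samples,” along these lines (again, column names are guesses at the export’s schema rather than the real field names):

```python
import pandas as pd

# Hypothetical CSV export of the benchmark data.
df = pd.read_csv("sd_benchmark_data.csv")

# Median it/s per card, dropping cards with fewer than 5 submissions.
per_card = (
    df.groupby("device")["performance"]
      .agg(median_its="median", n="count")
      .query("n >= 5")
      .sort_values("median_its", ascending=False)
)
print(per_card)
```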
I’m really glad vladmandic made the extension & data available. It was super tough to find a graph like this that was remotely up-to-date. Maybe I’ll try filtering by some other things in the near future, like optimization method, benchmark age (e.g. eliminating stuff prior to 2023), or VRAM amount.
For your last question, I’m not sure the host bus configuration is recorded – you can see everything that’s in a benchmark entry by scrolling through the database, and I don’t see it. I suspect PCIe config does matter for a card on a given system, but its impact is likely smaller than the choice of GPU itself. I’d definitely be curious to see how it breaks down, though, as my MoBo doesn’t support PCIe 4.0, for example.
Happy to help!
Note: to view full-res, use right-click --> open image in new tab.
Oxidation (and other processes) does affect coffee flavor, and grinding increases surface area and exposure to oxygen, which speeds that up. Putting coffee in the fridge also seems to worsen flavor, but the freezer seems to be pretty reliable. Here’s a nice video discussing this by a weird coffee person (James Hoffmann): Should you freeze coffee beans?
Also, KGLW, nice!