It would be nice to be able to bring to light the price gouging that is taking place in Canada with regards to grocery stores.

  • Daniel Quinn@lemmy.ca · 1 year ago

    The project is Open source, so you might be able to leverage it for Canadian data. All you need is:

    • Understanding of the expected format for the project
    • Access to data from Canadian retailers. This can be acquired via APIs (these are usually free) or by scraping their sites.
    • JohnnyCanuck@lemmy.ca · 1 year ago

      At the bottom of the thread on Mastodon, the creator says they use the search APIs of the store websites. I wouldn’t have expected those to be easily accessible!

      • Daniel Quinn@lemmy.ca · 1 year ago

        Yeah a lot of chains even have a documented, developer-friendly API. If that’s not available though, you can usually figure out the API just by looking at the calls your browser makes when visiting a page. Most sites use a REST API for catalog pages that’s then rendered out with JavaScript.

        If that doesn’t work, then you can usually scrape everything with Selenium. It’s a little harder to do, but still quite manageable, though that usually has to be a background job, as it’s slow.
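        As a concrete sketch of that first approach: once you've spotted the endpoint in the browser's network tab, querying it from Python can be as simple as the following. Note that the URL, parameter names, and response shape here are all invented placeholders; substitute whatever the real site actually uses.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: the path and parameter names are assumptions.
# The real ones come from watching the browser's network tab on the
# store's search page (look for XHR/fetch requests that return JSON).
BASE_URL = "https://www.example-grocer.ca/api/v2/search"

def build_search_url(query: str, page: int = 1) -> str:
    """Build the search URL the same way the store's own frontend does."""
    return f"{BASE_URL}?{urlencode({'q': query, 'page': page})}"

def fetch_products(query: str) -> list:
    """Call the undocumented JSON endpoint and pull out name/price pairs."""
    req = Request(build_search_url(query),
                  headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        data = json.load(resp)
    # The response shape is also an assumption; adjust after inspecting
    # real output from the endpoint.
    return [{"name": p["name"], "price": p["price"]}
            for p in data.get("products", [])]
```

        Some sites will also want cookies or extra headers before they answer; copying the request your browser makes usually gets you there.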

  • nyan@lemmy.cafe · 1 year ago

    The issue with this sort of thing is primarily one of data entry, rather than “tech savvy” as such. Defining the database is easy compared to getting the data in there.

    Quick options would include parsing the information out of the stores’ websites (possible, but if JavaScript is involved you may be looking at puppeting a browser with Selenium, which isn’t fast and can get tedious, and the approach depends on the websites being complete, accurate, and up-to-date), or hacking or snooping on the stores’ own mobile apps (if they have them) to get price information in a usable format. Approaches like this are inherently brittle, as even trivial changes made from the grocery chains’ end can cause them to break. Scraping information without a defined API or the cooperation of the owner of the data is a moving target. From experience, I can tell you that it gets annoying fast.
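    To make the brittleness concrete, here's a deliberately fragile sketch of lifting prices out of rendered product-listing HTML. The class names `product-name` and `product-price` are invented; the real ones come from inspecting a store's markup, and a single frontend redesign silently breaks the scraper.

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Pull (name, price) pairs out of rendered product-listing HTML.

    The class names 'product-name' and 'product-price' are invented;
    real ones come from inspecting the store's markup, and a single
    frontend redesign silently breaks them. That is the brittleness.
    """
    def __init__(self):
        super().__init__()
        self._field = None   # which field the next text node fills
        self.items = []      # accumulated {"name": ..., "price": ...} dicts

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        if "product-name" in classes:
            self.items.append({})
            self._field = "name"
        elif "product-price" in classes:
            self._field = "price"

    def handle_data(self, data):
        if self._field and self.items and data.strip():
            self.items[-1][self._field] = data.strip()
            self._field = None

sample = """
<div class="product">
  <span class="product-name">Flour 2.5kg</span>
  <span class="product-price">$5.49</span>
</div>
"""
scraper = PriceScraper()
scraper.feed(sample)
print(scraper.items)  # [{'name': 'Flour 2.5kg', 'price': '$5.49'}]
```

    If the page is assembled by JavaScript, you'd feed this the HTML Selenium hands back after rendering, instead of the raw server response.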

    In the case of the Austrian government, they probably wanted that cooperation and defined API. Which would have required careful negotiations with each company and paid programmers looking at the corporate databases. That would have increased their cost and lengthened their projected timeframe. Corruption and corporate greed did the rest.

    • spacecowboy@sh.itjust.worksOP · 1 year ago

      So essentially it was possible over there due to proper/favourable conditions, whereas here it would be much more difficult?

      • elmicha@feddit.de · 1 year ago

        The responsible minister claimed it’s an immense task and will take until autumn. It will only include 16 product categories (think flour, milk, etc.), and it will only be updated once a week.

        I mean that’s pretty pathetic. Better than nothing, but “only updated once a week” sounds like “the intern who has to enter the prices works only for 20 hours”, not like they created an API and told the grocery chains to upload their prices.

      • nyan@lemmy.cafe · 1 year ago

        Unknown. I don’t use the grocery chains’ websites (I’m of the “go to the nearest physical store and figure it out once there” persuasion), so I don’t know what the complexity level would be. It’s possible that they’re all older-school sites where you can lift the data straight from the HTML, which is relatively fast.

    • blakcod@lemmy.ca · 1 year ago

      Are we looking to expand what’s here on the Grocery Tracker to incorporate what they are doing with the Austrian site?

      I’d also like to look at other pinch points of government heel dragging. Housing, energy, medical, transportation, telecom, news etc. We all see these government contracts go out for seven figures and it’s always shown to be blown out of proportion.

      A nice added bonus to the project in Austria was someone giving historical data. It would be great to have a similar leg up for Canada.

  • Rentlar@lemmy.ca · 1 year ago

    With the advent of e-ink price tags, it wouldn’t be surprising to me if something similar goes on with the prices of generic and medium tier items.

    We’d have to see if an API is available somewhere first.

    • Voroxpete@sh.itjust.works · 1 year ago (edited)

      Even without an API it should be possible, in theory, to just parse the data directly from their websites.

      This also gives the grocery stores less of a leg to stand on in terms of legal or practical recourse. They chose to create a publicly browsable database of their prices; all you’re doing is browsing it.

    • TemporaryBoyfriend@lemmy.ca · 1 year ago

      Huh. Now you’ve got me wondering… Could you leave a device hidden in a store that receives the IR signals that program the tags, capture that information, then parse it out later? You could literally log price changes at the shelves, in real time.

  • Otter@lemmy.caM · 1 year ago

    If you do get started with this, I’d love to follow along and find a place where I can help. If you guys make a community or mastodon account for example, please link it :)

  • Sethayy@sh.itjust.works · 1 year ago

    Would a system identifying products from a receipt work for this? Combined with other data sources (like web scraping), it would make it a lot easier to crowdsource the data, even if only sorta technically inclined people do it.

    • nyan@lemmy.cafe · 1 year ago

      It would help to some extent, but to really get people to buy in you’d need an app to do the heavy lifting (that is, it’s easier to get people to snap a photo of their receipt than to type the info in one character at a time). Some people might still be willing to do it without, but how many?

      You’d also have to relate the abbreviations that often appear on grocery receipts back to the items they represent, which is more data entry.
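      That abbreviation-mapping step is mostly a lookup table. A minimal sketch, with entirely invented abbreviations (a real table would have to be built per chain, since every POS system truncates differently):

```python
# Invented receipt abbreviations mapped to canonical product names.
ABBREVIATIONS = {
    "WHL WHT BRD": "Whole Wheat Bread",
    "2% MLK 4L": "2% Milk, 4 L",
    "BNNS": "Bananas",
}

def normalize(line: str) -> str:
    """Map one receipt line to a canonical name, falling back to the raw text."""
    return ABBREVIATIONS.get(line.strip().upper(), line.strip())

print(normalize("BNNS"))        # Bananas
print(normalize("GRN ONIONS"))  # GRN ONIONS (unknown, kept as-is)
```

      The fallback case is the expensive part: every unknown abbreviation is another row someone has to enter by hand.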

      • Sethayy@sh.itjust.works · 1 year ago

        Yeah, I was thinking something along the lines of lots n lots of easy shitty data (ex. anyone who can take a picture), some pretty good data (ex. hand-labeled receipts), and some 100% reliable data (scraped/API), then some sort of system to correlate the three. Especially when prices match identically between receipt and API, a fair-sized database could create itself.

        Also would need some sort of processing center to handle the many image processing requests, but maybe that could be done client side
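        The correlation step described above could start out very simple. Here's one possible sketch (the matching rule, field names, and sample data are all assumptions, not an existing implementation): link a noisy receipt line to a trusted API entry only when exactly one API product has the same price, so ambiguous matches are skipped rather than guessed.

```python
def correlate(receipt_items, api_items, tolerance=0.0):
    """Link noisy receipt entries to trusted API entries by price match.

    Only unambiguous matches are kept: if two API products share the
    same price, the receipt line stays unmatched rather than guessing.
    """
    matches = []
    for r in receipt_items:
        candidates = [a for a in api_items
                      if abs(a["price"] - r["price"]) <= tolerance]
        if len(candidates) == 1:
            matches.append((r["label"], candidates[0]["name"]))
    return matches

# Invented sample data standing in for OCR'd receipts and scraped/API prices.
receipts = [{"label": "WHL WHT BRD", "price": 3.49},
            {"label": "BNNS", "price": 1.29}]
api = [{"name": "Whole Wheat Bread", "price": 3.49},
       {"name": "Bananas", "price": 1.29},
       {"name": "Brown Bread", "price": 3.99}]
print(correlate(receipts, api))
# [('WHL WHT BRD', 'Whole Wheat Bread'), ('BNNS', 'Bananas')]
```

        Adding store and date to the match key would cut down on false positives once the database grows.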

  • howrar@lemmy.ca · 1 year ago

    Pretty easy to get something basic set up if you get enough people to crowd-source data with photos of stuff in grocery stores and their receipts, along with some scraping to get data that’s available online. It’s a project that’s been on my backlog for a while, but I can bump it up if others want to join me in making this.

  • Touching_Grass@lemmy.world · 1 year ago

    Can’t, this would be illegal within a year. Scraping data is already taboo. How fucking dumb is that.

    It’s why I hate the ‘starving artist worried about AI scraping’ stories. They will be used to usher in stronger laws to prevent us from scraping this data. It’s a double-edged sword.