Purging old Github issues

gpatel-fr · June 23, 2023, 7:19am

Hello

I am asking for advice and feedback before triggering it.

I intend to add an auto-closing feature to the GitHub project ‘issues’. There are more than 900 issues, and a number of them are of little use because the original poster did not provide enough info and then lost interest.

The approach is to add (manually) a tag when the information given is not really enough to reproduce the problem reliably. From that moment a timer runs on the issue, and if there is no activity within a 2-week period, the issue is tagged as ‘stale’. At this point all participants are notified by email. If a comment is added to the issue, the timer is reset and the ‘stale’ tag is removed.
After 2 weeks of an issue being stale, the issue is closed.
If the issue has been tagged with another tag (except a few tags such as ‘duplicate’…), it is not tagged ‘stale’. This is to prevent issues tagged ‘bug’, for example, from being closed automatically.

For people interested in details, here is the GitHub script:

name: Close inactive issues
on:
  schedule:
    - cron: "30 1 * * *"

jobs:
  close-issues:
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - uses: actions/stale@v5
        with:
          days-before-issue-stale: 14
          days-before-issue-close: 14
          stale-issue-label: "stale"
          stale-issue-message: "This issue is stale because it has been open for 14 days with no activity."
          close-issue-message: "This issue was closed because it has been inactive for 14 days since being marked as stale."
          days-before-pr-stale: -1
          days-before-pr-close: -1
          exempt-all-assignees: true
          exempt-milestones: true
          only-labels: "pending user feedback"
          exempt-issue-labels: "bug,enhancement,good first issue,backend enhancement,backend issue,backup corruption,bounty,bugreport attached,core logic,docker,filters,help wanted,linux,localization,MacOS,mono,performance issue,reproduced,server side,ssl/tls issue,Synology,tests,translation,UI,windows"
          repo-token: ${{ secrets.GITHUB_TOKEN }}

ts678 · June 23, 2023, 2:03pm

Thanks. I’ll also guess at the process questions behind this. What are we trying to achieve?
How does this interact with current or future workflows maybe using a variety of volunteers?

One time I went through all of them. I left a few comments, probably didn’t do much to labels.
I closed at least one that was hopelessly vague. Little chance of someone else recognizing it.

For very old issues, there’s also a question of whether it’s been fixed since. How to see that?
One way is, if it’s well enough known or clearly stated, someone will comment or reference it.
Even an errant close will eventually resurface as a new issue if its problem is still being seen.

When I do an occasional issues check, my query is often is:issue sort:updated-desc, so it
shows closed issues that are getting attention. I’m not sure how to avoid that. The issue template
https://github.com/duplicati/duplicati/blob/master/.github/ISSUE_TEMPLATE.md does request

“I have searched open and closed issues for duplicates.”

which almost seems to encourage hitting closed issues. Maybe the template can say what we’d like done.
I think pull request templates are possible too, again if we can figure out what we’d like in them.

Because the forum does it always, sorting by recent activity in Issues feels quite natural to me,
letting inactive things sort of sink beyond sight (and probably out of activity) when inactive, and
perhaps awakening later for another burst (possibly with others who go by the same strategy).
Think of it as cache. It provides focus, but is also susceptible to churn if too much tries to get in.

This feels like an attempt to apply some agile ideas such as limit work in progress, then finish it.
There’s always the backlog, but working on everything at once works worse than selecting work.
There’s usually a separation of duties. For our Issues, maybe some folks could front-end triage?
That’s a lot like forum support, attempting to isolate the problem. After that the coders can go in.

Issue as filed often lacks nice steps, but with some guidance or others trying, it may be possible.
Rare configurations can pose problems, maybe less so if more Duplicati helpers help with repro.
Sometimes it’s possible to get the user to test on a configuration that’s more generally available.
One point from this is it’s not always a lack of information. It might be a lack of something else…

This might need refining. There are 46 labels at the moment, and exempt-issue-labels has 25.
Starting conservatively is fine, but we don’t want to inadvertently label this scheme into oblivion…

My usual process is simpler but more manual. Label pending user feedback too long might get
issue closed for lack thereof. This works better for newer issues than ones from many years back.

I suppose one could apply tags manually to hundreds of issues. Will the tag indicate what’s expected?
I see messages explaining stale and closed. Can the initial tag generate one too? What will it say?

Jojo-1000 · June 23, 2023, 3:08pm

There is definitely a way to add mouseover descriptions to labels (currently dependencies and DO NOT MERGE have one), but I’m not sure how.

gpatel-fr · June 23, 2023, 9:16pm

There is a ‘description’ field when clicking edit labels. I have added one to the tag ‘Pending user feedback’.

gpatel-fr · June 23, 2023, 9:33pm

Well, reducing the number of useless issues. If a poster doesn’t want to help the project, and the problem can’t be solved easily, the issue has no reason to exist. It should have been a forum post. The forum is all right for people wanting help, but GitHub is for development.
The idea is for volunteers to actually use tags. If an issue is a bug, tag it as a bug. If the issue is unclear, tag it as ‘needs info’.
By the way, I am not clear on how you are able to tag issues while I don’t see you in the developer list. This is something that is escaping me in the project configuration (I don’t have access to the project settings).

The main problem that I want to address is that after a few (sometimes no) exchange, the original poster disappears. They have solved their problem in another way and forgotten all about it. That’s nice for them, but old garbage needs to be cleaned.
For me the right thing to do is to tag the issue immediately when asking the original poster for clarification. If there is no answer, the issue will disappear all by itself.

What do you think should be done instead?

What’s the ‘initial tag’? The ‘needs user info’ one? Why should it generate a mail? The user will get a mail after 2 weeks, so there is no need to spam immediately, I think. I don’t think the standard GitHub action allows it anyway.
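
If an immediate explanation were ever wanted, it would probably have to come from a separate workflow, since the stale action only seems to comment when it marks or closes an issue. A minimal sketch, assuming the label is named `pending user feedback`; the workflow name, job name, and comment text are made up for illustration:

```yaml
# Hypothetical companion workflow, not part of the stale configuration above.
name: Explain pending user feedback
on:
  issues:
    types: [labeled]

jobs:
  explain-label:
    # Only react to the one label that starts the stale timer.
    if: github.event.label.name == 'pending user feedback'
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - name: Comment on the issue
        # The gh CLI is preinstalled on GitHub-hosted runners.
        run: gh issue comment "$NUMBER" --body "$BODY"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GH_REPO: ${{ github.repository }}
          NUMBER: ${{ github.event.issue.number }}
          BODY: >-
            More information is needed to reproduce this problem.
            Without a reply, this issue will be marked stale after 14 days
            and closed 14 days after that.
```

Note that the comment itself counts as activity, so the 14 days would run from the moment the label and comment are added.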

ts678 · June 24, 2023, 12:56am

Then the issue template should clarify that. Projects do sometimes steer people one way or another.
Sometimes it won’t work, but if it’s a goal, then we should at least try to get people heading that way.

I often try to get people to file an issue because the forum isn’t a bug tracker, so the bug will get lost.
Sometimes they’ll file, sometimes not. If it’s not worth their time then maybe it’s not worth ours either.

but

So doesn’t having about half the labels prevent a stale mean that the label goal interferes with the purge goal?

https://github.com/actions/stale#exempt-issue-labels says (is it possible that I’m reading this wrong)

Comma separated list of labels that can be assigned to issues to exclude them from being marked as stale (e.g: question,bug)

and if an issue never gets to stale because of its bug label, it never gets to automatic close, correct?

I don’t know how the list is implemented, but likely underneath are GitHub roles. I took only triage because write seemed needless risk while I’m trying to work out how to actually drive GitHub well.

Then it looks like you want something like the GitHub example, adjusted for the label that we like:

any-of-labels: 'needs-more-info,needs-demo'

An allow-list of label(s) to only process the issues or the pull requests that contain one of these label(s). It can be a comma separated list of labels (e.g: answered,needs-rebase).

The example is a bit confusing, but I think it stops the timer on developers when they post a question. Answered question restarts the timer. We’re trying to use the timer to keep the user responses going.

The other problem with stuff such as any-of-labels is forgetting to remove needs-more-info-type label.
Maybe a few accidents will solve that, plus I guess that user still has an option to reopen if necessary.

So they know what we want, and can maybe work on it. Why throw in a needless 2 weeks delay in it?
Answer is possibly as you say, that GitHub can’t do it. Maybe whoever works the issue must explain?

I’ve not used this stale mechanism before, but it sounds like a lot of the issues will be stale when this is rolled out. Maybe it should be rolled out in phases, e.g. with a huge days-before-stale. Practice on the oldest.
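
A phased rollout along those lines might look like the fragment below. This is only a sketch: the 730-day threshold is an arbitrary starting value, and `ascending` and `operations-per-run` are options listed in the actions/stale README that the posted config does not use.

```yaml
      - uses: actions/stale@v5
        with:
          # Phase 1: only issues inactive for about two years become stale.
          # Lower this step by step (730 -> 365 -> 90 -> 14) once results look sane.
          days-before-issue-stale: 730
          days-before-issue-close: 14
          # Fetch the oldest issues first, so practice happens on those.
          ascending: true
          # Cap how much a single run may touch (30 is the README default).
          operations-per-run: 30
          # Leave pull requests alone, as in the posted config.
          days-before-pr-stale: -1
          days-before-pr-close: -1
```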

I was looking over the last-updated-in-2016 issues. Found one that we’re talking about right now. Dup.
Noticed lots of enhancement labels. Should those perhaps be exempted? How was current list made?

EDIT:

I now notice config uses only-labels: "pending user feedback" which is similar to any-of-labels, actually identical when there’s only one. So I suppose the question is still what gets exempt, and why?
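
Going by the actions/stale README, the difference shows up once more than one label is listed: `only-labels` requires all of the listed labels, while `any-of-labels` requires at least one. A small sketch of the two variants; `needs-more-info` and `needs-demo` are the README's example names, not labels from this project:

```yaml
# Variant A: an issue must carry ALL listed labels to be processed.
with:
  only-labels: "pending user feedback"
---
# Variant B: an issue carrying ANY ONE of the listed labels is processed.
with:
  any-of-labels: "needs-more-info,needs-demo"
```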

gpatel-fr · June 24, 2023, 10:15am

This is exactly what I think. Most managed projects do that, because when a project grows it gets a forum. Small projects can use issues for discussion.

IMO the goal should not be to purge issues for the sake of purging issues; the goal should be that active issues are issues worth reading. If an issue is tagged a bug by a developer, it’s deemed to be worth the time to read it, until it is fixed by a code change (closed), or it’s reevaluated as not a bug (untagged), or it’s closed because other code changes have made the bug irrelevant (closed as obsolete).
One can always read and search closed issues, but by default it should be a waste of time.

Either they have been asked questions, or they have deliberately ignored the issue template. I’d not hesitate to tag an issue as needing user input in this case (except some very exceptional cases like this one: "duplicati" org on Docker Hub is going to go away · Issue #4906 · duplicati/duplicati · GitHub).
For a standard ‘I’m asking for help’ issue, why ask another time what the Duplicati version is when it was already asked in the issue template? They know that getting details right is necessary. The delay is because people are not always available and can be busy with their life. If they are very busy but still want to give a proper answer, they can post a comment saying so, which will keep the issue from being closed.

If I have gotten the GitHub action right, they will be exempted, as they should be (IMO of course). If a developer (or a triager) has deemed an issue an enhancement, it is interesting to read and should not be closed automatically. Removing the tag ‘user info needed’ is nice but not necessary.

ts678 · June 24, 2023, 1:00pm

Thanks. I’m getting the picture more now. Before saying more, now that I looked into config method,
exempt-milestones: true looks suspicious, as it accepts a comma-separated list of milestone names.
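
If the intent is to exempt every issue that has any milestone, the README lists a separate boolean option for that; with `exempt-milestones: true` the action would presumably look for a milestone literally named "true". A sketch of the two forms, where the milestone names in the second one are placeholders:

```yaml
# Boolean form: any issue that has a milestone is exempt.
with:
  exempt-all-milestones: true
---
# List form: only issues in the named milestones are exempt.
# "2.1,Backlog" are placeholder names, not real milestones of this project.
with:
  exempt-milestones: "2.1,Backlog"
```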

I’m also a bit nervous about double quote parsing. e.g. exempt-issue-labels, a comma-separated list, however what’s the syntax for allowing an embedded space? For example, in search, putting outside double quotes around the whole thing seems to confuse a label search in an interesting way, where

181 from is:issue is:open label:"bug"
2 from is:issue is:open label:"good first issue"
0 from is:issue is:open label:"bug,good first issue"
183 from is:issue is:open label:bug,"good first issue"

“Use quotation marks for queries with whitespace” is not clear on where quotes go when a list is given.
Maybe debug-only for a dry run would be safer if you’re unsure how outside-quoted exempt list works.
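
On the quoting worry: in the workflow file the outer double quotes are YAML string delimiters, so they are not part of the value the action receives, unlike the search box, where quotes are part of the query. As far as I understand, the action then splits that value on commas, so a label with embedded spaces should need no extra quoting. A dry run can confirm this; the sketch below assumes the `debug-only` option from the README and adds a manual trigger for convenience, with a shortened exempt list:

```yaml
name: Close inactive issues (dry run)
on:
  # Manual trigger so the dry run can be started from the Actions tab.
  workflow_dispatch:

jobs:
  close-issues:
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - uses: actions/stale@v5
        with:
          # Log what would happen without touching live issues.
          debug-only: true
          days-before-issue-stale: 14
          days-before-issue-close: 14
          days-before-pr-stale: -1
          days-before-pr-close: -1
          only-labels: "pending user feedback"
          # The quoted and unquoted spellings are the same YAML string:
          #   exempt-issue-labels: bug,good first issue
          exempt-issue-labels: "bug,good first issue"
```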

Back to objectives, the challenge seems to be to keep a single issues queue somewhat organized, so there’s less useless clutter, and what’s left is clearer as to what it is, and what the next steps might be.

Volunteers can push issues as far as they are able. We should get triage role for more people though, partly due to scale and partly due to variations in skills and personal equipment that might be needed.

This is a good way for new volunteers to learn their way around, ideally getting onto the developer list.
There will be some uncertainty about when they’ve gone as far as they can/will, so just wait a little bit before next level person jumps in. I try to do this anyway. If someone goes silent, maybe they’re done.

Volunteers who can “do it all” might want to let others do what they can instead of jumping in instantly.
In the forum, I can’t do it all either by skill or by time, so I often let topics age a day before I start them.

Rather than specify exact rules, provide tools. If someone would rather just answer and close an issue that’s really support, maybe just do it rather than send person to forum. If a user responds nicely, there isn’t really any need to label pending user feedback and have to unlabel it by hand if user responds.

If user is very unresponsive, the penalty is stale then close, however there’s a generous exemption for issues that have already gained some value due to work done or type of issue as seen by issue labels.

Old issues aren’t auto-purged. People still need to churn through them a bit, maybe rescue some, then label “pending user feedback” with or without more direction. For really old issues, 2 weeks later is fine.

The stale process might just serve as an automated last call and close once someone decides to use it.

Does that sound about right so far? There’s more to talk about, e.g. how reproducibility enters into plan. Sometimes one is pretty sure it’s not support, not enhancement, is code, but struggles to reproduce it…

gpatel-fr · June 24, 2023, 2:45pm

What’s sure is that a comma is not allowed in a label, so I don’t quite see the necessity to surround labels with quotes in the GitHub action. Searching for issues is a different matter, since we are passing a whole command line (there are spaces between is:issue and is:open and label…)

If people are actually volunteering, why not indeed.

The goal is certainly not to do unnecessary tag management. For example, in the latest issue at the moment, “The request was aborted: Could not create SSL/TLS secure channel.” #4979, the user certainly answered nicely and even closed the issue when it was solved, so there is no need to remove the label: the issue is closed.
In fairness, you could argue that this is a limitation of the scheme: this issue seems to point to a new problem that has not been seen before (not by me in any case), so closing it hides information. Well, yes. Life is not generally quite perfect.

Yes, what’s asked from the user is some honest effort to give the information necessary for a repro, but even when that’s done, a repro is not always so easy to do, so in this case removing the tag seems best indeed. No tag at all is a valid issue status meaning about the same thing as ‘null’ for a variable (unknown state), but it should not be seen too often.

Jojo-1000 · June 26, 2023, 7:13am

Speaking of labels, I think it would also be nice to tag pull requests with the kind of changes that are made. For example, there are labels for core logic, UI, new backend and translation that could apply to pull requests.

This would hopefully simplify the review process and allow less critical changes (such as UI) to be tested by others than @gpatel-fr, so you can focus on the more dangerous changes.

gpatel-fr · August 25, 2023, 8:59pm

I have finally created the PR to add this functionality.

Well, for fewer than 100 PRs, that seems a bit overkill.

Jojo-1000 · August 31, 2023, 7:09pm

I noticed that issues with open pull requests are also marked as stale and will be closed (e.g. #4970). Do you think that will be a problem? I presume that issues with pull requests have enough info to be worth keeping them open.
Is there a way to exclude those, or is it too much of a bother? Obviously the pull requests will still stay open, but it might be confusing that the issues they are supposed to fix are closed already.

gpatel-fr · August 31, 2023, 8:12pm

I think the issue was tagged before your PR. I have untagged it. I guess that the proper way is to add a ‘bug’ tag when one finds something to fix in an issue; that will prevent the issue from being handled just as well.

ts678 · January 24, 2024, 11:49am

I’ve been using it, sort of going by some recent talk like

One early noise-reduction was to label enhancement so one can search using -label:enhancement.

There is now a load of pending user feedback that will probably get stale treatment per this soon.
Those can also be filtered out in a search, but some will need manual action due to their ancient label.

There were a lot of issues from years ago that got a bug label but no more. Sometimes a request gets response or Close from originator. I guess we decide on manual actions if lack of response continues.

Feel free to review any or all of these sooner if desired, and maybe triage role should be spread wider.

gpatel-fr · January 24, 2024, 2:45pm

I tried
is:issue is:open label:"pending user feedback" -label:stale
without looking at the details of the issues. The ‘not reproducible’ label should not block making an issue stale, obviously. Maybe some other labels should not block it either.

Why not, once we are deluged by eager would-be issue triagers.

ts678 · January 27, 2024, 12:25pm

Although I don’t see any deluge yet, sometimes asking candidates can help spread the work.
Repository roles for organizations shows the hierarchy. I was only willing to accept Triage…

Even with that, I didn’t use it much, until this recent effort to try to clean up the issue backlog.
Having Triage doesn’t mean one has to triage everything, but it grants ability, as does Write.

Unfortunately I just finished a round, but the ones which don’t get stale handling need review.
An easy way to find those (if not in a rush) is to wait some more weeks, and see what’s left…

Although there’s lots in progress, I was conservative and tried to give some reason for asking.
Issues that need special setups could sometimes not be tested, and I didn’t just fish for status.

People go away over the years, so simply closing a very old issue due to no response is risky.
Countering that theory is that if it’s still happening, we’ll hear about it from someone sometime.

Another class of issue I didn’t hit yet is “flaky” issues. Easier ones could be solidly tested again.
So I’m kind of done until someone figures out some next phase in how to clean up the clutter…

EDIT 1:

One goal of having less noise is so we can see what’s left, which will help create release plans.
There are a few known issues that aren’t filed as issues, so I’m also adding a few new issues…

EDIT 2:

but I’m not ambitious enough to file an issue for every forum issue, so bugs can be hiding there.

ts678 · January 27, 2024, 10:56pm

We had a chat in one issue where bug label blocked the stale handling, probably as usually intended.

Sometimes an issue is kind of vague or possibly intermittent (see “flaky” comment above). I’ll remove previous bug label from a few old issues where I’m pretty sure the problem is solidly gone at this time.

Behind those are some more where I suspect it’s gone, but can’t really slap a not reproducible on, however when that’s added, it sounds like the intent is to remove labels as needed to allow stale work.

gpatel-fr · January 27, 2024, 11:24pm

That’s a judgment call that depends on how strongly you suspect it’s gone.

ts678 · January 27, 2024, 11:38pm

If I don’t feel strongly enough, it doesn’t get a not reproducible label.
The point was that if it gets that, then remove what’s needed for a stale.

EDIT: This can be ramped up further if desired. It’s still not at maximum.
One team I worked on (which did have a QA team) decided to just close
sufficiently old bug reports, and have them be filed again if necessary…

Figuring out if an issue still exists is difficult, and personnel may change.
New issues that can be jumped on fast are sometimes easiest to work…

EDIT 2:

An intermediate-level way to allow stale handling without a specific test
would be to remove blocking labels without calling it not reproducible.

EDIT 3: I did some more of that category so we’ll see if anyone responds.

EDIT 4: Basically, this undoes the initial, typically unverified guess of bug.

ts678 · January 28, 2024, 2:46pm

Hearing no objection, I removed some more bug labels. The main add-on annoyance was removing the miscellaneous and unpredictable stale-blockers (one can read stale.yml file) that were added as well.

These are sometimes helpful, but not when one’s trying to undo things. Meanwhile I’m not adding more. Someday maybe we’ll get more ambitious, but I’m once again trying to close off my round of labelling…

What I’m being generous on is enhancement, because that’s an easy way to let tailored views be done. Enhancements are mostly lower priority than bugs. There are a few I might lobby for, but can’t really label.

GitHub Issues is not a very powerful bug tracker, but if Issues are cut down to a more reasonable size, review of what’s open can probably let us do things like release planning, or maybe even leaving Beta.