datasets_document.py 49KB

import json
import logging
from argparse import ArgumentTypeError
from collections.abc import Sequence
from typing import Literal, cast

from flask import request
from flask_login import current_user
from flask_restx import Resource, fields, marshal, marshal_with, reqparse
from sqlalchemy import asc, desc, select
from werkzeug.exceptions import Forbidden, NotFound

import services
from controllers.console import api, console_ns
from controllers.console.app.error import (
    ProviderModelCurrentlyNotSupportError,
    ProviderNotInitializeError,
    ProviderQuotaExceededError,
)
from controllers.console.datasets.error import (
    ArchivedDocumentImmutableError,
    DocumentAlreadyFinishedError,
    DocumentIndexingError,
    IndexingEstimateError,
    InvalidActionError,
    InvalidMetadataError,
)
from controllers.console.wraps import (
    account_initialization_required,
    cloud_edition_billing_rate_limit_check,
    cloud_edition_billing_resource_check,
    setup_required,
)
from core.errors.error import (
    LLMBadRequestError,
    ModelCurrentlyNotSupportError,
    ProviderTokenNotInitError,
    QuotaExceededError,
)
from core.indexing_runner import IndexingRunner
from core.model_manager import ModelManager
from core.model_runtime.entities.model_entities import ModelType
from core.model_runtime.errors.invoke import InvokeAuthorizationError
from core.plugin.impl.exc import PluginDaemonClientSideError
from core.rag.extractor.entity.datasource_type import DatasourceType
from core.rag.extractor.entity.extract_setting import ExtractSetting
from extensions.ext_database import db
from fields.document_fields import (
    dataset_and_document_fields,
    document_fields,
    document_status_fields,
    document_with_segments_fields,
)
from libs.datetime_utils import naive_utc_now
from libs.login import login_required
from models import Dataset, DatasetProcessRule, Document, DocumentSegment, UploadFile
from models.dataset import DocumentPipelineExecutionLog
from services.dataset_service import DatasetService, DocumentService
from services.entities.knowledge_entities.knowledge_entities import KnowledgeConfig

logger = logging.getLogger(__name__)


class DocumentResource(Resource):
    def get_document(self, dataset_id: str, document_id: str) -> Document:
        dataset = DatasetService.get_dataset(dataset_id)
        if not dataset:
            raise NotFound("Dataset not found.")
        try:
            DatasetService.check_dataset_permission(dataset, current_user)
        except services.errors.account.NoPermissionError as e:
            raise Forbidden(str(e))
        document = DocumentService.get_document(dataset_id, document_id)
        if not document:
            raise NotFound("Document not found.")
        if document.tenant_id != current_user.current_tenant_id:
            raise Forbidden("No permission.")
        return document

    def get_batch_documents(self, dataset_id: str, batch: str) -> Sequence[Document]:
        dataset = DatasetService.get_dataset(dataset_id)
        if not dataset:
            raise NotFound("Dataset not found.")
        try:
            DatasetService.check_dataset_permission(dataset, current_user)
        except services.errors.account.NoPermissionError as e:
            raise Forbidden(str(e))
        documents = DocumentService.get_batch_documents(dataset_id, batch)
        if not documents:
            raise NotFound("Documents not found.")
        return documents


@console_ns.route("/datasets/process-rule")
class GetProcessRuleApi(Resource):
    @api.doc("get_process_rule")
    @api.doc(description="Get dataset document processing rules")
    @api.doc(params={"document_id": "Document ID (optional)"})
    @api.response(200, "Process rules retrieved successfully")
    @setup_required
    @login_required
    @account_initialization_required
    def get(self):
        req_data = request.args
        document_id = req_data.get("document_id")
        # get default rules
        mode = DocumentService.DEFAULT_RULES["mode"]
        rules = DocumentService.DEFAULT_RULES["rules"]
        limits = DocumentService.DEFAULT_RULES["limits"]
        if document_id:
            # get the latest process rule
            document = db.get_or_404(Document, document_id)
            dataset = DatasetService.get_dataset(document.dataset_id)
            if not dataset:
                raise NotFound("Dataset not found.")
            try:
                DatasetService.check_dataset_permission(dataset, current_user)
            except services.errors.account.NoPermissionError as e:
                raise Forbidden(str(e))
            # get the latest process rule
            dataset_process_rule = (
                db.session.query(DatasetProcessRule)
                .where(DatasetProcessRule.dataset_id == document.dataset_id)
                .order_by(DatasetProcessRule.created_at.desc())
                .limit(1)
                .one_or_none()
            )
            if dataset_process_rule:
                mode = dataset_process_rule.mode
                rules = dataset_process_rule.rules_dict
        return {"mode": mode, "rules": rules, "limits": limits}
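The fallback logic in `GetProcessRuleApi` (serve defaults unless a newer `DatasetProcessRule` row exists for the dataset) can be sketched in plain Python. The rule rows and the `DEFAULT_RULES` values below are hypothetical stand-ins for illustration, not Dify's actual defaults:

```python
from datetime import datetime

# Hypothetical defaults standing in for DocumentService.DEFAULT_RULES.
DEFAULT_RULES = {"mode": "automatic", "rules": {}, "limits": {"max_segments": 1000}}

# Hypothetical in-memory stand-in for the DatasetProcessRule table.
rules_table = [
    {"dataset_id": "d1", "mode": "automatic", "rules": {}, "created_at": datetime(2024, 1, 1)},
    {"dataset_id": "d1", "mode": "custom", "rules": {"segmentation": {"max_tokens": 500}}, "created_at": datetime(2024, 6, 1)},
]

def latest_rule(dataset_id: str):
    # Mirrors ORDER BY created_at DESC LIMIT 1: newest rule for the dataset, or None.
    candidates = [r for r in rules_table if r["dataset_id"] == dataset_id]
    return max(candidates, key=lambda r: r["created_at"], default=None)

def process_rule_response(dataset_id: str) -> dict:
    # Start from the defaults; override mode/rules if a stored rule exists.
    mode, rules = DEFAULT_RULES["mode"], DEFAULT_RULES["rules"]
    rule = latest_rule(dataset_id)
    if rule:
        mode, rules = rule["mode"], rule["rules"]
    return {"mode": mode, "rules": rules, "limits": DEFAULT_RULES["limits"]}
```

Note that `limits` always comes from the defaults; only `mode` and `rules` are overridden, matching the endpoint above.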


@console_ns.route("/datasets/<uuid:dataset_id>/documents")
class DatasetDocumentListApi(Resource):
    @api.doc("get_dataset_documents")
    @api.doc(description="Get documents in a dataset")
    @api.doc(
        params={
            "dataset_id": "Dataset ID",
            "page": "Page number (default: 1)",
            "limit": "Number of items per page (default: 20)",
            "keyword": "Search keyword",
            "sort": "Sort order (default: -created_at)",
            "fetch": "Fetch full details (default: false)",
        }
    )
    @api.response(200, "Documents retrieved successfully")
    @setup_required
    @login_required
    @account_initialization_required
    def get(self, dataset_id):
        dataset_id = str(dataset_id)
        page = request.args.get("page", default=1, type=int)
        limit = request.args.get("limit", default=20, type=int)
        search = request.args.get("keyword", default=None, type=str)
        sort = request.args.get("sort", default="-created_at", type=str)
        # "yes", "true", "t", "y", "1" convert to True, while others convert to False.
        try:
            fetch_val = request.args.get("fetch", default="false")
            if isinstance(fetch_val, bool):
                fetch = fetch_val
            else:
                if fetch_val.lower() in ("yes", "true", "t", "y", "1"):
                    fetch = True
                elif fetch_val.lower() in ("no", "false", "f", "n", "0"):
                    fetch = False
                else:
                    raise ArgumentTypeError(
                        f"Truthy value expected: got {fetch_val} but expected one of yes/no, true/false, t/f, y/n, 1/0 "
                        f"(case insensitive)."
                    )
        except (ArgumentTypeError, ValueError, Exception):
            fetch = False

        dataset = DatasetService.get_dataset(dataset_id)
        if not dataset:
            raise NotFound("Dataset not found.")

        try:
            DatasetService.check_dataset_permission(dataset, current_user)
        except services.errors.account.NoPermissionError as e:
            raise Forbidden(str(e))

        query = select(Document).filter_by(dataset_id=str(dataset_id), tenant_id=current_user.current_tenant_id)

        if search:
            search = f"%{search}%"
            query = query.where(Document.name.like(search))

        if sort.startswith("-"):
            sort_logic = desc
            sort = sort[1:]
        else:
            sort_logic = asc

        if sort == "hit_count":
            sub_query = (
                db.select(DocumentSegment.document_id, db.func.sum(DocumentSegment.hit_count).label("total_hit_count"))
                .group_by(DocumentSegment.document_id)
                .subquery()
            )

            query = query.outerjoin(sub_query, sub_query.c.document_id == Document.id).order_by(
                sort_logic(db.func.coalesce(sub_query.c.total_hit_count, 0)),
                sort_logic(Document.position),
            )
        elif sort == "created_at":
            query = query.order_by(
                sort_logic(Document.created_at),
                sort_logic(Document.position),
            )
        else:
            query = query.order_by(
                desc(Document.created_at),
                desc(Document.position),
            )

        paginated_documents = db.paginate(select=query, page=page, per_page=limit, max_per_page=100, error_out=False)
        documents = paginated_documents.items
        if fetch:
            for document in documents:
                completed_segments = (
                    db.session.query(DocumentSegment)
                    .where(
                        DocumentSegment.completed_at.isnot(None),
                        DocumentSegment.document_id == str(document.id),
                        DocumentSegment.status != "re_segment",
                    )
                    .count()
                )
                total_segments = (
                    db.session.query(DocumentSegment)
                    .where(DocumentSegment.document_id == str(document.id), DocumentSegment.status != "re_segment")
                    .count()
                )
                document.completed_segments = completed_segments
                document.total_segments = total_segments
            data = marshal(documents, document_with_segments_fields)
        else:
            data = marshal(documents, document_fields)

        response = {
            "data": data,
            "has_more": len(documents) == limit,
            "limit": limit,
            "total": paginated_documents.total,
            "page": page,
        }
        return response
    @setup_required
    @login_required
    @account_initialization_required
    @marshal_with(dataset_and_document_fields)
    @cloud_edition_billing_resource_check("vector_space")
    @cloud_edition_billing_rate_limit_check("knowledge")
    def post(self, dataset_id):
        dataset_id = str(dataset_id)

        dataset = DatasetService.get_dataset(dataset_id)
        if not dataset:
            raise NotFound("Dataset not found.")

        # The role of the current user in the ta table must be admin, owner, or editor
        if not current_user.is_dataset_editor:
            raise Forbidden()

        try:
            DatasetService.check_dataset_permission(dataset, current_user)
        except services.errors.account.NoPermissionError as e:
            raise Forbidden(str(e))

        parser = reqparse.RequestParser()
        parser.add_argument(
            "indexing_technique", type=str, choices=Dataset.INDEXING_TECHNIQUE_LIST, nullable=False, location="json"
        )
        parser.add_argument("data_source", type=dict, required=False, location="json")
        parser.add_argument("process_rule", type=dict, required=False, location="json")
        parser.add_argument("duplicate", type=bool, default=True, nullable=False, location="json")
        parser.add_argument("original_document_id", type=str, required=False, location="json")
        parser.add_argument("doc_form", type=str, default="text_model", required=False, nullable=False, location="json")
        parser.add_argument("retrieval_model", type=dict, required=False, nullable=False, location="json")
        parser.add_argument("embedding_model", type=str, required=False, nullable=True, location="json")
        parser.add_argument("embedding_model_provider", type=str, required=False, nullable=True, location="json")
        parser.add_argument(
            "doc_language", type=str, default="English", required=False, nullable=False, location="json"
        )
        args = parser.parse_args()
        knowledge_config = KnowledgeConfig(**args)

        if not dataset.indexing_technique and not knowledge_config.indexing_technique:
            raise ValueError("indexing_technique is required.")

        # validate args
        DocumentService.document_create_args_validate(knowledge_config)

        try:
            documents, batch = DocumentService.save_document_with_dataset_id(dataset, knowledge_config, current_user)
            dataset = DatasetService.get_dataset(dataset_id)
        except ProviderTokenNotInitError as ex:
            raise ProviderNotInitializeError(ex.description)
        except QuotaExceededError:
            raise ProviderQuotaExceededError()
        except ModelCurrentlyNotSupportError:
            raise ProviderModelCurrentlyNotSupportError()

        return {"dataset": dataset, "documents": documents, "batch": batch}

    @setup_required
    @login_required
    @account_initialization_required
    @cloud_edition_billing_rate_limit_check("knowledge")
    def delete(self, dataset_id):
        dataset_id = str(dataset_id)
        dataset = DatasetService.get_dataset(dataset_id)
        if dataset is None:
            raise NotFound("Dataset not found.")

        # check user's model setting
        DatasetService.check_dataset_model_setting(dataset)

        try:
            document_ids = request.args.getlist("document_id")
            DocumentService.delete_documents(dataset, document_ids)
        except services.errors.document.DocumentIndexingError:
            raise DocumentIndexingError("Cannot delete document during indexing.")

        return {"result": "success"}, 204


@console_ns.route("/datasets/init")
class DatasetInitApi(Resource):
    @api.doc("init_dataset")
    @api.doc(description="Initialize dataset with documents")
    @api.expect(
        api.model(
            "DatasetInitRequest",
            {
                "upload_file_id": fields.String(required=True, description="Upload file ID"),
                "indexing_technique": fields.String(description="Indexing technique"),
                "process_rule": fields.Raw(description="Processing rules"),
                "data_source": fields.Raw(description="Data source configuration"),
            },
        )
    )
    @api.response(201, "Dataset initialized successfully", dataset_and_document_fields)
    @api.response(400, "Invalid request parameters")
    @setup_required
    @login_required
    @account_initialization_required
    @marshal_with(dataset_and_document_fields)
    @cloud_edition_billing_resource_check("vector_space")
    @cloud_edition_billing_rate_limit_check("knowledge")
    def post(self):
        # The role of the current user in the ta table must be admin, owner, dataset_operator, or editor
        if not current_user.is_dataset_editor:
            raise Forbidden()

        parser = reqparse.RequestParser()
        parser.add_argument(
            "indexing_technique",
            type=str,
            choices=Dataset.INDEXING_TECHNIQUE_LIST,
            required=True,
            nullable=False,
            location="json",
        )
        parser.add_argument("data_source", type=dict, required=True, nullable=True, location="json")
        parser.add_argument("process_rule", type=dict, required=True, nullable=True, location="json")
        parser.add_argument("doc_form", type=str, default="text_model", required=False, nullable=False, location="json")
        parser.add_argument(
            "doc_language", type=str, default="English", required=False, nullable=False, location="json"
        )
        parser.add_argument("retrieval_model", type=dict, required=False, nullable=False, location="json")
        parser.add_argument("embedding_model", type=str, required=False, nullable=True, location="json")
        parser.add_argument("embedding_model_provider", type=str, required=False, nullable=True, location="json")
        args = parser.parse_args()

        knowledge_config = KnowledgeConfig(**args)
        if knowledge_config.indexing_technique == "high_quality":
            if knowledge_config.embedding_model is None or knowledge_config.embedding_model_provider is None:
                raise ValueError("embedding model and embedding model provider are required for high quality indexing.")
            try:
                model_manager = ModelManager()
                model_manager.get_model_instance(
                    tenant_id=current_user.current_tenant_id,
                    provider=args["embedding_model_provider"],
                    model_type=ModelType.TEXT_EMBEDDING,
                    model=args["embedding_model"],
                )
            except InvokeAuthorizationError:
                raise ProviderNotInitializeError(
                    "No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
                )
            except ProviderTokenNotInitError as ex:
                raise ProviderNotInitializeError(ex.description)

        # validate args
        DocumentService.document_create_args_validate(knowledge_config)

        try:
            dataset, documents, batch = DocumentService.save_document_without_dataset_id(
                tenant_id=current_user.current_tenant_id, knowledge_config=knowledge_config, account=current_user
            )
        except ProviderTokenNotInitError as ex:
            raise ProviderNotInitializeError(ex.description)
        except QuotaExceededError:
            raise ProviderQuotaExceededError()
        except ModelCurrentlyNotSupportError:
            raise ProviderModelCurrentlyNotSupportError()

        response = {"dataset": dataset, "documents": documents, "batch": batch}
        return response


@console_ns.route("/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/indexing-estimate")
class DocumentIndexingEstimateApi(DocumentResource):
    @api.doc("estimate_document_indexing")
    @api.doc(description="Estimate document indexing cost")
    @api.doc(params={"dataset_id": "Dataset ID", "document_id": "Document ID"})
    @api.response(200, "Indexing estimate calculated successfully")
    @api.response(404, "Document not found")
    @api.response(400, "Document already finished")
    @setup_required
    @login_required
    @account_initialization_required
    def get(self, dataset_id, document_id):
        dataset_id = str(dataset_id)
        document_id = str(document_id)
        document = self.get_document(dataset_id, document_id)

        if document.indexing_status in {"completed", "error"}:
            raise DocumentAlreadyFinishedError()

        data_process_rule = document.dataset_process_rule
        data_process_rule_dict = data_process_rule.to_dict()

        response = {"tokens": 0, "total_price": 0, "currency": "USD", "total_segments": 0, "preview": []}

        if document.data_source_type == "upload_file":
            data_source_info = document.data_source_info_dict
            if data_source_info and "upload_file_id" in data_source_info:
                file_id = data_source_info["upload_file_id"]

                file = (
                    db.session.query(UploadFile)
                    .where(UploadFile.tenant_id == document.tenant_id, UploadFile.id == file_id)
                    .first()
                )

                # raise error if file not found
                if not file:
                    raise NotFound("File not found.")

                extract_setting = ExtractSetting(
                    datasource_type=DatasourceType.FILE.value, upload_file=file, document_model=document.doc_form
                )

                indexing_runner = IndexingRunner()

                try:
                    estimate_response = indexing_runner.indexing_estimate(
                        current_user.current_tenant_id,
                        [extract_setting],
                        data_process_rule_dict,
                        document.doc_form,
                        "English",
                        dataset_id,
                    )
                    return estimate_response.model_dump(), 200
                except LLMBadRequestError:
                    raise ProviderNotInitializeError(
                        "No Embedding Model available. Please configure a valid provider "
                        "in the Settings -> Model Provider."
                    )
                except ProviderTokenNotInitError as ex:
                    raise ProviderNotInitializeError(ex.description)
                except PluginDaemonClientSideError as ex:
                    raise ProviderNotInitializeError(ex.description)
                except Exception as e:
                    raise IndexingEstimateError(str(e))

        return response, 200


@console_ns.route("/datasets/<uuid:dataset_id>/batch/<string:batch>/indexing-estimate")
class DocumentBatchIndexingEstimateApi(DocumentResource):
    @setup_required
    @login_required
    @account_initialization_required
    def get(self, dataset_id, batch):
        dataset_id = str(dataset_id)
        batch = str(batch)
        documents = self.get_batch_documents(dataset_id, batch)
        if not documents:
            return {"tokens": 0, "total_price": 0, "currency": "USD", "total_segments": 0, "preview": []}, 200

        data_process_rule = documents[0].dataset_process_rule
        data_process_rule_dict = data_process_rule.to_dict()

        extract_settings = []
        for document in documents:
            if document.indexing_status in {"completed", "error"}:
                raise DocumentAlreadyFinishedError()

            data_source_info = document.data_source_info_dict
            if document.data_source_type == "upload_file":
                if not data_source_info:
                    continue
                file_id = data_source_info["upload_file_id"]
                file_detail = (
                    db.session.query(UploadFile)
                    .where(UploadFile.tenant_id == current_user.current_tenant_id, UploadFile.id == file_id)
                    .first()
                )
                if file_detail is None:
                    raise NotFound("File not found.")

                extract_setting = ExtractSetting(
                    datasource_type=DatasourceType.FILE.value, upload_file=file_detail, document_model=document.doc_form
                )
                extract_settings.append(extract_setting)
            elif document.data_source_type == "notion_import":
                if not data_source_info:
                    continue
                extract_setting = ExtractSetting(
                    datasource_type=DatasourceType.NOTION.value,
                    notion_info={
                        "credential_id": data_source_info["credential_id"],
                        "notion_workspace_id": data_source_info["notion_workspace_id"],
                        "notion_obj_id": data_source_info["notion_page_id"],
                        "notion_page_type": data_source_info["type"],
                        "tenant_id": current_user.current_tenant_id,
                    },
                    document_model=document.doc_form,
                )
                extract_settings.append(extract_setting)
            elif document.data_source_type == "website_crawl":
                if not data_source_info:
                    continue
                extract_setting = ExtractSetting(
                    datasource_type=DatasourceType.WEBSITE.value,
                    website_info={
                        "provider": data_source_info["provider"],
                        "job_id": data_source_info["job_id"],
                        "url": data_source_info["url"],
                        "tenant_id": current_user.current_tenant_id,
                        "mode": data_source_info["mode"],
                        "only_main_content": data_source_info["only_main_content"],
                    },
                    document_model=document.doc_form,
                )
                extract_settings.append(extract_setting)
            else:
                raise ValueError("Data source type is not supported.")

        indexing_runner = IndexingRunner()
        try:
            response = indexing_runner.indexing_estimate(
                current_user.current_tenant_id,
                extract_settings,
                data_process_rule_dict,
                document.doc_form,
                "English",
                dataset_id,
            )
            return response.model_dump(), 200
        except LLMBadRequestError:
            raise ProviderNotInitializeError(
                "No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
            )
        except ProviderTokenNotInitError as ex:
            raise ProviderNotInitializeError(ex.description)
        except PluginDaemonClientSideError as ex:
            raise ProviderNotInitializeError(ex.description)
        except Exception as e:
            raise IndexingEstimateError(str(e))


@console_ns.route("/datasets/<uuid:dataset_id>/batch/<string:batch>/indexing-status")
class DocumentBatchIndexingStatusApi(DocumentResource):
    @setup_required
    @login_required
    @account_initialization_required
    def get(self, dataset_id, batch):
        dataset_id = str(dataset_id)
        batch = str(batch)
        documents = self.get_batch_documents(dataset_id, batch)
        documents_status = []
        for document in documents:
            completed_segments = (
                db.session.query(DocumentSegment)
                .where(
                    DocumentSegment.completed_at.isnot(None),
                    DocumentSegment.document_id == str(document.id),
                    DocumentSegment.status != "re_segment",
                )
                .count()
            )
            total_segments = (
                db.session.query(DocumentSegment)
                .where(DocumentSegment.document_id == str(document.id), DocumentSegment.status != "re_segment")
                .count()
            )
            # Create a dictionary with document attributes and additional fields
            document_dict = {
                "id": document.id,
                "indexing_status": "paused" if document.is_paused else document.indexing_status,
                "processing_started_at": document.processing_started_at,
                "parsing_completed_at": document.parsing_completed_at,
                "cleaning_completed_at": document.cleaning_completed_at,
                "splitting_completed_at": document.splitting_completed_at,
                "completed_at": document.completed_at,
                "paused_at": document.paused_at,
                "error": document.error,
                "stopped_at": document.stopped_at,
                "completed_segments": completed_segments,
                "total_segments": total_segments,
            }
            documents_status.append(marshal(document_dict, document_status_fields))
        data = {"data": documents_status}
        return data


@console_ns.route("/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/indexing-status")
class DocumentIndexingStatusApi(DocumentResource):
    @api.doc("get_document_indexing_status")
    @api.doc(description="Get document indexing status")
    @api.doc(params={"dataset_id": "Dataset ID", "document_id": "Document ID"})
    @api.response(200, "Indexing status retrieved successfully")
    @api.response(404, "Document not found")
    @setup_required
    @login_required
    @account_initialization_required
    def get(self, dataset_id, document_id):
        dataset_id = str(dataset_id)
        document_id = str(document_id)
        document = self.get_document(dataset_id, document_id)

        completed_segments = (
            db.session.query(DocumentSegment)
            .where(
                DocumentSegment.completed_at.isnot(None),
                DocumentSegment.document_id == str(document_id),
                DocumentSegment.status != "re_segment",
            )
            .count()
        )
        total_segments = (
            db.session.query(DocumentSegment)
            .where(DocumentSegment.document_id == str(document_id), DocumentSegment.status != "re_segment")
            .count()
        )

        # Create a dictionary with document attributes and additional fields
        document_dict = {
            "id": document.id,
            "indexing_status": "paused" if document.is_paused else document.indexing_status,
            "processing_started_at": document.processing_started_at,
            "parsing_completed_at": document.parsing_completed_at,
            "cleaning_completed_at": document.cleaning_completed_at,
            "splitting_completed_at": document.splitting_completed_at,
            "completed_at": document.completed_at,
            "paused_at": document.paused_at,
            "error": document.error,
            "stopped_at": document.stopped_at,
            "completed_segments": completed_segments,
            "total_segments": total_segments,
        }
        return marshal(document_dict, document_status_fields)


@console_ns.route("/datasets/<uuid:dataset_id>/documents/<uuid:document_id>")
class DocumentApi(DocumentResource):
    METADATA_CHOICES = {"all", "only", "without"}

    @api.doc("get_document")
    @api.doc(description="Get document details")
    @api.doc(
        params={
            "dataset_id": "Dataset ID",
            "document_id": "Document ID",
            "metadata": "Metadata inclusion (all/only/without)",
        }
    )
    @api.response(200, "Document retrieved successfully")
    @api.response(404, "Document not found")
    @setup_required
    @login_required
    @account_initialization_required
    def get(self, dataset_id, document_id):
        dataset_id = str(dataset_id)
        document_id = str(document_id)
        document = self.get_document(dataset_id, document_id)

        metadata = request.args.get("metadata", "all")
        if metadata not in self.METADATA_CHOICES:
            raise InvalidMetadataError(f"Invalid metadata value: {metadata}")

        if metadata == "only":
            response = {"id": document.id, "doc_type": document.doc_type, "doc_metadata": document.doc_metadata_details}
        elif metadata == "without":
            dataset_process_rules = DatasetService.get_process_rules(dataset_id)
            document_process_rules = document.dataset_process_rule.to_dict() if document.dataset_process_rule else {}
            data_source_info = document.data_source_detail_dict
            response = {
                "id": document.id,
                "position": document.position,
                "data_source_type": document.data_source_type,
                "data_source_info": data_source_info,
                "dataset_process_rule_id": document.dataset_process_rule_id,
                "dataset_process_rule": dataset_process_rules,
                "document_process_rule": document_process_rules,
                "name": document.name,
                "created_from": document.created_from,
                "created_by": document.created_by,
                "created_at": document.created_at.timestamp(),
                "tokens": document.tokens,
                "indexing_status": document.indexing_status,
                "completed_at": int(document.completed_at.timestamp()) if document.completed_at else None,
                "updated_at": int(document.updated_at.timestamp()) if document.updated_at else None,
                "indexing_latency": document.indexing_latency,
                "error": document.error,
                "enabled": document.enabled,
                "disabled_at": int(document.disabled_at.timestamp()) if document.disabled_at else None,
                "disabled_by": document.disabled_by,
                "archived": document.archived,
                "segment_count": document.segment_count,
                "average_segment_length": document.average_segment_length,
                "hit_count": document.hit_count,
                "display_status": document.display_status,
                "doc_form": document.doc_form,
                "doc_language": document.doc_language,
            }
        else:
            dataset_process_rules = DatasetService.get_process_rules(dataset_id)
            document_process_rules = document.dataset_process_rule.to_dict()
            data_source_info = document.data_source_detail_dict
            response = {
                "id": document.id,
                "position": document.position,
                "data_source_type": document.data_source_type,
                "data_source_info": data_source_info,
                "dataset_process_rule_id": document.dataset_process_rule_id,
                "dataset_process_rule": dataset_process_rules,
                "document_process_rule": document_process_rules,
                "name": document.name,
                "created_from": document.created_from,
                "created_by": document.created_by,
                "created_at": document.created_at.timestamp(),
                "tokens": document.tokens,
                "indexing_status": document.indexing_status,
                "completed_at": int(document.completed_at.timestamp()) if document.completed_at else None,
                "updated_at": int(document.updated_at.timestamp()) if document.updated_at else None,
                "indexing_latency": document.indexing_latency,
                "error": document.error,
                "enabled": document.enabled,
                "disabled_at": int(document.disabled_at.timestamp()) if document.disabled_at else None,
                "disabled_by": document.disabled_by,
                "archived": document.archived,
                "doc_type": document.doc_type,
                "doc_metadata": document.doc_metadata_details,
                "segment_count": document.segment_count,
                "average_segment_length": document.average_segment_length,
                "hit_count": document.hit_count,
                "display_status": document.display_status,
                "doc_form": document.doc_form,
                "doc_language": document.doc_language,
            }

        return response, 200
    @setup_required
    @login_required
    @account_initialization_required
    @cloud_edition_billing_rate_limit_check("knowledge")
    def delete(self, dataset_id, document_id):
        dataset_id = str(dataset_id)
        document_id = str(document_id)
        dataset = DatasetService.get_dataset(dataset_id)
        if dataset is None:
            raise NotFound("Dataset not found.")

        # check user's model setting
        DatasetService.check_dataset_model_setting(dataset)

        document = self.get_document(dataset_id, document_id)

        try:
            DocumentService.delete_document(document)
        except services.errors.document.DocumentIndexingError:
            raise DocumentIndexingError("Cannot delete document during indexing.")

        return {"result": "success"}, 204


@console_ns.route("/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/processing/<string:action>")
class DocumentProcessingApi(DocumentResource):
    @api.doc("update_document_processing")
    @api.doc(description="Update document processing status (pause/resume)")
    @api.doc(
        params={"dataset_id": "Dataset ID", "document_id": "Document ID", "action": "Action to perform (pause/resume)"}
    )
    @api.response(200, "Processing status updated successfully")
    @api.response(404, "Document not found")
    @api.response(400, "Invalid action")
    @setup_required
    @login_required
    @account_initialization_required
    @cloud_edition_billing_rate_limit_check("knowledge")
    def patch(self, dataset_id, document_id, action: Literal["pause", "resume"]):
        dataset_id = str(dataset_id)
        document_id = str(document_id)
        document = self.get_document(dataset_id, document_id)

        # The role of the current user in the ta table must be admin, owner, dataset_operator, or editor
        if not current_user.is_dataset_editor:
            raise Forbidden()

        if action == "pause":
            if document.indexing_status != "indexing":
                raise InvalidActionError("Document not in indexing state.")

            document.paused_by = current_user.id
            document.paused_at = naive_utc_now()
            document.is_paused = True
            db.session.commit()
        elif action == "resume":
            if document.indexing_status not in {"paused", "error"}:
                raise InvalidActionError("Document not in paused or error state.")

            document.paused_by = None
            document.paused_at = None
            document.is_paused = False
            db.session.commit()

        return {"result": "success"}, 200


@console_ns.route("/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/metadata")
class DocumentMetadataApi(DocumentResource):
    @api.doc("update_document_metadata")
    @api.doc(description="Update document metadata")
    @api.doc(params={"dataset_id": "Dataset ID", "document_id": "Document ID"})
    @api.expect(
        api.model(
            "UpdateDocumentMetadataRequest",
            {
                "doc_type": fields.String(description="Document type"),
                "doc_metadata": fields.Raw(description="Document metadata"),
            },
        )
    )
    @api.response(200, "Document metadata updated successfully")
    @api.response(404, "Document not found")
    @api.response(403, "Permission denied")
    @setup_required
    @login_required
    @account_initialization_required
    def put(self, dataset_id, document_id):
        dataset_id = str(dataset_id)
        document_id = str(document_id)
        document = self.get_document(dataset_id, document_id)

        req_data = request.get_json()

        doc_type = req_data.get("doc_type")
        doc_metadata = req_data.get("doc_metadata")

        # The role of the current user in the ta table must be admin, owner, dataset_operator, or editor
        if not current_user.is_dataset_editor:
            raise Forbidden()

        if doc_type is None or doc_metadata is None:
            raise ValueError("Both doc_type and doc_metadata must be provided.")

        if doc_type not in DocumentService.DOCUMENT_METADATA_SCHEMA:
            raise ValueError("Invalid doc_type.")

        if not isinstance(doc_metadata, dict):
            raise ValueError("doc_metadata must be a dictionary.")

        metadata_schema: dict = cast(dict, DocumentService.DOCUMENT_METADATA_SCHEMA[doc_type])

        document.doc_metadata = {}
        if doc_type == "others":
            document.doc_metadata = doc_metadata
        else:
            for key, value_type in metadata_schema.items():
                value = doc_metadata.get(key)
                if value is not None and isinstance(value, value_type):
                    document.doc_metadata[key] = value

        document.doc_type = doc_type
        document.updated_at = naive_utc_now()
        db.session.commit()

        return {"result": "success", "message": "Document metadata updated."}, 200


@console_ns.route("/datasets/<uuid:dataset_id>/documents/status/<string:action>/batch")
class DocumentStatusApi(DocumentResource):
    @setup_required
    @login_required
    @account_initialization_required
    @cloud_edition_billing_resource_check("vector_space")
    @cloud_edition_billing_rate_limit_check("knowledge")
    def patch(self, dataset_id, action: Literal["enable", "disable", "archive", "un_archive"]):
        dataset_id = str(dataset_id)
        dataset = DatasetService.get_dataset(dataset_id)
        if dataset is None:
            raise NotFound("Dataset not found.")

        # The role of the current user in the ta table must be admin, owner, or editor
        if not current_user.is_dataset_editor:
            raise Forbidden()

        # check user's model setting
        DatasetService.check_dataset_model_setting(dataset)

        # check user's permission
        DatasetService.check_dataset_permission(dataset, current_user)

        document_ids = request.args.getlist("document_id")
        try:
            DocumentService.batch_update_document_status(dataset, document_ids, action, current_user)
        except services.errors.document.DocumentIndexingError as e:
            raise InvalidActionError(str(e))
        except ValueError as e:
            raise InvalidActionError(str(e))
        except NotFound as e:
            raise NotFound(str(e))

        return {"result": "success"}, 200


@console_ns.route("/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/processing/pause")
class DocumentPauseApi(DocumentResource):
    @setup_required
    @login_required
    @account_initialization_required
    @cloud_edition_billing_rate_limit_check("knowledge")
    def patch(self, dataset_id, document_id):
        """Pause an indexing document."""
        dataset_id = str(dataset_id)
        document_id = str(document_id)

        dataset = DatasetService.get_dataset(dataset_id)
        if not dataset:
            raise NotFound("Dataset not found.")

        document = DocumentService.get_document(dataset.id, document_id)

        # 404 if document not found
        if document is None:
            raise NotFound("Document does not exist.")

        # 403 if document is archived
        if DocumentService.check_archived(document):
            raise ArchivedDocumentImmutableError()

        try:
            # pause document
            DocumentService.pause_document(document)
        except services.errors.document.DocumentIndexingError:
            raise DocumentIndexingError("Cannot pause completed document.")

        return {"result": "success"}, 204


@console_ns.route("/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/processing/resume")
class DocumentRecoverApi(DocumentResource):
    @setup_required
    @login_required
    @account_initialization_required
    @cloud_edition_billing_rate_limit_check("knowledge")
    def patch(self, dataset_id, document_id):
        """recover document."""
        dataset_id = str(dataset_id)
        document_id = str(document_id)

        dataset = DatasetService.get_dataset(dataset_id)
        if not dataset:
            raise NotFound("Dataset not found.")

        document = DocumentService.get_document(dataset.id, document_id)

        # 404 if document not found
        if document is None:
            raise NotFound("Document Not Exists.")

        # 403 if document is archived
        if DocumentService.check_archived(document):
            raise ArchivedDocumentImmutableError()

        try:
            # resume document
            DocumentService.recover_document(document)
        except services.errors.document.DocumentIndexingError:
            raise DocumentIndexingError("Document is not in paused status.")

        return {"result": "success"}, 204


@console_ns.route("/datasets/<uuid:dataset_id>/retry")
class DocumentRetryApi(DocumentResource):
    @setup_required
    @login_required
    @account_initialization_required
    @cloud_edition_billing_rate_limit_check("knowledge")
    def post(self, dataset_id):
        """retry document."""
        parser = reqparse.RequestParser()
        parser.add_argument("document_ids", type=list, required=True, nullable=False, location="json")
        args = parser.parse_args()

        dataset_id = str(dataset_id)
        dataset = DatasetService.get_dataset(dataset_id)
        if not dataset:
            raise NotFound("Dataset not found.")

        retry_documents = []
        for document_id in args["document_ids"]:
            try:
                document_id = str(document_id)
                document = DocumentService.get_document(dataset.id, document_id)

                # 404 if document not found
                if document is None:
                    raise NotFound("Document Not Exists.")

                # 403 if document is archived
                if DocumentService.check_archived(document):
                    raise ArchivedDocumentImmutableError()

                # 400 if document is completed
                if document.indexing_status == "completed":
                    raise DocumentAlreadyFinishedError()

                retry_documents.append(document)
            except Exception:
                logger.exception("Failed to retry document, document id: %s", document_id)
                continue

        # retry documents
        DocumentService.retry_document(dataset_id, retry_documents)

        return {"result": "success"}, 204


@console_ns.route("/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/rename")
class DocumentRenameApi(DocumentResource):
    @setup_required
    @login_required
    @account_initialization_required
    @marshal_with(document_fields)
    def post(self, dataset_id, document_id):
        # The role of the current user in the ta table must be admin, owner, editor, or dataset_operator
        if not current_user.is_dataset_editor:
            raise Forbidden()

        dataset = DatasetService.get_dataset(dataset_id)
        DatasetService.check_dataset_operator_permission(current_user, dataset)

        parser = reqparse.RequestParser()
        parser.add_argument("name", type=str, required=True, nullable=False, location="json")
        args = parser.parse_args()

        try:
            document = DocumentService.rename_document(dataset_id, document_id, args["name"])
        except services.errors.document.DocumentIndexingError:
            raise DocumentIndexingError("Cannot rename document during indexing.")

        return document


@console_ns.route("/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/website-sync")
class WebsiteDocumentSyncApi(DocumentResource):
    @setup_required
    @login_required
    @account_initialization_required
    def get(self, dataset_id, document_id):
        """sync website document."""
        dataset_id = str(dataset_id)
        dataset = DatasetService.get_dataset(dataset_id)
        if not dataset:
            raise NotFound("Dataset not found.")

        document_id = str(document_id)
        document = DocumentService.get_document(dataset.id, document_id)
        if not document:
            raise NotFound("Document not found.")

        if document.tenant_id != current_user.current_tenant_id:
            raise Forbidden("No permission.")

        if document.data_source_type != "website_crawl":
            raise ValueError("Document is not a website document.")

        # 403 if document is archived
        if DocumentService.check_archived(document):
            raise ArchivedDocumentImmutableError()

        # sync document
        DocumentService.sync_website_document(dataset_id, document)

        return {"result": "success"}, 200


class DocumentPipelineExecutionLogApi(DocumentResource):
    @setup_required
    @login_required
    @account_initialization_required
    def get(self, dataset_id, document_id):
        dataset_id = str(dataset_id)
        document_id = str(document_id)

        dataset = DatasetService.get_dataset(dataset_id)
        if not dataset:
            raise NotFound("Dataset not found.")

        document = DocumentService.get_document(dataset.id, document_id)
        if not document:
            raise NotFound("Document not found.")

        log = (
            db.session.query(DocumentPipelineExecutionLog)
            .filter_by(document_id=document_id)
            .order_by(DocumentPipelineExecutionLog.created_at.desc())
            .first()
        )
        if not log:
            return {
                "datasource_info": None,
                "datasource_type": None,
                "input_data": None,
                "datasource_node_id": None,
            }, 200

        return {
            "datasource_info": json.loads(log.datasource_info),
            "datasource_type": log.datasource_type,
            "input_data": log.input_data,
            "datasource_node_id": log.datasource_node_id,
        }, 200


api.add_resource(GetProcessRuleApi, "/datasets/process-rule")
api.add_resource(DatasetDocumentListApi, "/datasets/<uuid:dataset_id>/documents")
api.add_resource(DatasetInitApi, "/datasets/init")
api.add_resource(
    DocumentIndexingEstimateApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/indexing-estimate"
)
api.add_resource(DocumentBatchIndexingEstimateApi, "/datasets/<uuid:dataset_id>/batch/<string:batch>/indexing-estimate")
api.add_resource(DocumentBatchIndexingStatusApi, "/datasets/<uuid:dataset_id>/batch/<string:batch>/indexing-status")
api.add_resource(DocumentIndexingStatusApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/indexing-status")
api.add_resource(DocumentApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>")
api.add_resource(
    DocumentProcessingApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/processing/<string:action>"
)
api.add_resource(DocumentMetadataApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/metadata")
api.add_resource(DocumentStatusApi, "/datasets/<uuid:dataset_id>/documents/status/<string:action>/batch")
api.add_resource(DocumentPauseApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/processing/pause")
api.add_resource(DocumentRecoverApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/processing/resume")
api.add_resource(DocumentRetryApi, "/datasets/<uuid:dataset_id>/retry")
api.add_resource(DocumentRenameApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/rename")
api.add_resource(WebsiteDocumentSyncApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/website-sync")
api.add_resource(
    DocumentPipelineExecutionLogApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/pipeline-execution-log"
)